All content following this page was uploaded by Sachin Shetty on 17 July 2018.
learning-based approaches are designed to detect attacks, which rely on features extracted from network data. The problem is caused by different distribution of features in the training and testing datasets, which affects the performance of the learned models. Moreover, generating labeled datasets is very time-consuming and expensive, which undercuts the effectiveness of supervised learning approaches. In this paper, we propose using transfer learning to detect previously unseen attacks. The main idea is to learn the optimized representation to be invariant to the changes of attack behaviors from labeled training sets and non-labeled testing sets, which contain different types of attacks, and feed the representation to a supervised classifier. To the best of our knowledge, this is the first effort to use a feature-based transfer learning technique to detect unseen variants of network attacks. Furthermore, this technique can be used with any common base classifier. We evaluated the technique on publicly available datasets, and the results demonstrate the effectiveness of transfer learning to detect new network attacks.

Index Terms—Network attack detection, Machine learning, Transfer learning

[Figure 1: scatter plots of srv_serror_rate against serror_rate; classes: normal, dos, r2l]
Fig. 1. Flow-based features are changing for different attacks. DoS (Denial of service) and R2L (Remote to local) exhibit different distributions in the network features. A decision boundary learned by distinguishing DoS from normal traffic is not applicable in detecting R2L. Features: serror_rate and srv_serror_rate describe the percent of connections with 'SYN' errors to the same host, and connections to the same destination port in the past two seconds respectively.

I. INTRODUCTION

Cyber-attacks are a growing concern in military and commercial networks due to the increased sophistication and variants of attacks, such as Denial of Service (DoS) tactics and zero-day threats. Conventional signature-based detection approaches are unable to address the increased variability of today's cyber-attacks. Thus, there is a great need for developing intelligent anomaly detection techniques to learn, adapt and detect threats in diverse network environments.

Machine learning techniques have been applied to improve the detection rate for malicious traffic based on establishing an explicit or implicit model that enables the patterns analyzed to be categorized [1][2][3]. However, they are still facing many challenges on detecting evolving attacks. Unsupervised anomaly detection used to detect new threats suffers from lower precision [4]. Data-driven supervised classifiers can achieve better precision, but they rely on the known malicious samples used in the training data. Once attacks change behaviors, the classifiers cannot work well and need to be retrained, as shown in Fig. 1. With the continuously rising number of variants, this becomes a major bottleneck, as collecting sufficient datasets requires great effort. Moreover, when we need to incorporate new features from various network layers to tackle evolving attacks, we cannot directly apply the learned classifier on the testing data with a different feature space.

To address the problems, we propose using feature-based transfer learning techniques to enhance the adaptation and robustness of classifiers on detecting new threats. Transfer learning is a machine learning technique that can improve the prediction accuracy of the test (target) domain that has few or no labeled data by transferring the learned knowledge from a related training (source) domain [5]. The intuition behind transfer learning is inspired by the ability of human transitive inference and learning to extend what has been learned in one domain to a new similar domain [6]. Our study is motivated by two facts: 1) Most network attacks are variants of known network attack families that share similar traits [7]. Fig. 2 illustrates an example of common traits, which is that network attacks need to scan a network of computers, resulting in a smaller number of connections to the same destination compared with the connections in normal traffic, and 2) As networking protocols are standard, common features
can be extracted for detecting network attacks. These common attributes, viewed as a common latent structure, suggest a good fit for applying transfer learning techniques.

[Figure 2: distributions of dst_host_srv_count in four panels: normal, dos, r2l, probe]
Fig. 2. DoS, R2L and Probe share similar distribution on feature dst_host_srv_count, which is the sum of connections to the same destination.

In this paper, we present a transfer learning-enabled detection framework to detect previously unseen variants of attacks based on the learned attacks. We propose a feature-based heterogeneous transfer learning approach, called HeTL, which is an improvement on HeMap [8], to implement the framework. The main idea is to find optimized feature representations from both train and test network data. These representations can be achieved via the spectral transformation onto a common latent space where the difference of distributions can be reduced and the dimensions of the feature space can be equal. The new optimized feature representations are then fed to supervised learning algorithms such as decision trees, Naive Bayes, KNN, and SVM to train a more robust classifier. We will show that the classifier trained on malicious samples from one category can successfully detect new samples from a different category. This way, the knowledge of the attack behavior is correctly transferred to the new domain. Compared to the baseline supervised classification without transfer learning, the proposed approach shows considerable improvements in classifying new types of network threats that were not part of the training data.

In short, the main contributions of our work are as follows:
1) We present a framework for detecting new unseen attacks by applying transfer learning techniques based on the known attacks. Using transfer learning is an important contribution on its own to enhance the adaptability of detection models.
2) To the best of our knowledge, we are the first effort to propose and evaluate feature-based transfer learning approaches for detecting new network attacks.
3) We evaluate the proposed technique on a benchmark network dataset NSL-KDD [9]. We compared our techniques with several baseline approaches. The experimental results show that the proposed technique achieves much better performance than any other baseline approach in detecting new attacks.

The rest of this paper is organized as follows: Section II reviews the related work. Section III presents the transfer learning-enabled detection framework and Section IV details the proposed transfer learning approach. We demonstrate our experimental settings and results in Section V. Finally, we conclude the work in Section VI.

II. RELATED WORK

One of the well-known techniques for network attack detection is signature-based detection [10], which is based on an extensive knowledge of the particular characteristics of each attack, referred to as its signature. Another type of technique for network attack detection is the supervised learning-based technique [1][11]. Both suffer from lower precision on detecting new attacks since they mainly rely on the known examples of attacks. Transfer learning has recently gained attention due to several benefits: less effort in learning a new task and generation of robust learned models. Transfer learning has many successful applications in natural language processing and visual recognition, and it is viewed as a promising area of machine learning. Bekerman et al. [2] mentioned transfer learning could improve the robustness in detecting unknown malware between non-similar environments. However, they did not present much detailed and formal work on this idea. The study in [12] applied an instance-based transfer learning approach in network intrusion detection. However, they required plenty of labeled data from the target domain. Gao et al. [13] proposed a model-based transfer learning approach and applied it to the KDD99 cup network dataset. Both of these instance-based and model-based transfer learning approaches, unlike the feature-based approach, depend heavily on the assumption of a homogeneous feature set. This is often not applicable to network attack detection, which typically exhibits heterogeneous features. Another advantage of feature-based approaches is their flexibility of adopting different base classifiers according to different cases, which motivated us to focus on deriving a feature-based transfer learning approach for the network attack detection study. To our best knowledge, this paper is the first effort to apply a feature-based transfer learning approach for improving the robustness of network attack detection.

III. OVERVIEW OF THE TRANSFER LEARNING FRAMEWORK

In this section, we present a transfer learning-enabled network attack detection framework (Fig. 3) to enhance detecting new network attacks by transferring the knowledge learned from known network attacks. We use the terms source domain and target domain to denote the training and testing dataset in a machine learning task respectively. Both source domain and target domain data consist of normal as well as anomalous traffic records. We assume that the attack in the source domain is already known and labeled, and attacks in the target domain are new and unlabeled. The goal of the transfer learning framework is to use the source domain data to help differentiate new attacks from the target domain.
Milcom 2017 Track 3 - Cyber Security and Trusted Computing
and (3) Supervised classification. In the first stage, features are extracted from the raw network trace data with statistical calculation of the network flow. Second, a good new feature representation is learned from both source and target domain via the feature-based transfer learning algorithms. The new representation will be fed to a common base classifier. The choice of a common base classifier here can be decision trees, random forest trees, SVM, KNN or Naive Bayes. The classifier trained on known samples of one category can detect the new samples from another different category.

IV. TRANSFER LEARNING APPROACH VIA SPECTRAL TRANSFORMATION

Problem formulation: We model the network attack detection as a binary classification problem, which is to classify

where $\ell(*, *)$ is a distortion function that evaluates the difference between the original data and projected data. $D(V_S, V_T)$ denotes the difference between the projected data of the source and target domain. $\beta$ is a trade-off parameter which controls the similarity between the two datasets.

Thus, the first two elements of (1) ensure the projected data preserve the structures of the original data as much as possible. We define $\ell(*, *)$ as follows:

$$\ell(V_S, S) = \|S - V_S P_S\|^2, \quad \ell(V_T, T) = \|T - V_T P_T\|^2, \tag{2}$$

where $V_S$ and $V_T$ are achieved by linear transformations with linear mapping matrices denoted as $P_S \in \mathbb{R}^{k \times m}$ and $P_T \in \mathbb{R}^{k \times n}$ to the source and target respectively. $\|X\|^2$ is the Frobenius norm, which can also be expressed as matrix trace norm. In a different view, $P_S^T \in \mathbb{R}^{m \times k}$ and
$P_T^T \in \mathbb{R}^{n \times k}$ project the original data $S$ and $T$ into a $k$-dimensional space where the projected data are comparable ($\ell(V_S, S) = \|S P_S^T - V_S\|^2$). But in this case, this will lead to a trivial solution $P_S = 0$, $V_S = 0$. We thus apply (2). It can be viewed as a Matrix Factorization, which is widely known as an effective tool to extract latent subspaces while preserving the original data's structures.

We define $D(V_S, V_T)$ in terms of $\ell(*, *)$ as:

$$D(V_S, V_T) = \ell(V_S, V_T), \tag{3}$$

which is the difference between the projected target data and the projected source data. Hence, the projected source and target data are constrained to be similar by minimizing the difference function (3).

Optimization

Substituting (2) and (3) into (1), we obtain the following optimization objective to minimize w.r.t. $V_S$, $V_T$, $P_S$ and $P_T$ as follows:

$$\min_{V_S^T V_S = I,\; V_T^T V_T = I} G(V_S, V_T, P_S, P_T)$$

TABLE I
NUMBER OF INSTANCES IN NSL-KDD

Algorithm 1: Heterogeneous Transfer Learning via Spectral Transformation
Input: $T$, $S$, $\beta$, $k$, learning rates $\alpha$, steps = 1000
Output: $V_T$, $V_S$
    Normalize $T$, $S$
    Initialize: $V_T$, $V_S$, $P_T$, $P_S$, step
    while Optimized Function 4 not converge or step < steps do
        Update $V_T$ by gradient descent with Eq. (5), $V_T = V_T - \alpha \frac{\partial G}{\partial V_T}$
        Update $V_S$ by gradient descent with Eq. (6), $V_S = V_S - \alpha \frac{\partial G}{\partial V_S}$
        Update $P_T$ by gradient descent with Eq. (7), $P_T = P_T - \alpha \frac{\partial G}{\partial P_T}$
        Update $P_S$ by gradient descent with Eq. (8), $P_S = P_S - \alpha \frac{\partial G}{\partial P_S}$
        step++
    end
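The gradient-descent loop of Algorithm 1 on constructed data, as an added example that is not code from the paper. Equations (1) and (4)-(8) are not reproduced on this page, so the objective $G = \|S - V_S P_S\|^2 + \|T - V_T P_T\|^2 + \beta \|V_S - V_T\|^2$, the gradients derived from it, the equal row counts that form requires, the stopping rule, the function names and the data are all assumptions; the orthogonality constraints are not enforced.

```python
import numpy as np

def standardize(X):
    # "Normalize T, S": zero-mean, unit-variance columns.
    return (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-12)

def objective(S, T, VS, VT, PS, PT, beta):
    # Assumed G: two reconstruction terms plus beta times the alignment term.
    return float(np.sum((S - VS @ PS) ** 2) + np.sum((T - VT @ PT) ** 2)
                 + beta * np.sum((VS - VT) ** 2))

def hetl_fit(S, T, k=2, beta=1.0, alpha=0.002, steps=1000, tol=1e-9, seed=0):
    """Project S (r x m) and T (r x n) into a shared k-dimensional space.
    Returns (VS, VT, history); history[i] is G after i update rounds."""
    if S.shape[0] != T.shape[0]:
        raise ValueError("||VS - VT|| needs the same number of rows in S and T")
    S, T = standardize(S), standardize(T)
    rng = np.random.default_rng(seed)
    r, m, n = S.shape[0], S.shape[1], T.shape[1]
    VS, VT = 0.1 * rng.normal(size=(r, k)), 0.1 * rng.normal(size=(r, k))
    PS, PT = 0.1 * rng.normal(size=(k, m)), 0.1 * rng.normal(size=(k, n))
    history = [objective(S, T, VS, VT, PS, PT, beta)]
    for _ in range(steps):
        # Update order of Algorithm 1: VT, VS, PT, PS (gradients derived from G).
        VT = VT - alpha * 2 * ((VT @ PT - T) @ PT.T + beta * (VT - VS))
        VS = VS - alpha * 2 * ((VS @ PS - S) @ PS.T + beta * (VS - VT))
        PT = PT - alpha * 2 * (VT.T @ (VT @ PT - T))
        PS = PS - alpha * 2 * (VS.T @ (VS @ PS - S))
        history.append(objective(S, T, VS, VT, PS, PT, beta))
        if abs(history[-2] - history[-1]) < tol:  # converged
            break
    return VS, VT, history

def make_constructed_domains(r=60, seed=1):
    # Constructed data: one 2-D latent behaviour observed through 6 source
    # features and 4 different target features (heterogeneous feature spaces).
    rng = np.random.default_rng(seed)
    Z = rng.normal(size=(r, 2))
    S = Z @ rng.normal(size=(2, 6)) + 0.05 * rng.normal(size=(r, 6))
    T = Z @ rng.normal(size=(2, 4)) + 0.05 * rng.normal(size=(r, 4))
    return S, T

def alignment_gap(beta):
    # Frobenius distance between the two projections after fitting.
    S, T = make_constructed_domains()
    VS, VT, _ = hetl_fit(S, T, beta=beta)
    return float(np.linalg.norm(VS - VT))

if __name__ == "__main__":
    S, T = make_constructed_domains()
    VS, VT, hist = hetl_fit(S, T)
    print("G: %.2f -> %.4f in %d rounds" % (hist[0], hist[-1], len(hist) - 1))
    print("gap, beta=0: %.3f   beta=1: %.3f"
          % (alignment_gap(0.0), alignment_gap(1.0)))
```

The projections `VS` and `VT` have the same width `k` even though the inputs have 6 and 4 features, which is what lets a classifier trained on `VS` be applied to `VT`.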
target domains, which results in different feature spaces for source and target domains. Traditionally, machine learning approaches are not effective in dealing with heterogeneous feature space. Therefore, we use a manual mapping approach to transform the target data into the source feature space and evaluate its performance by using the classifiers, as the baseline approaches. Fig. 6 shows the comparison of the transfer learning approach with baselines on DoS→R2L.

[Figure 6: accuracy for Decision Tree, Random Forests, Linear SVM, Naive Bayes and KNN]
Fig. 6. Comparison on heterogeneous spaces on DoS→R2L

TABLE II
ACCURACY OF UNSEEN NETWORK ATTACK DETECTION

Datasets     Approach  CART  Random Forests  SVM   Naive Bayes  KNN
DoS→R2L      No TL     0.53  0.50            0.49  0.36         0.49
DoS→R2L      HeTL      0.81  0.78            0.81  0.72         0.78
DoS→Probe    No TL     0.63  0.63            0.74  0.54         0.73
DoS→Probe    HeTL      0.78  0.78            0.85  0.76         0.78
Probe→R2L    No TL     0.52  0.50            0.47  0.65         0.51
Probe→R2L    HeTL      0.73  0.75            0.75  0.71         0.78

TABLE III
F1 SCORE OF UNSEEN NETWORK ATTACK DETECTION

Datasets     Approach  CART  Random Forests  SVM   Naive Bayes  KNN
DoS→R2L      No TL     0.12  0.0             0.0   0.0          0.0
DoS→R2L      HeTL      0.83  0.80            0.83  0.74         0.81
DoS→Probe    No TL     0.26  0.45            0.65  0.54         0.68
DoS→Probe    HeTL      0.76  0.74            0.84  0.73         0.75
Probe→R2L    No TL     0.12  0.01            0.04  0.57         0.08
Probe→R2L    HeTL      0.67  0.77            0.79  0.77         0.78

C. Evaluation

1) Transfer learning vs non-transfer learning: For the first experimental setting, which aims to detect the new and unseen attacks, we compare the performance of the proposed transfer learning detection technique HeTL with common base classifiers without transfer learning approaches on three detection tasks. We have chosen decision tree (CART), random forests, linear SVM, Naive Bayes and KNN as the common base classifiers. From the results in Table II and Table III, we can see that HeTL has significantly improved the performance in all cases. Compared with the baselines, HeTL improved accuracy by 35% - 50% and achieved much higher improvements in F1-scores. From the ROC curve shown in Fig. 5, we can see that HeTL improved the detection rates against the baselines. For the second experimental setting, Fig. 6 shows the potential benefit of taking advantage of the heterogeneous feature space.

The results show that HeTL has a steady performance in the five common base classifiers, which means HeTL can work with various supervised learning classifiers. Moreover, we plot the distribution of the new feature representations in Fig. 7 to demonstrate why HeTL can improve the performance. We can see that HeTL successfully finds a subspace that makes the distributions of different attacks similar while still far apart from the normal behaviors, which helps in finding the decision boundary to distinguish malicious from normal.

2) Comparison of feature-based transfer learning approaches: We have evaluated HeTL with other feature-based transfer learning approaches, which are HeMap [8] and CORrelation Alignment (CORAL) [14] on network attack detection
Fig. 7. Projected data on Probe→R2L. The source data keeps the original data structure and becomes similar to the target data. They share similar decision boundary in the projected space.

[Figure: accuracy chart with legend No_TL, CORAL, HeMap, HeTL; y-axis: Accuracy]

VII. ACKNOWLEDGMENTS

This work was supported by Office of the Assistant Secretary of Defense for Research and Engineering (OASD (R&E)) agreement FA8750-15-2-0120 and Boeing Data Analytics agreement BRT-L1015-0006.

REFERENCES

[1] R. Perdisci, W. Lee, and Feamster, “Behavioral clustering of HTTP-based malware and signature generation using malicious network traces,” in Proc. of the 7th USENIX conference on Networked systems design and implementation, 2010, pp. 26–26.
[2] D. Bekerman, B. Shapira, L. Rokach et al., “Unknown malware detection using network traffic classification,” in Communications and Network Security (CNS), 2015 IEEE Conference on, Sep 2015, pp. 134–142.
[3] P. García-Teodoro, J. Díaz-Verdejo, G. Maciá-Fernández, and E. Vázquez, “Anomaly-based network intrusion detection: Techniques, systems and challenges,” Computers and Security,