DOI: 10.1109/ICCKE.2018.8566601

8th International Conference on Computer and Knowledge Engineering (ICCKE 2018), October 25-26, 2018, Ferdowsi University of Mashhad

Transfer Learning based Intrusion Detection


Zahra Taghiyarrenani, Ali Fanian, Ehsan Mahdavi, Abdolreza Mirzaei, Hamed Farsi
Department of Electrical and Computer Engineering
Isfahan University of Technology
Isfahan, Iran, 84156–83111
z.taghiyarrenani@ec.iut.ac.ir, a.fanian@cc.iut.ac.ir, e.mahdavi@ec.iut.ac.ir, mirzaei@cc.iut.ac.ir, h.farsi@ec.iut.ac.ir

Abstract—In the past decades, machine learning based intrusion detection systems have been developed. This paper discloses a new aspect of machine learning based intrusion detection systems. The proposed method detects normal and anomalous behaviors in the desired network, where no labeled samples are available as a training dataset, while plenty of labeled samples may exist in another network that is different from the desired network. Because of the difference between the two networks, their samples are produced in different manners, so directly using the labeled samples of a different network as training samples does not provide acceptable accuracy for detecting anomalous behaviors in the desired network. In this paper, we propose a transfer learning based intrusion detection method which transfers knowledge between the networks and eliminates the costly procedure of providing training samples. Comparing the experimental results with the results of a basic machine learning method (SVM) and a baseline method (DAMA) shows the effectiveness of the proposed method in transferring knowledge for intrusion detection systems.

Keywords: machine learning, intrusion detection, transfer learning, training samples

I. INTRODUCTION

In recent years, the use of machine learning methods has brought many improvements to network security fields such as intrusion detection systems [1]. These methods fall into three categories: supervised, unsupervised and semi-supervised. Supervised methods use an available data set, named the training data set, to learn a predictive model. The training data set is composed of samples defined by a number of features and at least one label. The trained model is used to predict the labels of future samples, named the test data set. However, preparation of the labeled dataset is a costly procedure. Unsupervised and semi-supervised methods do not use a full collection of the training data set. Therefore, these methods reduce the cost of preparing the labeled dataset, but consequently their prediction accuracy is lower than that of supervised methods. However, all of these methods use labeled/unlabeled samples belonging to the same network to train a predictive model in that network.

Suppose anomalous samples are detected in network N1, so samples with the anomaly label are available in network N1. We want to learn a predictive model in network N2, in which there are no labeled samples, and we are interested in using the labeled samples of network N1 in network N2. As a result, the cost of labeling samples to provide a training dataset is reduced, while the performance of the intrusion detection system in detecting anomalies is improved. Because of the difference between networks N1 and N2, the labeled samples of N1 do not directly provide a suitable training data set for network N2. So, the challenge is how to use different but related labeled samples from one network to learn a predictive model in another network. This challenge can be solved by transfer learning methods. It is notable that some labeled samples may exist in a network but, because of the dynamic nature of the network, be outdated. In such a situation, the direct use of these training samples cannot provide an accurate predictive model.

In this paper, we propose a transfer learning based intrusion detection method. The proposed method transfers information about anomalies that occurred in one network to another network, improving the performance of intrusion detection systems in detecting anomalies and removing the need to provide a training dataset in that network.

The rest of the paper is organized as follows. In section II we present a background on machine learning based intrusion detection systems and transfer learning methods; works related to our proposed method are also mentioned in this section. In section III we describe the proposed transfer learning based intrusion detection method. Sections IV and V contain the results of experiments and the conclusion.

II. BACKGROUND AND RELATED WORKS

There is plenty of research on machine learning based intrusion detection, gathered in several survey papers [1]–[3]. All of the methods mentioned in these survey papers share the same point of view, in which training and test samples are sampled from the same environment. However, the purpose of this paper is transferring information from one network to another network, which is done by transfer learning.
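The premise above, that a model fitted on one network's samples degrades on another network whose samples are produced differently, can be illustrated with a toy sketch. All numbers below are invented for illustration, and a simple nearest-centroid classifier stands in for a full detector; the mean-centering at the end is only a crude stand-in for the alignment developed later in the paper:

```python
import numpy as np

# Hypothetical "source network" samples with 2 features:
# normal traffic clustered near (1, 1), anomalies near (4, 4).
Xs = np.array([[1.0, 1.2], [0.8, 1.0], [1.1, 0.9],   # normal
               [4.0, 4.1], [3.9, 4.2], [4.2, 3.8]])  # anomaly
ys = np.array([0, 0, 0, 1, 1, 1])

# "Target network" samples: same two classes, but every feature is
# shifted because the two networks produce samples in different manners.
shift = np.array([10.0, -5.0])
Xt = np.array([[0.9, 1.1], [1.2, 0.8],               # normal
               [4.1, 4.0], [3.8, 4.2]]) + shift
yt = np.array([0, 0, 1, 1])

def nearest_centroid_predict(X_train, y_train, X_test):
    """Label each test sample with the class of its closest centroid."""
    centroids = np.stack([X_train[y_train == c].mean(axis=0) for c in (0, 1)])
    d = np.linalg.norm(X_test[:, None, :] - centroids[None, :, :], axis=2)
    return d.argmin(axis=1)

# Direct transfer: train on the source network, test on the shifted target.
acc_direct = (nearest_centroid_predict(Xs, ys, Xt) == yt).mean()

# Crude alignment: translate both datasets to a common (zero) mean first.
Xs_c, Xt_c = Xs - Xs.mean(axis=0), Xt - Xt.mean(axis=0)
acc_aligned = (nearest_centroid_predict(Xs_c, ys, Xt_c) == yt).mean()

print(acc_direct, acc_aligned)
```

On this constructed example the directly transferred model mislabels the target's normal traffic, while the mean-aligned version separates the classes again, which is the effect the transfer learning methods discussed next aim for in a principled way.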

978-1-5386-9569-2/18/$31.00 2018 IEEE

In this context, the first network is named the source network and the second one is named the target network. It is mentioned that [4] is a transfer learning method that is evaluated on an intrusion detection dataset. Suppose $X_S = (x_s^1, x_s^2, ..., x_s^n)$ is a set of source network samples, where $x_s^i$ is the $i$th sample defined in the $d_1$-dimensional feature space $\mathcal{X}_S$. $Y_S = (y_s^1, y_s^2, ..., y_s^n)$ is the label vector of the source network samples. $X_T = (x_t^1, x_t^2, ..., x_t^m)$ is a vector of target network samples, which are defined in the $d_2$-dimensional feature space $\mathcal{X}_T$. $Y_T = (y_t^1, y_t^2, ..., y_t^m)$ is the label vector of the target network samples, and the final purpose is to estimate the label vector $Y_T$ that corresponds to the target network samples. By these definitions, the source network domain is defined as $D_s = \{\mathcal{X}_S, P(X_S)\}$, where $P(X_S)$ is the marginal distribution of the samples. In the same way, the target network domain is $D_t = \{\mathcal{X}_T, P(X_T)\}$ [5]. If the label spaces of the source and target networks are $\mathcal{Y}_S$ and $\mathcal{Y}_T$ and their predictive functions are $f_s(X_S)$ and $f_t(X_T)$ respectively, the source and target tasks are defined as $T_s = \{\mathcal{Y}_S, f_s(X_S)\}$ and $T_t = \{\mathcal{Y}_T, f_t(X_T)\}$ respectively [5]. It is notable that the predictive function $f(\cdot)$ can be denoted as the conditional distribution $P(Y|X)$. Given the domains and tasks of the source and target networks, transfer learning improves the learning of the predictive function in the target network by using information from the domain and task of the source network, even though the domains and tasks may be different [5]. One of the most important issues in transfer learning methods is unifying the distributions of the source and target samples [5]. Maximum mean discrepancy (MMD) [6] is one of the successful distribution distance estimators that is used frequently in transfer learning methods [7]–[11].

MMD estimates the distance between two distributions based on a Reproducing Kernel Hilbert Space (RKHS) [6] as follows:

$$MMD(X_S, X_T) = \left\| \frac{1}{n}\sum_{i=1}^{n}\phi(x_s^i) - \frac{1}{m}\sum_{j=1}^{m}\phi(x_t^j) \right\|^2 \quad (1)$$

where $\phi(x): X \rightarrow H$ and $H$ is a universal RKHS [6].

Transfer learning methods are categorized into semi-supervised and unsupervised transfer learning [12]. A method is semi-supervised if there are some labeled data in the target domain, and unsupervised if there are no labeled data in the target domain. In this paper, we suppose there are no labeled data in the desired network and only labeled data from another network are used to learn a predictive function to detect anomalies. So, our proposed method is an unsupervised method. In the transfer learning context, if the feature spaces of the source and target are the same, the problem is homogeneous transfer learning; otherwise, it is heterogeneous transfer learning [13]. In the intrusion detection context, since information is transferred from one network to another network, the feature spaces of the source and target networks may be different. So, our proposed intrusion detection is designed to transfer information in such a situation. To this end, we propose a new transfer learning method that is based on manifold alignment. Domain Adaptation Manifold Alignment (DAMA) is a manifold alignment method proposed by [14]. In the proposed method, DAMA is extended to a transfer learning method for the intrusion detection problem. DAMA finds a common feature space between source and target by using a manifold alignment process [15]. In the new feature space, DAMA uses available homogeneous transfer learning methods to transfer information. In fact, DAMA is a preprocessing for transfer learning. In our proposed method, we extend DAMA to a one-step method to solve the heterogeneous problem of intrusion detection. DAMA considers every domain as a manifold, and it is assumed that the source and target domains have the same label space and that there are a number of labeled samples in the target domain. Unifying the feature spaces of source and target in DAMA is performed by minimizing equation 2 [14]. $A$, $B$ and $C$ in equation 2 are defined in equations 3, 4 and 5 [14]. The parameters used in these equations are summarized in table I.

TABLE I
MANIFOLD ALIGNMENT PARAMETERS [14]

  parameter     description
  K             the number of datasets (manifolds)
  m_a           the number of samples in dataset a
  f_a           mapping function of dataset a
  x_a^i         ith data of dataset x_a
  W_s^{a,b}     similarity matrix of two datasets a and b
  W_d^{a,b}     dissimilarity matrix of two datasets a and b
  W_k           structural matrix of dataset k
  μ             weight parameter

$$c(f_1, ..., f_K) = (A + C)/B \quad (2)$$

$$A = 0.5 \sum_{a=1}^{K}\sum_{b=1}^{K}\sum_{i=1}^{m_a}\sum_{j=1}^{m_b} \left\| f_a^T x_a^i - f_b^T x_b^j \right\|^2 W_s^{a,b}(i,j) \quad (3)$$

$$B = 0.5 \sum_{a=1}^{K}\sum_{b=1}^{K}\sum_{i=1}^{m_a}\sum_{j=1}^{m_b} \left\| f_a^T x_a^i - f_b^T x_b^j \right\|^2 W_d^{a,b}(i,j) \quad (4)$$

$$C = 0.5\mu \sum_{k=1}^{K}\sum_{i=1}^{m_k}\sum_{j=1}^{m_k} \left\| f_k^T x_k^i - f_k^T x_k^j \right\|^2 W_k(i,j) \quad (5)$$

In the DAMA method, the matrices $W_s$, $W_d$ and $W_k$ are the similarity, dissimilarity, and structural matrices, which are constructed using the available labeled samples in the source and target domains. Since our proposed method is an unsupervised heterogeneous transfer learning based intrusion detection method, we do not have any labeled data in the target network. So, constructing the similarity and dissimilarity matrices based on labeled data, as DAMA does, is impossible. Consequently, what matters is the construction of these matrices. In our proposed method, these matrices are constructed in such a way that, while the feature spaces of the source and target networks become the same, the distributions of the samples become as similar as possible. It is remarkable that in this paper it is

supposed to have one source network ($X_S$) and one target network ($X_T$). So, in equations 2 to 5, $K$ is 2 ($X_1$ and $X_2$ are $X_S$ and $X_T$, respectively). Two mapping functions $f_1$ and $f_2$ are obtained from equation 2 that map the source and target network samples into a new $p$-dimensional space according to equations 6 and 7. It is reminded that the source and target network samples are defined in $d_1$-dimensional and $d_2$-dimensional feature spaces, respectively.

$$X_{S\,(n \times d_1)} \times (f_1)_{d_1 \times p} = X'_{S\,(n \times p)} \quad (6)$$

$$X_{T\,(m \times d_2)} \times (f_2)_{d_2 \times p} = X'_{T\,(m \times p)} \quad (7)$$
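In matrix form, equations 6 and 7 are plain linear projections. A minimal NumPy sketch follows; the dimensions and random data are illustrative, and the two mapping matrices here are arbitrary stand-ins for the $f_1$ and $f_2$ that the method actually learns by minimizing equation 2:

```python
import numpy as np

rng = np.random.default_rng(0)

n, d1 = 6, 34   # source: n samples in a d1-dimensional feature space
m, d2 = 4, 14   # target: m samples in a d2-dimensional feature space
p = 3           # dimensionality of the shared space

XS = rng.normal(size=(n, d1))   # source samples
XT = rng.normal(size=(m, d2))   # target samples
f1 = rng.normal(size=(d1, p))   # stand-in for the learned mapping f1
f2 = rng.normal(size=(d2, p))   # stand-in for the learned mapping f2

XS_new = XS @ f1   # equation (6): n x p source samples in the shared space
XT_new = XT @ f2   # equation (7): m x p target samples in the shared space

# Both datasets now live in the same p-dimensional space, so a classifier
# trained on the labeled mapped source samples can be applied to the
# mapped target samples.
print(XS_new.shape, XT_new.shape)
```

The whole point of the two different projection matrices is that they absorb the mismatch between $d_1$ and $d_2$: after the multiplication, both sample sets have $p$ columns and are directly comparable.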


$X'_S$ and $X'_T$ are the source and target samples in the new $p$-dimensional feature space. In equation 3, $W_s$ is the similarity matrix, which is defined as follows [14]:

$$W_s = \begin{bmatrix} 0 & W_{similarity}^{src,tar} \\ W_{similarity}^{tar,src} & 0 \end{bmatrix} \quad (8)$$

In the definition of the similarity matrix, $src$ identifies the source network and $tar$ identifies the target network. $W_{similarity}^{src,tar}$ specifies corresponding instances from the source and target networks. If $W_{similarity}^{src,tar}(i,j) = 1$, then the two samples $x_s^i$ and $x_t^j$ will adapt in the new space. The matrix $W_{similarity}^{tar,src}$ is defined similarly.

$W_d$ is the dissimilarity matrix, which is defined as in equation 9.

$$W_d = \begin{bmatrix} 0 & W_{dis}^{src,tar} \\ W_{dis}^{tar,src} & 0 \end{bmatrix} \quad (9)$$
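The block structure of equations 8 and 9 can be sketched directly. The correspondence entries below are invented for illustration (in the proposed method they come from the cluster-centroid alignment and the mean-distance construction described in section III), and the tar,src block is taken here as the transpose of the src,tar block for symmetry, a common manifold-alignment convention, whereas the paper defines the two blocks separately:

```python
import numpy as np

n, m = 3, 2  # illustrative sample counts for source and target

# Hypothetical correspondence matrix: entry (i, j) = 1 means source sample i
# and target sample j should be pulled together in the new space.
W_sim_st = np.zeros((n, m))
W_sim_st[0, 1] = 1.0
W_sim_st[2, 0] = 1.0

# Hypothetical dissimilarity weights: positive entries push pairs apart
# in proportion to the stored distance.
W_dis_st = np.zeros((n, m))
W_dis_st[1, 0] = 2.5

def block(W_st):
    """Assemble the (n+m) x (n+m) block matrix of equations (8)/(9)."""
    return np.block([[np.zeros((n, n)), W_st],
                     [W_st.T, np.zeros((m, m))]])

Ws = block(W_sim_st)  # equation (8)
Wd = block(W_dis_st)  # equation (9)

# Both block matrices are symmetric, as the joint graph over the
# concatenated source+target samples requires.
print(Ws.shape, np.allclose(Ws, Ws.T), np.allclose(Wd, Wd.T))
```

Indexing into the assembled matrix, source sample $i$ sits at row $i$ and target sample $j$ at row $n + j$, so the off-diagonal blocks carry all cross-network relations while the zero diagonal blocks leave within-network structure to the separate matrices $W_1$ and $W_2$.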
$W_{dis}^{src,tar}$ specifies non-corresponding samples from the source and target networks. If $W_{dis}^{src,tar}(i,j) = 1$, then the two samples $x_s^i$ and $x_t^j$ will not be adapted in the new space. The matrix $W_{dis}^{tar,src}$ is defined similarly.

$W_1$, which is defined in equation 5, is the $n \times n$ source structural matrix. $W_1(i,j)$ determines the similarity of the $i$th and $j$th samples from the source domain: the more similar the two samples, the smaller the distance between them in the new space. $W_2$, which is also defined in equation 5, is the $m \times m$ target structural matrix. $W_2(i,j)$ determines the similarity of the $i$th and $j$th samples from the target domain.

III. PROPOSED INTRUSION DETECTION METHOD

First, a qualitative description of the proposed method is presented. The main proposed structure is shown in figure 1. By performing a manifold alignment, the source and target data sets, which are considered as two manifolds, are mapped into a common feature space. The source/target structural, similarity and dissimilarity matrices specify how the samples of the source and target networks are placed in the new space. It is remembered that the definition of these matrices in this paper is different from DAMA.

Fig. 1. Main structure of the proposed method

The four mentioned matrices are used as input of the manifold alignment step. The outputs of the manifold alignment step are mapping functions that map samples to the new space. In the new space, traditional machine learning methods are performed to predict the labels of the target network samples. It is notable that in this paper, intrusion detection is done as a classification problem that classifies samples into normal and anomaly classes. So, any supervised machine learning method can be used in the 'Traditional machine learning' step in figure 1. In fact, the samples of the target network will be classified into anomaly and normal samples while there is not any labeled sample in the target network. The proposed method bypasses this challenge (lack of labeled samples in the target network) by preparing normal and anomaly samples from another network to be used in the target network as a training dataset. It is remembered that, because of the difference between the source and target networks, there is no assumption that the feature spaces and sample distributions of the networks are the same; but the label spaces of the two networks are the same, consisting of the normal and anomaly classes.

A. Proposed components of transfer learning based intrusion detection

Similarity matrix: For constructing the similarity matrix, corresponding samples of the source and target networks that are supposed to be aligned with each other must be found. As

regards, in transfer learning the purpose is to unify the marginal and conditional distributions of the source and target samples, so the mapping of the samples should follow this purpose. When two data sets are aligned with each other, their means will also be aligned with each other. Therefore, one of the obvious alignments is the alignment of the means of the source and target samples. So, the mean of each data set is added to the data set as a new sample, and the corresponding elements of $W_{similarity}^{tar,src}$ and $W_{similarity}^{src,tar}$ take one. We will use this new data in the next steps.

For aligning the source and target samples with the same classes to each other, it is proposed to find similarity based on clustering methods (it is reminded that there is no labeled sample in the target network, so constructing the similarity matrix based on labels, as is done in the baseline method [14], is impossible). In this regard, every data set is clustered individually and their cluster centroids are aligned to each other according to what will be explained.

Clustering is done based on the K-medoids method. After the clustering process, the cluster centroids of the two data sets are calculated and added to the beginning of the corresponding data sets. The next step is to find the corresponding clusters. If two clusters in the source and target data sets correspond, the related points of their cluster centroids in $W_s$ will change to 1. Since there are no labeled data in the target network, we cannot specify the corresponding clusters directly. Therefore, clusters are corresponded to each other randomly and the best correspondence is selected in the validation step that will be explained next. The similarity matrix is constructed in the manner described. This matrix is used in equation 3. The corresponding samples identified in the similarity matrix will be brought near to each other by minimizing equation 2.

Dissimilarity matrix: Non-corresponding source and target samples that must not be aligned to each other are specified in the dissimilarity matrix. This means that the dissimilarity matrix prevents samples from gathering in the same place. To this end, we are interested that the samples of the source\target network do not get close to the mean of the target\source samples. So, we must calculate the distance between every source\target sample and the mean of the source\target samples. But the feature spaces of the source and target may be different; for this reason, calculating the mentioned distances directly is impossible. Remember that the mean of each dataset was added to the data set as a new sample and, according to the similarity matrix, the two means will be aligned with each other. Accordingly, we can estimate the mean of the source\target samples by the mean of the target\source samples, and calculate the distance between every source\target sample and the mean of the source\target samples in order to construct the dissimilarity matrix. The distance between two samples that are defined in a d-dimensional feature space is calculated by the Euclidean distance mentioned in equation 10.

$$d(x_1, x_2) = \sqrt{\sum_{i=1}^{d}(x_{1i} - x_{2i})^2} \quad (10)$$

Accordingly, the two dissimilarity matrices $w_{dis}^{src,tar}$ and $w_{dis}^{tar,src}$ used in equation 9 are defined as:

$$w_{dis}^{src,tar} = \begin{bmatrix} 0 & \cdots & 0 & 0 \\ \cdots & \cdots & \cdots & \cdots \\ 0 & \cdots & 0 & 0 \\ d(x_{mean_t}, x_t^1) & \cdots & d(x_{mean_t}, x_t^m) & 0 \end{bmatrix}$$

$$w_{dis}^{tar,src} = \begin{bmatrix} 0 & \cdots & 0 & 0 \\ \cdots & \cdots & \cdots & \cdots \\ 0 & \cdots & 0 & 0 \\ d(x_{mean_s}, x_s^1) & \cdots & d(x_{mean_s}, x_s^n) & 0 \end{bmatrix}$$

The dissimilarity matrix $W_d$ is defined by the two matrices $w_{dis}^{src,tar}$ and $w_{dis}^{tar,src}$ and is used in equation 4. By maximizing equation 4, the samples of the source and target networks will move away from each other in proportion to the distances indicated in the dissimilarity matrix.

Dataset structural matrix: This matrix identifies how the samples will be mapped in the new space in proportion to each other. The $W_1$ and $W_2$ matrices are calculated for the source and target samples respectively. $W_1(i,j)$ identifies the similarity between $x_s^i$ and $x_s^j$, and $W_2(i,j)$ identifies the similarity between $x_t^i$ and $x_t^j$. The similarity between two samples is calculated based on the k nearest neighbors, $N_k(\cdot)$, and the Euclidean distance $d(\cdot,\cdot)$. The inverse of the Euclidean distance between each sample $x$ and its k nearest neighbors is calculated and used in the $W_1$ and $W_2$ matrices according to equation 11.

$$W_z(i,j) = \begin{cases} \dfrac{1}{d(x_z^i, x_z^j)} & \text{if } x_z^i \in N_k(x_z^j) \\ 0 & \text{otherwise} \end{cases} \quad (11)$$

$z \in \{1, 2\}$, $i \in \{1, 2, ..., n\}$, $j \in \{1, 2, ..., m\}$.

These two matrices are used in equation 5. By minimizing equation 5, the neighborhood between samples is maintained. It is noteworthy that k must be less than the minimum number of data in each cluster constructed for the similarity matrix.

Manifold alignment: The four constructed matrices $W_{similarity}$, $W_{dis}$, $W_1$ and $W_2$ are used for the manifold alignment process. The outputs of the manifold alignment step are two mapping functions $f_1$ and $f_2$ that map the source and target samples to a new space, respectively. The $f_1$ and $f_2$ functions are calculated by minimizing equation 2. With the matrices proposed in this paper, the $f_1$ and $f_2$ functions will align the source and target samples so that transferring knowledge from the source to the target is done well, without any labeled data in the target.

Applying traditional machine learning methods: In this step, the source and target samples have been processed in such a way that in the new space, machine learning methods can be applied. So, the labeled source network samples can be used as training samples to train a predictive function, and the unlabeled target network samples can be labeled by this function. In this paper, we use a support vector machine (SVM) [16] with the Gaussian kernel as the supervised machine learning method.

Validation: This step is done to find the best output of our proposed method. It is reminded that for constructing the similarity matrix, we need to specify corresponding clusters

and, because of the absence of labeled samples in the target network, we cannot specify them directly. So, because the clusters are aligned randomly, we validate the alignments. The validation step can find the best alignment of the source and target samples. This stage is performed in two steps: in the first step, the acceptable alignments are found; in the second step, the best alignment among the acceptable alignments is found. These two steps will now be described.

In the proposed method, we are interested in finding an alignment of the source and target samples in a new feature space in which we can train a predictive function with labeled source samples to label the target samples. When we achieve such an adaptation, if the source and target samples are used conversely, acceptable accuracy must be obtained. So, in the first step of validation, the target samples that have been labeled by the proposed method are used to train another learning model, and the source samples are relabeled by this model. Since the actual labels of the source samples are available, the accuracy of the constructed model can be calculated. If the result obtained on the source samples is not acceptable, the alignment is rejected; if acceptable accuracy is obtained, the alignment is probably an appropriate alignment. These appropriate adaptations are evaluated in the second step of validation, and the best one is detected. The idea of this stage of validation is based on the validation method presented in [17]. The second step of validation is performed based on the distributions of the source and target data sets in the new space. The MMD criterion is used to compare the distributions of the source and target samples. Among the alignments performed, the alignment is chosen which results in the least distortion of distribution between the source and the target.

IV. EXPERIMENTS

The performance evaluation of the experiments is carried out in terms of accuracy, true positive rate (recall) and false alarm rate (FAR) using the following equations:

$$Accuracy = \frac{TP + TN}{TP + TN + FP + FN} \quad (12)$$

$$Recall = \frac{TP}{TP + FN} \quad (13)$$

$$FAR = \frac{FP}{FP + TN} \quad (14)$$

where TP means an attack is detected as an attack (True Positive), FP means normal is detected as an attack (False Positive), TN means normal is detected as normal (True Negative) and FN means an attack is detected as normal (False Negative).

For the evaluation of the proposed method, we have to construct the source and target network data sets. To this end, two well-known data sets, KDD99 [18] and Kyoto2006+ [19], are used. In the first experiment, we are interested that the source and target samples are in the same feature space. For this purpose, we only use the KDD99 dataset to form the source and target datasets. Each sample of the KDD99 dataset specifies the connection between two IP addresses and is defined with 41 features. Every sample has a label, either normal or one of 24 types of attack. These attacks are grouped into 4 categories: Dos, R2L, U2R, and Probe. In order to form the source and target datasets, we proceed as in [4] and use the 34 features with continuous values. Since samples with the U2R label are rare, these samples are eliminated. Using the remaining samples, we construct 3 data sets; every data set includes normal samples and one of the attacks Probe, DOS or R2L. Using these 3 data sets, we form 6 pairs of source and target datasets, which are shown in table II. It is noticeable that the type of attack is not used in any data set. This means that the source samples are considered with normal and attack labels and the target samples are without any label; the purpose is to label the target samples as normal or attack. In each pair of source and target data sets, it can be seen that although both have samples with the attack label, due to the different types of attacks, the conditions of production of the source and target samples differ from each other. So, transfer learning algorithms can be evaluated by them.

TABLE II
RESULTS OF OUR METHOD AND SVM

                       Our method (transfer+SVM)             SVM
  Source    Target     Acc       Recall    FAR        Acc       Recall    FAR
  N,Probe   N,Dos      98.4375   96.02     0          65.8633   13.16     0.0
  N,Probe   N,R2L      92.3112   89.9705   6.6265     69.2485   0         0
  N,Dos     N,Probe    94.7808   1         6.6502     83.2985   22.3321   0
  N,Dos     N,R2L      89.9171   86.1357   8.3668     70.9544   0         0
  N,R2L     N,Dos      96.5918   91.5525   0          59.6565   0         0
  N,R2L     N,Probe    96.9364   88.1023   0.7021     79.2423   0         0

The results of the proposed method on the 6 pairs of source and target data sets are compared with the SVM method and shown in table II. The purpose of this comparison is to see the effect of transferring knowledge from source to target instead of using traditional machine learning algorithms. It is obvious that in this experiment, the source and target samples are defined in the same feature space.

In [4], the LWE method is proposed as a transfer learning method and evaluated on the KDD dataset, where the source and target data sets are constructed in 3 manners. The comparison between our proposed method and the LWE method is summarized in table III. It is mentioned that the proposed method can also be performed when the feature spaces of the source and target networks are different, which is an advantage over the LWE method.

TABLE III
RESULTS OF OUR METHOD AND LWE

  Source         Target     LWE [4]    Proposed method
  N,Probe,Dos    N,R2L      80.24      90.01
  N,Probe,R2L    N,Dos      96.23      98.78
  N,Dos,R2L      N,Probe    96.36      97.47

In the next experiment, we are interested in transferring knowledge where the source and target networks are different and consequently defined in different feature spaces. In this experiment, we want to detect attacks in a network by

transferring knowledge about attacks from another, different network. So, we use the two different data sets KDD99 and Kyoto2006 as the source and target data sets to transfer knowledge between them. The Kyoto2006 dataset is traffic data from Kyoto University's honeypots and contains 24 features. The features with continuous values are selected in both data sets; in this way, 14 features of the Kyoto2006 dataset and 34 features of the KDD99 data set are selected. The label spaces of both data sets are considered as normal and attack. So, these two data sets include different features and the same labels; therefore, they can be used to evaluate the proposed method. 7 source data sets are constructed from the KDD99 dataset and Kyoto2006 is considered as the target dataset, so we will have 7 pairs of source and target data sets. The purpose of this experiment is to label the Kyoto2006 samples using the KDD99 samples. The results are shown in table IV.

TABLE IV
RESULT OF KYOTO2006 USING KDD99

  Source                  Target       Acc       Recall    FAR
  KDD(N,Dos)              Kyoto2006    92.3336   92.378    7.6834
  KDD(N,Probe)            Kyoto2006    93.5973   88.7195   4.5402
  KDD(N,R2L)              Kyoto2006    95.0295   92.6829   4.0745
  KDD(N,Dos,Probe)        Kyoto2006    97.1356   96.9512   2.7939
  KDD(N,R2L,Probe)        Kyoto2006    96.8829   96.9512   3.1432
  KDD(N,Dos,R2L)          Kyoto2006    97.1356   98.1707   3.2596
  KDD(N,Dos,R2L,Probe)    Kyoto2006    99.8567   99.9651   1.8626

In the next experiment, the proposed method is compared with the baseline method, DAMA. Since DAMA just unifies the source and target feature spaces, after performing DAMA, one of the best transfer learning methods, named ARRLS, is applied to transfer knowledge between the networks. We use the ARRLS method because it is a very powerful general framework for transfer learning that considers all differences between the source and target samples, including the differences between the marginal and conditional distributions. For unifying the feature spaces of the source and target samples by DAMA, we are forced to use some labeled data in the target network, while our method needs no labeled data in the target; this is the first superiority of the proposed method. Anyway, we compare our method without any labeled data to the DAMA+ARRLS method with 10% labeled data in the target dataset. The results are shown in figure 2.

Fig. 2. Accuracy results of Kyoto2006 using KDD99

V. CONCLUSION

In this paper, a new intrusion detection method based on transfer learning is proposed. The proposed method employs the available labeled samples from a different network in the desired network. These different samples are processed so that they can be used as a training dataset to train a learning model in the desired network to predict anomalous samples. The experimental results show that our method effectively transfers knowledge between different networks. Briefly, the proposed method can eliminate the challenge of providing a training dataset and provide acceptable accuracy in detecting anomalous behaviors in the desired network using samples from another network.

REFERENCES

[1] Singh J, Nene MJ. A survey on machine learning techniques for intrusion detection systems. International Journal of Advanced Research in Computer and Communication Engineering. 2013 Nov;2(11):4349-55.
[2] Haq NF, Onik AR, Hridoy MA, Rafni M, Shah FM, Farid DM. Application of machine learning approaches in intrusion detection system: a survey. IJARAI-International Journal of Advanced Research in Artificial Intelligence. 2015;4(3):9-18.
[3] Agrawal S, Agrawal J. Survey on anomaly detection using data mining techniques. Procedia Computer Science. 2015 Jan 1;60:708-13.
[4] Gao J, Fan W, Jiang J, Han J. Knowledge transfer via multiple model local structure mapping. In Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 2008 Aug 24 (pp. 283-291). ACM.
[5] Pan SJ, Yang Q. A survey on transfer learning. IEEE Transactions on Knowledge and Data Engineering. 2010 Oct;22(10):1345-59.
[6] Borgwardt KM, Gretton A, Rasch MJ, Kriegel HP, Schölkopf B, Smola AJ. Integrating structured biological data by kernel maximum mean discrepancy. Bioinformatics. 2006;22(14):e49-e57.
[7] Pan SJ, Kwok JT, Yang Q. Transfer learning via dimensionality reduction. In AAAI 2008 Jul (Vol. 8, pp. 677-682).
[8] Pan SJ, Tsang IW, Kwok JT, Yang Q. Domain adaptation via transfer component analysis. IEEE Transactions on Neural Networks. 2011;22(2):199-210.
[9] Long M, Wang J, Ding G, Sun J, Philip SY. Transfer feature learning with joint distribution adaptation. In Computer Vision (ICCV), 2013 IEEE International Conference on (pp. 2200-2207). IEEE.
[10] Long M, Wang J, Ding G, Pan SJ, Philip SY. Adaptation regularization: A general framework for transfer learning. IEEE Transactions on Knowledge and Data Engineering. 2014;26(5):1076-1089.
[11] Tao J, Chung FL, Wang S. On minimum distribution discrepancy support vector machine for domain adaptation. Pattern Recognition. 2012;45(11):3962-3984.
[12] Blitzer J, McDonald R, Pereira F. Domain adaptation with structural correspondence learning. In Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing 2006 Jul 22 (pp. 120-128). Association for Computational Linguistics.
[13] Weiss K, Khoshgoftaar TM, Wang D. A survey of transfer learning. Journal of Big Data. 2016 Dec 1;3(1):9.
[14] Wang C, Mahadevan S. Heterogeneous domain adaptation using manifold alignment. In IJCAI Proceedings-International Joint Conference on Artificial Intelligence 2011 Jul 16 (Vol. 22, No. 1, p. 1541).
[15] Ham JH, Lee DD, Saul LK. Learning high dimensional correspondences from low dimensional manifolds. 2003.
[16] Vapnik V. The nature of statistical learning theory. Springer Science & Business Media; 2013 Jun 29.
[17] Bruzzone L, Marconcini M. Domain adaptation problems: A DASVM classification technique and a circular validation strategy. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2010 May;32(5):770-87.
[18] KDD Cup. Available on: http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html. 2007.
[19] Kyoto2006+ dataset. [Online]. Available: http://www.takakura.com/Kyoto data,
