An Effective Network Traffic Classification Method With Unknown Flow Detection

IEEE TRANSACTIONS ON NETWORK AND SERVICE MANAGEMENT, VOL. 10, NO. 2, JUNE 2013 (page 133)
Jun Zhang, Member, IEEE, Chao Chen, Student Member, IEEE, Yang Xiang, Senior Member, IEEE, Wanlei Zhou, Senior Member, IEEE, and Athanasios V. Vasilakos, Senior Member, IEEE
Abstract—Traffic classification is an essential tool for network and system security in complex environments such as cloud computing. State-of-the-art traffic classification methods take advantage of flow statistical features and machine learning techniques; however, their classification performance is severely affected by limited supervised information and unknown applications. To achieve effective network traffic classification, we propose a new method that tackles the problem of unknown applications in the critical situation of a small supervised training set. The proposed method can detect unknown flows generated by unknown applications and exploits the correlation information among real-world network traffic to boost classification performance. A theoretical analysis is provided to confirm the performance benefit of the proposed method. Moreover, a comprehensive performance evaluation conducted on two real-world network traffic datasets shows that the proposed scheme outperforms existing methods in this critical network environment.

Index Terms—Traffic classification, unknown flow detection, compound classification, network security.

Manuscript received February 15, 2012; revised August 22, 2012 and January 7, 2013; accepted February 12, 2013. The associate editor coordinating the review of this manuscript and approving it for publication was O. Festor.
This work was supported in part by ARC Discovery Project DP1095498, in part by ARC Linkage Project LP100100208, and in part by ARC Linkage Project LP120200266.
J. Zhang, C. Chen, Y. Xiang, and W. Zhou are with the School of Information Technology, Deakin University, Melbourne, Australia, 3125 (e-mail: {jun.zhang, chao.chen, yang.xiang, wanlei}@deakin.edu.au).
A. V. Vasilakos is with Kuwait University, Kuwait (e-mail: vasilako@ath.forthnet.gr).
Digital Object Identifier 10.1109/TNSM.2013.022713.120250
1932-4537/13/$31.00 © 2013 IEEE

I. INTRODUCTION

TRAFFIC classification plays an important role in modern network security and management architectures [1], [2], [3]. For instance, traffic classification is normally an essential component in products for QoS control [4] and intrusion detection [5], [6]. With the popularity of cloud computing [7], [8], the number of applications deployed on the Internet is increasing quickly, and many applications adopt encryption. This makes it harder to classify traffic flows according to the applications that generate them. Traditional traffic classification techniques rely on checking the specific port numbers used by different applications, or on inspecting the applications' signature strings in the payload of IP packets [9]. These techniques encounter a number of problems in modern networks, such as dynamic port numbers, data encryption and user privacy protection. Currently, state-of-the-art methods tend to classify traffic by analysing flow-level statistical properties [1], [10]. Substantial attention has been paid to applying machine learning techniques to traffic classification based on flow statistical features [2]. However, the performance of existing flow statistical feature based traffic classification is still unsatisfactory in real-world environments.

A number of supervised classification algorithms and unsupervised clustering algorithms have been applied to network traffic classification. In supervised traffic classification [10], [4], [11], [12], [13], [14], [15], the flow classification model is learned from labelled training samples of each predefined traffic class. Supervised methods classify every flow into one of the predefined traffic classes, so they cannot deal with unknown flows generated by unknown applications. Moreover, to achieve high classification accuracy, supervised methods need sufficient labelled training data. By contrast, clustering-based methods [16], [17], [18], [19], [20], [21] can automatically group a set of unlabelled training samples and apply the clustering results to construct a traffic classifier. In these methods, however, the number of clusters has to be set large enough to obtain high-purity traffic clusters, and mapping a large number of traffic clusters to a small number of real applications without supervised information is a difficult problem.

Existing traffic classification methods perform poorly in the critical situation where supervised information is insufficient and considerable unknown flows are present. In particular, more and more new or unknown applications are emerging in cloud computing environments, which makes robust traffic classification a big challenge in real-world complex networks. For instance, as the number of new applications quickly increases, we can only collect and analyse an incomplete training data set. Moreover, if the emerging applications are encrypted, it is almost impossible to obtain sufficient training samples through deep packet inspection in a limited time. These observations motivate our work.

In this paper, we aim to tackle the problem of unknown flows in a semi-supervised framework. This work considers very few labelled training samples and investigates flow correlation in real-world network environments, which distinguishes it from previous works. The major contributions of this paper are as follows.
• We develop a system model that incorporates flow correlation into a semi-supervised method and is capable of detecting unknown flows.
• We propose flow label propagation to automatically label
relevant flows from a large unlabelled dataset in order to address the problem of a small supervised training set.
• We propose compound classification to jointly identify correlated flows in order to further boost the classification accuracy.
• We provide a theoretical justification of the performance benefit of applying these two new techniques to network traffic classification.

We perform a large number of experiments to comprehensively evaluate the proposed method and compare it with five state-of-the-art traffic classification methods.

The remainder of the paper is structured as follows. Section II reviews related work. The new traffic classification method is proposed in Section III, followed by a theoretical analysis of its performance benefit in Section IV. Section V presents the experimental results. Further discussion is provided in Section VI. Finally, the paper is concluded in Section VII.

II. RELATED WORK

The goal of network traffic classification is to classify traffic flows according to the applications that generate them. Current research on traffic classification concentrates on applying machine learning techniques to flow statistical feature based classification [2]. Flow statistical feature based traffic classification avoids the problems suffered by previous approaches, such as dynamic ports, encrypted applications and user privacy protection.

Many supervised classification algorithms have been applied to traffic classification, taking into account various network applications and situations. In early work, Moore and Zuev [10] applied naive Bayes techniques to classify network traffic based on flow statistical features. Later, several well-known algorithms were also applied to traffic classification, such as Bayesian neural networks [12] and support vector machines [15]. Erman et al. [22] proposed using unidirectional statistical features to facilitate traffic classification in the network core. For real-time traffic classification, several supervised classification methods [23], [13] were proposed that use only the first few packets of a flow. Other existing works include the Pearson's chi-square test based technique [14], probability density function (PDF) based protocol fingerprints [24], and small time-window based packet counts [25]. The supervised traffic classification approach can achieve high classification performance for known applications when there is sufficient pre-labelled data, but it cannot handle unknown applications.

By contrast, the clustering-based approach has the potential to deal with unknown applications. It applies unsupervised clustering algorithms to categorize a set of unlabelled training samples and uses the produced clusters to construct a traffic classifier. McGregor et al. [16] proposed grouping traffic flows into a small number of clusters using the expectation maximization (EM) algorithm and manually labelling each cluster with an application. Other well-known clustering algorithms such as AutoClass [17], k-means [26] and DBSCAN [18] were also applied to traffic classification. Bernaille et al. [19] applied the k-means algorithm to traffic clustering and labelled the clusters with applications by using a payload analysis tool. Wang et al. [20] proposed integrating statistical feature based flow clustering with a payload signature matching method, so as to eliminate the requirement for supervised training data. Finamore et al. [21] combined flow statistical feature based clustering and payload statistical feature based clustering for mining unidentified traffic. However, the number of clusters has to be set large enough to obtain high-purity traffic clusters, which results in the problem of mapping a large number of traffic clusters to a small number of real applications. This problem is very difficult to solve without any information about the real applications. Erman et al. [27] proposed using a small set of supervised training data within an unsupervised approach to address the problem of mapping flow clusters to real applications. However, this method suffers from a large proportion of inaccurately predicted unknown flows due to its many unknown clusters. Casas et al. [28] proposed a new semi-supervised method by first introducing the ensemble clustering technique into traffic classification. Their method can achieve high performance on known classes by combining sub-space clustering, evidence accumulation, and hierarchical clustering.

Some empirical studies have evaluated the traffic classification performance of different methods for practical usage. Roughan et al. [4] tested nearest neighbour (NN) and linear discriminant analysis (LDA) methods for traffic classification using five categories of statistical features. Williams et al. [29] compared supervised algorithms including naive Bayes with discretization, naive Bayes with kernel density estimation, the C4.5 decision tree, Bayesian networks and the naive Bayes tree. Kim et al. [11] extensively evaluated the port-based CoralReef method, the host behaviour-based BLINC method and seven common statistical feature based methods using supervised algorithms on seven different traffic traces. Lim et al. [30] identified the role of feature discretization for different supervised classification algorithms in an empirical study. However, these empirical studies have not investigated traffic classification with unknown applications.

Recent research shows that flow correlation can be discovered by inspecting IP packet headers. Ma et al. [31] proposed a payload-based clustering method for protocol inference, in which they grouped flows into equivalence clusters using a 3-tuple heuristic, i.e., flows sharing the same destination IP, destination port and transport layer protocol are generated by the same application. Canini et al. [32] tested the correctness of the 3-tuple heuristic with real-world traces. In previous work [33], we applied the heuristic to improve unsupervised traffic clustering. However, it remains unclear why flow correlation is helpful to traffic classification and how to effectively utilize flow correlation in flow statistical feature based methods.

III. A COMPOUND TRAFFIC CLASSIFICATION METHOD

In this section, we present the details of the proposed traffic classification method with unknown flow detection. First, a new system model is proposed for clustering-based traffic classification. Then, we introduce three important components of the proposed system model: flow label propagation, the nearest cluster based classifier and compound classification.
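The second of these components, the nearest cluster based classifier (NCC) of Section III-C and Algorithm 2, can be illustrated with a small NumPy sketch. It is a minimal example under stated assumptions, not the authors' implementation: the function names, the integer class ids, the `UNKNOWN = -1` marker and the farthest-first seeding (used instead of purely random initial centroids so that the toy run is deterministic) are all hypothetical choices. The sketch runs Lloyd's k-means with the assignment and update steps (6)-(7), maps each cluster to the majority class of its labelled flows as in (8)-(9), marks clusters without labelled flows as unknown, and classifies a flow by the class of its nearest centroid.

```python
import numpy as np

UNKNOWN = -1  # hypothetical class id for clusters that contain no labelled flow


def nearest(X, centroids):
    """Index of the closest centroid for each row of X (Euclidean distance)."""
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return d.argmin(axis=1)


def farthest_first(T, k, seed=0):
    """Deterministic seeding (an assumption): one random flow, then repeatedly
    the flow farthest from all centroids chosen so far."""
    rng = np.random.default_rng(seed)
    chosen = [T[rng.integers(len(T))]]
    for _ in range(1, k):
        d = np.min([np.linalg.norm(T - c, axis=1) for c in chosen], axis=0)
        chosen.append(T[d.argmax()])
    return np.array(chosen, dtype=float)


def kmeans(T, k, iters=100, seed=0):
    """Lloyd's algorithm: assignment step (6) and update step (7)."""
    centroids = farthest_first(T, k, seed)
    for _ in range(iters):
        assign = nearest(T, centroids)            # assignment step
        new = centroids.copy()
        for i in range(k):
            members = T[assign == i]
            if len(members):                      # keep old centroid if cluster is empty
                new[i] = members.mean(axis=0)     # update step: mean of member flows
        if np.allclose(new, centroids):
            break
        centroids = new
    return centroids, nearest(T, centroids)


def map_clusters(assign, labelled_idx, labels, k):
    """MAP cluster-to-class mapping (8)-(9); clusters with n_i == 0 become UNKNOWN."""
    cluster_class = np.full(k, UNKNOWN)
    labelled_clusters = assign[labelled_idx]
    for i in range(k):
        in_cluster = labels[labelled_clusters == i]
        if len(in_cluster):
            values, counts = np.unique(in_cluster, return_counts=True)
            cluster_class[i] = values[counts.argmax()]  # arg max_j n_ij / n_i
    return cluster_class


def ncc_predict(x, centroids, cluster_class):
    """Rule (11): return the class of the nearest centroid."""
    return int(cluster_class[nearest(x[None, :], centroids)[0]])


# Toy data: three well-separated groups; only the first two contain labelled flows.
rng = np.random.default_rng(1)
T = np.vstack([rng.normal(c, 0.3, size=(50, 2)) for c in ([0, 0], [5, 5], [0, 5])])
labelled_idx = np.array([0, 1, 50, 51])  # plays the role of E inside T
labels = np.array([0, 0, 1, 1])
centroids, assign = kmeans(T, k=3)
cluster_class = map_clusters(assign, labelled_idx, labels, k=3)
print(ncc_predict(np.array([0.1, -0.1]), centroids, cluster_class))  # 0
print(ncc_predict(np.array([0.0, 5.0]), centroids, cluster_class))   # -1 (unknown)
```

Since each class $\omega_j$ is described by its set of centroids $M_j$, returning the class of the single nearest centroid gives the same decision as the nested minimum in (11), provided the unknown clusters are kept as one more class. In practice k would be set much larger than the number of classes, in line with the observation that a large k yields high-purity clusters.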
Fig. 1: System model. (Diagram components: labeled flows and unlabeled flows are merged into a training set for unsupervised clustering; flow label propagation produces an extended flow supervised set; clusters and the extended set feed cluster-app mapping, giving app-based flow classes that train the NCC classifier; testing flows pass through BoFs construction and compound classification to produce the classification results.)

Algorithm 1: Flow label propagation
input: small flow set A and its corresponding label set L_a; large unlabelled flow set B
output: extended set of labelled flows E and its corresponding label set L_e
E ← A;  // create output flow set
L_e ← L_a;  // create output label set
for i ← 1 to |A| do
    for j ← 1 to |B| do
        check and compare the 3-tuples of x_{a_i} and x_{b_j};  // x_{a_i} ∈ A and x_{b_j} ∈ B
        if x_{a_i} and x_{b_j} share the same 3-tuple then
            put x_{b_j} into E;
            put y_{a_i} into L_e;  // y_{a_i} is determined as the label of x_{b_j}
        end
    end
end
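Algorithm 1 can be sketched in Python as follows. This is an illustration under assumptions, not the authors' code: the `Flow` record, its field names, the `key3` helper and the string labels are hypothetical. Indexing the labelled flows by their 3-tuple produces the same multiset of (flow, label) pairs as the nested loop of Algorithm 1, including repeated entries when a flow in B matches several labelled flows, possibly in a different order, but with a single pass over B instead of |A| × |B| comparisons.

```python
from collections import defaultdict
from typing import NamedTuple, Tuple


class Flow(NamedTuple):
    """Hypothetical flow record: the 5-tuple plus a statistical feature vector."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    proto: str
    features: Tuple[float, ...] = ()

    def key3(self):
        """Correlation key: {destination ip, destination port, transport protocol}."""
        return (self.dst_ip, self.dst_port, self.proto)


def propagate_labels(A, La, B):
    """Flow label propagation: return (E, Le), where E is A plus every flow in B
    that shares a 3-tuple with some labelled flow, labelled as in (4)."""
    labels_by_key = defaultdict(list)
    for x_a, y_a in zip(A, La):
        labels_by_key[x_a.key3()].append(y_a)
    E, Le = list(A), list(La)
    for x_b in B:
        # dict lookup replaces the nested |A| x |B| comparison of Algorithm 1
        for y_a in labels_by_key.get(x_b.key3(), []):
            E.append(x_b)
            Le.append(y_a)
    return E, Le


A = [Flow("10.0.0.1", 51000, "203.0.113.10", 80, "tcp")]
La = ["http"]
B = [Flow("10.0.0.2", 52000, "203.0.113.10", 80, "tcp"),  # same 3-tuple as A[0]
     Flow("10.0.0.3", 53000, "198.51.100.7", 22, "tcp")]  # unrelated flow
E, Le = propagate_labels(A, La, B)
print(len(E), Le)  # 2 ['http', 'http']
```

The two flows in the example come from different source hosts but reach the same destination host, port and protocol, which mirrors the web browser example used to motivate the 3-tuple heuristic.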
A. System Model

Fig. 1 shows a novel system model capable of detecting unknown flows generated by unknown applications. The proposed system model uses flow correlation to effectively improve traffic classification. At the training stage, a small number of labelled traffic flows and a large number of unlabelled traffic flows are combined to constitute an unsupervised training data set for traffic clustering. Meanwhile, flow label propagation extends the labelled traffic set by automatically searching the unlabelled traffic set for flows that are correlated to the pre-labelled flows. Then, the clusters produced by traffic clustering on the unsupervised training data are mapped to application-oriented traffic classes with the assistance of the labelled flows. The categorized traffic flows are used to train a traffic classifier such as nearest neighbour. At the testing stage, we perform compound classification on correlated flows instead of classifying individual traffic flows.

The proposed system model incorporates flow correlation into Erman's semi-supervised approach [27]. As a semi-supervised learning approach, our system model involves a small number of pre-labelled traffic flows and a large number of unlabelled traffic flows in the training stage. Unknown applications can be handled by assigning their flows to unknown clusters. Since the number of pre-labelled traffic flows is small, two problems arise: false unknown clusters and a weak traffic classifier. Flow label propagation is proposed to extend the labelled traffic set, so as to significantly reduce the number of clusters that are related to known applications but inaccurately labelled as unknown in the training stage. Compound classification is proposed to classify correlated flows, instead of individual traffic flows, in the testing stage, which can further boost the classification accuracy.

B. Flow Label Propagation

The proposed method aims to classify traffic flows based on flow-level statistical properties. A flow consists of successive IP packets having the same 5-tuple: {source ip, source port, destination ip, destination port, transport protocol}. Traffic flows are constructed by inspecting the headers of IP packets captured on a computer network. For the purpose of classification, each flow can be represented by a set of flow-level statistical properties, such as the number of packets and the packet size.

We start with a small set of pre-labelled traffic flows to create a supervised data set for cluster-application mapping and a training set for traffic clustering. Suppose the labelled flow set is $A = \{x_{a_1}, x_{a_2}, \dots\}$ with labels $L_a = \{y_{a_1}, y_{a_2}, \dots\}$, where each flow is a real vector in the statistical feature space whose dimension equals the number of flow statistical properties. Following Erman's idea [27], we randomly collect a large set of unlabelled traffic flows, $B = \{x_{b_1}, x_{b_2}, \dots\}$, in the target network. By merging the labelled and unlabelled flow sets, we obtain the training set $T$ for traffic clustering:

$$T = A \cup B. \tag{1}$$

Moreover, an automatic process is applied to extend the labelled flow set by searching for correlated flows between $A$ and $B$.
• For each flow $x_a$ in $A$, the automatic process searches $B$ for its correlated flows, i.e., flows with the same 3-tuple {destination ip, destination port, transport protocol}.

The 3-tuple based heuristic has been applied in previous supervised and unsupervised traffic identification [31], [32], [33], [34]. A real example explains its motivation. Suppose two flows, $x_a$ and $x_b$, initiated by two different hosts, both connect to the same destination host at TCP port 80 within a short period of time. These flows are very likely generated by the same application, since the destination host will not change its application within a short period of time. If $x_a$ is identified as a flow generated by a web browser, we can determine that $x_b$ is also generated by the web browser.

This process results in a new set of labelled flows, $E$, which
is larger than $A$:

$$E = A \cup \{x_{b_j} : x_{b_j} \in B,\ \exists\, x_{a_i} \in A \text{ related to } x_{b_j}\}. \tag{2}$$

From (1) and (2), we have

$$E \subset T. \tag{3}$$

The rule for determining the label of flow $x_{e_m} \in E$ is

$$y_{e_m} = \begin{cases} y_{a_i}, & \text{if } x_{e_m} \in A \text{ and } x_{e_m} = x_{a_i}, \\ y_{a_j}, & \text{if } x_{e_m} \in B \text{ and } x_{e_m} \text{ is related to } x_{a_j}. \end{cases} \tag{4}$$

The detailed process for automatic flow labelling is described in Algorithm 1.

C. Nearest Cluster Based Classifier

In this paper, we chose the k-means algorithm [35], [27] to construct a Nearest Cluster based Classifier (NCC) due to its good performance. k-means clustering aims to partition the traffic flows into $k$ clusters ($k \le |T|$), $C = \{C_1, C_2, \dots, C_k\}$, so as to minimize the within-cluster sum of squares:

$$\arg\min_{C} \sum_{i=1}^{k} \sum_{x_j \in C_i} \lVert x_j - m_i \rVert^2, \tag{5}$$

where $m_i$ denotes the centroid of $C_i$, i.e., the mean of the flows in $C_i$. The traditional k-means algorithm uses an iterative refinement technique. Given an initial set of $k$ randomly selected centroids $\{m_1^0, m_2^0, \dots, m_k^0\}$, the algorithm proceeds by alternating between the assignment step and the update step [36]. In the assignment step, each flow is assigned to the cluster with the closest mean:

$$C_i^t = \{x_j : \lVert x_j - m_i^t \rVert \le \lVert x_j - m_l^t \rVert \ \text{for all } l = 1, \dots, k\}. \tag{6}$$

In the update step, the new means are calculated as the centroids of the flows in each cluster:

$$m_i^{t+1} = \frac{1}{|C_i^t|} \sum_{x_j \in C_i^t} x_j. \tag{7}$$

Recent research on traffic classification [18], [33] indicates that k-means can create high-purity traffic clusters when $k$ is set large.

To support application-oriented traffic classification, we apply a probabilistic assignment mechanism [27] to map the clusters created by k-means to the different application-based traffic classes. Note that we use the extended labelled flow set $E$ instead of the small pre-labelled flow set $A$ for mapping. This replacement can significantly reduce the number of unknown clusters and produce more complete traffic classes. Consider the posterior probability $P(Y = y_{e_j} \mid C_i)$, where $y_{e_j}$ denotes a traffic class; it is the probability of correctly mapping cluster $C_i$ to the $j$-th traffic class. We can use the flows in $E$ to estimate these probabilities,

$$P(Y = y_{e_j} \mid C_i) = \frac{n_{ij}}{n_i}, \tag{8}$$

where $n_{ij}$ is the number of flows with label $j$ that were assigned to cluster $i$, and $n_i$ is the total number of labelled flows that were assigned to cluster $i$. Clusters that do not have any labelled flows assigned to them are defined as unknown. The final maximum a posteriori decision function for mapping clusters is

$$y = \arg\max_{j} P(y_{e_j} \mid C_i). \tag{9}$$

Consequently, all flows in $C_i$ will be labelled as $y$, i.e., they are classified into the $j$-th traffic class.

We use the output of the cluster-class mapping to construct a traffic classifier for individual testing flows. Suppose the traffic classes produced by the cluster-class mapping are denoted by $\Omega = \{\omega_1, \dots, \omega_q\}$. We represent the traffic classes using the results of k-means clustering and the flow statistical features. Class $\omega_i$ can be described by a set of cluster centroids,

$$M_i = \{m_j : C_j \in \omega_i\}. \tag{10}$$

For an individual testing flow $x$, the classification rule is

$$y = \arg\min_{j} \left( \min_{m \in M_j} \lVert x - m \rVert \right), \tag{11}$$

which follows the principle of nearest neighbour classification. Algorithm 2 summarizes the process of constructing an NCC classifier. Note that classifying a flow requires only $k$ distance calculations.

Algorithm 2: Nearest cluster based classifier
input: large flow set T; label set L_e for E (E ⊂ T)
output: semi-supervised classifier f(x)
create k clusters C = {C_1, ..., C_k} and obtain their centroids {m_1, ..., m_k} by performing k-means on flow set T;
for i ← 1 to k do
    // n_i labelled flows present in cluster i
    if n_i == 0 then label C_i as 'unknown';
    else
        for j ← 1 to q do
            // n_ij flows with label j in cluster i
            compute P(Y = y_{e_j} | C_i) = n_ij / n_i;
        end
        label C_i according to y = arg max_j P(y_{e_j} | C_i);
    end
end
foreach traffic class ω_i do
    M_i = {m_j : C_j ∈ ω_i};  // a new traffic class description
end
Finally, construct the NCC classifier f(x) = arg min_j (min_{m ∈ M_j} ||x − m||);

D. Compound Classification

At the classification stage, we apply compound classification to correlated flows modelled by a bag-of-flows (BoF) instead of classifying individual traffic flows. In a BoF $X = \{x_1, \dots, x_g\}$, all flows sharing the same 3-tuple are generated by the same application and should belong to the same
traffic class. This correlation information can be used to improve the classification accuracy, which motivates compound classification.

We develop a compound classification method for BoFs by aggregating the flow predictions of the semi-supervised classifier with the majority vote rule [35]. Given a BoF $X = \{x_1, \dots, x_g\}$, we obtain $g$ flow predictions $y_{x_1}, \dots, y_{x_g}$ according to (11). The flow predictions can be straightforwardly transformed into votes,

$$v_{ij} = \begin{cases} 1, & \text{if } y_{x_j} \text{ indicates the } i\text{-th class}, \\ 0, & \text{otherwise}. \end{cases} \tag{12}$$

Then, the compound decision rule using the majority vote is

$$\text{assign } X \rightarrow \omega_l \quad \text{if} \quad \sum_{j=1}^{g} v_{lj} = \max_{i=1,\dots,q} \sum_{j=1}^{g} v_{ij}, \tag{13}$$

which means that all flows in $X$ are classified into $\omega_l$.

Algorithm 3: Compound classification
input: testing flows; NCC classifier f(x)
output: classification results
construct BoFs according to the flows' 3-tuples;  // modelling correlation among traffic flows
foreach BoF X = {x_1, ..., x_g} do
    for j ← 1 to g do
        y_{x_j} = f(x_j);  // flow prediction using the NCC classifier
        for i ← 1 to q do
            if y_{x_j} indicates the i-th class then v_ij ← 1;
            else v_ij ← 0;
        end
    end
    assign all flows in X into class ω_l according to Σ_{j=1..g} v_lj = max_{i=1..q} Σ_{j=1..g} v_ij;  // compound classification based on majority vote
end

IV. PERFORMANCE BENEFIT OF FLOW CORRELATION

Why and how can the proposed approach improve traffic classification performance? In this section, we address this question by analysing the performance benefit of flow correlation. Specifically, the benefit is two-fold: flow label propagation improves NCC construction, and compound classification improves flow prediction.

A. Benefit to NCC Construction

We first study how flow label propagation enhances the NCC classifier. From a theoretical perspective, we show that the capability of NCC depends on the accuracy of mapping traffic clusters to traffic classes. Further analysis confirms that flow label propagation helps map traffic clusters more accurately.

According to Bayesian decision theory [37], the maximum a posteriori (MAP) classifier minimizes the average classification error. For a traffic flow $x$, the optimal class given by the MAP classifier is $\omega^* = \arg\max_{\omega} P(\omega \mid x)$. Assuming a uniform prior $P(\omega)$, and since $p(x)$ does not depend on $\omega$, we obtain the Maximum-Likelihood (ML) classifier:

$$\omega^* = \arg\max_{\omega} P(\omega \mid x) = \arg\max_{\omega} \frac{P(\omega)\, p(x \mid \omega)}{p(x)} = \arg\max_{\omega} p(x \mid \omega). \tag{14}$$

Eq. (14) implies that the estimation of the class-conditional flow distribution determines the performance of a traffic classifier. Generally, we can use a Gaussian mixture to model the flow distribution of a traffic class,

$$p(x \mid \omega_i) = \sum_{j=1}^{N_{c_i}} w_{ij}\, p(x \mid G_{ij}), \tag{15}$$

where $N_{c_i}$ is the number of Gaussian components and $w_{ij}$ is the weight of the $j$-th Gaussian $G_{ij}$ with mean $\mu_{ij}$ and variance $\sigma_{ij}^2$,

$$p(x \mid G_{ij}) \sim \mathcal{N}(\mu_{ij}, \sigma_{ij}^2). \tag{16}$$

In the proposed method, we employ the k-means clustering algorithm, which is able to produce a large number of Gaussian-like traffic clusters. Therefore, each produced cluster approximately corresponds to a Gaussian component of a traffic class. For example, if $C_k$ is mapped to $\omega_i$, it should be linked to one of $\omega_i$'s Gaussian components, say $G_{ij}$. Then, the mean and variance of $p(x \mid G_{ij})$ can be estimated using the flows in $C_k$:

$$\mu_{ij} = \frac{1}{|C_k|} \sum_{x \in C_k} x, \tag{17}$$

$$\sigma_{ij}^2 = \frac{1}{|C_k|} \sum_{x \in C_k} (x - \mu_{ij})^2. \tag{18}$$

In this way, estimating the class-conditional flow distribution becomes equivalent to cluster mapping. According to Bayesian decision theory, the number of accurately identified clusters affects the estimation of the class-conditional flow distribution and thus determines the performance of the NCC traffic classifier. Note that our proposed method does not need to estimate the complex flow distribution directly.

We observe that flow label propagation can improve the accuracy of cluster mapping and increase the number of identified clusters. Given a pre-labelled flow $x_a$ of traffic class $\omega_i$, flow label propagation automatically labels the correlated flows of $x_a$ in the unlabelled flow set $B$. From a statistical point of view, these correlated flows, initiated from different IP addresses, follow the Gaussian mixture distribution $p(x \mid \omega_i)$ in the feature space. This means that the correlated flows of $x_a$ could be located in any cluster corresponding to a Gaussian component of $\omega_i$. There are two different cases for the locations of these correlated flows.
Fig. 2: Error probability $P_2(\text{error})$ versus bag size (0 to 100) for (a) p = 0.3, (b) p = 0.2, (c) p = 0.1.
• First, some correlated flows of $x_a$ may fall into a cluster $C_i$ that contains a number of pre-labelled flows, which can be described as
$$[(B \cap E \cap C_i \neq \emptyset) \;\&\; (A \cap C_i \neq \emptyset)] = 1. \tag{19}$$
In this case, the number of labelled flows $n_i$ in cluster $C_i$ increases due to the added correlated flows of $x_a$. A larger $n_i$ results in a more accurate estimate of $P(Y = y_{e_j} \mid C_i)$, so $C_i$ is mapped to a traffic class more accurately according to Eq. (8).
• Second, some correlated flows of $x_a$ may be located in a cluster $C_i$ that does not contain any pre-labelled flow, which can be described as
$$[(B \cap E \cap C_i \neq \emptyset) \;\&\; (A \cap C_i = \emptyset)] = 1. \tag{20}$$
In this case, cluster $C_i$ would have been inaccurately determined as unknown without flow label propagation, whereas it can now be identified due to the presence of $x_a$'s correlated flows. Therefore, the number of accurately identified traffic clusters increases significantly.

In summary, from a theoretical perspective, the capability of the NCC classifier depends on the number of accurately identified traffic clusters, which affects the estimation of the class-conditional flow distribution. Flow label propagation enhances the NCC classifier by providing more labelled flows, which are used to identify more traffic clusters accurately.

B. Benefit to Flow Prediction

Is compound classification optimal for BoF classification from a theoretical perspective? How can compound classification with the majority vote rule improve the accuracy of flow prediction? We now study these questions and confirm the advantage of compound classification.

Consider a traffic classification problem where a pattern (BoF) $X$ is to be assigned to one of $q$ possible traffic classes. Suppose we have a classifier, but the given pattern is represented by $g$ distinct measurement vectors (the flows in this BoF). This is a typical classifier combination architecture of repeated measurements [35]. In the measurement space, each class $\omega$ is modelled by the probability density function $p(x \mid \omega)$, and its prior probability of occurrence is denoted by $P(\omega)$. According to Bayesian decision theory, given measurements $x_i$, $i = 1, \dots, g$, the pattern (BoF) $X$ should be assigned to class $\omega^*$ provided the a posteriori probability of that interpretation is maximum, i.e.,

$$\omega^* = \arg\max_{\omega} P(\omega \mid x_1, \dots, x_g). \tag{21}$$

In our method, all flows in BoF $X$ are assigned to traffic class $\omega^*$. Kittler et al. have proved that a set of combination rules, including the majority vote, can be derived from Bayesian decision theory [38]. Specifically, the complex prediction $P(\omega \mid x_1, \dots, x_g)$ can be computed by combining $g$ simple predictions, $P(\omega \mid x_1), \dots, P(\omega \mid x_g)$. More generally, the classification of a BoF can be addressed by aggregating the flow predictions produced by a conventional classifier. In this paper, the aggregated classifier $f_A(X)$ can be expressed as

$$f_A(X) = \Theta_{x \in X}\big(f_{ncc}(x)\big), \tag{22}$$

where $f_{ncc}(x)$ denotes the NCC classifier and $\Theta$ is the majority vote method.

Let us investigate why majority vote based compound classification outperforms the individual NCC classifier. Consider the NCC traffic classifier with an error rate of $p < 1/2$. Since the flows in a BoF are diverse, the classifier will make different errors in predicting the classes of different flows; the analysis below assumes that these errors are independent. We first consider a two-class classification problem. For the majority vote to be incorrect, more than $g/2$ flows in $X$ must be classified incorrectly. The probability that $r$ flows are classified incorrectly is

$$P(r \text{ errors}) = C_g^r\, p^r (1-p)^{g-r} = \frac{g!}{r!\,(g-r)!}\, p^r (1-p)^{g-r}. \tag{23}$$

Therefore, the probability that the majority vote is incorrect is

$$P_2(\text{error}) = \sum_{r=\lceil (g+1)/2 \rceil}^{g} \frac{g!}{r!\,(g-r)!}\, p^r (1-p)^{g-r}. \tag{24}$$

For a multi-class classification problem, the majority vote can still produce a correct prediction even if more than $g/2$ flows are incorrectly classified, as long as the wrong votes are split among several classes. This means that the error probability of the majority vote in the multi-class case is no greater than $P_2(\text{error})$. Our empirical study shows that the typical BoF size is large and the NCC classifier has a low error rate, so majority vote based compound classification can further improve the performance. Fig. 2 shows that the error probability $P_2(\text{error})$ decreases quickly as the BoF size increases. When $p < 0.5$ and $g \ge 2$, we always have

$$P_2(\text{error}) < p. \tag{25}$$
Fig. 3: Two real-world traffic datasets (pie charts of class composition). (a) isp: bt, dns, ftp, http, imap, msn, pop3, smtp, ssh, ssl, xmpp. (b) wide: bt, dns, http, smtp, ssh, ssl.
For example, given a BoF with g = 30 and a semi-supervised classifier with an error rate of p = 0.3, the probability that the majority vote is incorrect is 0.006, which is much less than the individual classification error rate.

V. EXPERIMENTAL EVALUATION

In this section, we perform a large number of experiments to comprehensively evaluate the proposed method. On the one hand, the proposed method is compared with five state-of-the-art methods for traffic classification: C4.5, k-NN, Bayes Network, Naive Bayes (NB) [30], and Erman's semi-supervised method [27]. On the other hand, we investigate in depth the impact of the parameters and the unknown detection performance in order to explain why the proposed method performs better. In all experiments, we consider both unknown applications and a small supervised training set. To simulate the problem of unknown applications, we manually set some identified applications as unknown.

To establish ground truth for the testing datasets, we have developed a deep packet inspection (DPI) tool that matches regular expression signatures against the content of flow payloads [33]. A number of application signatures were developed based on previous experience and well-known tools such as l7-filter (http://l7-filter.sourceforge.net) and Tstat (http://tstat.tlc.polito.it). In addition, several encrypted and new applications were investigated by manual inspection of the unidentified traffic.

There are two network traffic traces, wide (http://mawi.wide.ad.jp/mawi/) and isp [33]. The wide trace is taken at a US-Japan trans-Pacific backbone line (a 150 Mbps Ethernet link) that carries commodity traffic for WIDE organizations. The original trace, collected in March 2008, is 72 hours long. Forty bytes of application-layer payload are kept for each packet, and all IP addresses are anonymized in the wide trace. The isp trace is captured using a passive probe at a 100 Mbps Ethernet edge link of an Internet Service Provider (ISP) located in Australia. Full packet payloads are preserved in the isp trace without any filtering. The original isp trace is 7 days long, starting from November 27, 2010. We identify about 182k traffic flows from the wide trace, which make up the wide dataset. All flows in the wide dataset are categorized into 6 classes. In the wide dataset there is only a small number of classes, and HTTP flows dominate the whole dataset. The other dataset is the isp dataset, which is created from our isp trace. The isp dataset consists of 200k flows randomly sampled from 11 major classes. Fig. 3 shows the details of the datasets. To avoid dominating classes, we randomly select up to 30k flows from every big class. In the experiments, 20 unidirectional flow statistical features are extracted and used to represent traffic flows; they are listed in Table I. Each dataset is separated into two halves: one for training and the other for testing. We report the average performance of 100 random runs for each case.

TABLE I: Unidirectional statistical features

| Type of features | Feature description | Number |
|---|---|---|
| Packets | Number of packets transferred per direction | 2 |
| Bytes | Volume of bytes transferred per direction | 2 |
| Packet size | Min., max., mean and std. dev. of packet size per direction | 8 |
| Inter-packet time | Min., max., mean and std. dev. of inter-packet time per direction | 8 |
| Total | | 20 |

Fig. 4: Impact of unknown applications, (a) on wide and (b) on isp: accuracy of Naive Bayes, C4.5, Bayes Network and k-NN without unknown applications and with one-, two- and three-class unknown.

Fig. 5: Overall accuracy, (a) wide and (b) isp: our approach, Erman's semi-supervised method, Naive Bayes, C4.5, Bayes Network and k-NN with 1-, 2- and 3-class unknown.

A. Comparison with Five State-of-the-Art Methods

In this section, we first carry out a number of experiments to study the impact of unknown applications on the supervised classification methods. Then, the proposed method is compared with five state-of-the-art traffic classification methods when unknown applications are present in the traffic classification experiments. The five competing methods are C4.5 [29], k-NN [4], Naive Bayes [10], Bayesian Network [29], and Erman's semi-supervised method [27].

Two common metrics are used to measure the classification performance [11]: overall accuracy and F-Measure.
• Overall accuracy is the ratio of the number of correctly classified flows to the number of all testing flows:
$$\text{Accuracy} = \frac{\text{number of correctly classified flows}}{\text{number of testing flows}} \tag{26}$$
This metric measures the accuracy of a classifier on the whole testing data.
• F-Measure is calculated by
$$\text{F-Measure} = \frac{2 \times \text{precision} \times \text{recall}}{\text{precision} + \text{recall}} \tag{27}$$
where precision is the ratio of correctly classified flows over all predicted flows in a class, and recall is the ratio of correctly classified flows over all ground truth flows in a class. F-Measure is used to evaluate the per-class performance.
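Eqs. (26) and (27) can be computed directly from per-flow ground-truth and predicted labels. The following is a minimal Python sketch, not the authors' evaluation code; the function names and the toy labels are hypothetical. Flows detected as unknown are treated as one more class label, in line with the per-class F-Measure results reported for this section, and a class whose precision or recall denominator is empty is assigned 0 for that term (an assumption of the sketch).

```python
from collections import Counter


def overall_accuracy(y_true, y_pred):
    """Eq. (26): correctly classified flows / all testing flows."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("need two non-empty label lists of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def per_class_f_measure(y_true, y_pred):
    """Eq. (27) for every class seen in the ground truth or the predictions.

    precision(c) = flows correctly classified as c / all flows predicted as c
    recall(c)    = flows correctly classified as c / all ground-truth flows of c
    """
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    predicted = Counter(y_pred)
    actual = Counter(y_true)
    scores = {}
    for c in sorted(set(actual) | set(predicted)):
        precision = hits[c] / predicted[c] if predicted[c] else 0.0
        recall = hits[c] / actual[c] if actual[c] else 0.0
        total = precision + recall
        scores[c] = 2 * precision * recall / total if total else 0.0
    return scores


if __name__ == "__main__":
    # Hypothetical toy labels; "unknown" is scored as its own class.
    truth = ["http", "http", "dns", "unknown", "unknown"]
    pred = ["http", "dns", "dns", "unknown", "http"]
    print(overall_accuracy(truth, pred))  # 0.6
    for cls, f in per_class_f_measure(truth, pred).items():
        print(cls, round(f, 3))
```

In the toy run, three of five flows are correct, so the accuracy is 0.6; http scores an F-Measure of 0.5, while dns and unknown each score about 0.667.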
Fig. 4 reports the overall accuracy of four supervised methods when unknown applications are present in the traffic dataset. In these experiments, we use 100 pre-labelled flows for each traffic class. On the wide dataset, the unknown applications are: smtp for the 'one-class unknown' case; smtp and dns for the 'two-class unknown' case; and smtp, dns and ssh for the 'three-class unknown' case. On the isp dataset, the unknown applications are: bt for the 'one-class unknown' case; bt and pop3 for the 'two-class unknown' case; and bt, pop3 and smtp for the 'three-class unknown' case. The results show that unknown applications can significantly degrade the classification accuracy of supervised methods. The reason is that supervised methods do not consider unknown applications, so they misclassify all flows generated by unknown applications into the predefined known classes. Note that the choice of unknown applications can be changed; different choices only lead to different degrees of negative influence.

Fig. 5 shows that the overall accuracy of our proposed method is significantly higher than that of the other methods when 50 labelled training flows are available for each known class. For example, on the isp dataset with 2 unknown applications, the accuracy of the proposed method is about 10 percent higher than that of the semi-supervised method and about 25 percent higher than that of the other four supervised methods. We varied the number of pre-labelled flows from 50 to 140 in the experiments, but this affects the classification accuracy only slightly. The proposed method consistently outperforms the competing methods when only a small set of pre-labelled flows is available. The reason is that the proposed method can deal with unknown applications by identifying the flows they generate. By taking unknown applications into account, Erman's semi-supervised method outperforms the other supervised methods when 2 unknown applications are present on the isp dataset. However, its performance is not robust, since a small pre-labelled flow set limits its ability to map sufficient clusters to applications. In contrast, our proposed method utilizes flow correlation to address this problem of Erman's semi-supervised method.

Fig. 6: F-Measure on wide (classes: bt, http, ssh, ssl, unknown).

Fig. 6 and Fig. 7 show the per-class F-Measures with 2 unknown applications when 50 labelled training
flows are available for each known class. The unknown flows identified by our method and by Erman's method form a special class in these experiments. Generally speaking, the results show that the proposed method improves the F-Measure in every class. For example, on the isp dataset the F-Measure of the proposed method for the imap class is about 15 percent higher than that of the second-best method, C4.5. On the wide dataset, the F-Measure of the proposed method for the bt class is over 20 percent higher than that of the second-best method, Erman's method. The reason is that flow correlation enhances the NCC classifier and improves flow prediction in the proposed method. Without the flow correlation technique, Erman's semi-supervised method inaccurately produces a large number of unknown flows, which results in poor classification performance.

Fig. 7: F-Measure on isp (classes: dns, ftp, http, imap, msn, smtp, ssh, ssl, xmpp, unknown).

In summary, the proposed method can effectively deal with unknown applications, and its classification performance is significantly better than that of the five state-of-the-art traffic classification methods.

B. Impact of Parameters

We further investigate the impact of the parameters on the performance of the proposed method. Since the results on the two datasets are similar, we only report the results on the isp dataset. Two new measures are introduced to conduct a quantitative study of the impact of the parameters. Propagation rate is defined as the ratio of the number of flows labelled as known for training NCC to the number of pre-labelled flows in A:

$$\text{Propagation Rate} = \frac{\text{number of known flows for NCC}}{\text{number of pre-labelled flows}} \tag{28}$$

Propagation rate measures the effect of the training information in terms of scale. Training purity is defined as the ratio of the number of correctly identified flows (including flows generated by unknown applications) to the number of all training flows after cluster mapping:

$$\text{Training Purity} = \frac{\text{number of correctly identified flows}}{\text{number of all training flows}} \tag{29}$$

Training purity measures the effect of the training information in terms of purity.

1) Impact of Number of Pre-Labelled Flows: In this paper, we consider a small set of pre-labelled flows, since manually labelling traffic flows is time-consuming. The experiments show that the effect on traffic classification performance is not significant when the number of pre-labelled flows changes within the small range of 50 to 100 per class. We therefore focus on propagation rate and training purity, with two unknown applications, 400 clusters and 5k unlabelled flows.

Fig. 8 reports the propagation rate of the proposed method and Erman's semi-supervised method versus the number of pre-labelled flows. Our proposed method always displays a higher propagation rate than Erman's method. This is because our proposed method employs flow label propagation to accurately label many flows from the unlabelled flow set B. More accurately labelled flows help cluster mapping and thus enhance the NCC classifier. The propagation rates of both methods decrease as the number of pre-labelled flows increases on both datasets. This is reasonable, since the number of pre-labelled flows is the denominator of the propagation rate in Eq. (28).

Fig. 9 reports the training purity of the proposed method and Erman's semi-supervised method versus the number of pre-labelled flows. The results show that our proposed
method achieves higher training purity than Erman's method. The reason is that flow label propagation helps map clusters to applications more accurately. Therefore, the proposed method provides more accurate information for training NCC than Erman's method does. The training purity of both methods rises slightly when more pre-labelled flows are available, because more pre-labelled flows can be used to accurately identify more known clusters.

Fig. 8: Propagation rate vs number of pre-labelled flows.
Fig. 9: Training purity vs number of pre-labelled flows.
Fig. 10: Overall accuracy vs number of unlabelled flows.
Fig. 11: Propagation rate vs number of unlabelled flows.

2) Impact of Number of Unlabelled Flows: We also investigate the impact of the number of unlabelled flows in the training stage on the proposed method. The experimental setting is 2 unknown applications, 100 pre-labelled flows for each class and 400 traffic clusters.

Fig. 10 shows the overall accuracy when the number of unlabelled flows changes from 1k to 10k. The proposed method significantly outperforms Erman's method: its overall accuracy is about 15 percent higher on the isp dataset. The number of unlabelled flows affects the overall accuracy of both methods only slightly. According to the experimental results, 5k unlabelled flows are sufficient for traffic classification on the two real-world traffic datasets.

Fig. 11 shows that the propagation rate of the proposed method is always higher than that of Erman's method. The propagation rates of both methods increase with the number of unlabelled flows, since more available unlabelled flows means more flows can be identified during cluster mapping. In contrast, the training purity may decrease a little as the number of unlabelled flows rises, as shown in Fig. 12. However, the proposed method has better training purity than Erman's method regardless of the number of unlabelled flows.

3) Impact of Number of Clusters: Fig. 13 reports the overall accuracy versus the number of traffic clusters produced by k-means, with 2 unknown applications, 100 pre-labelled flows for each class and 5k unlabelled flows. The results show that the overall accuracy of both methods first increases and then decreases as the number of clusters increases. There exists an optimal k for different methods on different traffic datasets. However, the proposed method is consistently superior to Erman's method, and k = 400 is a good choice for the two datasets.

Fig. 14 reports the propagation rates of the two methods when the number of clusters changes from 100 to 1000. The results show that the proposed method has a higher propagation rate than Erman's method, and that the propagation rates of both methods decrease slightly as the number of clusters increases. Furthermore, the training purities of both methods are also influenced by the number of clusters, as shown in Fig. 15. However, the proposed method is more robust than Erman's method to changes in the number of clusters.

C. Unknown Traffic Detection

We carry out a number of experiments to evaluate unknown traffic detection, which both the proposed method and Erman's method use to tackle the problem of unknown applications. Two new measures are introduced to study the accuracy of unknown traffic detection quantitatively. False detection rate is defined as the ratio of the number of known
flows inaccurately detected as unknown to the number of all testing known flows:

$$\text{False Detection Rate} = \frac{\text{number of known flows inaccurately detected as unknown}}{\text{number of all testing known flows}} \tag{30}$$

A low false detection rate means better unknown traffic detection. True detection rate is defined as the ratio of the number of flows accurately detected as unknown to the number of all testing unknown flows:

$$\text{True Detection Rate} = \frac{\text{number of unknown flows accurately detected}}{\text{number of all testing unknown flows}} \tag{31}$$

A high true detection rate means better unknown traffic detection.

Fig. 12: Training purity vs number of unlabelled flows.
Fig. 13: Overall accuracy vs number of clusters.
Fig. 14: Propagation rate vs number of clusters.
Fig. 15: Training purity vs number of clusters.

1) Detection with Varying Number of Labelled Flows: Fig. 16 reports the detection accuracy of the two methods versus the number of pre-labelled flows. The experimental setting is 2 unknown applications, 400 traffic clusters and 5k unlabelled flows. The results show that the number of pre-labelled flows has a great influence on false detection and nearly no influence on true detection. The proposed method demonstrates a much lower false detection rate than Erman's method, with a difference of 20 to 30 percent across the different numbers of pre-labelled flows. The proposed method also displays a higher true detection rate than Erman's method no matter how many pre-labelled flows are available. A lower false detection rate and a higher true detection rate indicate that the proposed method deals with unknown applications more effectively than Erman's method.

2) Detection with Varying Number of Unlabelled Flows: Fig. 17 shows that the number of unlabelled flows does not significantly affect the false detection and true detection of the two methods. The experimental setting is 2 unknown applications, 400 traffic clusters and 100 pre-labelled flows per class. The proposed method obtains a lower false detection rate and a higher true detection rate than Erman's method. For instance, the false detection rate of Erman's method is higher than 10% on the isp dataset, while our method achieves a false detection rate of less than 3%. The difference between the two methods is about 13 percent. Meanwhile, the true detection rate of our method is about 5 percent higher than that of Erman's method.

3) Detection with Varying Number of Clusters: We also investigate the impact of the number of clusters produced by k-means. The experimental setting is 2 unknown applications, 100 pre-labelled flows per class and 5k unlabelled flows. Fig. 18 shows that the number of clusters can dramatically affect the false detection rate and true detection rate of the two methods. Generally, the false detection rate of Erman's method increases quickly as the number of clusters rises, while that of our method changes only slightly. The true detection rate of both methods increases quickly when the number of clusters increases from 100 to 400; after that, the increase slows. There is a trade-off between false detection and true detection when adjusting the number of clusters. One can see that our proposed method has a much lower false detection rate and a higher true detection rate
when compared with Erman's method. Moreover, it is easy to find an optimal number of clusters for our proposed method to obtain a low false detection rate and a high true detection rate; for example, 400 clusters is a good choice. Such a setting is hard to find for Erman's method.

Fig. 16: Detection accuracy (false detection rate and true detection rate) vs number of pre-labelled flows.
Fig. 17: Detection accuracy (false detection rate and true detection rate) vs number of unlabelled flows.
Fig. 18: Detection accuracy (false detection rate and true detection rate) vs number of clusters.

VI. DISCUSSIONS

This section discusses the empirical study and flow label propagation.

A. Remarks on Empirical Study

We empirically demonstrated the superior performance of the proposed method through a large number of experiments on two different real-world traffic datasets. The empirical study and the theoretical analysis in Section IV complement each other. Some major observations and analysis of the experimental results are as follows.
• The proposed method outperforms previous supervised traffic classification methods when unknown applications are present in the real-world traffic dataset. This is because the previous supervised methods classify unknown flows into predefined known classes, which leads to poor classification performance, especially when the number of unknown applications is large. In contrast, the proposed method can identify unknown flows and classify known flows accurately.
• The proposed method is superior to Erman's semi-supervised method in terms of accuracy and F-Measure. The reason is that the proposed method effectively incorporates flow correlation information into the classification process. The benefits of flow correlation include flow label propagation for enhancing the NCC classifier and compound classification for improving flow prediction.
• We observed that the parameters have different impacts. Changing the number of pre-labelled flows between 50 and 140 has no significant influence on overall accuracy and training purity, but it can affect the propagation rate. The number of unlabelled flows shows a
little influence on overall accuracy and training purity, and a remarkable positive influence on propagation rate. The number of clusters has a significant influence on overall accuracy, propagation rate and training purity.
• The proposed method demonstrated its superior capability of unknown flow detection. The number of pre-labelled flows and the number of clusters can affect the performance of unknown flow detection, especially false detection, while the number of unlabelled flows did not show a remarkable influence on unknown detection. Generally speaking, the proposed method outperforms Erman's method in terms of unknown flow detection. The key factor is that the proposed method achieves a much lower false detection rate than Erman's method whatever the parameters are. The reasons are that flow label propagation effectively reduces the number of inaccurate unknown clusters, and compound classification further improves the detection accuracy.

B. Remarks on Flow Label Propagation

A strength of the proposed approach is flow label propagation, which is basically an automatic labelling procedure based on the 3-tuple. As analysed in Section IV, flow label propagation is one reason why the proposed approach works so well when only very few supervised samples are available. The other reason is compound classification, which can jointly classify correlated flows more accurately. In fact, conventional supervised methods perform very badly, even without unknown traffic, if the supervised training set is too small. Since flow label propagation is independent of the classification algorithm, in the future it could be used as a pre-processing step with any supervised method in order to increase the size of the supervised training set. However, it should be pointed out that the key concern of this paper is unknown traffic, and flow label propagation cannot deal with unknown traffic directly. We therefore proposed a semi-supervised scheme that combines flow label propagation and compound classification to handle unknown traffic effectively.

In some cases, flow label propagation critically requires that the labelled flows and unlabelled flows be captured on the same network within a short period of time. The reasoning is as follows. Flow label propagation relies on the 3-tuple, which includes IP addresses and ports. Hence, during, say, a 1-hour period we may catch many BitTorrent peers swarming together, and if we label one flow, the meshed nature of the BitTorrent overlay lets us succeed in catching them all. However, if we move to the next day, the peer whose initial flow we have labelled may be off-line, so no other peer will connect to it and label propagation is useless in this case. The same holds for servers: if, say, an FTP server is popular on a network, we will correctly propagate flow information to all other hosts connecting to that server. However, if the server changes its IP address (without changing its DNS name), we will fail to propagate its label. In addition, the methodology cannot work across networks: a labelled flow in isp cannot be useful in the wide trace unless the server is popular in both setups. It should be pointed out that this paper does not address the problem of traffic classification across networks, for example, training on wide and testing on isp. In this work, all techniques are proposed to deal with unknown applications within a single network.

VII. CONCLUSION

Traffic classification faces increasingly critical problems in current advanced networks and systems, especially in cloud computing environments. In this paper, we proposed a novel traffic classification method to address the problem of unknown applications in the crucial situation of small supervised training data. The proposed method introduced two new techniques to fully utilize flow correlation information. One is flow label propagation, which can automatically and accurately label more unlabelled flows to enhance the capability of the nearest cluster based classifier (NCC). The other is compound classification, which combines a number of flow predictions to classify BoFs more accurately. We confirmed the superior performance of the proposed method through both theoretical and empirical studies. A large number of experiments were carried out on two real-world traffic datasets to evaluate the proposed method. The results showed that the proposed method outperforms five state-of-the-art traffic classification methods, including C4.5, k-NN, Naive Bayes, Bayesian Network, and Erman's semi-supervised method. In a more specific comparison with Erman's semi-supervised method, the proposed method was more robust to various parameters and showed superior unknown detection performance, especially on false detection.

REFERENCES

[1] T. Karagiannis, K. Papagiannaki, and M. Faloutsos, “BLINC: multilevel traffic classification in the dark,” SIGCOMM Comput. Commun. Rev., vol. 35, pp. 229–240, Aug. 2005.
[2] T. T. Nguyen and G. Armitage, “A survey of techniques for internet traffic classification using machine learning,” IEEE Commun. Surveys & Tutorials, vol. 10, no. 4, pp. 56–76, Fourth Quarter 2008.
[3] A. Finamore, M. Mellia, M. Meo, and D. Rossi, “KISS: stochastic packet inspection classifier for UDP traffic,” IEEE/ACM Trans. Netw., vol. 18, no. 5, pp. 1505–1515, Oct. 2010.
[4] M. Roughan, S. Sen, O. Spatscheck, and N. Duffield, “Class-of-service mapping for QoS: a statistical signature-based approach to IP traffic classification,” in Proc. 2004 ACM SIGCOMM Conference on Internet Measurement, pp. 135–148.
[5] Y. Xiang, W. Zhou, and M. Guo, “Flexible deterministic packet marking: an IP traceback system to find the real source of attacks,” IEEE Trans. Parallel Distrib. Syst., vol. 20, no. 4, pp. 567–580, Apr. 2009.
[6] Z. M. Fadlullah, T. Taleb, A. V. Vasilakos, M. Guizani, and N. Kato, “DTRAB: combating against attacks on encrypted protocols through traffic-feature analysis,” IEEE/ACM Trans. Netw., vol. 18, no. 4, pp. 1234–1247, Aug. 2010.
[7] R. Buyya, C. S. Yeo, S. Venugopal, J. Broberg, and I. Brandic, “Cloud computing and emerging IT platforms: vision, hype, and reality for delivering computing as the 5th utility,” Future Generation Computer Systems, vol. 25, no. 6, pp. 599–616, June 2009.
[8] M. Armbrust, A. Fox, R. Griffith, A. D. Joseph, R. Katz, A. Konwinski, G. Lee, D. Patterson, A. Rabkin, I. Stoica, and M. Zaharia, “A view of cloud computing,” Commun. ACM, vol. 53, pp. 50–58, Apr. 2010.
[9] P. Haffner, S. Sen, O. Spatscheck, and D. Wang, “ACAS: automated construction of application signatures,” in Proc. 2005 ACM SIGCOMM Workshop on Mining Network Data, pp. 197–202.
[10] A. W. Moore and D. Zuev, “Internet traffic classification using Bayesian analysis techniques,” SIGMETRICS Perform. Eval. Rev., vol. 33, pp. 50–60, June 2005.
[11] H. Kim, K. Claffy, M. Fomenkov, D. Barman, M. Faloutsos, and K. Lee, “Internet traffic classification demystified: myths, caveats, and the best practices,” in Proc. 2008 ACM CoNEXT Conference, pp. 1–12.
[12] T. Auld, A. W. Moore, and S. F. Gull, “Bayesian neural networks for Internet traffic classification,” IEEE Trans. Neural Netw., vol. 18, no. 1, pp. 223–239, Jan. 2007.
[13] L. Bernaille and R. Teixeira, “Early recognition of encrypted applications,” in Proc. 2007 International Conference on Passive and Active Network Measurement, pp. 165–175.
[14] D. Bonfiglio, M. Mellia, M. Meo, D. Rossi, and P. Tofanelli, “Revealing Skype traffic: when randomness plays with you,” in Proc. 2007 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications, pp. 37–48.
[15] A. Este, F. Gringoli, and L. Salgarelli, “Support vector machines for TCP traffic classification,” Computer Networks, vol. 53, no. 14, pp. 2476–2490, Sept. 2009.
[16] A. McGregor, M. Hall, P. Lorier, and J. Brunskill, “Flow clustering using machine learning techniques,” in Proc. 2004 Passive and Active Measurement Workshop, pp. 205–214.
[17] S. Zander, T. Nguyen, and G. Armitage, “Automated traffic classification and application identification using machine learning,” in Proc. 2005 IEEE Conference on Local Computer Networks, pp. 250–257.
[18] J. Erman, M. Arlitt, and A. Mahanti, “Traffic classification using clustering algorithms,” in Proc. 2006 SIGCOMM Workshop on Mining Network Data, pp. 281–286.
[19] L. Bernaille, R. Teixeira, I. Akodkenou, A. Soule, and K. Salamatian, “Traffic classification on the fly,” SIGCOMM Comput. Commun. Rev., vol. 36, pp. 23–26, Apr. 2006.
[20] Y. Wang, Y. Xiang, and S.-Z. Yu, “An automatic application signature construction system for unknown traffic,” Concurrency Computat.: Pract. Exper., vol. 22, pp. 1927–1944, 2010.
[21] A. Finamore, M. Mellia, and M. Meo, “Mining unclassified traffic using automatic clustering techniques,” in Proc. 2011 TMA International Workshop on Traffic Monitoring and Analysis, pp. 150–163.
[22] J. Erman, A. Mahanti, M. Arlitt, and C. Williamson, “Identifying and discriminating between web and peer-to-peer traffic in the network core,” in Proc. 2007 International Conference on World Wide Web, pp. 883–892.
[23] T. Nguyen and G. Armitage, “Training on multiple sub-flows to optimise the use of machine learning classifiers in real-world IP networks,” in Proc. 2006 IEEE Conference on Local Computer Networks, pp. 369–376.
[24] M. Crotti, M. Dusi, F. Gringoli, and L. Salgarelli, “Traffic classification through simple statistical fingerprinting,” SIGCOMM Comput. Commun. Rev., vol. 37, pp. 5–16, Jan. 2007.
[25] S. Valenti, D. Rossi, M. Meo, M. Mellia, and P. Bermolen, “Accurate, fine-grained classification of P2P-TV applications by simply counting packets,” in Proc. 2009 International Workshop on Traffic Monitoring and Analysis, pp. 84–92.
[26] J. Erman, A. Mahanti, and M. Arlitt, “Internet traffic identification using machine learning,” in Proc. 2006 IEEE Global Telecommunications Conference, pp. 1–6.
[27] J. Erman, A. Mahanti, M. Arlitt, I. Cohen, and C. Williamson, “Offline/realtime traffic classification using semi-supervised learning,” Per-
[37] R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification. Wiley, 2001.
[38] J. Kittler, M. Hatef, R. Duin, and J. Matas, “On combining classifiers,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 20, no. 3, pp. 226–239, Mar. 1998.

Jun Zhang received his Ph.D. degree in 2011 from the University of Wollongong, Australia. He is currently with the School of Information Technology at Deakin University, Melbourne, Australia. His research interests include network and system security, pattern recognition, and multimedia processing. He has published more than 30 research papers in international journals and conferences, such as IEEE TRANSACTIONS ON IMAGE PROCESSING, IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY, and IEEE TRANSACTIONS ON SYSTEMS, MAN AND CYBERNETICS - PART B. Jun Zhang received the 2009 Chinese government award for outstanding self-financed students abroad. He is a member of the IEEE.

Chao Chen received the Bachelor of Information Technology degree with first class Honours from Deakin University, Australia, in 2012. He is currently a Ph.D. candidate with the School of Information Technology, Deakin University. His research interests include network management and security, especially network traffic classification.

Yang Xiang received his Ph.D. in Computer Science from Deakin University, Australia. He is currently with the School of Information Technology, Deakin University. His research interests include network and system security, distributed systems, and networking. In particular, he is currently leading a research group developing active defense systems against large-scale distributed network attacks. He is the Chief Investigator of several projects in network and system security, funded by the Australian Research Council (ARC). He has published more than 100 research papers in many international journals and conferences, such as IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY, and IEEE
formance Evaluation, vol. 64, no. 9-12, pp. 1194–1213, Oct. 2007. J OURNAL ON S ELECTED A REAS IN C OMMUNICATIONS. He has served as the
[28] P. Casas, J. Mazel, and P. Owezarski, “MINETRAC: mining flows for Program/General Chair for many international conferences such as ICA3PP
unsupervised analysis & semi-supervised classification,” in Proc. 2011 12/11, IEEE/IFIP EUC 11, IEEE TrustCom 11, IEEE HPCC 10/09, IEEE
International Teletraffic Congress, pp. 87–94. ICPADS 08, NSS 11/10/09/08/07. He has been the PC member for more than
[29] N. Williams, S. Zander, and G. Armitage, “A preliminary performance 50 international conferences in distributed systems, networking, and security.
comparison of five machine learning algorithms for practical IP traffic He serves as the Associate Editor of IEEE T RANSACTIONS ON PARALLEL
flow classification,” SIGCOMM Comput. Commun. Rev., vol. 36, pp. AND D ISTRIBUTED S YSTEMS and the Editor of the Journal of Network and
5–16, Oct. 2006. Computer Applications. He is a senior member of the IEEE.
[30] Y.-s. Lim, H.-c. Kim, J. Jeong, C.-k. Kim, T. T. Kwon, and Y. Choi,
“Internet traffic classification demystified: on the sources of the discrim-
inative power,” in Proc. 2010 International Conference, pp. 9:1–9:12.
[31] J. Ma, K. Levchenko, C. Kreibich, S. Savage, and G. M. Voelker, “Un- Wanlei Zhou received his Ph.D. degree in 1991
expected means of protocol inference,” in Proc2006 ACM SIGCOMM from the Australian National University, Canberra,
Conference on Internet Measurement, pp. 313–326. Australia, and the DSc degree from Deakin Uni-
[32] M. Canini, W. Li, M. Zadnik, and A. W. Moore, “Experience with high- versity, Victoria, Australia, in 2002. He is currently
speed automated application-identification for network-management,” in the chair professor of Information Technology and
Proc. 2009 ACM/IEEE Symposium on Architectures for Networking and the Head of School of Information Technology,
Communications Systems, pp. 209–218. Deakin University, Melbourne. His research interests
[33] Y. Wang, Y. Xiang, J. Zhang, and S.-Z. Yu, “A novel semi-supervised include distributed and parallel systems, network
approach for network traffic clustering,” in 2011 International Confer- security, mobile computing, bioinformatics, and e-
ence on Network and System Security. learning. He has published more than 200 papers in
[34] J. Zhang, Y. Xiang, Y. Wang, W. Zhou, Y. Xiang, and Y. Guan, “Network refereed international journals and refereed interna-
traffic classification using correlation information,” IEEE Trans. Parallel tional conference proceedings. Since 1997, he has been involved in more than
Distrib. Syst., vol. 24, no. 1, pp. 104–117, Jan. 2013. 50 international conferences as the general chair, a steering chair, a PC chair,
[35] A. Webb, Statistical Pattern Recognition. John Wiley & Sons, 2002. a session chair, a publication chair, and a PC member. He is a senior member
[36] D. J. C. MacKay, Information Theory, Inference and Learning Algo- of the IEEE.
rithms. Cambridge University Press, 2003.
Athanasios V. Vasilakos received his Ph.D. degree in computer engineering from the University of Patras, Patras, Greece. He is currently a Professor in the Computer Science Department, Kuwait University. He has authored or co-authored over 250 technical papers in major international journals and conferences, and he is author/coauthor of five books and 20 book chapters in the areas of communications. Prof. Vasilakos has served as General Chair and Technical Program Committee Chair for many international conferences. He has served or is serving as an Editor and/or Guest Editor for many technical journals, such as the IEEE TRANSACTIONS ON NETWORK AND SERVICE MANAGEMENT, IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS - PART B, IEEE TRANSACTIONS ON INFORMATION TECHNOLOGY IN BIOMEDICINE, IEEE TRANSACTIONS ON COMPUTERS, ACM Transactions on Autonomous and Adaptive Systems, the IEEE Communications Magazine, ACM/Springer Wireless Networks (WINET), and ACM/Springer Mobile Networks and Applications (MONET). He is the founding Editor-in-Chief of the International Journal of Adaptive and Autonomous Communications Systems (IJAACS, http://www.inderscience.com/ijaacs) and the International Journal of Arts and Technology (IJART, http://www.inderscience.com/ijart). He is General Chair of the European Alliances for Innovation (http://eai.eu/), funded by the EU Parliament.
