
Journal of Systems Architecture 59 (2013) 1005–1012


Journal of Systems Architecture



An in-depth analysis on traffic flooding attacks detection and system using data mining techniques
Jaehak Yu, Hyunjoong Kang, DaeHeon Park, Hyo-Chan Bang, Do Wook Kang ⇑
IoT Convergence Research Division, Electronics and Telecommunications Research Institute (ETRI), 138 Gajeongno, Yuseong-gu, Daejeon 305-700, Republic of Korea

ARTICLE INFO

Article history:
Available online 19 August 2013

Keywords:
Network security
Data mining
DDoS
Vehicular network
Web of Things
Association rule mining

ABSTRACT

Recently, as network traffic flooding attacks such as DoS and DDoS have posed devastating threats to network services, rapid detection and semantic analysis have become the major concerns for secure and reliable network services. In addition, the safety and comfort of vehicles and the communication technologies that serve them have recently become an issue. We propose a traffic flooding attack detection and in-depth analysis system that uses data mining techniques. In this paper we (1) designed and implemented a system that detects traffic flooding attacks and classifies them by attack type, using SNMP MIB information and the C4.5 algorithm; (2) conducted a semantic interpretation that extracts and analyzes the rules of the execution mechanism that C4.5 additionally provides; and (3) performed an in-depth analysis of the attack patterns and useful knowledge inherent in the data of each attack type, utilizing association rule mining. The classification by attack and attack type based on C4.5 and association rules, the automatic rule extraction, and the in-depth semantic interpretation proposed in this paper can add momentum to the development of new methodologies for intrusion detection systems and support the establishment of policies for intrusion detection and response systems.
© 2013 Elsevier B.V. All rights reserved.

⇑ Corresponding author. Address: Cooperative Vehicle-Infra Research Section, Electronics and Telecommunications Research Institute (ETRI), 138 Gajeongno, Yuseong-gu, Daejeon 305-700, Republic of Korea. Tel.: +82 42 860 6435.
E-mail addresses: dbzzang@etri.re.kr (J. Yu), kanghj@etri.re.kr (H. Kang), dhpark82@etri.re.kr (D. Park), bangs@etri.re.kr (H.-C. Bang), kdw4653@etri.re.kr (D.W. Kang).

1. Introduction

The main features of an Internet network are its open environment and scalability. Today, we are one step closer to this vision due to recent advances in identification technology, Web services, and wireless networks, which make processing power and communication capabilities available in increasingly smaller packages. Indeed, the Internet is evolving into the so-called Web of Things (WoT), an environment where everyday objects such as buildings, sidewalks, traffic lights, and commodities are identifiable, readable, recognizable, addressable, and even controllable via the Internet [1–3]. A Vehicular Ad hoc NETwork (VANET) offers a variety of services that allow safe and comfortable driving through Vehicular to Vehicular (V2V) and Vehicular to Infrastructure (V2I) communications in transportation systems. To use these services, safe and reliable V2V and V2I communications must be guaranteed [4,5]. The major threats in security research are breach of confidentiality, failure of authenticity, and unauthorized DoS [2,3]. Recently, as network flooding attacks such as DoS/DDoS and Internet worms have posed devastating threats to network services, rapid detection and proper response mechanisms have become the major concern for secure and reliable network services [3,6–8]. Moore et al. [2] reported that the DoS/DDoS (Denial of Service/Distributed Denial of Service) attack is the main threat to the entire Internet, and that the majority of these attacks (90–94%) are deployed using TCP. As a result, rapid detection and fast response mechanisms are the major concern for secure and reliable network services [2,3].

DDoS detection methods using SNMP (Simple Network Management Protocol) MIB (Management Information Base) information can be classified into trend analysis by protocol, traffic trend analysis over a week, and methods that use the correlation between a specific attribute and its information in the MIB [3,9]. However, most of such methods are developed depending on the functions and characteristics of the attack tools used in tests. Hence, active attempts have been made to find a solution through data mining and machine learning. Some studies [9–12] used SNMP MIB data for intrusion detection. Jun et al. [9] developed a system named MAID, which uses SNMP MIB-II data for anomaly detection. They periodically collected 27 MIB variables from 4 MIB-II groups (Interface, IP, TCP, and UDP) and converted them into a probability density function (PDF) to calculate statistical similarity metrics, which are the input data of the attack classifier. Puttini et al. [10] applied Bayesian classification to the SNMP MIB variables to detect anomalous network traffic behavior in Mobile Ad Hoc Networks (MANET). Ramah et al. [11] developed an anomaly detection system using periodic SNMP data collection which is

1383-7621/$ - see front matter © 2013 Elsevier B.V. All rights reserved.
http://dx.doi.org/10.1016/j.sysarc.2013.08.008

derived from a PCA (Principal Component Analysis) based unsupervised anomaly detection scheme proposed by Shyu et al. [12]. Yu et al. [3] proposed a system that detects traffic flooding attacks and classifies them using a Support Vector Machine (SVM). The foundations of Support Vector Machines were developed by Vapnik in 1995, and they are gaining popularity due to their many attractive features and promising empirical performance. In this method one maps the data into a higher-dimensional feature space and constructs an optimal separating hyperplane in this space. This basically involves solving a quadratic programming problem, whereas gradient-based training methods for neural network architectures suffer from the existence of many local minima. However, such studies aim to overcome the disadvantages of traditional DDoS detection methodology, so they may overlook its advantages. In other words, the above-mentioned machine learning methodology has focused only on the construction of an efficient system. It overlooked the interpretation of the system mechanism and turned the core execution mechanism into a black box. Therefore, a more comprehensive system, even if rather heuristic, that retains the hermeneutic advantages of traditional DDoS detection methodology is deemed desirable.

In this paper, we propose a system that detects traffic flooding attacks and classifies them by attack type, using SNMP MIB information and the C4.5 algorithm, a representative classification model of data mining. Moreover, after preprocessing the data with feature selection and reduction, using a method that selects attribute subsets of the SNMP MIB data, we conducted an in-depth semantic analysis by extracting a set of rules for each type of traffic flooding attack, and the characteristics inherent in the SNMP MIB data, using association rule mining, a representative hermeneutic analysis model of data mining. The system is constructed as a hierarchical structure based on C4.5. It first distinguishes attack traffic from normal traffic and then determines the type of attack in detail. Automatic rule extraction and in-depth semantic interpretation of the rules for traffic flooding attacks and their data by attack type are also expected to add momentum to the development of new methodologies for intrusion detection systems and to provide a theoretical ground for intrusion detection and response systems.

The paper is organized as follows. Section 2 describes the points to consider in the SNMP-based attack detection process and related work. Section 3 presents the proposed traffic attack detection and analysis system based on data mining. Section 4 describes the experiments and results. Finally, Section 5 discusses the conclusion and future research directions.

2. Related work

2.1. Considering points for SNMP MIB

SNMP provides a universal method of exchanging data for monitoring systems that reside on a network. The use of SNMP is dominant in modern industry [3,9]. However, to utilize SNMP for traffic flooding attack detection, we need to consider the following three points in the use of SNMP MIB variables, which affect the performance and accuracy of the detection system: (1) proper selection of SNMP MIB variables for attack detection, (2) determination of the detection timing, that is, when and how often, and (3) the algorithm for attack detection using the selected MIB variables.

First, we need to select proper SNMP MIB variables for attack detection. The selection should minimize the number of SNMP MIB variables involved and maximize the range of attack types covered. Earlier work utilized the tcpInErrs, udpNoPorts, and icmpOutEchoReps SNMP MIB variables for the purpose of detecting DoS/DDoS attacks. Attackers scan the vulnerabilities of a victim host first before sending a large flood of packets targeting the victim host, which sabotages both system resources and network resources. Therefore, the MIB variables used previously may not contribute as much as before to detecting modern attacks. We solve this MIB selection problem with the correlation-based feature selection algorithm (CFS) [13].

Second, we need to determine the detection timing: when and how often the detection system is triggered to retrieve SNMP MIB variables from a target system and analyze them to decide whether the system is in a normal or abnormal state. A short detection interval gives fast detection at the cost of high processing and traffic overhead, while a long detection interval results in late detection with a low burden on the system and network. We have to determine a suitable detection time and interval.

Fig. 1 illustrates the change of the ifInOctets MIB value, sampled at intervals of 1 s from a Net-SNMP agent running on a typical Linux system. Fig. 1 shows that SNMP MIB variables are updated periodically at a certain time interval. The ifInOctets MIB variable of the Net-SNMP agent is updated about every 15 s. We found that the update interval differs from system to system and from agent to agent. If we do not consider the periodic update of SNMP MIB values, the MIB values retrieved at a certain time are, in the worst case, the values from 15 s before.

Third, we have to develop an algorithm for attack detection using the SNMP MIB data gathered from a target system. This is the problem of how to utilize MIB data for attack detection. Several machine learning mechanisms, such as BP, C4.5, and Bayesian networks, have been considered for the classification of attack traffic from normal traffic [15]. But the SVM-based mechanism, which is widely accepted as the most successful classification mechanism in the pattern recognition area, has not been utilized in SNMP MIB-based attack classification until now, while it has been utilized in packet-based attack detection. In this paper, we propose a C4.5-based two-level hierarchical structure for traffic flooding attack detection, which detects attacks among normal traffic and also identifies the type of attack. The two-level hierarchical structure is very flexible and extendable.

2.2. C4.5 decision tree

The decision tree is a methodology frequently used for classification and prediction in data mining. The impact of, and interaction between, variables can be easily understood. Each internal node denotes a test on an attribute, each branch represents an outcome of the test, and each leaf node holds a class label. It has been mainly used for building classification and prediction models because, unlike neural network structures, it is more intuitive in describing the obtained knowledge and it is convenient for generating rules [16,17]. Since ID3, a representative decision tree algorithm, has the weakness of selecting any attribute with a wide range of values as a high-rank node, this study used the C4.5 decision tree algorithm. C4.5 is a widely used, more advanced algorithm whose classification and prediction capability is already proven [16]. In C4.5, each node in a tree is associated with a set of cases. Also, cases are assigned weights to take into account unknown attribute values. The information gain of an attribute a for a set of cases Y is calculated as follows. If a is discrete, and Y_1, ..., Y_s are the subsets of Y consisting of cases with distinct known values for attribute a, then:

\[ \mathrm{gain} = \mathrm{info}(Y) - \sum_{i=1}^{n} \frac{|Y_i|}{|Y|} \times \mathrm{info}(Y_i) \qquad (2.1) \]

where

\[ \mathrm{info}(Y) = -\sum_{j=1}^{N_{Class}} \frac{\mathrm{freq}(C_j, Y)}{|Y|} \times \log_2 \frac{\mathrm{freq}(C_j, Y)}{|Y|} \qquad (2.2) \]

Fig. 1. Periodic update of ifInOctets MIB variable in a typical SNMP agent.
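The periodic refresh visible in Fig. 1 can be estimated from the polled samples themselves. The following is a minimal sketch, assuming the counter is polled once per second as in the figure; the function names and the sample data are hypothetical, and no SNMP library is involved.

```python
def update_times(samples):
    """Return the poll times at which the counter value changed.

    samples: list of (time_s, counter_value) pairs polled at a fixed rate.
    """
    changes = []
    for (_, prev), (t, cur) in zip(samples, samples[1:]):
        if cur != prev:
            changes.append(t)
    return changes


def estimate_update_interval(samples):
    """Median gap between successive updates, or None with fewer than two updates."""
    times = update_times(samples)
    gaps = sorted(b - a for a, b in zip(times, times[1:]))
    if not gaps:
        return None
    return gaps[len(gaps) // 2]


# Hypothetical 1 s polls of a counter that the agent refreshes every 15 s.
polls = [(t, 1000 + 500 * (t // 15)) for t in range(0, 61)]
interval = estimate_update_interval(polls)
```

A detector could use such an estimate to query the agent just after each refresh instead of at an arbitrary time, which avoids reading values that are up to one update period old.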

is the entropy function. While there is an option to select by information gain, by default C4.5 considers the information gain ratio of the splitting Y_1, ..., Y_s, which is the ratio of information gain to its split information:

\[ \mathrm{Split\ info}(Y) = -\sum_{i=1}^{n} \frac{|Y_i|}{|Y|} \times \log_2 \frac{|Y_i|}{|Y|} \qquad (2.3) \]

It is easy to see that if a discrete attribute has been selected at an ancestor node, then its gain and gain ratio are zero. Thus, C4.5 does not even compute the information gain of those attributes. Eq. (2.3) represents the potential information generated by dividing Y into n subsets, whereas the information gain measures the information relevant to classification that arises from the same division. Then,

\[ \mathrm{Gain\ ratio}(Y) = \frac{\mathrm{gain}(Y)}{\mathrm{Split\ info}(Y)} \qquad (2.4) \]

expresses the proportion of information generated by the split that is useful, i.e., that appears helpful for classification.

2.3. Attribute subset selection

Attribute selection is the problem of selecting a subset of attributes from a feature set in order to provide a compact, precise, and fast classifier with as little performance degradation as possible, by removing the attributes that are useless, redundant, or used the least [13,14]. In this paper, we used Correlation-based Feature Selection (CFS), which has been reported as the best among the attribute subset selection methods [13]. A probabilistic model of a nominal-valued feature Y can be formed by estimating the individual probabilities of the values y ∈ Y from the training data. Entropy is a measure of the uncertainty or unpredictability in a system. The entropy of Y is given by:

\[ H(Y) = -\sum_{y \in Y} p(y) \log_2 p(y) \qquad (2.5) \]

If the observed values of Y in the training data are partitioned according to the values of a second feature X, and the entropy of Y with respect to the partitions induced by X is less than the entropy of Y prior to partitioning, then there is a relationship between features Y and X. Eq. (2.6) gives the entropy of Y after observing X.

\[ H(Y|X) = -\sum_{x \in X} p(x) \sum_{y \in Y} p(y|x) \log_2 p(y|x) \qquad (2.6) \]

The amount by which the entropy of Y decreases reflects additional information about Y provided by X and is called the information gain.

\[ \mathrm{Gain} = H(Y) + H(X) - H(X, Y) \qquad (2.7) \]

Information gain is a symmetrical measure. Symmetry is a desirable property for a measure of feature-feature inter-correlation to have. Unfortunately, information gain is biased in favor of features with more values. Furthermore, the correlations in Eq. (2.8) should be normalized to ensure they are comparable and have the same effect. Symmetrical uncertainty compensates for information gain's bias toward attributes with more values and normalizes its value to the range [0,1]:

\[ \text{Symmetrical uncertainty coefficient} = 2.0 \times \left[ \frac{\mathrm{Gain}}{H(Y) + H(X)} \right] \qquad (2.8) \]

If the correlation between each of the components in a test and the outside variable is known, and the inter-correlation between each pair of components is given, then the correlation between a composite test consisting of the summed components and the outside variable can be predicted from:

\[ \mathrm{Merit}(F_S) = \frac{k\,\overline{r_{cf}}}{\sqrt{k + k(k-1)\,\overline{r_{ff}}}} \qquad (2.9) \]

where Merit(F_S) is the merit of a feature subset S containing k features, \overline{r_{cf}} is the mean feature-class correlation, and \overline{r_{ff}} is the average feature-feature inter-correlation.

2.4. Association rule mining

Association rule mining finds all rules in the database that satisfy some minimum support and minimum confidence constraints [18,19]. For association rule mining, the target of mining is not pre-determined, while for classification rule mining there is one and only one pre-determined target, i.e., the class. Let I = {I_1, I_2, I_3, ..., I_m} be a set of items. Let D, the task-relevant data, be a set of database transactions where each transaction T is a set of items such that T ⊆ I. Each transaction is associated with an identifier, called TID. Let A be a set of items. A transaction T is said to contain A if and only if A ⊆ T. An association rule is an implication of the form A ⇒ B, where A ⊂ I, B ⊂ I, and A ∩ B = ∅. The rule A ⇒ B holds in the transaction set D with support s, where s is the percentage

of transactions in D that contain A ∪ B. This is taken to be the probability, P(A ∪ B). The rule A ⇒ B has confidence c in the transaction set D, where c is the percentage of transactions in D containing A that also contain B. This is taken to be the conditional probability, P(B|A) [18,19]. That is,

\[ \mathrm{support}(A \Rightarrow B) = P(A \cup B), \qquad \mathrm{confidence}(A \Rightarrow B) = P(B|A) \qquad (2.10) \]

Rules that satisfy both a minimum support threshold and a minimum confidence threshold are called strong. By convention, we write support and confidence values so as to occur between 0% and 100%, rather than 0 to 1.0.

3. Traffic attack detection and analysis system

The proposed traffic flooding attack detection and in-depth analysis system is composed of three groups of modules (refer to Fig. 2): the online processing modules, which are the SNMP MIB generators, MIB update detection, MIB data store, and attack detection and classification; the offline modules, which are C4.5 training and association rule mining; and the system administrator as a management module. (1) The SNMP MIB generator module generates MIB information from the network traffic data; (2) the MIB update detection module collects the ifInOctets MIB, determines the activation time of the detection system, and triggers the MIB data store; (3) the MIB data store module stores only the MIB information from the target system that is determined in the C4.5 training module; (4) the collected information is transferred to the attack detection and classification module, where it is used to judge the occurrence and type of attacks in real time (refer to Fig. 3); (5) the C4.5 training module randomly generates various traffic attacks to execute C4.5-based learning; (6) the association rule module conducts an in-depth semantic interpretation that extracts and analyzes, in the form of rules, the characteristics of the data stored in the MIB data store module; (7) the system management module detects traffic flooding attacks in real time and monitors detailed information about the classification type. It utilizes the rules and semantic interpretation information provided by C4.5 learning and the association rule model in order to establish the policies for the intrusion detection and response system.

The hierarchical flooding attack detection and attack type classification system schematized in Fig. 3 is composed of two layers. The first layer classifies normal traffic and attack traffic. It reports any detected attack traffic to the system administrator of the attack response system in real time. The second layer classifies all the attack traffic that is judged as a traffic flooding attack into TCP-SYN flooding, UDP flooding, and ICMP flooding, respectively. It provides additional information about the attack type to the intrusion response system. The overall architecture of our proposed system is given in Fig. 3.

4. Experiment

4.1. Testbed network structure

We constructed a testbed network to carry out an actual attack experiment. Fig. 4 illustrates the network topology, which consists of one victim system, two attack agent systems, one attack handler system, and one dataset collector system. The OS of the victim system is Linux Fedora 7. Linux Fedora 8 is used for the OS of the other systems. The testbed network is connected to the campus network, so normal traffic is generated between the victim host and other hosts outside the testbed network during the experiment period. We used Stacheldraht [20], a distributed denial of service attack tool, to generate attack traffic. Stacheldraht was selected because it is a more mature attack tool compared to other attack tools, such as TFN, TFN2K, or Trinoo. Stacheldraht is composed of handler (master) and agent (daemon) programs. Agent systems produce a large flood of packets targeting the victim host, which sabotages both system resources and network resources. For the experiments we conducted three types of network flooding attacks: TCP-SYN flooding, UDP flooding, and ICMP flooding attacks.

4.2. SNMP MIB variables

During the attack experiment, the dataset collector system gathered SNMP MIB data from the victim system using SNMP query messages. First, we investigated 66 MIB variables from five MIB-II groups: Interface, IP, TCP, UDP, and ICMP. The data type of the 66 MIB variables is Counter, which is a non-negative 4-byte integer that may be incremented but not decremented. Through a comprehensive but not exhaustive investigation, we selected 13 of the 66 MIB variables that are likely to be affected by the attack traffic. Most of the 13 MIB variables are used in [3,9] for traffic flooding attack detection. The 13 MIB variables and their corresponding MIB groups are shown in Table 1.

4.3. Semantic analysis based on C4.5

The first experiment is to promptly detect the attack traffic and conduct a semantic interpretation on the rules. We performed a C4.5 training with 1000 normal MIB records of a training dataset, then, tested with 2500 MIB records of a testing dataset: 1000 nor-

[Figure 2: block diagram. Online process: SNMP MIB generators (at the router between the Internet and the enterprise network), MIB update detection, MIB data store, and attack detection and classification on real-time MIB data. Offline process: C4.5 training and association rule mining. Management process: manager.]
Fig. 2. The overall architecture of traffic attack flooding detection and analysis system.
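The C4.5 training module shown in Fig. 2 chooses split attributes by the gain ratio of Eqs. (2.1)–(2.4). The sketch below illustrates that computation for a discrete attribute only; it is not the authors' implementation, it ignores C4.5's handling of continuous attributes and unknown values, and the toy attribute values and labels are hypothetical.

```python
from collections import Counter
from math import log2


def info(labels):
    """Entropy of the class labels, Eq. (2.2)."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def gain_ratio(values, labels):
    """Gain ratio of a discrete attribute, Eqs. (2.1), (2.3) and (2.4)."""
    total = len(labels)
    subsets = {}
    for value, label in zip(values, labels):
        subsets.setdefault(value, []).append(label)
    weights = [len(s) / total for s in subsets.values()]
    gain = info(labels) - sum(w * info(s) for w, s in zip(weights, subsets.values()))
    split_info = -sum(w * log2(w) for w in weights)
    # An attribute with a single value generates no split information.
    return gain / split_info if split_info > 0 else 0.0


# Hypothetical discretized attribute that separates the two classes perfectly.
udp_errors = ["positive", "positive", "zero", "zero"]
label = ["attack", "attack", "normal", "normal"]
ratio = gain_ratio(udp_errors, label)
```

Dividing by the split information is what penalizes attributes with many distinct values, the weakness of ID3 noted in Section 2.2.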

[Figure 3: flow diagram. Traffic data yields SNMP MIB data for Step 1 (attack detection), whose outcome is Normal or Attack. An attack raises an attack alarm to the response system and passes to Step 2 (attack type classification), which distinguishes TCP-SYN flooding, UDP flooding, and ICMP flooding. The attack type information updates the response strategy base, which drives the response action.]

Fig. 3. C4.5-based hierarchical two-level structure.

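[Figure 4: network topology. The testbed network (victim, attack handler, attack agents, and MIB data collector on an L2 switch) connects to the Internet through the campus network; the diagram distinguishes normal traffic from attack traffic.]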

Fig. 4. The testbed network for the DDoS attack experiment.

Table 1
The SNMP MIB variables used for the attack detection mechanism.

MIB-2 group    SNMP MIB variables
IP             ip.ipInReceives
               ip.ipInDelivers
               ip.ipOutRequests
               ip.ipOutDiscards
TCP            tcp.tcpAttemptFails
               tcp.tcpOutRsts
UDP            udp.udpInErrors
ICMP           icmp.icmpInMsgs
               icmp.icmpInErrors
               icmp.icmpInDestUnreachs
               icmp.icmpOutMsgs
               icmp.icmpOutErrors
               icmp.icmpOutDestUnreachs

mal MIB records and three sets of 500 MIB records for the three attack types: TCP-SYN, UDP, and ICMP flooding attacks. We used three important formulas [3] to evaluate the performance of the first step of identification: the attack detection rate, false positive rate (FPR), and false negative rate (FNR), which are as follows:

\[ \text{Attack detection rate} = \frac{\sum_{i=1}^{n} T_i}{\sum_{i=1}^{n} I_i} \qquad (4.1) \]

\[ \text{False positive rate} = \frac{\sum_{i=1}^{n} P_i}{\sum_{i=1}^{n} N_i} \qquad (4.2) \]

\[ \text{False negative rate} = \frac{\sum_{i=1}^{n} F_i}{\sum_{i=1}^{n} I_i} \qquad (4.3) \]

In the above equations, I is an individual attack traffic MIB record, while N is a MIB record for normal traffic. T is an attack traffic record which is classified as an attack by the system. P indicates a normal traffic record which is misclassified as attack traffic. F is an attack traffic record which is misclassified as normal traffic. The experimental results are shown in Table 2. We used the attack detection rate, the false positive rate (FPR), and the false negative rate (FNR) as the performance criteria. The FPR is the fraction of normal traffic that is misclassified as attack traffic. The FNR is the fraction of attack traffic that is misclassified as normal traffic. It is generally accepted that the FNR is more important than the FPR for the performance evaluation of an intrusion detection system.

Table 2
The performance of the proposed system in the attack identification.

Attack detection rate    False positive rate (FPR)    False negative rate (FNR)
99.13%                   0.3%                         0.87%

Table 4
The performance in the attack-type classification.

TCP-SYN flooding    UDP flooding    ICMP flooding    Overall accuracy
100%                100%            100%             100%
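The three rates of Eqs. (4.1)–(4.3) reduce to simple ratios of record counts. In the sketch below the raw counts are assumptions: the paper reports only percentages, and the counts were chosen to be consistent with the rounded values in Table 2 for a test set of 1500 attack and 1000 normal records.

```python
def detection_metrics(attack_detected, attack_missed, normal_flagged, normal_total):
    """Attack detection rate, FPR and FNR as fractions, Eqs. (4.1)-(4.3)."""
    attack_total = attack_detected + attack_missed
    return {
        "detection_rate": attack_detected / attack_total,
        "fpr": normal_flagged / normal_total,
        "fnr": attack_missed / attack_total,
    }


# Assumed counts (not given in the paper): 1487 of 1500 attack records
# detected, 13 missed, and 3 of 1000 normal records flagged as attacks.
metrics = detection_metrics(1487, 13, 3, 1000)
percent = {name: round(100 * value, 2) for name, value in metrics.items()}
```

Because every attack record is either detected or missed, the detection rate and the FNR always sum to 1, which matches 99.13% + 0.87% in Table 2.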

Fig. 5. The C4.5 decision tree for the attack identification.

Fig. 5 is a decision tree for the classification of normal traffic and attack traffic. The constructed tree shows that normal traffic and attack traffic are precisely classified with only 4 of the 13 MIB variables that are defined in the system. The values inside each leaf node give the number of training records assigned to the corresponding class and the number of those that are misclassified. Six rules can be obtained from Fig. 5 (refer to Table 3).

If the rules of Table 3 are comprehensively analyzed, the occurrence of an attack can be judged with additional detailed conditions depending on the range of ipInReceives, which reflects the number of datagrams. In particular, as most traffic flooding attacks transfer large-size packets, it was found that the occurrence of an attack was precisely judged when the ipInReceives value exceeded 88006.

Secondly, we performed a classification test that separates attack traffic into three types of attacks. For this test, we selected three sets of MIB records for the C4.5 training of the TCP-SYN, UDP, and ICMP flooding attacks. Each set consists of 500 MIB data records which are randomly selected among the total 1500 traffic records. The MIB records correctly determined in the first step of attack identification are used for the performance test of the attack classification. For the performance evaluation [21], we used the classification accuracy shown below:

\[ \text{Classification accuracy} = \frac{\sum_{i=1}^{n} T_i}{\sum_{i=1}^{n} I_i} \qquad (4.4) \]

In the above equation, I is an individual attack traffic record in the corresponding attack class. T is the correctly classified attack traffic record. The classification result is also shown in Table 4.

Fig. 6. The C4.5 decision tree for the attack classification.

Fig. 6 is a decision tree for a classification by attack type. It shows that an attack type can be precisely classified with only two variables. Three rules can be obtained from Fig. 6 (refer to Table 5).

Table 5
The rules for the attack classification.

Rule #    The rule and semantic analysis
1         IF udpInErrors > 0 THEN UDP flooding attack
2         IF udpInErrors = 0 AND icmpInEchos > 0 THEN ICMP flooding attack
3         IF udpInErrors = 0 AND icmpInEchos = 0 THEN TCP-SYN flooding attack

The first rule of Table 5 shows that udpInErrors is a value that is increased by UDP packet composition errors and that it reacts only to a UDP flooding attack. In the second rule, icmpInEchos is a value that indicates the number of received ICMP echo (request) messages, and the analysis shows that it is not influenced by TCP-SYN flooding or UDP flooding attacks. The third rule precisely classifies the attacks other than UDP flooding or ICMP flooding as TCP-SYN flooding attacks without using any MIB variable that depends on TCP. The rules in Table 5 show that the network load can be minimized. They also give an accurate classification of the representative DDoS type attacks:

Table 3
The rules for the attack identification.

Rule #    The rule and semantic analysis
1         IF ipInReceives > 88006 THEN attack
2         IF 57316 < ipInReceives ≤ 88006 AND ipInDelivers > 28782 THEN normal
3         IF 57316 < ipInReceives ≤ 88006 AND ipInDelivers ≤ 28782 THEN attack
4         IF ipInReceives ≤ 57316 AND icmpOutMsgs > 22 THEN attack
5         IF ipInReceives ≤ 57316 AND icmpOutMsgs ≤ 22 AND udpInErrors > 0 THEN attack
6         IF ipInReceives ≤ 57316 AND icmpOutMsgs ≤ 22 AND udpInErrors = 0 THEN normal
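The rules of Table 3 and Table 5 can be chained into the two-level structure of Fig. 3. The sketch below is one possible encoding rather than the authors' implementation: it assumes each record is a dictionary whose values are in the same units as the thresholds in the tables, and the sample records are hypothetical.

```python
def identify(mib):
    """Level 1: the six rules of Table 3. Returns 'attack' or 'normal'."""
    received = mib["ipInReceives"]
    if received > 88006:
        return "attack"                                               # rule 1
    if received > 57316:
        return "normal" if mib["ipInDelivers"] > 28782 else "attack"  # rules 2, 3
    if mib["icmpOutMsgs"] > 22:
        return "attack"                                               # rule 4
    return "attack" if mib["udpInErrors"] > 0 else "normal"           # rules 5, 6


def classify_attack(mib):
    """Level 2: the three rules of Table 5, applied only to attack records."""
    if mib["udpInErrors"] > 0:
        return "UDP flooding"
    if mib["icmpInEchos"] > 0:
        return "ICMP flooding"
    return "TCP-SYN flooding"


def detect(mib):
    """Run both levels; normal records never reach the second level."""
    return classify_attack(mib) if identify(mib) == "attack" else "normal"


# Hypothetical records, not taken from the paper's dataset.
quiet = {"ipInReceives": 60000, "ipInDelivers": 30000, "icmpOutMsgs": 0,
         "udpInErrors": 0, "icmpInEchos": 0}
udp_flood = {"ipInReceives": 1000, "ipInDelivers": 900, "icmpOutMsgs": 5,
             "udpInErrors": 3, "icmpInEchos": 0}
syn_flood = {"ipInReceives": 90000, "ipInDelivers": 0, "icmpOutMsgs": 0,
             "udpInErrors": 0, "icmpInEchos": 0}
```

Keeping the two levels as separate functions mirrors the flexibility claimed for the hierarchical structure: a new attack type only requires changes to the second level.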

Table 6
Association rule for attack identification.

Rule #    The rule and in-depth analysis                                                            Confidence
1         IF udpInErrors ≥ 1545 THEN attack                                                         100%
2         IF ipInReceives ≥ 75945 THEN attack                                                       99%
3         IF icmpOutMsgs ≥ 85162 THEN attack                                                        100%
4         IF ipInReceives ≥ 75945 AND udpInErrors ≥ 1545 THEN attack                                100%
5         IF ipInReceives ≥ 75945 AND icmpOutMsgs ≥ 85162 THEN attack                               100%
6         IF ipInReceives < 75945 AND icmpOutMsgs < 12166 AND udpInErrors < 1545 THEN normal        95%
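The confidence column of Table 6 follows the definitions of Eq. (2.10). The sketch below shows how support and confidence could be computed over discretized records. The transactions are hypothetical, and P(A ∪ B) is read, as in the paper, as the fraction of transactions that contain every item of both A and B.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """P(consequent | antecedent), Eq. (2.10); 0.0 if the antecedent never occurs."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0
    return support(transactions, set(antecedent) | set(consequent)) / base


def is_strong(transactions, antecedent, consequent, min_sup=0.10, min_conf=0.80):
    """A rule is strong if it meets both thresholds (10% and 80%, as in Section 4.4)."""
    both = set(antecedent) | set(consequent)
    return (support(transactions, both) >= min_sup
            and confidence(transactions, antecedent, consequent) >= min_conf)


# Hypothetical discretized MIB records; each transaction is a set of items.
tx = [
    {"udpInErrors>=1545", "attack"},
    {"udpInErrors>=1545", "ipInReceives>=75945", "attack"},
    {"ipInReceives>=75945", "attack"},
    {"ipInReceives>=75945", "normal"},
    {"normal"},
]
udp_rule_is_strong = is_strong(tx, {"udpInErrors>=1545"}, {"attack"})
ip_rule_is_strong = is_strong(tx, {"ipInReceives>=75945"}, {"attack"})
```

In this toy data the UDP rule is strong, while the ipInReceives rule falls below the confidence threshold because one of the three matching records is normal.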

Table 7
Association rule for attack-type classification.

Rule #    The rule and in-depth analysis    Confidence
1         IF tcpOutRsts ≥ 14418 THEN TCP-SYN flooding attack    100%
2         IF tcpOutRsts ≥ 14418 AND ipInDelivers ≥ 32541 AND icmpInMsgs < 17027 AND icmpInEchos < 16974 AND udpInErrors < 2207 THEN TCP-SYN flooding attack    100%
3         IF udpInErrors ≥ 2207 AND icmpInMsgs < 17027 AND icmpInEchos < 16974 AND tcpOutRsts < 14418 AND ipInDelivers < 32541 THEN UDP flooding attack    100%
4         IF icmpInEchos ≥ 67896 OR icmpInMsgs ≥ 85135 THEN ICMP flooding attack    100%
5         IF ipInDelivers ≥ 32541 AND icmpInMsgs < 17027 AND udpInErrors < 2207 THEN TCP-SYN flooding attack    99%
6         IF ipInDelivers < 32541 AND icmpInMsgs < 17027 AND tcpOutRsts < 14418 THEN UDP flooding attack    81%
7         IF ipInDelivers < 32541 AND tcpOutRsts < 14418 AND udpInErrors < 2207 THEN ICMP flooding attack    81%
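The attributes that appear in Tables 6 and 7 were chosen with CFS. The sketch below shows how the merit of Eq. (2.9) could be evaluated with the symmetrical uncertainty of Eq. (2.8) as the correlation measure. It assumes discretized features, it is not Weka's implementation, and the toy features are hypothetical.

```python
from collections import Counter
from math import log2, sqrt


def entropy(values):
    """H of a discrete variable, Eq. (2.5)."""
    n = len(values)
    return -sum(c / n * log2(c / n) for c in Counter(values).values())


def symmetrical_uncertainty(x, y):
    """Eq. (2.8), with the gain computed from the joint entropy, Eq. (2.7)."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0
    gain = hx + hy - entropy(list(zip(x, y)))
    return 2.0 * gain / (hx + hy)


def cfs_merit(features, target):
    """Merit of a feature subset, Eq. (2.9)."""
    k = len(features)
    r_cf = sum(symmetrical_uncertainty(f, target) for f in features) / k
    pairs = [(i, j) for i in range(k) for j in range(i + 1, k)]
    r_ff = (sum(symmetrical_uncertainty(features[i], features[j]) for i, j in pairs)
            / len(pairs)) if pairs else 0.0
    return k * r_cf / sqrt(k + k * (k - 1) * r_ff)


# Hypothetical discretized features: one predictive, one irrelevant.
target = ["attack", "attack", "normal", "normal"]
predictive = ["high", "high", "low", "low"]
irrelevant = ["x", "y", "x", "y"]
merit_one = cfs_merit([predictive], target)
merit_two = cfs_merit([predictive, irrelevant], target)
```

Adding the irrelevant feature lowers the merit, which is the behavior that lets CFS discard attributes that contribute nothing to the class.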

TCP-SYN flooding, UDP flooding and ICMP flooding with only two MIB variables.

4.4. An in-depth analysis using association rule mining

The first experiment sought the useful patterns inherent in the data of normal traffic and attack traffic. From the MIB datasets we randomly selected 2500 records for in-depth analysis: 1000 for the normal state and 500 for each attack type (TCP-SYN, UDP, and ICMP flooding attack). To select the optimal attribute subset for this analysis, the Correlation Feature Selection (CFS, Eq. (2.9)) of Weka [22] was used. The attribute subset selected by CFS was {ipInReceives, udpInErrors, icmpOutMsgs}. Minimum support was set at 10% and minimum confidence at 80%. Table 6 summarizes the major association rules among the extracted attributes.

A comprehensive analysis of the rules in Table 6 shows that, for the attack-traffic rules (Rule # 1–5), the characteristic of a traffic flooding attack is the transmission of large-size packets. A typical attack-traffic pattern therefore appears when one of the values of ipInReceives, icmpOutMsgs and udpInErrors is large. In particular, typical attack patterns appear when the number of received datagrams and the number of transmitted ICMP messages are high, or when the number of datagrams not transferred due to an error in the UDP packet composition is high. For normal traffic (Rule # 6), the MIB values of ipInReceives, icmpOutMsgs and udpInErrors are all low. That is, under normal traffic the number of received datagrams, the number of transmitted ICMP messages, and the number of datagrams not transferred due to an error in the UDP packet composition are relatively low compared with a traffic flooding attack. Compared with the experimental result of the decision tree in Section 4.3, this experiment provided more comprehensive rules (Rule # 3–5) in addition to the rules already provided in Section 4.3 (Rule # 1, 2, 6).

The second experiment extracted and semantically interpreted the patterns of the DDoS attack types; 500 cases of each attack type were randomly extracted for it. The five attributes selected by CFS (Eq. (2.9)) in Weka were {ipInDelivers, icmpInMsgs, icmpInEchos, tcpOutRsts, udpInErrors}. Minimum support was set at 10% and minimum confidence at 80%. The major association rules among the extracted attributes in this experiment are summarized in Table 7.

A TCP-SYN flooding attack transmits SYN messages with a spoofed source IP address to a server; the server replies with SYN/ACK and is then left waiting indefinitely for an ACK that never arrives. The rules for TCP-SYN flooding (Rule # 1–2) show that when the tcpOutRsts value is high, or when the values of tcpOutRsts and ipInDelivers are simultaneously high, the traffic is a TCP-SYN flooding attack. In particular, it was verified that when the number of TCP segments sent with the RST flag is high and the total number of datagrams delivered to the upper layer among the received IP datagrams is also high, the traffic is a TCP-SYN flooding attack.

A UDP flooding attack exhausts the bandwidth of the target network and paralyzes normal operation by continuously sending packets to the target. For the UDP flooding attack (Rule # 3), it was verified that the pattern corresponds to the case in which udpInErrors is high while all the other MIB values are relatively low. The udpInErrors value increases with errors in UDP packet composition, and it turned out to be a useful pattern for explaining the characteristics of a UDP flooding attack.

Finally, an ICMP flooding attack alters ping messages and exhausts the network's bandwidth when all the systems on the network that receive an ICMP ping message respond to it. It was verified that the ICMP flooding attack pattern (Rule # 4) can be detected when the value of icmpInMsgs or icmpInEchos is high. This is a useful pattern for the ICMP flooding attack because icmpInMsgs, the total number of ICMP messages received, and icmpInEchos, the number of ICMP Echo request messages received, are not influenced by TCP-SYN flooding or UDP flooding attacks. Compared with the decision tree in Section 4.3, this experiment provided more comprehensive rules (Rule # 1–2, 6–7) in addition to the rules already provided in Section 4.3 (Rule # 3–5).

5. Conclusion

In this paper, we (1) designed and implemented a system that detects traffic flooding attacks and classifies them by attack type, using SNMP MIB information and the C4.5 algorithm, a representative classification model of data mining; (2) conducted a semantic interpretation of attack detection and classification through an in-depth analysis of the rules of the execution mechanism that C4.5 additionally provides; and (3) performed an in-depth analysis of the attack patterns and useful knowledge inherent in the data, utilizing association rule mining. The semantic in-depth analysis of attacks and attack types on the basis of C4.5 and association rules that was tried in this paper has not been attempted before and has considerable potential to contribute to future work.

Our future work will pursue more concrete research that can support the establishment of policies for intrusion response systems and intrusion protection systems. Our ongoing work also includes addressing the presence of things on the Web and further validating the ideas of security in various Web environments.

Acknowledgments

This work was supported by Electronics and Telecommunications Research Institute (ETRI) Grant funded by the Korea government [13ZC1130, Development of USN/WoT Convergence Platform for Internet of Reality Service Provision].

References

[1] S. Mathew, Y. Atif, Q. Sheng, Z. Maamar, Web of things: Description, discovery and integration. Proceedings of 2011 IEEE Int. Conf. on Internet of Things, and Cyber, Physical and Social Computing, 2011, pp. 9–15.
[2] D. Moore, G. Voelker, S. Savage, Inferring Internet Denial-of-Service Activity. Proceedings of the Usenix Security Symposium, 2001, pp. 401–414.
[3] J. Yu, H. Lee, M. Kim, D. Park, Traffic flooding attack detection with SNMP MIB using SVM, J. Comput. Commun. 31 (17) (2008) 4212–4219.
[4] M. Raya, J.P. Hubaux, Securing vehicular ad hoc networks, J. Comput. Secur. 15 (2007) 39–68.
[5] X. Lin, X. Sun, P.H. Ho, X. Shen, GSIS: a secure and privacy preserving protocol for vehicular communications, IEEE Trans. Veh. Technol. 56 (6) (2007) 3442–3456.
[6] F.H. Tseng, L.D. Chou, H.C. Chao, A survey of black hole attacks in wireless mobile ad hoc networks, J. HCIS 1 (1) (2011) 1–16.
[7] X. Wang, Y. Sang, Y. Liu, Y. Luo, Considerations on security and trust measurement for virtualized environment, J. Convergence 2 (2) (2011) 19–24.
[8] M. Kim, J. Lee, Y. Lee, J. Ryou, COSMOS: a middleware for integrated data processing over heterogeneous sensor networks, ETRI J. 30 (05) (2008) 696–706.
[9] L. Jun, C. Manikopoulos, Early statistical anomaly intrusion detection of DoS attacks using MIB traffic parameters. Proceeding of IEEE Information Assurance Workshop, 2003, pp. 53–59.
[10] R. Puttini, M. Hanashiro, F. Miziara, R.D. Sousa, L.J. García-Villalba, C.J. Barenco, On the anomaly intrusion detection in mobile Ad hoc network environments. Proceeding of PWC 2006 (LNCS 4217), 2006, pp. 182–193.
[11] K.H. Ramah, H. Ayari, F. Kamoun, Traffic anomaly detection and characterization in the Tunisian National University Network. Proceeding of Networking 2006 (LNCS 3979), 2006, pp. 136–147.
[12] M. Shyu, S. Chen, K. Sarinnapakorn, L. Chang, A novel anomaly detection scheme based on principal component classifier. Proceeding of the IEEE Foundations and New Directions of Data Mining Workshop, Florida, USA, 2003, pp. 172–179.
[13] M. Hall, Correlation-based Feature Selection for Machine Learning. PhD Diss., Department of Computer Science, Waikato University, Hamilton, NZ, 1998.
[14] I. Seok, J. Lee, B. Moon, Hybrid genetic algorithms for feature selection, J. IEEE Trans. Pattern Anal. Mach. Intell. 26 (11) (2006) 1424–1437.
[15] J.B.D. Cabrera, L. Lewis, X. Qin, W. Lee, R.K. Mehra, Proactive intrusion detection and distributed denial of service attacks – a case study in security management, J. Network System Manage 10 (2) (2005) 225–254.
[16] S. Ruggieri, Efficient C4.5, J. IEEE Trans. Knowl. Data Eng. 14 (2) (2002) 438–444.
[17] S. Kim, S. Oh, Decision-tree-based markov model for phrase break prediction, ETRI J. 29 (04) (2007) 527–529.
[18] J. Han, M. Kamber, Data Mining: Concept and Techniques, second ed., Morgan Kaufmann Publishers, 2007.
[19] C.R. Valencio, F.T. Oyama, P. Scarpelini, A.C. Colombini, A.M. Cansian, R. Souza, P. Correa, MR-Radix: a multi-relational data mining algorithm, J. HCIS 2 (1) (2012) 1–17.
[20] Distributed Denial of Service (DDoS) Attacks/Tools, http://staff.washington.edu/dittrich/misc/ddos/.
[21] J. Yu, H. Lee, Y. Im, M. Kim, D. Park, Real-time classification of internet application traffic using a hierarchical multi-class SVM, J. KSII Trans. Internet Inf. Syst. 4 (5) (2010) 859–876.
[22] Machine Learning Lab in The University of Waikato, http://www.cs.waikato.ac.nz/ml.

Jaehak Yu received his B.S. degree in Computer Science from Konkuk University in 2001. He received his M.S. and Ph.D. degrees in Computer Science from Korea University, Korea, in 2003 and 2010, respectively. He is currently a senior researcher at the Electronics and Telecommunications Research Institute, Korea. His recent research interests include data and information analysis, intelligent network management, IoT/USN service platform, and Internet of Things.

Hyunjoong Kang has been a researcher at the Electronics and Telecommunications Research Institute (ETRI) in Korea since 2009. He graduated from the Suncheon National University with an M.S. degree in Information and Communication Engineering in 2009. His research focuses on Wireless Sensor Networks, Internet of Things, and Smart Devices.

DaeHeon Park received the B.S. degree in Communication & Information Engineering from Sunchon National University in 2006 and the M.S. degree from Sunchon National University in 2008, where he is currently a Ph.D. candidate. His research interests focus on wireless communication, RFID/USN, UWB, communication systems, and embedded system technologies.

Hyo-Chan Bang received his B.S. degree in Engineering Management from Hokkaido Institute of Technology in 1995 and his M.S. degree in Industrial Engineering from Hokkaido Institute of Technology in 1997. He is currently a Ph.D. candidate at Chungnam National Univ. and a principal researcher at the Electronics and Telecommunications Research Institute, Korea. His recent research interests include intelligent IoT platform, wireless multimedia communication, and USN/WoT convergence platform.

Do Wook Kang received the B.S. and M.S. degrees in Electrical & Electronic Engineering from Dongshin University in 2002 and 2004, respectively. From 2005 to 2011, he was a Senior Engineer at R&D Center, Fumate Co., Daejeon, Rep. of Korea, where he developed high-speed modems for wireless communication. In 2011, he joined ETRI, Daejeon, Rep. of Korea. His current research interests include ground-penetrating radar, wireless backhaul systems, and wireless LAN systems.
