
(IJCSIS) International Journal of Computer Science and Information Security

Vol. 8, No. 7, October 2010

False Positive Reduction using IDS Alert Correlation
Method based on the Apriori Algorithm
Homam El-Taj, Omar Abouabdalla, Ahmed Manasrah,
Mohammed Anbar, Ahmed Al-Madi

National Advanced IPv6 Center of Excellence (NAv6)
Universiti Sains Malaysia

Penang, Malaysia
{homam, omar, ahmad, anbar, almadi}

Abstract: Correlating Intrusion Detection System (IDS) alerts is a challenging topic in the field of network security. Correlating IDS alerts has several benefits: it reduces the huge number of alerts that an IDS triggers, lowers the false positive ratio, and reveals the relations between alerts, which gives a better understanding of the attacks. One family of correlation techniques is based on data mining. In this paper we develop a new IDS alert Group Correlation Method (GCM) that works on the alerts aggregated by the Threshold Aggregation Framework (TAF). We build the correlation method by adapting the Apriori algorithm for large data. The method is used to reduce the number of aggregated alerts and the ratio of false positive alerts.

Keywords: Intrusion Detection System; False Positive Alerts; Alert Correlation; Data Mining.

This research was sponsored by the National Advanced IPv6 Center of Excellence (NAv6) Fellowship in Universiti Sains Malaysia (USM).

With the essential and extensive use of the Internet and its applications, threats and intrusions have become wider and smarter. Because an IDS triggers a huge number of alerts, studying these alerts has become essential too. The study of IDS alerts has brought to light several issues that need attention: how to group the alerts, how to define the relations between the alerts, and how to reduce the false alerts.

An IDS monitors the activities of the protected network and analyzes them, triggering alerts if any malicious activity occurs. An IDS can detect these activities using anomaly detection methods [1], misuse detection methods [2], or a combination of both. Anomaly methods detect malicious traffic by measuring how far the suspicious activity flow deviates from the normal flow, based on a chosen threshold, whereas misuse methods detect malicious activities based on their signatures. The main differences between these methods lie in the detection of novel attacks and in the false positive ratio: misuse methods produce a minimal number of false positives, while anomaly methods can detect novel attacks.

III. IDS ALERTS' CORRELATION STUDIES

Correlation is the part of intrusion detection studies that eases the analysis of intrusion alerts based on the similarity between alert attributes. This can be expressed mathematically as:

$\mathit{Corr\_Alert} = \{\mathit{Alert}_1, \mathit{Alert}_2, \ldots, \mathit{Alert}_n\}$

where Corr_Alert represents the group of alerts {Alert_1, Alert_2, ..., Alert_n} that share the same features and are related.

However, most correlation methods analyze IDS alerts by examining other intrusion evidence provided by system monitoring tools or scanning tools. The aim of correlation analysis is to detect relationships among alerts so that it becomes easy to build attack scenarios.

A. Classification of Alert Correlation Techniques

IDS alert correlation studies approach this issue from many angles, using methods and techniques that can be categorized as: similarity-based, pre-defined attack scenarios, pre-requisites and consequences, and statistical causal analysis.

a) Similarity-Based

This technique compares alert features to see whether they are similar. The correlation is mainly based on these features: source IPs, destination IPs, source ports and destination ports.

Valdes and Skinner [3] correlated IDS alerts in three phases. In the first phase, the minimum similarity is based on the similarity of the source and destination IPs. In the second phase, similarity is based on the attack class and attack name plus the source and destination IPs; this phase ensures that the same alert reported by different sensors is correlated. In the last phase, a threshold value is applied to correlate two alerts based on the similarity of the attack class, with no consideration of other features.
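As a rough illustration of the similarity-based idea, the sketch below groups alerts whose selected features match exactly. The `Alert` record, its field names and the sample addresses are hypothetical, and exact matching is a simplification: it is not the probabilistic similarity measure of Valdes and Skinner [3].

```python
from collections import defaultdict
from typing import NamedTuple


class Alert(NamedTuple):
    # Hypothetical alert record; field names are assumptions for illustration.
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    attack_class: str


def correlate_by_similarity(alerts, features=("src_ip", "dst_ip")):
    """Group alerts that agree on every feature listed in `features`."""
    groups = defaultdict(list)
    for alert in alerts:
        key = tuple(getattr(alert, name) for name in features)
        groups[key].append(alert)
    return dict(groups)


alerts = [
    Alert("10.0.0.5", "192.168.1.10", 4444, 80, "scan"),
    Alert("10.0.0.5", "192.168.1.10", 4445, 22, "brute-force"),
    Alert("10.0.0.9", "192.168.1.10", 5555, 80, "scan"),
]

# Loosest grouping: source and destination IPs only.
by_ips = correlate_by_similarity(alerts)
# Stricter grouping: IPs plus attack class.
by_ips_and_class = correlate_by_similarity(
    alerts, ("src_ip", "dst_ip", "attack_class")
)
```

Adding features to the key makes the grouping stricter, which loosely mirrors the move from IP-only similarity to IP plus attack class similarity.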
ISSN 1947-5500

b) Pre-Defined Attack Scenarios

The idea of studying attack scenarios comes from the fact that an intrusion usually takes several actions to complete a successful attack.

Debar and Wespi [4] proposed a system to correlate and aggregate IDS alerts triggered by different sensors. Their system has two steps. It first removes redundant alerts that come from different sensors. It then correlates the alerts by applying consequence rules, which specify that an alert should be followed by another type of alert. Once the alerts are correlated according to these rules, the aggregation phase checks whether there is any similarity between the source and destination IPs and the attack class.

c) Pre-Requisites and Consequences

This technique sits between feature-similarity correlation and scenario-based correlation. Pre-requisites are the essential conditions that must exist for an attack to succeed, and consequences are the conditions that might exist after a specific attack has occurred.

Cuppens and Miege [5] proposed a cooperation module for IDS alerts with five main functions: an alert base management function that normalizes the alerts; alert clustering and alert merging functions that detect similarity so that alerts are clustered and merged with each other; an alert correlation function that uses explicit correlation rules with pre-defined and consequence statements to perform the correlation; an intention recognition function that extrapolates intruder actions and provides a global diagnosis of the intruder's past, present and future actions; and a reaction function that helps system administrators choose the best measure to prevent the intruder's malicious actions.

d) Statistical Causal Analysis

This technique relies on ranking the IDS alerts according to a statistical model in order to correlate them.

Kumar et al. [6] implemented anomaly detection by using the Granger Causality Test (a time series analysis method) to correlate alerts in attack scenario analysis. This technique aims to reduce the number of raw alerts by merging alerts based on their features. Statistical causal analysis uses a clustering technique to rank the alerts based on the relations of attacks. It is a pure statistical causality analysis with no need for pre-defined knowledge of attack scenarios.

APRIORI ALGORITHM

Our correlation method is based on the IDS alerts aggregated by the Threshold Aggregation Framework (TAF). The TAF output consists of accurate aggregated alerts with no redundant or incomplete alerts. To aggregate two or more alerts in TAF, a threshold value is applied to give more accurate combination results [7].

Figure 4.1 shows the TAF flowchart. TAF has two types of input: the IDS alerts and the user aggregation options. The aggregation is carried out according to these two inputs, and the user chooses which aggregation method is applied to the IDS alerts.

Figure 4.1 TAF flowchart [7]

We propose the Group Correlation Method (GCM), which uses the output of TAF to correlate the alerts by means of the Apriori algorithm. The GCM flowchart in Figure 4.2 includes an alert counter checker: if the number of alerts in the file is less than or equal to 2, the alerts are dropped, since there is no need to correlate them.
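The GCM flow can be sketched roughly as follows, under several assumptions that are not taken from the paper's implementation. Each aggregated alert is reduced to a hypothetical `(first_item, second_item)` pair. The support of an itemset is taken as the number of distinct second items its members share. The confidence of a pair is taken as the average of support(pair)/support(item) over its two items, which is the arithmetic the worked example later in the paper appears to use. Only single items and pairs are handled here.

```python
from itertools import combinations


def gcm_sketch(alert_pairs, min_supp=2, min_conf=80.0):
    """Illustrative GCM pass over (first_item, second_item) pairs.

    Returns {(a, b): (support, confidence_percent)} for the item pairs
    that pass both thresholds. Function name and layout are hypothetical.
    """
    # Alert counter check: a file with two or fewer alerts is dropped.
    if len(alert_pairs) <= 2:
        return {}

    # Collect the distinct second items seen with each first item.
    groups = {}
    for first, second in alert_pairs:
        groups.setdefault(first, set()).add(second)

    # Iteration 0: support of a single item is the size of its set.
    support = {item: len(seconds) for item, seconds in groups.items()}
    frequent = sorted(item for item in groups if support[item] >= min_supp)

    # Iteration 1: intersect the sets of every pair of surviving items.
    results = {}
    for a, b in combinations(frequent, 2):
        supp = len(groups[a] & groups[b])
        if supp < min_supp:
            continue  # pair dropped by the support threshold
        # Assumed confidence: average of support(pair) / support(item).
        conf = 100.0 * (supp / support[a] + supp / support[b]) / 2
        if conf >= min_conf:
            results[(a, b)] = (supp, round(conf, 1))
    return results


# Hypothetical aggregated alerts, not data from the paper.
pairs = [("A", 1), ("A", 2), ("B", 1), ("B", 2), ("B", 3),
         ("C", 1), ("C", 2), ("D", 2)]
correlated = gcm_sketch(pairs)
```

With these sample pairs, item `D` is dropped at iteration 0 because its support is below `min_supp`, and the remaining pairs are kept or dropped according to the two thresholds. Raising `min_conf` shrinks the set of correlated pairs.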


A. Apriori Algorithm

The Apriori algorithm was chosen because it is one of the fastest data mining algorithms for finding all frequent itemsets in a large database [8]. It depends on two predefined threshold values (Support and Confidence) to determine whether the items of an itemset (a group of alerts) are related to each other. The Support value equals the frequency of the items in the itemset, while the Confidence value is calculated by the following equation:

$\mathit{Confidence} = \dfrac{LHD + RHD}{LHD} \times 100\%$   (1)

where LHD is the support of the left side and RHD is the support of the right side.

Figure 4.2 GCM flowchart

The Support value is calculated first for each itemset in the current iteration, and only the itemsets whose support is bigger than the threshold value minSupp are kept. The second step is to calculate the confidence by using equation 1. This step is done for each itemset in the current iteration, and the confidence value is compared with the second threshold value minCon to determine whether the current itemset will be used in the second iteration or not. The main idea of Apriori is to determine whether there is a relationship between the alerts, which is indicated by the confidence value.

Apriori works as illustrated in Figure 4.3:

(1) Read the aggregated alert
(2) Get two items as a set of the First Item and the value of the redundant of that item in the second item group as one set of S{i_1, i_2, ..., i_n}
(3) Set minSupp and set minCon
(4) Calculate the support value for each i_n in S
(5) Iteration I = n-1
(6) While I ≥ 1
(7) Do $\bigcap_{r=1}^{n} i_{a_r}$
(8) Calculate Support and Confidence for i_n in D{j_1, j_2, ..., j_m} where D ∈ S
(9) For each j_m in D, if Support < minSupp OR Confidence < minCon, drop the itemset
(10) I = I-1

Figure 4.3 Apriori Algorithm

B. Mathematical Representation of the Apriori Algorithm

For a better understanding of the Apriori algorithm, we represent it mathematically as follows.

The Initial Step:

Let Itemset $S = i_1, i_2, \ldots, i_n$, $R = 1, 2, 3, \ldots, g$ and $I$ = Iteration.

Iteration I = 0:

$S = (i_1, i_2, \ldots, i_n)$, $T = (j_1, j_2, \ldots, j_k)$ such that $j_m \in \{1, 2, 3, \ldots, g\}$, $m = (1, 2, \ldots, k)$

$\mathit{Support} = |T| = k$

Iteration I = 1:

We make the intersection between $i_e$ and $i_d$, where $e \neq d$, such that

$i_e \cap i_d = (j_1, j_2, \ldots, j_k)_e \cap (j_1, j_2, \ldots, j_k)_d = (j_1, j_2, \ldots, j_b)$

where $j_1, j_2, \ldots, j_b \in \{1, 2, 3, \ldots, g\}$ and $b \le e$, $b \le d$

$i_{ed} = i_e \cap i_d$

$S = i_{ed}$

where $e = 1, \ldots, n$ and $d = 1, \ldots, n$, $e \neq d$

$T = i_e \cap i_d$

$\mathit{Support} = |T| = b$

If $b < \mathit{minSupp}$ then eliminate $i_{ed}$.

Iteration I = 2:

We make the intersection between three sets $i_e$, $i_d$ and $i_h$:

$i_e \cap i_d \cap i_h = (i_e \cap i_d) \cap i_h = (j_1, j_2, \ldots, j_b) \cap (j_1, j_2, \ldots, j_k) = (j_1, j_2, \ldots, j_f)$

where $j_1, j_2, \ldots, j_f \in \{1, 2, 3, \ldots, g\}$ and $f \le b$, $f \le h$

$S = i_{edh}$

where $e = 1, \ldots, n$ and $d = 1, \ldots, n$ and $h = 1, \ldots, n$, $e \neq d \neq h$

$T = i_e \cap i_d \cap i_h$

$\mathit{Support} = |T| = f$

If $f < \mathit{minSupp}$ then eliminate $i_{edh}$.

Iteration I = c (General Form):

We make the intersection between each itemset in $c$, $S = i_{a_1}, i_{a_2}, \ldots, i_{a_c}$:

$i_{a_1} \cap i_{a_2} \cap \ldots \cap i_{a_c} = \bigcap_{r=1}^{c} i_{a_r} = (j_1, j_2, \ldots, j_z)$

where $j_1, j_2, \ldots, j_z \in \{1, 2, 3, \ldots, g\}$ and $z$ is bounded by the orders of all the itemsets in $S$

$\mathit{Support}\left(\bigcap_{r=1}^{c} i_{a_r}\right) = |T| = z$

If $z < \mathit{minSupp}$ then eliminate $\bigcap_{r=1}^{c} i_{a_r}$.

$\mathit{Confidence\ of\ } S = \dfrac{z}{\mathit{Support}\left(\bigcap_{r=1}^{c} i_{a_r}\right)}$

The denominator $\mathit{Support}\left(\bigcap_{r=1}^{c} i_{a_r}\right)$ represents the Support of all components in $\bigcap_{r=1}^{c} i_{a_r}$.

The confidence should be calculated for each itemset, and then the average of all confidences for that itemset is taken as its confidence.

To understand the mathematical representation, consider the following example. Let the samples of the first item and the second item be taken from Table 4.2, with minSupp = 2 and minCon = 80%.

TABLE 4.2 EXAMPLE SET

First Item    Second Item
1             1
2             1
5             2
2             3
3             1
4             2
1             2
2             3
3             2

So First Item F = {1, 2, 5, 2, 3, 4, 1, 2, 3, 5}, and Second Item S = {1, 2, 3}.

I = 0

F0 = {1, 2, 3, 4, 5} and S0 = {{1, 2}, {1, 2, 3}, {1, 2}, {2}, {2}} (no redundancy in the second item)

Support F0 = {2, 3, 2, 1, 1} (items 4 and 5 are eliminated because their support is below minSupp)

I = 1

F1 = {(1, 2), (1, 3), (2, 3)} and S1 = {{1, 2}, {1, 2}, {2}}

Support F1 = {2, 2, 1} (item (2, 3) is eliminated because its support is below minSupp)

$\mathit{Confidence}\ F(1,2) = \mathit{Average}\left(\tfrac{2}{2} \times 100\% + \tfrac{2}{3} \times 100\%\right) = \tfrac{100 + 67}{2} = 83\%$

$\mathit{Confidence}\ F(1,3) = \mathit{Average}\left(\tfrac{2}{2} \times 100\% + \tfrac{2}{2} \times 100\%\right) = 100\%$

$\mathit{Confidence}\ F(2,3) = \mathit{Average}\left(\tfrac{1}{3} \times 100\% + \tfrac{1}{2} \times 100\%\right) = \tfrac{33 + 50}{2} = 42\%$ (item will be eliminated)

I = 2

F2 = {(1, 2, 3)} and S2 = {{1, 2}}

$\mathit{Confidence}\ F(1,2,3) = \mathit{Average}\left(\tfrac{2}{2} \times 100\% + \tfrac{2}{2} \times 100\% + 0 \times 100\%\right) = 100\%$

From the above example it is clear that: First, the iterations stop when there are no items left to compare with. Second, the itemsets (1, 2), (1, 3), (1, 2, 3)


have relationships, with percentages of {83%, 100%, 100%}. Third, the items (4) and (5) are out of range.

V. IMPLEMENTATION ISSUES

The Group Correlation Method (GCM) can be used as a standalone system that reads the aggregated IDS alerts. GCM works only with complete alerts with no redundancy, and correlating them eases the analyst's job. GCM has two main inputs: the threshold values chosen by the user (minSupp and minCon) and the aggregated IDS alerts to be correlated. Figure 4.2 shows the GCM flowchart. GCM starts processing the IDS alerts with no need to filter the alerts or remove redundancy. Finally, the result is shown and saved in the database based on the user's request. Dropping the insufficient alerts means that these alerts have no relationships with other alerts.

VI. DISCUSSION

This paper presented the GCM method for correlating the aggregated alerts from TAF. The advantages of the proposed method are the improvement of the alert correlation process, especially when it deals with accurate, non-redundant alerts only, and the reduction of the time needed to correlate the alerts. The main objective is to minimize the number of alerts by investigating the relationships between the alerts and the alerts' features, which leads to minimizing the false positives among the IDS alerts.

This method is intended to become a general guide that can be implemented and extended into a full forensic investigation system. Other benefits of the proposed method are: firstly, discovering the attacks' behaviors; secondly, finding novel attacks; thirdly, saving time in analyzing the alerts; and finally, producing related, accurate alerts with no false alerts. Modifying the values of the two thresholds controls the number of correlated alerts.

This research was supported by the National Advanced IPv6 Center of Excellence (NAv6) in Universiti Sains Malaysia (USM).

[1] W. Fan, M. Miller, S. Stolfo, W. Lee, and P. Chan, "Using artificial anomalies to detect unknown and known network intrusions," Knowledge and Information Systems, vol. 6, pp. 507-527, 2004.
[2] M. Sheikhan and Z. Jadidi, "Misuse Detection Using Hybrid of Association Rule Mining and Connectionist Modeling," World Applied Sciences J., vol. 7, pp. 31-37, 2009.
[3] A. Valdes and K. Skinner, "Probabilistic alert correlation," in the Fourth International Symposium on Recent Advances in Intrusion Detection, 2001, pp. 54-68.
[4] H. Debar and A. Wespi, "Aggregation and correlation of intrusion-detection alerts," in 4th International Symposium on Recent Advances in Intrusion Detection (RAID) 2001, 2001, pp.
[5] F. Cuppens and A. Miege, "Alert correlation in a cooperative intrusion detection framework," in IEEE Symposium on Security and Privacy, Berkeley, California, USA, 2002, pp. 202-215.
[6] V. Kumar, J. Srivastava, A. Lazarevic, W. Lee, and X. Qin, "Statistical Causality Analysis of Infosec Alert Data," in Managing Cyber Threats, vol. 5: Springer US, 2005, pp. 101-127.
[7] Homam El-Taj, Omar Abouabdalla, Ahmed Manasrah, Ahmed Al-Madi, Muhammad Imran Sarwar, and S. Ramadass, "Forthcoming Aggregating Intrusion Detection System Alerts Framework," in The Fourth International Conference on Emerging Security Information, Systems and Technologies (SECURWARE 2010), Venice/Mestre, Italy, 2010.
[8] W. Kosters and W. Pijls, "Apriori, a depth first implementation," in Frequent Itemset Mining Implementations Repository (FIMI03), 2003.

AUTHORS PROFILE

Homam El-Taj is a research officer and fellowship holder in the National Advanced IPv6 Centre of Excellence (NAv6) at Universiti Sains Malaysia (USM). He holds a Bachelor in Computer Science from Philadelphia University, Amman, Jordan (2003) and a Master degree in Computer Science from USM in the area of Distributed Computing (2006). His master research was on Message Digest Based on Elliptic Curve Concept (MDEC). Currently he is a PhD candidate in NAv6 at USM, and his PhD research is in the field of Network Security. He has published several research articles in journals and proceedings.

Dr. Omar Amer Abouabdalla obtained his PhD degree in Computer Sciences from Universiti Sains Malaysia (USM) in 2004. Presently he is working as a senior lecturer and domain head in the National Advanced IPv6 Centre, USM. He has published more than 50 research articles in journals and proceedings (international and national). His current areas of research interest include Multimedia Network, Internet Protocol version 6 (IPv6), and Network Security.

Dr. Ahmed M. Manasrah is a senior lecturer and the Head of the iNetmon project as well as the research and innovation of the National Advanced IPv6 Centre of Excellence (NAv6) in Universiti Sains Malaysia. He is also the IMPACT Research Domain Head for Botnet and threat assessment research. Dr. Ahmed obtained his Bachelor of Computer Science from Mutah University, al Karak, Jordan in 2002. He obtained his Master of Computer Science and doctorate from Universiti Sains Malaysia in 2005 and 2009 respectively. Dr. Ahmed is heavily involved in research carried out by the NAv6 centre, such as network monitoring and network security monitoring, and has filed 3 patents in Malaysia.

Mohammed Anbar is a research officer in the National Advanced IPv6 Centre of Excellence (NAv6) at Universiti Sains Malaysia. His main research area is Network Security and Malware Protection. Anbar obtained his Masters in Information Technology from Universiti Utara Malaysia (UUM) in 2009. Currently, he is a PhD candidate in NAv6.

Ahmed Azmi Almadi is a research officer in the National Advanced IPv6 Centre of Excellence (NAv6) at Universiti Sains Malaysia. His main research area is Network Security and Malware Protection. Almadi obtained his Masters in Computer Science from USM in 2007. Currently, he is a PhD candidate and fellowship holder in NAv6. His PhD research focuses on Botnet Detection.
