A Thesis
Submitted for the Degree of
Doctor of Philosophy
in the Faculty of Engineering

by
Ciza Thomas

© Ciza Thomas
April 2009
All rights reserved
Acknowledgements
Endless thanks go to the Lord Almighty for all the blessings he has showered onto me, which have enabled me to write this last note in my research work.
During the period of my research, as in the rest of my life, I have been blessed
by Almighty with some extraordinary people who have spun a web of support
around me. Words can never be enough in expressing how grateful I am to those
incredible people in my life who made this thesis possible. I would like to thank them for making my time during my research in the Institute a
period I will treasure.
I am deeply indebted to my research supervisor, Professor N. Balakrishnan for
presenting me such an interesting thesis topic. Each meeting with him added
invaluable aspects to the implementation and broadened my perspective. He has
guided me with his invaluable suggestions, lightened up the way in my darkest
times and encouraged me a lot in the academic life. From him I have learned to
think critically, to select problems, to solve them and to present their solutions.
I would like to thank him for furthering my education in many subjects like
probability theory, network security, pattern recognition and machine learning.
He has given me the best training within the country and even abroad by sending me to CyLab and CERT of CMU, Pittsburgh, US. It was more than I had ever
hoped for in my research life. His drive for scientific excellence has pushed me
to aspire for the same (but could never achieve it). It was a great pleasure for
me to have a chance of working with him. He was the best choice I could have
made for an advisor. Sometimes we are just (or incredibly) lucky!
I consider it a great honor to have been part of the MMSL lab, SERC department of the Indian Institute of Science, and I salute the efforts of my Professor
for his support to the nation in many ways. I would also like to mention
my deep gratitude towards Prof. R. Govindarajan, Chairman, SERC for all the
support provided to me while I was a student in the department.
I will be failing in my duty if I don't acknowledge some of my friends on the
campus with whom I have shared my research experiences; doing so was a joy
and an enlightenment to me. I am fortunate to have a friend like Sharmili Roy
who has opened her heart and her problems to me, in turn motivating me many
a time with her extraordinary brilliance and analytical perceptions. Ms. Neeta
Trivedi has helped me through the totally alien landscape of writing any document correctly. Suneesh S.S. has been a caring friend who has helped me at
times of trouble on the campus.
I would like to address special thanks to the unknown reviewers of my thesis,
for agreeing to read and review it. I wish to thank the authors,
developers and maintainers of the open source software used in this work. I appreciate all the researchers whose works I have used, initially in understanding my field of research and later for updates. I would like to thank the many
people who have taught me starting with my school teachers, my undergraduate
teachers, and my graduate teachers and examiners especially Prof. Joy Kuri,
Prof. Vital Rao, Prof. Veni Madhavan, Prof. Anurag Kumar, and Prof. Mathew
Jacob.
It is with sincere gratitude that I wish to thank Prof. K.R. Ramakrishnan, Prof. S.M. Rao, Prof. S.K. Sen, and Prof. K. Gopakumar for the caring they have provided. I consider it a great privilege to have associated with some great Professors in my field of research, namely Prof. Raj Reddy, Dr. B.V. Dasarathy, Prof.
Dorothy Denning, Prof. P.K. Chan, Prof. Mathew Mahoney, the CERT CyLab group,
Dr. Athithan, and Mr. Philip, and I appreciate the help rendered by them. Prof.
Jyothi Balakrishnan, Mr. Murali, and Ms. Reshmi need a special mention in
this acknowledgement for being particularly supportive during times of need. I
feel obliged to say thank you one more time. I would also like to express my gratitude
to Dr. Latha Christy, whose thoughtful advice, when I was away from my kids
during the days of my research, served to give me a sense of direction during
my PhD studies. I wish to thank Mr. Vishwas Sharma, Dr. G. Ravindra, Dr. J.
Sujatha, Ms. K. Nagarthna, Ms. Swarna, Mr. Ravikumar, Mr. Sasikumar, Mr.
Sekhar, SERC security staff and a few others who have in some way or the other
helped me at various stages during my research life. And then, there are all the
other people who are not mentioned here but have helped in making IISc a very
special place over all these years. I would also like to thank my employers,
the Director of Technical Education, Govt. of Kerala, Principal, and Head of
the Department for the support and encouragement extended to me during my
period of research in the Institute.
I express my deep sense of gratitude for the affection and support shown
to me by my parents-in-law. My father-in-law could not see me reach this stage of
my research, and I acknowledge him in his memory. I take this opportunity to dedicate this work to my parents who have made me what I am, my husband
and children who have given consistent support throughout my research, and
my guide who had a vision in my research work. I learnt to aspire to a career
in research from my parents in my childhood, and later from my husband. My
parents have passed onto me a wonderful humanitarian lineage, whose value
cannot be measured by any worldly yardstick. The warmest of thanks to my
husband Dr. T. John Tharakan for his understanding and patience while I was
far away from home during the period of my research in the Institute. He has
supported me in each and every way, believed in me permanently and inspired
me in all dimensions of life. I am blessed with two wonderful kids Alka and
Alin, who knew only to encourage and never did complain about anything even
when they had to suffer a lot in my absence over these years. I owe everything
to them, without their everlasting love, this thesis would never be completed.
To you all, I dedicate this work.
All of you made it possible for me to reach this last stage of my endeavor.
Thank You from my heart-of-hearts.
Ciza Thomas
Abstract
The technique of sensor fusion addresses the issues relating to the optimality
of decision-making in the multiple-sensor framework. Advances in sensor
fusion make it possible to perform intrusion detection for both rare and new attacks. This
thesis discusses this assertion in detail, and describes the theoretical and experimental work done to show its validity.
The attack-detector relationship is initially modeled and validated to understand
the detection scenario. The different metrics available for the evaluation of intrusion detection systems are also introduced. The usefulness of the data set
used for experimental evaluation has been demonstrated. The issues connected
with intrusion detection systems are analyzed and the need for incorporating
multiple detectors and their fusion is established in this work. Sensor fusion
provides advantages with respect to reliability and completeness, in addition to
intuitive and meaningful results. The goal for this work is to investigate how
to combine data from diverse intrusion detection systems in order to improve
the detection rate and reduce the false-alarm rate. The primary objective of the
proposed thesis work is to develop a theoretical and practical basis for enhancing the performance of intrusion detection systems using advances in sensor
fusion with easily available intrusion detection systems. This thesis introduces
the mathematical basis for sensor fusion in order to provide enough support for
the acceptability of sensor fusion in performance enhancement of intrusion detection systems. The thesis also shows the practical feasibility of performance
enhancement using advances in sensor fusion and discusses various sensor fusion algorithms, their characteristics, and related design and implementation issues. We show that it is possible to enhance the performance of intrusion
detection systems by setting proper threshold bounds and also by rule-based fusion. We introduce an architecture called data-dependent decision fusion as
a framework for building intrusion detection systems using sensor fusion based
on data-dependency. Furthermore, we provide information about the types of
data, the data skewness problems and the most effective algorithm in detecting
different types of attacks. This thesis also proposes and incorporates a modified
evidence theory for the fusion unit, which performs very well for the intrusion
detection application. The future improvements in individual IDSs can also
be easily incorporated in this technique in order to obtain better detection capabilities. Experimental evaluation shows that the proposed methods have the
capability of detecting a significant percentage of rare and new attacks. The
improved performance of the IDS using the algorithms that have been developed
in this thesis, if deployed fully, would contribute to an enormous reduction in
successful attacks over a period of time. This has been demonstrated in the
thesis and is a step in the right direction towards making cyber space safer.
Keywords
Intrusion Detection Systems, Sensor Fusion, Negative Binomial Distribution,
Chebyshev Inequality, Data-dependent Decision Fusion, Neural Network, Base-rate Fallacy, Accuracy Paradox, Dempster-Shafer Evidence Theory, Context-dependent Operator
Notation  Details

Bel(A)  Belief of hypothesis A
Cj  Class labels
E(s)  Expectation of s
Gx  Average occurrence of X
Pe  Probability of error
Pl(A)  Plausibility of hypothesis A
Var(s)  Variance of s
ef(x)  Feature extractor
p(A)  Probability of hypothesis A
Abbreviation  Details

AFRL  Air Force Research Laboratory
ALAD  Application Layer Anomaly Detection
ANN  Artificial Neural Network
ARPANET  Advanced Research Projects Agency Network
AUC  Area Under the Curve
BPA  Basic Probability Assignment
CD  Context Dependent
CERT  Computer Emergency Response Team
CERT/CC  CERT Coordination Center
CRB
CTF
CV  Coefficient of Variation
DARPA  Defense Advanced Research Projects Agency
DCost  Damage Cost
DD  Data-dependent Decision
DDoS  Distributed Denial of Service
DS  Dempster-Shafer
DOD  Department of Defense
DoS  Denial of Service
DRDoS  Distributed Reflection Denial of Service
D-Tree  Decision Tree
EMERALD  Event Monitoring Enabling Responses to Anomalous Live Disturbances
FBI  Federal Bureau of Investigation
FCM
F-score
FN  False Negative
FoD  Frame of Discernment
FP  False Positive
FTP  File Transfer Protocol
IC3  Internet Crime Complaint Center
ICMP  Internet Control Message Protocol
IDS  Intrusion Detection System
IIDS
IP  Internet Protocol
KDD  Knowledge Discovery in Databases
LR  Likelihood Ratio
MAP  Maximum A Posteriori
MIT  Massachusetts Institute of Technology
NB  Naive Bayes
PHAD  Packet Header Anomaly Detection
P-test  A significance test
RBF  Radial Basis Function
R2L  Remote to Local
RB  Rule based
ROC  Receiver Operating Characteristic
RCost  Response Cost
SSH  Secure SHell
SVM  Support Vector Machine
TBM  Transferable Belief Model
TCP  Transmission Control Protocol
TN  True Negative
TP  True Positive
U2R  User to Root
UDP  User Datagram Protocol
Contents

Acknowledgements  ii
Abstract  vii
Keywords  ix

1 Introduction  1
1.1 Introduction  1
1.2 Intrusion Detection Systems: Background  1
1.2.1 Growth of the Internet  1
1.2.2 Growth of Internet attacks  2
1.2.3 Cyber crimes in India  3
1.2.4 Financial risks in corporate networks  4
1.2.5 Need for Intrusion Detection Systems  5
1.2.6 Current status, challenges and limitations of IDS  7
1.2.7 Open issues  9
1.3 Motivation  9
1.4 Problem statement  12
1.5 Major contributions of this thesis  13
1.5.1 Theoretical formulation  13
1.5.2 Experimental validation  14
1.6 Research goal  14
1.7 Organization of the thesis  14

6 Performance Enhancement of IDS using Rule-based Fusion and Data-dependent Decision Fusion  114
6.1 Introduction  114
6.2 Rule-based fusion  115
6.3 Data-dependent decision fusion  117
6.3.1 Motivation  118
6.3.2 Data-dependent decision fusion architecture  118
6.3.3 Detection of rarer attacks  121
6.4 Results and discussion  122
6.4.1 Test setup  122
6.4.2 Data set  123
6.4.3 Data-dependent decision fusion algorithm  123
6.4.4 Experimental evaluation  126
6.4.5 Discussion  131
6.5 Summary  132

7 Modified Dempster-Shafer Theory for Intrusion Detection Systems  134
7.1 Introduction  134
7.2 Dempster Shafer combination method  136

9 Conclusions  169
9.1 Results and discussion  170
9.2 Future work  171
9.3 Summary  172

A Attacks on the Internet: A study  174
A.1 Introduction  174
A.2 History of Internet attacks  175
A.3 Attack motivation and objectives  176
A.4 Attack taxonomy  177
A.4.1 Viruses  177
A.4.2 Worms  179
A.4.3 Trojans  179
A.4.4 Buffer overflows  180
A.4.5 Denial of Service attacks  181
A.4.6 Network-based attacks  185
A.4.7 Password attacks  186
A.4.8 Information gathering attacks  186
A.4.9 Blended attacks  188
A.5 Top ten cyber security menaces for 2008  188
A.6 Conclusion  191

List of Figures

1.1 Growth of Internet in terms of the host count over the years [2]  2
1.2 A typical security scenario in any network  6
Chapter 1
Introduction
Life was simpler before World War II. After that, we had systems.
Admiral Grace Hopper
1.1 Introduction
Security attacks through the Internet have proliferated in recent years, and information security has become an issue of serious global concern.
This chapter discusses the growth of the Internet and the tremendous security
threat posed by the increased complexity, accessibility and openness of the Internet. The importance of protecting the corporate network is introduced.
The need for network security, and in particular the need for Intrusion Detection
Systems (IDSs), is brought out. This chapter also includes the motivation
for the work reported in this thesis, the scope and objective of the thesis, the major
contributions of this thesis, and a synopsis of all the chapters.
Figure 1.1: Growth of Internet in terms of the host count over the years [2]
the Internet have become both more prolific and easier to implement because
of the ubiquity of the Internet and the pervasiveness of easy-to-use operating
systems and development environments.
There are multiple penetration points for intrusions to take place in a network
system. For example, at the network level, carefully crafted malicious IP packets can crash a victim host; at the host level, vulnerabilities in system software
can be exploited to yield an illegal root shell. Security threats have exploited all kinds of networks, ranging from traditional computers to point-to-point and distributed networks. These
threats have also exploited vulnerable protocols and operating systems, extending attacks to various kinds of applications such as database and web servers. The
most popular operating systems regularly publish updates, but the combination
of poorly administered machines, uninformed users, a vast number of targets,
and ever-present software bugs has allowed exploits to remain ahead of patches.
Appendix A includes a detailed study of the attacks on the Internet for further
reference.
1.2.3 Cyber crimes in India
The general and US-focused trends in cyber attacks have been highlighted at
the beginning of this section, but the trend in India also needs to be looked into,
e-commerce having become popular there in the last few years.
Cyber crime is a term used to broadly describe criminal activity in which computers or computer networks are a tool, a target, or a place of criminal activity,
and it includes everything from electronic cracking to denial of service attacks [4].
It is also used to include traditional crimes in which computers or networks are
used to enable the illicit activity. A key finding of the Economic Crime Survey 2006 was that a typical perpetrator of economic crime in India was male
(almost 100%), a graduate or undergraduate, and 31-50 years of age. Further,
one third of the frauds were committed by insiders, and over 37% of them were in senior
managerial positions.
In trying to identify how widespread the crime is in India and the world over,
experts feel that only a tiny proportion of cyber crime incidents are actually reported. In India, fewer cyber crime cases are registered than in
the US, Europe, etc. The Internet Crime Complaint Center (IC3) in 2006 ranked
the US (60.9%) first among the nations hosting perpetrators, followed by
the UK (15.9%). Many countries, including India, have established Computer
Emergency Response Teams (CERTs) with the objective of coordinating and responding during major security incidents/events. These organizations identify and
address existing and potential threats and vulnerabilities in the system and coordinate with stakeholders to address these threats.
1.2.4 Financial risks in corporate networks
The threats on the Internet can translate to substantial losses resulting from
business disruption, loss of time and money, and damage to reputation. The
financial impact of application downtime and lost productivity caused by the
increasing number of application level vulnerabilities and frequency of attacks
is substantial.
According to the census conducted in the US in 2007, the volume of business-to-business
commerce increased from US $38 billion in 1997 to US $990 billion in 2006
(a roughly 26-fold increase). The total e-commerce revenue of US $20 billion in 1999
had increased to US $990 billion by 2006. It was predicted that by 2008, online
retail sales would account for 10 percent of total US retail sales. There may have
been a boom in the usage of the Internet and online businesses, but one main issue
is the security of the online environment, which affects both users and businesses
alike.
According to a US report, cyber crime costs US businesses about US $67.2 billion a
year. Over the past two years, US consumers have lost US $8 billion to online
fraud schemes. The online fraudsters are not only cheating online businesses;
they are also increasing the perception of fear among consumers. The 2005
annual computer crime and security survey [5], jointly conducted by the Computer Security Institute and the FBI, indicated that the financial losses incurred
by the respondent companies due to network attacks were US $130 million.
Another survey, commissioned by VanDyke Software in 2003, found that firewalls alone are not sufficient to provide adequate protection. Moreover, according to recent studies, an average of twenty to forty new vulnerabilities are
discovered every month in commonly used networking and computer products.
Such widespread vulnerabilities in software add to today's insecure computing/networking environment.
1.2.5 Need for Intrusion Detection Systems
Intrusions refer to network attacks against vulnerable services, data-driven
attacks on applications, host-based attacks like privilege escalation, unauthorized logins and access to sensitive files, or malware like viruses, worms and
trojan horses. These actions attempt to compromise the integrity, confidentiality
or availability of a resource. Intrusions result in services being denied, systems
failing to respond, or data being stolen or lost. Intrusion detection means detecting unauthorized use of a system or attacks on a system or network. Intrusion
Detection Systems are implemented in software or hardware in order to detect
these activities.
An Intrusion Detection System (IDS) typically operates behind the firewall as
shown in Figure 1.2, looking for patterns in network traffic that might indicate
malicious activity. Thus, IDSs are used as the second and final level of defense in any protected network against attacks that breach other defenses. The
need for this second layer of protection is often questioned: do we
need an IDS once we have a firewall? To answer this question, it is
necessary to understand what a firewall does and does not do, and what an IDS
does and does not do. This helps in realizing the need for both an IDS and a
firewall to secure a network.
The existing network security solutions, including firewalls, were not designed
to handle network and application layer attacks such as Denial of Service and
Distributed Denial of Service attacks, worms, viruses, and Trojans. Along with
the drastic growth of the Internet, the high prevalence of threats over the
Internet has led security personnel to turn to IDSs.
The unauthorized activities on the Internet are carried out not only by external attackers
but also by internal sources, such as fraudulent employees or people abusing
their privileges for personal gain or revenge. These internal activities cannot
be prevented by a firewall, which usually stops external traffic from entering
the internal network. Firewalls are made to stop unnecessary network traffic
into or out of any network. Packet filtering firewalls typically scan a packet
for layer 3 and layer 4 protocol information. Most firewalls have little dynamic
defensive ability: traffic approaching the firewall either
matches an applied rule and is allowed through, or the traffic is stopped and
the firewall logs the blocked traffic. As a result, IDSs, as originally introduced
by Anderson [6] in 1980 and later formalized by Denning [7] in 1987, have received increasing attention in recent years. IDSs along with the firewall
form the fundamental technologies for network security.
IDSs can be categorized into two classes: anomaly based IDSs and misuse based
IDSs. (Appendix B provides a detailed survey on various IDSs.) Anomaly
based IDSs look for deviations from normal usage to identify abnormal behavior. Misuse based IDSs, on the other hand, recognize patterns of known attacks.
Anomaly detection techniques rely on models of the normal behavior of a computer system. These models may focus on the users, the applications, or the
network. Behavior profiles are built by performing statistical analysis on historical data [8, 9], or by using rule based approaches to specify behavior patterns
[10, 11, 12]. A basic assumption of anomaly detection is that attacks differ from
normal behavior in type and amount. By defining what is normal, any violation
can be identified, whether or not it is part of the threat model. However, the advantage of detecting previously unknown attacks is paid for in terms of high
false-positive rates in anomaly detection systems. It is also difficult to train
an anomaly detection system in highly dynamic environments. Anomaly
detection systems are intrinsically complex, and it can be difficult to
determine which specific event triggered the alarms.
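The statistical-profile approach described above can be sketched in a few lines. The feature, the sample values, and the threshold below are hypothetical illustrations, not the detectors or data studied in this thesis:

```python
# A minimal sketch of statistical anomaly detection: profile one traffic
# feature on attack-free training data, then flag strong deviations.
# The feature ("connections per second") and all numbers are made up.
import statistics

def build_profile(normal_values):
    """Estimate the mean and standard deviation of a feature under normal use."""
    return statistics.fmean(normal_values), statistics.pstdev(normal_values)

def is_anomalous(value, profile, k=3.0):
    """Flag values more than k standard deviations from the normal mean.

    By Chebyshev's inequality, at most 1/k**2 of normal traffic is flagged,
    whatever the feature's true distribution.
    """
    mean, stdev = profile
    return abs(value - mean) > k * stdev

# Connections per second observed on hypothetical attack-free days:
profile = build_profile([10, 12, 9, 11, 10, 13, 11, 10])
print(is_anomalous(11, profile))  # typical load: False
print(is_anomalous(60, profile))  # possible flooding attack: True
```

Note the trade-off discussed above: raising k reduces the false-positive rate but lets smaller deviations, and hence some attacks, pass undetected.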
On the other hand, misuse detection systems essentially contain attack descriptions or signatures and match them against the audit data stream, looking for
evidence of known attacks [5, 13]. The main advantage of misuse detection
systems is that they focus analysis on the audit data and typically produce few
false positives. The main disadvantage of misuse detection systems is that they
can detect only known attacks for which they have a defined signature. As
new attacks are discovered, developers must model and add them to the signature database. In addition, signature-based IDSs are more vulnerable to attacks
aimed at triggering a high volume of detection alerts by injecting traffic that has
been specifically crafted to match the signatures used in the analysis process.
This type of attack can be used to exhaust the resources on the IDS computing
platform and to hide attacks within the large number of alerts produced.
In contrast to firewalls, a misuse based IDS scans all packets at layers 3
and 4 as well as the application level protocols, looking for back door Trojans,
Denial of Service attacks, worms, buffer overflow attacks, and scans against
the network. An IDS provides much greater visibility to detect signs of
attacks and compromised hosts. There is still the need for a firewall to block
traffic before it enters the network; but an IDS is also needed to make sure that
the traffic that gets past the firewall is monitored.
1.2.6 Current status, challenges and limitations of IDS
Current cyber security capabilities have evolved largely as trivial patches and
add-ons to the Internet, which was designed on the principles of open communication and implicit mutual trust. It is now recognized that it is no longer
sufficient to follow such evolutionary paths and that security must be considered as a sophisticated research and design part of the information infrastructure. With all the progress that IDSs have made over the last few years, major challenges remain. The analysis is slow and often computationally
intensive. Hence, intrusion detection systems tend to detect intrusions only after
they have occurred; there is still little hope of catching an attack in
progress.
The attackers continue to find ingenious ways to compromise remote hosts and
frequently make their tools publicly available. Also, the increasing size and
complexity of the Internet, along with the end host operating systems, make it
more prone to vulnerabilities. Additionally, there is only a limited broad understanding of intrusion activity, due to many privacy issues. Because of these
challenges, current best practices for Internet security rely on reports of
new intrusions and security holes from organizations like CERT. Another well-known fact is that false positives are one of the biggest problems when working
with IDSs. A large number of false alarms severely affects the acceptability of IDSs when the incidence of attacks is considerably low in comparison to
normal traffic.
It is very difficult to integrate router logs, system logs, firewall logs, and host
based IDS alerts with alerts from a network based IDS. The last main challenge
is the need for skilled IDS analysts. Monitoring and evaluating the alerts
forces the analyst to stay on top of all the newest attacks, worms, viruses, different operating systems, and network changes on the internal network, in order to keep
the rule list accurate.
In the last two decades, a range of commercial and public domain intrusion
detection systems have been developed. These systems use various approaches
to detect intrusions. As a result, they show distinct preferences in detecting certain classes of attacks with improved accuracy while performing moderately for
the other classes. The analysis of these IDSs has given us some insight into
the problems that still have to be solved before we can have intrusion detection
systems that are useful and reliable for detecting a wide range of intrusions.
1.3 Motivation
The motivation behind the present work was the realization that, with increasing traffic and increasingly complex attacks, none of the present day
stand-alone intrusion detection systems can meet the demand for a very
high detection rate together with an extremely low false alarm rate. Also, most of the
IDSs available in the literature show a distinct preference for detecting a certain class
of attack with improved accuracy while performing moderately for the other
classes of attacks. The inability of single IDSs to achieve acceptable attack detection is discussed in chapters 2 and 3. In view of the enormous computing power
available with present day processors, combining multiple IDSs to obtain
best-of-breed solutions has been attempted earlier. The following works have
also motivated us to choose this area of research.
Lee et al. comment in their work [14] that analyzing the data from multiple
sensors should increase the accuracy of the IDS. Kumar [15] observes that correlation of information from different sources has allowed additional information to be inferred that may be difficult to obtain directly. Such correlation is
also useful in assessing the severity of other threats, be it an attacker
making a concerted effort to break into a particular host, or a worm with the potential to infect a large number of
hosts in a short span of time.
Lane in his work [16] comments that it is well known in the machine learning literature that an appropriate combination of a number of weak classifiers can yield a highly accurate global classifier. Likewise, Neri [17] notes the belief that combining classifiers learned by different learning methods, such as hill-climbing and genetic evolution, can produce higher classification performance because of the different knowledge captured by complementary search methods. The use of numerous data mining methods is commonly known as an ensemble approach, and the process of learning the correlation between these ensemble techniques is known by names such as multistrategy learning or meta-learning.
Chan and Stolfo [18] note that the use of meta-learning techniques can be easily parallelized for efficiency. Additional efficiencies can be gained by pruning less accurate classifiers [19]. Carbone [20] notes that these multistrategy learning techniques have been growing in popularity due to the varying performance of different data mining techniques. She describes multistrategy learning as a high-level controller choosing which outputs to accept from lower-level learners, given the data, the lower-level learners employed, and the current goals.
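The claim that appropriately combining weak classifiers yields a stronger global classifier can be illustrated with a small sketch. This is not from the thesis: it assumes detectors of equal accuracy whose errors are independent, an idealization that real IDSs rarely satisfy and that fusion schemes must address.

```python
from math import comb

# Probability that a majority vote of n independent detectors, each correct
# with probability p, gives the right answer (n odd). Illustrative only:
# real IDS errors are correlated, which weakens this guarantee.
def majority_accuracy(p, n=3):
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n // 2 + 1, n + 1))
```

For p = 0.7, a three-detector vote is right about 78% of the time and a five-detector vote about 84%, which is the effect the ensemble literature cited above relies on.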
The generalizations made concerning ensemble techniques are particularly apt in intrusion detection. As Axelsson [21] notes, "in reality there are many different types of intrusions, and different detectors are needed to detect them." As such, the same argument that Lee et al. make in their work [22, 23] for the use of multiple sensors applies to the use of multiple methods as well: if one method or technique fails to detect an attack, then another should detect it. They note in the work of Lee and Stolfo [24] that combining evidence from multiple base classifiers is likely to improve the effectiveness in detecting intrusions. They went on to find that by combining signature and anomaly detection models, one can improve the overall detection rate of the system without compromising the benefits of either detection method [14], and that a well-designed and updated misuse detection module should be used to detect the majority of the attacks, while anomaly detection is the only hope of fighting against innovative and stealthy attacks [43]. Mahoney and Chan [25] suggest that, because their IDSs use a technique that has significant non-overlap with other IDSs, combining their technique with others should increase detection coverage. In performing a manual post hoc analysis of the results of the 1998 DARPA Intrusion Detection Challenge, the challenge coordinators found that the best combination of the 1998 evaluation systems provides more than two orders of magnitude of reduction in false alarm rate with greatly improved detection accuracy [27].
Despite the positive accolades and research results that sensor fusion approaches for intrusion detection have received, the only IDS that has been specifically designed to use multiple detection methods is EMERALD [30], a research IDS. The lack of published research into applying multiple and heterogeneous IDSs seems to be a significant oversight in the intrusion detection community.
We believe that fusion-based IDSs are the only foreseeable way of achieving high accuracy with an acceptably low false positive rate. In spite of all the earlier work on enhancing the performance of IDSs, the overall performance still leaves room for improvement. Multi-sensor fusion meets the requirement of better-than-the-best detection by refining the combined response of different IDSs with widely varying accuracy. The motivation for applying sensor fusion to enhance the performance of intrusion detection systems is that a better analysis of the existing data gathered by various individual IDSs can detect many attacks that currently go undetected.
Throughout this thesis we use the term sensor to denote a component that monitors the network traffic or the audit logs for indications of suspicious activity in a network or on a system, according to a detection algorithm, and produces alerts as a result. A simple IDS in most cases constitutes a single sensor. We therefore use the terms sensor and IDS interchangeably in this thesis.
Can we effectively detect network intrusions by applying advances in sensor fusion techniques to intrusion detection?
Is it possible to combine a number of currently known intrusion detection
systems with a simple fusion technique based on the data-dependency and
performance of the individual IDS?
Can a single computational model be used to represent and monitor exploitations in all the attack categories using advanced sensor fusion techniques?
The strategy for answering these questions is to engage in a study of the literature concerned with similar studies, and then to proceed with a theoretical and
empirical analysis.
attacks, the importance of protecting the corporate network, and the need for network security and in particular for intrusion detection systems. Chapter 2 discusses in detail the data skewness in the monitored traffic as well as other issues in intrusion detection. This chapter also models the attack-detection scenario for IDSs. The model includes the attack carrying capacity, the performance improvement of detectors, and the detector correlation, assuming non-random interaction between attacks and detectors. Chapter 3 discusses the evaluation and testbed of IDSs. This chapter includes a detailed discussion of the DARPA data set and its usefulness in IDS evaluation. Modifications of the available IDSs used in this work have also been attempted. This chapter also brings out the inability of single IDSs to provide complete detection coverage of the attack domain. Chapter 4 provides a survey of sensor fusion after identifying the issues and limitations of single IDSs in Chapters 2 and 3 respectively. The related work in sensor fusion, and in particular the related work using sensor fusion in intrusion detection applications, is discussed. The mathematical basis as well as the theoretical analysis of sensor fusion in IDSs are also incorporated in this chapter. Chapter 5 covers the selection of intrusion detection system threshold bounds, an important parameter in sensor fusion. The bounds are deduced by means of the Chebyshev inequality using the detection rate and the false positive rate, for effective sensor fusion. The theoretical proof is supplemented with empirical evaluation and the results are compared.
Chapter 6 discusses the performance enhancement of intrusion detection using rule-based fusion. A new data-dependent decision fusion architecture is also proposed in this chapter. The experimental evaluation given in this chapter uses the data-dependent decision fusion architecture, specifically addressing the data skewness in the monitored traffic. Chapter 7 presents a new modified evidence theory, which is an extension and improvement of the classical Dempster-Shafer theory. The context-dependent operator proposed in this chapter is demonstrated to be feasible for sensor fusion. Chapter 8 provides theoretical models for the intrusion detection systems, the neural network learner and the sensor fusion system. The chapter also includes a discussion on the
effect of threshold on detection and the threshold optimization. Chapter 9 provides the results and a discussion of the main findings of this investigation.
Chapter 2
Issues Connected with Single IDSs and the
Attack-Detection Scenario
Problems worthy of attack prove their worth by fighting back.
Paul Erdős
2.1 Introduction
The probability of intrusion detection in a corporate environment protected by an IDS is low because of various issues. Network IDSs have to operate on encrypted traffic, where analysis of the packets is complicated. The high false alarm rate is generally cited as the main drawback of IDSs. For IDSs that use machine learning techniques for attack detection, the entire scope of the behavior of an information system may not be covered during the learning phase. Additionally, the behavior can change over time, introducing the need for periodic online retraining of the behavior profile. The information system can undergo attacks at the same time the intrusion detection system is learning the behavior; as a result, the behavior profile contains intrusive behavior, which is then not detected as anomalous. In the case of signature-based IDSs, one of the biggest problems is maintaining state information for signatures in which the intrusive activity encompasses multiple discrete events (i.e., the complete attack signature occurs in multiple packets on the network). Another drawback is that the misuse detection system must have a signature defined for every possible attack that an attacker may launch against the network. This leads to the necessity for frequent signature updates to keep the signature database of the
more damage than the majority attack types. Thus high accuracy is not necessarily an indicator of high model quality, and therein lies the accuracy paradox of predictive analytics. This chapter gives supporting facts for the need to give a higher weighting to the minority attack types, namely Remote to Local (R2L) and User to Root (U2R), compared to the majority attack types like Probe and Denial of Service (DoS) [27]. Also, the cost of missing an attack is higher than the cost of false positives.
The class distribution affects the learning of IDSs. The problem of designing IDSs that work effectively and yield higher accuracies for minority attacks even in the presence of data skewness has been receiving serious attention in recent times. In most of the available literature [31, 32, 33], this is overcome by resampling the training distribution. The resampling is done either by oversampling the minority class or by undersampling the majority class. The other commonly used approaches for overcoming data imbalance are cost-sensitive learning [34, 35], the two-phase rule induction method [36], and rule-based classification algorithms like RIPPER [37] and C4.5 rules [38]. In spite of all such attempts, the performance of IDSs in detecting minority and rarer attacks leaves room for improvement. It is necessary for a detection system to perform much better than those reported so far for the minority attacks while preserving the performance for the majority attacks.
This chapter also models the attack-detection scenario for intrusion detection systems. The model includes the attack carrying capacity, the detector's performance improvement from the attacks it has detected, and the detector correlation, assuming non-random interaction between the attack and the detector. This modeling shows that as intrusion detection performance improves with time, the slope of the F-score is positive and becomes steeper, which causes the effect of attacks to disappear. However, a study of the IDSs developed over the last ten years shows that such a growth rate cannot be attained with a single system, so the effect of attacks continues to be felt in information systems. This establishes the need for enhancing the performance of IDSs.
misuse. The network-based IDS captures the network traffic and analyzes the packets or connections for attack traffic. The host-based IDS monitors the activities of the host on which it is installed.
The network traffic is made up of attack (anomalous) traffic and normal traffic. Real-world traffic is predominantly made up of normal traffic rather than attack traffic. The fact that normal traffic abounds is supported by an analysis of a university's network traffic using the Snort packet logger. Also, a study of the characteristics of the network traffic during phases dominated by DDoS attacks and worm propagation has been done. One statistic on the traffic volume generated by a DDoS attack is given in the work of Lan et al. [39]. One typical situation had 28 attackers and generated 11 million packets amounting to 8.6 Gb of attack traffic in 192 seconds, directed at a single host. The magnitude of the attack traffic was about three times that of the normal traffic during this interval, and just after the 192 seconds the attack activity came down to normal. Except for such very short and very rare bursts, the usual rate of attack traffic is extremely low. In addition, Bay [40] mentions anomalous activity to be extremely rare and unusual. Fawcett [41] has commented that positive activity is inherently rare.
2.3.1 Classification of attacks
A general classification of various attacks found in network traffic is introduced in Appendix A. The thesis work of Kendall [27] provides a detailed attack taxonomy with respect to the DARPA Intrusion Detection Evaluation data set [42]; the same is discussed here in brief. The various attacks found in the DARPA 1999 data set are given in Table 2.1. The Probe or scan attacks automatically scan a network of computers or a DNS server to find valid IP addresses (ipsweep, lsdomain, mscan), active ports (portsweep, mscan), host operating system types (queso, mscan) and known vulnerabilities (satan). The DoS attacks are designed to disrupt a host or network service. These include crashing the Solaris operating system (selfping), actively terminating all TCP connections to a specific host (tcpreset), corrupting ARP cache entries for a victim not in other hosts' caches (arppoison), crashing the Microsoft Windows NT web server (crashiis) and crashing Windows NT (dosnuke). In R2L attacks, an attacker
Table 2.1: Details of attack types present in the DARPA 1999 data set, by category [27]

Probe: portsweep, ntinfoscan, lsdomain, mscan, illegal-sniffer, queso, ipsweep, satan
DoS: neptune, pod, arppoison, apache2, back, processtable, land, crashiis, smurf, dosnuke, mailbomb, syslogd, tcpreset, teardrop, udpstorm, warezclient
R2L: dict, ftpwrite, imap, named, snmpget, guest, xsnoop, framespoof, ncftp, phf, httptunnel, netbus, sendmail, xlock, netcat, sshtrojan, ppmacro
U2R: eject, ps, loadmodule, casesen, perl, xterm, fdformat, ntfsdos, sqlattack, ffbconfig, nukepw, sechole, yaga
Data: secret, ntfsdos, sqlattack, ppmacro
who does not have an account on a victim machine gains local access to the machine (e.g., guest, dict), exfiltrates files from the machine (e.g., ppmacro) or modifies data in transit to the machine (e.g., framespoof). New R2L attacks include an NT PowerPoint macro attack (ppmacro), a man-in-the-middle web browser attack (framespoof), an NT trojan-installed remote-administration tool (netbus), a Linux trojan SSH server (sshtrojan) and a version of a Linux FTP file-access utility with a bug that allows remote commands to run on a local machine (ncftp).
In U2R attacks, a local user on a machine is able to obtain privileges normally reserved for the UNIX superuser or the Windows NT administrator. The goal of a Data attack is to exfiltrate special files which the security policy specifies should remain on the victim hosts. These include secret attacks, where a user with permission to access the special files exfiltrates them via common applications such as mail or FTP, and other attacks where the privilege to access the special files is obtained using a U2R attack (ntfsdos, sqlattack). An attack could be labeled
as both U2R and Data if one of the U2R attacks was used to obtain access to the
special files. The Data category thus specifies the goal of an attack rather than
the attack mechanism [27].
Attack behavior [43]
An analysis of the various attack types within the network traffic has resulted in
certain inferences that are listed below:
Probing attacks are expected to show limited variance, as they involve making connections to a large number of hosts or ports in a given time frame. Likewise, the outcome of all U2R attacks is that a root shell is obtained without legitimate means, e.g., login as root, su to root, etc. Thus, for these two categories of attacks, given some representative instances in the training data, any learning algorithm can learn the general behavior of these attacks. As a result, IDSs detect a high percentage of old and new Probing and U2R attacks. On the other hand, DoS and R2L attacks show a wide variety of behavior because they exploit the weaknesses of a large number of different network or system services. The features constructed from the available attack instances are very specialized to the known attack types, and hence most trivial IDS models miss a large number of new DoS and R2L attacks. It is understood that misuse detection models fail in the case of novel attacks. Even anomaly detection models do not work well when there is large variance in user behavior, since the algorithm tries to model the normal behavior while the attack behavior shows a large variance, many times overlapping with the normal behavior. Hence it turns out to be difficult to guard against new and diversified attacks. The same is the case with network traffic: while there are relatively few intrusion-only patterns, normal network traffic can have a large number of variations.
It is observed from most of the previous studies that there was no attempt to consider the correlation information in the input network traffic for improving the detection effectiveness. The network traffic can be characterized in terms of sequences of discrete data with temporal dependency [44, 45, 46]. It is observed in [47] that different network intrusions have different correlation statistics, which can be directly utilized in the covariance feature space to distinguish multiple and various network intrusions effectively. By constructing a covariance feature space, a detection approach can thus utilize the correlation differences of sequential samples to identify multiple network attacks. It is also pointed out that covariance-based detection will succeed in distinguishing multiple classes with near or equal means, where any traditional mean-based classification approach will fail. Even the best intrusion detection systems in the DARPA evaluation [48] detected less than 10% of new R2L intrusion attempts. Hence, the number of new attacks detected is the more significant factor in determining the quality of IDSs.
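The covariance feature space idea attributed to [47] can be sketched as follows. This is a simplified illustration, not the authors' implementation: a window of sequential feature vectors is summarized by its covariance matrix, so two traffic classes with equal means but different correlation structure remain distinguishable.

```python
# Summarize a window of sequential samples by its flattened sample
# covariance matrix. The two example windows below share identical
# per-feature means, yet their covariance features differ in sign, so a
# mean-based classifier would confuse them while a covariance-based one
# can separate them.
def covariance_feature(window):
    n, d = len(window), len(window[0])
    means = [sum(row[j] for row in window) / n for j in range(d)]
    return [sum((row[i] - means[i]) * (row[j] - means[j]) for row in window)
            / (n - 1)
            for i in range(d) for j in range(d)]
```

For example, `covariance_feature([[0, 0], [1, 1], [2, 2]])` gives [1.0, 1.0, 1.0, 1.0], while `covariance_feature([[0, 2], [1, 1], [2, 0]])` gives [1.0, -1.0, -1.0, 1.0]: same means, opposite correlation.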
2.3.2 Identification of real-world network traffic problems
The attack traffic in real-world traffic is mostly rare. In addition, the attack types within the attack class itself are skewed, with probes and DoS attacks abounding whereas R2L and U2R attacks are rare. The effect of this data skewness poses some very serious issues for the performance of IDSs, mainly in two ways:
1. Applying conditional probability using Bayes' theorem, the detection of an attack can be shown to be difficult unless both the percentage of attacks in the entire traffic and the accuracy rate of their identification are far higher than they are at present. The Bayesian rate of attack detection Pr(I|A) [49] is given by:

   Pr(I|A) = Pr(A|I) Pr(I) / [Pr(A|I) Pr(I) + Pr(A|NI) Pr(NI)]   (2.1)
Figure 2.1: Probability of attack vs Bayesian attack detection rate for fixed values of FP and TP
Consider traffic statistics such as those reported for an Air Force base. With an IDS which is 99% accurate and has a false positive rate of 0.01, the Bayesian rate of attack detection Pr(I|A) is obtained as 0.00379.
Hence the false positives increase to roughly 262 for each real attack detected. This clearly shows the inability of the IDS to perform its proposed task of attack detection, where the actual attacks get embedded in the large volume of false positives. Even though the detection is 99% certain, the chance that an alert corresponds to an attack is only 1/263, due to the fact that the normal traffic is much larger than the attack traffic. Thus it is difficult to interpret what a small false alarm rate is when the base rate is also small. From equation 2.1, the Bayesian attack detection rate can be approximated as:

   Pr(I|A) ≈ Pr(I) / Pr(A|NI)   (2.2)
Equation 2.2 clearly shows that the probability of an alert being an intrusion approaches 1 only if the false alarm rate is of the same order as the prior probability of attack. This adverse effect of the data skewness is also illustrated in Figure 2.1, which shows that the naturally occurring class distribution often does not produce the best performing IDS. The optimal distribution generally contains more than 50% of minority class examples.
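The computation behind equations 2.1 and 2.2 can be sketched directly. The prior attack probability of about 3.8 × 10^-5 used below is an illustrative assumption, chosen to reproduce the order of magnitude of the Pr(I|A) value quoted above, not a measured statistic.

```python
# Bayesian rate of attack detection, equation 2.1: the probability that an
# alert actually corresponds to an intrusion.
def bayesian_detection_rate(tp_rate, fp_rate, p_intrusion):
    p_normal = 1.0 - p_intrusion
    hit = tp_rate * p_intrusion          # Pr(A|I) Pr(I)
    false_alarm = fp_rate * p_normal     # Pr(A|NI) Pr(NI)
    return hit / (hit + false_alarm)

# A 99%-accurate IDS with a 1% false positive rate, against a rare attack
# (assumed prior ~3.8e-5): alerts are almost never real attacks.
r = bayesian_detection_rate(0.99, 0.01, 3.8e-5)        # roughly 0.004
# Lowering the false alarm rate to the order of the prior, as equation 2.2
# predicts, makes alerts meaningful again:
r_low = bayesian_detection_rate(0.99, 3.8e-5, 3.8e-5)  # roughly 0.5
```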
[Figure 2.2: F-score as a function of the base rate (log scale)]
case of imbalance in data and also when the costs of different errors vary markedly. Most IDSs generate a trivial model by almost always predicting the majority classes, since predicting the minority classes has a much higher error rate, which in turn degrades the performance of the IDS. This is the accuracy paradox: in predictive analytics, high accuracy is not necessarily an indicator of high model quality. This is explained with the DARPA test data set of 5 million test records containing 190 attacks. Consider an IDS which detects 100 of these attacks at a false positive rate of 0.01%. The accuracy of this detector is 99.994%. However, the accuracy paradox lies in the fact that the accuracy can easily be made 99.996% by always predicting normal. The second model, even though it has a higher accuracy, is useless since it does not detect attacks. Hence, most IDSs do not detect minority class types sufficiently well, since they aim to minimize the overall error rate rather than paying attention to the minority classes, which is obviously not the desired detection result. Thus the present-day stand-alone IDSs are not effective in detecting attacks, especially the rare attack types.
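The arithmetic of the accuracy paradox can be checked directly. The false-positive count below (0.01% of the roughly 5 million non-attack records, i.e. about 500 alarms) is an assumption about how the quoted rate is defined, so the detector's accuracy comes out near, though not exactly at, the figure quoted above.

```python
TOTAL, ATTACKS = 5_000_000, 190

def accuracy(tp, fp):
    fn = ATTACKS - tp          # attacks missed
    tn = TOTAL - ATTACKS - fp  # normal records correctly passed
    return (tp + tn) / TOTAL

# Detector: 100 of 190 attacks found, ~500 false alarms (assumed 0.01% FP rate).
detector = accuracy(tp=100, fp=round(0.0001 * (TOTAL - ATTACKS)))  # ~99.988%
# Trivial model: always predict normal, detect nothing.
trivial = accuracy(tp=0, fp=0)                                     # ~99.996%
# The trivial model scores higher while detecting nothing -- the paradox.
```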
Probes aim at determining whether a particular IP exists and, if so, the details regarding the services and the operating system running on it. Probing may be normal activity or may be the pre-phase of an attack. In the latter case it may not be possible to confirm the intent of the prober and thus recognize an attack, but proper preventive measures, such as the installation of patches, deployment of a firewall, addition of firewall rules, and removal of unused services, can be taken so that the vulnerability does not get exploited. DoS mainly disrupts the services on a network or on a host. Hence DoS causes service denial, while probes are for reconnaissance/surveillance.
The R2L attack, on the other hand, gains an account on a remote machine, exfiltrates files, modifies data, installs trojans for back-door entry, etc. The U2R attack uses buffer overflow to acquire a root shell and gain full control of the system. Data attacks are the special case where an attacker gets the privilege to access special files. In a real-world environment, these
minority attacks are more dangerous than the majority attacks. Hence, it
is essential to improve the detection performance for the minority attacks,
while maintaining a reasonable overall detection rate.
2.3.3 Non-uniform misclassification cost
It is important to understand that the cost of misclassifying an attack as normal (type I errors, or FN) is often more than the cost of misclassifying normal traffic as an attack (type II errors, or FP). The issues involved in measuring these cost factors have been studied by the computer risk analysis and security assessment communities. Denning in her book [50] remarks that cost analysis and risk assessment in general are not an exact science, because precise measurement of the relevant factors is often impossible. Damage cost (DCost) characterizes the amount of damage to a target resource by an attack when intrusion detection is unavailable or ineffective. Response cost (RCost) is the cost of acting upon an alarm or log entry that indicates a potential intrusion [51]. Lee et al. [24] have come up with an attack taxonomy, illustrated in Table 2.3, which categorizes intrusions that occur in the DARPA Intrusion Detection Evaluation data set.
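The DCost/RCost values of Table 2.3 imply a simple cost-sensitive response rule, sketched below under the assumption, following the cost model of [24, 51], that responding to an alert is only worthwhile when the damage prevented exceeds the cost of responding.

```python
# (DCost, RCost) per attack sub-category, taken from Table 2.3 [24].
COSTS = {
    ("U2R", "local"): (100, 40),   ("U2R", "remote"): (100, 60),
    ("R2L", "single"): (50, 20),   ("R2L", "multiple"): (50, 40),
    ("DoS", "crashing"): (30, 10), ("DoS", "consumption"): (30, 15),
    ("Probe", "simple"): (2, 5),   ("Probe", "stealth"): (2, 7),
}

def worth_responding(category, subcategory):
    """Respond only if the damage prevented exceeds the response cost."""
    dcost, rcost = COSTS[(category, subcategory)]
    return dcost > rcost
```

Under this rule every U2R, R2L and DoS alert justifies an active response, while for probes (DCost of 2 against a larger RCost) logging without active response is the cheaper choice.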
When looking at the skewness within the minority class, it is observed that there is a still higher misclassification cost for the rarest attack types. This is also highlighted in the cost matrix of Table 2.4, published for the KDD IDS evaluation [52]. Hence, it is important to have IDSs that minimize the overall misclassification cost by performing better on minority classes and, within them, on minority attack types. Thus, in line with the KDD evaluations of IDSs, the goal is not only the improvement in accuracy but also the reduction in misclassification cost. The misclassification cost penalty was highest for one of the most infrequent attack types, and that too for the type I error [52]. The total misclassification cost can be reduced if the type I and type II errors can be reduced. The advantage with most rare events is that their signatures are unique and can be learned from the given data. In some of the commonly encountered attacks, the attacker uploads some malicious code onto the target machine and then logs in to crash the machine to get root access, as in casesen and sechole. Also, there are attacks that
Table 2.3: Damage cost and response cost of different attack types [24]

Attack type | Sub-category (by results) | Description (by techniques)                          | Cost
U2R         | local                     | legitimate user trying to acquire higher privileges  | DCost=100, RCost=40
U2R         | remote                    | acquiring root privileges from a remote machine      | DCost=100, RCost=60
R2L         | single                    | illegal user access obtained with a single event     | DCost=50, RCost=20
R2L         | multiple                  | illegal user access obtained with multiple events    | DCost=50, RCost=40
DoS         | crashing                  | crashing a system by certain framed packets          | DCost=30, RCost=10
DoS         | consumption               | exhausting bandwidth or system resources             | DCost=30, RCost=15
Probe       | simple                    | fast scan                                            | DCost=02, RCost=05
Probe       | stealth                   | slow scan                                            | DCost=02, RCost=07
Table 2.4: Cost matrix published for the KDD IDS evaluation [52] (rows: actual class; columns: predicted class)

Actual \ Predicted | Normal | Probe | DoS | U2R | R2L
Normal             |   0    |   1   |  2  |  2  |  2
Probe              |   1    |   0   |  2  |  2  |  2
DoS                |   2    |   1   |  0  |  2  |  2
U2R                |   3    |   2   |  2  |  0  |  2
R2L                |   4    |   2   |  2  |  2  |  0
   p_opt = C10 / (C01 + C10)   (2.5)

For the DARPA 1999 data set, the cost matrix in Table 2.3 shows that C01 > C10 and, on substituting the values in equation 2.5, p_opt is obtained as 0.41. Hence, an optimal decision from any IDS cannot be expected when the data skewness gives a prior probability of attack less than 0.41. This proves that data skewness results
It is the IDS developers who occupy the middle layer and keep the ecosystem stable by maintaining at least a minimum number of undetected attacks and detectors at all times.
2.4.2 Testing the performance of Intrusion Detection Systems
The evaluation of IDSs was initiated by the US Defense Advanced Research Projects Agency (DARPA) in 1998 and has been the most comprehensive scientific study known for comparing the performance of different IDSs [48]. The MIT Lincoln Laboratory synthesized the network traffic for its DARPA 1998 and DARPA 1999 data sets [42]. The performance of an IDS can be evaluated using these publicly available data sets.
IDSs can be configured and tuned in a variety of ways in order to reduce the false positive rate and to maximize the detection rate. However, there is a tradeoff between these two metrics for any system, and hence these measurements are used to form Receiver Operating Characteristic (ROC) curves. An ROC curve plots the detection rate against the false alarm rate. If the IDS raises alarms on every suspicious packet, the false alarm rate as well as the detection rate will increase. On the other hand, if the IDS raises alarms only after sufficient evidence is available, i.e., with fewer false alarms, the detection rate will suffer but with an increased alarm confidence. An IDS can be operated at any given point on the ROC curve. The optimal operating point for an IDS, given a particular network, is determined by factors like the cost of a false alarm, the value of a correct detection and the prior probabilities of normal and attack traffic. The ROC curve conveys information of importance when analyzing and comparing IDSs. Figure 2.3 is an ROC graph plotted with each point identifying the status of a particular IDS, developed from 1995 to 2004, in terms of the detection rate and the false alarm rate. The crowded region to the top left of the graph can be identified as that due to the recent systems. Thus the environment in which most of the recent IDSs operate requires very low false alarm rates (much lower than the 0.1% designated by DARPA) for useful detection. The overall accuracy is not a good metric since the class of interest is extremely rare. In cases with a low base rate, the IDS has to be evaluated based on its performance in terms of both recall and precision. Figure 2.4 shows
[Figure 2.3: Detection rate vs. false alarm rate for IDSs developed from 1995 to 2004]
[Figure 2.4: Precision vs. recall of IDSs]
the tradeoff between the two metrics, precision and recall. The precision of an IDS refers to the fraction of detected intrusions among the total alerts generated. Similarly, the recall refers to the IDS's completeness; the more complete an IDS is, the fewer are the intrusions that remain undetected. The precision and completeness of IDSs over the years of IDS development can be understood from Figure 2.5, with both plots on a single graph, where the top plot refers to the precision and the bottom plot refers to the recall.
   Precision (P) = (number of correctly detected intrusions) / (number of alerts) = TP / (TP + FP)

   Recall (R) = TP / (TP + FN)
The behavior of the IDS can be generalized in terms of the F-score, which scores a balance between precision and recall as:

   F-score = 1 / (α·(1/P) + (1 − α)·(1/R))

where α takes a value between 0 and 1 and corresponds to the relative importance of precision over recall. Thus the F-score takes values between 0 and 1 depending on the relative importance of precision over recall. Giving equal importance to precision and recall, α is assigned a value of 0.5 and the F-score becomes the harmonic mean of precision and recall. A higher F-score indicates that the IDS is performing better on both recall and precision. The plot of the F-score over a period of time, as shown in Figure 2.6, gives an idea of the effectiveness of the IDSs developed over that period. Hence a study was undertaken to highlight the performance of the various IDSs over a period of time in terms of the F-score.
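These metrics can be computed directly from alert counts; a minimal sketch, with the alert counts in the usage example chosen purely for illustration:

```python
def precision(tp, fp):
    return tp / (tp + fp)

def recall(tp, fn):
    return tp / (tp + fn)

def f_score(p, r, alpha=0.5):
    """Weighted harmonic mean; alpha is the relative importance of precision."""
    return 1.0 / (alpha / p + (1.0 - alpha) / r)

# An IDS raising 200 alerts that catch 80 of 100 attacks:
p, r = precision(80, 120), recall(80, 20)   # p = 0.4, r = 0.8
f = f_score(p, r)                           # harmonic mean, about 0.53
```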
Usually it is expected that both the technology and the performance of any system improve with time. However, in the case of IDSs this need not be so, since attackers also gain expertise with time and can drive up the false alarms so as to deny the security analyst a correct picture of the attack. The analysis of Figure 2.6 can be carried out in terms of the study of the growth in Internet insecurity from the incidents reported to the Computer Emergency Response Team/Coordination Center (CERT/CC).
[Figure 2.5: Recall and precision of IDSs vs. year, 1995-2004]
[Figure 2.6: F-score of IDSs vs. year, 1995-2004]
[Figure 2.7: Incidents reported to CERT vs. year, 1988-2004]
[Figure 2.8: Vulnerabilities reported by CERT vs. year, 1995-2004]
Figure 2.7 and Figure 2.8 show the growth rate of the incidents reported to CERT and of the vulnerabilities reported by CERT, respectively. It appears that the attackers' understanding of security weaknesses has increased over time, as have the improvements in their attack tools.
The statistics released by CERT [53] show disturbing trends in incident and vulnerability reporting: the growth has become exponential in recent times, whereas it was more nearly linear in the early years. In order to bring down the exponential growth of attacks, a sufficient number of IDSs has to be deployed for detection, along with better and more advanced IDS
techniques for better detection. The plot of the F-score over the years is a clear indication that IDS techniques are not improving in a regular and steady manner, but are instead following a prey/predator relationship.
Just after the introduction of IDSs, they became more and more competitive in detecting the existing intrusions caused by attackers. This can be seen in the initial steep rise of the F-score. With the advancement of IDS technology, the attackers also acquire more expertise and launch more novel and confusing attacks, which causes the F-score to come down drastically. As a result, security experts need to overcome these attacks: both IDS researchers and implementors strive for new techniques, modify the available IDSs, and find patches for the known vulnerabilities. This effort again brings up the F-score, and this process of the attacker and the detector gaining more and more expertise with time continues, as seen in Figure 2.3. This trend is seen in the work of Shimeall and
Williams [54] and Browne et al. [55], where the reported incidents behave in an oscillatory manner. The practical data available in their work explains this type of system behavior. The plot is reproduced in Figure 2.6 from the data observed by Shimeall and Williams. The plot shows the number of incidents with serious effects on a reporting site, as reported to the CERT/CC each week between June 24, 2000 and February 17, 2001. The incident counts exclude simple port-scanning, unexploited vulnerabilities, false alarms, failed frauds and hoaxes.
Shimeall and Williams [54] have found temporal, spatial and associative trends in the relationship of attacks on the Internet. One such temporal trend relates to the timing between an event that may trigger an incident (say, a new product announcement or the release of a new intrusion tool) and the corresponding incidents. Hacking conventions like Defcon, the Black Hat Briefings, etc. appeared at or close to local peaks in the incident reporting. Taking into account the exploits published on websites, the peaks in the exploit publication rate were weakly correlated with the peaks in the incident reporting rate one to three weeks later. The exploitation of vulnerabilities in reported security incidents is very common, and intruders modify their tactics quickly. It is also true that while substantive periods of time may elapse between the discovery of a vulnerability and its widespread exploitation, there
[Figure: Number of incidents with serious effects reported to the CERT/CC per week, for 35 weeks beginning June 24, 2000.]
is a delay between the time attacks are detected and the time steps are taken to immunize the systems against such attacks. After a time delay the attackers develop new types of attack. This process continues, and hence the variation of the F-score with time shows an oscillatory behavior in the case of IDSs.
On further analyzing the F-score plot, one possible model for estimating the performance of IDSs with time is the prey/predator relationship identified in the attack/detector interaction. The attacks on a host/network increase, causing the detection rate to rise; as the detection becomes more and more competitive, the attacks naturally come down, since many of them are no longer successful. As the attacks reduce significantly, the research and development taking place in the field of countermeasures such as IDSs decreases. This is because intrusion detection analysts may be inclined to try new methods only when they see their old methods to be inadequate, particularly when the new methods require considerable knowledge and skill to be used effectively. This again causes the attacks to increase considerably.
Considering the performance of a single IDS over the years, it will be seen that the performance significantly deteriorates with time. It is difficult for an IDS to keep up with the new attacks generated by sophisticated attackers. This implies that a constant update is required at all times to maintain a steady performance, which occurs only if the security researchers steadily keep up with the attackers in their capabilities. However, this rarely happens with full efficiency; hence the performance of a single IDS deteriorates with time. At no stage has the performance of a single IDS caused the effect of attacks to totally disappear. Similar is the case with the attacks: a successful attack of one time may not be successful at a later stage, mainly because of the patches and the advanced security measures that the security developers introduce from time to time. Hence the attacks also show a decreasing performance over a period of time.
To model this scenario, certain assumptions are made in the initial phase while working with the model to define the attack-detector population dynamics, given as follows:
to consider that the detectors search randomly for attacks, and that the more the number of detectors, the greater is the chance of attacks becoming ineffective or unsuccessful.)
Attack density cannot exceed some carrying capacity, which is imposed
by an ultimate resource limit on the attacks due to technical bottlenecks on
the system and network resources. The attacker's knowledge also restricts the sophistication of the attacks beyond a certain level.
The assumptions made for the pursuit-evasion processes are given as follows:
The attackers classified as script kiddies try to identify the security measures, in terms of the density of the detectors in a particular domain, and try to attack sparsely monitored or secured networks. Thus the trend is that the majority of the attacks move away from domains which are intensely watched and protected.
In the case of professional hackers, it is expected that they concentrate on
well protected and highly gainful domains. In such cases, sophisticated
attacks are seen to successfully concentrate more on critical domains intensely watched and protected. Their interest is to attack such domains for
various goals.
The trivial and average detectors find application by identifying the density of attacks in a particular domain and then monitoring sparsely covered resources and networks to fight against the existing attack scenario.
Highly sophisticated IDSs concentrate on intensely secured resources and
networks to still improve the security measures.
In this section, an attempt has been made to model the dynamic relationship
existing between the detectors and the attacks. This knowledge can be used to
enrich the design and development of IDSs. For each combination of a and d
there is an unstable attack-detector equilibrium, with the slightest disturbance
leading to expanding population oscillations. Depending on the initial state, the
system can evolve towards a simple steady state or a limit cycle, in which the
attack-detector populations oscillate periodically in time. The attack-detector
relationship may thus exhibit coupled oscillations.
Usually detectors respond to attack distributions, and a constant searching efficiency is difficult to accept: the searching efficiency depends on the speed of the traffic, on the attack density on a priori grounds, and also on the detector density.
When a network intrusion happens, the sequence of attacks does not take place
in a totally random order. Intruders come with a set of tools trying to achieve
a specific goal. The selection of the cracking tools and the order of application depends heavily on the situation as well as the responses from the targeted
system. Typically there are multiple ways to invade a system. Nevertheless, it
usually requires several actions/tools to be applied in a particular logical order
to launch a sequence of effective attacks to achieve a particular goal. It is this
logical partial order that reveals the short and long-term goals of the invasion.
Real attacks are likely to be distributed in a patchwork of high and low densities, and the detectors can be expected to respond to the attacks by orienting towards high-density patches. It is natural that the detector searches certain traffic features for signs of attack; since the normal traffic is uncorrelated whereas the attack traffic is correlated, this provides a strong selective advantage for detectors that results in a more focused search. Since the detector aggregation has the effect of giving attacks a partial refuge at low densities, it is a potentially important stabilizing factor in the dynamic interactions.
The detections show a negative binomial pattern. The negative binomial distribution does not assume randomness but only proneness, i.e., certain traffic features have a higher chance of disclosing attacks than the other features. If the variance is larger than the mean, the level of proneness of the population is high. If a Poisson distribution were used to model such data, the model mean and variance would be equal; the observations are thus over-dispersed with respect to the Poisson model. Data are examined that relate to two-population interaction models based on the negative binomial parameter g. The negative binomial pattern is specified by its mean, μ, and a clumping parameter, g. The expected proportion of attacks not getting detected at all is given by P(0) = (1 + μ/g)^(-g).
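As a small numerical check of this expression (the function name is ours):

```python
# P(0): expected proportion of attacks escaping detection entirely under
# the negative binomial pattern with mean mu and clumping parameter g.
def p_zero(mu, g):
    return (1.0 + mu / g) ** (-g)

# Strong clumping (g < 1) leaves a larger fraction of attacks undetected
# than near-Poisson search (large g), for the same mean.
print(p_zero(2.0, 0.5))    # aggregated search
print(p_zero(2.0, 100.0))  # close to the Poisson limit exp(-2)
```

The second value approaches exp(-μ) as g grows, recovering the Poisson case discussed below.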
The coefficient of variation (CV) of the distribution is given by CV^2 = variance/(mean)^2 = 1/μ + 1/g.
In the limit, as g tends to infinity, the random or Poisson distribution is recovered, with the variance equal to the mean. Thus for large g values, say g > 8, Poisson randomness can be assumed. If g < 1, the CV gets larger and the detectors are strongly aggregated in patches of high attack density.
Let At and Dt denote the number of attacks and detectors at any time t. Let d denote the detector efficiency and a denote the attack increase rate ignoring detection. The attack-detector model, giving the attacks and the detectors in the successive generations t + 1 and t, can be expressed as:

At+1 = a At (1 + d Dt / g)^(-g)                    (2.6)

and

Dt+1 = c At [1 - (1 + d Dt / g)^(-g)],             (2.7)

with g being the negative binomial dispersion parameter, which can be interpreted as a coefficient of variation of detector density among patches, and d Dt
being the mean detector density. The dynamics of the model show diverging oscillations if g > 1; for g < 1, the system is always stable, at first showing damped oscillations and then approaching the equilibrium state. Figure 2.10 shows the attack-detector relationship using the negative binomial distribution model. Besides the densities of the attacks and the detectors, namely At and Dt, the parameters of the system take non-negative values. Figure 2.10 shows the attack and detector distributions with typical values and initial conditions for the coexistence of the attacks and the detectors as a = 0.25 (from Figure 2.7), A1 = 20000, d = 0.9 (from Figure 2.6), D1 = 6, g = 0.7, and t varying from 2000 to 2005. The attacks as well as the detectors are seen to increase with
time, and the clumping of attacks was significantly reduced upon detection.

[Figure 2.10: Attack-detector relationship using the negative binomial distribution model, attacks and detectors vs. time.]

The model is seen to be in total agreement with the attack incident reports published
by CERT for the same period of time. In order to bring down the attacks, it is necessary to deploy more detectors and/or improve the detection efficiency of
the detectors. With the detector efficiency maintained steady, the number of detectors deployed is first decreased and then increased to demonstrate the effect.

[Figures 2.11 and 2.12: Attack-detector relationship, A(t) and D(t) vs. time (2000-2005), for different initial detector deployments.]
Figures 2.11 and 2.12 show the attack-detector interactions with a smaller and a larger number of initially deployed detectors, respectively. Also, with the deployed detectors remaining unchanged, Figures 2.13 and 2.14 show the effect of decreasing and of increasing the detector efficiency, respectively.
[Figures 2.13 and 2.14: Attack-detector relationship under changed detector efficiency, A(t) and D(t) vs. time (2000-2005).]
Here c is the factor that incorporates the effect of the detectors increasing in number depending on the number of attacks detected. A special case of non-random search is one where some attacks are completely free from detection within a certain time span (a time refuge). In such a case, the detectors may aggregate in patches of high attack density. This tendency of detectors aggregating in patches of high attack density gives the attacks a refuge at low densities, a powerful stabilizing force in the interaction. This aggregation is most extreme at low values of g.
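The attack-detector recursion above is straightforward to iterate numerically. The sketch below uses the typical values quoted in the text (a = 0.25, d = 0.9, g = 0.7, A1 = 20000, D1 = 6); the text gives no numerical value for c, so the value used here is purely an assumption for illustration:

```python
# Minimal sketch of the negative binomial attack-detector model:
#   A[t+1] = a * A[t] * (1 + d*D[t]/g)**(-g)
#   D[t+1] = c * A[t] * (1 - (1 + d*D[t]/g)**(-g))
# Parameter values follow the text; c = 0.001 is an assumed value.

def step(A, D, a=0.25, d=0.9, g=0.7, c=0.001):
    escape = (1.0 + d * D / g) ** (-g)   # fraction of attacks escaping detection
    return a * A * escape, c * A * (1.0 - escape)

def simulate(A1=20000.0, D1=6.0, steps=5, **params):
    history = [(A1, D1)]
    A, D = A1, D1
    for _ in range(steps):
        A, D = step(A, D, **params)
        history.append((A, D))
    return history

for year, (A, D) in enumerate(simulate(), start=2000):
    print(f"year {year}: attacks = {A:10.1f}, detectors = {D:8.3f}")
```

With this particular choice of c the attacks collapse quickly, since a < 1 makes the attack population decay even without detection; the diverging oscillations discussed in the text arise for g > 1 and larger attack growth rates.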
This modification of the attack-detector model can completely alter the outcome
of an attack-detector model. Instead of always being unstable, this new model is
stable over a wide range of conditions depending on the attack growth rate (a),
and the amount of correlation (m). With a being very large, even small values
of m (say m = 0.3) will contribute markedly to the stability and may even give
complete stability. Apart from contributing to the stability of the attack-detector
interactions, detector correlation can also account for the frequent coexistence
of several detector varieties on one attack. The value of m increases as the
detector density increases. The detector searching efficiency tends to become
independent of the detector density as the change in correlation becomes very
small. The detector curve, as shown in Figure 2.11, is seen to come down with a decrease in D(1), d and g, and picks up with a decrease in the value of the correlation coefficient m.
Thus, the introduction of the interference factor into the model contributes to the stability of the model. As m increases, the stability increases; this is due to the detectors being distributed in a more aggregative manner to counter a sophisticated attack, an effect termed pseudo-interference. The heterogeneous attack distributions, coupled with detector aggregation at high attack density, must be the main stabilizing mechanism. The clumping of detections can be formally described as pseudo-interference, with the same stabilizing dynamical effects as mutual interference among detectors.
The correlation coefficient and the aggregative behavior are, in reality, closely related. The common effect of the correlation coefficient is to reduce the required searching time in direct proportion to the frequency of encounters. The searching efficiency d declines as the detector density increases. This modification has a marked effect on stability: instead of always being unstable, there can now be a stable equilibrium, given suitable values of the interference constant m and the attack growth rate a. These notions can be applied to the present model to get a pseudo-interference coefficient of magnitude
m' = g (1 - a^(-1/g)) / ln(a).
That is, the overdispersion of detections among attacks has much the same dynamical consequences as would be produced by pure mutual interference among detectors in a homogeneous world. The pseudo-interference coefficient m' corresponds to a stable equilibrium if, and only if, g < 1. The density-dependent attack growth is also a stabilizing factor: since this behavior gives the attacks a refuge at low densities, it is a potentially important factor in stabilizing the attack-detector interaction. In short, there are both empirical and theoretical grounds for the stability of the attack-detector interaction.
[Figure: Number of attacks vs. time in years (1999-2004) for detector efficiencies d = 0.25, 0.5 and 0.75.]
2.7 Summary
Intrusion detection systems are becoming an indispensable and integral component of any comprehensive enterprise security system, the reason being that the IDS has the potential to alleviate many of the problems facing current network security.
A review of the issues connected with single IDSs has offered a critical analysis to understand the need for further work in this field of research. This chapter is integral to the whole thesis, supporting the correctness of the track and reinforcing that there is a contribution to make in this field. It demonstrates that the existing work in the field of intrusion detection has been understood critically, along with the most important issues and their relevance to this work, its controversies, and its omissions.
In this chapter, issues connected with single IDSs are discussed. The problems associated with data skewness are exemplified, and the need for improving the performance of individual IDSs using advanced techniques is established. This chapter makes certain inferences about the intrusion detection environment. The normal traffic in any environment comprises a majority of non-attacks and a minority of attacks. The cost of missing an attack is higher than the cost of a false positive. Within the attack traffic, some attacks are even rarer, and rarer attacks may also cause significant damage. IDSs are normally characterized by the overall accuracy, and the imbalance in the data degrades the prediction accuracy. Though an IDS can give very high overall accuracy, its performance on the class of rarer attacks has been found to be less than acceptable. Hence, it is not appropriate to evaluate IDSs using predictive accuracy when the data is imbalanced and/or the costs of different errors vary markedly. The data skewness in the network traffic demands an extremely low false positive rate, of the order of the prior probability of attack, for an acceptable value of the Bayesian attack detection rate.
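This base-rate effect can be made concrete with Bayes' theorem; a small sketch with illustrative numbers (not measurements from this thesis):

```python
# Bayesian attack detection rate: P(intrusion | alarm).
# base_rate: prior probability that an event is an attack
# tpr: detection rate; fpr: false positive rate
def bayesian_detection_rate(base_rate, tpr, fpr):
    alarm = base_rate * tpr + (1.0 - base_rate) * fpr   # P(alarm)
    return base_rate * tpr / alarm

# With 1 attack in 100,000 events, even a perfect detector with a
# 0.1 percent false positive rate yields mostly false alarms:
print(bayesian_detection_rate(1e-5, 1.0, 1e-3))   # about 0.01
```

Only when the false positive rate falls to the order of the base rate itself does P(intrusion | alarm) reach useful values, which is exactly the demand stated above.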
The trends of the F-score and precision/recall for IDSs over a period of 10 years are analyzed, and a model is proposed to characterize the attack-detector behaviors and formalize the attack-detector interactions. The modeling is based on deduction rules that are used to model the capabilities of the attacker and the detector. The proposed model is validated with the empirical values. This modeling helps in enriching the understanding and in furthering the design of, and research in, IDSs.
IDS will re-mould rapidly and overcome many of the existing limitations and
hurdles. As the field grows, the attack-detection scenario will also be refined.
For more proactive defense, it is essential to understand the network defensive and offensive strategies. With the attack-detector scenario better understood, the future evolution of attacks can be estimated to a certain extent, thereby aiding better attack detection and, in turn, reducing false negatives. This knowledge helps the security community to become proactive rather than reactive with respect to incident response.
Chapter 3
Evaluation and Test-bed of Intrusion
Detection Systems
The strongest arguments prove nothing so long as the conclusions are not verified by experience. Experimental Science is the queen of sciences and the goal
of all speculation.
Roger Bacon
3.1 Introduction
The poor understanding of the performance of IDSs available in the literature may be caused in part by the shortage of an effective, unbiased evaluation and testing methodology that is both scientifically rigorous and technically feasible.
The choice of IDSs for a particular environment is a general problem, more
concisely stated as the intrusion detection evaluation problem, and its solution
usually depends on several factors. The most basic of these factors are the false
alarm rate and the detection rate, and their tradeoff can be intuitively analyzed
with the help of the Receiver Operating Characteristic (ROC) curve [43], [57],
[12], [58], [59]. However, as pointed out by earlier investigators [49], [60], [61], the information provided by the detection rate and the false alarm rate
alone might not be enough to provide a good evaluation of the performance of
an IDS. Hence, the evaluation metrics need to consider the environment the IDS
is going to operate in, such as the maintenance costs and the hostility of the operating environment (the likelihood of an attack). In an effort to provide such an
evaluation method, several performance metrics such as Bayesian detection rate
[49], expected cost [60], sensitivity [62] and intrusion detection capability [63],
have been proposed in the literature. These metrics usually assume the knowledge of some uncertain parameters like the likelihood of an attack, or the costs of false alarms and missed detections. Yet despite the fact that each of these performance metrics makes its own contribution to the analysis of IDSs, they are rarely applied in the literature when a new IDS is proposed.
In Appendix D, we review the method of evaluation and also describe the evaluation methodology used in this thesis; Appendix D also introduces some new metrics for IDS evaluation. Classification accuracy in IDSs deals with such fundamental problems as how to compare two or more IDSs, how to evaluate the performance of an IDS, and how to determine the best configuration of an IDS. In an effort to analyze and solve these related problems, evaluation metrics such as the Area Under the ROC Curve (AUC), precision, recall, and the F-score have been introduced. Additionally, we introduce the P-test [36], which is a more intuitive way of comparing two IDSs and is also more relevant to the intrusion detection evaluation problem. We also introduce a formal framework for reasoning about the performance of an IDS and the proposed metrics against adaptive adversaries.
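The exact form of the P-test is given in [36]; purely as an illustration of this style of comparison, a standard two-proportion z-test on the detection rates of two IDSs over the same attack set might look like the following sketch (the function name and significance threshold are ours):

```python
import math

# Two-proportion z-test: is the detection rate of IDS 1 significantly
# different from that of IDS 2?  det: attacks detected; n: attacks seen.
def detection_rate_z(det1, n1, det2, n2):
    p1, p2 = det1 / n1, det2 / n2
    pooled = (det1 + det2) / (n1 + n2)               # pooled proportion
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    return (p1 - p2) / se

z = detection_rate_z(90, 100, 50, 100)
print(abs(z) > 1.96)   # significant at the 5 percent level
```

A |z| above 1.96 rejects, at the 5 percent level, the hypothesis that the two IDSs detect attacks equally well.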
We provide simulations and experimental results with these metrics using real-world traffic data as well as the DARPA 1999 data set, in order to illustrate the benefits of the algorithms proposed in chapters five to eight of this thesis. The main reason for using the DARPA data set is that we need relevant data that can easily be shared with other researchers, allowing them to duplicate and improve our results. The common practice in intrusion detection of claiming good performance with real-time traffic makes it difficult to verify and improve previous research results, as the traffic is never quantified or released, owing to privacy concerns. We use both the DARPA data sets and the real-world traffic data; doing so, and being able to compare and contrast the results, should help alleviate most of the criticism against work based solely on the DARPA data, and still allow work to be directly compared. Being the only comprehensive data set that can be shared for IDS evaluation, it is reasonable to analyze its shortcomings as well as its importance and strengths for such a critical evaluation. Since this data set was made publicly available nine years back, IDSs developed after that time were taken to analyze whether the data set has become obsolete. The analysis shows that the inabilities of the IDSs far outweigh the limitations of the data set. This section is expected to give enough support to IDS researchers using the DARPA data set in their evaluations. This chapter also highlights the inability of single IDSs to make a complete coverage of the entire attack domain, clearly establishing the need for multiple and heterogeneous IDSs for wide coverage of present-day attacks.
usefulness of the DARPA data set is included in Section 3.3. Some of the publicly available data sets [66] have been investigated, but they are not entirely suitable for the analysis, mainly due to the absence of the application payload. Four IDSs are considered in this study: two anomaly detectors, PHAD [67] and ALAD [68], which give an extremely low false alarm rate of the order of 0.00002; the popularly used open-source IDS Snort [69]; and the commercially accepted Cisco IDS 4215 [70].
To improve the performance of the IDSs PHAD and ALAD, more data has been incorporated in their training. Normal data was collected from a secured University internal network and randomly divided into two parts. PHAD is trained on week three of the data set and one portion of the internal network traffic data, and ALAD is trained on week one of the data set and the other portion. Hence the two anomaly-based IDSs PHAD and ALAD are trained on disjoint sets of training data. The correlation among the classifiers is lowered by factors such as more training data, the disjointness of that data, and more training time.
background traffic was too simple in the DARPA data set, and that if real background traffic were used, the false positive rate would be much higher. Mahoney and Chan [65] comment on the irregularities in the data, like the obvious difference in the TTL value for the attack and the normal packets, which makes even a trivial detector show an appreciable detection rate. They have conducted an evaluation of anomaly-based network IDSs with an enhanced version of the DARPA data set created by injecting benign traffic from a single host. All the above criticisms are well-researched comments, and these works have made it clear that several issues remain unsolved in the design and modeling of the resultant data set. However, we cannot agree with the comment made by Pickering [72] that benchmarking, testing and evaluating with the DARPA data set is useless unless serious breakthroughs are made in machine learning.
The DARPA data set has the drawback that it was not recorded on a network
connected to the Internet. Internet traffic usually contains a fairly large amount
of anomalous traffic that is not caused by any malicious behavior [73]. Hence
the DARPA data set being recorded in a network isolated from the Internet
might not include these types of anomalies. The unsolved problems clearly remain. However, in the absence of better benchmarks, a vast amount of the research is based on experiments performed on the DARPA data set. The observation that, even with all the criticisms, the DARPA data set is still rigorously used by the research community for the evaluation of IDSs brings to the fore the motivation for this section.
3.3.2 Facts in support of the DARPA IDS evaluation data set
A data set that is seen to be used for IDS evaluation other than the DARPA data set is the Defcon Capture The Flag (CTF) data set. Defcon is a yearly hacker competition and convention. However, this data set has several properties that make it very different from real-world network traffic. The differences include an extremely high volume of attack traffic, the absence of background traffic, and the availability of a very small number of IP addresses. The non-availability of any other data set that includes the complete network traffic was probably the initial reason for researchers in intrusion detection to make use of the DARPA data set for evaluation. Also, the experience while trying to work with real data
traffic was not good, the main reason being the lack of information regarding the status of the traffic. Even with intense analysis, the prediction can never be 100 percent accurate, because of the stealthiness and sophistication of the attacks and the unpredictability of the non-malicious user. It involves high cost if an attempt is made to properly label the network connections in raw data. Hence most of the research work that used real network data was not able to report the detection rate or other evaluation metrics for comparison purposes. Mahoney and Chan [65] comment that if an advanced IDS could not perform well on the DARPA data set, it could also not perform acceptably on realistic data. Hence, before thinking of junking the DARPA data set, it is wise to see whether the state-of-the-art IDSs perform well, in the sense that they detect all the attacks of the DARPA data set.
With the general impression that the data set used was old and hence not appropriate for IDS evaluation, the poor performance of some of the evaluated IDSs was expected and hence acceptable. Assuming that the data set is not generalized, and counting that as a drawback of the data set, fine-tuning of the IDSs to the data set was considered. Snort has a main configuration file that allows one to add and remove preprocessor requirements as well as the included rules files. The limit of fragmentation to be taken notice of and the requirement of packet reconstruction are typically specified in this file. Snort can be customized to perform better in certain situations involving the DARPA data set by improving the Snort rule-set. Thus, we tried to manipulate the benchmark system.
3.3.3 Results and discussion
Test setup
The test setup for the experimental evaluation consisted of three Pentium machines with the Linux operating system. The experiments were conducted with the simulated IDSs Snort version 2.3.4, PHAD, and ALAD, and also the Cisco IDS 4215, distributed across a single subnet observing the same domain. This collection of heterogeneous IDSs was used to examine how the different IDSs perform in detecting the attacks of the DARPA 1999 data set.
Experimental evaluation
The IDS Snort was evaluated with the DARPA 1999 data set, and the results are shown in Table 3.2. It can be noted in Table 3.2 that some of the attacks of a certain attack type may get detected whereas some other attacks of the same attack type may not. Hence some of the attack types appear in both rows of Table 3.2. The performance of PHAD and ALAD on the same
Table 3.2: Attacks detected by Snort from the DARPA 1999 data set

Attacks detected by Snort: teardrop, dosnuke, portsweep, sshtrojan, sechole, ftpwrite, yaga, phf, netcat, land, satan, nc-setup, imap, nc-breakin, ncftp, guessftp, tcpreset, secret, selfping, dosnuke, crashiis, sqlattack, ntinfoscan, neptune, httptunnel, udpstorm, ls, xlock, xsnoop, named, loadmodule, ppmacro

Attacks not detected by Snort: ps, portsweep, crashiis, sendmail, netcat, nfsdos, sshtrojan, ftpwrite, back, guesspop, xsnoop, pod, snmpget, eject, dict, guesstelnet, syslogd, guestftp, netbus, crashiis, secret, smurf, httptunnel, loadmod, ps, ntfsdos, arppoison, sqlattack, sechole, mailbomb, secret, queso, processtable, sqlattack, fdformat, apache2, warez, arppoison, ffbconfig, named, casesen, land, xterm
data set are given in Tables 3.3 and 3.4 respectively. The duplication in both the
Table 3.3: Attacks detected by PHAD from the DARPA 1999 data set

Attacks detected by PHAD:
fdformat, teardrop, dosnuke, portsweep, phf, land, satan, neptune

Attacks not detected by PHAD:
loadmodule, anypw, casesen, ffbconfig, eject, ntfsdos, perl, ps, sechole, sqlattack, sendmail, nfsdos, sshtrojan, xlock, guesspop, xsnoop, snmpget, guesstelnet, guestftp, netbus, crashiis, secret, smurf, httptunnel, loadmod, arppoison, land, mailbomb, processtable, ppmacro, fdformat, warez, arppoison, named
rows, as appears in Table 3.2, is avoided in the rest of the tables to the extent possible by placing each entry according to the majority of detections or misses for a certain attack type. The attacks detected by the Cisco 4215 IDS are given in Table 3.5.
Table 3.4: Attacks detected by ALAD from the DARPA 1999 data set

Attacks detected by ALAD:
casesen, eject, fdformat, ffbconfig, sechole, xterm, yaga, phf, ncftp, guessftp, crashiis, ps

Attacks not detected by ALAD:
loadmodule, anypw, nfsdos, perl, sqlattack, sendmail, sshtrojan, xlock, guesspop, xsnoop, snmpget, netbus, secret, smurf, httptunnel, loadmodule, arppoison, sqlattack, sechole, land, mailbomb, processtable, sqlattack, ppmacro, warez, arppoison, named
Table 3.5: Attacks detected by Cisco IDS from the DARPA 1999 data set

Attacks detected by Cisco IDS:
portsweep, land, crashiis, ppmacro, mailbomb, netbus, sechole, sshtrojan, imap, phf

Attacks not detected by Cisco IDS:
teardrop, dosnuke, ps, ftpwrite, yaga, sendmail, nfsdos, xlock, guesspop, xsnoop, snmpget, guesstelnet, guestftp, secret, smurf, httptunnel, loadmod, ps, ntfsdos, arppoison, sqlattack, processtable, sqlattack, fdformat, warez, arppoison, named, satan, nc-setup, nc-breakin, ncftp
Discussion
of the time gap within 10 seconds between the two. However, Table 3.2 shows that the DARPA 1999 data set does in fact model attacks that Snort has trouble detecting, or that Snort's signature database is still not updated with those signatures. Isn't it reasonable to think that the attacks for which signatures are not available to an IDS like Snort, whose rule set is regularly updated, are the ones that still go undetected? The attackers are also vigilant of the detection trend, and hence can't we think that some of the latest attacks are variants of those undetected attacks, since those attacks were successful in terms of detection avoidance? Or can't we say that if an IDS is capable of detecting those attacks in addition to the ones detected by Snort, it is a better-performing IDS than Snort? Or is it reasonable to think of changing the testbed when the IDS is suboptimal in performance on that testbed?
In a study made by Sommers et al. [74], after comparing the two IDSs Snort and Bro, they comment that Snort's drop rates seem to degrade less intensely with volume for the DARPA data set. They have also concluded in the paper that Snort's signature set has been tuned to detect DARPA attacks. Even then, if we cannot detect all the attacks of this nine-year-old data set, it clearly shows the inability of a signature-based IDS to reproduce signatures for all the attacks available in the data set. This shows the inability of the IDSs rather than a deficiency of the data set.
Preprocessing of the DARPA data set is required before applying it to any machine learning algorithm. With the anomaly-based IDSs PHAD and ALAD, we tried to train them by mixing normal data from an isolated network with week 1 and week 3 of the training data set respectively. Even then, the algorithms produce less than 50% detection and around 100 false alarms for the entire DARPA test data set. Again, there are enough reasons to attribute this to a failure on the part of the learning algorithms.
The usual reasoning for the poor performance of the anomaly detectors is that the training and the test data are not correlated; but that happens in real-world network traffic as well. The normal user behavior changes drastically from what the algorithm has been trained with, and hence we expect the machine learning algorithms to be sophisticated enough to learn the changing behavior. Hence, the uncorrelated testbed is good for evaluating the performance of learning algorithms. Then again, it is a failure on the part of the learning algorithms rather than the data set if the anomaly detectors perform poorly. Hence it can be concluded that the DARPA data set, even though old, still carries a lot of novelty and sophistication in its attacks.
The Cisco IDS is a network-based intrusion detection system that uses a signature database to trigger intrusion alarms. Like any other network IDS, the Cisco IDS has only a local view. This feature-gap is pointed out indirectly in [75]: "...does not operate properly in an asymmetrically routed environment."
Thus, the main reasons for the poor performance of the IDSs with the DARPA 1999 IDS evaluation data set are the following:

- The training and test data sets are not correlated for R2L and U2R attacks, and hence most of the pattern recognition and machine learning algorithms, except for the anomaly detectors that learn only from the normal data, will perform badly while detecting the R2L and the U2R attacks.

- The normal traffic in real networks and the normal traffic in the data set are not correlated, and hence the trainable algorithms are expected to generate a lot of false alarms.

- None of the network-based systems did very well against host-based, U2R attacks [76].

- The DoS and the R2L attacks have a very low variance and are hence difficult to detect with a unique signature by a signature-based IDS, or to observe as an anomaly by an anomaly detector [14].

- Several of the surveillance attacks probe the network and retrieve significant information, and they go undetected by limiting the speed and scope of the probes [76].

- The data set provides a large sample of computer attacks embedded in normal background traffic; several realistic intrusion scenarios are conducted in the midst of normal background data.
- Many threats, and thereby the exploits available on computer systems and networks, are undefined and open-ended.
The above limitations have to be overcome by sophisticated detection techniques for an improved and acceptable IDS performance. We have also seen that Snort performs exceptionally well in detecting the U2R and DoS attacks, PHAD performs well in detecting the probes, and ALAD performs well in detecting the R2L attacks. This clearly shows that each IDS is designed to focus on a limited region of the attack domain rather than the entire attack domain. Hence IDSs are limited in their performance at the design stage itself.
On analyzing certain IDS alerts, the doubt arises as to whether it is justifiable to say that the IDS detects a particular attack. Consider, for instance, an attacker executing the command $./exploit. In the real data set, especially for a per-packet model, this gets translated to many packets, with the first packet containing '$', the second packet containing '.', the third packet containing '/', and the fourth packet containing 'e'. Is it justifiable to say that the IDS detects the particular attack when the IDS detects the fourth packet as anomalous? It depends on the implementation of the IDS. Some IDSs buffer the data before matching it against the stored patterns. In that case, the IDS is able to see the whole string '$./exploit' and hence detects the anomaly. An IDS that analyzes on a per-packet basis is able to find some anomalous pattern in one packet before the connection is terminated and then flags it as an anomalous connection. If the aim is to find intrusive connections, then any packet corresponding to an intrusive connection being detected as malicious should be good enough.
The primary reason for choosing the IDSs PHAD and ALAD was the requirement of acceptability in terms of the number of false alerts, which should not overload a system analyst. The other reason for the choice of PHAD and ALAD was that most of the existing IDS algorithms neglect the minority attack types, R2L and U2R, in comparison to the majority attack types, probes and DoS. ALAD is highly successful in detecting these rare attack types. Also, Snort detects the U2R/Data attacks exceptionally well. All the above IDSs are average in terms of detection performance. Hence an attempt was made to improve the performance of the individual IDSs.
3.4.1 Snort: Improvements by adding new rules
Snort has been identified to have a lot of rules that are named differently from those in the DARPA 99 data set. For example, the land attack, which comes under the DoS attack class, is found in the bad-traffic rules folder of Snort and not in the DoS rules. The attack warezclient, which downloads illegal copies of software, has been identified by Snort with a rule that looks for executable code on the FTP port. Also, many of the rules are very generic and hence the chances of false positives were very high. However, it was identified that it requires tremendous effort to modify those generic rules, and we have succeeded only to a very small extent. We seek a higher recall objective in this first phase, and the fusion is expected to reduce the false alarms to some extent. Snort rules were modified for DoS attacks like land, dosnuke and selfping. When incorporating the rules, care has been taken not to overfit and not to make them very generic; this avoids false negatives and false positives to the maximum extent possible. For example, when the signature is connection type = ftp, misclassification should not happen, because such a connection can also be due to a DoS flooding attempt. Hence, the R2L rule has to be refined to check for the absence of DoS attacks [110]. The rules may thus incorporate more conditions for refinement, thereby avoiding misclassification and hence the misclassification cost. This has increased Snort's detection of DoS attacks.
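As an illustration of the kind of rule involved, a land attack is characterized by a TCP SYN packet whose source and destination addresses are identical. A minimal sketch of such a rule, in standard Snort rule syntax, could look as follows (the message text and sid value are assumptions for the example, not the rules actually deployed in this work):

```text
alert tcp $EXTERNAL_NET any -> $HOME_NET any (msg:"DOS land attack attempt"; flags:S; sameip; classtype:attempted-dos; sid:1000001; rev:1;)
```

The `sameip` option matches packets whose source and destination IP addresses are equal, and `flags:S` restricts the rule to SYN packets, which keeps the rule specific enough to limit false positives.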
3.4.2 PHAD/ALAD
PHAD was highly reliable in detecting all the probes except for the stealthy slow scans which have been included in the DARPA 99 data set. The stealthy probes which PHAD missed are ipsweep, lsdomain, portsweep and resetscan. However, Snort was effective in identifying those stealthy ones by waiting for longer than one minute between successive network transmissions. PHAD has the disadvantage that it classifies attacks based on a single packet. We have improved PHAD by examining a session and detecting the anomalies in the connection rather than only at the packet level. A connection (record) is a sequence of TCP packets starting and ending at some well-defined times, between which data flows from the source IP address to the target IP address under some well-defined protocol.
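The grouping of per-packet events into connection records can be sketched as follows. This is an illustrative sketch, not the thesis implementation; the packet tuple layout and field names are assumptions made for the example.

```python
from collections import defaultdict

def build_connections(packets):
    """Group packets into connection records keyed by the TCP 4-tuple,
    so anomalies can be scored per connection rather than per packet.

    Each packet is assumed to be a tuple:
    (timestamp, src_ip, src_port, dst_ip, dst_port).
    """
    conns = defaultdict(lambda: {"start": None, "end": None, "packets": 0})
    for ts, src, sport, dst, dport in packets:
        rec = conns[(src, sport, dst, dport)]
        # Track the well-defined start and end times of the connection.
        rec["start"] = ts if rec["start"] is None else min(rec["start"], ts)
        rec["end"] = ts if rec["end"] is None else max(rec["end"], ts)
        rec["packets"] += 1
    return dict(conns)
```

A session-level detector can then compute features over each record (duration, packet count, and so on) instead of flagging isolated packets.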
The detection performance of the anomaly detectors PHAD and ALAD can be improved further by training them on additional normal traffic other than the traffic of weeks one and three of the DARPA 1999 data set. To improve the performance of the IDSs PHAD and ALAD, more data has been incorporated in their training. Normal data was collected from a University internal network and randomly divided into two parts. PHAD was trained on week three of the data set and one portion of the internal network traffic data, and ALAD was trained on week one of the data set and the other portion of the internal network traffic data. Hence, the two anomaly-based IDSs PHAD and ALAD are trained on disjoint sets of the training data. The correlation between the classifiers is lowered by incorporating more training data, and disjoint data at that. The disjoint data sets given to PHAD and ALAD for training have also helped to an extent in feature selection and in reducing the correlation between the two IDSs. Both PHAD and ALAD look into almost disjoint features of the traffic. PHAD detects anomalies based on the intrinsic features of the TCP, UDP, IP, ICMP and Ethernet headers. ALAD detects anomalies based on almost disjoint features of the traffic by looking at the inbound TCP stream connections to well-known server ports.
There are a number of DoS as well as R2L attacks that are difficult to detect since they exploit a large number of different network or system services. There is no regular pattern for such attacks for detection by misuse detection systems. The anomaly detection systems are also unable to detect them, since they may look like normal traffic because of the attacker evading through some trusted hosts and using them for an attack. These attacks are highly sophisticated and need a thorough analysis by a specialized detector. In addition, there is an observable class imbalance in the intrusion results, due to DoS having more connections than any other attack. Most of the IDSs will try to minimize the overall error rate, but this leads to an increase in the error rate of the rare classes. Hence, more effort should be made to improve the detection rate of the rare classes.
This section has highlighted that even with an effort to improve the available IDSs PHAD, ALAD and Snort, these IDSs still remain suboptimal, with detection rates of less than 50%.
3.5 Summary
The whole world has a growing interest in network security. DARPA's sponsorship, the AFRL's evaluation and the MIT Lincoln Laboratory's support in security tools have resulted in a world-class IDS evaluation setup that can be considered ground-breaking intrusion detection research. The DARPA evaluation data set has the required potential for modeling the attacks that are commonly found in network traffic. Hence we conclude by commenting that it can be used to evaluate IDSs in the present scenario, even though any effort to make the data set more real, and therefore fairer for IDS evaluation, is to be welcomed. If a system is evaluated on the DARPA data set, it cannot claim anything more in terms of its performance on real network traffic. Hence this data set can be considered as the baseline of any such research.
In an effort to analyze and solve the IDS evaluation problems, evaluation metrics such as the Area Under the ROC Curve, precision, recall, and F-score have been introduced in Appendix D. Additionally, the P-test, which is a more intuitive way of comparing two IDSs and also more relevant to the intrusion detection evaluation problem, has been included in Appendix D. The metrics used for IDS evaluation, like the F-score and the P-test, are highly effective for a fair comparison of IDSs.
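The precision, recall and F-score metrics referred to above follow the standard definitions; a minimal sketch (not the Appendix D implementation) in terms of true positives, false positives and false negatives is:

```python
def precision_recall_f1(tp, fp, fn):
    """Standard IDS evaluation metrics from alert counts.

    tp: attacks correctly flagged, fp: false alarms,
    fn: attacks missed. Assumes tp + fp > 0 and tp + fn > 0.
    """
    precision = tp / (tp + fp)          # fraction of alerts that are real
    recall = tp / (tp + fn)             # detection rate
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1
```

For example, 8 detected attacks with 2 false alarms and 2 misses give precision, recall and F-score all equal to 0.8.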
Chapter 4
Mathematical Basis for Sensor Fusion
Mathematics possesses not only truth, but supreme beauty - a beauty cold and
austere like that of sculpture, and capable of stern perfection, such as only great
art can show.
Bertrand Russell
4.1 Introduction
Chapter two and chapter three established the issues as well as limitations of a
single IDS respectively. Sensor fusion was identified as a viable solution for enhancing the performance of IDSs. The primary objective of the proposed thesis
is to develop a theoretical and practical basis for enhancing the performance of
intrusion detection systems using advances in sensor fusion with easily available IDSs. This chapter introduces the mathematical basis for sensor fusion in
order to provide enough support for the acceptability of sensor fusion in intrusion detection applications. Clearly, sensor fusion for performance enhancement of IDSs requires very complex observations, combinations of decisions
and inferences via scenarios and models. The basic problem involves selecting
IDSs and choosing the appropriate sensor fusion algorithms that provide sufficient enhancement in the performance of the fused IDS. Although, fusion in the
context of enhancing the intrusion detection performance has been discussed
earlier in literature, there is still a lack of theoretical analysis and understanding, particularly with respect to correlation of detector decisions. In this chapter,
we formulate the problem of fusion of multiple heterogeneous IDSs and examine whether the improvement in performance could be achieved through sensor
fusion. This chapter describes the central concept underlying the work and a
theme that ties together all the arguments in this work. It provides an answer to
the questions posed in the introduction at a conceptual level.
With a precise understanding as to why, when, and how particular sensor fusion methods can be applied successfully, progress can be made towards a powerful new tool for intrusion detection: the ability to automatically exploit the strengths and weaknesses of different IDSs. The theoretical model is developed initially without any knowledge of the available detectors or the monitoring data. The empirical evaluation that augments the mathematical analysis is illustrated in chapters five to eight using two data sets: 1) real-world network traffic and 2) the DARPA 1999 data set. The results in those chapters confirm the analytical findings of this chapter.
This chapter is organized as follows: Section 4.2 discusses the sensor fusion algorithms. Sections 4.3 and 4.4 provide surveys of the related work in sensor fusion and of sensor fusion in intrusion detection applications respectively. Section 4.5 includes the theoretical formulation, and section 4.6 includes the solution approaches for intrusion detection applications using sensor fusion. The chapter is summarized in section 4.7.
Statistical approaches are mostly based on modeling the data through its statistical properties and using this information to estimate whether a test sample comes from the same distribution or not. The simplest approach is to construct a density function for data of a known class and then, assuming the data is normal, compute the probability of a test sample belonging to that class. The probability estimate can be thresholded to signal an intrusion.
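The density-thresholding idea just described can be sketched as follows, here with a one-dimensional Gaussian fitted to attack-free training data (the feature, values and threshold are assumptions for the example; a real detector would use many features and a carefully chosen threshold):

```python
import math

def fit_gaussian(samples):
    """Fit mean and (non-degenerate) variance to normal-traffic samples."""
    n = len(samples)
    mu = sum(samples) / n
    var = sum((x - mu) ** 2 for x in samples) / n
    return mu, var

def is_intrusion(x, mu, var, threshold):
    """Flag x as anomalous if its Gaussian likelihood under the
    normal-traffic model falls below the threshold."""
    likelihood = math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)
    return likelihood < threshold
```

A sample far from the training distribution receives a near-zero likelihood and is flagged, while a typical sample passes.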
Two main approaches exist to the estimation of the probability density function,
Some issues for sensor fusion, such as the ability to generalize, the computational expense during training and the further expense when they need to be retrained, are more critical for neural networks than for statistical methods. A subjective view supports the use of neural networks for sensor fusion in order to achieve novelty detection in intrusion detection applications. Neural networks gain experience by training the system to correctly identify preselected examples of the problem. The back-propagation algorithm can be used in the learning phase to adapt the weights of the neural network. The computational complexity of neural networks has always been an important consideration for practical applications. One important consideration with neural networks is that they cannot be retrained as easily as statistical models. Retraining is done when new class data is to be added to the training set or when the training data no longer reflects the environmental conditions.
Support Vector Machines
Bayesian classifiers
Decision trees (D-trees) dominate SVMs, which in turn dominate NB, in both the precision and recall values. However, D-trees show a much larger fluctuation in accuracy in the initial stages. This is to be expected, because decision trees are known to be unstable classifiers. SVMs are better in the initial stages of active learning, when the training data is small, but they lose out later. SVMs are known to excel in accuracy, but the uncertainty value, measured as the distance from the SVM separator, is perhaps not too meaningful. D-trees turn out to be better in the combined metric.
An intuitive method for measuring uncertainty for separator-based classifiers like SVMs is to make it inversely proportional to the distance of the instance from the separator. Similarly, for Bayesian classifiers, the posterior probabilities of the classes can be used as an estimate of certainty. For decision trees, uncertainty is typically derived from the error of the leaf into which the instance falls.
NB tree
The complementary behavior of NB and the D-trees has given rise to their hybrid, which outperforms most of the earlier methods for the intrusion detection application.
4.2.2 Evidence Theory
The Dempster-Shafer (DS) method is a powerful tool that can deal with subjective hypotheses for evidence as well as with statistical data combination. The DS method does not have the Bayesian requirement that the sensor set be predefined and the sensors' joint observation probability distribution be known beforehand. The DS rule corresponds to a conjunction operator: it builds the belief induced by accepting two pieces of evidence, i.e., by accepting their conjunction. Shafer developed the DS theory of evidence based on the model that all the hypotheses in the frame of discernment (FoD) are exclusive and that the frame is exhaustive, which is true for the decisions of multiple IDSs that are to be fused.
The DS method infers the true state of the system without having an explicit model of the system. It is based only on observations that can be considered as hints (with some uncertainty) towards some system states. DS theory makes the distinction between uncertainty and ignorance, so it is a very useful way to reason with uncertainty based on incomplete and possibly contradictory information extracted from a stochastic environment.
4.2.3 Kalman filter
The Kalman filter is an efficient recursive filter that estimates the state of a dynamic system from a series of noisy measurements. It is a linear system in which the mean squared error between the desired output and the actual output is minimized when the input is a random signal generated by white noise.
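The recursive predict-update cycle can be sketched in one dimension as follows; the process noise q, measurement noise r and initial values are illustrative assumptions, and the state is assumed constant between measurements:

```python
def kalman_1d(measurements, q=1e-4, r=0.5, x0=0.0, p0=1.0):
    """One-dimensional Kalman filter estimating a (nearly) constant state.

    q: process-noise variance, r: measurement-noise variance,
    x0/p0: initial state estimate and its variance.
    """
    x, p = x0, p0
    for z in measurements:
        p = p + q                 # predict: state assumed constant, noise grows
        k = p / (p + r)           # Kalman gain: trust in the new measurement
        x = x + k * (z - x)       # update estimate with the innovation
        p = (1 - k) * p           # update the estimate's variance
    return x
```

Fed a stream of noisy readings of a constant quantity, the estimate converges towards the underlying value while the gain shrinks.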
4.2.4 Bayesian network
A Bayesian network, or belief network, is a probabilistic graphical model that represents a set of variables and their probabilistic independencies. Formally, Bayesian networks are directed acyclic graphs whose nodes represent variables, and whose missing edges encode conditional independencies between the variables. Nodes can represent any kind of variable. Efficient algorithms exist that perform inference and learning in Bayesian networks.
method adopted by Chair and Varshney [35] for solving the data fusion problem for fixed binary local detectors with statistically independent decisions. Kam et al. [92] use the Bahadur-Lazarsfeld expansion of the probability density functions. Blum and Kassam [93] study the problem of locally most powerful detection for correlated local decisions. The approach for an optimal data fusion for individual decisions that are correlated is given in terms of the conditional correlation coefficients of all orders.
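For the independent-decision case of Chair and Varshney [35], the optimal fusion rule sums per-detector log-likelihood ratios implied by each local decision. A minimal sketch (the detection and false-alarm probabilities here are illustrative assumptions, and equal priors are assumed so the threshold is zero):

```python
import math

def chair_varshney(decisions, pd, pf, threshold=0.0):
    """Optimal fusion of independent binary detector decisions.

    decisions[i]: 1 if detector i flagged an attack, else 0.
    pd[i]/pf[i]: detection and false-alarm probabilities of detector i
    (assumed 0 < pf[i] < pd[i] < 1).
    """
    llr = 0.0
    for u, d, f in zip(decisions, pd, pf):
        if u == 1:
            llr += math.log(d / f)              # detector said "attack"
        else:
            llr += math.log((1 - d) / (1 - f))  # detector said "normal"
    return 1 if llr > threshold else 0
```

A reliable detector (high pd, low pf) thus carries more weight in the fused decision than a weak one.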
evaluation, the ratio of ingoing to outgoing traffic and the service rate are selected as the detection metrics, and prior knowledge of the DDoS domain is proposed for assigning probability to the evidence.
Siraj et al. [98] discuss a Decision Engine for an Intelligent Intrusion Detection System (IIDS) that fuses information from different intrusion detection sensors using an artificial intelligence technique. The Decision Engine uses Fuzzy Cognitive Maps (FCMs) and fuzzy rule-bases for causal knowledge acquisition and to support the causal knowledge reasoning process. Thomopoulos, in one of his works [88], concludes that, with the individual sensors being independent, the optimal decision scheme that maximizes the probability of detection at the fusion unit for a fixed false alarm probability consists of a Neyman-Pearson test at the fusion unit and likelihood ratio tests at the sensors. Lee et al. [51] note that the best way to make intrusion detection models adaptive is by combining existing models with new models trained on new intrusion data or new normal data. In that work, they combined rule sets that were inductively generated on separate days to produce a more accurate composite rule set.
Other, albeit more distantly related, works are the alarm clustering method by Perdisci et al. [99], the aggregation of alerts by Valdes et al. [100], the combination of alerts into scenarios by Dain et al. [101], the alert correlation by Cuppens et al. [102], the correlation of intrusion symptoms with an application of chronicles by Morin et al. [103], and the aggregation and correlation of intrusion-detection alerts by Debar et al. [104]. The correlation of alerts works mainly by grouping alerts that are part of the same attack trend, and hence completely avoids duplicate alerts. The aggregation of alerts is based on certain criteria to aggregate severity levels, reveal trends, and clarify attackers' intentions. The work of Valeur et al. [105] presents a general correlation model that includes a comprehensive set of components and a framework based on this model. These works address the issue of efficiently managing the large number of alerts by providing a unified description of the alerts from individual IDSs.
Considering the literature on the various sensor fusion techniques used for intrusion detection applications, it is seen that many machine learning algorithms do not handle skewed data sets well. To counter the effect of data skewness, either downsampling of the normal events or upsampling of the attack events is normally done. Sampling the normal set might reduce the information content and present only a subset of all available normal events, in turn leading to false positives being reported by the system. This again establishes that sensor fusion is the only promising approach for the performance enhancement of IDSs. The mathematical basis of sensor fusion is developed in the remaining sections of this chapter.
minimize the overall cost of misclassification and improve the overall detection
rate. The fundamental problem of network intrusion detection can be viewed as follows: each of the $n$ IDSs observes the same input $x$ and produces a decision $s_i$,

$s_i = IDS_i(x), \quad i = 1, \ldots, n \qquad (4.3)$

[Figure 4.1: Fusion architecture with decisions from n IDSs. The input x feeds the detectors IDS1 through IDSn, whose decisions s1 through sn enter the fusion unit, which produces the output y.]

and the fusion unit maps the decisions to a single output $y$:
$y = f(s_1, \ldots, s_n) \qquad (4.4)$

where $f(\cdot)$ corresponds to the fusion function. The variables $s_1, \ldots, s_n$ (assumed independent, i.e., information about any group of the variables does not change the belief about the others) are imprecise and depend on the class of the observation, and hence are given as:

$s^j = f(s^j_1, \ldots, s^j_n) \qquad (4.5)$

where $j$ refers to the class of the observation.
The variance of the IDSs determines the average quality when each IDS acts individually; lower variance corresponds to better performance. The covariance among the detectors measures their dependence: the more the dependence, the smaller the gain from fusion. Let us consider two cases here. In the first case, for each access, $n$ responses are available and are used independently of each other. The average of the variances of $s^j_i$ over all $i = 1, 2, \ldots, n$, denoted $(\sigma^j_{av})^2$, is given as:

$(\sigma^j_{av})^2 = \frac{1}{n} \sum_{i=1}^{n} (\sigma^j_i)^2 \qquad (4.6)$
In the second case, all $n$ responses are used together and are combined using the mean operator; the variance over many accesses, denoted $(\sigma^j_{fusion})^2$, is called the variance of the average and can be calculated as follows:

$(\sigma^j_{fusion})^2 = \frac{1}{n^2} \sum_{i=1}^{n} (\sigma^j_i)^2 + \frac{1}{n^2} \sum_{i=1}^{n} \sum_{k=1, k \neq i}^{n} \rho^j_{i,k} \sigma^j_i \sigma^j_k = \frac{1}{n} (\sigma^j_{av})^2 + \frac{1}{n^2} \sum_{i=1}^{n} \sum_{k=1, k \neq i}^{n} \rho^j_{i,k} \sigma^j_i \sigma^j_k \qquad (4.7)$

where $\rho^j_{i,k}$ is the correlation coefficient between the $i$th and $k$th detectors for each class value $j$. The first term is the average variance of the base experts, while the second term is the covariance between the $i$th and $k$th detectors.
For two detectors combined by the mean operator, since $0 \le \rho^j_{1,2} \le 1$,

$(\sigma^j_{fusion})^2 = \frac{1}{4}\left[(\sigma^j_1)^2 + (\sigma^j_2)^2 + 2\rho^j_{1,2}\sigma^j_1\sigma^j_2\right] \le (\sigma^j_{av})^2 \qquad (4.8)$

It can be observed that the resultant variance of the final score is reduced with respect to the average variance of the two original scores when two detector scores are merged by a simple mean operator. More generally, since $0 \le \rho^j_{m,n} \le 1$,

$\frac{1}{n}(\sigma^j_{av})^2 \le (\sigma^j_{fusion})^2 \qquad (4.9)$

The two equations 4.8 and 4.9 give the upper and lower bounds of $(\sigma^j_{fusion})^2$, attained with full correlation and with no correlation respectively. Any positive correlation results in a variance between these bounds. Hence, by combining responses using the mean operator, the resultant variance is assured to be smaller than the average (not the minimum) variance. Fusion of the scores reduces the variance, which in turn results in a reduction of error (with respect to the case where the scores are used separately). To measure explicitly the factor of reduction in variance, note that

$\frac{1}{n}(\sigma^j_{av})^2 \le (\sigma^j_{fusion})^2 \le (\sigma^j_{av})^2$

so the factor of reduction in variance is

$v_r = \frac{(\sigma^j_{av})^2}{(\sigma^j_{fusion})^2}, \qquad 1 \le v_r \le n \qquad (4.10)$
This clearly indicates that the reduction in variance is greater when more detectors are used; i.e., the larger $n$, the better the combined system, even if the hypotheses of the underlying IDSs are correlated. This comes at a cost of increased computation, proportional to the value of $n$. The reduction in the variance of the individual classes results in less overlap between the class distributions. Thus the chance of error reduces, which in turn results in improved detection. This forms the argument in this work for why fusion using multiple detectors works for the intrusion detection application. Experimental results provide strong evidence to support this claim.
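The variance-reduction factor of equations 4.6, 4.7 and 4.10 can be illustrated numerically. The following sketch assumes, for simplicity, a single common pairwise correlation rho between all detectors:

```python
def variance_reduction(sigmas, rho):
    """Return v_r = (sigma_av)^2 / (sigma_fusion)^2 for mean-operator
    fusion of len(sigmas) detectors with common pairwise correlation rho."""
    n = len(sigmas)
    var_av = sum(s ** 2 for s in sigmas) / n                  # eq. 4.6
    cov = sum(rho * sigmas[i] * sigmas[k]
              for i in range(n) for k in range(n) if i != k)  # off-diagonal terms
    var_fusion = var_av / n + cov / n ** 2                    # eq. 4.7
    return var_av / var_fusion                                # eq. 4.10
```

With four uncorrelated detectors of equal variance the factor is the full $n = 4$, while with fully correlated detectors it collapses to 1, matching the bounds of equation 4.10.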
The following common possibilities encountered when combining two detectors are analyzed:
fusion will definitely lead to better performance. On the other hand, for the fourth case, where $\rho \rightarrow 1$, fusion may not necessarily lead to better performance. From the above analysis using the mean operator as fusion, the conclusions drawn are the following:
The analysis explains and shows that fusing two systems of different performances is not always beneficial. The theoretical analysis shows that if the weaker IDS has a (class-dependent) variance three times larger than the variance of the best IDS, the gain due to fusion breaks down. This is even more true for correlated base experts, as correlation penalizes this limit further. It is also seen that fusing two uncorrelated IDSs of similar performance always results in improved performance. Finally, fusing two correlated IDSs of similar performance will be beneficial only when the covariance of the two IDSs is less than the variance of the IDSs. It is necessary to show that a lower bound on accuracy results in the case of sensor fusion. This can be proved as below:
Given the fused output as $s = \sum_i w_i s_i$, the quadratic error of a sensor indexed $i$, $e_i$, and of the fused sensor, $e_{fusion}$, are given by:

$e_i = (s_i - c)^2 \qquad (4.11)$

and

$e_{fusion} = (s - c)^2 \qquad (4.12)$

respectively, where $w_i$ is the weighting on the $i$th detector and $c$ is the target. The ambiguity of the sensor is defined as:

$a_i = (s_i - s)^2 \qquad (4.13)$

The squared error of the fused sensor is seen to equal the weighted average squared error of the individuals, minus a term which measures the average correlation. This allows for non-uniform weights (with the constraint $\sum_i w_i = 1$), so the general form of the ensemble output is $s = \sum_i w_i s_i$. The ambiguity of the fused sensor is given as:
$a_{fusion} = \sum_i w_i a_i = \sum_i w_i (s_i - s)^2 = \sum_i w_i (s_i - c + c - s)^2 = \sum_i w_i ((s_i - c) - (s - c))^2 = \sum_i w_i (s_i - c)^2 - 2(s - c)\sum_i w_i (s_i - c) + (s - c)^2 = \sum_i w_i e_i - e_{fusion} \qquad (4.14)$

since $\sum_i w_i (s_i - c) = s - c$.
On rearranging equation 4.14, the error due to the combination of several detectors is obtained as the difference between the weighted average error of the individual detectors and the ambiguity among the fusion member decisions:

$e_{fusion} = \sum_i w_i (s_i - c)^2 - \sum_i w_i (s_i - s)^2 \qquad (4.15)$
The ambiguity among the fusion member decisions is always positive, and hence the combination of several detectors is expected to be better than the average over the individual detectors. This result turns out to be very important for the focus of this work.
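The decomposition of equation 4.15 can be checked numerically for any scores, weights summing to one, and target; a minimal sketch:

```python
def decompose(scores, weights, c):
    """Return (e_fusion, weighted_avg_error - ambiguity); by eq. 4.15
    the two quantities are equal for any inputs with sum(weights) == 1."""
    s = sum(w * si for w, si in zip(weights, scores))   # fused output
    e_fusion = (s - c) ** 2                             # eq. 4.12
    avg_err = sum(w * (si - c) ** 2 for w, si in zip(weights, scores))
    ambiguity = sum(w * (si - s) ** 2 for w, si in zip(weights, scores))
    return e_fusion, avg_err - ambiguity
```

For scores 0.2 and 0.8 with equal weights and target 1, both sides evaluate to 0.25: the fused error (0.25) is below the weighted average error (0.34) by exactly the ambiguity (0.09).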
evidence in a novel way and, possibly, are better able to discriminate the relevant aspects of emergent phenomena. Novel categories detect novel empirical evidence that may be fragmentary, irrelevant, contradictory or supportive of particular hypotheses. The DS theory approach for quantifying the uncertainty in the performance of a detector and assessing the improvement in system performance consists of three steps:
1. Model uncertainty by considering each variable separately. Then a model that considers all variables together is derived.
2. Propagate the uncertainty through the system, which results in a model of the uncertainty in the performance of the system.
3. Assess the system performance enhancement.
In the case of Dempster-Shafer theory, the FoD is expected to contain all propositions of which the information sources (IDSs) can provide evidence. When a
proposition corresponds to a subset of a frame of discernment, it is said that the
frame discerns that proposition. The elements of the frame of discernment are assumed to be mutually exclusive propositions. This constraint is always satisfied in the intrusion detection application because of the discrete nature of the detector decision. The belief in the likelihood of the traffic being in an anomalous state is expressed by each IDS by assigning a mass to the subsets of the FoD.
The DS theory is a generalization of the classical probability theory with its
additivity axiom excluded or modified. The probability mass function (p) is a
mapping which indicates how the probability mass is assigned to the elements.
The Basic Probability Assignment (BPA) function (m), on the other hand, is a set mapping, and the two can be related ∀A ⊆ Θ as m(A) = Σ_{B⊆A} p(B); hence m(A) relates to a belief structure. The mass m is very near to the probabilistic mass p, except that it is shared not only by the single hypotheses but also by the unions of the hypotheses. In DS theory, rather than knowing exactly how the probability is distributed among the elements B ⊆ A, we just know
by the BPA function m that a certain quantity of a probability mass is somehow divided among the focal elements. Because of this less specific knowledge
about the allocation of the probability mass, it is difficult to assign exactly the
probability associated with the subsets of the FoD, but instead we assign two
measures: (1) belief (Bel) and (2) plausibility (Pl), which correspond to the lower and upper bounds on the probability, i.e.,

Bel(A) ≤ p(A) ≤ Pl(A)
where the belief function, Bel(A), measures the minimum uncertainty value
about proposition A, and the Plausibility, P l(A), reflects the maximum uncertainty value about proposition A.
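As a small worked illustration of these two measures (the mass values below are hypothetical, not taken from any IDS evaluated in this work), Bel and Pl can be computed from a BPA over the frame Θ = {attack, normal} by summing the masses of the subsets contained in A and of the subsets intersecting A, respectively:

```python
# Belief and plausibility from a basic probability assignment (BPA)
# over the frame of discernment Theta = {attack, normal}.
# Subsets are represented as frozensets; the masses must sum to 1.

def belief(m, a):
    """Bel(A): total mass of all focal elements B that are subsets of A."""
    return sum(mass for b, mass in m.items() if b <= a)

def plausibility(m, a):
    """Pl(A): total mass of all focal elements B that intersect A."""
    return sum(mass for b, mass in m.items() if b & a)

# Hypothetical BPA: an IDS commits 0.6 to 'attack', 0.1 to 'normal',
# and leaves 0.3 on the whole frame (its ignorance).
m = {
    frozenset({"attack"}): 0.6,
    frozenset({"normal"}): 0.1,
    frozenset({"attack", "normal"}): 0.3,
}
A = frozenset({"attack"})
print(round(belief(m, A), 2), round(plausibility(m, A), 2))  # 0.6 0.9
```

The interval [Bel(A), Pl(A)] = [0.6, 0.9] brackets the unknown probability of attack, consistent with Bel(A) ≤ p(A) ≤ Pl(A).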
The following are the key assumptions made with the fusion of intrusion detectors:
- If some of the detectors are imprecise, the uncertainty about an event can be quantified by the maximum and minimum probabilities of that event. The maximum (minimum) probability of an event is the maximum (minimum) of all probabilities that are consistent with the available evidence.
- The process of asking an IDS about an uncertain variable is a random experiment whose outcome can be precise or imprecise. There is randomness because every time a different IDS observes the variable, a different decision can be expected. The IDS can be precise and provide a single value, or imprecise and provide an interval. Therefore, if the information about uncertainty consists of intervals from multiple IDSs, then there is uncertainty due to both imprecision and randomness.
If all IDSs are precise, they give pieces of evidence pointing precisely to specific values. In this case, a probability distribution of the variable can be built. However, if the IDSs provide intervals, such a probability distribution cannot be built, because it is not known which specific values of the random variables each piece of evidence supports.
In the case of DS theory, the additivity axiom of probability theory, p(A) + p(Ā) = 1, is modified as m(A) + m(Ā) + m(Θ) = 1, with uncertainty introduced by the term m(Θ). Here m(A) is the mass assigned to A, m(Ā) is the mass assigned to all other propositions that are not A in the FoD, and m(Θ) is the mass assigned to the union of all hypotheses when the detector is ignorant. This clearly
the unnormalized DS method. Also, independence of evidence is yet another
requirement for the DS combination method. The problem is formalized as follows: considering the network traffic, assume a traffic space Θ, which is the union of the different classes, namely, the attack and the normal. The attack class has different types of attacks, and the classes are assumed to be mutually exclusive. Each IDS assigns to any traffic sample x a detection decision denoting the class, an element of the FoD Θ, from which the sample is assumed to come. With n IDSs used for the combination, the decision of each one of the IDSs is considered for the final decision of the fusion IDS.
This chapter presents a method to detect the unknown traffic attacks with an
increased degree of confidence by making use of a fusion system composed of
detectors. Each detector observes the same traffic on the network and detects
the attack traffic with an uncertainty index. The frame of discernment consists
of singletons that are exclusive (A_i ∩ A_j = ∅, i ≠ j) and exhaustive, since the FoD consists of all the expected attacks which the individual IDS detects, or else the detector fails to detect the attack by recognizing it as normal traffic. All the
constituent IDSs that take part in fusion are assumed to have a global point of
view about the system rather than separate detectors being introduced to give
specialized opinion about a single hypothesis.
The DS combination rule gives the combined mass of the two evidences m1 and m2 for a proposition A as:

m(A) = ( Σ_{X∩Y=A} m1(X) m2(Y) ) / ( 1 − Σ_{X∩Y=∅} m1(X) m2(Y) )    (4.16)
Specifically, if a particular detector indexed i taking part in fusion has probability of detection mi(A) for a particular winning class A, then fusion is expected to result in a probability m(A) for that class that is greater than mi(A), ∀i and ∀A. Thus the confidence in detecting a particular winning class is improved, which is the key aim of sensor fusion. Thus,
Dempster-Shafer theory for sensor fusion aids in attaining an increased value of
confidence in detection by means of increased probability of detection of individual classes. Note that the Dempster-Shafer rule is independent of the order
in which evidence are combined.
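A minimal sketch of the combination rule of equation 4.16 is given below for two hypothetical mass assignments. It confirms numerically that combining the evidences raises the mass of the winning class above either input, and that the result is independent of the order of combination:

```python
# Dempster's rule of combination for two BPAs over a common frame:
#   m12(A) = sum_{X ∩ Y = A} m1(X) m2(Y) / (1 - sum_{X ∩ Y = ∅} m1(X) m2(Y))

def combine(m1, m2):
    conflict = 0.0
    combined = {}
    for x, mx in m1.items():
        for y, my in m2.items():
            inter = x & y
            if inter:
                combined[inter] = combined.get(inter, 0.0) + mx * my
            else:
                conflict += mx * my        # mass assigned to conflicting pairs
    return {a: v / (1.0 - conflict) for a, v in combined.items()}

ATTACK, NORMAL = frozenset({"attack"}), frozenset({"normal"})
THETA = ATTACK | NORMAL
m1 = {ATTACK: 0.7, THETA: 0.3}               # IDS 1: fairly confident of attack
m2 = {ATTACK: 0.5, NORMAL: 0.2, THETA: 0.3}  # IDS 2: weaker evidence

m12, m21 = combine(m1, m2), combine(m2, m1)
print(m12[ATTACK] > max(m1[ATTACK], m2[ATTACK]))       # confidence increased
print(all(abs(m12[k] - m21[k]) < 1e-12 for k in m12))  # order-independent
```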
The above analysis is simple since it considers only one class at a time. The
variance of the two classes can be merged and the resultant variance is the sum
of the normalized variances of the individual classes. Hence, the class label can
be dropped.
4.6.2 Analysis of detection error assuming traffic distribution
The previous sections analyzed the system without any knowledge of the underlying traffic or detectors. In this section, the Gaussian distribution is assumed
for both the normal and the attack traffic due to its acceptability in practice.
Often, the data available in databases is only an approximation of the true data.
When the information about the goodness of the approximation is recorded,
the results obtained from the database can be interpreted more reliably. Any
database is associated with a degree of accuracy, which is denoted with a probability density function, whose mean is the value itself. Formally, each database
value is indeed a random variable; the mean of this variable becomes the stored
value, and is interpreted as an approximation of the true value; the standard deviation of this variable is a measure of the level of accuracy of the stored value.
Assuming the attack connection and normal connection scores to have the mean
values μ1 and μ0 respectively, μ1 > μ0 without loss of generality. Let σ1 and σ0 be the standard deviations of the attack connection and normal connection scores. The two types of errors committed by IDSs are often measured by the False Positive Rate (FP_rate) and the False Negative Rate (FN_rate). FP_rate
is calculated by integrating the attack score distribution from a given threshold
FN_rate = ∫_{−∞}^{T} p_{k=1}(y) dy    (4.18)
The threshold T is a unique point where the error is minimized, i.e., the difference between FP_rate and FN_rate is minimized by the following criterion:

T = argmin_T |FP_rate − FN_rate|    (4.19)
At this threshold the resultant error due to FP_rate and FN_rate is a minimum. This is because the FN_rate is an increasing function (a cumulative distribution function, cdf) and the FP_rate is a decreasing function (1 − cdf); T is the point where these two functions intersect. Decreasing the error introduced by the FP_rate and the FN_rate implies an improvement in the performance of the system. The fusion algorithm accepts decisions from many IDSs, where a minority of the decisions are false positives or false negatives. A good sensor fusion system is expected to give a result that accurately represents the decisions from the correctly performing individual sensors, while minimizing the decisions from erroneous IDSs. Approximate agreement emphasizes precision, even when this conflicts with system accuracy. Sensor fusion, however, is concerned solely with the accuracy of the readings, which is appropriate for sensor applications. This is true despite the fact that increased precision within known accuracy bounds would be beneficial in most cases. Hence the following strategy is adopted:
- The false alarm rate FP_rate can be fixed at an acceptable value α0 and the detection rate then maximized. Based on the above criteria, a lower bound on accuracy can be derived.
- The detection rate is always higher than the false alarm rate for every IDS, an assumption that is trivially satisfied by any reasonably functional sensor.
- Determine whether the accuracy of the IDS after fusion is indeed better than the accuracy of the individual IDSs, in order to support the performance enhancement of the fusion IDS.
- Discover the weights on the individual IDSs that give the best fusion.
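For Gaussian score distributions, the criterion of equation 4.19 can be located by a direct numerical scan. The sketch below (the means and standard deviations are illustrative assumptions, not fitted to any data set) searches for the point where the FN curve (the attack-score cdf) meets the FP curve (one minus the normal-score cdf):

```python
# Locate the threshold T where FN_rate (cdf of the attack scores) and
# FP_rate (1 - cdf of the normal scores) intersect, for Gaussian scores.
import math

def norm_cdf(x, mu, sigma):
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def find_threshold(mu0, sig0, mu1, sig1, steps=100000):
    """Scan [mu0, mu1] for argmin |FP_rate(T) - FN_rate(T)|."""
    best_t, best_gap = mu0, float("inf")
    for k in range(steps + 1):
        t = mu0 + (mu1 - mu0) * k / steps
        fp = 1.0 - norm_cdf(t, mu0, sig0)   # normal scores above T
        fn = norm_cdf(t, mu1, sig1)         # attack scores below T
        if abs(fp - fn) < best_gap:
            best_gap, best_t = abs(fp - fn), t
    return best_t

# Illustrative scores: normal ~ N(0, 1), attack ~ N(1, 1).
T = find_threshold(0.0, 1.0, 1.0, 1.0)
print(round(T, 3))  # equal variances place the crossing midway between the means
```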
Given the desired false alarm rate which is acceptable, FP_rate = α0, choose the threshold T that maximizes the TP_rate and thus minimizes the FN_rate:

TP_rate = Pr[alert|attack] = Pr[ Σ_{i=1}^{n} w_i s_i ≥ T | attack ]    (4.20)

FP_rate = Pr[alert|normal] = Pr[ Σ_{i=1}^{n} w_i s_i ≥ T | normal ] = α0    (4.21)

where the fused score is s = Σ_{i=1}^{n} w_i s_i and

TP_i = Pr[s_i = 1 | attack],   FP_i = Pr[s_i = 1 | normal]

where TP_i and FP_i are the detection rate and the false positive rate of an individual IDS indexed i. It is required to give a low weight to any individual sensor that is unreliable, hence meeting the constraint on the false alarm as
given in equation 4.21. Similarly, the fusion improves the TP_rate as the detectors get appropriately weighted according to their performance. One justification for this evaluation metric is that when searching large databases like network traffic, it is more reasonable to have the results be most relevant (precision), without caring whether all the relevant examples are seen (recall) or not. We
chose the number α0 depending on the proportion of attacks in the normal traffic (base rate). This threshold is of course adjustable and one may vary the scale
of the measured performance numbers by adjusting it. It also happens that these
features' precision-at-α0 scores are quite distinct from one another, facilitating
meaningful comparison.
Fusion of the decisions from various IDSs is expected to produce a single decision that is more informative and accurate than any of the decisions from the individual IDSs. Then the question arises as to whether it is optimal. Towards that
end, a lower bound on variance for the fusion problem of independent sensors,
or an upper bound on the false positive rate or a lower bound on the detection
rate for the fusion problem of dependent sensors is presented in this work.
Fusion of Independent Sensors
Consider n independent IDSs, with the decision of each being a random variable with a Gaussian distribution of zero mean vector and diagonal covariance matrix diag(σ1², σ2², ..., σn²). Assume s to be the expected fusion output, which is the unknown deterministic scalar quantity to be estimated, and ŝ to be the estimate
score = ∂ ln(L(σ²; s)) / ∂σ²

Fisher information is thus the expectation of the squared score. A random variable carrying high Fisher information implies that the absolute value of the score is often high. The Cramér–Rao inequality expresses a lower bound on the variance of an unbiased statistical estimator, based on the Fisher information:

σ̂² ≥ 1 / (Fisher information) = 1 / E[ (∂ ln(L(σ²; X)) / ∂σ²)² | σ² ]
If the prior probabilities of detection of the various IDSs are known, the weights w_i, i = 1, ..., n, can be assigned to the individual IDSs. The idea is to estimate the local accuracy of the IDSs. The decision of the IDS with the highest local accuracy estimate will have the highest weight in the aggregation. The best fusion algorithm is supposed to choose the correct class if any of the individual IDSs did so. This is a theoretical upper bound for all fusion algorithms. Of course, the best individual IDS is a lower bound for any meaningful fusion algorithm. Depending on the data, the fusion may sometimes be no better than Bayes. In such cases, the upper and lower performance bounds are identical and there is no point in using a fusion algorithm. A further insight into the CRB can be gained by understanding how each IDS affects it. With the architecture shown in Fig. 4.1, the model is given by s = Σ_{i=1}^{n} w_i s_i. The bound is calculated from the effective variance of each one of the IDSs as σ̃_i² = σ_i²/w_i², and these are then combined to give the CRB as 1/(Σ_{i=1}^{n} 1/σ̃_i²).
CRB = 1 / ( Σ_{i=1}^{n} w_i²/σ_i² )    (4.23)
It can be observed from equation 4.23 that any IDS decision that is not reliable will have a very limited impact on the bound. This is because the non-reliable IDS will have a much larger variance than the other IDSs in the group; σ̃_n² ≫ σ̃_1², ..., σ̃_{n−1}², and hence 1/σ̃_n² ≪ 1/σ̃_1², ..., 1/σ̃_{n−1}². The bound can then be approximated as:

CRB ≈ 1 / ( Σ_{i=1}^{n−1} 1/σ̃_i² )
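The bound of equation 4.23, and the limited influence of an unreliable IDS on it, can be illustrated numerically. The variances and weights below are made-up values chosen only for the demonstration:

```python
# Cramer-Rao bound for the weighted fusion model s = sum_i w_i s_i:
#   CRB = 1 / sum_i (w_i^2 / sigma_i^2),
# i.e. the reciprocal of the summed effective precisions 1/sigma_tilde_i^2,
# where sigma_tilde_i^2 = sigma_i^2 / w_i^2.

def crb(variances, weights):
    return 1.0 / sum(w * w / v for w, v in zip(weights, variances))

weights_3 = [0.4, 0.3, 0.3]
reliable = [1.0, 1.5, 2.0]           # three comparable IDSs
with_noisy = reliable + [100.0]      # plus one very unreliable IDS
weights_4 = [0.4, 0.3, 0.3, 0.1]     # the noisy IDS gets a small weight

b3 = crb(reliable, weights_3)
b4 = crb(with_noisy, weights_4)
print(b4 < b3)                       # an extra sensor can only tighten the bound
print(abs(b3 - b4) / b3 < 0.01)      # ...but the unreliable one barely moves it
```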
Also, it can be observed from equation 4.23 that the bound shows asymptotically optimum behavior of minimum variance. With σ_i² > 0 and σ_min² = min[σ_1², ..., σ_n²],

CRB = 1 / ( Σ_{i=1}^{n} 1/σ̃_i² ) < σ_min²    (4.24)

CRB = 1 / ( 1/σ_1² + ⋯ + 1/σ_n² ) ≤ σ_min² / n    (4.25)

lim_{n→∞} σ_min² / n = 0    (4.26)

From equation 4.25 and equation 4.26, it can be easily interpreted that increasing the number of IDSs to a sufficiently large number will lead the performance bounds towards perfect estimates. Also, due to the monotone decreasing nature of the bound, the IDSs can be chosen to make the performance as close
to perfect.
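The asymptotic statements of equations 4.25 and 4.26 can be checked the same way. Assuming, purely for illustration, n identical detectors with unit variance, the bound of equation 4.25 equals 1/n and shrinks toward zero as detectors are added:

```python
# Equation (4.25): CRB = 1 / (1/sigma_1^2 + ... + 1/sigma_n^2).
# With sigma_i^2 = 1 for all i this is exactly 1/n, which tends to 0 (eq. 4.26).

def crb_unweighted(variances):
    return 1.0 / sum(1.0 / v for v in variances)

bounds = [crb_unweighted([1.0] * n) for n in (1, 10, 100, 1000)]
print(bounds)  # [1.0, 0.1, 0.01, 0.001] -- monotone decreasing toward zero
```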
Fusion of Dependent Sensors
In most of the sensor fusion problems, the individual sensor errors are assumed to be uncorrelated so that the sensor decisions are independent. While independence of sensors is a convenient assumption, it is often unrealistic in practice. Considering the general case of statistically dependent decisions, the Bahadur–Lazarsfeld expansion of probability density functions can be used for analysis.
Consider s = [s_1, ..., s_n] to be a vector of the correlated decisions from the individual sensors and P(s) the probability density function of s. With the prior probabilities of normal traffic and attack traffic being P_0 and P_1 respectively, the conditional pdf P(s|attack) is introduced via the normalized random variables r_{i1} = (s_i − p_i)/√(p_i q_i), where p_i = P(s_i = 1|attack) and q_i = 1 − p_i, i = 1, 2, ..., n, and the pdf P(s|normal) via the normalized random variables r_{i0} = (s_i − p_i)/√(p_i q_i), where p_i = P(s_i = 1|normal) and q_i = 1 − p_i, i = 1, 2, ..., n. The normalized random variables r_{i1} and r_{i0} have zero mean and unit variance. The Bahadur–Lazarsfeld polynomials are defined as ψ_i(s) = [1, r_1, r_2, ..., r_n, r_1 r_2, r_1 r_3, ..., r_1 r_2 ⋯ r_n] for the respective values of i = [0, 1, 2, ..., n, n+1, n+2, ..., 2^n − 1]. Recalling that each Bahadur–Lazarsfeld polynomial is a product of the normalized variables r_i, the correlation coefficients of {r_i}_{i=1}^{n} are defined by order as γ_{ij} = Σ_s r_i r_j P(s) (second-order correlation coefficient) up to γ_{ij...n} = Σ_s r_i r_j ⋯ r_n P(s) (nth-order correlation coefficient).
Using the decisions of the local sensors as its input, the fusion unit performs a likelihood ratio test in order to make a global decision. The optimal fusion rule of the fusion unit is given by the likelihood ratio:

L(s) = P(s|attack) / P(s|normal)
     = [ Π_{i=1}^{n} ((1 − FN_i)/FP_i)^{s_i} (FN_i/(1 − FP_i))^{1−s_i} ]
       × [ 1 + Σ_{i<j} γ¹_{ij} r_{i1} r_{j1} + Σ_{i<j<k} γ¹_{ijk} r_{i1} r_{j1} r_{k1} + ... + γ¹_{12...n} r_{11} r_{21} ⋯ r_{n1} ]
       / [ 1 + Σ_{i<j} γ⁰_{ij} r_{i0} r_{j0} + Σ_{i<j<k} γ⁰_{ijk} r_{i0} r_{j0} r_{k0} + ... + γ⁰_{12...n} r_{10} r_{20} ⋯ r_{n0} ]
The log-likelihood ratio for the problem of deciding between the hypotheses, attack or normal, is given by:

log L(s) = Σ_{i=1}^{n} s_i log[ (1 − FN_i)(1 − FP_i) / (FN_i FP_i) ] + Σ_{i=1}^{n} log[ FN_i / (1 − FP_i) ]
         + log[ 1 + Σ_{i<j} γ¹_{ij} r_{i1} r_{j1} + Σ_{i<j<k} γ¹_{ijk} r_{i1} r_{j1} r_{k1} + ... + γ¹_{12...n} r_{11} r_{21} ⋯ r_{n1} ]
         − log[ 1 + Σ_{i<j} γ⁰_{ij} r_{i0} r_{j0} + Σ_{i<j<k} γ⁰_{ijk} r_{i0} r_{j0} r_{k0} + ... + γ⁰_{12...n} r_{10} r_{20} ⋯ r_{n0} ]
where the global detections are given in terms of r1 , ..., rn . The log-likelihood
ratio gives the data fusion rule for a distributed detection system with correlated
local decisions. If the conditional correlation coefficients above a certain order
can be neglected, as in many practical applications, the computational burden
can be reduced. If most correlation coefficients of the local decisions are zero,
the computation gets simplified to the optimal data fusion rule developed by
Chair and Varshney [35] for independent local decisions.
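When all conditional correlation coefficients vanish, only the first two sums of the log-likelihood remain, which is the Chair–Varshney rule cited above. A sketch with hypothetical per-IDS error rates follows:

```python
# Chair-Varshney optimal fusion of independent local decisions:
#   log L(s) = sum_i [ s_i * log((1-FN_i)(1-FP_i)/(FN_i*FP_i))
#                      + log(FN_i/(1-FP_i)) ]
# Decide "attack" when log L(s) exceeds the log prior ratio (0 for equal priors).
import math

def chair_varshney(decisions, fn_rates, fp_rates, log_threshold=0.0):
    llr = 0.0
    for s, fn, fp in zip(decisions, fn_rates, fp_rates):
        llr += s * math.log((1 - fn) * (1 - fp) / (fn * fp))
        llr += math.log(fn / (1 - fp))
    return int(llr > log_threshold), llr

# Hypothetical IDSs: detection rates 0.9, 0.8, 0.7 (FN rates 0.1, 0.2, 0.3).
fn = [0.1, 0.2, 0.3]
fp = [0.05, 0.1, 0.15]
alarm, _ = chair_varshney([1, 1, 0], fn, fp)
print(alarm)  # two confident alarms outweigh one silent detector
```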
Setting bounds on false positives and true positives

As an illustration, let us consider a system with three individual IDSs, with a joint density at the IDSs having a covariance matrix of the form:

        | 1     ρ_12  ρ_13 |
V   =   | ρ_21  1     ρ_23 |
        | ρ_31  ρ_32  1    |

With fusion doing an aggregation of the individual decisions, the false alarm rate of the fusion is given by:

FP = 1 − Pr(s_1 = 0, s_2 = 0, s_3 = 0 | normal) = 1 − ∫ P_s(s|normal) ds    (4.29)
where P_s(s|normal) is the density of the sensor observations under the hypothesis normal, and is a function of the correlation coefficient ρ. Assuming a single threshold T for all the sensors, and the same correlation coefficient ρ between different sensors, a function F_n(T|ρ) = Pr(s_1 = 0, s_2 = 0, s_3 = 0) can be defined:

F_n(T|ρ) = ∫ F^n( (T − √ρ y) / √(1 − ρ) ) g(y) dy    (4.30)

where g(.) and F(.) are the standard normal density and cumulative distribution function respectively, and

F^n(X) = [F(X)]^n    (4.31)
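The integral of equation 4.30 can be evaluated by elementary numerical quadrature. The sketch below uses only the standard library and a trapezoidal rule (the threshold, correlation, and integration grid are illustrative choices), and checks the independent limit ρ = 0, where F_n(T|0) must reduce to F(T)^n:

```python
# Numerical evaluation of
#   F_n(T|rho) = integral of F((T - sqrt(rho)*y) / sqrt(1 - rho))^n g(y) dy
# for equicorrelated standard normal sensor statistics.
import math

def g(y):                        # standard normal density g(.)
    return math.exp(-0.5 * y * y) / math.sqrt(2.0 * math.pi)

def F(x):                        # standard normal cdf F(.)
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def F_n(T, rho, n, lo=-8.0, hi=8.0, steps=4000):
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps + 1):
        y = lo + k * h
        val = F((T - math.sqrt(rho) * y) / math.sqrt(1.0 - rho)) ** n * g(y)
        total += val * (0.5 if k in (0, steps) else 1.0)   # trapezoid weights
    return total * h

# Sanity check: with rho = 0 the sensors are independent, so F_n(T|0) = F(T)^n.
print(abs(F_n(1.0, 0.0, 3) - F(1.0) ** 3) < 1e-4)
# Positive correlation raises the chance that all sensors stay below T
# (Slepian's inequality), so F_n grows with rho.
print(F_n(1.0, 0.5, 3) > F(1.0) ** 3)
```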
Equation 4.29 can then be written, depending on whether ρ > −1/(n−1) or not, as:

FP_max = 1 − ∫ F³( (T − √ρ y) / √(1 − ρ) ) f(y) dy,   0 ≤ ρ < 1    (4.32)

and

FP_max = 1 − F_3(T|ρ),   −0.5 < ρ < 0    (4.33)
With this threshold T, the probability of detection at the fusion unit can be computed as:

TP_min = 1 − ∫ F³( (T − √ρ y) / √(1 − ρ) ) f(y) dy,   0 ≤ ρ < 1    (4.34)

and

TP_min = 1 − F_3(T_S|ρ),   −0.5 < ρ < 0    (4.35)
The above equations 4.32, 4.33, 4.34 and 4.35 show the performance improvement of sensor fusion when the upper bound on the false positive rate and the lower bound on the detection rate are fixed. The system performance was shown to deteriorate when the correlation between the sensor errors is positive and increasing,
4.7 Summary
One of the common reasons for the avoidance of IDSs as the second and last stage of defense is their less than satisfactory performance. Consequently, improving the IDS performance is a significant research challenge. In this chapter, we prove that it is possible to improve the performance with multiple IDSs
using advances in sensor fusion. The chapter includes mathematical basis for
sensor fusion in IDS with the theoretical formulation and analysis on the acceptability of sensor fusion in intrusion detection. The sensor fusion system is
characterized and modeled with no knowledge of the IDSs and the intrusion detection data. The need of sensor fusion in IDS is envisaged. The evidence theory
is the method most suited for the fusion of IDSs as seen in this chapter. Having
chosen the sensor fusion method, we address the issues related to sensor fusion
like choosing the threshold bounds, rule-based fusion, Data-dependent Decision
fusion and the modified evidence theory in chapters 5, 6, and 7 respectively.
The study undertaken in this chapter contributes to the fusion field in several aspects. It is expected that positive correlation improves the reliability of fusion, while negative correlation improves fusion by means of improved coverage. In this theoretical study, independent as well as dependent detectors were considered, and the study clarifies the intuition that independence of detectors is crucial in determining the success of the fusion operation. In the case when they are dependent, fusion will still lead to improved results, but the gain will be smaller. This is explained by the variance reduction due to the combination. The latter half of
the chapter takes into account the analysis of the sensor fusion system with a
knowledge of the network traffic distribution. This analysis also resulted in the
acceptance of sensor fusion for enhancing the performance of the intrusion detection. These results are further supported by empirical evidence in the later
chapters.
Chapter 5
Selection of Threshold Bounds for
Effective Sensor Fusion
I have not failed. I have just found 10,000 ways that won't work.
Thomas Alva Edison
5.1 Introduction
In this chapter, we prove the distinct advantages of sensor fusion over individual IDSs. Fusion threshold bounds are derived using the Chebyshev inequality at the fusion center, from the false positive rates and detection rates of the IDSs. The goal is to achieve the best fusion performance with
the least amount of model knowledge, in a computationally inexpensive way.
The anomaly-based IDSs detect anomalies beyond a set threshold level in the features they monitor. Threshold bounds instead of a single threshold give more
freedom in steering system properties. Any threshold within the bounds can
be chosen depending on the preferred level of trade-off between detection and
false alarms.
All the related work in the field of sensor fusion has been carried out mainly
with one of the methods like probability theory, evidence theory, voting fusion
theory, fuzzy logic theory or neural network in order to aggregate information.
The Bayesian theory is the classical method for statistical inference problems.
The fusion rule is expressed for a system of independent learners, with the distribution of hypotheses known a priori. The Dempster-Shafer decision theory
is considered a generalized Bayesian theory. It does not require a priori knowledge or probability distributions on the possible system states like the Bayesian approach, and it is mostly useful when modeling of the system is difficult or impossible [106]. An attempt to prove the distinct advantages of sensor fusion over individual IDSs is made in the next section using the Chebyshev inequality, as an extension of the work done by Zhu et al. [107].
Let D and F denote the unanimous detection rate and the false positive rate respectively. The mean and variance of s in the case of attack and of no-attack are given by the following equations:

E[s|attack] = Σ_{i=1}^{n} D_i,   Var[s|attack] = Σ_{i=1}^{n} D_i(1 − D_i)

E[s|normal] = Σ_{i=1}^{n} F_i,   Var[s|normal] = Σ_{i=1}^{n} F_i(1 − F_i)
The fusion IDS is required to give a high detection rate and a low false positive rate. Hence the threshold T has to be chosen well above the mean of the false alerts and well below the mean of the true alerts. Figure 5.1 shows a typical case where the threshold T is chosen at the point of overlap of the two parametric curves for the normal and attack traffic. Consequently, the threshold bounds are given as:

Σ_{i=1}^{n} F_i < T < Σ_{i=1}^{n} D_i
The detection rate and the false positive rate of the fusion IDS are desired to surpass the corresponding weighted averages, and hence:

D > ( Σ_{i=1}^{n} D_i² ) / ( Σ_{i=1}^{n} D_i )    (5.1)

and

F < ( Σ_{i=1}^{n} (1 − F_i) F_i ) / ( Σ_{i=1}^{n} (1 − F_i) )    (5.2)
By the Chebyshev inequality,

Pr{ |s − E(s)| ≥ k } ≤ Var(s) / k²
With the assumption that the threshold T is greater than the mean of the normal activity and less than the mean of the intrusive activity, the Chebyshev inequality gives, in the case of attack,

Pr{ |s − Σ_{i=1}^{n} D_i| ≥ Σ_{i=1}^{n} D_i − T | attack } ≤ ( Σ_{i=1}^{n} D_i(1 − D_i) ) / ( Σ_{i=1}^{n} D_i − T )²

so that the detection rate satisfies

D ≥ 1 − ( Σ_{i=1}^{n} D_i(1 − D_i) ) / ( Σ_{i=1}^{n} D_i − T )²

and, in the case of normal traffic,

Pr{ |s − Σ_{i=1}^{n} F_i| ≥ T − Σ_{i=1}^{n} F_i | normal } ≤ ( Σ_{i=1}^{n} F_i(1 − F_i) ) / ( T − Σ_{i=1}^{n} F_i )²
Requiring both Chebyshev bounds to be meaningful leads to threshold bounds of the form:

Σ_{i=1}^{n} F_i + √( Σ_{i=1}^{n} F_i(1 − F_i) )  ≤  T  ≤  Σ_{i=1}^{n} D_i − √( Σ_{i=1}^{n} D_i(1 − D_i) )
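These Chebyshev-style bounds are straightforward to compute from the individual rates. In the sketch below the D_i and F_i values are invented for illustration; the admissible threshold interval is derived from the mean and one standard deviation of the fused score under each hypothesis:

```python
# Chebyshev-style threshold bounds for the fusion of n binary detectors:
#   sum(F_i) + sqrt(sum F_i(1-F_i))  <=  T  <=  sum(D_i) - sqrt(sum D_i(1-D_i))
import math

def threshold_bounds(detection_rates, fp_rates):
    mean_attack = sum(detection_rates)
    mean_normal = sum(fp_rates)
    sd_attack = math.sqrt(sum(d * (1 - d) for d in detection_rates))
    sd_normal = math.sqrt(sum(f * (1 - f) for f in fp_rates))
    return mean_normal + sd_normal, mean_attack - sd_attack

# Invented rates for five IDSs.
D = [0.9, 0.85, 0.8, 0.75, 0.7]
F = [0.05, 0.1, 0.08, 0.12, 0.06]
lo, hi = threshold_bounds(D, F)
print(lo < hi)                       # a non-empty admissible interval exists
print(sum(F) < lo and hi < sum(D))   # tighter than the plain mean bounds
```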
Since the threshold T is assumed to be greater than the mean of normal activity,
the upper bound of false positive rate F can be obtained from the Chebyshev
inequality as:
F ≤ Var[s] / (T − E[s])²    (5.3)
In a statistical intrusion detection system, a false positive is caused due to the
variance of network traffic during normal operations. Hence, to reduce the false
positive rate, it is important to reduce the variance of the normal traffic. In
the ideal case, with normal traffic the variance is zero. The above equation
5.3 shows that as the variance of the normal traffic approaches zero, the false
positive rate should also approach zero. Also, since the threshold T is assumed
to be less than the mean of the intrusive activity, the lower bound of the detection
rate D can be obtained from the Chebyshev inequality as:
D ≥ 1 − Var[s] / (E[s] − T)²    (5.4)
For intrusive traffic, the factor D_i(1 − D_i) remains almost steady, and hence the variance, given as Variance = Σ_{i=1}^{n} D_i(1 − D_i), is an appreciable value. Since the variance of
the attack traffic is above a certain detectable minimum, from equation 5.4, it
is seen that the correct detection rate can approach an appreciably high value.
Similarly the true negatives will also approach a high value since the false positive rate is reduced with IDS fusion.
It has been shown above that with IDS fusion the variance of the normal traffic drops towards zero while the variance of the intrusive traffic stays above a detectable minimum. This further supports the claim that the fusion IDS gives a better detection rate and a substantially lower false positive rate.
Table 5.1: Types of attacks detected by PHAD at 0.00002 FP rate (100 FPs)

Attack type   Total attacks   Attacks detected   % detection
Probe         37              22                 60%
DOS           63              24                 38%
R2L           53              6                  11%
U2R/Data      37              2                  5%
Total         190             54                 28%
Table 5.2: Types of attacks detected by ALAD at 0.00002 FP rate (100 FPs)

Attack type   Total attacks   Attacks detected   % detection
Probe         37              6                  16%
DOS           63              19                 30%
R2L           53              25                 47%
U2R/Data      37              10                 27%
Total         190             60                 32%
Table 5.3: Types of attacks detected by the combination of ALAD and PHAD at 0.00004 FP rate (200 FPs)

Attack type   Total attacks   Attacks detected   % detection
Probe         37              24                 65%
DOS           63              39                 62%
R2L           53              26                 49%
U2R/Data      37              10                 27%
Total         190             99                 52%
Table 5.6: F-score of fused IDS for different choices of false positives

FP    TP    Precision   Recall   Overall Accuracy   F-score
50    44    0.46        0.23     0.99               0.31
100   73    0.42        0.38     0.99               0.40
200   99    0.33        0.52     0.99               0.40
500   108   0.18        0.57     0.99               0.27
for various values of false positives by setting the threshold appropriately. The
improved performance of the combination of the alarms from each system can
be observed in Table 5.6, corresponding to the false positives between 100 and
200, by fixing the threshold bounds appropriately. Thus the combination works
best above a false positive of 100 and much below a false positive of 200. In
each of the individual IDSs, the number of detections were observed at false
positives of 50, 100, 200 and 500, when trained on inside week 3 and tested
on weeks 4 and 5. Figures 5.2, 5.3, 5.4 and 5.5 show the selected thresholds
for the false positives of 50, 100, 200 and 500. The fusion IDS performs better than the single IDSs for all the threshold values. The performance is seen to be optimized within the bounds of 100 to 200 false positives.
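The entries of Table 5.6 follow directly from the standard definitions of precision, recall, and F-score. The sketch below recomputes the 200-false-positive row, taking the 190 total attacks from the attack tables above:

```python
# Precision, recall, and F-score for a detector that raised `tp` true alarms
# and `fp` false alarms against `total_attacks` attacks in the traffic.

def prf(tp, fp, total_attacks):
    precision = tp / (tp + fp)
    recall = tp / total_attacks
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score

# Row "200 FPs" of Table 5.6: 99 detections out of 190 attacks.
p, r, f = prf(tp=99, fp=200, total_attacks=190)
print(round(p, 2), round(r, 2), round(f, 2))  # table row: 0.33, 0.52, 0.40
```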
The improved performance of the fusion IDS over some of the fusion alternatives using the real-world network traffic is shown in Table 5.7.
Table 5.7: Comparison of the evaluated IDSs using the real-world network traffic

Detector/Fusion Type   Total Attacks   TP   FP   Precision   Recall   F-score
PHAD                   45              10   45   0.18        0.22     0.20
ALAD                   45              18   45   0.29        0.40     0.34
OR                     45              22   77   0.22        0.49     0.30
AND                    45              9    29   0.24        0.20     0.22
SVM                    45              19   49   0.30        0.42     0.35
ANN                    45              19   68   0.22        0.42     0.29
Fusion IDS             45              20   37   0.35        0.44     0.39
5.4 Summary
A simple theoretical model is first illustrated in this chapter for the purpose of showing the improved performance of the fusion IDS. The detection rate and the false positive rate quantify the performance benefit obtained through the fixing of threshold bounds. Also, the more independent and distinct the attack spaces of the individual IDSs are, the better the fusion IDS performs. The theoretical proof was supplemented with an experimental evaluation in which the detection rates, false positive rates, and F-scores were measured. In order to understand
the importance of thresholding, the anomaly-based IDSs, PHAD and ALAD
have been individually analyzed. Preliminary experimental results prove the
correctness of the theoretical proof. The chapter demonstrates that our technique is more flexible and also outperforms other existing fusion techniques
such as OR, AND, SVM, and ANN using the real-world network traffic embedded with attacks. The experimental comparison using the real-world traffic
has thus confirmed the usefulness and significance of the method. The unconditional combination of alarms avoiding duplicates, as shown in Table 5.3, results in a detection rate of 52% at 200 false positives, with an F-score of 0.4. The combination of the highest scoring alarms, as shown in Table 5.6 using the DARPA 1999 data set, results in a detection rate of 38% with the threshold fixed at 100 false positives, and an F-score of 0.4.
Chapter 6
Performance Enhancement of IDS using
Rule-based Fusion and Data-dependent
Decision Fusion
There is no greatness where there is no simplicity, goodness and truth.
Leo Tolstoy
6.1 Introduction
In the previous chapter the utility of sensor fusion for improved sensitivity and
reduced false alarm rate was illustrated. In this chapter we have further explored
the general problem of the poorly detected attacks. The poorly detected attacks
reveal the fact that they are characterized by features that do not discriminate
them much. This chapter discusses the improved performance of multiple IDSs
using rule-based fusion and Data-dependent Decision fusion (or DD fusion for
the purposes of this document). The DD fusion approach gathers an in-depth
understanding about the input traffic and also the behavior of the individual
IDSs by means of a neural network learner unit. This information is used to
fine-tune the fusion unit since the fusion depends on the input feature vector.
Thus fusion implements a function that is local to each region in the feature
space. It is well-known that the effectiveness of sensor fusion improves when
the individual IDSs are uncorrelated. The training methodology adopted in this
work takes note of this fact. The performance of Snort has been improved by
enhancing its rule base. The fused IDSs using rule-based fusion show an overall enhancement in performance with respect to
the performance of individual IDSs. For illustrative purposes two different data
sets, namely the DARPA 1999 data set as well as the real-world network traffic
embedded with attacks, have been used. The DD fusion shows a significantly
better performance with respect to the performance of individual IDSs.
The related work on sensor fusion in the intrusion detection application is discussed in chapter 4. The problem of designing IDSs to work effectively and
yield higher accuracies for minority attacks like R2L and U2R even in the mix
of data skewness has been receiving serious attention in recent times. Other
than the related work discussed in chapter 4, predictive classifier models for
rare events are given in [33] and [109]. But, none of these attempts have shown
any significant contribution in overcoming the data skewness problems. Hence
in spite of all the earlier attempts, there is still room for a significant improvement in the detection of rare attacks.
The chapter is organized as follows. Section 6.2 discusses the rule-based fusion of IDSs. Section 6.3 explains the proposed Data-dependent Decision (DD) fusion architecture. Section 6.4 describes the algorithm of the proposed data-dependent decision fusion architecture, and also illustrates and discusses its results. Section 6.5 draws the conclusion of the chapter.
Table 6.1: Types of attacks detected by the rule-based combination of ALAD and PHAD at a FP rate of 0.000025 (125 FPs)

Attack type   Total attacks   Attacks detected   % detection
Probe              37               24               65%
DoS                63               39               62%
R2L                53               26               49%
U2R/Data           37               10               27%
Total             190               99               52%
Experimental results show that the rule-based fusion performs better, with a detection rate of 52% at 125 false positives (F-score 0.48), than the other two combinations in the previous chapter. The rule-based fusion works significantly well compared to the threshold-based fusion in the detection of known attacks. However, the threshold-based approach offers an advantage in identifying unknown attacks. The rule-based fusion IDS can detect some of the well-known intrusions with a high detection rate, but it is difficult to detect novel intrusions, and its rule set has to be updated manually and frequently [7]. Thus, while the results were encouraging, it was realized that rule-based fusion has no possibility of generalizing from previously observed behavior. As a result, the research was pursued further to generalize rule-based fusion and also to overcome its other disadvantages.
An in-depth understanding of the input traffic and of the behavior of the individual IDSs is needed. This helps in automatically learning the individual weights for the combination when the IDSs are heterogeneous and show differences in performance. The architecture should thus be data-dependent, and hence the rule set has to be developed dynamically.
A new data-dependent architecture underpinning sensor fusion to significantly enhance IDS performance was introduced and implemented in this work. To this end, the decisions of various IDSs were combined with weights derived using a machine learning approach. This architecture differs from conventional fusion architectures: it guarantees improved performance in terms of detection rate and false alarm rate, works well even for large data sets, is capable of identifying novel attacks since the rules are dynamically updated, and has improved scalability.
6.3.1 Motivation
After the 1998 DARPA IDS evaluation, the MIT Lincoln Laboratory reported that if the best-performing systems against each of the different categories of attacks were combined into a single system, then roughly 60 to 70 percent of the attacks would have been detected with a false positive rate lower than 0.01%, i.e., fewer than 10 false positives a day. However, none of the previous work on sensor fusion in IDSs has reached the Lincoln Laboratory prediction. None of these approaches can avoid the effects of the systematic errors of the individual IDSs, and they are also prone to mistakes owing to the unrealistic confidence of certain IDSs. The availability of a large volume of experimental data has motivated us to use machine learning concepts to fuse the data. The individual weights of the IDSs can be obtained by learning the behavior of the various IDSs for different attack classes, and these weighted decisions can be combined in efficient ways.
The counter-intuitive performance of fusion techniques that are either implicitly data-dependent or data-independent is due to the unrealistic confidence of certain IDSs. The idea in the proposed architecture is to properly analyze the data and understand when the individual IDSs fail. The fusion unit should incorporate this learning from the input as well as from the output of the detectors to make an appropriate decision.
Figure 6.1: Data-dependent Decision fusion architecture. (The input x is fed to the IDSs IDS1 ... IDSn, whose decisions s1 ... sn, together with x, drive the neural network learner; the fusion unit combines the decisions with the learned weights w1 ... wn to produce the output y.)
The responses of the different IDSs, when they are presented with a certain attack, clearly tell which sensor generated the more precise result. The output of the neural network unit corresponds to the weights assigned to each of the individual IDSs. With these weight factors, the IDSs can be fused to produce an improved resultant output. Thus the proposed architecture refers to a collection of diverse IDSs that respond to input traffic, together with a weighted combination of their predictions. The weights are learned by looking at the response of the individual sensors for every input traffic connection. The fusion output can be represented as:
y = F^j(w_i^j(x, s_i^j), s_i^j),

where the weights w_i^j depend on both the input x and the individual IDS outputs s_i; the subscript i refers to the IDS index and the superscript j to the class label. The fusion unit outputs one or zero depending on whether the weighted aggregation of the decisions of the individual IDSs is above or below a set threshold.
In the case of the intrusion detectors ALAD and PHAD, the training is done by considering more of the data, and at the same time optimally, which is likely to decrease the bias of the individual detectors. The individual IDSs chosen thus have low bias, comparatively high variance (which gets reduced on fusion), and low error correlation (i.e., they make different errors, or have a high variance component of error). Hence the proposed data-dependent architecture allows the IDSs to develop diversity while being trained. What is required of the final fusion unit is that it generalizes well after training (reduced bias) on an unexpected traffic stream and additionally avoids over-fitting, which ensures variance reduction. Thus the proposed architecture exploits the experimental observation made in the work of Giacinto et al. [95] that training is done in different feature subspaces. The test has been conducted on the entire test set, and the evidence is then combined to produce the final decision. The neural network learner was introduced to process the entire available feature set, to extract more effective signatures than the ones hand-coded by the rule-based fusion. The algorithm of the proposed data-dependent architecture is given in section 6.5.3.
Input:
The DARPA 1999 test data set {(x_j)}, where j refers to the number of the record in the data set.
The testing of the three IDSs is done on the test data of weeks four and five.
Output:
The IDS outputs {s_i}, where i corresponds to the IDS index.
Training of Neural Network learner
Input:
The IDS outputs {s_i}, where i corresponds to the IDS index.
The DARPA training data set {(x_n, y_n)}, where n refers to the number of the record in the data set.
The IDS outputs as well as the training class labels are such that s_i, y_i ∈ C_k, where C_k are the 58 class labels and k varies from 1 to 58. With the IDSs used in this experiment, this was simplified to a binary detector with class labels of either zero or one, depending on the anomaly score of the anomaly detectors or the severity of the Snort alert.
Training:
The MATLAB Neural Network toolbox is used.
Algorithm: feed-forward back-propagation
: four input neurons
: one hidden layer with 25 sigmoidal units
: output layer of three neurons
Three of the inputs correspond to the outputs of the three constituent IDSs, and the fourth input is the vector corresponding to a single record of the DARPA data set, over all of whose values the network is run.
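For illustration only, a topology of this shape can be sketched in numpy. Everything below is an assumption rather than the thesis implementation (which used the MATLAB toolbox): the class name, the learning rate, the toy data, and the flattening of the record features into the input vector.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class WeightLearner:
    """Feed-forward net: inputs -> 25 sigmoidal hidden units -> 3 outputs
    (one reliability weight per IDS). Illustrative stand-in for the
    MATLAB toolbox configuration described in the text."""

    def __init__(self, n_in, n_hidden=25, n_out=3, lr=0.3):
        self.W1 = rng.normal(0, 0.1, (n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0, 0.1, (n_hidden, n_out))
        self.b2 = np.zeros(n_out)
        self.lr = lr

    def forward(self, X):
        self.h = sigmoid(X @ self.W1 + self.b1)
        self.o = sigmoid(self.h @ self.W2 + self.b2)
        return self.o

    def train_step(self, X, T):
        """One back-propagation step on a mean-squared-error loss."""
        O = self.forward(X)
        dO = (O - T) * O * (1 - O)                      # output delta (MSE + sigmoid)
        dH = (dO @ self.W2.T) * self.h * (1 - self.h)   # hidden delta
        self.W2 -= self.lr * self.h.T @ dO / len(X)
        self.b2 -= self.lr * dO.mean(axis=0)
        self.W1 -= self.lr * X.T @ dH / len(X)
        self.b1 -= self.lr * dH.mean(axis=0)
        return float(((O - T) ** 2).mean())

# toy demonstration: 3 IDS decisions + 5 record features per connection
X = rng.random((64, 8))
T = rng.random((64, 3))     # placeholder target per-IDS weights
net = WeightLearner(n_in=8)
errs = [net.train_step(X, T) for _ in range(200)]
```

The sketch only shows the data flow (record features plus IDS decisions in, per-IDS weights out); the actual targets used in the thesis training are not reproduced here.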
Testing:
The MATLAB Neural Network toolbox was used.
Algorithm: feed-forward network
: four input neurons
: one hidden layer with 25 sigmoidal units
: output layer of three neurons
Output:
{w_i^j} is the output of the NN learner, which is expected to give a measure of the reliability of each IDS i depending on the observed data class type j.
Fusion Unit
Input:
The IDS outputs {s_i}, where i corresponds to the IDS index.
Table 6.2: Types of attacks detected by PHAD at a false positive rate of 0.00002 (100 FPs)

Attack Type   Total attacks   Attacks detected   % detection
Probe              37               26               70%
DoS                63               27               43%
R2L                53                6               11%
U2R/Data           37                4               11%
Total             190               63               33%
The weight factors for the fusion process, {w_i^j}, which are the output of the NN learner and give a measure of the reliability of each IDS i depending on the observed data class type j.
Output:
The binary fusion output is one if the weighted linear aggregation of the outputs of all the IDSs is greater than zero, and zero otherwise:

y = 1, if Σ_i w_i^j s_i > 0
y = 0, otherwise
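The thresholding step above can be sketched as a small function; the names are illustrative, and the zero threshold follows the rule just stated.

```python
def fuse(weights, decisions, threshold=0.0):
    """Data-dependent decision fusion: emit 1 (attack) when the
    weighted linear aggregation of the IDS decisions exceeds the
    threshold, else 0 (normal)."""
    score = sum(w * s for w, s in zip(weights, decisions))
    return 1 if score > threshold else 0

# example: three binary IDS decisions with learned reliability weights
print(fuse([0.6, 0.3, 0.8], [1, 0, 1]))  # -> 1
print(fuse([0.6, 0.3, 0.8], [0, 0, 0]))  # -> 0
```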
Table 6.3: Types of attacks detected by ALAD at a false positive rate of 0.00002 (100 FPs)

Attack Type   Total attacks   Attacks detected   % detection
Probe              37                9               24%
DoS                63               23               37%
R2L                53               31               59%
U2R/Data           37               15               31%
Total             190               78               41%
Table 6.4: Types of attacks detected by Snort at a false positive rate of 0.0002 (1000 FPs)

Attack Type   Total attacks   Attacks detected   % detection
Probe              37               15               41%
DoS                63               35               56%
R2L                53               30               57%
U2R/Data           37               34               92%
Total             190              115               61%
Table 6.5: Types of attacks detected by the DD fusion IDS at a false positive rate of 0.00002 (100 FPs)

Attack Type   Total attacks   Attacks detected   % detection
Probe              37               28               76%
DoS                63               40               64%
R2L                53               34               64%
U2R/Data           37               34               92%
Total             190              136               70%
Table 6.6: Comparison of the evaluated IDSs with various evaluation metrics

Detector                 P      R     F-score   Accuracy   AUC
PHAD                    0.39   0.33    0.36       0.99     0.66
ALAD                    0.44   0.41    0.42       0.99     0.71
Snort                   0.10   0.61    0.17       0.99     0.80
Data-dependent fusion   0.42   0.70    0.53       0.99     0.85
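The F-scores in Table 6.6 follow from precision and recall as their harmonic mean; a quick check with the tabulated values:

```python
def f_score(p, r):
    """Harmonic mean of precision and recall (the F1 score)."""
    return 2 * p * r / (p + r)

# precision/recall pairs taken from Table 6.6
for name, p, r in [("PHAD", 0.39, 0.33), ("ALAD", 0.44, 0.41),
                   ("Snort", 0.10, 0.61), ("DD fusion", 0.42, 0.70)]:
    print(f"{name}: F = {f_score(p, r):.2f}")
```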
Table 6.7: Detection of different attack types by single IDSs and the data-dependent decision fusion IDS

Attack Type       PHAD     ALAD     Snort    DD fusion
Probe              70%      24%      41%        76%
DoS                43%      37%      56%        64%
R2L                11%      59%      57%        64%
U2R/Data           11%      31%      92%        92%
False positive%   0.002%   0.002%   0.02%      0.002%
The performance curves of the individual IDSs and the DD fusion IDS are given in Figure 6.3. A log scale was used for the x-axis to separate points that would otherwise be crowded near the y-axis. The results presented in Table 6.7 and Figure 6.4 indicate that, on the DARPA data set, the DD fusion method performs significantly better, with high recall as well as high precision, as against achieving high accuracy alone. In the case of an IDS there are both security requirements and acceptability requirements. The security requirement is determined by the TP rate, and the acceptability requirement is decided by the number of FPs, because of the low base rate of attacks in network traffic. The hypothesis that the proposed model is suitable for the detection of rare classes of attacks is empirically evaluated in the next section using the DARPA 1999 data set. It may be noted that the false positive rate differs in the case of Snort: it was extremely difficult to make a fair comparison with equal false positive rates for all the IDSs, because the detection rates fall into unacceptable ranges under such circumstances.
Figure 6.2: Performance of the evaluated systems (PHAD, ALAD, Snort and DD fusion), with the false positives on a log-scale x-axis.
Table 6.8: Comparison of the evaluated IDSs using the real-world data set

Detector/Fusion Type             Total attacks   TP    FP    Precision   Recall   F-score
PHAD                                  45         11     45     0.2        0.24     0.22
ALAD                                  45         20     45     0.31       0.44     0.36
Snort                                 45         13    400     0.03       0.29     0.05
OR                                    45         34    470     0.07       0.76     0.13
AND                                   45          9     29     0.22       0.2      0.22
SVM                                   45         23     44     0.24       0.51     0.33
ANN                                   45         25     94     0.21       0.56     0.31
Data-dependent Decision Fusion        45         27     42     0.39       0.6      0.47
Table 6.9: Performance comparison of single IDSs and the DD fusion IDS

Z-number   DD fusion and PHAD   DD fusion and ALAD   DD fusion and Snort
Z_R               7.2                  5.7                  1.9
Z_P               0.3                 -0.2                  4.1
Table 6.8 demonstrates that the DD fusion method outperforms other existing fusion techniques such as OR, AND, SVM, and ANN on the real-world network traffic.
Comparing IDSs with the F-score alone has the limitation that tests of significance cannot be applied directly to it to determine the confidence level of the comparison. The primary goal of this work is to achieve improvement in recall as well as precision for the rare classes. Hence an improved IDS comparison test, the P-test [110], was included to take into account the improvement in both recall and precision. The result in Table 6.9 shows that the DD fusion performs significantly better than any of the individual IDSs. It performs better than PHAD and ALAD in terms of recall and is comparable to them in precision, and it works exceptionally better than Snort in terms of both precision and recall. Hence the proposed approach outperforms the existing state-of-the-art techniques of its class for an optimum performance in terms of both recall and precision.
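The exact form of the P-test in [110] is not reproduced here; as a hedged sketch, a standard two-proportion z-statistic on recall (detections out of the 190 attacks, Tables 6.2 and 6.5) yields a value of the same order as the Z_R numbers in Table 6.9. The function name and the pooled-variance form below are illustrative assumptions, not the thesis computation.

```python
from math import sqrt

def two_proportion_z(k1, n1, k2, n2):
    """Two-proportion z-statistic with a pooled standard error: a common
    significance test for comparing two detection rates."""
    p1, p2 = k1 / n1, k2 / n2
    p = (k1 + k2) / (n1 + n2)                     # pooled proportion
    se = sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

# DD fusion detected 136/190 attacks, PHAD 63/190 (Tables 6.5 and 6.2)
z = two_proportion_z(136, 190, 63, 190)
print(f"z = {z:.1f}")   # well above the 1.96 threshold at 95% confidence
```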
In a real-world network environment, rare attacks like U2R and R2L are more dangerous than probe and DoS attacks. Hence it is essential to improve the detection performance on these rare classes of attacks while maintaining a reasonable overall detection rate. The results presented in Table 6.7 and Figure 6.4 indicate that the proposed method performs significantly better for rare attack types, with a high recall as well as a high precision, as against achieving high accuracy alone. The claim that the proposed method performs better is supported by a statement from Kubat et al. [1998]: "a classifier that labels all regions as majority class will achieve an accuracy of 96% ... a system achieving 94% on the minority class and 94% on the majority class will have worse accuracy yet be deemed highly successful." With the proposed method, an intrusion detection rate of 70% with a false positive rate as low as 0.002% has been achieved, and the F-score has been improved to 0.53.
6.4.5 Discussion
Most of the U2R attacks, like loadmodule, perl and sqlattack, are made stealthy by running the attack over multiple sessions. These attacks are detected by Snort, but at the expense of a higher false positive rate. Snort was comparatively better in the detection of all the new attacks, like queso, arppoison, dosnuke, selfping, tcpreset, ncftp, netbus, netcat, sshtrojan, ntfsdos and sechole, for which Snort had signatures available. Snort identifies the warezclient attack, which downloads illegal copies of software, through an added rule that looks for executable code on the FTP port. Thus each rule in the rule set uses the most discriminating feature values for classifying a data item into one of the class types.
Although the research discussed in this thesis has thus far focused on three IDSs, namely PHAD, ALAD and Snort, the algorithm works well with any IDS. The proposed system provides great benefit to a security analyst. The computational complexity introduced by the proposed method is justified by the gains illustrated. The result of the data-dependent decision fusion method is better than what the Lincoln Laboratory predicted after the DARPA IDS evaluation.
With the fusion architecture proposed in this chapter, an improved intrusion detection rate of 70%, with a false positive rate as low as 0.002% and an F-score of 0.53, was achieved.
6.5 Summary
We have adapted and extended notions from the field of multisensor data fusion for the rule-based fusion and the data-dependent decision fusion. An enhancement in the performance of the combined detector using simple rule-based fusion was demonstrated in the initial part of this chapter, with fusion making use of the objective certainty of a hypothesis given a particular sensor as the component in fusion. The extensions are principally in the area of generalizing feature similarity functions to comprehend observations in the intrusion detection domain. The approach has the ability to fuse decisions from multiple, heterogeneous and sub-optimal IDSs.
In the proposed data-dependent decision fusion architecture, a neural network unit was used to generate a weight factor depending on the input as well as on the IDS outputs. The method assigns appropriate weights to the individual IDSs that take part in fusion. This results in more accurate and precise detection for a wider class of attacks. When the individual sensors are complementary and look at different regions of the attack domain, the DD fusion enriches the analysis of the incoming traffic to detect attacks with appreciably few false alarms. The individual IDSs that are components of this architecture in this particular work were PHAD, ALAD and Snort, with detection rates of 0.33, 0.41 and 0.61 respectively after modifications to these IDSs. The false positive rates of PHAD and ALAD were acceptable, whereas that of Snort was exceptionally high. The results obtained by the proposed architecture illustrate that the DD approach improves beyond the existing fusion approaches, with the best performance in terms of accuracy. The marginal increase in the computational requirement introduced by the data-dependency is justified by the acceptable range of false alarms and an overall detection rate of 0.7, obtained with an exceptionally large data set and suboptimal constituent IDSs. It is also shown that our technique is more flexible and also outperforms other
existing fusion techniques such as OR, AND, SVM, and ANN. The experimental comparison of the DD fusion method using the real-world traffic has
confirmed its usefulness and significance.
The data skewness in network traffic demands an extremely low false positive rate, of the order of the prior probability of attack, for an acceptable value of the Bayesian attack detection rate. The research and development efforts in the field of IDSs, and the state-of-the-art IDSs, still show marginal detection rates and high false positive rates, especially in the case of stealthy, novel and R2L attacks. In the environment in which an IDS is expected to operate, attacks are the minority, requiring very low false positive rates for acceptable detection. Basic domain knowledge about network intrusions tells us that U2R and R2L attacks are intrinsically rare. The poor performance of the detectors has been improved by discriminative training of the anomaly detectors and by incorporating additional rules into the misuse detector. This chapter proposes a new machine learning approach in which the learning problem is characterized by a number of features, skewness in the data, the class of interest being the minority class and the minority attack type, and a non-uniform misclassification cost. The proposed method has successfully demonstrated that the neural network learner encapsulates expert knowledge for the weighted fusion of individual detector decisions. This creates an adaptable algorithm that can substantially outperform state-of-the-art methods for minority class detection in both coverage and precision. The evaluations show the strength and ability of the proposed approach to perform very well, with 64% detection for R2L attacks and 92% detection for U2R attacks at an overall false positive rate of 0.002%. The experimental comparison of this method has confirmed its usefulness and significance.
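The base-rate argument above can be made concrete with Bayes' rule: the posterior probability that an alarm corresponds to a real attack collapses unless the false positive rate is of the order of the attack prior. The prior used below is an illustrative assumption.

```python
def bayesian_detection_rate(prior, tpr, fpr):
    """P(intrusion | alarm) by Bayes' rule."""
    return (tpr * prior) / (tpr * prior + fpr * (1 - prior))

# illustrative prior: 1 connection in 100,000 is an attack (assumption)
prior = 1e-5
# detection rate 70% and false positive rate 0.002% = 2e-5, as achieved above
print(bayesian_detection_rate(prior, tpr=0.70, fpr=0.00002))
```

Even at these operating points, only about a quarter of alarms correspond to attacks, which is why the false positive rate must track the prior.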
Chapter 7
Modified Dempster-Shafer Theory for
Intrusion Detection Systems
A consensus means that everyone agrees to say collectively what no one believes individually.
Abba Eban
7.1 Introduction
Sensor fusion using heterogeneous IDSs is employed to aggregate different views of the same event. This helps in achieving improved detection through detector reinforcement or complementarity. There is a factor of uncertainty in the results of most of the IDSs available in the literature; the main reasons for this uncertainty are vagueness and imprecision. One of the techniques of sensor fusion
is the Dempster-Shafer evidence theory [112, 113, 114], which can be used to
characterize and model various forms of uncertainty. In DS theory, evidence
can be associated with multiple possible events, e.g., sets of events. As a result,
evidence in DS theory can be meaningful at a higher level of abstraction without
having to resort to assumptions about the events within the evidential set.
The use of data fusion in the field of DoS anomaly detection is presented by
Siaterlis and Maglaris [79]. The Dempster-Shafer theory of evidence is used
as the mathematical foundation for the development of a novel DoS detection
engine. The detection engine is evaluated using the real network traffic. The
superiority of data fusion technology applied to intrusion detection systems is
presented in the work of Wang et al. [96], where information is collected from network and host agents and the Dempster-Shafer theory of evidence is applied. Another work incorporating the Dempster-Shafer theory of evidence is by Hu et al. [97]. The Dempster-Shafer theory of evidence in data fusion is observed to solve the problem of analyzing uncertainty in a quantitative way. In that evaluation, the ratio of ingoing to outgoing traffic and the service rate are selected as the detection metrics, and prior knowledge of the DDoS domain is used to assign probability to evidence.
The most prominent of the alternative combination rules to the Dempster-Shafer
method is a class of unbiased operators developed by Ron Yager [115]. Yager
points out that an important feature of combination rules is the ability to update
an already combined structure when new information becomes available. This
is frequently referred to as updating and the algebraic property that facilitates
this is associativity. Dempster's rule is an example of an associative combination operation, and the order of the information does not impact the resulting fused structure. Yager [116] points out that in many cases a non-associative operator is necessary for combination. A familiar example of this is the arithmetic average, which is not itself associative: one cannot update the information by averaging an average of a given body of data with a new data point and obtain a meaningful result. However, the arithmetic average can be updated by adding the new data point to the sum of the pre-existing data points and dividing by the total number of data points. This is the concept of a quasi-associative operator that Yager introduced in his work [116]. Quasi-associativity means that
the operator can be broken down into associative sub-operations. Through the
notion of quasi-associative operator, Yager develops a general framework to
look at combination rules where associative operators are a proper subset.
The Transferable Belief Model (TBM) is an elaboration on the Dempster-Shafer
theory of evidence developed by Smets [117], based on the intuition that in the
case of conflict, the result should allocate most of the belief weight to the empty
set. Technically, this would be done by using the TBM conjunction rule for
non-interactive sources of information, which is the same as Dempster's rule
of combination without renormalization. While most other theories adhere to
Chapter 7
136
the axiom that the probability (or belief mass) of the empty set is always zero,
Smets has an intuitive reason to drop this axiom: the open-world assumption. It
applies when the frame of reference is not exhaustive, when there are reasons
to believe that an event not described in the frame of reference will occur.
Murphy [118] presents another problem of the classical Dempster's combination rule: its failure to balance multiple bodies of evidence. Averaging best solves the normalization problems and has many attractive features, such as identifying combination problems, showing the distribution of the beliefs and preserving a record of ignorance. However, averaging does not offer convergence toward certainty. The fusion technique proposed in this chapter is expected to combine the output of various IDSs with subjective judgements. The feasibility of this idea has been demonstrated via an analysis case study with several IDSs distributed over a LAN and using the replayed DARPA data set. The aim was a sensor fusion architecture that faces the new challenges in sensor fusion; it can later be generalized into a solution beyond any specific application.
This chapter is organized as follows. Section 7.2 briefly recalls the Dempster-Shafer theory of evidence, its weighted extension and its disadvantages. Section 7.3 illustrates the disjunctive combination of evidence, which helps in evidence aggregation. Section 7.4 discusses the modified evidence approach, with a more detailed observation of the performance of this approach; a general discussion of the proposed approach for the particular application of sensor fusion in intrusion detection is also included. Section 7.5 presents the experimental evaluation with a brief discussion of the impact of this work. Section 7.6 summarizes the chapter.
of the traffic sample x, denoting that the traffic sample comes from a class that is an element of the FoD, Θ. With n IDSs used for the combination, the decision of each of the IDSs is considered for the final decision of the fusion IDS.
7.2.1 Motivation for choosing the Dempster-Shafer combination method
Even though the research work discussed in chapter five gave encouraging results, we realized that there was no possibility of detecting novel attacks, because of the difficulty in generalizing from any previously observed behavior. As a result, we pursued the work further by using a neural network learner that understands the reliability of each of the IDSs corresponding to the data, accordingly provides a weight to every IDS decision, and then makes use of an appropriate fusion operator. Specifically, we are interested in the capability of the neural network to learn the confidence to be assigned to every IDS, and in a fusion unit that optimally fuses the IDSs.
This thesis presents a method to detect traffic attacks with an increased degree of confidence using a fusion system composed of different detectors. Each IDS observes the same network traffic from various points on the network and detects the attack traffic with an uncertainty index. The frame of discernment consists of singletons that are exclusive (A_i ∩ A_j = ∅, i ≠ j) and exhaustive, since the FoD consists of all the expected attacks which the individual IDSs detect, or else a detector fails by recognizing the attack as normal traffic. The DS rule corresponds to a conjunction operator: it builds the belief induced by accepting two pieces of evidence, i.e., by accepting their conjunction. Shafer developed the DS theory of evidence based on the model that all the hypotheses in the FoD are exclusive and the frame is exhaustive. The purpose is to combine/aggregate several independent and equi-reliable sources of evidence expressing their belief on the set. The DS combination rule gives
the combined mass of the two pieces of evidence m1 and m2 on any subset A of the FoD:

m(A) = [ Σ_{X ∩ Y = A} m1(X) m2(Y) ] / [ 1 − Σ_{X ∩ Y = ∅} m1(X) m2(Y) ]        (7.1)
The denominator of equation 7.1 is of the form 1 − k, where k is the conflict between the two pieces of evidence. This denominator performs normalization, which spreads the resultant uncertainty of any evidence, with a weight factor, over all focal elements and results in an intuitive decision. Thus, the effect of normalization is to eliminate the conflicting pieces of information between the two sources to combine, consistently with the intersection operator. Whether normalized or not, the DS method satisfies the two axioms of combination, 0 ≤ m(A) ≤ 1 and Σ_{A ⊆ Θ} m(A) = 1; the axiom m(∅) = 0 is not satisfied by the unnormalized DS method. Also, independence of evidence is yet another requirement of the DS combination method. The classical DS theory treats all sensors democratically, but in reality some are more precise and accurate than others. Hence we have the weighted Dempster-Shafer method. In simplified situations this weight factor matches the prior probability in the classical Bayesian inference method. The weighted and extended DS can be used to:
realize a differential trust scheme on sensors, and
mitigate conflicts that cause counter-intuitive results under the classical DS evidence combination rule.
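Equation 7.1 can be sketched directly, representing each body of evidence as a map from focal sets (frozensets of hypotheses) to masses; the identifiers below are illustrative.

```python
def dempster_combine(m1, m2):
    """Dempster's rule (Eq. 7.1): conjunctive combination of two mass
    functions, normalized by 1 - k, where k is the conflict mass."""
    combined, k = {}, 0.0
    for X, mx in m1.items():
        for Y, my in m2.items():
            Z = X & Y
            if Z:
                combined[Z] = combined.get(Z, 0.0) + mx * my
            else:
                k += mx * my        # mass falling on empty intersections
    if k == 1.0:
        raise ValueError("total conflict: the rule is undefined")
    return {A: v / (1.0 - k) for A, v in combined.items()}

# illustrative two-class frame: attack vs. normal
A, B = frozenset({"attack"}), frozenset({"normal"})
theta = A | B
m1 = {A: 0.6, theta: 0.4}
m2 = {A: 0.5, B: 0.3, theta: 0.2}
print(dempster_combine(m1, m2))
```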
7.2.2 Limitations of the Dempster-Shafer combination
In the case of full contradiction between the bodies of evidence, k = 1; such a case occurs when there exists A such that Bel1(A) = 1 and Bel2(Ā) = 1, as in Table 7.1. The computation of the combined evidence is shown for the DS
Table 7.1: Two totally conflicting bodies of evidence

       A   B
m1     1   0
m2     0   1
method and its alternatives: Yager's method [115, 116], Smets' TBM [117], and Murphy's averaging [118].

DS method: m(A) = 0, m(B) = 0
Yager's method: m(A) = 0, m(B) = 0, m(Θ) = 1
Smets' TBM: m(A) = 0, m(B) = 0, m(∅) = 1
Murphy's averaging: m(A) = 0.5, m(B) = 0.5
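The behavior of Yager's rule and of Murphy's averaging on the Table 7.1 example can be reproduced with a short sketch (the set representation and names are assumptions):

```python
FULL = frozenset({"A", "B"})        # the frame of discernment

def yager_combine(m1, m2):
    """Yager's rule: conflicting mass is assigned to the whole frame
    instead of being normalized away."""
    out, k = {}, 0.0
    for X, mx in m1.items():
        for Y, my in m2.items():
            Z = X & Y
            if Z:
                out[Z] = out.get(Z, 0.0) + mx * my
            else:
                k += mx * my
    out[FULL] = out.get(FULL, 0.0) + k
    return out

def murphy_average(m1, m2):
    """Murphy's averaging of the two bodies of evidence."""
    keys = set(m1) | set(m2)
    return {X: (m1.get(X, 0.0) + m2.get(X, 0.0)) / 2 for X in keys}

A, B = frozenset({"A"}), frozenset({"B"})
m1, m2 = {A: 1.0}, {B: 1.0}          # total conflict, as in Table 7.1
print(yager_combine(m1, m2))          # all mass lands on the frame
print(murphy_average(m1, m2))         # 0.5 / 0.5, as in the text
```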
One more case of contradiction between the bodies of evidence is shown with a
different example in Table 7.2.
Table 7.2: A second case of conflicting bodies of evidence

        A     B     C
m1     0.9   0.1    0
m2      0    0.1   0.9
Table 7.3: Evidence from three sensors, with one unreliable, using the DS method

              Normal   Probe   DoS   U2R   R2L
PHAD (m1)      0.4      0.6     0     0     0
ALAD (m2)      0.1      0.1     0     0    0.8
Snort (m3)      0       0.3     0     0    0.7
Another major drawback of DS and its alternatives, except Murphy's, is that, since they use conjunctive combination, if one or more sensors fail to give evidence on a particular class, the evidence from the other sensors on that class has no effect and the intersection becomes a null set. A sensor might fail to give evidence when it is not tuned for that particular class of attack, due to the shortcomings of the technology used or for some other reason. This disadvantage is overcome by Murphy's averaging, but that result also looks counter-intuitive, since if one piece of evidence fails, the belief in that hypothesis gets weakened. The DS combination with one of the sensors in the fusion being totally unreliable gives rise to counter-intuitive results, as illustrated with an example in Table 7.3.
DS method: m(Normal) = 0, m(Probe) = 1, m(DoS) = 0, m(U2R) = 0, m(R2L) = 0
Yager's method: m(Normal) = 0, m(Probe) = 0.018, m(DoS) = 0, m(U2R) = 0, m(R2L) = 0, m(Θ) = 0.982
Smets' TBM: m(Normal) = 0, m(Probe) = 0.018, m(DoS) = 0, m(U2R) = 0, m(R2L) = 0, m(∅) = 0.982
Murphy's averaging: m(Normal) = 0.17, m(Probe) = 0.33, m(DoS) = 0, m(U2R) = 0, m(R2L) = 0.5
In this particular case, with the evidence m1 unreliable, the result turns out to be counter-intuitive when an equal weight factor is given to all the evidence. If a proper weight factor is given to each piece of evidence depending on its reliability, Murphy's method gives acceptable results, as in the modified averaging method.
We propose to use the union operator for aggregating the evidence in all the above cases where DS fails, because in case of conflict further evidence aids convergence. Also, in case of zero evidence from any one or more of
the sensors, the union operator works the same as the averaging operator, which is the best that can be done. Thus, if the intersection of the evidence is not empty, the sources overlap and the combination rule can be intersection. Otherwise at least one of the sources is necessarily wrong, and a more natural combination rule is union, which assumes only that not all the sources are wrong.
m(A) = [ Σ_{X ∪ Y = A} (m1(X) + m2(Y)) ] / [ Σ_{X, Y} (m1(X) + m2(Y)) ]        (7.2)
The numerator of the above equation relates to the disjunctive combination, and the final mass is calculated by normalization with respect to the entire power set 2^Θ, which is closed under union, intersection and complement and hence is a sigma algebra. This normalization allows the disjunctive combination equation to satisfy all the axioms of the evidence theory.
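One reading of equation 7.2, with the sum-based disjunctive aggregation and final normalization described above, can be sketched as follows; the representation and names are assumptions.

```python
def disjunctive_combine(m1, m2):
    """Sum-based disjunctive aggregation (one reading of Eq. 7.2):
    evidence meets on unions of focal sets and is normalized at the end,
    so zero evidence from one source cannot annihilate the other's."""
    out = {}
    for X, mx in m1.items():
        for Y, my in m2.items():
            U = X | Y
            out[U] = out.get(U, 0.0) + mx + my
    total = sum(out.values())
    return {A: v / total for A, v in out.items()}

A, B = frozenset({"Probe"}), frozenset({"R2L"})
m1 = {A: 0.6, B: 0.4}
m2 = {A: 0.0, B: 1.0}     # m2 offers no evidence at all on Probe
print(disjunctive_combine(m1, m2))
```

Unlike the conjunctive rule, the missing Probe evidence from m2 does not zero out m1's Probe belief; it merely dilutes it.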
Additionally, conflicts can be thought of as due to uncertainty, whereby the IDS cannot take the decision correctly and the collective information will also be ambiguous. Hence, only if the reliability of the sensors is known can a conclusion be drawn by suppressing some evidence in favour of others; otherwise it is better to aggregate all the evidence and combine it conjunctively with other evidence agreeable to the aggregated evidence. Normalization is deferred until the conjunctive combination is done, so that the results converge.
The properties of associativity and commutativity are satisfied by a disjunctive combination if normalization is done only at the final combination stage. In the intrusion detection application, with singletons as the expected hypotheses, Bel(A ∪ B) = Bel(A) + Bel(B), which is the same as the Bayesian method, since in the case of singletons DS simplifies to the Bayesian approach. However, the advantage of evidence combination is that more evidence can be combined in a single step, without knowledge of the associated probability distribution.
The context-dependent operator is expected to have an adaptive feature, combining the information related to one class in one way, and the information related to another class in another way.
The proposed hybrid operator works the same way as the DS operator except in the case of conflict, when any belief mass happens to be zero, and when varying reliability needs to be introduced for the different sensors. The combined operator has a mass l, referred to as the modified mass since it can exceed one at intermediate stages due to para-consistency, and is given by:
l = [ Σ_{i=1}^{n} w_i l_i ] / (1 − k) + Π_{i=1}^{n} w_i l_i ,   if k ≠ 0 and all l_i ≠ 0
l = [ Σ_{i=1}^{n} w_i l_i ] / (1 − k) ,                          if k = 0 or any l_i = 0        (7.3)
where w_i is the weight associated with each sensor, k is the conflict between the combining sensors, and l_i is the mass associated with each sensor. The conditions and requirements for using this operator are the following:
- The proportionate sensor weighting factor is used, since the intrusion detection systems used for the combination are binary in nature. The axiom of combination, i.e., Σ_{A⊆Θ} m(A) = 1, is preserved when the weighting factors are made use of.
- The weights assigned to the sensors should add to one, i.e., Σ_{i=1}^{n} w_i = 1.
The value of k lies between 0 and 1 and is the parameter that controls
the degree of compensation between the intersection and union parts. The
value of the conflict factor (k) between any two sensors can be calculated
as the Euclidean distance between the two sensors. k takes a value of zero
if the sensors are in consensus and non-zero in case of conflict.
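The conflict factor described above can be sketched as follows. The 1/√2 scaling is our assumption, added only so that k stays within the stated [0, 1] range; the mass vectors are illustrative:

```python
import math

CLASSES = ["Normal", "Probe", "DoS", "U2R", "R2L"]

def conflict_factor(m1, m2, classes=CLASSES):
    """Conflict k as the Euclidean distance between the two mass vectors.
    Two disjoint certain masses are sqrt(2) apart, so dividing by sqrt(2)
    keeps k within [0, 1] (this scaling is our assumption)."""
    d = math.sqrt(sum((m1[c] - m2[c]) ** 2 for c in classes))
    return d / math.sqrt(2)

agree = {"Normal": 0.0, "Probe": 1.0, "DoS": 0.0, "U2R": 0.0, "R2L": 0.0}
clash = {"Normal": 0.0, "Probe": 0.0, "DoS": 0.0, "U2R": 0.0, "R2L": 1.0}

print(conflict_factor(agree, agree))  # 0.0: sensors in consensus
print(conflict_factor(agree, clash))  # 1.0: total conflict
```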
The method proposed with this operator gives the most intuitive result and works as follows:
- Disjunctive combination is done on diverse pairs of IDSs (it averages out without suppressing any evidence). Pair-wise disjunctive combination is done on all the IDSs that are not redundant, since in case of redundancy it is more intuitive to strengthen the evidence than to average it out, which gives no additional support even though both IDSs support it.
- The results are then conjunctively combined, if not totally contradicting after the pair-wise aggregation, since at this stage suppression of evidence will not happen to a great extent, and suppression of certain weak evidence helps in faster convergence.
- In the case of redundancy, if we use disjunctive combination, it is required to work without normalization in all the intermediate combinations, for the sake of making the strong evidence still stronger.
- In order to satisfy all axioms of evidence theory, it is required to do normalization at the final stage, so that all the masses of a particular piece of evidence sum to one.
- It can be concluded that the proposed method combines IDSs reasonably well under all conditions, by disjunctively combining diverse or contradicting pairs and finally suppressing the weak hypotheses by a pair-wise conjunctive combination.
- In the very specific case of binary evidence on singletons, it can be observed that no additional support results from the addition of redundant evidence. Hence, it is better to use disjunctive combination in all the intermediate combinations till the final step, where the conjunctive combination helps in a faster convergence.
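The steps above can be sketched for the singleton case. This is our simplified construction (unnormalized addition for the union step, pointwise product for the intersection step, normalization only at the end), not the thesis's exact operator:

```python
from itertools import combinations

CLASSES = ["Normal", "Probe", "DoS", "U2R", "R2L"]

def disjunctive(ma, mb):
    # For singletons, the union step reduces to unnormalized addition
    # (averaging up to a constant factor), so no evidence is suppressed.
    return {c: ma[c] + mb[c] for c in CLASSES}

def conjunctive(ma, mb):
    # Unnormalized pointwise product: intersection of singleton hypotheses.
    return {c: ma[c] * mb[c] for c in CLASSES}

def hybrid_fusion(sensor_masses):
    """Pair-wise disjunctive aggregation of the sensors, followed by a
    conjunctive combination, normalizing only at the final stage."""
    pairs = [disjunctive(a, b) for a, b in combinations(sensor_masses, 2)]
    out = pairs[0]
    for p in pairs[1:]:
        out = conjunctive(out, p)
    total = sum(out.values())
    return {c: v / total for c, v in out.items()} if total else out

m1 = {"Normal": 0.4, "Probe": 0.6, "DoS": 0.0, "U2R": 0.0, "R2L": 0.0}
m2 = {"Normal": 0.1, "Probe": 0.1, "DoS": 0.0, "U2R": 0.0, "R2L": 0.8}
m3 = {"Normal": 0.0, "Probe": 0.3, "DoS": 0.0, "U2R": 0.0, "R2L": 0.7}

fused = hybrid_fusion([m1, m2, m3])
print(fused)  # R2L dominates, the intuitive outcome for this evidence
```

Unlike the plain DS product, the zero entries of the unreliable sensor do not annihilate hypotheses that the other sensors jointly support.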
Σ_{A⊆Θ} m(A) = 1, m(∅) = 0, and 0 ≤ m(A) ≤ 1. Also, independence of the combining sources is another assumption for the applicability of the combination. Most of the commercial IDSs are signature-based, and fail to identify zero-day attacks while working with real-time traffic. In such cases, misclassification or a False Negative (which is again a misclassification, since the FoD contains the hypothesis normal, which is an expected output of the IDS) is expected. Hence the combination operator can adopt a closed-world assumption, as in the case of the DS method.
The same example which was used to illustrate the performance of the DS and other operators is taken again to illustrate the performance of the context-dependent operator, applying the context-dependent method with aggregation.
          PHAD (m1)   ALAD (m2)   Snort (m3)
Normal       0.4         0.1         0
Probe        0.6         0.1         0.3
DoS          0           0           0
U2R          0           0           0
R2L          0           0.8         0.7

Table 7.4: Evidence from four sensors with one unreliable using the context-dependent operator
- This operator can combine evidence from two IDSs with different FoDs; the combined FoD will then be the union of the FoDs.
- The closure property is satisfied, so as to stay within a given mathematical framework.
- This operator works under all conditions and states of the individual IDSs.
- This operator has been developed quite intuitively, and hence gives the most intuitive result. The conjunctive operator is acceptable when all sources happen to be reliable and similar, whereas the union operator corresponds to data aggregation from sources of weaker reliability. Thus, conjunctive combination makes sense when the mass distributions overlap significantly; if not, at least one of the sources is wrong and it is better to choose disjunctive combination. Also, our intuition was that a certain diversity among classifiers assures versatility, whereas a certain redundancy assures reliability.
- The combination operation is simple and easy.
- The choice of the operator function depends on the context and hence has to be made carefully.
- Sources need to be independent for the combination. Dempster's idea on the combination of independent sources of information can be stated as follows: suppose there are n pieces of evidence which are given in the form of n probability spaces (Ω_i, Γ_i, m_i), where Γ_i is a subset of P(Θ), the power set of the FoD, each of which has a mapping relation with the same space S through a multivalued mapping. These n sources are independent, and the explanation by Dempster is as follows: opinions of different sensors based on overlapping experiences could not be regarded as independent sources. Dempster assumes statistical independence of sources, as different measurements by different observations on different equipment would often be regarded as independent ...
- For a parallel combination of any model, the basic requirement is that the combination should be associative.
7.4.2 Discussion
1. The selection of the IDSs has been done by choosing the sensors with minimum correlation among them. The correlation coefficient ρ_n of the available sensors is given by the formula:
ρ_n = n N_f / (N − N_t − N_f + n N_f),
where n is the number of sensors, N is the number of experiments, N_f is the number of experiments where all classifiers fail to detect, and N_t is the number of experiments where all IDSs detect correctly. We refer to redundancy of classifiers when the correlation coefficient is one, similarity when the correlation coefficient is greater than 0.5, diversity of classifiers when the correlation coefficient is less than or equal to 0.5, and totally contradicting classifiers when the correlation coefficient is zero.
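Assuming the reconstructed formula above, the stated endpoints (one for redundancy, zero for totally contradicting detectors) can be checked with a short sketch; the function name is ours:

```python
def correlation_coefficient(n, N, Nf, Nt):
    """Correlation rho_n among n detectors over N experiments, where Nf
    experiments have every detector failing and Nt have every detector
    succeeding. The closed form is a reconstruction of the garbled
    thesis formula; it reproduces the endpoints stated in the text."""
    return n * Nf / (N - Nt - Nf + n * Nf)

# Redundant detectors: every experiment is all-fail or all-succeed -> 1.
print(correlation_coefficient(3, 100, 40, 60))
# Totally contradicting detectors never all fail together -> 0.
print(correlation_coefficient(3, 100, 0, 20))
```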
2. It is quite intuitive to expect that the fusion method should work with a minimum number of IDSs and still obtain the advantages of fusion, whatever fusion technique is used. Every good IDS, when merged, should improve the confidence of the existing evidence and thereby converge faster; i.e., the fusion method makes strong evidence stronger, so that confusion (uncertainty) is eliminated.
3. There are inherent advantages in using the best IDSs in the fusion scheme. This also makes sense intuitively in the case of the evidence theory of fusion: if one sensor gives its evidence and a second gives similar evidence, the belief is reinforced, whereas contradiction in the evidence reduces the belief.
4. The DS method of combination implicitly makes a closed-world assumption, i.e., the set of possible hypotheses is perfectly known. We assume that Θ represents a set of states that are mutually exclusive and exhaustive. In intrusion detection, when dealing with real-time traffic, Θ need not necessarily be exhaustive, since the traffic may contain many novel attacks not included in Θ. However, in such cases the intrusion detection systems may be unable to detect them, and hence they appear as normal, which is also a hypothesis included in the FoD. Hence an additional label denoting none of the above is not needed, because such a novel attack either gets included in the hypothesis normal by the evidence, or else gets misclassified as some other attack.
5. The sources are assumed to be independent with DS (m(A) ∝ m1(A) m2(A)). Even though we may use IDSs that are trained from the same training set, the two are independent of each other, while both depend on the training data set. Decisions acquired from multiple IDSs are more likely to be independent when they look at entirely different features of the traffic. If instead the two IDSs make use of the same features for detection, the detectors may give a consensus in their decision, which is different from dependence between the sensors. When fusing by means of mathematical decision rules, it is desirable to have independent detectors, because this simplifies construction of the rule and enhances its efficiency.
6. It is important to note that when we apply the union operator, (m(A) =
            PHAD   ALAD   Snort   Fusion output
Normal        1      0      0          0
Probe         0      0      0          0
DoS           0      0      0          0
R2L           0      1      1          1
U2R/Data      0      0      0          0

            PHAD   ALAD   Snort   Fusion output
Normal        0      1      1          0
Probe         1      0      0          1
DoS           0      0      0          0
R2L           0      0      0          0
U2R/Data      0      0      0          0
Attack type   Total attacks   Attacks detected   % detection
Probe              37               26               70%
DoS                63               27               43%
R2L                53                6               11%
U2R/Data           37                4               11%
Total             190               63               33%

Attack type   Total attacks   Attacks detected   % detection
Probe              37                9               24%
DoS                63               23               37%
R2L                53               31               59%
U2R/Data           37               15               31%
Total             190               78               41%

Attack type   Total attacks   Attacks detected   % detection
Probe              37               15               41%
DoS                63               35               56%
R2L                53               30               57%
U2R/Data           37               34               92%
Total             190              115               61%
Attack type   Total attacks   Attacks detected   % detection
Probe              37               31               84%
DoS                63               44               70%
R2L                53               34               64%
U2R/Data           37               34               92%
Total             190              143               75%
Table 7.10: Type of attacks detected by context-dependent fusion at 100 false alarms
Detector/Fusion type             Total attacks   TP    FP    Precision   Recall   F-score
PHAD                                   45        11     45     0.20       0.24     0.22
ALAD                                   45        20     45     0.31       0.44     0.36
Snort                                  45        13    400     0.03       0.29     0.05
OR                                     45        34    470     0.07       0.76     0.13
AND                                    45         9     29     0.24       0.2      0.22
Data-dependent Decision Fusion         45        31     39     0.44       0.69     0.54
Table 7.11: Comparison of the evaluated IDSs using the real-world data set
The feasibility of this idea was demonstrated via an analysis case study with several IDSs distributed over a LAN and using the replayed DARPA data set. The technique gives a performance better than any of the individual intrusion detection systems that were fused. Even though it was validated for a particular application, it should be a generalizable solution beyond any specific application case.
7.6 Summary
Different IDSs have different detection rates and false alarm rates, and these may be complementary, competitive or cooperative. Sensor fusion is about combining multiple sensor outputs to reveal the best truth regarding the objects of interest in terms of practical utility. The context-dependent operator proposed in this chapter was demonstrated to be feasible for sensor fusion. The research in this thesis has improved over the existing DS alternatives in that it can better handle uncertainty and ambiguity in the sensed context. The individual IDSs that are components of this architecture in this particular work were PHAD, ALAD and Snort, with detection rates of 0.33, 0.41 and 0.61 respectively. The false alarm rates of PHAD and ALAD were acceptable, whereas that of Snort was exceptionally high. Our algorithm has resulted in acceptable ranges of false alarms and a significant improvement in detection rate for all types of attacks, with an overall detection rate of 0.75, obtained with an exceptionally large data set and suboptimal constituent intrusion detection systems. The detection rate for the real-world network traffic has improved to 0.69. The F-score has improved to 0.66 and 0.54 for the DARPA data set and
the real-world traffic respectively. The evaluations show the strength and ability of the data-dependent decision fusion approach using the modified evidence theory to perform very well for the real-world network traffic as well as for the DARPA data set. It is also shown that our technique is more flexible and outperforms other existing fusion techniques such as OR and AND. The experimental comparison of this method using the real-world traffic has confirmed its usefulness and significance.
The experiments in this work used only three IDSs. It is possible that the use of more sensors will lead to a further performance improvement of the fusion IDS. Also, the context-dependent operator can provide a generalizable solution for a wide range of applications. This supports the claim that a synergistic interaction between sensor fusion and intrusion detection facilitates improved detection.
Chapter 8
Modeling of Intrusion Detection Systems
and Sensor Fusion
I find that the harder I work, the more luck I seem to have.
Thomas Jefferson
8.1 Introduction
This chapter addresses the problem of optimizing the performance of intrusion detection systems using sensor fusion. Considering the utility of sensor fusion for improved sensitivity and false alarm reduction as demonstrated in the earlier chapters, we have explored the general problem of deciding the threshold for differentiating malicious traffic from normal traffic, and of modeling the individual components of the proposed sensor fusion architecture. In the proposed method, the performance optimization of the individual IDSs is addressed first. A neural network supervised learner has been designed to determine the weights of the individual IDSs, which incorporates data-dependency in the architecture. A sensor fusion unit performing the weighted aggregation in order to make an appropriate decision forms the final stage of the data-dependent decision fusion architecture. This chapter theoretically models the fusion of intrusion detection systems for the purpose of demonstrating the improvement in performance, in order to supplement the empirical evaluation in the previous two chapters.
The remainder of this chapter is organized as follows. In section 8.2, the motivation for this chapter is discussed. In section 8.3, the model of the proposed DD fusion architecture is presented by modeling its constituent parts. Algorithms for optimizing the local detectors, along with a data-dependent decision fusion architecture for optimizing the fusion criterion, are also presented in section 8.3. Finally, the concluding comments are presented in section 8.4.
8.2 Motivation
This chapter builds on the observation that there exist more effective means of analyzing the information provided by existing IDSs using sensor fusion, resulting in an effective data refinement for knowledge recovery. The improved performance of the DD fusion architecture is shown experimentally, with an approach adopted for optimizing both the local sensors and the fusion unit with respect to the error rate. The optimal performance, along with the complexity of the task, brings to the fore the need for a theoretically sound basis for sensor fusion techniques in IDSs. The theoretical analysis of the improved performance of the architecture has been done in chapter 4.
The motivation for the present work is the fact that the empirical evaluation seen in the previous chapter was extremely promising for DD fusion. Modeling can be extremely useful for a complete treatment of the problem with sound mathematical and logical concepts, as introduced in chapter 4. Thus the present work employs modeling to augment the mathematical analysis of the improved performance of sensor fusion and to develop a rational basis which is independent of the particular techniques used.
The proposed data-dependent decision fusion architecture consists of three stages: optimizing the individual IDSs as the first stage, the neural network learner determining the weights of the individual IDSs as the second stage, and the fusion unit performing the weighted aggregation as the final stage.
8.3.1 Modeling of Intrusion Detection Systems
Consider an IDS that either monitors the network traffic connections on the network or the audit trails on the host. The network traffic connections or the audit trails monitored are given as x ∈ X, where X is the entire domain of network traffic features or audit trails, respectively. The model is based on the hypothesis that security violations can be detected by monitoring the network for traffic connections of malicious intent in the case of a network-based IDS, and a system's audit records for abnormal patterns of system usage in the case of a host-based IDS. The model is independent of any particular operating system, application, system vulnerability or type of intrusion, thereby providing a framework for a general-purpose IDS.
When making an attack detection, a connection pattern is given by x_j ∈ R^{jk}, where j is the number of features from k consecutive samples used as input to an IDS. As seen in the DARPA data set, for many of the features the distributions are difficult to describe parametrically, as they may be multi-modal or very heavy-tailed. These highly non-Gaussian distributions led us to investigate non-parametric statistical tests as a method of intrusion detection in the initial phase of IDS development. The detection of an attack in the event x is
observed as an alert. In the case of a network-based IDS, the elements of x can be the fields of the network traffic, such as the raw IP packets, pre-processed basic attributes like the duration of a connection, the protocol type and the service, or specific attributes selected with domain knowledge, such as the number of failed logins or whether a superuser command was attempted. In a host-based IDS, x can be the sequence of system calls, the sequence of user commands, connection attempts to the local host, the proportion of accesses in terms of TCP or UDP packets to a given port of a machine over a fixed period of time, etc. Thus an IDS can be defined as a function that maps the data input into a normal or an attack event, either by the absence of an alert (0) or by the presence of an alert (1) respectively, and is given by:
IDS : X → {0, 1}
To detect attacks in the incoming traffic, the IDSs are typically parameterized by a threshold T. The IDS uses a theoretical basis for deciding the thresholds for analyzing the network traffic to detect intrusions. Changing this threshold changes the performance of the IDS. If the threshold is very low, then the IDS tends to be very aggressive in checking the traffic for intrusions; however, there is a potentially greater chance of irrelevant detections, which result in a large number of false alarms. A large value of the threshold has the opposite effect, being conservative in detecting attacks; however, some potential attacks may be missed this way. Using a 3σ-based statistical analysis, the higher threshold (Th) is set at μ + 3σ and the lower threshold (Tl) is set at μ − 3σ. This is under the assumption that the traffic signals are normally distributed. In general, the traffic detection with s being the sensor output is given by:
Sensor Detection = 1, if s > Th or s < Tl;  Sensor Detection = 0, otherwise.
A signature-based IDS functions by looking at the event feature x and checking whether it matches any of the records in the signature database Db.

Signature-based IDS : X → {1}, if x ∈ Db
                    : X → {0}, if x ∉ Db
An anomaly-based IDS generates an alarm when the input traffic deviates from the established models or profiles Pf.

Anomaly-based IDS : X → {1}, if x ∉ Pf
                  : X → {0}, if x ∈ Pf
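The three detector types just defined can be sketched as follows. The baseline data, signature string and profile sets are invented for illustration, and the 3σ band assumes the normality stated above:

```python
import statistics

def threshold_detector(s, mu, sigma):
    """Statistical detector: alert (1) when the sensor output s falls
    outside the band [mu - 3*sigma, mu + 3*sigma]."""
    th, tl = mu + 3 * sigma, mu - 3 * sigma
    return 1 if s > th or s < tl else 0

def signature_ids(x, signature_db):
    """Signature-based IDS: alert iff the event matches the database Db."""
    return 1 if x in signature_db else 0

def anomaly_ids(x, profiles):
    """Anomaly-based IDS: alert iff the event deviates from profiles Pf."""
    return 0 if x in profiles else 1

# Invented baseline of a normal traffic feature for the 3-sigma band
baseline = [10.0, 11.0, 9.5, 10.5, 10.0, 9.0, 11.0]
mu, sigma = statistics.mean(baseline), statistics.stdev(baseline)

print(threshold_detector(10.2, mu, sigma))  # inside the band: 0
print(threshold_detector(25.0, mu, sigma))  # far outside the band: 1
print(signature_ids("GET /../../etc/passwd", {"GET /../../etc/passwd"}))  # 1
print(anomaly_ids("ssh-login", {"http-get", "dns-query"}))  # 1
```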
unit f. This architecture is often referred to as the parallel decision fusion network and is given in Figure 8.1. The fusion unit makes a global decision, s, about the true state of the hypothesis based on the collection of the local decisions gathered from all the sensors. The problem is cast as a binary detection problem.
Figure 8.1: The parallel decision fusion network. Each IDS_i observes the input x and produces a local decision s_i; the fusion unit combines s_1, ..., s_n into the output y.
s_i = 1 if IDS_i raises an alert on the input, and s_i = 0 otherwise.
The fusion center collects these local decisions s_i and forms a binomially distributed statistic s given by s = Σ_{i=1}^{n} s_i, where n is the total number of IDSs that take part in the fusion.
Theorem 1: The output of a binary fusion unit is decided by a function f given by f : s_1 × s_2 × ... × s_n × x → {0, 1}, where the decisions s_i of the individual detectors are deterministic and the data x is a random parameter.
Lemma 1: The decision rule used by each of the individual detectors is deterministic and is given by:
f_i(x_j) = 0, if p(s_i^j = 0 | x_j) = 1;  f_i(x_j) = 1, otherwise,
where j corresponds to the class of the network traffic on which the fusion rule, as well as the respective sensor outputs, depends. Since the fusion center makes the final decision, the assumption is made that the output of the fusion rule is binary, i.e., either Normal or Attack. The same holds for all the individual IDSs: each IDS classifies the incoming traffic as Normal or Attack.
8.3.3 Statement of the problem
The problem statement is defined in the following steps:
- The random variable x represents the observation to be made. This observation belongs to one of the two hypothesis groups, Normal or Attack, with probabilities p and q = 1 − p, respectively.
- A set of n IDSs monitors the random variable x and detects the presence of attack in the traffic. The set of detections by the n sensors is given by {s_1, s_2, ..., s_n}, where s_i is the output of the IDS indexed i. Each s_i is a function of the input x, i.e., s_i = f_i(x).
- The problem of optimum detection with n IDSs selecting either of the two possible hypotheses is considered from the decision theory point of view. The loss function ℓ is defined in terms of the decisions made by each IDS along with the observation, and is given by:
(8.1)
- The average of the loss is then minimized. The objective of the decision strategy is to minimize the expected penalty (loss function) incurred.
An approach towards solving the fusion problem is taken by noticing that the decision function f_i(.) is characterized by the threshold T_i and the likelihood ratio (if independence is assumed). Thus, a necessary condition for an optimal fusion decision is that the thresholds (T_1, T_2, ..., T_n) are chosen optimally. However, this does not satisfy the sufficient condition: there remain many local minima, each of which needs to be checked to assure the global minimum.
The counter-intuitive results at the individual sensors under a proper choice of thresholds can be advantageous in obtaining an optimum value for the fusion result. They are excellent paradigms for studying distributed decision architectures, for understanding the impact of the limitations, and even for suggesting empirical experiments for IDS decisions.
The structure of the fusion rule plays a crucial role in the overall performance of the IDS, since the fusion unit makes the final decision about the state of the environment. While a few inferior IDSs might not greatly impact the overall performance, a badly designed fusion rule can lead to poor performance even if the local IDSs are well designed. The fusion IDS can be optimized by searching the space of fusion rules and optimizing the local thresholds for each candidate rule. Other than in some simple cases, the complexity of such an approach is prohibitive, due to the exponential growth of the set of possible fusion rules with respect to the number of IDSs. Searching for the fusion rule that leads to the minimum probability of error is the main bottleneck, due to the discrete nature of this optimization process and the exponentially large number of fusion rules. In our experiment we maximize the true positive rate by fixing the false positive rate at α0. α0 determines the threshold T by trial and error; we have noticed that this converges within two or three trials in our case. This is done with the training data and hence is done off-line.
The computation of thresholds couples the choice of the local decision rules so that the system-wide performance is optimized, rather than the performance of the individual detectors. This requirement is taken care of by the DD fusion architecture proposed and discussed in the previous chapter. The weights assigned to the individual sensors are determined by the neural network learner.
The architecture is independent of the data set and the structures employed, and can be used with any real-valued data set.
8.3.5 Modeling of neural network learner unit
The neural network unit in the data-dependent architecture is a supervised learning system which learns from a training data set. The training of the neural network unit by back-propagation involves three stages: the feed-forward pass of the outputs of all the IDSs along with the input training pattern, which collectively form the training pattern for the neural network learner unit; the calculation and back-propagation of the associated error; and the adjustment of the weights. After the training, the neural network is used for the computations of the feed-forward phase. Learning can be defined over an input space X, an output space
Y and a loss function ℓ. The training data can be specified as {(x_i, y_i)}, where x_i ∈ X and y_i ∈ Y. The output is a hypothesis function f_w : X → Y. f_w is chosen from a hypothesis space F to minimize the prediction error given by the loss function. The hypothesis function is that of the neural network, and it represents the non-linear function from the input space X to the output space Y. It is simple to assume stationarity, i.e., that the distribution of data points encountered in the future is the same as the distribution of the training set. For simplicity, the DARPA data set is assumed to represent the real-time traffic pattern distribution. Stationarity allows us to reduce the predictive learning problem to a minimization of the sum of the loss over the training set.
f_w = argmin_{f_w ∈ F} Σ_i ℓ(f_w(x_i), y_i),  s.t. (x_i, y_i) ∈ s        (8.2)
Loss functions are typically defined to be non-negative over all inputs and zero when f_w(x_i) = y_i.
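The minimization of the summed loss over the training set can be illustrated with a toy sketch. This is our construction (plain stochastic gradient descent on a linear weighted aggregation with squared loss), not the thesis's back-propagation network:

```python
def train_weights(S, y, lr=0.1, epochs=200):
    """Learn per-IDS weights w by stochastic gradient descent on the
    squared loss of the weighted aggregation f_w(s) = sum_i w_i * s_i."""
    n = len(S[0])
    w = [1.0 / n] * n  # start from equal weighting
    for _ in range(epochs):
        for si, yi in zip(S, y):
            pred = sum(wi * sj for wi, sj in zip(w, si))
            err = pred - yi  # gradient factor of 0.5 * (pred - yi)**2
            w = [wi - lr * err * sj for wi, sj in zip(w, si)]
    return w

# Local decisions of three IDSs; the third one is noisy and uninformative.
S = [(1, 1, 0), (1, 1, 1), (0, 0, 1), (0, 0, 0), (1, 1, 0), (0, 0, 1)]
y = [1, 1, 0, 0, 1, 0]

w = train_weights(S, y)
print(w)  # the uninformative third IDS is driven towards a zero weight
```

The learned weights illustrate how the data-dependency is incorporated: a sensor whose decisions carry no information about the label is effectively switched off by the learner.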
8.3.6 Dependence on the data and the individual IDSs
Often, the data in the databases is only an approximation of the true data. When information about the goodness of the approximation is recorded, the results obtained from the database can be interpreted more reliably. Any database is associated with a degree of accuracy, which is denoted by a probability density function whose mean is the value itself.
In order to maximize the detection rate it is necessary to fix the false alarm rate at an acceptable value, taking into account the trade-off between the detection rate and the false alarm rate. The threshold (T) that maximizes the TP_rate and thus minimizes the FN_rate is given by:
FP_rate = P[alert | normal] = P[ Σ_{i=1}^{n} w_i s_i ≥ T | normal ] = α0        (8.3)

TP_rate = P[alert | attack] = P[ Σ_{i=1}^{n} w_i s_i ≥ T | attack ]        (8.4)
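Fixing the false positive rate at a target α0 and reading off the resulting detection rate can be sketched as below; the weights, decision vectors and threshold search are all illustrative assumptions:

```python
def fix_threshold(normal_scores, attack_scores, alpha0):
    """Choose the smallest threshold T whose false-positive rate stays
    within alpha0, then report the resulting true-positive rate."""
    best = float("inf")  # alert on nothing if no threshold is acceptable
    for T in sorted(set(normal_scores + attack_scores), reverse=True):
        fp = sum(s >= T for s in normal_scores) / len(normal_scores)
        if fp > alpha0:
            break
        best = T
    tp = sum(s >= best for s in attack_scores) / len(attack_scores)
    return best, tp

# Weighted sums of three binary IDS decisions (weights are illustrative)
w = (0.5, 0.4, 0.1)
score = lambda s: sum(wi * si for wi, si in zip(w, s))
normal = [score(s) for s in [(0, 0, 0), (0, 0, 1), (1, 0, 0), (0, 0, 0)]]
attack = [score(s) for s in [(1, 1, 0), (1, 1, 1), (0, 1, 1), (1, 0, 1)]]

T, tp_rate = fix_threshold(normal, attack, alpha0=0.0)
print(T, tp_rate)
```

On training data this search can be run off-line, matching the trial-and-error threshold selection described earlier in the chapter.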
errors.
Assuming independence between the local detectors, the likelihood ratio is given by:

P(s | Attack) / P(s | Normal) = P(s_1, s_2, ..., s_n | Attack) / P(s_1, s_2, ..., s_n | Normal) = Π_{i=1}^{n} [ P(s_i | Attack) / P(s_i | Normal) ]

The optimum decision rule for the fusion unit follows:

f(s) = log [ P(s | Attack) / P(s | Normal) ]
Depending on whether the value of f(s) is greater than or equal to the decision threshold T, or less than T, the decision is made for the hypothesis Attack or Normal, respectively. Thus the decisions from the n detectors are coupled through a cost function. It is shown that the optimal decision is characterized by thresholds, as in the decoupled case. As far as the optimum criterion is concerned, the first step is to minimize the loss function of equation 8.1. This leads to sets of simultaneous inequalities in terms of the generalized likelihood ratios at each detector, the solutions of which determine the regions of optimum detection.
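The likelihood-ratio fusion rule above can be sketched for independent binary detectors (this is the classical Chair-Varshney form; the per-detector rates are invented for illustration):

```python
import math

def llr_fusion(decisions, pd, pf):
    """Log-likelihood-ratio fusion of independent binary detectors:
    f(s) = sum_i log P(s_i | Attack) / P(s_i | Normal)."""
    f = 0.0
    for si, d, fa in zip(decisions, pd, pf):
        f += math.log(d / fa) if si == 1 else math.log((1 - d) / (1 - fa))
    return f

pd = [0.70, 0.43, 0.59]  # per-detector detection probabilities (invented)
pf = [0.05, 0.10, 0.30]  # per-detector false-alarm probabilities (invented)

T = 0.0  # decision threshold on the log-likelihood ratio
for s in [(1, 1, 0), (0, 0, 1), (1, 0, 1)]:
    f = llr_fusion(s, pd, pf)
    print(s, round(f, 2), "Attack" if f >= T else "Normal")
```

An alert from a reliable detector contributes a large positive term, while a silent reliable detector contributes a negative one, so the fused decision weighs the detectors by their quality rather than by a simple vote.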
8.5 Summary
Sensor fusion techniques work effectively by gathering complementary information that can improve the overall detection rate without adversely affecting the false alarm rate. A simple theoretical model is initially illustrated in
Fusion method                               Detection rate
PHAD                                        0.33
ALAD                                        0.41
Snort                                       0.61
DD Fusion                                   0.72
DD Fusion with modified evidence theory     0.75
this chapter for the purpose of showing the improved performance of IDS using
sensor fusion. The detection rate and the false positive rate quantify the performance benefit obtained through the optimization. The theoretical model is also
validated.
Chapter 9
Conclusions
Whether you think you can, or that you can't, you are usually right.
Henry Ford
In this thesis, we presented a framework for the performance enhancement of intrusion detection systems using advances in sensor fusion. The data-dependent decision fusion architecture has been implemented, and the results of the implementation have been observed to be better than those of the individual detectors that take part in the fusion, thus validating the approach. We have also demonstrated the importance of detecting the rarer and most significant attacks. The fusion algorithm used modified evidence theory, which aids in providing better-than-the-best protection. This chapter presents the conclusions drawn from this thesis work and discusses directions for future research.
Modified evidence theory has evolved in this thesis work as a viable method for empirical model generation for intrusion detection. The improved performance of the IDS that has been demonstrated, if deployed fully, would contribute to a 53% reduction of successful attacks in two years and a 66% reduction of successful attacks in four years. This is a right step towards making cyber space safer over a period of time, with the proper deployment of highly efficient and sophisticated detectors.
9.3 Summary
This thesis discussed the assertion that it is possible to perform intrusion detection for both rare and new attacks using advances in sensor fusion. The previous chapters described the theoretical and experimental work done to show its validity, and the results of the experiments provide evidence in support of this thesis. The experiments emphasize proof-of-concept, demonstrating the viability of the technique and also its efficiency in comparison to the existing methods. The proposed approaches are shown to significantly improve the detection rate and reduce the false alarm rate, and hence result in an acceptable and usable intrusion detection system.
While experimenting with a simple treatment for enhanced intrusion detection, it was found that the data-dependent decision fusion using modified
evidence theory was highly successful. Hence, it is worthwhile to devise investigations of other applications, since it is often useful to cast the net a bit wider to give the argument presented in this thesis further support, or a comparative focus.
Appendix A
Attacks on the Internet: A study
In all science, error precedes the truth, and it is better it should go first than
last.
Hugh Walpole
A.1 Introduction
An attack is the realization of a threat: a harmful action aiming to find and exploit a system vulnerability. Computer attacks may involve destroying or accessing data, subverting the computer or degrading its performance. Traditionally, attacks on computers have included methods such as viruses, worms, buffer-overflow exploits and denial-of-service attacks. Network attacks, on the other hand, are mostly attacks on computers that use a network in some way. A network could be used to send the attack (such as a worm), or it could be the means of attack (such as a Distributed Denial of Service attack). In general, network attacks are a subset of computer attacks. However, there are several types of network attacks that do not attack computers, but rather the network they are attached to. Flooding a network with packets does not attack an individual computer, but clogs up the network. Although a computer may be used to initiate the attack, both the target and the means of attacking the target are network related. There are many computer system attack classifications and taxonomies available in the literature [120, 15, 122, 123, 125, 126, 127, 128].
Howard [120] classified attacks according to Attackers, Tools, Access, Results and Objectives. In his Ph.D. thesis, Kumar [15] introduced a classification based on the attack signatures used within the IDS IDIOT. This classification
is based on the type of observation required to detect a given attack. Lindqvist
and Jonsson [122] presented an attack taxonomy using two dimensions of an
attack. Probably one of the best-known taxonomies is the Defense Advanced Research Projects Agency (DARPA) attack taxonomy. This taxonomy was developed in
1998 for classifying attacks in order to simplify the process of evaluating IDSs
[123]. Work done by Chris Rodgers [124] covers many computer and network
attacks with regard to TCP/IP networking. His research was carried out in
2001 and provides a good overview of the threats and attacks that face TCP/IP
networking, as well as attacks such as viruses, worms, trojans and denial of service attacks.
In Snort, the most widely used open-source network intrusion prevention and detection system, attack classification is based on the attack's impact on the computer system. The attacks whose effect is the most critical have the highest priority. The priority levels are divided into high, medium and low. High-priority attacks include the attempted administrator privilege gain, the network Trojan, and the web application attack. Medium-priority attacks include Denial of Service (DoS) attacks, a non-standard protocol or event, potentially bad traffic, and attempted log-in using a suspicious username. Low-priority attacks include the ICMP event, the network scan, the generic protocol command decode, etc. [130].
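The priority scheme just described can be sketched as a simple lookup table. The sketch below is illustrative only: the class names mirror Snort classtypes, but the table and the `triage` function are invented for this example (Snort's actual priorities are defined in its classification.config).

```python
# Illustrative sketch (not Snort's implementation) of mapping alert
# classes to the high/medium/low priorities described above, and of
# triaging alerts so that the most critical classes are handled first.

CLASS_PRIORITY = {
    # high priority (1)
    "attempted-admin": 1,
    "trojan-activity": 1,
    "web-application-attack": 1,
    # medium priority (2)
    "attempted-dos": 2,
    "bad-unknown": 2,
    "suspicious-login": 2,
    # low priority (3)
    "icmp-event": 3,
    "network-scan": 3,
    "protocol-command-decode": 3,
}

def triage(alerts):
    """Sort alerts so the most critical classes come first (lowest number)."""
    return sorted(alerts, key=lambda a: CLASS_PRIORITY.get(a["classtype"], 3))

alerts = [
    {"sid": 1, "classtype": "network-scan"},
    {"sid": 2, "classtype": "attempted-admin"},
    {"sid": 3, "classtype": "attempted-dos"},
]
print([a["sid"] for a in triage(alerts)])  # admin-privilege attempt first
```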
Figure A.1: Plot of Attack sophistication vs Intruder Knowledge over the years
Palo Alto Research Center: The Morris Worm [132]. The first viruses were
released in 1981, among them Apple Viruses 1, 2 and 3 which targeted the Apple II operating system. In 1983, Fred Cohen was the first person to formally
introduce the term computer virus in his thesis [133], which was published in
1985. More recently, new attacks such as denial of service (DoS) (mid 1990s),
distributed DoS (DDoS) attacks (in 1999), botnets and storm botnets have been
developed. Two major recent developments in computer and network attacks
are blended attacks and information warfare. Blended attacks first appeared in 2001 with the release of Code Red [134], followed by Nimda [135],
Slammer [136] and Blaster [137]. Blended attacks contain two or more attacks
merged together to produce a more potent attack.
File infectors
File infector viruses infect files on the victim's computer by inserting themselves into a file. Usually the file is an executable, such as a .EXE or .COM file in
Windows. When the infected file is run, the virus executes as well.
System and boot record infectors
System and boot record infectors were the most common type of virus until the
mid 1990s. These types of viruses infect system areas of a computer such as the
Master Boot Record (MBR) on hard disks and the DOS boot record on floppy
disks. By installing itself into boot records, the virus can run itself every time
the computer is booted up.
Macro viruses
Macro viruses are simply malicious macros for popular programs, such as Microsoft Word. For example, they may delete information from a document or insert phrases into it. Propagation is usually through infected files. If a
user opens a document that is infected, the virus may install itself so that any
subsequent documents are also infected. Often the macro virus will be attached
as an apparently benign file to fool the user into infecting themselves. The
Melissa virus [145] is the best-known macro virus. The virus worked by emailing a victim with a message that appeared to come from an acquaintance. The email contained a Microsoft Word document as an attachment that, if opened, would infect Microsoft Word; if the victim used the Microsoft Outlook 97 or 98 email client, the virus would also be forwarded to the first 50 contacts in the victim's address book. Melissa caused a significant amount of damage, as the
email sent by the virus flooded email servers.
Virus properties
damaging, and resistant. A stealth virus is one that attempts to hide its presence. This may involve attaching itself to files that are not usually seen by the user. Viruses can use encryption to hide their payload. A virus using encryption will know how to decrypt itself in order to run. As the bulk of the virus is encrypted, it is harder to detect and analyze. Some viruses have the ability to change themselves, either as time goes by or when they replicate. Such viruses are called polymorphic viruses. Polymorphic viruses can usually avoid eradication longer than other types of viruses, as their signature changes.
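The effect of encryption on signatures can be illustrated with a toy sketch (a harmless stand-in payload, not real malware): re-encoding the same content under a different key changes its byte-level signature, while decoding recovers identical content.

```python
# Toy illustration of why encrypted/polymorphic payloads defeat naive
# signature matching: each "replication" re-encodes the same payload
# under a fresh XOR key, so the byte signature changes even though the
# decoded content is identical. Purely conceptual; not actual malware.
import hashlib

PAYLOAD = b"stand-in for the virus body"

def xor(data: bytes, key: int) -> bytes:
    # single-byte XOR as a stand-in for the virus's encryption routine
    return bytes(b ^ key for b in data)

def replicate(payload: bytes, key: int) -> bytes:
    # a real polymorphic virus would also mutate its decryptor stub
    return xor(payload, key)

sig_a = hashlib.sha256(replicate(PAYLOAD, 0x41)).hexdigest()
sig_b = hashlib.sha256(replicate(PAYLOAD, 0x42)).hexdigest()

print(sig_a == sig_b)                                   # False: signatures differ
print(xor(replicate(PAYLOAD, 0x41), 0x41) == PAYLOAD)   # True: same payload
```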
A.4.2 Worms
A worm is a self-replicating program that propagates over a network in some
way. Unlike viruses, worms do not require an infected file to propagate. There
are two main types of worms: mass-mailing worms and network-aware worms.
Mass-mailing Worms
A mass-mailing worm is a worm that spreads through email. Once the email
has reached its target it may have a payload in the form of a virus or trojan.
Network-aware Worms
the malicious attacker and gives them the following abilities: Session logging,
Keystroke logging, File transfer, Program installation, Remote rebooting, Registry editing, and Process management.
Logic bombs
Logic bombs are a special form of trojans that only release their payload once a
certain condition is met. If the condition is not met, the logic bomb behaves like the program it is attempting to simulate.
A.4.4 Buffer overflows
Buffer overflows are probably the most widely used means of attacking a computer or network. They are rarely launched on their own, and are usually part of
a blended attack. Buffer overflows exploit flawed programming in which buffers are allowed to be overfilled. If a buffer is filled beyond its capacity, the overflowing data spills into the adjacent memory, where it can either corrupt data or be used to change the execution of the program. There are two main types of buffer overflow, described below.
Stack buffer overflow
A stack is an area of memory that a process uses to store data such as local
variables, method parameters and return addresses. Often buffers are declared
at the start of a program and so are stored in the stack. Each process has its
own stack, and its own heap. Overflowing a stack buffer was one of the first
types of buffer overflows and is one that is commonly used to gain control of
a process. In this type of buffer overflow, a buffer is declared with a certain
size. If the process controlling the buffer does not make adequate checks, an
attacker can attempt to put in data that is larger than the size of the buffer. An
attacker may place malicious code in the buffer. Part of the adjacent memory
will often contain the pointer to the next line of code to execute. Thus, the buffer
overflow can overwrite the pointer to point to the beginning of the buffer, and
hence the beginning of the malicious code. Thus, the stack buffer overflow can
give control of a process to an attacker.
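The overwrite mechanism described above can be illustrated with a toy memory model; the labels below stand in for machine code and addresses and are invented for illustration. This is a conceptual sketch, not a real exploit.

```python
# Toy model of a stack buffer overflow: a fixed-size buffer sits directly
# below the saved return address, and an unchecked copy of attacker input
# overwrites that address with a "pointer" back into the buffer.

BUF_SIZE = 8
# the buffer, followed by the saved return address
stack = [0] * BUF_SIZE + ["ret->legit_code"]

def unchecked_copy(dest_stack, data):
    # flawed copy routine: no bounds check against BUF_SIZE
    for i, cell in enumerate(data):
        dest_stack[i] = cell

# BUF_SIZE cells of "shellcode" plus one extra cell that lands on
# the return-address slot
attacker_input = ["shellcode"] * BUF_SIZE + ["ret->buffer_start"]
unchecked_copy(stack, attacker_input)

print(stack[BUF_SIZE])  # the return address now points at the attacker's code
```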
Heap overflows
Heap overflows are similar to stack overflows but are generally more difficult to
create. The heap is similar to the stack, but stores dynamically allocated data.
The heap does not usually contain return addresses like the stack, so it is harder
to gain control over a process than if the stack is used. However, the heap
contains pointers to data and to functions. A successful buffer overflow will allow the attacker to manipulate the process's execution. An example would be to overflow a string buffer containing a filename, so that the filename becomes that of an important system file. The attacker could then use the process to overwrite the system file (if the process has the correct privileges).
A.4.5 Denial of Service attacks
Denial of Service (DoS) attacks [146], sometimes known as nuke attacks, are
designed to deny legitimate users of a system from accessing or using the system
in a satisfactory manner. DoS attacks usually disrupt the service of a network or
a computer, so that it is either impossible to use, or its performance is seriously
degraded. There are three main types of DoS attacks: host-based, network-based and distributed.
Host-based DoS
Network-based DoS
Distributed DoS (DDoS) attacks are a recent development in computer and network attack methodologies. The DDoS attack methodology was first seen in 1999 with the introduction of attack tools such as the DoS Project's Trinoo [36, 21], the Tribe Flood Network [1, 21] and Stacheldraht [37]. DDoS attacks
work by using a large number of attack hosts to direct a simultaneous attack on
a target or targets. Figure A.2 shows the process of a DDoS attack. Firstly, the
attacker commands the master nodes to launch the attack. The master nodes
then order all daemon nodes under them to launch the attack. Finally the daemon nodes attack the target simultaneously, causing a denial of service. With
enough daemon nodes, even a simple web page request will stop the target from
serving legitimate user requests.
The DDoS attack takes place when many compromised machines infected by
the malicious code act simultaneously and are coordinated under the control of
a single attacker in order to break into the victim's system, exhaust its resources,
and force it to deny service to its customers. There are mainly two kinds of DDoS attacks [10]: typical DDoS attacks and distributed reflector DoS (DRDoS) attacks. In a typical DDoS attack, the army of the attacker consists of master zombies and slave zombies. The hosts of both categories are compromised machines that have been recruited during the scanning process and infected by
malicious code. The attacker coordinates and orders master zombies and they,
in turn, coordinate and trigger slave zombies. More specifically, the attacker
sends an attack command to master zombies and activates all attack processes
on those machines, which are in hibernation, waiting for the appropriate command to wake up and start attacking. Then, master zombies, through those
processes, send attack commands to slave zombies, ordering them to mount a
DDoS attack against the victim. In that way, the agent machines (slave zombies) begin to send a large volume of packets to the victim, flooding its system
with useless load and exhausting its resources.
Unlike typical DDoS attacks, in DRDoS attacks the army of the attacker consists of master zombies, slave zombies, and reflectors [11]. The scenario of this
type of attack is the same as that of typical DDoS attacks up to a specific stage.
The attackers have control over master zombies, which, in turn, have control
over slave zombies. The difference in this type of attack is that the slave zombies are led by the master zombies to send a stream of packets with the victim's IP address as the source IP address to other uninfected machines (known as
reflectors), exhorting these machines to connect with the victim. Then the reflectors send the victim a greater volume of traffic, as a reply to its exhortation
for the opening of a new connection, because they believe that the victim was
the host that asked for it. Therefore, in DRDoS attacks, the attack is mounted
by noncompromised machines, which mount the attack without being aware of
the action. Comparing the two scenarios of DDoS attacks, we should note that a DRDoS attack is more detrimental than a typical DDoS attack. This is because a DRDoS attack has more machines to share the attack, and hence the attack is more distributed. A second reason is that a DRDoS attack creates a greater volume of traffic because of its more distributed nature. Figure A.3 graphically depicts a DRDoS attack. The general taxonomy of DDoS attacks is shown in Figure A.4.
A.4.6 Network-based attacks
This section describes several kinds of attacks that operate on networks and the protocols that run them. Network spoofing is the process in which an attacker passes themselves off as someone else. There are several ways of spoofing in the standard TCP/IP protocol stack, including MAC address spoofing at the data-link layer and IP spoofing at the network layer. By spoofing their identity, an attacker can pretend to be a legitimate user or can manipulate existing communications from the victim host.
Session Hijacking
Session hijacking is the process by which an attacker takes
over a session taking place between two victim hosts. The attack essentially
cuts in and takes over the place of one of the hosts. Session hijacking usually
takes place at the TCP layer, and is used to take over sessions of applications
such as Telnet and FTP. TCP session hijacking involves the use of IP spoofing, as mentioned above, and TCP sequence-number guessing. To carry out a successful TCP session hijacking, the attacker will attempt to predict the current TCP sequence number of the session being hijacked. Once the sequence number has
been identified, the attacker can spoof their IP address to match the host they are
cutting out and send a TCP packet with the correct sequence number. The other
host will accept the TCP packet, as the sequence number is correct, and will start
sending packets to the attacker. The cut out host will be ignored by the other
host as it will no longer have the correct sequence number. Sequence number
prediction is most easily done if the attacker has access to the IP packets passing
between the two victim hosts. The attacker simply needs to capture packets and
analyze them to determine the sequence number. If the attacker does not have
access to the IP packets, then the attacker must guess the sequence number.
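The prediction step can be sketched as follows, assuming the classic case of a TCP stack that increments its initial sequence numbers by a constant (all the numbers below are invented for illustration; modern stacks randomize sequence numbers precisely to defeat this):

```python
# Simplified sketch of sequence-number prediction: if an attacker can
# observe the increments a host applies to its sequence numbers, the
# next value can be extrapolated from previously seen ones.

WINDOW = 2 ** 32  # TCP sequence numbers wrap at 2^32

def predict_next(observed):
    """Extrapolate the next sequence number from a constant-increment
    generator, as in classic (pre-randomization) TCP stacks."""
    delta = (observed[-1] - observed[-2]) % WINDOW
    return (observed[-1] + delta) % WINDOW

# sequence numbers sniffed from earlier connections (hypothetical values)
seen = [1000, 65536 + 1000, 2 * 65536 + 1000]
print(predict_next(seen))  # 197608 == 3 * 65536 + 1000
```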
A.4.7 Password attacks
An attacker wishing to gain control of a computer or a user's account will often use a password attack to obtain the needed password. Many tools exist to help
the attacker uncover passwords.
Password Guessing/Dictionary Attack
Password guessing is the simplest of password attacks. It simply involves the attacker attempting to guess the password. Often the attacker will use a form
of social engineering to gain clues as to what the password is. A dictionary attack is similar, but more automated: the attacker uses a dictionary of words containing possible passwords and uses a tool to check whether any of them is the required password. Brute-force attacks work by generating every possible combination that could make up a password and testing each one to see if it is the correct password.
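The dictionary and brute-force strategies just described can be sketched as follows, run against a toy SHA-256 password hash (the wordlist and password are invented for illustration; real tools work against stolen hash databases and use far larger search spaces):

```python
# Minimal sketch of a dictionary attack and a brute-force attack against
# a hashed password. Illustrative only.
import hashlib
from itertools import product
from string import ascii_lowercase

def sha(pw: str) -> str:
    return hashlib.sha256(pw.encode()).hexdigest()

def dictionary_attack(target_hash, wordlist):
    # try each candidate word from the dictionary
    return next((w for w in wordlist if sha(w) == target_hash), None)

def brute_force(target_hash, alphabet=ascii_lowercase, max_len=3):
    # enumerate every combination up to max_len characters
    for length in range(1, max_len + 1):
        for combo in product(alphabet, repeat=length):
            candidate = "".join(combo)
            if sha(candidate) == target_hash:
                return candidate
    return None

stolen = sha("cat")  # the hash the attacker has obtained
print(dictionary_attack(stolen, ["dog", "fish", "cat"]))  # found in wordlist
print(brute_force(stolen))                                # found exhaustively
```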
A.4.8 Information gathering attacks
The attack process usually involves information gathering. Information gathering is the process by which the attacker gains valuable information about potential targets, or gains unauthorized access to some data without launching an
attack. Information gathering is passive in the sense that no attacks are explicitly launched. Instead networks and computers are sniffed, scanned and probed
for information.
Sniffing
Packet sniffers are a simple but invaluable tool for anyone wishing to gather
information about a network or computer. For the attacker, packet sniffers provide a way to glean information about the host or person they wish to attack,
Security scanning is similar to mapping, but it is more active and more information is gathered. Security scanning involves testing a host for known vulnerabilities or weaknesses that could be exploited by the attacker. For example, a security scanning tool may be able to tell the attacker that port 80 of the target is running an HTTP server with a specific vulnerability.
Economic espionage will be increasingly common as nation-states use cyber theft of data to gain economic advantage in multinational deals. The
attack of choice involves targeted spear phishing with attachments, using
well-researched social engineering methods to make the victim believe that
an attachment comes from a trusted source, and using newly discovered
Microsoft Office vulnerabilities and hiding techniques to circumvent virus
checking.
4. Mobile phone threats, especially against iPhones and Android-based phones, plus VoIP
Mobile phones are general purpose computers, so worms, viruses, and
other malware will increasingly target them. A truly open mobile platform
will usher in completely unforeseen security nightmares. The developer
toolkits provide easy access for hackers.
Attacks on VoIP systems are on the horizon and may surge in the coming
years. VoIP phones and IP PBXs have had numerous published vulnerabilities. Attack tools exploiting these vulnerabilities have been written
and are available on the Internet.
5. Insider attacks
Insider attacks are initiated by rogue employees, consultants and/or contractors of an organization. Insider-related risk has long been exacerbated
by the fact that insiders usually have been granted some degree of physical
and logical access to systems, databases, and networks that they attack,
giving them a significant head start in the attacks that they launch. More recently, however, security perimeters have broken down, which allows insiders to attack both from inside and from outside an organization's network boundaries.
6. Advanced identity theft from persistent bots
A new generation of identity theft is being powered by bots that stay
on machines for three to five months collecting passwords, bank account
and the CDs packaged with those devices sometimes contain malware that infects victims' computers and connects them into botnets.
A.6 Conclusion
Even though many attacks have been listed above, the categories tend not to be mutually exclusive. For example, a virus may contain a logic bomb, so the categories overlap. Also, any successful attack may be classified into multiple categories, since attackers use multiple methods. This makes the classification ambiguous and difficult to repeat.
We conclude by quoting Cohen [133]: "...a complete list of the things that can go wrong with information systems is impossible to create. People have tried to make comprehensive lists, and in some cases have produced encyclopedic volumes on the subject, but there are a potentially infinite number of different problems that can be encountered, so any list can only serve a limited purpose."
Appendix B
Intrusion Detection Systems: A survey
If I have been able to see farther than others, it was because I stood on the shoulders of giants.
Sir Isaac Newton
B.1 Introduction
Intrusion detection is a rapidly evolving and changing technology. Even though the field blossomed in the early 1980s, all the early intrusion detection work was done as research projects for US government and military organizations. The major work in intrusion detection happened in the mid and late 1990s, along with the explosion of the Internet. The early research in the field often focused on host-based solutions, but the drastic growth of networking shifted later efforts towards network-based systems. The tools discussed here reflect a core of active research over the last two decades. Several surveys have indeed been published in the past [147, 148, 149, 150, 152, 155], but the growth of IDSs has been such that many IDSs have appeared in the meantime. This survey hence tries to present an updated view by starting with the historical developments in the field of intrusion detection from the perspective of the people who did the initial research and development and their projects, providing a better insight into the motivation behind them.
session. It used a two-stage statistical analysis to detect anomalies in system activities. The first stage checked each session for unusual activity and the second
stage used a statistical test to detect trends in sessions. The combination of the
two techniques was designed to allow detection of both out-of-bounds activities
as well as activities that gradually deviated from normal over a period of time.
The principal investigator was Stephen Smaha.
At almost the same time, the Multics Intrusion Detection and Alerting System (MIDAS) was developed by the National Computer Security Center (NCSC) to monitor NCSC's Dockmaster system, a host running the highly secure Multics operating system. MIDAS was designed to take data from Dockmaster's answering system audit log and used a hybrid analysis strategy, combining statistical anomaly detection
with expert system rule-based approaches. In 1989, Wisdom and Sense from
Los Alamos National Laboratory and Information Security Officers Assistant
(ISOA) from Planning Research Corporation were developed.
In 1990, Susan Kerr reported on the experimental as well as actually implemented IDSs in the Datamation report titled "Using AI to improve security".
In the same year, an audit trail analysis tool, Computer Watch, was developed by AT&T; it was designed to consume operating system audit trails generated by the UNIX system. An expert system was used to summarize security-relevant events, and a statistical analyzer and query mechanism allowed statistical characterization of system-wide events. The Network Security Monitor (NSM) was developed at the University of California at Davis in 1990, to run on a Sun UNIX workstation. NSM was the first system to monitor network traffic and to use that traffic as the primary data source. NSM was a significant milestone in
intrusion detection research because it was the first attempt to extend intrusion
detection to heterogeneous network environments. Principal researchers were
Levitt, Heberlein and Mukherjee.
Network Audit Director and Intrusion Reporter (NADIR) was developed in 1991
by the Computer division of Los Alamos National Laboratory to monitor user
activities on the Integrated Computing Network (ICN) at Los Alamos. NADIR
performs a combination of expert rule-based analysis and statistical profiling.
by two stationary stochastic processes, the normal process and the misuse process. Misuse detection is the identification of transactions most likely to have been generated by the misuse process.
In 1994, Crosbie and Spafford suggested the use of autonomous agents in order to improve the scalability, maintainability, efficiency and fault tolerance of an intrusion detection system [165]. The Next-generation Intrusion Detection Expert System (NIDES) [157], developed in 1995, is the successor to the IDES project. It has a strong anomaly detection foundation using innovative statistical algorithms, complemented with a signature-based expert system component that encodes known intrusion scenarios. NIDES is highly modularized
and is designed to operate in real time to detect intrusions as they occur.
In 1995, Christoph and Gray expanded NADIR to include processing of audit and activity records for the Cray UNICOS operating system, calling the result UNICORN: misuse detection for UNICOS [197]. An approach to address the
scalability deficiencies in most contemporary intrusion detection systems was
proposed with the design and implementation of GrIDS. The Graph-based Intrusion Detection System for large networks (GrIDS) [166] was developed in 1996, with graphs typically codifying hosts on the network as nodes and connections between hosts as edges between those nodes. The choice of traffic taken to represent activity in the form of edges is made on the basis of user-supplied rule sets. The graphs and their edges have global and local attributes, including the time of connection, etc., that are computed by user-supplied rule sets. These graphs present network events in a graphical fashion that enables the viewer to determine whether suspicious network activity is taking place.
Kosoresow and Hofmeyr in 1997 published a paper on intrusion detection via system call traces [204]. A computer user leaves trails of activity that can reveal signatures of misuse as well as of legitimate activity. Depending on the audit method used, one can record a user's keystrokes, the system resources used, or the system calls made by some collection of processes. Event Monitoring Enabling Responses to Anomalous Live Disturbances (EMERALD) [30] is a framework for scalable, distributed, inter-operable computer and network
is a framework for scalable, distributed, inter-operable computer and network
intrusion detection. It was developed in 1997 and targets both external and internal threat agents that attempt to misuse system or network resources. It is an advanced, highly software-engineered environment that combines signature-based and statistical analysis components with a resolver that interprets analysis results, all of which can be used iteratively and hierarchically.
In 1998, Anderson and Khattak offered an innovative approach to intrusion detection by incorporating information retrieval techniques into intrusion detection tools. In 1998, Bonifacio was the first to introduce the application of neural networks in IDSs [199]. The system works by capturing packets and uses a neural network to identify intrusive behavior within the analyzed data stream. The identification is based on previously well-known intrusion profiles. The system is adaptive, since new profiles can be added to the database and the neural network re-trained to consider them. The paper presents the proposed model, the results achieved and the analysis of an implemented prototype. In 1998, a stand-alone system named Bro [200], for detecting network intruders in real time by passively monitoring a network link over which the intruder's traffic transits, was introduced by Vern Paxson. Bro makes high-speed, large-volume monitoring of network traffic possible without dropping packets, and it also provides real-time notification of ongoing or attempted attacks. The system is extensible, since it is easy to add knowledge of new types of attack. Bro contains mechanisms to withstand attacks and hence is the first to incorporate that theory into practice.
In 1999, Ming-Yuh Huang introduced a large-scale distributed intrusion detection architecture based on IDS agents and collaborative attack-strategy analysis, which creates an opportunity for IDS agents to pro-actively look ahead for the data most pertinent to current case development. This look-ahead adaptive behavior focuses limited system resources on collecting and auditing those events which are most likely to reveal intrusions.
In 2000, Ning et al. presented a paper on modelling requests among cooperating IDSs [201]. IDSs have to share information in order to discover attacks involving multiple sites, and the paper proposes a formal framework for modeling requests among cooperating IDSs. John Dickerson had a different idea, which he presented in the 2000 paper on fuzzy network profiling for intrusion detection. The Fuzzy Intrusion Recognition Engine (FIRE) is an anomaly-based IDS that uses fuzzy logic to assess whether malicious activity is taking place on the network. It uses simple data mining techniques to process the network input data and helps expose metrics that are particularly significant to anomaly detection. These metrics are then evaluated as fuzzy sets. Stephen Kent presented the paper "On the trail of intrusions into information systems" during the same period.
In 2000, Jianxiong Luo published a paper on mining fuzzy association rules and fuzzy frequency episodes for intrusion detection [203]. Lee, Stolfo and Mok had previously reported the use of association rules and frequency episodes for mining audit data to gain knowledge for intrusion detection. Experimental results show the utility of fuzzy association rules and fuzzy frequency episodes for intrusion detection. Luo published another paper, on fuzzy frequent episodes for real-time intrusion detection, in 2001. Data mining methods, including association rule mining and frequent episode mining, have been applied to the intrusion detection problem.
In 2001, Balajinath, in his paper "Intrusion detection through learning behavior model", observed that users normally exhibit regularities in their usage of system commands, as they tend to pursue the same or similar objectives. Hence it is well known that command sequences can be used to characterize user behavior. Deviations from the characteristic behavior pattern of a user can be used to detect potential intrusions.
In 2002, Srinivas Mukkamala suggested the use of neural networks and support vector machines in intrusion detection. The paper "Intrusion detection using neural networks and support vector machines" describes these approaches to intrusion detection and also compares the two methods. Peter Lichodzijewski described host-based intrusion detection using self-organizing maps in 2002. Hierarchical SOMs are applied to the problem of host-based intrusion
tolerance for imprecision and uncertainty that can be exploited by soft computing. This paper presents a novel intrusion detection system (IDS) that models
normal behaviors with hidden Markov models and attempts to detect intrusions
by noting significant deviations from the models. At almost the same time,
Nasser Abouzakhar came up with "An intelligent approach to prevent distributed systems attacks". This paper proposes an innovative way to counteract distributed protocol attacks, such as distributed denial of service (DDoS) attacks, using intelligent fuzzy agents. Adriano Cansian, in the paper "An attack signature model to computer security intrusion detection", mentions that internal and external computer network attacks or security threats occur according to standards and follow a set of subsequent steps, allowing profiles or patterns to be established. This well-known behavior is the basis of signature-analysis intrusion detection systems. This work presents a new attack signature model to be applied in the engines of network-based intrusion detection systems.
Jun-Zhong Zhao, in the paper "An intrusion detection system based on data mining and immune principles", describes a framework for an immune-based intrusion detection system (IDS). Here data mining techniques are used to discover frequently occurring patterns. Ming-Guang Ouyang presented "A fuzzy comprehensive evaluation based distributed intrusion detection". The Fuzzy Decision Engine (FDE), a component of the detection agent in a distributed intrusion detection system, can consider various factors based on fuzzy comprehensive evaluation when an intrusion behavior is judged. Parimal Kumar, in the paper on detection of port scans and OS fingerprinting using clustering, explains that port scanning and OS fingerprinting exploit vulnerabilities of TCP/IP for intrusion into a computer network.
In 2002, Sekar in his paper Specification-based anomaly detection: A new approach for detecting network intrusions introduced a different idea. Specification-based techniques have been shown to produce a low rate of false alarms, but are
not as effective as anomaly detection in detecting novel attacks, especially when
it comes to network probing and denial-of-service attacks. This paper presents
a new approach that combines specification-based and anomaly-based intrusion
detection, mitigating the weaknesses of the two approaches while magnifying
their strengths. The approach begins with state-machine specifications of network protocols, and augments these state machines with information about
statistics that need to be maintained to detect anomalies.
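The combination described above, protocol state machines augmented with statistics, can be sketched roughly as follows. The simplified TCP states, event names, and dictionary layout are illustrative assumptions, not Sekar's actual specification language.

```python
# Sketch of a specification-based check over a simplified TCP handshake.
# SPEC encodes the legal transitions; anything outside it is a
# specification violation, while counts on legal transitions feed an
# anomaly layer that can threshold unusual transition frequencies.
SPEC = {
    ("CLOSED", "SYN"): "SYN_RCVD",
    ("SYN_RCVD", "ACK"): "ESTABLISHED",
    ("ESTABLISHED", "FIN"): "CLOSED",
}

def check_session(events, transition_counts):
    """Run one session's events through the spec; return alerts."""
    state = "CLOSED"
    alerts = []
    for ev in events:
        key = (state, ev)
        if key not in SPEC:
            alerts.append(("spec-violation", state, ev))
            break
        # Statistics layer: count how often each legal transition fires.
        transition_counts[key] = transition_counts.get(key, 0) + 1
        state = SPEC[key]
    return alerts
```

A legal handshake produces no alerts but updates the transition statistics; an out-of-spec event such as a FIN before the handshake completes is flagged immediately.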
Inoue and Hajime in the paper on Anomaly intrusion detection in dynamic execution environments describe an anomaly intrusion-detection system for platforms that incorporate dynamic compilation and profiling. This approach, called dynamic sandboxing, gathers information about the behavior of applications that is usually unavailable to other anomaly intrusion detection systems, and is able to
detect anomalies at the application layer. This implementation is shown to be
both effective and efficient at stopping a backdoor and a virus, and has a low
false positive rate. Taylor and Carol in 2002 presented a paper on An empirical analysis of NATE - Network analysis of Anomalous Traffic Events. This
paper presents results of an empirical analysis of NATE (Network Analysis of
Anomalous Traffic Events), a lightweight, anomaly based intrusion detection
tool.
Mahoney and Chan have done credible work in detecting novel attacks and
presented a paper on Learning nonstationary models of normal network traffic
for detecting novel attacks in 2002. The paper proposes a learning algorithm
that constructs models of normal behavior from attack-free network traffic. Behavior that deviates from the learned normal model signals possible novel attacks. This IDS is unique in two respects. First, it is nonstationary, modeling
probabilities based on the time since the last event rather than on average rate.
This prevents alarm floods. Second, the IDS learns protocol vocabularies (at
the data link through application layers) in order to detect unknown attacks that
attempt to exploit implementation errors in poorly tested features of the target
software.
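The nonstationary idea, scoring by time since the last occurrence rather than by average rate, can be sketched as below. The class name and the use of a single attribute value per event are illustrative assumptions, not the paper's exact model.

```python
# Rough sketch of nonstationary scoring: an event's anomaly score is the
# time elapsed since its attribute value was last seen (a never-seen
# value is scored by the time since monitoring began). A novelty that
# then repeats every second scores near zero on each repeat, so a single
# novel value cannot flood the operator with alarms.
class NonstationaryScorer:
    def __init__(self, start_time=0.0):
        self.last_seen = {}
        self.start = start_time

    def score(self, t, value):
        s = t - self.last_seen.get(value, self.start)
        self.last_seen[value] = t
        return s
```

The first sighting of a value scores high; immediate repetitions score low, which is exactly the alarm-flood prevention the paper describes.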
Kemmerer and Richard in 2003 presented a paper on Internet security and intrusion detection which highlights the principal attack techniques that are used
in the Internet today and possible countermeasures. In particular, intrusion detection techniques are analyzed in detail. This paper mixes a practical character
with a discussion of the current research in the field. Feng and Hanping came
up with a paper on Anomaly detection using call stack information in 2003. The
call stack of a program execution can be a very good information source for
intrusion detection. There is no prior work on dynamically extracting information from the call stack and effectively using it to detect exploits. In this paper, a new method is proposed to do anomaly detection using call stack information. The basic idea is to extract return addresses from the call stack and generate an abstract execution path between two program execution points. Experiments
show that this method can detect some attacks that cannot be detected by other
approaches, while its convergence and false positive performance is comparable
to or better than the other approaches.
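A loose sketch of the return-address idea: record the call-stack snapshots (sequences of return addresses) seen at system-call points during attack-free runs, then flag any snapshot outside that set. The addresses and function names below are made up for illustration and are not the paper's implementation.

```python
# Training: collect the set of "virtual paths" (tuples of return
# addresses on the call stack) observed during attack-free execution.
def learn_paths(stack_snapshots):
    return {tuple(s) for s in stack_snapshots}

# Detection: a snapshot whose return-address sequence was never seen in
# training is reported as anomalous (e.g. a stack smashed by an exploit).
def is_anomalous(stack_snapshot, normal_paths):
    return tuple(stack_snapshot) not in normal_paths
```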
In 2003, Ling and Jun, in the paper Novel immune system model and its application to network intrusion detection, analyze the techniques and architecture of existing network Intrusion Detection Systems and probe into the fundamentals of the Immune System (IS); a novel immune model is presented and applied to network IDS, which helps in designing an effective IDS. The paper also suggests a scheme to represent the self profile of a network, and an automated extraction algorithm is provided to extract the self profile from packets.
Almost the same time, Tapiador and Juan in their paper on NSDF: A computer
network system description framework and its application to network security
describe a general framework, termed NSDF, for describing network systems.
Both entities and relationships are the basis underlying the concept of system
state. The dynamics of a network system can be conceived of as a trajectory in
the state space. The term action is used to describe every event which can produce a transition from one state to another. These concepts (entity, relationship,
state, and action) are enough to construct a model of the system. Evolution and
dynamism are easily captured, and it is possible to monitor the behavior of the
system.
In 2003, Xiang and Ga in their paper Generating IDS attack pattern automatically based on attack tree illustrate the automatic generation of attack patterns based on attack trees. An extended definition of the attack tree is proposed and an algorithm for generating attack trees is presented. A method for automatically generating attack patterns from attack trees is shown, which is tested
by concrete attack instances. The results show that the algorithm is effective and
efficient. The efficiency of generating attack pattern is improved and the attack
trees can be reused. In 2003, Gao and Meimei worked on a paper Fuzzy intrusion detection based on fuzzy reasoning Petri Nets. The fuzzy rule-based technique, combining fuzzy logic and expert system methodology, is not only capable of dealing with uncertainty in intrusion detection but also allows the most flexible reasoning about the widest variety of information possible. It can be used in both anomaly and misuse detection. The paper presents a method for detecting intrusions based on the fuzzy rule-based technique. A Fuzzy Reasoning Petri Nets (FRPN) model is used to represent the fuzzy rule base and, as an inference engine, to derive the final detection decision. FRPNs have parallel reasoning ability and can readily be used in real-time detection.
In 2003, Sarawagi and Sunita in their paper on Sequence data mining techniques
and applications comment that many interesting real-life mining applications
rely on modeling data as sequences of discrete multi-attribute records. Mining
models for network intrusion detection view data as sequences of TCP/IP packets. Erbacher and Robert in 2003 presented a paper on Analysis and Application
of Node Layout Algorithms for Intrusion Detection. The proposed monitoring
environment aids system administrators in keeping track of the activities on
such systems with much lower time requirements than that of perusing typical
log files. With many systems connected to the network the task becomes significantly more difficult. If an attack is identified on one system then all systems
have likely been attacked. The ability to correlate activity among multiple machines is critical for complete analysis and monitoring of the environment. This
paper discusses the layout techniques experimented with and their effectiveness.
Zhong and Shao-Chun presented a paper on A safe mobile agent system for
distributed intrusion detection where some applications of the technology of
mobile agent (MA) in Intrusion detection system have been developed. MA
technology can bring IDS flexibility and enhanced distributed detection ability.
The MA-IDS architecture and detail methods of local intrusion detection and
distributed intrusion detection are presented. Sabhnani and Maheshkumar in
their work on Application of Machine Learning Algorithms to KDD Intrusion
profile to validate attack. Amo and Sandra in 2003 presented the paper on Mining generalized sequential patterns using genetic programming. They propose
a new kind of sequential pattern called Generalized Sequential Pattern, and introduce the problem of mining generalized sequential patterns over temporal
databases.
Ye and Nong in 2004 had a paper on Robustness of the Markov-chain model for
cyber-attack detection. This paper presents a cyber-attack detection technique
through anomaly-detection, and discusses the robustness of the modeling technique employed. In this technique, a Markov-chain model represents a profile of
computer-event transitions in a normal/usual operating condition of a computer
and network system. The Markov-chain model of the norm profile is generated
from historic data of the system's normal activities. The observed activities of the system are analyzed to infer the probability that the Markov-chain model of the norm profile supports them. The lower the probability the observed activities receive from the Markov-chain model of the norm profile, the
more likely the observed activities are anomalies resulting from cyber-attacks,
and vice versa.
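The norm-profile computation can be sketched as follows. The event names, the unsmoothed maximum-likelihood estimate, and the small probability floor for unseen transitions are illustrative choices, not Ye and Nong's exact formulation.

```python
from collections import defaultdict

def learn_chain(events):
    """Estimate transition probabilities from normal event data."""
    counts = defaultdict(lambda: defaultdict(int))
    for a, b in zip(events, events[1:]):
        counts[a][b] += 1
    return {a: {b: n / sum(nxt.values()) for b, n in nxt.items()}
            for a, nxt in counts.items()}

def support(chain, observed, floor=1e-6):
    """Probability the norm profile assigns to an observed sequence;
    lower support suggests the activity is anomalous."""
    p = 1.0
    for a, b in zip(observed, observed[1:]):
        # Transitions never seen in training get a tiny floor probability.
        p *= chain.get(a, {}).get(b, floor)
    return p
```

A sequence whose transitions match the learned profile receives high support; a sequence containing transitions the norm profile never observed receives support near the floor.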
Xu and Ming in their paper Anomaly detection based on system call classification aim to create a new anomaly detection model based on rules. A detailed
classification of the Linux system calls according to their function and level
of threat is presented. The detection model only aims at critical calls (i.e. the
threat level 1 calls). In the learning process, the detection model dynamically
processes every critical call, but does not use data mining or statistics from static
data. Therefore, incremental learning can be implemented. Based on some
simple predefined rules and refining, the number of rules in the rule database
could be reduced, so that the rule match time can be reduced effectively during
detection processing. The experimental results demonstrate that the detection
model can detect R2L and U2R attacks. The detected anomaly is limited to the corresponding requests rather than the entire trace. The detection model is suited to privileged processes, especially those based on request-responses.
In 2004, Yang and Hongyu introduced a different idea with Decision Support
Module introduced to Intrusion Detection. They presented a paper on An application of decision support to network intrusion detection in 2004, which briefly describes a network intrusion system and the design of a decision support module (DSM) for an intrusion detection system that can provide active detection and
automated response support during intrusions. The primary function of the decision support module is to provide recommended actions and alternatives and
the implications of each recommended action. In the decision support module, the GA (genetic algorithm) was run over a subset of the data, called the
training data, and then tested over the entire data set to test real-world performance. Zhang and Lian-Hua in their paper on Intrusion detection using rough
set classification in 2004 comment that machine learning-based intrusion detection approaches have recently been the subject of extensive research because
they can detect both misuse and anomaly. In this paper, rough set classification
(RSC), a modern learning algorithm, is used to rank the features extracted for
detecting intrusions and generate intrusion detection models.
Imamura and Kosuke in 2004 presented a paper on Potential application of
training based computation to intrusion detection, which comments that without detection of a network intrusion, a system is not capable of properly defending itself.
Therefore, the first step in preserving system integrity is to detect whether or not
the system is under attack. Packet analysis approaches are effective at detecting known attacks, but fail at unknown attack detection. In order to protect the
system from unknown attacks, a classifier system which is independent of the
signatures found in network packets is developed. One of the promising ways
to perform this classification is to profile kernel level activities. A probabilistically optimal classifier ensemble method is used to monitor kernel activity, and
ultimately to predict whether or not the system is under attack.
Du and Yan-Hui in their paper on Formalized description of distributed denial of service attack try to analyze, check and judge DDoS. Based on a careful
study of the attack principles and characteristics, an object-oriented formalized
description is presented, which contains a three-level framework and offers full
specifications of all kinds of DDoS modes and their features and the relations
between one another. Its greatest merit lies in that it contributes to analyzing,
checking and judging DDoS. Teng and Shaohua in their paper on Scan attack
detection model by combining feature and statistic analysis in 2004 remark that attackers often locate hosts to attack on the Internet by scanning, so many attacks can be prevented if such scan attacks are detected. Presently, there are mainly two kinds of methods for detecting scan attacks: statistics-based detection and feature-based detection, but their high false negative and false positive rates make them not very effective. In this study, a new method for detecting scan attacks is presented that combines feature analysis with statistical analysis. It can efficiently detect scan attacks with lower false positive and false negative rates.
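One way to read "combining feature with statistic analysis" is sketched below: a packet-level feature filter (bare SYN probes) feeding a statistical threshold on per-source port fan-out. The tuple format and threshold are assumptions for illustration, not Teng and Shaohua's actual model.

```python
from collections import defaultdict

def detect_scanners(packets, port_threshold=10):
    """packets: iterable of (src_ip, dst_port, tcp_flags) tuples.
    Returns the set of source addresses flagged as scanners."""
    ports_per_src = defaultdict(set)
    for src, dport, flags in packets:
        if flags == "SYN":               # feature check: bare SYN probe
            ports_per_src[src].add(dport)
    # Statistical check: distinct-port fan-out above a threshold.
    return {src for src, ports in ports_per_src.items()
            if len(ports) >= port_threshold}
```

Combining the two checks is what lowers both error rates in this reading: the feature filter excludes ordinary full connections, and the threshold excludes sources that legitimately touch a few ports.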
Teng and Shaohua, again in a different paper on Case reasoning and state transition analysis for intrusion detection in 2004, remark that when a new intrusion scenario is developed, many intrusion methods can be derived by exchanging the command sequences or replacing commands with functionally similar commands, which makes detection of the derived intrusions very difficult. To overcome this problem, a Case Reasoning And State Transition Analysis (CRASTA) is proposed in this paper. For an intrusion case, all the possible derived intrusions are generated as an intrusion base and, based on this intrusion base, an efficient algorithm to detect such intrusions using finite automata is presented. A derived intrusion can be seen as an unknown intrusion; in this sense, the technique presented can detect some unknown intrusions.
Zhao and Yuming in their paper on Study of anomaly detection based on system
call and data mining technology in 2004 introduce the categories of intrusion
detection and the methods of data mining applied in anomaly detection. The paper also
describes the design and implementation of the anomaly IDS based on system
calls and data mining algorithms.
Abraham [171] in 2004 investigated the suitability of the linear genetic programming (LGP) technique to model fast and efficient intrusion detection systems.
The performance and accuracy of LGP were compared to results obtained
by ANN and regression tree methods. Experiments performed over the popular
DARPA IDS data set showed that LGP outperformed decision trees and support
vector machines in terms of detection accuracies (except for one class). Decision trees were considered as the second best, especially for the detection of
U2R attacks.
Xu and Ming in the paper Two-layer Markov chain anomaly detection model
in 2005 propose, on the basis of the current single layer Markov chain anomaly
detection model, a new two-layer model. Two distinctly different processes,
the different requests and the system call sequence in the same request section,
are classified as two layers and dealt with by different Markov chains respectively. The two-layer frame can depict the dynamic activity of the protected
process more exactly than the single layer frame, so that the two-layer detection
model can raise the detection rate and lower the false alarm rate. Furthermore, the detected anomaly will be limited to the corresponding request sections where the anomaly happens. The new detection model is suitable for privileged processes, especially those based on request-response.
Zhao et al. [172] in 2005 proposed a misuse detection system and an anomaly detection system that encode an expert's knowledge of known attack patterns and system vulnerabilities as if-then rules. Normal connections and intruded connections are divided into different clustering sets and, to distinguish them, the researchers integrate a GA to detect intrusive actions. Their system combines two stages (a clustering stage and a genetic optimizing stage) into the process. The GA was successfully applied to a real-world test case.
At almost the same time, Gong et al. [173] chose a GA approach to network misuse detection because it is robust to noise, requires no gradient information to find a global optimal or suboptimal solution, and is self-learning. Kim et al. in 2005 [174] proposed a Genetic Algorithm to improve Support Vector Machine based IDS. They fused GA and SVM in order to improve the overall performance of the IDS. An optimal detection model for the SVM classifier was determined. As a result of the fusion, the SVM-based IDS not only selected optimal parameters for the SVM but also an optimal feature set from among the whole feature set.
Abraham and Grosan [175] evaluated the performance of two Genetic Programming techniques for IDS, Linear Genetic Programming (LGP) and Multi-Expression Programming (MEP), and provided a comprehensive comparison of the results obtained with selected nonevolutionary machine learning techniques such as Support Vector Machines (SVM) and Decision Trees (DT). Based on numerical experiments and comparisons, they showed that Genetic Programming techniques outperformed the reference machine learning methods. In detail, MEP outperformed LGP for three of the considered classes and LGP outperformed MEP for two of the classes. MEP classification accuracy was greater than 95% for all considered classes and for three of them was greater than 99.75%. Moreover, they suggested that for real-time intrusion detection systems, MEP and LGP would be the ideal candidates because of their simple implementation.
Anomaly detection methods, also called behavior-based or heuristic detection methods, use information about repetitive and usual behavior on the systems they
monitor, and this approach identifies events that deviate from expected usage
patterns as malicious. Most anomaly detection approaches attempt to build
some kind of a model over the normal data and then check to see how well
new data fits into that model. In other words, anything that does not correspond
to a previously learned behavior is considered intrusive. Therefore, the intrusion detection system might not miss any attacks, but its accuracy is a difficult
issue, since it can generate a lot of false alarms. Examples of anomaly detection systems are IDES, NIDES, EMERALD and Wisdom and Sense. Anomaly
detection can be either by unsupervised learning techniques or by supervised
learning techniques.
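The "build a model over normal data and check how well new data fits" idea above can be reduced to a minimal sketch. The Gaussian model of a single traffic feature and the three-sigma threshold are arbitrary illustrative choices, not any particular system's design.

```python
import statistics

def fit_normal_profile(values):
    """Model 'normal' as the mean and spread of one traffic feature
    (e.g. requests per minute) observed during attack-free operation."""
    return statistics.mean(values), statistics.stdev(values)

def is_intrusive(x, mean, std, z_threshold=3.0):
    """Flag anything that fits the learned model poorly. This catches
    novel behavior but also explains the false-alarm problem: any
    legitimate deviation from the profile is flagged too."""
    return abs(x - mean) > z_threshold * std
```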
1. Unsupervised learning systems: Unsupervised or self-learning systems learn the normal behavior of the traffic by observing it for an extended period of time and building some model of the underlying process. Examples include techniques such as the Hidden Markov Model (HMM) and
Artificial Neural Network (ANN). More details are available in the work
of Sundaram [176].
2. Supervised systems: In programmed systems, or the supervised learning method, the system has to be taught to detect certain anomalous events. Supervised anomaly detection approaches build predictive models provided labeled training data (normal or abnormal user or application behavior) are available. Thus the user of the system forms an opinion on
what is considered abnormal for the system to signal a security violation.
Advantages of behavior-based approaches
Detects new and unforeseen vulnerabilities.
Less dependent on operating system-specific mechanisms.
Detects abuse-of-privilege attacks that do not actually involve exploiting any security vulnerability.
Disadvantages of behavior-based approaches
The high false alarm rate is generally cited as the main drawback of
behavior-based techniques because:
The entire scope of the behavior of an information system may not be
covered during the learning phase.
Behavior can change over time, introducing the need for periodic online retraining of the behavior profile.
The information system can undergo attacks at the same time the intrusion detection system is learning the behavior. As a result, the behavior profile contains intrusive behavior, which is not detected as
anomalous.
It must be noted that very few commercial tools today implement such
an approach, leaving anomaly detection to research systems, even if the
founding paper by Denning [7] recognizes this as a requirement for IDS
systems.
Knowledge-based detection methods
Host-based monitoring
A host-based IDS is deployed on devices that have other primary functions such
as Web servers, database servers and other host devices. Host logs, comprised
of the combination of audit, system and application logs, offer an easily accessible and non-intrusive source of information on the behavior of a system. In
addition, logs generated by high-level entities can often summarize many lower-level events, such as a single HTTP application log entry covering many system
calls, in a context-aware fashion. A host-based IDS provides information such
as user authentication, file modifications/deletions and other host-based information, and is thus designated as secondary protection for devices on the network. Examples of HIDS products are EMERALD, NFR, etc.
Advantages of Host-based Intrusion Detection Systems
Although overall host-based IDS is not as robust as network-based IDS, host-based IDS does offer several advantages over network-based IDS:
More detailed logging:- HIDS can collect much more detailed information
regarding exactly what occurs during the course of an attack.
Increased recovery:- Because of the increased granularity of tracking events
in the monitored system, recovery from a successful incident is usually
more complete.
Detects unknown attacks:- Since the attack affects the monitored host,
HIDS detects unknown attacks better than network-based IDS.
Fewer false positives:- The way HIDS works provides fewer false alerts
than those produced by network-based IDS.
Disadvantages of Host-Based IDS
Indecipherable information:- Because of network heterogeneity and the
profusion of operating systems, no single host-based IDS can translate all
operating systems, network applications, and file systems. In addition,
in the absence of something like a corporate key, no IDS can decipher
encrypted information.
Indirect information:- Rather than monitoring activity directly (as network-based IDS do), host-based IDS usually rely heavily or completely on an audit
record of activity that is created by a system or application. This audit
record varies widely in quality and quantity between different systems and
applications, thus dramatically affecting IDS effectiveness.
Complete coverage:- Host-based IDS are installed on the system being
monitored. On very large networks this can comprise many thousands
of workstations. Providing IDS on this scale is both very expensive and
difficult to manage.
Outsiders:- A host-based IDS can potentially detect an outside intruder
only after the intruder has reached the monitored host system, not before,
as can network-based IDS. To reach a host system, the intruder must have
already bypassed network security measures.
Host interference:- Host-based IDS places such a load on the host CPU as
to interfere with normal host operations. On some systems, just invoking
an audit record sufficient for the IDS can result in unacceptable loading.
Network-based monitoring
The sole function of network-based IDS is to monitor the traffic of that network.
This ensures that the IDS can observe all communication between a network
attacker and the victim system, resolving many of the problems associated with
log monitoring. Typical Network-based IDS are Microsoft Network Monitor,
Cisco Secure IDS (formerly NetRanger), Snort etc.
Advantages of network-based intrusion detection
Ease of deployment:- Passive nature and hence few performance or compatibility issues in the monitored environment.
Cost:- Strategically placed sensors can be used to monitor a large organizational environment, whereas a host-based IDS requires software on each
monitored host.
segments. To provide coverage, the IDS user must select key shared-access
segments for IDS sensors. Most frequently they place sensors in the demilitarized zone and, in some cases, in front of port and server farms. To
monitor distributed ports, internal attack points, distributed Ethernet connections, and desktops, many sensors must be installed. Even then, elastic
or unauthorized connections such as desktop dial-ins and modems will not
be monitored.
Switched networks:- To make matters worse, switching has replaced shared/routed networks as the architecture of choice. Switching effectively hides
traffic from shared-access network-based IDS products. Switched networks fragment communication and divide a network into myriad micro
segments that make deploying shared-access IDS prohibitively expensive
since to provide coverage, very many sensors must be deployed. Alternatives could be attaching hubs to switches wherever switched traffic must be
monitored or mirroring selected information such as that moving to specific critical devices, to a sensor for processing. None of these are easy or
ideal solutions.
Insiders:- The focus of network-based IDS is on detecting attacks from outside, rather than attempting to detect insider abuse and violations of local security policy.
Host network monitoring
Host network monitoring, also called network-node or hybrid intrusion detection, is used in personal firewalls and some IDS probe designs; it combines network monitoring with host-based probes. By observing data at all levels of the host's network protocol stack, the ambiguities
of platform-specific traffic handling and the problems associated with cryptographic protocols can be resolved. The data and event streams observed by
the probe are those observed by the system itself. This approach offers advantages and disadvantages similar to both alternatives listed above. It resolves
many of the problems associated with promiscuous network monitoring, while
maintaining the ability to observe the entire communication between victim
and attacker. Like all host-based approaches, however, this approach implies a
Network packets
The IDS includes a network-based sensor designed to capture and process network packets and decipher at least one network protocol (e.g. TCP/IP).
Audit trail
The IDS includes a host-based agent designed to process the audit record of at
least one specific operating system (e.g., Solaris, Ultrix, Unicos).
B.3.4 Architecture
The IDS should provide a distributed capability, since this component of scalability is vital for effective deployment of IDS in the vast majority of corporate
networks. A distributed capability means that both a central manager (or managers) and local collection/processing agents placed as needed throughout the monitored network provide the IDS functionality. However, some products are available in both local and distributed versions.
Monolithic systems
If one considers the alerts generated by an IDS instance to be events in themselves, suitable for feeding into a higher-level IDS structure, an intrusion detection hierarchy results. At the root of the hierarchy lie a resolver unit and a controller. Below these lie one or more monitor components, with subsidiary
probes distributed across the protected systems. Effectively, the whole hierarchy forms a macro-scale IDS. The use of a centralized controller unit allows
information from different subsystems to be correlated, potentially identifying
transitive or distributed attacks. For example, a simple address range probe,
while difficult to detect using a network of monolithic host IDS instances, can
be trivial to observe when correlating connections using a hierarchic structure.
Agent-based systems
A more recent model of IDS architecture divides the system into distinct functional units: probes, monitors, resolver and controller units. These may be
distributed across multiple systems, with each component receiving input from
a series of subsidiaries, and reporting to one or more higher-level components.
Probes report to monitors, which may report to resolver units or higher-level
monitors, and so forth. This architecture, implemented in systems such as
EMERALD, allows great flexibility in the placement and application of individual components. In addition, this architecture offers greater survivability in
the face of overload or attack, high extensibility, and multiple levels of reporting
throughout the structure. FIRE is a prototype that uses an agent-based approach to intrusion detection; the method is discussed in detail in [186].
Distributed systems
All the IDS architectural models described so far consider attacks in terms of
events on individual systems. A recent development, typified by the GrIDS
system, lies in regarding the whole system as a unit. Attacks are modeled as
interconnection patterns between systems, with each link representing network
activity. The graphs that form can be viewed at different scales, ranging from
small systems to the interconnection between large and complex systems (where
sub-networks are collapsed into points). This novel approach promises high
scalability and the potential to recognize widely distributed attack patterns such
as worm behavior. This architecture is also implemented in DIDS.
In passive response, the IDS simply generates alarms to inform responsible personnel of an event by way of console messages, email, paging, and report updates. Passive or indirect gathering of information aids in identifying the source
of an attack using techniques such as DNS lookups, passive fingerprinting, etc.
Reactive
system for sharing security event data among administrative domains. Using AirCERT, organizations can exchange security data ranging from raw
alerts generated automatically by network intrusion detection systems (and
related sensor technology), to incident reports based on the assessments of
human analysts.
ISS Real Secure This IDS works satisfactorily at Gigabit speed. The high speed is made possible by integrating the IDS into the switch or by using a specific port called the span port, which mirrors all the traffic on the switch. The Blackice technology of this sensor includes protocol analysis and anomaly detection combined with Real Secure's library of signature-based detection capabilities.
Real Secure Server Sensor It is a hybrid IDS, which resides on one host and
still monitors the network traffic and detects attacks in the network layer
of the protocol stack. However the sensor also detects attacks at higher
layers and therefore it can detect attacks hidden in encrypted sessions such
as IPsec or SSL encryption. The sensors can also monitor application
and operating system logs.
Snort Snort is an Open Source Network Intrusion Detection System that
keeps track of intrusion attempts, signs of possible bad behavior or hacking exploits. It is capable of performing real-time traffic analysis and
packet logging on IP networks. It can perform protocol analysis, content searching/matching and can be used to detect a variety of attacks and
probes, such as buffer overflows, stealth port scans, CGI attacks, SMB
probes, OS fingerprinting attempts, and much more. It is non-intrusive,
easily configured, utilizes familiar methods for rule development, and currently
includes the ability to detect more than 1200 potential vulnerabilities.
Sourcefire Founded by the creators of Snort, the most widely deployed Intrusion Detection technology worldwide, Sourcefire has been recognized
throughout the industry for enabling customers to quickly and effectively
address security risks. Today, Sourcefire is redefining the network security industry by combining enhanced Snort with sophisticated proprietary
technologies to offer the first ever unified security monitoring infrastructure, delivering all of the capabilities needed to proactively identify threats
the effectiveness of data mining techniques that utilize fuzzy logic. This
system combines two distinct intrusion detection approaches: anomaly-based intrusion detection using fuzzy data mining techniques, and misuse detection using traditional rule-based expert system techniques. The
anomaly-based components look for deviations from stored patterns of
normal behavior. The misuse detection components look for previously
described patterns of behavior that are likely to indicate an intrusion. Both
network traffic and system audit data are used as inputs. The paper, which
describes this prototype, is [193].
DERBI DERBI is a computer security tool that aims at diagnosing and
recovering from network-based break-ins. The technology adopted has the
ability to handle multiple methods (often with different costs) of obtaining
desired information, and the ability to work around missing information.
The prototype will not be an independent program, but will invoke and
coordinate a suite of third-party computer security programs (COTS or
public) and utility programs.
MINDS The MINDS (Minnesota Intrusion Detection System) project is developing a suite of data mining techniques to automatically detect attacks against
computer networks and systems. It uses an unsupervised anomaly detection system that assigns a score to each network connection that reflects
how anomalous that connection is.
NetSTAT NetSTAT is a tool aimed at real-time network-based intrusion detection. The NetSTAT approach extends the state transition analysis technique (STAT) to network-based intrusion detection in order to represent
attack scenarios in a networked environment. NetSTAT is oriented towards the detection of attacks in complex networks composed of several
subnetworks.
BlackICE The BlackICE IDS scans network traffic for hostile signatures in
much the same way that virus scanners examine files for virus signatures.
BlackICE runs at 148,000 packets per second, checks all 7 layers of the
stack and rates each attack on a scale of 1 to 100 so that only attacks
it considers serious are alerted. There are two versions: desktop agent
(BlackICE Defender) and network agent (BlackICE Sentry). The desktop agent runs on Win95/WinNT desktop. The network agent runs just like any
other sniffer-type IDS.
Cyclops
Snort-based Cyclops IDS provides advanced and flexible intrusion detection at
Gigabit speeds and secures networks by performing high-speed packet analysis
to detect malicious activities in real time and automatically launching preventive
measures before security can be compromised.
Dragon Sensor
Dragon sensor detects suspicious activity with both signature based and anomaly
based techniques. Its library of attacks detects thousands of potential network
attacks and probes, and also hundreds of successful system compromises and
backdoors.
E-Trust
eTrust intrusion detection delivers state-of-the-art network protection including
DDoS attacks. All incoming and outgoing traffic is checked against a categorized list of web sites to ensure compliance. It is then checked for content,
malicious code and viruses, and the administrator is notified of offending payloads.
Manhunt
Symantec ManHunt provides high-speed, network intrusion detection, real-time
analysis and correlation, and proactive prevention and response to protect enterprise networks against internal and external intrusions and denial-of-service
attacks. The ability to detect unknown threats using protocol anomaly detection, helps in eliminating network exposure and the vulnerability inherent in
signature-based intrusion detection systems. Symantec ManHunt traffic rate
monitoring capability allows for detection of stealth scans and denial-of-service
attacks that can cripple even the most sophisticated networks.
NetDetector
NetDetector is a network surveillance system for IP networks that provides non-intrusive, continuous traffic recording and real-time traffic analysis. NetDetector records network traffic, analyzes every packet, detects the activities of intruders, sets alarms for real-time alerting, and gathers evidence for post-event
analysis.
relation to the user's functions. These tasks are taken as series of actions,
which in turn are matched to the appropriate audit data. The analyzer keeps
a set of tasks that are acceptable for each user. Whenever a mismatch is
encountered, an alarm is produced. SECURENET uses this technique for
intrusion detection.
4. Computer immunology Analogies with immunology have led to the development of a technique that constructs a model of normal behavior of UNIX
network services, rather than that of individual users. This model consists
of short sequences of system calls made by the processes. Attacks that exploit flaws in the application code are very likely to take unusual execution
paths. First, a set of reference audit data is collected which represents the
appropriate behavior of services, then the knowledge base is added with all
the known good sequences of system calls. These patterns are then used
for continuous monitoring of system calls to check whether the sequence
generated is listed in the knowledge base; if not, an alarm is generated.
This technique has a potentially very low false alarm rate provided that the
knowledge base is fairly complete. Its drawback is the inability to detect
errors in the configuration of network services. Whenever an attacker uses
legitimate actions on the system to gain unauthorized access, no alarm is
generated.
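The matching scheme described above can be sketched in a few lines. The system-call names and the window length n are illustrative assumptions, not taken from SECURENET or any particular implementation:

```python
# Build a knowledge base of known-good length-n system-call windows from
# reference audit data, then flag any window not present in the base.
# The syscall names and n = 3 are illustrative assumptions.

def build_knowledge_base(trace, n=3):
    """Collect every length-n window seen in a known-good trace."""
    return {tuple(trace[i:i + n]) for i in range(len(trace) - n + 1)}

def monitor(trace, kb, n=3):
    """Return the windows of a new trace that are absent from the base."""
    windows = (tuple(trace[i:i + n]) for i in range(len(trace) - n + 1))
    return [w for w in windows if w not in kb]

normal = ["open", "read", "write", "close", "open", "read", "close"]
kb = build_knowledge_base(normal)

# An unusual execution path raises alarms...
assert monitor(["open", "read", "exec", "write"], kb)
# ...but a sequence of legitimate actions already in the base does not,
# reflecting the stated drawback: misuse of legitimate actions is missed.
assert monitor(["open", "read", "close"], kb) == []
```

As noted above, a fairly complete knowledge base keeps the false alarm rate of this scheme low.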
5. Machine learning This is an artificial intelligence technique that stores the user-input stream of commands in vectorial form, which is used as a reference profile of normal user behavior. Profiles are then grouped in a library
of user commands having certain common characteristics.
6. Data mining Data mining generally refers to a set of techniques for extracting previously unknown but potentially useful information from large stores of data. Data mining methods excel at processing large system logs of audit data. However, they are less useful for stream analysis
of network traffic. One of the fundamental data mining techniques used in
intrusion detection is associated with decision trees. Decision tree models
allow one to detect anomalies in large databases. Another technique refers
to segmentation, allowing extraction of patterns of unknown attacks. This
is done by matching patterns extracted from a simple audit set with those
application level. It is the Netfilter/IPtables software that allows for the implementation of the response mechanism while Snort Inline provides policies
based on which IPtables makes the decision to allow or to deny packets. After
an incoming packet to the network is provided by IPtables, Snort performs the
rule matching against the packet. Thus Snort Inline provides a more proactive
and dynamic capability against today's attacks. However, the rule matching is
against a statically created rule base and thus needs prior estimate of the kinds
of attacks that will be seen and the action is taken at the site of detection.
McAfee Internet Security Suite (ISS) has been developed for the Windows operating system platform that integrates many security technologies to protect
desktop computers from malicious code, spam and unwanted or unauthorized
access. Thus it functions both as an antivirus as well as a firewall. The antivirus subsystem allows for the detection of viruses, worms, and other types of
malicious code by using a signature-based approach along with a heuristic engine for unknown attacks. The firewall component scans multiple points of data
entry. McAfee IntruShield IPS is a Network Prevention product for encrypted
attacks, botnets, and VoIP vulnerability based attacks. It delivers unique forensic features to analyze key characteristics of known and zero-day threats and
intrusions.
circumstances that may exacerbate or ameliorate the problems. The IDS fusion
offers the following advantages over a single IDS:
Analytically proven [220] higher system detection rates and lower system
false alarm rates than those of a single IDS or a weighted average.
Error probabilities of the fusion system are significantly reduced and approach
zero [220].
As Axelsson highlights in [221], "In reality there are many different types of intrusions, and different detectors are needed to detect them." The same argument is made by Lee et al. [222], who additionally mention that "combining evidence from multiple base classifiers ... is likely to improve the effectiveness in detecting intrusions." As such, analyzing the data from multiple sensors should increase the accuracy of the IDS [222]. Kumar [15] observes that "correlation of information from different sources has allowed additional information to be inferred that may be difficult to obtain directly." Such correlation is also useful in assessing the severity of other threats, be it severe because an attacker is making a concerted effort to break in to a particular host, or severe because the source of the activity is a worm with the potential to infect a large number of hosts in a short amount of time.
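As a purely illustrative sketch of why fusion can help, assume two independent sensors with hypothetical detection and false alarm rates; simple OR/AND decision rules already show the direction of the gains cited from [220]:

```python
# Hypothetical per-sensor rates; the point is the combination rules,
# not the numbers.
def or_rule(p1, p2):
    """Alarm if either sensor alarms (assuming independence)."""
    return 1 - (1 - p1) * (1 - p2)

def and_rule(p1, p2):
    """Alarm only if both sensors alarm (assuming independence)."""
    return p1 * p2

tp1, tp2 = 0.70, 0.60   # assumed per-sensor detection rates
fp1, fp2 = 0.01, 0.02   # assumed per-sensor false alarm rates

# OR fusion raises the detection rate above either sensor alone (~0.88),
# while AND fusion drives the false alarm rate toward zero (~0.0002).
print(or_rule(tp1, tp2), and_rule(fp1, fp2))
```

A practical fusion unit would use a rule between these two extremes, trading detection rate against false alarm rate.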
Multisensor correlation has long been a theme in intrusion detection, motivated both by the fact that most of the early IDS work took place in the wake of the Morris Worm and by the need to centrally manage the alerts from a network of host-based
IDSs. Recently, a great deal of work has been done to standardize the protocols
that IDS components use to communicate with each other. The first solid protocol to do this is the Common Intrusion Detection Framework (CIDF) [223]. CIDF
spurred additional work in protocols for multisensor correlation, for example,
Ning et al. [224] extended CIDF with a query mechanism to allow IDSs to
query their peers to obtain more information on currently observed suspicious
activity.
B.7.1 Existing fusion IDSs
Some of the IDS that make use of multisensor correlation and various fusion
techniques are covered in this section.
Research IDSs
The first couple of IDSs that performed data fusion and cross-sensor correlation were the Information Security Officer's Assistant (ISOA) [225] and the
Distributed Intrusion Detection System (DIDS)[226]. ISOA used the audit information from numerous hosts whereas DIDS used the audit information from
numerous host and network-based IDSs. Both made use of a rule-based expert
system to perform the centralized analysis. The primary difference between the
two was that ISOA was more focused on anomaly detection and DIDS on misuse detection. Additional features of note were that ISOA provided a suite of
statistical analysis tools that could be employed either by the expert system or
a human analyst, and the DIDS expert system featured a limited learning capability.
EMERALD was an extension to NIDES [7, 227] with a hierarchical analysis
system. The various levels (host, network, enterprise, etc.) would each perform
some level of analysis and pass any interesting results up the chain for correlation [228, 229, 230]. It provided a feedback system such that the higher levels
could request more information for a given activity. Of particular interest is
the analysis done at the top level, which monitored the system for network-wide threats such as "Internet worm-like attacks, attacks repeated against common network services across domains, or coordinated attacks from multiple domains against a single domain" [228]. The EMERALD architects employed numerous approaches such as statistical analysis, an expert system and modular analysis engines, as they believed "no one paradigm can cover all types of threats."
Commercial IDSs
RealSecure Siteprotector does advanced data correlation and analysis by interoperating with other RealSecure products [231]. Symantec ManHunt [232]
and nSecure nPatrol [233] integrate the means to collect alarms. Cisco IDS
[234] and Network Flight Recorder (NFR) [235] provide a means to do centralized sensor configuration and alarm collection. The problem with all of these
systems is that they are designed more for prioritizing what conventional intrusion (misuse) detection systems already detect, and not for finding new threats.
Other products, such as Computer Associates eTrust Intrusion Detection Log
View [236], and NetSecure Log [237] are more focused on capturing log information to a database and doing basic analysis on it. Such an approach seems to be more oriented towards ensuring the integrity of the audit trail (itself an important activity in an enterprise environment) than data correlation and analysis.
B.7.2 Current status of applying sensor fusion in IDS
Despite the proven utility of multiple classifier systems, no general answer has yet emerged to the original question of whether the strengths of different IDS designs can be exploited while their weaknesses are avoided. Many fundamental issues remain a matter of ongoing research. The results achieved during the past few years are spread over different research communities, which makes it difficult to exchange such results and promote their cross-fertilization.
B.8 Conclusion
Intrusion detection is currently gaining considerable interest from both the research community and commercial companies. It has become an indispensable and integral component of any comprehensive enterprise security program, because the intrusion detection system has the potential to alleviate many of the problems facing current network security. A number of the techniques and solutions found in current systems and literature are outlined in this work. As evidenced by recent events, however, network security has some way to go before any network can be considered safe, and hence the near-term future of intrusion detection is very promising. It is clear, though, that under the pressures of a highly competitive global research environment, the field of intrusion detection will re-mould rapidly and overcome many current limitations and hurdles.
Appendix C
Modeling of the Internet Attacks and the
Countermeasure for Detection
Success is the ability to go from one failure to another with no loss of enthusiasm.
Winston Churchill
C.1 Introduction
This appendix introduces dynamic models for the attack-detector interactions
with the simple Nicholson-Bailey precursor, in which the detector is randomly
searching for the attack on the network traffic independent of the attack distribution. The dependence between the detectors and their heterogeneity is introduced as a subsequent step. The heterogeneity is incorporated by the use of negative binomial distribution as introduced in chapter 2, which also accounts for
the non-randomness in the attacks and the detectors. The attack-detector models that incorporate the attack carrying capacity, detector improvement with the
attacks detected, detector correlation and the non-randomness of attacks and detectors have been derived in this appendix. The proposed modeling idea is new
and the related works other than Shimeall and Williams [54] and Browne et al.
[55] are discussed here.
Ravishankar Iyer et al. [238] combine an analysis of data on security vulnerabilities and a focused source-code examination to develop a Finite State Machine
(FSM) model to depict and reason about security vulnerabilities and also to extract characteristics shared by a large class of commonly seen vulnerabilities.
This information is used to devise a generic, randomization-based technique
for protecting against a wide range of security attacks. Jonsson and Olovsson
[239, 240] try to quantitatively model the security intrusion process based on
attacker behavior. This model presents the phases in performing attacks on a
system in the presence of detector system. They discuss the three phases in the
security intrusion process namely the learning phase, the standard attack phase
and the innovative attack phase.
Ed Skoudis [241] in his book Counter Hack: A step-by-step guide to computer attacks and effective defenses presents a model of an attack using five
phases: reconnaissance, scanning, gaining access, maintaining access and covering tracks. McDermott [242] mentions that most of the quantitative models of security or survivability have been defined on a range of probable intruder behavior. This measures survivability as a statistic such as mean time to
breach. This kind of purely stochastic quantification is not suitable for high-consequence systems. Detailed aspects of the intruder's attack potential can
have significant impact on the expected survivability of an approach.
This section also surveys the different research efforts related to the field of
intrusion correlation. IBM has developed a prototype called the aggregation
and correlation component (ACC) [243]. The purpose of the aggregation and
correlation algorithm is to form groups of related alerts using a small number
of relationships. M2D2 uses a formal data model to include external information in the alert correlation process [244]. Four different information types are
handled: information about the monitored system, information about known
vulnerabilities, information about security tools (vulnerability scanners and intrusion detection systems), and information generated by the security tools, e.g.
scans and alerts. A relational database is used to store information from IDS
and scanners, together with product information from the ICAT vulnerability
database [245].
SRI has introduced a probabilistic approach to alert correlation [246, 247, 248].
the mean number of encounters per attack is Gx = Ne/At (when Dt = 1). For detectors searching at random, the probability of an attack being encountered X times follows the Poisson distribution

P(X) = exp(-Gx) Gx^X / X!,

where P(X) is the probability of X occurrences and Gx is the average occurrence of X. Solving for zero detection, i.e., the probability of not being detected (proportion of attacks undetected), gives P(0) = exp(-Gx). Setting Gx = Ne/At gives P(0) = exp(-Ne/At).
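The zero-detection result can be checked numerically; the values of Ne and At below are hypothetical:

```python
import math

# The Poisson term P(X) = exp(-Gx) * Gx**X / X! evaluated at X = 0
# reduces to exp(-Gx) = exp(-Ne/At), the proportion escaping detection.
def poisson(x, gx):
    return math.exp(-gx) * gx**x / math.factorial(x)

ne, at = 30.0, 100.0        # hypothetical encounters and attacks
gx = ne / at                # mean encounters per attack, Gx = Ne/At

assert poisson(0, gx) == math.exp(-ne / at)
# The terms over all X sum to 1, as a probability distribution must.
assert abs(sum(poisson(x, gx) for x in range(50)) - 1.0) < 1e-9
```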
[Figure C.1: Attack-detector relationship, showing A(t) and D(t) over time.]
The numbers of attacks and detectors at time t + 1 are given by

At+1 = a At exp(-d Dt)    (C.1)

and

Dt+1 = At [1 - exp(-d Dt)]    (C.2)
respectively. Figure C.1 shows the attack-detector relationship using the Nicholson-Bailey model with typical values and initial conditions as a = 0.25, A(1) = 20000, d = 0.9, D(1) = 1 and t varying from 1 to 5. This basic Nicholson-Bailey model showing the detector performance over the years was in agreement with the figure of merit of IDSs over the years from 1995 to 2004 as shown in figure 2.6. The decrease in the growth rate of attacks over the years, as seen in Figure C.1, depends on the number of detectors initially deployed and also on the efficiency of the deployed detectors. The actual attacks that happen on the Internet are expected to be more than what gets reported, because some of the attacks may not be detected, and only a small portion of the detected attacks gets reported.
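A minimal sketch of the Figure C.1 computation, iterating equations C.1 and C.2 with the parameter values quoted above (a = 0.25, A(1) = 20000, d = 0.9, D(1) = 1):

```python
import math

# Iterate At+1 = a*At*exp(-d*Dt) and Dt+1 = At*(1 - exp(-d*Dt)).
def step(a_t, d_t, a=0.25, d=0.9):
    return a * a_t * math.exp(-d * d_t), a_t * (1 - math.exp(-d * d_t))

A, D = 20000.0, 1.0
for t in range(2, 6):  # t = 2 .. 5
    A, D = step(A, D)
    print(t, round(A, 1), round(D, 1))
# The attack population collapses as the detector population picks up,
# matching the declining growth of attacks seen in Figure C.1.
```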
Tested against the available experimental data and the practical data in the work of Shimeall et al. [54], the Nicholson-Bailey model is seen to be in good agreement. It also indicates the quantitative possibility of oscillations in the attack-detector interactions. The attack-detector interactions are often characterized by very strong fluctuations from year to year, and even the complete extinction of either the attack or the detector. Moreover, during certain time spans, detector levels become so low that the model predicts the eventual extinction of the detector, when the attacks increase exponentially, or vice-versa.
In this appendix, an attempt has been made to model the dynamic relationship
existing between the detectors and the attacks and this knowledge can be used
to enrich the design and development of IDSs. For each combination of a and
d there is an unstable attack-detector equilibrium, with the slightest disturbance
leading to expanding population oscillations. With Nicholson-Bailey equations
C.1 and C.2, it is shown that depending on the initial state, the system can evolve
towards a simple steady state or a limit cycle, in which the attack-detector populations oscillate periodically in time. The attack-detector relationship may thus
exhibit coupled oscillations. The aim is to study the oscillatory behavior of an
attack-detector model with intelligent pursuit and evasion rules. Usually detectors respond to attack distribution. A constant searching efficiency is more
difficult to accept. Searching efficiency depends on the speed of the traffic,
attack density on a priori grounds and also on the detector density.
C.2.1 Attack/Detection as they stand alone
This appendix investigates the fate of the attack and the detector in the absence
of the other, in order to assess whether the modeling of attack-detector relationship using the Nicholson-Bailey model would be reasonable.
At+1 = a At exp(-d Dt)
and
Dt+1 = At [1 - exp(-d Dt)]
In the absence of detectors, At+1 = a At and Dt+1 = 0; attacks increase exponentially and at any time (t + n), At+n = a^n At. With this simple model, it
is clear that when the detector density Dt = 0, the attack density should follow the logistic function. It is therefore reasonable to set an attack carrying capacity k beyond which the attacks cannot increase, even if the detectors are totally absent.
In the absence of any attack, it is reasonable to assume that the presence of
detectors is of no use. So the existing detection systems will also die out at
the next instant of time if there are no attacks, i.e., if At = 0, then Dt+1 = 0 and At+1 = 0. To explain in detail, if the beginning years of the Internet are taken as years of no attacks, the succeeding year will not have any detectors. This continues till attacks are found and, with a latency, detectors evolve. The detector density is a function of the attack density weighted by the efficiency and the density of the detector. As soon as one of these detector parameters becomes very large, the detector density will match, but never overcome, the attack density.
The basic model of Nicholson-Bailey can be extended by incorporating additional features such as density-dependence in the attacks, interference among
detectors, and refuges. The classification of the detectors spans a wide
range of complexity. The general statistics show that the more the detectors are
successful in detecting the attacks, the more are the chances of highly sophisticated detectors emerging, possibly learning from the detected attacks. Thus
the Nicholson-Bailey model for attack-detector modeling is based on the data
that reflect the knowledge that one has about the system and/or the potential
attacks, but it does not express all the different possibilities that are encountered
in the attack-detector interaction. The following sections look into the different
possibilities in order to generalize the attack-detector interactions.
C.2.2 Attack carrying capacity
The Nicholson-Bailey model suffers from the important defect of having the
attacks with a constant rate of increase and thus a potentially unlimited number
of attacks. It is necessary to take into account the fact that the attack density
does not grow beyond some carrying capacity. The first definitive theoretical treatment of this relationship holds that while detectors and attacks grow exponentially, the resources on which they depend may not follow such a fast rate of increase. Thus the demand for resources must eventually exceed the supply,
and population growth, being dependent on the resource supply, must then cease.
This is mathematically modeled as a logistic equation with the attack remaining
at a saturation value equal to the attack carrying capacity. Practically there are
technical bottlenecks for the attacks to increase beyond this value; other reasons
of the sated state of the attacks can be the ineffectiveness of detectors, or that
the existing attacks serve all the malicious intents. Hence the attacks should saturate at the attack carrying capacity given by At+1 = At when At = k, where k
is the attack equilibrium density or the attack carrying capacity. This condition
is substituted in the attack equation
At+1 = a At exp(-d Dt)

Attacks saturate as At+1 = At = k. This simplifies the attack equation to exp(-d Dt) = 1/a, or d Dt = ln(a). Hence, to incorporate the attack carrying capacity, the attack-detector equations can be modified as:

At+1 = a At exp(-ln(a) At/k - d Dt)

and

Dt+1 = At [1 - exp(-d Dt)]

respectively, where the growth term ln(a) is made density-dependent through the expression ln(a) At/k,
such that as At approaches k, the growth rate of attack approaches zero. The
introduction of the carrying capacity is to cause the system to be stable, thereby
making the Nicholson-Bailey model more realistic. The impact of the attack
getting sated on the detector is that it will vary depending on the probability of
detecting attacks. The detectors pick up to a stage of maximum detection during
this time when the attacks are sated. If given enough time lag, detectors also
will stabilize. Since k depends on the technology bottleneck, it is expected to
increase every year. With around 500 varieties of attacks expected in the year 2000, it can be severalfold higher now.
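A short sketch of the density-dependent attack equation with detectors absent (Dt = 0); the carrying capacity k = 500 is taken from the text, while the growth rate a = 2 is an assumed value:

```python
import math

# Density-dependent attack equation At+1 = a*At*exp(-ln(a)*At/k - d*Dt).
# With detectors absent (d*Dt = 0) the attacks settle at the carrying
# capacity k; k = 500 as in the text, a = 2 assumed.
def step(a_t, a=2.0, k=500.0, d_dt=0.0):
    return a * a_t * math.exp(-math.log(a) * a_t / k - d_dt)

A = 10.0
for _ in range(60):
    A = step(A)
print(round(A))  # settles at the carrying capacity, 500
```

With ln(a) < 2 the approach to k is stable, illustrating how the carrying capacity stabilizes the otherwise unbounded Nicholson-Bailey attack growth.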
The stability of this density-dependent model is determined by the attack increase rate a and also by the detector searching efficiency d as shown in Figure
C.4 with k = 500. If the detector is extremely efficient with a large value of d,
[Figure C.4: Attack-detector relationship under the density-dependent model with k = 500, showing A(t) and D(t) over time.]
then it is expected to hold the attacks below their carrying capacity, and the dynamics are determined most strongly by the unstable attack-detector interactions. If the detector is inefficient, with a small value of d, attack dynamics are determined largely by the density-dependent feedback. The density-dependent attack growth rate thus applies to populations facing limited resources, a situation unlikely to occur in successful cases of stable systems, where equilibria occur at very low levels at which resources are not limited. Thus the detections and the increase rate can be considered as a means of stabilizing the interactions.
C.2.3 Stability in attack-detector model
With Dt detectors searching for At attacks, the ones that are not detected or survive patch fixing, along with the new attacks generated in the interval between
t and t + 1 are given by At+1 . Similarly, there will be detectors that learn from
the attacks detected and also there will be detectors that remain effective even
with new vulnerabilities or service closure at any point of time; hence Dt+1 denotes the number of detectors at any time t + 1. This is a cyclic pickup of attacks and detectors and hence is oscillatory in nature, without any overdamping as such.
It is shown in section 2.5.2 that simple detector models when aggregated in
a network of high attack density contribute to the stability of an attack-detector
interaction. For the detection of external intrusion activities, if there are multiple
paths to the Internet, an IDS needs to be present at every entry point, whereas
for the detection of internal intrusion activities, an IDS is required in every
network segment. This specifies the broadest solution of advanced intrusion
detection technology to provide ubiquitous coverage through individual IDSs
spread everywhere on the network. The success of a security system depends
on the detector or the security measures in reducing the attack population and
maintaining it at a new lower level in a stable interaction. These equilibrium
levels depend on the following two factors:
1. the effective rate of increase of the attack unaffected by detection.
2. the average proportion of the attacks detected, which in turn depends on the number of detectors and all factors affecting the searching efficiency (Ne /(At Dt)).
The IDSs that are likely to stabilize the attack population at low levels have the
following characteristics:
high intrinsic searching efficiency
small attack handling time
detector interference to a certain level
high level of detector aggregation using the techniques of sensor fusion
C.2.4 Inclusion of stealthy attacks
If both the attack and the detector were randomly and independently distributed
in Nicholson-Bailey fashion, then the proportion of the attacks escaping detection at time t is given by exp(-d Dt). If a proportion b of the attacks that are at the risk of detection are allowed to hide themselves, for example with the fragmentation of packets or even tunneling, then the proportion of the attacks escaping detection is raised to:

exp(-d Dt) + b [1 - exp(-d Dt)],  for 0 <= b <= 1
The equations for the number of attacks and detectors at any instant of time
t are:
At+1 = a At [b + (1 - b) exp(-d Dt)]

and

Dt+1 = At - At+1 / a

The equilibrium solutions of the above equations are:

d D* = ln[a(1 - b) / (1 - ab)]

and

A* = [a / (a - 1)] D*.

It is necessary for the value of b to lie between 0 and 1 for a solution to exist. These solutions are stable against small disturbances. When b tends towards zero, the system resembles the Nicholson-Bailey model.
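A quick numerical check that this equilibrium is indeed a fixed point of the stealthy-attack map; the values of a, b and d are assumed for illustration:

```python
import math

# Check that the equilibrium of the stealthy-attack model is a fixed
# point of the map. The parameter values a, b, d are assumed.
a, b, d = 2.0, 0.2, 1.0

# Equilibrium: d*D = ln[a(1-b)/(1-ab)],  A = [a/(a-1)]*D
D_eq = math.log(a * (1 - b) / (1 - a * b)) / d
A_eq = a / (a - 1) * D_eq

# One step of the map starting from the equilibrium point.
A_next = a * A_eq * (b + (1 - b) * math.exp(-d * D_eq))
D_next = A_eq - A_next / a

assert abs(A_next - A_eq) < 1e-9
assert abs(D_next - D_eq) < 1e-9
```

Note that the derivation additionally requires ab < 1, so that the logarithm's argument is positive.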
C.2.5 Modeling of non-random attacks and detection
The random search is a mathematically convenient assumption, but not a realistic one. When a network intrusion happens, the sequence of attacks does not
take place in a totally random order. Intruders come with a set of tools trying
to achieve a specific goal. The selection of the cracking tools and the order of
application depends heavily on the situation as well as the responses from the
targeted system. Typically there are multiple ways to invade a system. Nevertheless, it usually requires several actions/tools to be applied in a particular
logical order to launch a sequence of effective attacks to achieve a particular
goal. It is this logical partial order that reveals the short and long-term goals
of the invasion. Random search is an exception rather than a rule. Real attacks
are likely to be distributed in a patchwork of high and low densities, and the
detectors can be expected to respond to the attacks by orienting towards high
density patches. It is natural that the detector searches for certain traffic features
for signs of attack. This provides a strong selective advantage for detectors that
result in a more focused search. The modeling of non-random behavior of the
attack-detector interactions is provided in detail in chapter 2.
C.3 Summary
The modeling shows the restricted growth rates of both the attacks and their detection. With the existing IDSs it is not possible to attain a detection growth rate such that the effect of attacks is not felt in the information systems. Hence, it is required to look at advanced techniques for performance enhancement of the available IDSs. The level of severity of an alert is also understood with this modeling. This
knowledge could then potentially be used by a security analyst to understand
and respond more effectively to future intrusions. As seen from the model, the
existing as well as emerging attacks are not expected to totally evade the detectors monitoring the network. The modeling is realistic in a network environment with multiple IDSs for protection, looking at the system as a whole instead of at the individual responses to an attack. For more proactive defense, it is essential to understand the network defensive and offensive strategies. With the attack-detector scenario better understood, the future evolution of attacks can be estimated to a certain extent, thereby aiding better attack detection and in turn reducing false negatives. This knowledge helps the security community to
become proactive rather than reactive with respect to incident response.
Appendix D
Methodology for Evaluation of Intrusion
Detection Systems
Make everything as simple as possible, but not simpler.
Albert Einstein
D.1 Introduction
The poor understanding of the performance of Intrusion Detection Systems reported in the literature may be caused in part by the shortage of an effective, unbiased evaluation and testing methodology that is both scientifically rigorous and technically feasible. The choice of an intrusion detection system for a particular environment is a general problem, stated more concisely as the intrusion detection evaluation problem, and its solution usually depends on several factors. The most basic of these are the false alarm rate and the detection rate, whose trade-off can be intuitively analyzed with the help of the Receiver Operating Characteristic (ROC) curve [14], [57], [12], [58], [59]. However, as pointed out by earlier investigators [21], [60], [61], the detection rate and the false alarm rate alone might not be enough to provide a good evaluation of the performance of an IDS. Hence, the evaluation metrics need to consider the environment the IDS is going to operate in, such as the maintenance costs and the hostility of the operating environment (the likelihood of an attack). In an effort to provide such an evaluation method, several performance metrics, such as the Bayesian detection rate [21], expected cost [60], sensitivity [62], and intrusion detection capability [63], have been proposed in the
literature. These metrics usually assume knowledge of some uncertain parameters, such as the likelihood of an attack or the costs of false alarms and missed detections. Yet although each of these performance metrics makes its own contribution to the analysis of intrusion detection systems, they are rarely applied in the literature when a new IDS is proposed.
This Appendix introduces a framework for evaluating IDSs along with some new metrics for IDS evaluation. Classification accuracy in intrusion detection systems deals with such fundamental problems as how to compare two or more IDSs, how to evaluate the performance of an IDS, and how to determine the best configuration of an IDS. In an effort to analyze and solve these related problems, evaluation metrics such as Area Under the ROC Curve, precision, recall, and F-score have been introduced. Additionally, we introduce the P-test, which offers a more intuitive way of comparing two IDSs and is more relevant to the intrusion detection evaluation problem. We also introduce a formal framework for reasoning about the performance of an IDS and of the proposed metrics against adaptive adversaries. We provide simulations and experimental results with these metrics, using real-world network traffic and the DARPA 1999 data set, in order to illustrate the benefits of the algorithms proposed in Chapters 5 to 9.
Overall Accuracy = (TP + TN) / (TP + FP + TN + FN)
Overall Accuracy is not a good metric for comparison in the case of network
traffic data since the true negatives abound.
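The point can be made concrete with a small sketch (illustrative only; the counts below are hypothetical, not taken from the thesis data):

```python
# Overall Accuracy as defined above: (TP + TN) / (TP + FP + TN + FN)
def overall_accuracy(tp, fp, tn, fn):
    return (tp + tn) / (tp + fp + tn + fn)

# Hypothetical traffic: 10,000 connections, only 100 of them attacks.
# A detector that flags nothing (TP = 0, FP = 0) misses every attack,
# yet the abundant true negatives alone push its accuracy to 99%.
acc = overall_accuracy(tp=0, fp=0, tn=9900, fn=100)
print(acc)  # 0.99
```

This is why precision and recall, which ignore true negatives, are the preferred measures for the attack class.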
D.2.5 Precision
Precision (P) is a measure of what fraction of the test data detected as attack is actually from the attack class:

P = TP / (TP + FP)
D.2.6 Recall
Recall (R) is a measure of what fraction of the attack class is correctly detected:

R = TP / (TP + FN)
There is a trade-off between the two metrics precision and recall. As the number of detections increases when the threshold is lowered, recall will increase while precision is expected to decrease. A plot showing the recall-precision characteristic of a particular IDS is used to analyze its relative and absolute performance over a range of operating conditions.
D.2.7 F-score
The F-score scores the balance between precision and recall and is a measure of the accuracy of a test. It can be considered the harmonic mean of recall and precision, and is given as:

F-score = 2PR / (P + R)
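The three measures can be computed directly from the confusion-matrix counts. A minimal sketch, with hypothetical counts chosen only for illustration:

```python
def precision(tp, fp):
    # Fraction of flagged traffic that is actually attack
    return tp / (tp + fp)

def recall(tp, fn):
    # Fraction of actual attacks that are flagged
    return tp / (tp + fn)

def f_score(p, r):
    # Harmonic mean of precision and recall
    return 2 * p * r / (p + r)

# Hypothetical IDS run: 80 attacks detected, 20 missed, 40 false alarms.
p = precision(tp=80, fp=40)   # 2/3
r = recall(tp=80, fn=20)      # 0.8
print(round(f_score(p, r), 3))  # 0.727
```

The harmonic mean penalizes imbalance: a detector with high recall but poor precision (or vice versa) scores noticeably lower than one that is moderate in both.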
The standard measures, namely precision, recall, and F-score, are grounded in a probabilistic framework and hence allow one to take into account the intrinsic variability of performance estimation. Comparing IDSs with the F-score has the limitation that tests of significance cannot be directly applied to it in order to determine the confidence level of the comparison. Since the primary goal was to achieve improvement in both precision and recall, the P-test [110] was used for IDS comparison.
D.2.8 P-test
To compare two IDSs X and Y, let (R_X, P_X) and (R_Y, P_Y) be their values of recall and precision with respect to attack, respectively. Let IDS X and Y predict N_X^Pos and N_Y^Pos positives respectively, and let N^Pos be the total number of positives in the test sample. Then the P-test is applied as follows:

Z_R = (R_X - R_Y) / sqrt(2R(1 - R)/N^Pos)

Z_P = (P_X - P_Y) / sqrt(2P(1 - P)(1/N_X^Pos + 1/N_Y^Pos))

where R = (R_X + R_Y)/2 and P = (N_X^Pos P_X + N_Y^Pos P_Y) / (N_X^Pos + N_Y^Pos).
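A minimal sketch of the P-test computation, assuming the notation above; the recall/precision values and counts below are hypothetical, and the square roots follow the standard two-proportion z-test form:

```python
import math

def p_test(rx, px, ry, py, nx_pos, ny_pos, n_pos):
    # Pooled recall and precision, as defined above
    r_bar = (rx + ry) / 2
    p_bar = (nx_pos * px + ny_pos * py) / (nx_pos + ny_pos)
    # Z-scores for the recall and precision differences
    z_r = (rx - ry) / math.sqrt(2 * r_bar * (1 - r_bar) / n_pos)
    z_p = (px - py) / math.sqrt(2 * p_bar * (1 - p_bar) * (1 / nx_pos + 1 / ny_pos))
    return z_r, z_p

# Hypothetical: IDS X (recall 0.80, precision 0.70, 120 flagged positives)
# vs IDS Y (recall 0.75, precision 0.68, 110 flagged positives),
# with 100 true positives in the test sample.
z_r, z_p = p_test(0.80, 0.70, 0.75, 0.68, 120, 110, 100)
# |Z| > 1.96 would indicate a significant difference at the 5% level
```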
If R_X > R_Y and P_X > P_Y, with both differences significant, then X is considered better than Y (X > Y). It may so happen that one metric is significantly better while the other is significantly worse. In such cases of conflict, the non-probabilistic metric F-score can be used instead of applying the significance test.
attack with a high detection rate, since R2L attacks normally exploit the application layer. Other than the diversity of the chosen IDSs, another reason for choosing the two anomaly detectors PHAD and ALAD was their acceptably low false alarm rates. Snort is an open-source network intrusion prevention and detection system utilizing a rule-driven language that combines the benefits of signature-, protocol-, and anomaly-based inspection methods. Snort is the most widely deployed intrusion detection and prevention technology worldwide and has become the de facto standard for the industry. Snort is efficient in detecting DoS attacks and U2R attacks with a high detection rate.
D.4 Summary
In an effort to analyze and solve the IDS evaluation problems identified in this thesis, evaluation metrics such as Area Under the ROC Curve, precision, recall, and F-score have been introduced in this appendix. Additionally, the P-test, which offers a more intuitive way of comparing two IDSs and is more relevant to the intrusion detection evaluation problem, has been included. Metrics such as the F-score and the P-test prove highly effective for a sound comparison of IDSs.
References
[1] M. McLuhan, Letters of Marshall McLuhan, Oxford University Press,
1987, pp. 254.
[2] Internet Domain Survey Host Count, https://www.isc.org/solutions/survey
[3] J. McHugh, A. Christie, J.Allen, Defending Yourself: The Role of Intrusion Detection Systems, IEEE software, Sep/Oct. 2000.
[4] Losses due to cyber crime can be as high as $40 billion, Business Line,
Business Daily from THE HINDU group of publications, Monday, May21,
2007.
[5] CSI/FBI Computer Crime and Security Survey, http://www.gocsi.comipress/20020407
[10] C. Ko, M. Ruschitzka, K. Levitt, Execution Monitoring of Security-Critical Programs in Distributed Systems: A Specification-based Approach, In Proceedings of the 1997 IEEE Symposium on Security and Privacy, pp. 175-187, May 1997.
[11] D. Wagner, D. Dean, Intrusion Detection via Static Analysis, In Proceedings of the IEEE Symposium on Security and Privacy, IEEE Press, 2001.
[12] C. Warrender, S. Forrest, B.A. Pearlmutter, Detecting intrusions using system calls: Alternative data models, In IEEE Symposium on Security and
Privacy, pages 133-145, 1999.
[13] DEF CON 8 conference. Las Vegas, NV, 2000. www.defcon.org
[14] W. Lee, S.J. Stolfo, P.K. Chan, E. Eskin, W. Fan, M. Miller, S. Hershkop, and J. Zhang, Real time data mining-based intrusion detection, In Proc. Second DARPA Information Survivability Conference and Exposition, IEEE Computer Society, pp. 85-100.
[15] S. Kumar, Classification and Detection of Computer Intrusions, PhD thesis, West Lafayette, IN: Purdue University, Computer Sciences, 1995.
[16] T.D. Lane, Machine Learning Techniques for the computer security domain of anomaly detection, Ph. D. thesis, Purdue Univ., West Lafayette,
IN, 2000.
[17] F. Neri, Comparing local search with respect to genetic evolution to detect intrusion in computer networks, In Proc. of the 2000 Congress on Evolutionary Computation CEC00, IEEE Press, pp. 238-243.
[18] P.K. Chan, S. Stolfo, Toward parallel and distributed learning by meta-learning, In Working Notes AAAI Work. Knowledge Discovery in Databases, Portland, OR, pp. 227-240, AAAI Press, 1993.
[19] A.L. Prodromidis and S.J. Stolfo, Cost complexity-based pruning of ensemble classifiers, Knowledge and Information Systems 3(4), pp. 449-469, 2001.
[42] DARPA intrusion detection evaluation, http://www.ll.mit.edu/IST/ideval/data/data index.html
[43] W. Lee, S.J.Stolfo, A Data Mining framework for building intrusion detection models, IEEE Symposium on Security and Privacy, 1999.
[44] T. Lane, C.E. Brodley, Temporal sequence learning and data reduction for
anomaly detection, ACM Trans. Inform. Syst. Secur. 2 (3), 1999.
[45] M. Thottan, C. Ji, Anomaly detection in IP networks, IEEE Trans. Signal Processing 51 (8) (2003) 2191-2204.
[46] S. Jin, D. Yeung, A covariance analysis model for DDoS attack detection, IEEE International Communication Conference (ICC04), vol. 4, June
2004, pp. 2024.
[47] S. Jin, D.S. Yeung, Xizhao Wang, Network intrusion detection in covariance feature space, Pattern Recognition, vol. 40, pp. 2185-2197, 2007.
[48] DARPA intrusion detection evaluation, http://www.ll.mit.edu/IST/ideval/
[49] S. Axelsson, The base-rate fallacy and its implications for the difficulty of intrusion detection, In Proceedings of the 6th ACM Conference on Computer and Communications Security (CCS 99), pages 1-7, November 1999.
[50] D.E. Denning, Information Warfare and Security, Addison Wesley, 1999.
[51] W. Lee, W. Fan, M. Miller, S. Stolfo, E. Zadok, Toward cost-sensitive modeling for intrusion detection and response, Technical report CUCS-002-00, Computer Science, Columbia University, 2000.
[52] Elkan, C., Results of the KDD99 classifier learning, SIGKDD Explorations, Vol. 1, Issue 2, pp. 63-64, Jan 2000.
[53] CERT report of vulnerabilities, http://www.cert.org/stats/cert stats.htm/
#vulnerabilities
[54] T. Shimeall, P. Williams, Models of Information Security Trend Analysis,
www.cert.org/archive/pdf/info-security.pdf
[87] P.J. Nahin, J.L. Pokoski, NCTR Plus Sensor Fusion Equals IFFN or can
Two Plus Two Equal Five?, IEEE Transactions on Aerospace and Electronic Systems,vol. AES-16, 3, pp.320-337, 1980.
[88] S.C.A. Thomopoulos, R. Vishwanathan, D.C. Bougoulias, Optimal decision fusion in multiple sensor systems, IEEE Transactions on Aerospace
and Electronics Systems, vol. 23, 5, pp.644-651, 1987.
[89] W. Baek and S. Bommareddy, Optimal m-ary data fusion with distributed
sensors, IEEE Transactions on Aerospace and Electronics Systems, vol.31,
3, pp.1150-1152, 1995.
[90] V. Aalo, R. Viswanathan, On distributed detection with correlated sensors: Two examples, IEEE Trans. Aerospace Electron. Syst., vol.25, pp.414-421.
[91] E. Drakopoulos, C.C. Lee, Optimum multisensor fusion of correlated local decisions, IEEE Trans. Aerospace Electron. Syst., vol.27, pp.593-606.
[92] M. Kam, Q. Zhu, W. Gray, Optimal data fusion of correlated local decisions in multiple sensor detection systems. IEEE Trans. Aerospace Electron. Syst., vol.28, pp.916-920.
[93] R. Blum, S. Kassam, H. Poor, Distributed detection with multiple sensors
- Part II: Advanced topics, Proceedings of IEEE, pp.64-79.
[94] T. Bass, Multisensor Data Fusion for Next Generation Distributed Intrusion Detection Systems, IRIS National Symposium, 1999.
[95] G. Giacinto, F. Roli, L. Didaci, Fusion of multiple Classifiers for Intrusion Detection in Computer Networks, Pattern Recognition Letters, 24,
pp. 1795-1803, 2003.
[96] Y. Wang, H. Yang, X.Wang, R. Zhang, Distributed intrusion detection
system based on data fusion method, Intelligent control and automation,
WCICA 2004.
[97] W. Hu, J. Li, Q. Gao, Intrusion Detection Engine on Dempster-Shafers
Theory of Evidence, Proceedings of International Conference on Communications , Circuits and Systems, vol.3, pp. 1627-1631, Jun 2006.
[98] A. Siraj, R.B. Vaughn, S.M. Bridges, Intrusion Sensor Data Fusion in an
Intelligent Intrusion Detection System Architecture, Proceedings of the
37th Hawaii international Conference on System Sciences, 2004.
[99] R. Perdisci, G. Giacinto, F. Roli, Alarm clustering for intrusion detection systems in computer networks, Engg. applications of Artificial intelligence, Elsevier publications, March 2006.
[100] A. Valdes, K. Skinner, Probabilistic alert correlation, Springer Verlag
Lecture notes in Computer Science, 2001.
[101] O.M. Dain, R.K. Cunningham, Building Scenarios from a Heterogeneous
Alert Stream, IEEE Workshop on Information Assurance and Security,
2001.
[102] F. Cuppens, A. Miege, Alert correlation in a cooperative intrusion detection framework, Proceedings of the 2002 IEEE symposium on security and
privacy, 2002.
[103] B. Morin, H. Debar, Correlation of Intrusion Symptoms : an Application
of Chronicles, RAID 2003.
[104] H. Debar, A. Wespi, Aggregation and Correlation of Intrusion-Detection
Alerts, RAID 2001.
[105] F. Valeur, G. Vigna, C. Kruegel, R. Kemmerer, A Comprehensive Approach to Intrusion Detection Alert Correlation, In IEEE Transactions on
Dependable and Secure Computing, 2004.
[106] H. Wu, M. Seigel, R. Stiefelhagen, J. Yang, Sensor Fusion using
Dempster-Shafer Theory, IEEE Instrumentation and Measurement Technology Conference, 2002.
[107] M. Zhu, S. Ding, R. R. Brooks, Q. Wu, S. S. Iyengar, N. S. V. Rao, Decision making-based multiple sensor data Fusion, Report, US Department
of Energy.
[108] rfp@wiretrip.net/libwhisker
[109] R.C. Holte, N. Japkowicz, C.X. Ling, Learning from imbalanced data
sets, Technical Report WS-00-05, AAAI Press, Menlo Park, CA.
[110] R. Agarwal, M.V. Joshi, PNrule: A new framework for learning classifier
models in data mining (a case-study in network intrusion detection), Tech.
Rep. RC 21719, IBM Research report, Computer Science/Mathematics,
2000.
[111] Lippmann, R.P., An introduction to computing with Neural Nets, IEEE
ASSP Magazine, Vol.4, pp. 4-22, April 1987.
[112] G. Shafer, A Mathematical Theory of Evidence, Princeton University
Press.
[113] G. Shafer, Perspectives on the theory and practice of belief functions,
International Journal of Approximate Reasoning 31-40, 1990.
[114] P. Smets, What is Dempster-Shafer's model?, in Advances in the Dempster-Shafer Theory of Evidence, pp. 5-34, John Wiley & Sons, 1994, iridia.ulb.ac.be/ psmets/WhatIsDS.pdf
[115] G. Pasi, R.R. Yager, Modeling the concept of majority opinion in group decision making, Information Sciences 176, pp. 390-414, 2006.
[116] R.R. Yager, On the determination of strength of belief for decision support under uncertainty - Part II: fusing strengths of belief, Fuzzy Sets and Systems 142, pp. 129-142, 2004.
[117] P. Smets, The combination of evidence in the transferable belief model, IEEE Transactions on Pattern Analysis and Machine Intelligence, 12(5):447-458, May 1990.
[118] D. Yong, S. WenKang, Z. ZhenFu, L. Qi, Combining belief functions based on distance of evidence, ScienceDirect, Volume 38, Issue 3, pp. 489-493, Dec. 2004.
[119] R.R. Tenney, N.R. Sandell, Detection with distributed sensors, IEEE Trans. Aerospace Electronic Systems 23(4), pp. 501-509, 1981.
[120] J.D. Howard, An analysis of security incidents on the Internet, 1989-1995, PhD thesis, Carnegie Mellon University, Department of Engineering and Public Policy, April 1997.
[121] www.cert.org/research/JHThesis/table of contents.html
[122] U. Lindqvist, E. Jonsson, How to systematically classify computer security intrusions, IEEE Symposium on Security and Privacy, pp. 154-163, Los Alamitos, CA, 1997.
[123] D.J. Weber, A taxonomy of computer intrusions. Masters thesis, Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, June 1998.
[124] Chris Rodgers. Threats to TCP/IP Network Security. 2001.
[125] G. Alvarez, S. Petrovic, A new taxonomy of web attacks suitable for efficient encoding, Computers and Security, 22(5): pp. 435-449, July 2003.
[126] M.A. Bishop, A taxonomy of Unix and network security vulnerabilities,
Technical report, Department of Computer Science, University of California at Davis, May 1995.
[127] I.V. Krsul, Software Vulnerability Analysis. PhD thesis, Comp. Sci.
Dept., Purdue University, May 1998.
[128] C.E. Landwehr, A.R. Bull, A taxonomy of computer program security flaws, with examples, ACM Computing Surveys, 26(3), pp. 211-254, 1994.
[129] J. Korba, Windows NT Attacks for the Evaluation of Intrusion Detection
Systems, M. Eng. Thesis, MIT Department of Electrical Engineering and
Computer Science, June 2000.
[130] A. Baker, J.B. Beale, Snort 2.1 Intrusion Detection (Second Edition)
pp.751, 2004.
[131] Xerox Palo Alto Research Center, Parc history, 2003, http://www.parc.xerox.com/about/history/default.html
[132] Eugene Spafford, The Internet Worm Program: An Analysis, Technical report, Department of Computer Sciences, Purdue University, 1988.
[133] Fred Cohen. Computer Viruses. PhD thesis, University of Southern California, 1985.
[134] CERT Coordination Center. Advisory CA-2001-19 Code Red Worm
Exploiting Buffer Overflow In IIS Indexing Service DLL. July 2001.
http://www.cert.org/advisories/ CA-2001-19.html.
[135] CERT Coordination Center, Advisory CA-2001-26 Nimda Worm, September 2001, http://www.cert.org/advisories/CA-2001-26.html.
[136] CERT Coordination Center. Advisory CA-2003-04 MS-SQL Server
Worm. January 2003. http://www.cert.org/advisories/CA-2003-04.html.
[137] CERT Coordination Center. Advisory CA-2003-20 W32/Blaster Worm.
August 2003. http://www.cert.org/advisories/CA-2003-20.html.
[138] Top Ten Cyber Security Menaces for 2008, http://www.sans.org/2008menaces/
[139] Symantec, Symantec Internet Security Threat Report Volume III, February 2003, http://enterprisesecurity.symantec.com/content.cfm?articleid=1539EID=0
[140] Symantec, Symantec Internet Security Threat Report Volume IV, September 2003, http://enterprisesecurity.symantec.com/content.cfm?articleid=1539EID=0
[141] Icove, David, Seger, VonStorch, A Crimefighter's Handbook, O'Reilly & Associates, 1995.
[142] http://www.cert.org/research/JHThesis/chapter6.html
[143] D.E. Denning, Cyberterrorism, http://www.cs.georgetown.edu/denning/infosec/
cyberterror.html
[144] 2004 web Server Intrusion Statistics, www.zone-h.org
[145] CERT Coordination Center. Advisory CA-1999-04 Melissa Macro Virus.
March 1999. http://www.cert.org/advisories/CA-1999-04.html
[168] Biswanath Mukherjee, L. Todd Heberlein, Karl N. Levitt, Network Intrusion Detection, IEEE Network, May/June 1994.
[169] May Grance, The DIDS (Distributed Intrusion Detection System) prototype, Proceedings of the Summer USENIX Conference, 227-233, San
Antonio, Texas, 8-12 June 1992.
[170] Herve Debar, Marc Dacier and Andreas Wespi, Towards a taxonomy of
Intrusion-Detection Systems, Computer Networks, 31(8): 805-822, April
1999.
[171] A. Abraham, Evolutionary Computation in Intelligent Network Management, Evolutionary Computing in Data Mining, Springer, pp. 189-210, 2004.
[172] J. L. Zhao, J. F. Zhao, and J. J. Li, Intrusion Detection Based on Clustering Genetic Algorithm, International Conference on Machine Learning
and Cybernetics IEEE, Guangzhou, pp. 3911-3914, 2005.
[173] R.H. Gong, M. Zulkernine, and Purang, A Software Implementation of a Genetic Algorithm Based Approach to Network Intrusion Detection, SNPD/SAWN'05, IEEE, 2005.
[174] Dong Seong Kim, Ha-Nam Nguyen, Jong Sou Park, Genetic Algorithm
to Improve SVM Based Network Intrusion Detection System, AINA05,
IEEE, 2005.
[175] A. Abraham and C. Grosan: Evolving Intrusion Detection Systems, Studies in Computational Intelligence (SCI) 13, 57-79, 2006
[176] A. Sundaram, An Introduction to Intrusion Detection, http://www.acm.org, 2001.
[177] Koral Ilgun, Richard A Kemmerer, and Phillip A. Porras, State Transition
Analysis: A rule-based Intrusion Detection Approach, IEEE Transactions
on Software Engineering, 21(3): 181-199, March 1995.
[178] T. Verwoerd and R. Hunt, Intrusion Detection Techniques and Approaches, 2001, http://www.elsevier.com
A survey,
[191] John E. Dickerson, Jukka Juslin, Ourania Koukousoula, Julie A. Dickerson, Fuzzy Intrusion Detection, Proceedings: IFSA World Congress
and 20th North American Conference Fuzzy Intrusion Detection,2001,
http://ieeexplore.ieee.org/iel5/7506/20427/00943772.pdf
[192] Dipankar Dasgupta and Fabio Gonzalez, An Immunity-Based Technique
to Characterize Intrusions in Computer Networks, IEEE Transactions on
Evolutionary Computation, Vol. 6, No. 3, June 2002.
[193] Susan M. Bridges and Rayford B. Vaughn, Intrusion Detection via Fuzzy
Data Mining, The Twelfth Annual Canadian Information Technology Security Symposium June 19-23, 2000.
[194] Kristopher Kendall, A Database of Computer attacks for the evaluation
of intrusion detection systems, Thesis Report, Department of Electrical
Engineering and Computer Science at MIT, June 1999.
[195] Shieh et al., A pattern oriented ID model and its applications, Proceedings
of the Symposium on security and privacy, 1991.
[196] Ilgun, Koral, USTAT:a real time IDS for Unix, Proceedings of the 1993
IEEE Computer Society Symposium on research insecurity and privacy,
1993.
[197] Christoph, Gray G, UNICORN: misuse detection for UNICOS, Proceedings of the 1995 ACM/IEEE Supercomputing Conference, Dec. 1995.
[198] White, Gregory B, PEER-based hardware protocol for IDSs, Journal of
Engineerng and Applied Science, v 2, 1996.
[199] Bonifacio, Jose Mauricio, Neural Networks applied in IDSs, IEEE International Conference on neural networks, 1998.
[200] Paxson, Vern, Bro: A system for detecting network intruders in real-time, Computer Networks, v 31, n 23, Dec 1999.
[201] Ning, Wang X.S, Jajodia S, Modelling requests among cooperating IDSs,
Computer Communications, v 23, n 17, Nov, 2000.
[202] Dickerson, John E, Fuzzy network profiling for intrusion detection, Annual Conference of the North American Fuzzy information processing society, 2000.
[203] Luo, JianXiong, Mining Fuzzy association rules and fuzzy frequent
episodes for intrusion detection, International Journal of Intelligent systems, v15, n 8, Aug, 2000.
[204] Andrew P. Kosoresow and Steven A. Hofmeyr, Intrusion Detection via System Call Traces, IEEE Software, 14(5), pp. 24-42, September/October 1997.
[205] Aviv Bergman, Intrusion Detection with Neural Networks, Technical Report, SRI International, Number A012, February 1993.
[206] D. Gunetti and G. Ruffo, Intrusion Detection through Behavioral Data,
Proc. of The Third Symposium on Intelligent Data Analysis, Lecture Notes
in Computer Science, Springer-Verlag, 1999.
[207] Gunar E. Liepins and H.S. Vaccaro, Intrusion Detection: Its Role and Validation, Computers & Security, 11(4), pp. 347-355, 1992.
[208] Guy Helmer and Johnny Wong and Vasant Honavar and Les Miller, Feature Selection Using a Genetic Algorithm for Intrusion Detection, Proceedings of the Genetic and Evolutionary Computation Conference, Vol.
2, p. 1781, Morgan Kaufmann, 13-17 July 1999.
[209] Jake Ryan and Meng-Jang Lin and Risto Miikkulainen, Intrusion Detection with Neural Networks, Advances in Neural Information Processing
Systems 10 (Proceedings of NIPS97, Denver, CO), MIT Press, 1998.
[210] K. Ilgun, USTAT : A Real-Time Intrusion Detection System for UNIX,
Proceedings of the IEEE Symposium on Security and Privacy, pp. 16-29,
1993.
[211] Koral Ilgun and Richard A. Kemmerer and Phillip A. Porras, State Transition Analysis: A Rule-Based Intrusion Detection Approach, IEEE Transactions on Software Engineering, 21(3), pp. 181-199, March 1995.
November 15,
1999,
[218] Peter Mell, Vincent Hu, Richard Lippmann, An Overview of Issues in Testing Intrusion Detection Systems, http://csrc.nist.gov/publications/nistir/nistir-7007.pdf
[219] CERT report of vulnerabilities, http://www.cert.org/stats/cert stats.htm/
#vulnerabilities
[220] Zhu, Ding, Brooks, Wu, Iyengar, Rao, Decision making-based multiple
sensor data Fusion, Report, US Department of Energy.
[221] S. Axelsson, A preliminary attempt to apply detection and estimation theory to intrusion detection, Technical report 00-4, Chalmers University of Technology, Goteborg, Sweden.
[222] W. Lee, S. J. Stolfo, Data mining approaches for intrusion detection, In
Proc of 7th USENIX security symposium, San Antonio, TX. USENIX.
[223] B. Tung, Common intrusion detection framework, http://www.isi.edu/gost/cidf/
[224] P. Ning, X.S. Wang, and S. Jajodia, Modeling requests among cooperating intrusion detection systems, Computer Communications 23(17), pp. 1702-1716.
[225] J.R. Winkler and W.J. Page, Intrusion and anomaly detection in trusted systems, In Fifth Annual Computer Security Applications Conf., 1989, Tucson, AZ, pp. 39-45.
[226] S.R. Snapp, J. Brentano, G.V. Dias, T.L. Goan, T. Grance, L.T. Heberlein, C. Lin Ho, K.N. Levitt, B. Mukherjee, D. Mansur, K.L. Pon, and S.E. Smaha, A system for distributed intrusion detection, In COMPCON Spring 91, Digest of Papers, San Francisco, pp. 170-176.
[227] NIDES: Next-generation Intrusion Detection Expert System, http://www.sdl.sri.com/projects/nides/
[228] P.A. Porras and P.G. Neumann, EMERALD: conceptual overview statement, http://www.sdl.sri.com/papers/emerald-position1/
[229] P.A. Porras, and A. Valdes, Live traffic analysis of TCP/IP gateways, In
Proc. of the 1998 ISOC Symp. on Network and Distributed Systems Security (NDSS98), San Diego, 1998.
[230] P.A. Porras, Experience with EMERALD to date, In First USENIX Workshop on Intrusion Detection and Network Monitoring, Santa Clara, CA, pp. 73-80, April 1999.
[231] Internet Security Systems, RealSecure SiteProtector, http://www.iss.net/products services/enterprise protection/rssite protector/siteprotector.php
[232] Symantec Enterprise Solutions: ManHunt, http://enterprisesecurity.symantec.com/products/products.cfm?ProductID=156.
[233] nSecure Software, nSecure nPatrol, http://www.nsecure.net/features.htm