
Performance Enhancement of Intrusion Detection
Systems using Advances in Sensor Fusion

A THESIS
SUBMITTED FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY
IN THE FACULTY OF ENGINEERING

by
Ciza Thomas

Supercomputer Education and Research Centre


Indian Institute of Science
BANGALORE 560 012
April 2009

© Ciza Thomas
April 2009
All rights reserved

DEDICATED WITH EXTREME AFFECTION AND GRATITUDE TO


my parents Mr. M.C. Thomas and Mrs. Accamma Thomas
my husband Dr. T. John Tharakan
my kids Alka and Alin
and
my research supervisor Prof. N. Balakrishnan

Acknowledgements
The endless thanks go to the Lord Almighty for all the blessings He has showered onto me, which have enabled me to write this last note in my research work.
During the period of my research, as in the rest of my life, I have been blessed
by Almighty with some extraordinary people who have spun a web of support
around me. Words can never be enough in expressing how grateful I am to those
incredible people in my life who made this thesis possible. I would like to thank them for making my time during my research in the Institute a period I will treasure.
I am deeply indebted to my research supervisor, Professor N. Balakrishnan, for presenting me with such an interesting thesis topic. Each meeting with him added
invaluable aspects to the implementation and broadened my perspective. He has
guided me with his invaluable suggestions, lightened up the way in my darkest
times and encouraged me a lot in the academic life. From him I have learned to
think critically, to select problems, to solve them and to present their solutions.
I would like to thank him for furthering my education in many subjects like
probability theory, network security, pattern recognition and machine learning.
He has given me the best training within the country and even abroad by sending me to CyLab and CERT of CMU, Pittsburgh, US. It was more than I had ever
hoped for in my research life. His drive for scientific excellence has pushed me to aspire to the same (though I could never achieve it). It was a great pleasure for
me to have a chance of working with him. He was the best choice I could have
made for an advisor. Sometimes we are just (or incredibly) lucky!
I consider it a great honor to have been part of the MMSL lab in the SERC department of the Indian Institute of Science, and I salute the efforts of my Professor for his support to the nation in many, many ways. I would also like to mention
my deep gratitude towards Prof. R. Govindarajan, Chairman, SERC for all the
support provided to me while I was a student in the department.
I would be failing in my duty if I didn't acknowledge some of my friends on the campus with whom I have shared my research experiences, since it was a joy and an enlightenment to me. I am fortunate to have a friend like Sharmili Roy
who has opened her heart and her problems to me, in turn motivating me many a time with her extraordinary brilliance and analytical perception. Ms. Neeta
Trivedi has helped me through the totally alien landscape of writing any document correctly. Suneesh S.S. has been a caring friend who has helped me at
times of trouble on the campus.
I would like to address special thanks to the anonymous reviewers of my thesis for agreeing to read and review it. I wish to thank the authors, developers and maintainers of the open-source software used in this work. I appreciate all the researchers whose works I have used, initially in understanding my field of research and later for updates. I would like to thank the many
people who have taught me starting with my school teachers, my undergraduate
teachers, and my graduate teachers and examiners, especially Prof. Joy Kuri,
Prof. Vital Rao, Prof. Veni Madhavan, Prof. Anurag Kumar, and Prof. Matthew Jacob.
It is with sincere gratitude that I wish to thank Prof. K.R. Ramakrishnan, Prof. S.M. Rao, Prof. S.K. Sen, and Prof. K. Gopakumar for the care they have provided. I consider it a great privilege to have been associated with some great professors in my field of research, namely Prof. Raj Reddy, Dr. B.V. Dasarathy, Prof. Dorothy Denning, Prof. P.K. Chan, Prof. Matthew Mahoney, the CERT CyLab group, Dr. Athithan, and Mr. Philip, and I appreciate the help rendered by them. Prof.
Jyothi Balakrishnan, Mr. Murali, and Ms. Reshmi need a special mention in
this acknowledgement for being particularly supportive during times of need. I
feel obliged to say thank you one more time. I would also like to express my gratitude to Dr. Latha Christy, whose thoughtful advice when I was away from my kids during the days of my research served to give me a sense of direction during my PhD studies. I wish to thank Mr. Vishwas Sharma, Dr. G. Ravindra, Dr. J.
Sujatha, Ms. K. Nagarthna, Ms. Swarna, Mr. Ravikumar, Mr. Sasikumar, Mr.
Sekhar, SERC security staff and a few others who have in some way or the other
helped me at various stages during my research life. And then, there are all the
other people who are not mentioned here but have helped in making IISc a very
special place over all these years. I would also like to thank my employers,
the Director of Technical Education, Govt. of Kerala, the Principal, and the Head of
the Department for the support and encouragement extended to me during my
period of research in the Institute.
I would like to express my deep sense of gratitude for the affection and support shown to me by my parents-in-law. My father-in-law could not see me reach this stage of my research, and I acknowledge him before his memory. I take this opportunity to dedicate this work to my parents who have made me what I am, my husband
and children who have given consistent support throughout my research, and
my guide who had a vision for my research work. I learnt to aspire to a career
in research from my parents in my childhood, and later from my husband. My
parents have passed on to me a wonderful humanitarian lineage, whose value
cannot be measured by any worldly yardstick. The warmest of thanks to my
husband Dr. T. John Tharakan for his understanding and patience while I was
far away from home during the period of my research in the Institute. He has
supported me in each and every way, always believed in me, and inspired
me in all dimensions of life. I am blessed with two wonderful kids Alka and
Alin, who knew only to encourage and never did complain about anything even
when they had to suffer a lot in my absence over these years. I owe everything to them; without their everlasting love, this thesis would never have been completed.
To you all, I dedicate this work.
All of you made it possible for me to reach this last stage of my endeavor.
Thank You from my heart-of-hearts.
Ciza Thomas

Publications based on this Thesis


International Journal Publications
1. Ciza Thomas and N. Balakrishnan, Improvement in Intrusion Detection
with Advances in Sensor Fusion, To appear in the IEEE Transactions on
Information Forensics and Security.
2. Ciza Thomas and N. Balakrishnan, Performance Enhancement in Attack
Detection with Skewness in Network Traffic, International Journal on Information Fusion (under review).
3. Ciza Thomas and N. Balakrishnan, Data-dependent Decision Fusion of Intrusion Detection Systems using Modified Evidence Theory, IEEE Transactions on Information Forensics and Security (under review).
4. Ciza Thomas and N. Balakrishnan, Modeling the Attack-Detection Scenario with Network Intrusion Detection Systems, International Journal of
Security and Networks (under review).
5. Ciza Thomas and N. Balakrishnan, Improvement in Intrusion Detection
with Advances in Sensor Fusion, International Journal of Security and Networks (under review).
6. Ciza Thomas and N. Balakrishnan, Sensor Fusion for Performance Enhancement of Intrusion Detection Systems, IEEE Transactions on Dependable and Secure Computing (to be communicated).
7. Ciza Thomas and N. Balakrishnan, Intrusion Detection Systems: A survey,
ACM Computing Surveys (to be communicated).

International Conference Publications


1. Ciza Thomas and N. Balakrishnan, Selection of Intrusion Detection Threshold for Effective Sensor Fusion, International Symposium on Defense and
Security, Proceedings of SPIE, 6570, 5, 2007.
2. Ciza Thomas, Vishwas Sharma and N. Balakrishnan, Usefulness of DARPA
Dataset for Intrusion Detection System Evaluation, International Symposium on Defense and Security, Proceedings of SPIE, 6973, 15, 2008.
3. Ciza Thomas and N. Balakrishnan, Improvement in Minority Attack Detection with Skewness in Network Traffic, International Symposium on
Defense and Security, Proceedings of SPIE, 6973, 23, 2008.
4. Ciza Thomas and N. Balakrishnan, Advanced Sensor Fusion Technique for
Enhanced Intrusion Detection, Proceedings of the IEEE International Conference on Intelligence and Security Informatics, 1-4244-2415, pp. 173-178, 2008, available online in IEEE Xplore.
5. Ciza Thomas and N. Balakrishnan, Performance Enhancement of Intrusion
Detection Systems using Advances in Sensor Fusion, Proceedings of the
International Conference on Information Fusion, 4883, 2, pp. 1671-1677,
2008.
6. Ciza Thomas and N. Balakrishnan, Modified Evidence Theory for Performance Enhancement of Intrusion Detection Systems, Proceedings of the
International Conference on Information Fusion, 4883, 2, pp. 1751-1758,
2008.
7. Ciza Thomas and N. Balakrishnan, Mathematical Analysis of Sensor Fusion for Intrusion Detection Systems, Proceedings of the International Conference on Communications and Networking, 97, 2009, available online in
IEEE Xplore.
Research Symposium Publication
1. Ciza Thomas and N. Balakrishnan, Performance Enhancement of Intrusion
Detection Systems using Advances in Sensor Fusion, Techvista, Microsoft
Research Symposium, 2007.

Abstract
The technique of sensor fusion addresses the issues relating to the optimality of decision-making in the multiple-sensor framework. Advances in sensor fusion make it possible to perform intrusion detection for both rare and new attacks. This thesis discusses this assertion in detail, and describes the theoretical and experimental work done to show its validity.
The attack-detector relationship is initially modeled and validated to understand
the detection scenario. The different metrics available for the evaluation of intrusion detection systems are also introduced. The usefulness of the data set
used for experimental evaluation has been demonstrated. The issues connected
with intrusion detection systems are analyzed and the need for incorporating
multiple detectors and their fusion is established in this work. Sensor fusion
provides advantages with respect to reliability and completeness, in addition to
intuitive and meaningful results. The goal of this work is to investigate how
to combine data from diverse intrusion detection systems in order to improve
the detection rate and reduce the false-alarm rate. The primary objective of the
proposed thesis work is to develop a theoretical and practical basis for enhancing the performance of intrusion detection systems using advances in sensor
fusion with easily available intrusion detection systems. This thesis introduces
the mathematical basis for sensor fusion in order to provide enough support for
the acceptability of sensor fusion in performance enhancement of intrusion detection systems. The thesis also shows the practical feasibility of performance
enhancement using advances in sensor fusion and discusses various sensor fusion algorithms, their characteristics, and related design and implementation issues. We show that it is possible to enhance the performance of intrusion detection systems by setting proper threshold bounds and also by rule-based fusion. We introduce an architecture called the data-dependent decision fusion as
a framework for building intrusion detection systems using sensor fusion based
on data-dependency. Furthermore, we provide information about the types of
data, the data skewness problems and the most effective algorithm in detecting
different types of attacks. This thesis also proposes and incorporates a modified
evidence theory for the fusion unit, which performs very well for the intrusion
detection application. The future improvements in individual IDSs can also
be easily incorporated in this technique in order to obtain better detection capabilities. Experimental evaluation shows that the proposed methods have the
capability of detecting a significant percentage of rare and new attacks. The
improved performance of the IDS using the algorithms developed in this thesis, if deployed fully, would contribute to an enormous reduction in successful attacks over a period of time. This has been demonstrated in the thesis and is a step in the right direction towards making cyberspace safer.

Keywords
Intrusion Detection Systems, Sensor Fusion, Negative Binomial Distribution, Chebyshev Inequality, Data-dependent Decision Fusion, Neural Network, Base-rate Fallacy, Accuracy Paradox, Dempster-Shafer Evidence Theory, Context-dependent Operator


Notation and Abbreviations

Notation       Details

               Assigns weight to precision over recall
μ0             Mean value of the normal traffic profile
μ1             Mean value of the attack traffic profile
               Set of all focal points
σ0             Standard deviation of normal traffic
σ1             Standard deviation of attack traffic
σi             Standard deviation of Sensor indexed i
σav            Average standard deviation
σfusion        Standard deviation of fused Sensor
               False alarm rate fixed for acceptable detection
               False alarm rate at the fusion center
Θ              Frame of Discernment
ρ              Correlation coefficient
ρn             Correlation coefficient of n sensors
ρi,k           Correlation coefficient between the ith and kth detectors
               Bahadur-Lazarsfeld polynomial
At             Number of attacks at any time t
At+1           Number of attacks at any time (t + 1)
Bel(A)         Belief of hypothesis A
Cj             Class labels
D              Detection rate
Di             Detection rate of Sensor indexed i
Dt             Number of detectors at any time t
Dt+1           Number of detectors at any time (t + 1)
E(s)           Expectation of s
F              False positive rate
F(X)           Cumulative distribution function
Fi             False positive rate of Sensor indexed i
FPi            False Positives of Sensor indexed i
Fj             Fusion function for any input indexed j
Gx             Average occurrence of X
L              Likelihood function
N              Total number of experiments
Nf             Number of experiments where all the Sensors fail to detect
Nt             Number of experiments where all the Sensors detect correctly
Ne             Number of encounters between the detector and the attack
               Precision
P(s)           Probability density function of s
Pe             Probability of error
Pl(A)          Plausibility of hypothesis A
P(Θ)           Power set of FoD
P0             Prior probability of normal traffic
P1             Prior probability of attack traffic
               Recall
T              Threshold
TPi            True Positives of Sensor indexed i
Var(s)         Variance of s
ai             Ambiguity
               Attack increase rate
               Detector learning parameter
               Detector efficiency
               Error rate
ef(x)          Feature extractor
e1, ..., em    Unknown feature list
               Clumping factor
n              Number of Sensors in the fusion unit
               Detector correlation
m(A)           Basic Probability Assignment
t              Time in years
pi             Probability of detection of Sensor indexed i
p(A)           Probability of hypothesis A
{(xn, yn)}     Training data set
s              Fusion output
sji            Output of Sensor indexed i corresponding to an input xj
si             Set of parameters associated with sensor indexed i
x              Network traffic as an input vector
vr             Variance reduction factor

Abbreviation   Details

AFRL           Air Force Research Laboratory
ALAD           Application Layer Anomaly Detector
ANN            Artificial Neural Network
ARPANET        Advanced Research Projects Agency NETwork
AUC            Area Under Curve
BPA            Basic Probability Assignment
CD             Context Dependent
CERT           Computer Emergency Response Team
CERT/CC        Computer Emergency Response Team / Coordination Center
CRB            Cramer-Rao Bound
CTF            Capture The Flag
CV             Coefficient of Variation
DARPA          Defense Advanced Research Projects Agency
DCost          Damage Cost
DD             Data-dependent Decision
DDoS           Distributed Denial of Service
DS             Dempster-Shafer
DOD            Department of Defense
DoS            Denial of Service
DRDoS          Distributed Reflector DoS
D-Tree         Decision Tree
EMERALD        Event Monitoring Enabling Responses to Anomalous Live Disturbances
FBI            Federal Bureau of Investigation
FCM            Fuzzy Cognitive Maps
F-score        Figure of merit score (or F-measure)
FN             False Negative
FoD            Frame of Discernment
FP             False Positive
FTP            File Transfer Protocol
IC3            Internet Crime Complaint Center
ICMP           Internet Control Message Protocol
IDS            Intrusion Detection System
IIDS           Intelligent Intrusion Detection System
IP             Internet Protocol
KDD            Knowledge Discovery in Databases
LR             Likelihood Ratio
MAP            Maximum A Posteriori
MIT            Massachusetts Institute of Technology
NB             Naive Bayes
PHAD           Packet Header Anomaly Detector
P-test         A significance test
RBF            Radial Basis Function
R2L            Remote to Local
RB             Rule based
ROC            Receiver Operating Characteristics
RCost          Response Cost
SSH            Secure SHell
SVM            Support Vector Machine
TBM            Transferable Belief Model
TCP            Transmission Control Protocol
TN             True Negative
TP             True Positive
U2R            User to Root
UDP            User Datagram Protocol

Contents

Acknowledgements  ii
Publications based on this Thesis  v
Abstract  vii
Keywords  ix
Notation and Abbreviations  xi

1 Introduction  1
  1.1 Introduction  1
  1.2 Intrusion Detection Systems: Background  1
    1.2.1 Growth of the Internet  1
    1.2.2 Growth of Internet attacks  2
    1.2.3 Cyber crimes in India  3
    1.2.4 Financial risks in corporate networks  4
    1.2.5 Need for Intrusion Detection Systems  5
    1.2.6 Current status, challenges and limitations of IDS  7
    1.2.7 Open issues  9
  1.3 Motivation  9
  1.4 Problem statement  12
  1.5 Major contributions of this thesis  13
    1.5.1 Theoretical formulation  13
    1.5.2 Experimental validation  14
  1.6 Research goal  14
  1.7 Organization of the thesis  14

2 Issues Connected with Single IDSs and the Attack-Detection Scenario  17
  2.1 Introduction  17
  2.2 Attackers influence on the detection environment  20
  2.3 Data skewness in network traffic  20
    2.3.1 Classification of attacks  21
    2.3.2 Identification of real-world network traffic problems  24
    2.3.3 Non-uniform misclassification cost  28
    2.3.4 Inability of IDS in optimum decision making due to data skewness  30
  2.4 Attack-Detection Scenario in a Secured Environment  31
    2.4.1 Internet attacks and the countermeasure for detection  32
    2.4.2 Testing the performance of Intrusion Detection Systems  33
  2.5 Modeling the attack-detector relationship  38
    2.5.1 Detectors learning from the detected attacks  45
    2.5.2 Detector correlation  46
  2.6 Validation of the model using real-world data  48
    2.6.1 Discussion on the modeling  49
  2.7 Summary  49

3 Evaluation and Test-bed of Intrusion Detection Systems  53
  3.1 Introduction  53
  3.2 Data set  55
  3.3 Usefulness of DARPA data set for IDS evaluation  57
    3.3.1 Criticisms against the DARPA IDS evaluation data set  57
    3.3.2 Facts in support of the DARPA IDS evaluation data set  58
    3.3.3 Results and discussion  59
  3.4 Choice and the performance improvement of individual IDSs  64
    3.4.1 Snort: Improvements by adding new rules  65
    3.4.2 PHAD/ALAD  66
  3.5 Summary  67

4 Mathematical Basis for Sensor Fusion  68
  4.1 Introduction  68
  4.2 Sensor fusion algorithms  69
    4.2.1 Machine Learning for intrusion detection  70
    4.2.2 Evidence Theory  72
    4.2.3 Kalman filter  73
    4.2.4 Bayesian network  73
  4.3 Related work in sensor fusion  73
  4.4 Related work using sensor fusion in intrusion detection application  76
  4.5 Theoretical formulation  78
  4.6 Solution approaches  86
    4.6.1 Dempster-Shafer combination method  88
    4.6.2 Analysis of detection error assuming traffic distribution  93
  4.7 Summary  103

5 Selection of Threshold Bounds for Effective Sensor Fusion  104
  5.1 Introduction  104
  5.2 Modeling the fusion IDS by defining proper threshold bounds  105
  5.3 Results and discussion  109
    5.3.1 Experimental evaluation  109
  5.4 Summary  113

6 Performance Enhancement of IDS using Rule-based Fusion and Data-dependent Decision Fusion  114
  6.1 Introduction  114
  6.2 Rule-based fusion  115
  6.3 Data-dependent decision fusion  117
    6.3.1 Motivation  118
    6.3.2 Data-dependent decision fusion architecture  118
    6.3.3 Detection of rarer attacks  121
  6.4 Results and discussion  122
    6.4.1 Test setup  122
    6.4.2 Data set  123
    6.4.3 Data-dependent decision fusion algorithm  123
    6.4.4 Experimental evaluation  126
    6.4.5 Discussion  131
  6.5 Summary  132

7 Modified Dempster-Shafer Theory for Intrusion Detection Systems  134
  7.1 Introduction  134
  7.2 Dempster-Shafer combination method  136
    7.2.1 Motivation for choosing the Dempster-Shafer combination method  137
    7.2.2 Limitations of the Dempster-Shafer combination  138
  7.3 Disjunctive combination of evidence  141
  7.4 Context-dependent operator  142
    7.4.1 Performance of the proposed combination operator  145
    7.4.2 Discussion  149
  7.5 Experimental evaluation  151
    7.5.1 Impact of this work  153
  7.6 Summary  154

8 Modeling of Intrusion Detection Systems and Sensor Fusion  156
  8.1 Introduction  156
  8.2 Motivation  157
  8.3 Modeling of data-dependent decision fusion system  157
    8.3.1 Modeling of Intrusion Detection Systems  158
    8.3.2 Modeling the fusion IDS  159
    8.3.3 Statement of the problem  161
    8.3.4 The effect of setting threshold  162
    8.3.5 Modeling of neural network learner unit  164
    8.3.6 Dependence on the data and the individual IDSs  165
    8.3.7 Threshold optimization  166
  8.4 Results and discussion  167
  8.5 Summary  167

9 Conclusions  169
  9.1 Results and discussion  170
  9.2 Future work  171
  9.3 Summary  172

A Attacks on the Internet: A study  174
  A.1 Introduction  174
  A.2 History of Internet attacks  175
  A.3 Attack motivation and objectives  176
  A.4 Attack taxonomy  177
    A.4.1 Viruses  177
    A.4.2 Worms  179
    A.4.3 Trojans  179
    A.4.4 Buffer overflows  180
    A.4.5 Denial of Service attacks  181
    A.4.6 Network-based attacks  185
    A.4.7 Password attacks  186
    A.4.8 Information gathering attacks  186
    A.4.9 Blended attacks  188
  A.5 Top ten cyber security menaces for 2008  188
  A.6 Conclusion  191

B Intrusion Detection Systems: A survey  192
  B.1 Introduction  192
  B.2 History of Intrusion Detection Systems  193
    B.2.1 The emergence of intrusion detection systems  194
  B.3 Taxonomy of Intrusion Detection System  210
    B.3.1 Intrusion detection methods  210
    B.3.2 Deployment techniques  213
    B.3.3 Information source  218
    B.3.4 Architecture  219
    B.3.5 Analysis frequency  221
    B.3.6 Response  221
  B.4 Latest Intrusion Detection softwares  222
  B.5 Review of the data processing techniques used in IDS  228
  B.6 Current Intrusion Detection research  231
    B.6.1 Intrusion Prevention System  231
  B.7 Intrusion detection using multi-sensor fusion  233
    B.7.1 Existing fusion IDSs  234
    B.7.2 Current status of applying sensor fusion in IDS  236
  B.8 Conclusion  236

C Modeling of the Internet Attacks and the Countermeasure for Detection  237
  C.1 Introduction  237
  C.2 Nicholson-Bailey model  239
    C.2.1 Attack/Detection as they stand alone  242
    C.2.2 Attack carrying capacity  243
    C.2.3 Stability in attack-detector model  245
    C.2.4 Inclusion of stealthy attacks  246
    C.2.5 Modeling of non-random attacks and detection  247
  C.3 Summary  248

D Methodology for Evaluation of Intrusion Detection Systems  249
  D.1 Introduction  249
  D.2 Metrics for performance evaluation  250
    D.2.1 Detection rate and false alarm rate  250
    D.2.2 Receiver Operating Characteristic (ROC) Curve  251
    D.2.3 The Area Under ROC Curve (AUC)  251
    D.2.4 Accuracy  251
    D.2.5 Precision  252
    D.2.6 Recall  252
    D.2.7 F-score  252
    D.2.8 P-test  253
  D.3 Test setup  254
  D.4 Summary  255

References  256

List of Tables

2.1 Details of attack types present in DARPA 1999 data set [27]  22
2.2 Performance dependence of Snort on base-rate  26
2.3 Damage cost and response cost of different attack types [24]  29
2.4 Cost matrix [52]  29
3.1 Attacks present in DARPA 1999 data set  56
3.2 Attacks detected by Snort from the DARPA 1999 data set  60
3.3 Attacks detected by PHAD from the DARPA 1999 data set  60
3.4 Attacks detected by ALAD from the DARPA 1999 data set  61
3.5 Attacks detected by Cisco IDS from the DARPA 1999 data set  61
5.1 Types of attacks detected by PHAD at 0.00002 FP rate (100 FPs)  110
5.2 Types of attacks detected by ALAD at 0.00002 FP rate (100 FPs)  110
5.3 Types of attacks detected by the combination of ALAD and PHAD at 0.00004 FP rate (200 FPs)  110
5.4 F-score of PHAD for different choice of false positives  110
5.5 F-score of ALAD for different choice of false positives  110
5.6 F-score of fused IDS for different choice of false positives  111
5.7 Comparison of the evaluated IDSs using the real-world network traffic  113
6.1 Types of attacks detected by the rule-based combination of ALAD and PHAD at a FP rate of 0.000025 (125 FPs)  117
6.2 Types of attacks detected by PHAD at a false positive rate of 0.00002 (100 FPs)  126
6.3 Types of attacks detected by ALAD at a false positive rate of 0.00002 (100 FPs)  127
6.4 Types of attacks detected by Snort at a false positive rate of 0.0002 (1000 FPs)  127
6.5 Types of attacks detected by DD fusion IDS at a false positive rate of 0.00002 (100 FPs)  127
6.6 Comparison of the evaluated IDSs with various evaluation metrics  127
6.7 Detection of different attack types by single IDSs and data-dependent decision fusion IDS  127
6.8 Comparison of the evaluated IDSs using the real-world data set  130
6.9 Performance comparison of single IDSs and DD fusion IDS  130
7.1 Evidence with total conflict  139
7.2 Evidence with conflict  139
7.3 Evidence from four sensors with one unreliable using the DS method  140
7.4 Evidence from four sensors with one unreliable using the context-dependent operator  146
7.5 Belief of each of the IDSs for a R2L attack  152
7.6 Belief of each of the IDSs for a stealthy probe  152
7.7 Type of attacks detected by PHAD at 100 false alarms  152
7.8 Type of attacks detected by ALAD at 100 false alarms  152
7.9 Type of attacks detected by Snort at 1000 false alarms  152
7.10 Type of attacks detected by context-dependent fusion at 100 false alarms  153
7.11 Comparison of the evaluated IDSs using the real-world data set  154
8.1 Average probability of error with DD fusion algorithms  168

List of Figures

1.1 Growth of Internet in terms of the host count over the years [2]  2
1.2 A typical security scenario in any network  6
2.1 Probability of attack vs Bayesian attack detection rate for fixed values of FP and TP  25
2.2 Probability of attack vs F-score for Snort  26
2.3 Receiver Operator Characteristic graph  34
2.4 Trade-off between recall and precision  34
2.5 Precision and recall of IDS over the years  36
2.6 Plot of F-score over the years  36
2.7 Growth in the incidents reported to CERT  36
2.8 Growth in the vulnerabilities reported by CERT  36
2.9 Incidents reported over a period of time  38
2.10 D(1)=60, d=0.7  44
2.11 D(1)=40, d=0.7  44
2.12 D(1)=80, d=0.7  44
2.13 D(1)=60, d=0.5  45
2.14 D(1)=60, d=1.0  45
2.15 Effect of detector efficiency on the attack growth rate  48
4.1 Fusion architecture with decisions from n IDSs  81
5.1 Parametric curve showing the choice of threshold T  106
5.2 Detection rate vs Threshold  111
5.3 Precision vs Threshold  112
5.4 F-score vs Threshold  112
5.5 False Negative Rate vs Threshold  112
6.1 Data-dependent Decision fusion architecture  119
6.2 Performance of evaluated systems  128
6.3 Semilog ROC curve of single and DD fusion IDSs  128
6.4 Comparison of evaluated systems  129
6.5 Detection of Attack Types  129
7.1 Detection of Attack Types  153
8.1 Parallel Decision Fusion Network  160
8.2 Average probability of error  168
A.1 Plot of Attack sophistication vs Intruder Knowledge over the years  176
A.2 Distributed Denial of Service  183
A.3 Distributed Reflector DoS (DRDoS)  184
A.4 Taxonomy of Distributed DoS  185
B.1 Taxonomy of Intrusion Detection Systems  211
C.1 Attack-Detector relationship using the Nicholson-Bailey model  241
C.2 Attack-Detector relationship with attack carrying capacity  245

Chapter 1
Introduction
Life was simpler before World War II. After that, we had systems.
Admiral Grace Hopper

1.1 Introduction
Security attacks through the Internet have proliferated in recent years, and information security has become an issue of serious global concern.
This chapter discusses the growth of the Internet and the tremendous security
threat posed by the increased complexity, accessibility and openness of the Internet. The importance of protecting the corporate network has been introduced.
The need for network security and in particular the need for Intrusion Detection
Systems (IDS) have been brought out. This chapter also includes the motivation for the work reported in this thesis, its scope and objectives, its major contributions, and a synopsis of all the chapters.

1.2 Intrusion Detection Systems: Background


1.2.1 Growth of the Internet
The Internet has made real what Marshall McLuhan, a visionary of communications, called in the 1960s the "Global Village" [1]. In a matter of very few years, the Internet has consolidated itself as a very powerful platform that has changed the way we do business and the way we communicate. The Internet, as no other medium, has given an international, indeed a globalized, dimension to the world. It is a universal source of information.


The Internet came into existence in the late seventies as an outgrowth of the ARPANET, a DOD project. The incredibly fast evolution of the Internet from 1995 to the present can be seen from the statistics in Figure 1.1. At the end of 1995, there were about 16 million Internet users, which was about 0.4% of the world population. By the end of 2005, a decade later, this had increased to 1018 million Internet users, about 15.7% of the world population. As of June 10, 2008, 1.596 billion people use the Internet, according to Internet World Stats.

Figure 1.1: Growth of Internet in terms of the host count over the years [2]

1.2.2 Growth of Internet attacks


The growth of the Internet has brought great benefits to modern society; meanwhile, the rapidly increasing connectivity and accessibility of the Internet has posed a tremendous security threat. The growth of attacks has roughly paralleled the growth of the Internet [3]. Malicious usage, attacks and sabotage have
been on the rise as more and more computers are put into use. The attacks on the Internet have become both more prolific and easier to implement because
of the ubiquity of the Internet and the pervasiveness of easy-to-use operating
systems and development environments.
There are multiple penetration points for intrusions to take place in a network
system. For example, at the network level carefully crafted malicious IP packets can crash a victim host; at the host level, vulnerabilities in system software
can be exploited to yield an illegal root shell. The security threats have exploited all kinds of networks, ranging from traditional computers to point-to-point and distributed networks. These security threats have also exploited the vulnerable protocols and operating systems, extending attacks through the operating system to various kinds of applications, such as database and web servers. The
most popular operating systems regularly publish updates, but the combination
of poorly administered machines, uninformed users, a vast number of targets,
and ever-present software bugs have allowed exploits to remain ahead of patches.
Appendix A includes an involved study of the attacks on the Internet for further
reference.
1.2.3 Cyber crimes in India
The general and US-focused trends in cyber attacks have been highlighted at the beginning of this section, but the trend in India is also important to look into. In the Indian scenario, with e-commerce becoming popular in the last few years,
cyber crime is a term used to broadly describe criminal activity in which computers or computer networks are a tool, a target, or a place of criminal activity
and includes everything from electronic cracking to denial-of-service attacks [4].
It is also used to include traditional crimes in which computers or networks are
used to enable the illicit activity. A key finding of the Economic Crime Survey 2006 was that a typical perpetrator of economic crime in India was male
(almost 100%), a graduate or undergraduate, and 31-50 years of age. Further, one-third of the frauds were from insiders, and over 37% of them were in senior
managerial positions.
In trying to identify how widespread the crime is in India and the world over, experts feel that only a tiny proportion of cyber crime incidents are actually reported the world over. In India, the cyber crime cases registered are fewer compared to
the US, Europe, etc. The Internet Crime Complaint Center (IC3) 2006 ranks
the US (60.9%) as first among the nations in hosting perpetrators, followed by
the UK (15.9%). Many countries, including India, have established Computer
Emergency Response Teams (CERTs) with an objective to coordinate and respond during major security incidents/events. These organizations identify and
address existing and potential threats and vulnerabilities in the system and coordinate with stakeholders to address these threats.
1.2.4 Financial risks in corporate networks
The threats on the Internet can translate to substantial losses resulting from
business disruption, loss of time and money, and damage to reputation. The
financial impact of application downtime and lost productivity caused by the
increasing number of application level vulnerabilities and frequency of attacks
is substantial.
According to the census conducted in the US in 2007, the volume of business-to-business commerce increased from US $38 billion in 1997 to US $990 billion in 2006, a more than 25-fold increase. The total e-commerce revenue of US $20 billion in 1999 had increased to US $990 billion by 2006. It was predicted that by 2008, online
retail sales would account for 10 percent of total US retail sales. There may have been a boom in the usage of the Internet and online businesses, but one main issue is the security of the online environment, which affects both users and businesses alike.
According to a US report, cyber crime costs US businesses about US $67.2 billion a year. Over the past two years, US consumers have lost US $8 billion to online fraud schemes. The online fraudsters are not only cheating online businesses, but are also increasing the perception of fear among consumers. The 2005
annual computer crime and security survey [5], jointly conducted by the Computer Security Institute and the FBI, indicated that the financial losses incurred
by the respondent companies due to network attacks were US $130 million.


Another survey, commissioned by VanDyke Software in 2003, found that firewalls alone are not sufficient to provide adequate protection. Moreover, according to recent studies, an average of twenty to forty new vulnerabilities are discovered every month in commonly used networking and computer products. Such widespread vulnerabilities in software add to today's insecure computing/networking environment.
1.2.5 Need for Intrusion Detection Systems
Intrusions refer to network attacks against vulnerable services, data-driven
attacks on applications, host-based attacks like privilege escalation, unauthorized logins and access to sensitive files, or malware like viruses, worms and
trojan horses. These actions attempt to compromise the integrity, confidentiality
or availability of a resource. Intrusions result in services being denied, systems failing to respond, or data being stolen or lost. Intrusion detection means detecting unauthorized use of a system or attacks on a system or network. Intrusion
Detection Systems are implemented in software or hardware in order to detect
these activities.
An Intrusion Detection System (IDS) typically operates behind the firewall as
shown in Figure 1.2, looking for patterns in network traffic that might indicate
malicious activity. Thus, IDSs are used as the second and final level of defense in any protected network against attacks that breach other defenses. The need for this second layer of protection is often questioned: "Do we need an IDS once we have a firewall?" To answer this question briefly, it is necessary to understand what a firewall does and does not do, and what an IDS does and does not do. This will help in realizing the need for both an IDS and a firewall in securing a network.
The existing network security solutions, including firewalls, were not designed
to handle network and application layer attacks such as Denial of Service and
Distributed Denial of Service attacks, worms, viruses, and Trojans. Along with
the drastic growth of the Internet, the high prevalence of threats over the Internet has been the reason for security personnel to turn to IDSs.


Figure 1.2: A typical security scenario in any network

The unauthorized activities on the Internet are carried out not only by external attackers but also by internal sources, such as fraudulent employees or people abusing
their privileges for personal gain or revenge. These internal activities cannot
be prevented by a firewall which usually stops the external traffic from entering
the internal network. Firewalls are made to stop unnecessary network traffic
into or out of any network. Packet-filtering firewalls typically scan a packet for layer 3 and layer 4 protocol information. Most firewalls have few dynamic defensive abilities: traffic approaching the firewall either matches an applied rule and is allowed through, or it is stopped and the firewall logs the blocked traffic. As a result, IDSs, as originally introduced by Anderson [6] in 1980 and later formalized by Denning [7] in 1987, have received increasing attention in recent years. The IDSs along with the firewall
form the fundamental technologies for network security.
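To make the firewall side of this contrast concrete, the following minimal sketch (in Python, purely illustrative: the rule fields, ports and default policy are assumptions, not any particular firewall's configuration) shows the essence of packet filtering: traffic either matches a static layer-3/4 rule or falls through to a default action, blocked traffic is logged, and the payload itself is never inspected.

    # Minimal packet-filtering sketch (illustrative only): match layer-3/4
    # fields against static rules; unmatched traffic hits the default policy.
    # Note that the packet payload is never inspected.
    RULES = [
        {"proto": "tcp", "dst_port": 80, "action": "allow"},  # web traffic
        {"proto": "tcp", "dst_port": 23, "action": "drop"},   # block telnet
    ]
    DEFAULT_ACTION = "drop"

    def filter_packet(packet, log):
        """Return 'allow' or 'drop' for one packet, logging what is blocked."""
        action = DEFAULT_ACTION
        for rule in RULES:
            if (packet["proto"], packet["dst_port"]) == (rule["proto"], rule["dst_port"]):
                action = rule["action"]
                break
        if action == "drop":
            log.append(packet)
        return action

    log = []
    print(filter_packet({"proto": "tcp", "dst_port": 80}, log))  # allow
    print(filter_packet({"proto": "tcp", "dst_port": 23}, log))  # drop, logged

An IDS, in contrast, inspects the content of the very traffic that such rules let through.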
IDSs can be categorized into two classes, anomaly based IDSs and misuse based
IDSs. (Appendix B provides a detailed survey on various IDSs.) Anomaly
based IDSs look for deviations from normal usage behavior to identify abnormal behavior. Misuse-based IDSs, on the other hand, recognize patterns of attack.
Anomaly detection techniques rely on models of the normal behavior of a computer system. These models may focus on the users, the applications, or the
network. Behavior profiles are built by performing statistical analysis on historical data [8, 9], or by using rule-based approaches to specify behavior patterns [10, 11, 12]. A basic assumption of anomaly detection is that attacks differ from
normal behavior in type and amount. By defining what's normal, any violation can be identified, whether it is part of the threat model or not. However, the advantage of detecting previously unknown attacks is paid for in terms of high
false-positive rates in anomaly detection systems. It is also difficult to train
an anomaly detection system in highly dynamic environments. The anomaly
detection systems are intrinsically complex and also there is some difficulty in
determining which specific event triggered the alarms.
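As a minimal illustration of the statistical flavour of anomaly detection, consider the Python sketch below. It is far simpler than the detectors discussed in this thesis, and the feature names and the threshold k are illustrative assumptions: a per-feature profile of normal behavior is learned from attack-free traffic, and any event deviating too far from the profile raises an alert, whether or not it corresponds to a known attack.

    # Minimal anomaly-detection sketch (illustrative): learn a per-feature
    # profile of normal behavior (mean, standard deviation), then alert on
    # events deviating from it by more than k standard deviations.
    import statistics

    def learn_profile(normal_events):
        """Build {feature: (mean, stdev)} from attack-free training events."""
        features = normal_events[0].keys()
        return {f: (statistics.mean(e[f] for e in normal_events),
                    statistics.stdev(e[f] for e in normal_events))
                for f in features}

    def is_anomalous(event, profile, k=3.0):
        """Alert if any feature lies more than k standard deviations
        from its mean under the normal profile."""
        return any(sigma > 0 and abs(event[f] - mu) / sigma > k
                   for f, (mu, sigma) in profile.items())

    # Toy usage: connection records described by two numeric features.
    normal = [{"bytes": 500, "duration": 1.0}, {"bytes": 520, "duration": 1.2},
              {"bytes": 480, "duration": 0.9}, {"bytes": 510, "duration": 1.1}]
    profile = learn_profile(normal)
    print(is_anomalous({"bytes": 50000, "duration": 0.1}, profile))  # True
    print(is_anomalous({"bytes": 505, "duration": 1.0}, profile))    # False

The same sketch exhibits the weakness noted above: a legitimate but unusual event (a large backup transfer, say) triggers exactly the same alarm, which is one source of the high false-positive rates of anomaly detection.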
On the other hand, misuse detection systems essentially contain attack descriptions or signatures and match them against the audit data stream, looking for
evidence of known attacks [5, 13]. The main advantage of misuse detection
systems is that they focus analysis on the audit data and typically produce few
false positives. The main disadvantage of misuse detection systems is that they
can detect only known attacks for which they have a defined signature. As
new attacks are discovered, developers must model and add them to the signature database. In addition, signature-based IDSs are more vulnerable to attacks
aimed at triggering a high volume of detection alerts by injecting traffic that has
been specifically crafted to match the signatures used in the analysis process.
This type of attack can be used to exhaust the resources on the IDS computing
platform and to hide attacks within the large number of alerts produced.
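The complementary character of misuse detection can be seen in an equally small sketch. It is again illustrative: the signatures are invented, and real systems such as Snort use a much richer rule language with protocol decoding. Alerts are raised only for payloads matching a known signature, so false positives are few, but an attack without a signature passes silently.

    # Minimal misuse-detection sketch (illustrative): match known attack
    # signatures against a payload/audit stream; anything that matches no
    # signature is passed without an alert.
    SIGNATURES = {
        "php-cgi probe": b"/cgi-bin/php",
        "directory traversal": b"../..",
    }

    def match_signatures(payload: bytes):
        """Return the names of all known signatures found in the payload."""
        return [name for name, pattern in SIGNATURES.items()
                if pattern in payload]

    print(match_signatures(b"GET /cgi-bin/php?x=../../etc/passwd HTTP/1.0"))
    # -> ['php-cgi probe', 'directory traversal']
    print(match_signatures(b"GET /index.html HTTP/1.0"))
    # -> [] : a genuinely new attack would likewise return [] and go undetected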
In contrast to firewalls, a misuse-based IDS will scan all packets at layers 3 and 4 as well as the application-level protocols, looking for backdoor Trojans, Denial of Service attacks, worms and buffer overflow attacks, and detecting scans against the network. An IDS provides much greater visibility to detect signs of
attacks and compromised hosts. There is still the need for a firewall to block
traffic before it enters the network; but an IDS is also needed to make sure that
the traffic that gets past the firewall will be monitored.
1.2.6 Current status, challenges and limitations of IDS
Current cyber security capabilities have evolved largely as trivial patches and add-ons to the Internet, which was designed on the principles of open communication and implicit mutual trust. It is now recognized that it is no longer sufficient to follow such evolutionary paths and that security must be considered as a sophisticated research and design part of the information infrastructure. With all the progress that IDSs have made over the last few years, they still face some major challenges. The analysis is slow and often computationally intensive; hence, intrusion detection programs tend to detect intrusions only after they have occurred, and there is still little hope of catching an attack in progress.
The attackers continue to find ingenious ways to compromise remote hosts and
frequently make their tools publicly available. Also, the increasing size and complexity of the Internet, along with that of end-host operating systems, makes it
more prone to vulnerabilities. Additionally, there is very little broad understanding of intrusion activity, due to many privacy issues. Because of these challenges, current best practices for Internet security rely on reports of new intrusions and security holes from organizations like CERT. Another well-known fact is that false positives are one of the biggest problems when working with IDSs. A large number of false alerts severely undermines the acceptability of an IDS when the incidence of attacks is considerably lower than that of normal traffic.
It is very difficult to integrate router logs, system logs, firewall logs, and host
based IDS alerts with alerts from a network-based IDS. The last main challenge is the need for skilled IDS analysts. Monitoring and evaluating the alerts forces the analyst to stay on top of all the newest attacks, worms and viruses, the different operating systems, and network changes on the internal network in order to keep the rule list accurate.
In the last two decades, a range of commercial and public domain intrusion
detection systems have been developed. These systems use various approaches
to detect intrusions. As a result, they show distinct preferences in detecting certain classes of attacks with improved accuracy while performing moderately for
the other classes. The analysis of these IDSs has given us some insight into
the problems that still have to be solved before we can have intrusion detection
systems that are useful and reliable for detecting a wide range of intrusions.

This has created an opportunity for us to enhance the performance of IDSs by
various advanced techniques.
1.2.7 Open issues
Although intrusion detection has evolved rapidly in the past few years, many
important issues remain. First, detection systems must be more effective, detecting a wider range of attacks with fewer false positives. Second, intrusion
detection must keep pace with modern networks' increased size, speed and dynamics. Intrusion detection must keep up with the input-event stream generated
by high-speed networks and high-performance network nodes. Additionally,
there is the need for analysis techniques that support the identification of attacks
against whole networks. The issues connected with single IDSs are examined in greater detail in Chapter 2. The challenge for increased system effectiveness
is to develop a system that detects close to 100 percent of attacks with minimal
false positives. We are still far from achieving this goal.

1.3 Motivation
The motivation behind the present work was the realization that with the increasing traffic and increasing complexity of attacks, none of the present-day
stand-alone intrusion detection systems can meet the high demands for a very
high detection rate and an extremely low false alarm rate. Also, most of the
IDSs available in the literature show a distinct preference for detecting a certain class of attack with improved accuracy while performing moderately for the other classes of attacks. The inability of single IDSs to achieve acceptable attack detection is discussed in Chapters 2 and 3. In view of the enormous computing power
available with present-day processors, combining multiple IDSs to obtain
best-of-breed solutions has been attempted earlier. The following works have
also motivated us to choose this area of research.
Lee et al. comment in their work [14] that analyzing the data from multiple
sensors should increase the accuracy of the IDS. Kumar [15] observes that correlation of information from different sources has allowed additional information to be inferred that may be difficult to obtain directly. Such correlation is also useful in assessing the severity of threats, be it an attacker making a concerted effort to break into a particular host, or a worm with the potential to infect a large number of hosts in a short span of time.
Lane in his work [16] comments that it is well known in the machine learning literature that appropriate combination of a number of weak classifiers can
yield a highly accurate global classifier. Likewise, Neri [17] notes the belief that combining classifiers learned by different learning methods, such as hill-climbing and genetic evolution, can produce higher classification performance because of the different knowledge captured by complementary search methods.
The use of numerous data mining methods is commonly known as an ensemble
approach, and the process of learning the correlation between these ensemble
techniques is known by names such as multistrategy learning, or meta-learning.
Chan and Stolfo [18] note that the use of meta-learning techniques can be easily parallelized for efficiency. Additional efficiencies can be gained by pruning
less accurate classifiers [19]. Carbone [20] notes that these multistrategy learning techniques have been growing in popularity due to the varying performance
of different data mining techniques. She describes multistrategy learning as a
high level controller choosing which outputs to accept from lower level learners
given the data, what lower level learners are employed, and what the current
goals are.
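A hedged sketch of the simplest such combination is given below: binary verdicts from several IDSs are fused by (optionally weighted) voting. The weights, verdicts and threshold are illustrative assumptions; the fusion rules developed later in this thesis, from threshold bounds to the data-dependent decision architecture, are considerably more refined than this baseline.

    # Minimal decision-fusion sketch (illustrative): combine the binary
    # verdicts of several IDSs for one event by weighted voting.
    def fuse(verdicts, weights=None, threshold=0.5):
        """verdicts: list of 0/1 decisions, one per IDS, for a single event.
        Returns 1 (attack) if the normalized weighted vote exceeds threshold."""
        if weights is None:
            weights = [1.0] * len(verdicts)
        score = sum(w * v for w, v in zip(weights, verdicts)) / sum(weights)
        return int(score > threshold)

    # Three detectors disagree on an event; the majority carries the decision.
    print(fuse([1, 1, 0]))                     # -> 1
    # A detector trusted three times as much can overrule two weaker ones.
    print(fuse([1, 0, 0], weights=[3, 1, 1]))  # -> 1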
The generalizations made concerning ensemble techniques are particularly apt
in intrusion detection. As Axelsson [21] notes, "In reality there are many different types of intrusions, and different detectors are needed to detect them."
As such, the same argument that Lee et al. make in their work [22, 23] for
the use of multiple sensors applies to the use of multiple methods as well: if
one method or technique fails to detect an attack, then another should detect
it. They note in the work of Lee and Stolfo [24] that combining evidence from
multiple base classifiers is likely to improve the effectiveness in detecting intrusions. They went on to find that by combining signature and anomaly detection
models, one can improve the overall detection rate of the system without compromising the benefits of either detection method [14], and that a well-designed/updated misuse detection module should be used to detect the majority of the attacks, while anomaly detection is the only hope of fighting against innovative and stealthy attacks [43]. Mahoney and Chan [25] suggest that, because their IDSs use a technique that has significant non-overlap with other IDSs, combining
their technique with others should increase detection coverage. In performing a
manual post hoc analysis of the results of the 1998 DARPA Intrusion Detection
Challenge, the challenge coordinators found that the best combination of 1998
evaluation systems provides more than two orders of magnitude of reduction in
false alarm rate with greatly improved detection accuracy [27].
Despite the positive accolades and research results that sensor fusion approaches
for intrusion detection have received, the only IDS that has been specifically designed for the use of multiple detection methods is EMERALD [30], a research
IDS. The lack of published research into applying multiple and heterogeneous
IDSs seems to be a significant oversight in the intrusion detection community.
We believe that fusion-based IDSs are the only foreseeable way of achieving high accuracy with an acceptably low false positive rate. In spite of all the earlier work on enhancing the performance of IDSs, the overall performance of the IDS leaves room for improvement. Multi-sensor fusion achieves better-than-the-best detection by refining the combined response of different IDSs with largely varying accuracy. The motivation for applying sensor fusion in enhancing the performance of intrusion detection systems is that a
better analysis of existing data gathered by various individual IDSs can detect
many attacks that currently go undetected.
Throughout this thesis we use the term sensor to denote a component that monitors the network traffic or the audit logs for indications of suspicious activity in a network or on a system, according to a detection algorithm, and produces alerts as a result. A simple IDS in most cases constitutes a single sensor. We therefore use the terms sensor and IDS interchangeably in this thesis.


1.4 Problem statement


The review of the state-of-the-art IDSs has posed the following problems to be
considered for further research.
Problem No.1: What are the issues connected with single IDSs? Is there any IDS available at present which has complete detection coverage? Is it possible to improve the detection performance of the available IDSs without affecting the false alarm rate?
Problem No.2: How to model the attack-detector relationship so that the detector performance is better understood?
Problem No.3: What are the different metrics available for the effective evaluation of IDSs?
Problem No.4: Is the technique of sensor fusion acceptable for the performance improvement of IDSs? What are the sensor fusion algorithms, and which algorithm best suits the intrusion detection application? As such, the primary thesis question is: why and how does sensor fusion succeed?
Problem No.5: How to select threshold bounds for effective sensor fusion?
Problem No.6: How to propose an architecture better than the threshold-based or the rule-based ones, taking into consideration the large data set and also the dynamic nature of the network environment? How can the problems associated with the skewness in the network traffic be effectively handled?
Problem No.7: What are the limitations associated with the Dempster-Shafer evidence theory when it is used for sensor fusion? How can it be modified to suit a generic fusion environment?
Problem No.8: How can IDS and sensor fusion be modeled?
This thesis tries to find solutions to the problems raised above; the proposed solutions are presented in chapters two to eight, with the related work in each problem area introduced in the respective chapters. In essence the problem can be stated as: given a set of heterogeneous IDSs monitoring a network, how should their individual decisions be assimilated in order to enhance the detection performance of the combination?
A more complete statement of this investigation:


Can we effectively detect network intrusions by applying advances in sensor fusion techniques to intrusion detection?
Is it possible to combine a number of currently known intrusion detection
systems with a simple fusion technique based on the data-dependency and
performance of the individual IDS?
Can a single computational model be used to represent and monitor exploitations in all the attack categories using advanced sensor fusion techniques?
The strategy for answering these questions is to engage in a study of the literature concerned with similar studies, and then to proceed with a theoretical and
empirical analysis.

1.5 Major contributions of this thesis


This thesis contributes various sensor fusion algorithms for the effective enhancement of intrusion detection performance. The thesis also incorporates a
theoretical basis for improvement in performance of IDSs using sensor fusion
techniques.
1.5.1 Theoretical formulation
1. Issues associated with data skewness in attack detection have been identified.
2. Attack-detector relationship has been modeled.
3. Improvement in the performance of the fusion IDS in comparison with any of the constituent IDSs has been proved for both dependent and independent IDSs.
4. The Chebyshev inequality principle is used to derive appropriate threshold
bounds for the fusion unit.
5. The Dempster-Shafer evidence theory is modified for the intrusion detection
application.


1.5.2 Experimental validation


1. Attack-detector model has been validated.
2. The DARPA dataset for IDS evaluation has been validated against some of
the existing IDSs.
3. IDS fusion using threshold bounds has been validated.
4. Rule-based fusion as well as the data-dependent decision fusion have been validated for their improved performance.
5. Improved detection rate for the rarer attacks using data-dependent decision
fusion method has been validated.
6. Modified evidence theory for the fusion unit aiding enhanced detection has
been validated.

1.6 Research goal


In order to figure out how sensor fusion can be applied for performance enhancement of IDSs, we define sensor fusion as the process of collecting information from multiple and possibly heterogeneous sources and combining it in order to provide a more descriptive, intuitive and meaningful result. Given the additional computational requirements of sensor fusion for network intrusion detection and the unique benefits to be gained, we believe that a processing module using the proposed data-dependent decision fusion architecture with the modified evidence theory, offering 'better than the best' protection, will become a standard part of future defense-in-depth security architectures.

1.7 Organization of the thesis


A brief synopsis of each of the chapters of this thesis is included below:
Chapter 1 presents the motivation, goal, and contributions of this thesis work in
detail. It also includes a discussion on the growth of the Internet, growth of the


attacks, the importance of protecting the corporate network, and the need for network security and in particular for intrusion detection systems. Chapter 2 discusses in detail the data skewness in the monitored traffic as well as other issues in intrusion detection. This chapter also models the attack-detection scenario for IDSs. The model includes attack carrying capacity, performance improvement of detectors, and detector correlation assuming non-random interaction between attacks and detectors. Chapter 3 discusses the evaluation and
testbed of IDSs. This chapter includes a detailed discussion on the DARPA data
set and its usefulness in the IDS evaluation. The modifications of the available
IDSs used in this work have been attempted. This chapter also brings out the
inability of single IDSs to make a complete detection coverage of the attack
domain. Chapter 4 provides a survey of sensor fusion after identifying the issues and limitations of single IDSs in chapters 2 and 3 respectively. The related
work in sensor fusion and in particular the related work using sensor fusion in
intrusion detection application are discussed. The mathematical basis as well
as the theoretical analysis of sensor fusion in IDS are also incorporated in this
chapter. Chapter 5 includes the selection of intrusion detection system threshold
bounds, which is an important parameter in sensor fusion. The bounds are deduced by means of the Chebyshev inequality using the detection rate and the false
positive rate, for effective sensor fusion. The theoretical proof is supplemented
with empirical evaluation and the results have been compared.
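As a rough illustration of the idea (the score samples below are invented, and the actual bounds in Chapter 5 are derived from the detection rate and the false positive rate rather than from raw scores), Chebyshev's inequality P(|X − μ| ≥ kσ) ≤ 1/k² gives distribution-free threshold bounds around the mean fused score:

# Rough sketch of Chebyshev-style threshold bounds for a fusion unit. The
# fused-score samples below are invented; the actual bounds in Chapter 5 are
# derived from the detection rate and the false positive rate.
import statistics

fused_scores = [0.12, 0.08, 0.15, 0.11, 0.09, 0.91, 0.10, 0.13, 0.88, 0.07]

mu = statistics.mean(fused_scores)
sigma = statistics.pstdev(fused_scores)
k = 2.0                     # Chebyshev: P(|X - mu| >= k*sigma) <= 1/k**2

lower, upper = mu - k * sigma, mu + k * sigma
print("threshold bounds: [%.3f, %.3f]" % (lower, upper))

# Scores above the upper bound are too unusual, under any distribution, to
# be explained as typical fused output at the 1/k**2 = 25% level.
print("flagged scores:", [s for s in fused_scores if s > upper])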
Chapter 6 discusses the performance enhancement of intrusion detection using rule-based fusion. A new data-dependent decision fusion architecture is also proposed in this chapter. The experimental evaluation given in this chapter uses the data-dependent decision fusion architecture, specifically looking at the data skewness in the monitored traffic. Chapter 7 presents a new modified evidence theory, which is an extension and improvement of the classical Dempster-Shafer theory. The context-dependent operator proposed in this chapter is demonstrated to be feasible for sensor fusion. Chapter 8 provides theoretical models for the intrusion detection systems, the neural network learner
and the sensor fusion system. The chapter also includes a discussion on the
effect of threshold on detection and the threshold optimization. Chapter 9 provides the results and discussion on the main findings of this investigation. The


conclusions detailing the overall implications of the methodologies introduced in this thesis are drawn, and recommendations for future work are also presented
in this chapter. A comprehensive study of the attacks on the Internet is presented
in Appendix A. Appendix B provides a detailed survey on various IDSs. Appendix C introduces dynamic models for the attack-detector interactions with
the simple Nicholson-Bailey precursor. The metrics used for IDS evaluation in
chapters 5-8 are discussed in Appendix D.

Chapter 2
Issues Connected with Single IDSs and the
Attack-Detection Scenario
Problems worthy of attack prove their worth by fighting back.
Paul Erdős

2.1 Introduction
The probability of intrusion detection in a corporate environment protected by
an IDS is low because of various issues. The network IDSs have to operate
on encrypted traffic packets where analysis of the packets is complicated. The
high false alarm rate is generally cited as the main drawback of IDSs. For IDSs
that use machine learning techniques for attack detection, the entire scope of
the behavior of an information system may not be covered during the learning
phase. Additionally, the behavior can change over time, introducing the need
for periodic online retraining of the behavior profile. The information system
can undergo attacks at the same time the intrusion detection system is learning the behavior. As a result, the behavior profile contains intrusive behavior,
which is not detected as anomalous. In the case of signature-based IDSs, one
of the biggest problems is maintaining state information for signatures in which
the intrusive activity encompasses multiple discrete events (i.e., the complete
attack signature occurs in multiple packets on the network). Another drawback
is that the misuse detection system must have a signature defined for all possible attacks that an attacker may launch against the network. This leads to the
necessity for frequent signature updates to keep the signature database of the

misuse detection system up-to-date.


Many of the IDS technologies are complementary to each other, since for different kinds of environments some approaches perform better than others. The processes followed by IDS operations for detecting intrusions are mainly: (1) monitoring and analyzing the network activities, (2) finding vulnerable parts in a network, and (3) integrity testing of sensitive and important data. If an IDS
is to monitor all these activities, the complexity of the IDS becomes unacceptably large. If we look at the present day information system security, a network
intrusion detection system would be considered the best choice to protect the
machines from Denial of Service (DoS) attacks. At the same time, a host intrusion detection system would be the right choice to protect the systems from
internal users. In order to protect against trojans on systems, a file integrity
checker might be more appropriate. To protect the servers from attackers, an
intrusion prevention system could be the best bet. This shows that the sensors
available in the literature show a distinct preference for detecting certain attacks with improved accuracy, and that none of them shows a good detection rate for all types of attacks or complete intrusion detection coverage. Since an information system has to be protected from all types of attacks, it is most likely that we will
actually need a combination of all these methods or sensors. This argument is
substantiated in this chapter by looking at the limitations of single IDSs and also
the rather slow growth rate of the performance of IDSs available in literature
over the years.
In this chapter, we examine the situation of data skewness in real-world data.
We also consider the realistic problems of the data skewness in the detection of
rare attacks. Even the highly accurate intrusion detection systems lack acceptability and usability as attack detectors if the incidence of attack is rare in the
general traffic. A large number of false alarms severely limits the acceptability of IDSs. The underlying principle here is called the base-rate fallacy.
Yet another problem encountered in intrusion detection is that most of the IDSs
generate a trivial model by almost predicting the majority class, resulting in a
higher overall accuracy. This results in a higher error rate for the minority class
than the majority class. Within the attack class, the minority attack types cause


more damage than the majority attack types. Thus high accuracy is not necessarily an indicator of high model quality, and therein lies the accuracy paradox
of predictive analytics. This chapter gives supporting facts for the need to give a higher weighting to the minority attack types, namely Remote to Local (R2L)
and User to Root (U2R), compared to the majority attack types like probe and
Denial of Service (DoS) [27]. Also, the cost of missing an attack is higher than
the cost of false positives.
The class distribution affects the learning of the IDSs. The problem of designing IDSs to work effectively and yield higher accuracies for minority attacks even
in the mix of data skewness has been receiving serious attention in recent times.
In most of the available literature [31, 32, 33], this is overcome by resampling
the training distribution. The resampling is done either by oversampling of the
minority class or by undersampling of the majority class. The other commonly
used approach for overcoming data imbalance is through cost-sensitive learning
[34, 35], the two-phase rule induction method [36], and rule based classification
algorithms like RIPPER [37] and C4.5 rules [38]. In spite of all such attempts,
the performance of the IDSs in detecting minority and rarer attacks leaves room
for improvement. It is necessary for a detection system to perform much better
than those reported so far for the minority attacks while preserving the performance for the majority attacks.
This chapter also models the attack-detection scenario for intrusion detection
systems. The model includes the attack carrying capacity, the detectors performance improvement from the attacks that it has detected, and the detector
correlation assuming non-random interaction between the attack and the detector. This modeling shows that as the intrusion detection performance improves
with time, the slope of F-score is positive and becomes steeper, which causes
the effect of attack to disappear. However, it is seen from the study of the IDSs
developed over the last ten years that it is not possible to get that type of a
growth rate with a single system so that the effect of attacks is not felt in the
information systems. This establishes the need for enhancing the performance
of IDSs.


This chapter is organized as follows. In section 2.2, the attacker's influence on the detection environment is envisaged. In section 2.3, the data skewness
problem is exemplified. This is done with reference to the DARPA data set as
well as an observation of typical University traffic. In section 2.4, the attack-detection scenario on a secured network is discussed, taking into account the
various possibilities of interaction between the two groups of population. Section 2.5 provides the modeling of the non-random attack-detection relationship.
The summary of the chapter is drawn in section 2.6.

2.2 Attacker's influence on the detection environment


It is required to consider the base rate along with the false alarm rate and
the detection rate in the analysis of IDS evaluation. These quantities can be
controlled by the intruder to a certain extent. The base-rate can be modified by
controlling the frequency of attacks. Additionally, slow and spread out attacks
are difficult for many of the existing IDSs to identify. The perceived false
alarm rate can be increased if the intruder finds a flaw in any of the signatures of
an IDS that allows the intruder to send maliciously crafted packets that trigger
alarms at the IDS but that look benign to the IDS operator. This overloads the
system administrator or the security analyst, so that the true attacks may get
missed in between. Finally, the detection rate can be modified by the intruder
with the creation of new attacks whose signatures do not match those of the IDS,
or of novel attacks totally uncorrelated with those found in the training data
set, or simply by evading the detection scheme, for example by the creation
of a mimicry attack. Now, considering the base rate alone, if the base rate is
intentionally reduced by an adversary, then the precision of detection by an IDS
reduces and the true detection gets embedded in a lot of false positives. Hence,
we can say that the skewness is present naturally in the network traffic, which
is made even worse by an adversary to succeed in his/her plans.

2.3 Data skewness in network traffic


The goal of an IDS is to collect information from a variety of systems and
network sources, and then analyze the information for signs of intrusion and


misuse. The network-based IDS captures the network traffic and analyzes the
packet or connection for attack traffic. The host-based IDS monitors the activities of the host on which it is installed.
The network traffic is made up of attack or anomalous traffic, and the normal
traffic. The real-world traffic is predominantly made up of normal traffic rather
than attack traffic. The fact that normal traffic abounds is supported by analyzing a University network's traffic using the Snort packet logger. Also, a study
of the characteristics of the network traffic during phases dominated by DDoS
attacks and worm propagation has been done. One of the statistics on the traffic
volume generated by a DDoS attack is given in the work of Lan et al. [39]. One
typical situation had 28 attackers and generated 11 mega packets (8.6 Gb) of attack traffic in 192 seconds directed at a single host. The magnitude of the attack traffic was about three times that of the normal traffic during this duration. Just after the 192 seconds, the attack activity came down to normal. With the exception of such very short and rare bursts, the usual rate of attack traffic is extremely low. In addition, Bay [40]
mentions anomalous activity to be extremely rare and unusual. Fawcett [41]
has commented that positive activity is inherently rare.
2.3.1 Classification of attacks
A general classification of various attacks found in the network traffic is introduced in Appendix A. The thesis work of Kendall [27] provides in detail
an attack taxonomy with respect to the DARPA Intrusion Detection Evaluation data set [42]. The same is discussed here in brief. The various attacks
found in the DARPA 1999 data set are given in the Table 2.1. The probe or
scan attacks automatically scan a network of computers or a DNS server to find
valid IP addresses (ipsweep, lsdomain, mscan), active ports (portsweep, mscan),
host operating system types (queso, mscan) and known vulnerabilities (satan).
The DoS attacks are designed to disrupt a host or network service. These include crashing the Solaris operating system (selfping), actively terminate all
TCP connections to a specific host (tcpreset), corrupt ARP cache entries for a
victim not in others caches (arppoison), crash the Microsoft Windows NT web
server (crashiis) and crash Windows NT (dosnuke). In R2L attacks, an attacker


Table 2.1: Details of attack types present in DARPA 1999 data set [27]

Attack type | Attacks (against Solaris, SunOS, WinNT and Linux victims, including attacks applicable to all platforms)
Probe       | illegal-sniffer, ipsweep, lsdomain, mscan, ntinfoscan, portsweep, queso, satan
DoS         | apache2, arppoison, back, crashiis, dosnuke, land, mailbomb, neptune, pod, processtable, smurf, syslogd, tcpreset, teardrop, udpstorm, warezclient
R2L         | dict, framespoof, ftpwrite, guest, httptunnel, imap, named, ncftp, netbus, netcat, phf, ppmacro, sendmail, snmpget, sshtrojan, xlock, xsnoop
U2R         | casesen, eject, fdformat, ffbconfig, loadmodule, nukepw, ntfsdos, perl, ps, sechole, sqlattack, xterm, yaga
Data        | ntfsdos, ppmacro, secret, sqlattack

who does not have an account on a victim machine gains local access to the
machine (e.g., guest, dict), exfiltrates files from the machine (e.g., ppmacro) or
modifies data in transit to the machine (e.g., framespoof). New R2L attacks
include an NT PowerPoint macro attack (ppmacro), a man-in-the-middle web
browser attack (framespoof), an NT trojan-installed remote-administration tool
(netbus), a Linux trojan SSH server (sshtrojan) and a version of a Linux FTP
file access-utility with a bug that allows remote commands to run on a local
machine (ncftp).
In U2R attacks, a local user on a machine is able to obtain privileges normally
reserved for the UNIX super user or the Windows NT administrator. The Data
attack is to exfiltrate special files, which the security policy specifies should
remain on the victim hosts. These include secret attacks, where the user with
permission to access the special files exfiltrates them via common applications
such as mail or FTP, and other attacks where privilege to access the special files
is obtained using a U2R attack (ntfsdos, sqlattack). An attack could be labeled


as both U2R and Data if one of the U2R attacks was used to obtain access to the
special files. The Data category thus specifies the goal of an attack rather than
the attack mechanism [27].
Attack behavior [43]

An analysis of the various attack types within the network traffic has resulted in
certain inferences that are listed below:
Probing attacks are expected to show limited variance as they involve making connections to a large number of hosts or ports in a given time frame. Likewise,
the outcome of all U2R attacks is that a root shell is obtained without legitimate means, e.g., login as root, su to root, etc. Thus, for these two categories
of attacks, given some representative instances in the training data, any learning
algorithm was able to learn the general behavior of these attacks. As a result,
the IDSs detect a high percentage of old and new Probing and U2R attacks. On
the other hand, DoS and R2L have a wide variety of behavior because they exploit the weaknesses of a large number of different network or system services.
The features constructed based on the available attack instances are very specialized to the known attack types. Hence most of the trivial IDS models missed
a large number of new DoS and R2L attacks. It is understood that the misuse
detection models fail in the case of novel attacks. Even the anomaly detection
models do not work well when there is large variance in user behavior since
the algorithm tries to model the normal behavior and the attack behavior shows
a large variance many times overlapping with the normal behavior. Hence it
turns out to be difficult to guard against the new and diversified attacks. The same is the case with the network traffic: while there are relatively few intrusion-only patterns, normal network traffic can have a large number of variations.
It is observed from most of the previous studies that there was no attempt to
consider the correlation information in the input network traffic for improving
the detection effectiveness. The network traffic can be characterized in terms
of sequences of discrete data with temporal dependency [44, 45, 46]. It is observed in [47] that different network intrusions have different correlation statistics which can be directly utilized in the covariance feature space to distinguish


multiple and various network intrusions effectively. By constructing a covariance feature space, a detection approach can thus utilize the correlation differences of sequential samples to identify multiple network attacks. It is also
pointed out that covariance-based detection will succeed in distinguishing multiple classes with near or equal means while any traditional mean-based classification approach will fail. Even the best intrusion detection systems in the DARPA evaluation [48] show that less than 10% of new R2L intrusion attempts have been detected. Hence, the number of detections of new attacks is more significant in determining the quality of IDSs.
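A minimal sketch of this covariance idea (with invented synthetic sequences, not real traffic): two sequence classes with near-equal means but different correlation structure are separable by a lag-1 autocovariance feature where a mean-based test fails.

# Sketch: two classes of synthetic feature sequences with near-equal means
# but different correlation structure; a covariance feature tells them apart
# where a mean-based test cannot. Data here are invented for illustration.
import random

random.seed(1)

def normal_seq(n=50):
    # Uncorrelated samples around mean 1.0 (stands in for normal traffic).
    return [random.gauss(1.0, 0.3) for _ in range(n)]

def attack_seq(n=50):
    # Strongly autocorrelated samples with the same mean of 1.0.
    x, seq = 1.0, []
    for _ in range(n):
        x = 1.0 + 0.9 * (x - 1.0) + random.gauss(0, 0.13)
        seq.append(x)
    return seq

def lag1_cov(seq):
    # Lag-1 autocovariance: one entry of a covariance feature space.
    m = sum(seq) / len(seq)
    return sum((a - m) * (b - m) for a, b in zip(seq, seq[1:])) / (len(seq) - 1)

for label, gen in (("normal", normal_seq), ("attack", attack_seq)):
    seqs = [gen() for _ in range(200)]
    mean_of_means = sum(sum(s) / len(s) for s in seqs) / len(seqs)
    mean_cov = sum(lag1_cov(s) for s in seqs) / len(seqs)
    print("%s: mean %.3f, lag-1 autocovariance %.3f" % (label, mean_of_means, mean_cov))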
2.3.2 Identification of real-world network traffic problems
Attack traffic in real-world traffic is mostly rare. In addition, the distribution of attack types within the attack class itself is skewed, with probe and DoS attacks abounding whereas the R2L and U2R attacks are rare. The effect of this data
skewness poses some very serious issues in the performance of IDSs, mainly in
two ways:
1. Applying conditional probability using Bayes' theorem, the detection of an attack can be shown to be difficult unless both the percentage of attacks in the entire traffic and the accuracy rate of their identification are far higher than they are at present. The Bayesian rate of attack detection Pr(I|A) [49] is given by:
Pr(I|A) = [Pr(A|I) Pr(I)] / [Pr(A|I) Pr(I) + Pr(A|NI) Pr(NI)]     (2.1)

where I denotes the intrusion, NI denotes no-intrusion, and A denotes the alert. The false alarm rate is the limiting factor for the performance of most
of the IDSs. This is due to the base-rate fallacy phenomenon, which says
that in order to achieve substantial value for the Bayesian detection rate,
it is necessary to achieve an unattainably low false positive rate [49]. The
data set commonly used in order to apply this reasoning in the evaluation
of the IDSs is the DARPA 1999 evaluation data set, where the ratio of the number of attack records to the number of normal records is roughly of the order
of 1:26,000. The DARPA data is supposedly modeling a realistic situation,
having been synthesized based on the traffic observed on a large US Air


[Figure 2.1: Probability of attack vs Bayesian attack detection rate for fixed values of FP and TP (TP = 0.99, FP = 0.01); x-axis: probability of attack (log scale), y-axis: Pr(INTRUSION | ALERT).]

Force base. With an IDS which is 99% accurate and has a false positive rate of 0.01, the Bayesian rate of attack detection Pr(I|A) is obtained as 0.00379. Hence the false positives rise to roughly 262 for every real attack detected. This clearly shows the inability of the IDS to perform its proposed task of attack detection, where the actual attacks get embedded in the large volume of false positives. Even though the detection is 99% certain, the chance of an alert corresponding to an attack is only 1/263, due to the fact that normal traffic is much larger than the attack traffic. Thus it is difficult to interpret what a small false alarm rate is when the base rate is also small. From equation 2.1, the Bayesian attack detection rate can be approximated as:
Pr(I|A) ≈ Pr(I) / Pr(A|NI)     (2.2)
Equation 2.2 clearly shows that the probability of an alert being an intrusion will be almost 1 for a given data set only if the false alarm rate is of the same order as the prior probability. This adverse effect of the data skewness is also illustrated in Figure 2.1, which shows that the naturally occurring class distribution often does not produce the best performing IDS. The optimal distribution generally contains more than 50% of minority class examples. (A numerical sketch at the end of this subsection reproduces these quantities.)


Table 2.2: Performance dependence of Snort on base-rate

Attacks   Normal     FP     TP    Precision  Recall  F-score
190       5000000    1000   115   0.10       0.61    0.17
190       2000000    500    115   0.19       0.61    0.29
190       1000000    260    115   0.31       0.61    0.41
190       100000     53     115   0.68       0.61    0.64
190       10000      5      115   0.96       0.61    0.75
190       1000       1      115   0.99       0.61    0.76
190       190        0      115   1.00       0.61    0.76
[Figure 2.2: Probability of attack vs F-score for Snort; x-axis: base rate (log scale), y-axis: F-score.]

Yet another attempt is made to demonstrate that data skewness, which is normally found in real traffic and hence reproduced in simulated traffic like the DARPA data set, is a reality and a problem for the performance of any IDS. The commonly used signature-based IDS, Snort, is used to show the improvement in performance score (F-score) with the proportion of attack to normal traffic. Table 2.2 and Figure 2.2 show the adverse effect of the base rate on the performance of an IDS. Beyond the stage where the base rate equals the false positive rate, the IDS performs in an optimum way.
2. The standard base-rate argument says that it is possible to label all the traffic as normal and still get an accuracy of 99.99%. The performance of an IDS is usually evaluated using the accuracy measure, but the measure fails in the


case of imbalance in data and also when the costs of different errors vary
markedly. Most of the IDSs generate a trivial model by almost predicting the majority classes, since predicting the minority classes has a much
higher error rate, which in turn degrades the performance of the IDS. This
is the accuracy paradox, which says that in predictive analytics high accuracy is not necessarily an indicator of high model quality. This is explained with the DARPA test data set with 5 million test records consisting of 190 attacks. Consider an IDS which detects 100 attacks at a false positive rate of 0.01%. The accuracy of this detector is 99.994%. However, the accuracy paradox lies in the fact that the accuracy can easily be made 99.996% by always predicting normal. The second model, even though it has a higher
accuracy, is useless since it does not detect attacks. Hence, most of the
IDSs do not detect minority class types sufficiently well since they aim
to minimize the overall error rate rather than paying attention to the minority class, which is obviously not the desired detection result. Thus the present
day stand-alone IDSs are not effective in detecting the attacks, especially
the rare class of attack types.
The probes aim at knowing whether a particular IP exists and if so the
details regarding the services and the operating system running on it. Probing may be normal or may be the pre-phase of an attack. In the latter case, it may not be possible to confirm the intent of the prober and thus recognize an attack, but proper preventive measures like the installation of patches, deployment of a firewall, addition of firewall rules, removal
of unused services, etc., can be taken so that the vulnerability does not
get exploited. DoS is mainly disrupting the services on a network or on
a host. Hence DoS causes service denial and probes are for reconnaissance/surveillance.
The R2L attack, on the other hand, gains an account on a remote machine, exfiltrates files, modifies data, installs trojans for back-door entry, etc. The U2R attack uses a buffer overflow to acquire a root shell and get full
control of the system. Data attacks are the special case where an attacker
gets privilege to access the special files. In real-world environment, these


minority attacks are more dangerous than the majority attacks. Hence, it is essential to improve the detection performance for the minority attacks, while maintaining a reasonable overall detection rate. (The sketch below works through the arithmetic of both effects.)
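The arithmetic of this section can be reproduced in a few lines (a sketch only; no IDS is run, the counts follow the figures quoted above, and small differences from the printed values are due to rounding):

# Sketch reproducing the skewness arithmetic of this section.
def bayes_rate(tp_rate, fp_rate, base_rate):
    # Equation 2.1 with Pr(A|I) = tp_rate and Pr(A|NI) = fp_rate.
    return (tp_rate * base_rate) / (tp_rate * base_rate + fp_rate * (1.0 - base_rate))

print("Pr(I|A) at base rate 1/26000: %.5f" % bayes_rate(0.99, 0.01, 1.0 / 26000))
# -> about 0.00379: roughly one real attack per ~260 alerts.

def f_score(p, r, alpha=0.5):
    # Weighted harmonic mean of precision and recall.
    return 1.0 / (alpha / p + (1.0 - alpha) / r)

# Snort-style counts from Table 2.2: 190 attacks, 115 of them detected.
for normal, fp in [(5_000_000, 1000), (1_000_000, 260), (100_000, 53), (1000, 1)]:
    precision = 115.0 / (115 + fp)
    recall = 115.0 / 190
    print("normal=%8d  P=%.2f  R=%.2f  F=%.2f"
          % (normal, precision, recall, f_score(precision, recall)))

# Accuracy paradox: whatever the exact rounding, an always-normal predictor
# beats the real detector on accuracy while detecting nothing (5 million
# records, 190 attacks; the detector finds 100 at a 0.01% FP rate).
records, attacks = 5_000_000, 190
errors = (attacks - 100) + round(0.0001 * (records - attacks))  # FN + FP
print("real detector accuracy : %.4f%%" % (100.0 * (1 - errors / records)))
print("always-normal accuracy : %.4f%%" % (100.0 * (1 - attacks / records)))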
2.3.3 Non-uniform misclassification cost
It is important to understand that the cost of misclassifying an attack as normal (type I errors, or FN) is often more than the cost of misclassifying a normal as an attack (type II errors, or FP). The issues involved in the measurement of
the cost factors have been studied by the computer risk analysis and security assessment communities. Denning in her book [50] remarks on the cost analysis
and the risk assessment in general that it is not an exact science because precise measurement of relevant factors is often impossible. Damage cost (DCost)
characterizes the amount of damage to a target resource by an attack when intrusion detection is unavailable or ineffective. Response cost (RCost) is the cost
of acting upon an alarm or log entry that indicates a potential intrusion [51]. Lee
et al. [24] have come up with an attack taxonomy illustrated in Table 2.3, which
categorizes intrusions that occur in the DARPA Intrusion Detection Evaluation
data set.
When attempting to look at the skewness within the minority class, it is observed that there is again a still higher misclassification cost for the minority
attack types. This has also been highlighted in the cost matrix of Table 2.4 published for the KDD IDS evaluation [52]. Hence, it is important to have IDSs
that minimize the overall misclassification cost by performing better on minority classes and again on minority attack types. Thus, by thinking in line with
the KDD evaluations done on IDSs, the goal was not only the improvement in
accuracy but also the reduction in misclassification cost. The misclassification
cost penalty was the highest for one of the most infrequent attack types, and that
too for the type I error [52]. The total misclassification cost can be reduced if
the type I errors and the type II errors can be reduced. The advantage with most of the rare events is that their signatures are unique and can be learned from the given data. In some of the commonly encountered attacks, like casesen and sechole, the attacker often uploads some malicious code onto the target machine and then logs in and crashes the machine to get root access. Also, there are attacks that


Table 2.3: Damage cost and response cost of different attack types [24]

Attack type     Sub-category      Description                          Cost
(by results)    (by techniques)
U2R             local             legitimate user trying to acquire    DCost=100
                                  higher privileges                    RCost=40
                remote            acquiring root privileges from a     DCost=100
                                  remote machine                       RCost=60
R2L             single            with a single event an illegal       DCost=50
                                  user access is obtained              RCost=20
                multiple          with multiple events an illegal      DCost=50
                                  user access is obtained              RCost=40
DoS             crashing          crashing a system by certain         DCost=30
                                  framed packets                       RCost=10
                consumption       exhausting bandwidth or              DCost=30
                                  system resources                     RCost=15
Probe           simple            fast scan                            DCost=02
                                                                       RCost=05
                stealth           slow scan                            DCost=02
                                                                       RCost=07

Table 2.4: Cost matrix [52]

Actual \ Predicted   Normal  Probe  DoS  U2R  R2L
Normal               0       1      2    2    2
Probe                1       0      2    2    2
DoS                  2       1      0    2    2
U2R                  3       2      2    0    2
R2L                  4       2      2    2    0


upload/download illegal copies of software through FTP, like warezmaster and warezclient. Port 80 attacks are malformed HTTP requests, very different from the normal requests. Usually connection-based detection gives a better result than packet-based models for ports 21 and 80.
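As a small illustration of cost-sensitive evaluation (the confusion counts below are invented; the cost matrix is that of Table 2.4), the total misclassification cost penalizes errors on the rare classes most heavily:

# Sketch: total misclassification cost under the cost matrix of Table 2.4.
# The confusion counts below are invented for illustration.
CLASSES = ["Normal", "Probe", "DoS", "U2R", "R2L"]
COST = [              # rows: actual class, columns: predicted class
    [0, 1, 2, 2, 2],  # Normal
    [1, 0, 2, 2, 2],  # Probe
    [2, 1, 0, 2, 2],  # DoS
    [3, 2, 2, 0, 2],  # U2R
    [4, 2, 2, 2, 0],  # R2L
]

def total_cost(confusion):
    return sum(COST[i][j] * confusion[i][j] for i in range(5) for j in range(5))

# Hypothetical detector: good on the majority classes, but it pushes most
# U2R/R2L records into the Normal column -- the costliest kind of error.
confusion = [
    [9900, 60, 40, 0, 0],
    [20, 470, 10, 0, 0],
    [30, 10, 960, 0, 0],
    [40, 0, 0, 10, 0],
    [90, 0, 0, 0, 10],
]
print("total misclassification cost:", total_cost(confusion))
# Recovering one R2L record from the Normal column saves 4 cost units,
# whereas fixing one Normal-vs-Probe confusion saves only 1.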
2.3.4 Inability of IDS in optimum decision making due to data skewness
An IDS is said to make an optimum decision only when the expected cost of that particular prediction is the lesser one. For an IDS to make the decision 'attack', it is expected that:
Pr(attack) C11 + Pr(normal) C10 ≤ Pr(attack) C01 + Pr(normal) C00     (2.3)
where:
C11 = cost of predicting an attack as an attack,
C10 = cost of predicting a normal as an attack,
C01 = cost of predicting an attack as normal,
C00 = cost of predicting a normal as normal.
Assuming the cost of TP and TN to be zero, since in both these cases the correct decision has been made, equation 2.3 can be written as:

Pr(normal) C10 ≤ Pr(attack) C01     (2.4)

If the probability of attack is given by p, this becomes (1 − p) C10 ≤ p C01. The optimum threshold is decided by the data with a minimum a priori probability given by:

p_opt = C10 / (C01 + C10)     (2.5)

For the DARPA 1999 data set, the cost matrix in Table 2.4 shows that C01 > C10, and on substituting the values in equation 2.5, p_opt is obtained as 0.41. Hence, the optimal decision from any IDS cannot be expected with data skewness of prior probability of attack less than 0.41. This proves that data skewness results


in inefficient decision making.
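The decision rule of equations 2.3-2.5 amounts to a few lines of code (the cost values below are illustrative placeholders only; the thesis's cost values for the DARPA 1999 data yield p_opt = 0.41):

# Sketch of the minimum-expected-cost decision rule (equations 2.3-2.5).
# The cost values are illustrative placeholders only; the thesis reports
# p_opt = 0.41 for its DARPA 1999 cost values.
C10 = 2.0     # cost of predicting a normal as an attack (false alarm)
C01 = 5.0     # cost of predicting an attack as normal (miss)

p_opt = C10 / (C01 + C10)                      # equation 2.5
print("predict 'attack' whenever Pr(attack) >= %.2f" % p_opt)

def decide(p_attack):
    # Equation 2.4 with C00 = C11 = 0: attack iff (1 - p) C10 <= p C01.
    return "attack" if (1.0 - p_attack) * C10 <= p_attack * C01 else "normal"

for p in (0.01, 0.2, 0.29, 0.6):
    print("Pr(attack) = %.2f -> %s" % (p, decide(p)))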

2.4 Attack-Detection Scenario in a Secured Environment


This section introduces a new formal technique for modeling the attacks that occur in network traffic and the countermeasures for their detection. With the modeling of the attack-detector relationship, the significance of the distributed dynamics for persistence of attack-detector interactions is discussed. The modeling is based on deduction rules that model the capabilities of the attacker and the detector. The attacker makes use of the vulnerabilities in the applications, the systems or the network and generates attacks that exploit those vulnerabilities. The security experts react by attempting to detect the attacks and thereby protect the network.
This section introduces the non-random attack-detection interactions that are normally expected in a secured network environment. The attack-detector modeling helps enrich the understanding and further the design and research of IDSs. Also, the level of severity of an alert is understood with this modeling. This knowledge could then potentially be used by a security analyst to
understand and respond more effectively to future intrusions. The modeling
shows that as the intrusion detection performance improves with time, the slope
of F-score is positive and becomes steeper, which causes the effect of attacks
to disappear. However, it is not possible to get that type of a growth rate with
a single IDS. In order that the effect of attacks is not felt in the information systems, it is necessary for the performance of the IDS to rise steeply and approach a Figure of Merit value of 1. Since none of the IDSs available
in literature can achieve this, it is felt necessary to make use of multiple IDSs,
benefiting from the advantages of each one of them. The modeling is realistic in
a network environment with multiple IDSs for protection, looking at the system
as a whole, instead of the individual responses to an attack. Thus, the modeling of the attack-detection scenario also partially establishes the limitations of
a single IDS in attack detection.


2.4.1 Internet attacks and the countermeasure for detection


The IDS analyzes the network data and looks for patterns of attacks. Patterns
can be as simple as an attempt to access a specific port or as complex as sequences of operations directed at multiple hosts over an arbitrary period of time.
In any case, the threats of these attacks are quite real and cannot be overemphasized. Hence, IDSs can be an extremely valuable tool if implemented properly. The understanding of the practical limitations as well as the capabilities
of the technology and modeling the attack-detection scenario will enable one
to achieve the best results. IDSs in general detect trivial attacks and cause only
highly transient reductions in attack density. Fusion of highly sophisticated IDSs, each of which is usually developed for a particular attack, may deplete a large fraction of attacks with an appreciable impact on the total trend in attacks. Hence, there are a lot of conflicting issues that need to be dealt with in the case of an IDS. The analysis and the final decision making with an IDS in case of
highly sophisticated attacks bring to the fore the need for a theoretically sound basis for modeling the attack-detector interactions.
The attackers, security researchers and intrusion detection developers have continually played a game of point-counterpoint when it comes to IDS technology.
The basic problems in the field of intrusion detection are extremely challenging
even with the continuous emergence of methods and technology for securing
networks. The attackers continue to find ingenious ways to compromise remote
hosts and frequently make their tools publicly available. The increasing size
and complexity of the Internet, along with that of the end-host operating systems, make
it more and more prone to vulnerabilities. The lack of in-depth understanding
of the intrusion activities due to many privacy issues is yet another problem.
Because of these challenges, current best practices for Internet security rely on reports of new intrusions and security holes mainly from organizations like
CERT (Computer Emergency Response Team). The IDS developers continually counteract the attacks with patches and new releases. Due to the inherent
complexities involved in capturing, analyzing and understanding the network
traffic, there are several common techniques that can be used to exploit inherent
weaknesses in IDSs. Hence considering the three layers (attackers, security researchers and intrusion detection developers) in the attack-detector scenario, it


is the IDS developers who occupy the middle layer and keep the ecosystem stable by maintaining at least a minimum number of undetected attacks and detectors at all times.
2.4.2 Testing the performance of Intrusion Detection Systems
The evaluation of IDS was initiated by the US Defense Advanced Research
Projects Agency (DARPA) in 1998 and has been the most comprehensive scientific study known for comparing the performance of different IDSs [48].
The MIT Lincoln Laboratory synthesized the network traffic with its data sets
DARPA 1998 and DARPA 1999 [42]. The performance of IDS can be evaluated
by choosing these publicly available data sets.
IDSs can be configured and tuned in a variety of ways in order to reduce the
false positive rate and to maximize the detection rate. However, there is a tradeoff between these two metrics for any system and hence these measurements
are used to form the Receiver Operating Characteristic (ROC) curves. An ROC
curve plots the detection rate against the false alarm rate. If the IDS raises
alarms very often on every suspicious packet, the false alarm rate as well as the
detection rate will increase. On the other hand, if the IDS raises alarms only after sufficient evidence is available (i.e., fewer false alarms), the detection rate will
suffer but with an increased alarm confidence. An IDS can be operated at any
given point on the ROC curve. The optimal operating point for an IDS, given
a particular network, is determined by factors like the cost of a false alarm, the
value of a correct detection and the prior probabilities of normal and attack traffic. The ROC curve conveys information of importance when analyzing and
comparing IDSs. Figure 2.3 is an ROC graph plotted with each point identifying the status of a particular IDS, developed from 1995 to 2004, in terms of the
detection rate and the false alarm rate. The crowded region to the top left as seen
in the graph can be identified as that due to the recent systems. Thus the environment in which most of the IDSs of recent times operate requires very low
false alarm rates (much lower than the 0.1% designated by DARPA) for useful
detection. The overall accuracy is not a good metric since the class of interest
is extremely rare. In cases with low base rate, the IDS has to be evaluated based
on its performance in terms of both recall as well as precision. Figure 2.4 shows


[Figure 2.3: Receiver Operating Characteristic graph showing individual IDSs; x-axis: false alarm rate, y-axis: detection rate.]

[Figure 2.4: Trade-off between recall and precision (precision-recall curve for the combination sensor); x-axis: recall, y-axis: precision.]


the tradeoff between the two metrics precision and recall. The precision of an
IDS refers to the fraction of intrusions detected from the total alerts generated.
Similarly the recall refers to the IDS completeness; the more complete an IDS is
the fewer are the intrusions that remain undetected. The plots of precision and
completeness of the IDS over the years of IDS development can be understood
from Figure 2.5 with both the plots on a single graph, where the top plot refers
to the precision and the bottom plot refers to the recall.
Precision(P) = (number of correctly detected intrusions) / (number of alerts) = True Positives / (True Positives + False Positives)

Recall(R) = (number of correctly detected intrusions) / (number of actual intrusions) = True Positives / (True Positives + False Negatives)

The behavior of the IDS can be generalized in terms of F-score, which scores a
balance between precision and recall as:
F-score = 1 / (α (1/P) + (1 − α) (1/R))

where α takes a value between 0 and 1 and corresponds to the relative importance of precision over recall. Thus the F-score takes values between 0 and 1 depending on the relative importance of precision over recall. Considering equal importance to precision and recall, α is assigned a value of 0.5, and the F-score takes the value 2PR/(P + R), which is the harmonic mean of precision and recall.
Higher value of F-score indicates that the IDS is performing better on recall as
well as precision. The plot of F-score over a period of time, as shown in Figure
2.6, gives an idea of the effectiveness of the IDSs developed over that period.
Hence a study was undertaken to highlight the performance of the various IDSs
over a period of time in terms of the F-score.
Usually it is expected that both technology as well as the performance of any
system improve with time. However, in the case of IDSs, this need not be the
case, since the attackers also gain a lot of expertise with time and the false
alarms can be increased so as to confuse the security analyst regarding the correct picture of the attack. Analysis of the Figure 2.6 can be carried out in terms
of the study of the growth in Internet insecurity from the incidents reported to the Computer Emergency Response Team/Coordination Center (CERT/CC).


[Figure 2.5: Precision and recall of IDS over the years (1995-2004); the top plot is precision, the bottom plot is recall.]
[Figure 2.6: Plot of F-score over the years (1995-2004).]


[Figure 2.7: Growth in the incidents reported to CERT (1988-2004).]
[Figure 2.8: Growth in the vulnerabilities reported by CERT (1995-2004).]

Figure 2.7 and Figure 2.8 show the growth rate of incidents reported to CERT
as well as the vulnerabilities reported by CERT, respectively. It looks as if the attackers' understanding of security weaknesses has also increased over time, as have their attack tools.
The statistics released by CERT [53] show disturbing trends in incident and
vulnerability reporting, since both have started increasing exponentially in recent times whereas the increase was more linear earlier. In order to bring down the exponential growth of attacks, a sufficient number of IDSs has to be deployed for detection along with better and more advanced IDS


techniques for better detection. The plot of F-score over the years is a clear
indication that the IDS techniques are not improving in a regular and steady
manner, but instead following a prey/predator relationship.
Just after the introduction of IDSs, they became more and more competitive in detecting the existing intrusions caused by attackers. This can be seen
by an initial steep rise in the F-score. With the advancement in the IDS technology, the attackers also try to acquire more expertise and will try to launch
more unidentified and confusing attacks, which causes the F-score to come down drastically. As a result it becomes necessary for security experts to overcome these
attacks, where both the IDS researchers and implementors will strive for new
techniques and modify the available IDSs and also find patches for the known
vulnerabilities. This effort will again bring up the F-score, and this process of the attacker or the detector gaining more and more expertise with time continues, as seen in Figure 2.6. This trend is seen in the work of Shimeall and Williams [54] and Browne et al. [55], where the reported incidents behave in an oscillatory manner. The practical data available in their work explains this type of system behavior. The plot is reproduced in Figure 2.9 from the data observed by Shimeall and Williams. The plot shows the number of incidents with serious effects on a reporting site as reported to the CERT/CC each week between
June 24, 2000 and Feb 17, 2001. The incident counts exclude simple port-scanning, unexploited vulnerabilities, false alarms, failed frauds and hoaxes.
Shimeall and Williams [54] have found a temporal, spatial and associative trend
in the relationship of attacks on the Internet. One type of such temporal trends
relates to the timing between an event that may trigger an incident (say, a new
product announcement or the release of a new intrusion tool) and the corresponding incidents. Hacking conventions like Defcon, Black Hat Briefings, etc.
appeared at or close to local peaks in the incident reporting. Taking into account the exploits published on websites, the peaks in the exploit publication rate were weakly correlated with the peaks in the incident reporting rate one to three weeks later. The exploitation of vulnerabilities in reported security incidents is very common. The intruders have the habit of modifying their tactics very quickly. It is also true that while substantive periods of time may elapse between the discovery of a vulnerability and its widespread exploitation, there


[Figure 2.9: Incidents reported over a period of time; x-axis: number of weeks since 24 June 2000, y-axis: number of incidents.]

is a trend toward more rapid exploitation.


Browne et al. [55] have collected evidence to show that the availability of
patches will not reduce the severity of incidents after a time delay as normally
expected. They found that the incidents accumulate regardless of the existence
of corrections for exploited vulnerabilities. The incidents, however, accumulate in a linear fashion, which allows the statistical modeling of the incident accumulation rate. This modeling helps any organization in deciding on its IT investments, in knowing the severity of continuing incidents, and in testing vendor
supplied patches prior to deployment.

2.5 Modeling the attack-detector relationship


The detection of an attack scenario depends on the exact correspondence of an attacker's actions with those anticipated in the scenario. Since the total number of possible actions is huge, the specification, by an expert and a priori, of the set of all the scenarios becomes illusory. It implies that all the scenarios to detect must be known a priori, but without any guarantee that those scenarios can be generated [56].
Most IDSs are designed to learn from attacks and feed into the patches for
the OS, virus signatures, firewalls and hence thwart future attacks. There is a


delay between the time attacks are detected and steps are taken to immunize
the systems to such attacks. After a time delay the attackers develop new types
of attack. This process continues and hence the variation of F-score with time
shows an oscillatory behavior in the case of IDSs.
On further analyzing the F-score plot, one of the possible models for estimating
the performance of IDS with time is a prey/predator relationship identified in the attack/detector interaction. The attacks on a host/network increase, causing the detection rate to go high; as the detection becomes more and more competitive, the attacks will naturally come down since many of the attacks will no
longer be successful. As the attacks reduce significantly, the research and development taking place in the field of countermeasures like IDSs decreases. This
is because the intrusion detection analysts may be inclined to try new methods
only when they see their old methods to be inadequate, particularly when the
new methods require considerable knowledge and skill to be used effectively.
This again causes the attacks to increase considerably.
Considering the performance of a single IDS over the years, it will be seen that the performance significantly deteriorates with time. It is difficult for an IDS to keep up with the new attacks generated by the sophisticated attackers. This implies that a constant update will be required at all times to maintain a
steady performance. This occurs only if the security researchers keep up with
the attackers steadily in their capabilities. However, this rarely happens with
full efficiency. Hence the performance of a single IDS deteriorates with time.
The performance of a single IDS at no stage has caused the effect of attacks to
totally disappear. Similar is the case with attacks also. A successful attack of
one time may not be successful at a later stage. This is mainly because of the
patches and the advanced security measures that the security developers introduce from time-to-time. Hence the attacks also show a decreasing performance
over a period of time.
To model this scenario, certain assumptions are made in the initial phase while working with the model to define the attack-detector population dynamics; they are given as follows:


Attacks increase at a rate directly proportional to the attack density in the


absence of detectors. The knowledge of a successful attack in one domain
may motivate the attacker to generate another attack for a different target or
domain so far undisturbed by attacks. The new attacks get introduced with
an attack increase rate, in the absence of detectors that are effective over
them. (As discussed later, it is not reasonable to assume that the attacks
grow indefinitely in the absence of detectors.)
Detectors become ineffective at a rate proportional to the detector density.
(More detectors exist when there is a scope for more, due to many of them
already becoming ineffective. Normally, the motivation for a new detector
development as seen in most of the literature is the limitation of the existing ones. If a 100% efficient IDS exists, there is no need for more IDSs,
since that itself is sufficient. Hence the introduction of each new IDS is to
overcome the ineffectiveness (or limitations) of the earlier IDSs to newer
attacks.)
Detectors increase in number proportional to the rate at which they detect attacks. (The detectors learn from the attacks detected and will either be replaced with an improved version or new detectors come up for the
changing attack scenario. To explain in detail, the gateway/firewall logs
constitute a good source for traces of novel attacks/probe techniques that
are not yet publicly known. Analysis of large amounts of such logs could
lead to the synthesis of signatures that could be incorporated into IDSs.
Alternatively, this log data can enhance the detection capacity of probabilistic systems. When a highly sophisticated anomaly-based IDS detects
an attack, the attack signature gets known and the signature database of the
misuse-based IDSs get updated. Thus new versions of misuse based IDS
are developed. This attack appears as a labeled attack in the training data
set of any pattern recognition IDS, which again improves the performance
of the IDS.)
Detectors non-randomly detect attacks and cause the attack to be ineffective or unsuccessful, at a rate proportional to the detector density. (In Appendix C we consider an initial assumption with no prior knowledge on
the probable attacks that happen on the Internet. Hence, it is reasonable


to consider that the detectors search randomly for attacks, and the greater the number of detectors, the greater the chance of attacks becoming ineffective or unsuccessful.)
Attack density cannot exceed some carrying capacity, which is imposed
by an ultimate resource limit on the attacks due to technical bottlenecks on
the system and network resources. The attackers' knowledge also limits the sophistication of the attacks beyond a certain level.
The assumptions made for the pursuit-evasion processes are given as follows:
The attackers classified as script kiddies try to identify the security measures, in terms of the density of the detectors in a particular domain, and try to attack sparsely monitored or secured networks. Thus the trend is that the majority of the attacks move away from the domains that are intensely watched and protected.
In the case of professional hackers, it is expected that they concentrate on well-protected and highly gainful domains. In such cases, sophisticated attacks are seen to concentrate successfully on critical domains that are intensely watched and protected. Their interest is to attack such domains for various goals.
The trivial and average detectors find application by identifying the density of attacks in a particular domain and then monitoring sparsely monitored resources and networks to fight the existing attack scenario.
Highly sophisticated IDSs concentrate on intensely secured resources and networks, to improve the security measures still further.
In this section, an attempt has been made to model the dynamic relationship
existing between the detectors and the attacks. This knowledge can be used to
enrich the design and development of IDSs. For each combination of a and d
there is an unstable attack-detector equilibrium, with the slightest disturbance
leading to expanding population oscillations. Depending on the initial state, the
system can evolve towards a simple steady state or a limit cycle, in which the
attack-detector populations oscillate periodically in time. The attack-detector
relationship may thus exhibit coupled oscillations.

Usually, detectors respond to attack distributions. A constant searching efficiency is difficult to accept: searching efficiency depends on the speed of the traffic, on the attack density on a priori grounds, and also on the detector density.
When a network intrusion happens, the sequence of attacks does not take place
in a totally random order. Intruders come with a set of tools trying to achieve
a specific goal. The selection of the cracking tools and the order of application depends heavily on the situation as well as the responses from the targeted
system. Typically there are multiple ways to invade a system. Nevertheless, it
usually requires several actions/tools to be applied in a particular logical order
to launch a sequence of effective attacks to achieve a particular goal. It is this
logical partial order that reveals the short and long-term goals of the invasion.
Real attacks are likely to be distributed in a patchwork of high and low densities, and the detectors can be expected to respond to the attacks by orienting towards high-density patches. It is natural that a detector searches certain traffic features for signs of attack; the normal traffic is uncorrelated whereas the attack traffic is correlated, and this provides a strong selective advantage for detectors that results in a more focused search. Since the detector aggregation has the effect of giving attacks a partial refuge at low densities, it is a potentially important stabilizing factor in the dynamic interactions.
The detections show a negative binomial pattern. The negative binomial distribution does not assume randomness but only proneness, i.e., certain traffic feature(s) have a higher chance of disclosing attacks than the other features. If the variance is larger than the mean, the level of proneness of the population is high. If a Poisson distribution is used to model such data, the model mean and variance are equal; in that case, the observations are overdispersed with respect to the Poisson model. Data are examined that relate to two-population interaction models based on the negative binomial parameter g. The negative binomial pattern is specified by its mean μ and a clumping parameter g. The expected proportion of attacks not getting detected at all is given by P(0) = [1 + μ/g]^(-g).

The mathematical meaning of the parameter g may be appreciated by noting that, for a negative binomial with mean μ, the squared coefficient of variation (CV) is given by:

CV^2 = variance / (mean)^2 = 1/μ + 1/g.

In the limit, as g tends to infinity, the random or Poisson distribution is recovered, with the variance equal to the mean. Thus with large g values, say g > 8, Poisson randomness can be assumed. If g < 1, the CV gets larger and the detectors are strongly aggregated in patches of high attack density.
Let A_t and D_t denote the number of attacks and detectors at any time t. Let d denote the detector efficiency and a denote the attack increase rate ignoring detection. The attack-detector model giving the attacks and the detectors in the successive generations t+1 and t can be expressed as:

A_{t+1} = a A_t (1 + d D_t / g)^(-g)

and

D_{t+1} = c A_t [1 - (1 + d D_t / g)^(-g)],

with g being the negative binomial dispersion parameter, which can be interpreted as a coefficient of variation of detector density among patches, and d D_t being the mean detector density. The dynamics of the model show diverging oscillations if g > 1; for g < 1, the system is always stable, at first showing damped oscillations and then approaching the equilibrium state. Figure 2.10 shows the attack-detector relationship using the negative binomial distribution model. Besides the densities of the attacks and the detectors, namely A_t and D_t, the parameters of the system are non-negative values. Figure 2.10 shows the attack and detector distributions with typical values and initial conditions for the coexistence of the attacks and the detectors as: a = 0.25 (from Figure 2.7), A_1 = 20000, d = 0.9 (from Figure 2.6), D_1 = 6, g = 0.7 and t varying from 2000 to 2005. The attacks as well as the detectors are seen to increase with time, and the clumping of attacks is significantly reduced upon detection.

Figure 2.10: Attack-detector relationship using the negative binomial model (A(t) and D(t) versus time), with D(1) = 60 and d = 0.7.
The model is seen to be in total agreement with the attack incident reports published by CERT for the same period of time. In order to bring down the attacks, it is necessary to deploy more detectors and/or improve the detection efficiency of the detectors. With the detector efficiency maintained steady, the number of detectors deployed is first decreased and then increased to demonstrate the effect. Figures 2.11 and 2.12 show the attack-detector interactions with a smaller and a larger initial deployment of detectors respectively. Also, with the deployed detectors remaining unchanged, Figures 2.13 and 2.14 show the effect of decreasing and increasing the detector efficiency respectively.

Figure 2.11: Attack-detector relationship with D(1) = 40, d = 0.7.

Figure 2.12: Attack-detector relationship with D(1) = 80, d = 0.7.


Figure 2.13: Attack-detector relationship with D(1) = 60, d = 0.5.

Figure 2.14: Attack-detector relationship with D(1) = 60, d = 1.0.
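The trajectories in Figures 2.10 to 2.14 can be reproduced qualitatively by iterating the recursion directly. The following minimal Python sketch uses the parameter values quoted above for Figure 2.10, while the conversion factor c (introduced formally in section 2.5.1) is an assumed value chosen only for illustration:

    # Minimal sketch iterating the attack-detector recursion; a, d, g, A1
    # and D1 follow the text, while c is an assumed conversion factor.
    a, d, g, c = 0.25, 0.9, 0.7, 1e-3
    A, D = 20000.0, 6.0                   # initial attacks A1 and detectors D1
    for t in range(2000, 2006):
        print(f"{t}: attacks = {A:10.1f}, detectors = {D:8.1f}")
        escape = (1 + d * D / g) ** (-g)  # fraction of attacks escaping detection
        A, D = a * A * escape, c * A * (1 - escape)

Varying D(1) and d as in Figures 2.11 to 2.14 only requires changing the corresponding initial values.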

2.5.1 Detectors learning from the detected attacks


The detectors usually learn from the attacks and feed into the patches for the OS, the virus signatures and the firewalls, and hence thwart future attacks. When a new attack is detected by an IDS, the IDS analyzes the attack and finds out the possible variations of this attack. This contributes to new research, and further modifications to the existing IDSs or totally new IDSs come up to counter the variations of the detected attack. This is a reasonable outcome of the analysis that happens when an IDS detects an attack. The modified attack-detector model is given as:
A_{t+1} = a A_t (1 + d D_t / g)^(-g)    (2.6)

and

D_{t+1} = c A_t [1 - (1 + d D_t / g)^(-g)],    (2.7)
where c is the factor that incorporates the effect of detectors increasing in number depending on the number of attacks they detect. A special case of non-random search is where some attacks are completely free from detection within a certain time span (a time refuge). In such a case, the detectors may aggregate in patches of high attack density. This tendency of detectors to aggregate in patches of high attack density gives the attacks a refuge at low densities, a powerful stabilizing force in the interaction. This aggregation is most extreme at low values of g.


2.5.2 Detector correlation


The attack-detector model can be modified to incorporate detector density effects through the correlation coefficient. As the detector density increases, the density-dependent effect could stabilize the attack-detector model. The detector searching efficiency d is the probability of detecting a particular attack in the lifetime of a detector, and it depends on the density of the detectors. As the detector density increases, the detector searching efficiency reduces, and hence

d = Q D_t^(-m), or log(d) = log(Q) - m log(D_t),

where Q = d when D_t = 1, and m is the correlation coefficient. Q is a factor that contributes to the determination of level, but has no effect on stability. The individual searching efficiency d reduces when there are more detectors. Again, when D_t increases, the correlation coefficient increases and hence d reduces again. The detector searching efficiency versus the detector density, if plotted on a log-log scale, will be linearly decreasing, with the detector correlation m as the slope of the plot. Thus, to introduce the effect of the correlation coefficient, the attack-detector model is modified as:
A_{t+1} = a A_t (1 + Q D_t^(1-m) / g)^(-g)    (2.8)

and

D_{t+1} = c A_t [1 - (1 + Q D_t^(1-m) / g)^(-g)].    (2.9)

This modification can completely alter the outcome of the attack-detector model. Instead of always being unstable, the new model is stable over a wide range of conditions, depending on the attack growth rate (a) and the amount of correlation (m). With a very large, even small values of m (say m = 0.3) will contribute markedly to the stability and may even give complete stability. Apart from contributing to the stability of the attack-detector interactions, detector correlation can also account for the frequent coexistence of several detector varieties on one attack. The value of m increases as the detector density increases. The detector searching efficiency tends to become independent of the detector density as the change in correlation becomes very small. The detector curve shown in Figure 2.11 is seen to come down with a decrease in D(1), d and g. The detector curve picks up with a decrease in the value of the correlation coefficient m.
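The density dependence of the searching efficiency can be illustrated with a minimal sketch; Q and m below are assumed illustrative values chosen only to show the log-log behavior:

    # Sketch of the density-dependent searching efficiency d = Q * D**(-m);
    # Q and m are assumed illustrative values. On a log-log plot, log(d)
    # falls linearly in log(D) with slope -m.
    Q, m = 0.9, 0.3
    for D in (1, 5, 10, 50, 100):
        d = Q * D ** (-m)
        print(f"D = {D:3d}  ->  d = {d:.3f}")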

Thus, the introduction of the interference factor into the model contributes to the stability of the model. As m increases, the stability increases; this is due to the detectors being distributed in a more aggregative manner to counter a sophisticated attack, an effect termed pseudo-interference. The heterogeneous attack distributions, coupled with detector aggregation at high attack density, must be the main stabilizing mechanism. The clumping of detections can be formally described as pseudo-interference, analogous to the stabilizing dynamical effects of mutual interference among detectors, and it can also account for the frequent coexistence of several detector varieties on one attack.
The correlation coefficient and the aggregative behavior are, in reality, closely related. The common effect of the correlation coefficient is to reduce the required searching time in direct proportion to the frequency of encounters. The searching efficiency d declines as the detector density increases. The modification has a marked effect on stability: instead of always being unstable, there can now be a stable equilibrium, given suitable values of the interference constant m and the attack growth rate a. These notions can be applied to the present model to get a pseudo-interference coefficient of magnitude
m' = g(1 - a^(1/g)) / ln(a).

That is, the overdispersion of detector attacks has much the same dynamical consequences as would be produced by pure mutual interference among detectors in a homogeneous world. The pseudo-interference coefficient m' corresponds to a stable equilibrium if, and only if, g < 1. As noted earlier, in the special case of non-random search where some attacks are completely free from detection within a certain time span (a time refuge), the detectors may aggregate in patches of high attack density; this gives the attacks a refuge at low densities and is a powerful stabilizing force in the interaction, most extreme at low values of g. The density-dependent attack growth is also a stabilizing factor. In short, there are both empirical and theoretical

reasons for fastening on the negative binomial to approximate the distribution of detectors in a patchy environment.

Figure 2.15: Effect of detector efficiency on the attack growth rate; number of attacks over the years 1999 to 2004 for d = 0.25, 0.5 and 0.75.

2.6 Validation of the model using real-world data


Figure 2.15 shows the validation of the attack-detector relationship using real-world data. The middle plot, for d = 0.5, exactly matches the increase in attacks as reported by CERT and given in Figure 2.7. The detector efficiency is chosen to be 0.5 as it approximately reflects the performance of the IDSs available in the commercial market. In order to identify the effect of detectors on the attacks, we have used a higher as well as a lower average detection efficiency. Figure 2.15 shows the attack and detector distributions for three different values of d (d = 0.25, 0.5 and 0.75), with typical values and initial conditions for the coexistence of the attacks and the detectors as: a = 0.25 (from Figure 2.7), A_1 = 10000 in the year 1999, D_1 = 60, g = 0.7 and t varying from the year 1999 to 2004, for which the data is available on the CERT site. The number of detectors has been chosen to be a fraction of the servers existing at a particular time. The attack increase rate is observed to depend mainly on the efficiency of the detectors deployed for the protection of the servers, which are the main target of the attackers. The model is seen to be in total agreement with the attack incident reports published by CERT for the same period of time. It is understood from this modeling that, in order to bring

down the attacks, it is necessary to deploy detectors of higher detection efficiency. Hence


enhancing the performance of the IDSs is a step towards making the cyber space
safer.
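The d-sweep of Figure 2.15 can likewise be sketched by iterating the same recursion for the three detector efficiencies; as before, the conversion factor c is an assumed illustrative value:

    # Sketch of the Figure 2.15 experiment: the recursion of section 2.5
    # iterated from 1999 to 2004 for three detector efficiencies; a, g, A1
    # and D1 follow the text, c is an assumed conversion factor.
    a, g, c = 0.25, 0.7, 1e-3
    for d in (0.25, 0.5, 0.75):
        A, D = 10000.0, 60.0
        for year in range(1999, 2005):
            escape = (1 + d * D / g) ** (-g)
            A, D = a * A * escape, c * A * (1 - escape)
        print(f"d = {d}: attacks in 2004 = {A:.1f}")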
2.6.1 Discussion on the modeling
The increased frequency, sophistication and strength of Internet attacks have led
to the proposal of numerous detectors. However, the problem is hardly tackled,
let alone solved. There are many factors that hinder the advance of defense
research.
It is necessary to thoroughly understand the attacks in order to design imaginative solutions for them. It is generally believed that publicly reporting attacks damages the reputation of the victim network. Attacks are therefore reported only to government organizations under an obligation to keep the details secret. There are efforts from researchers in the right direction, but they still reveal only a small part of the total picture.
There is currently no benchmark suite of attack scenarios or established
evaluation methodology that would enable comparison between detectors.
In addition to the known threats, there are attacks seen rarely in the wild, mostly stealthy ones, and some novel attack methods. As the usual suspects get handled by detectors, these alien attacks will gain popularity. Understanding these threats, implementing them in a test-bed environment, and using them to test detectors will help researchers keep one step ahead of the attackers.
This chapter does not propose or advocate any specific defense mechanism.
Even though some sections might point out the different possibilities in the field
of security, our purpose is not to criticize, but to draw attention to these defense
problems so that they might be solved.

2.7 Summary
Intrusion detection systems are becoming an indispensable and integral component of any comprehensive enterprise security system, the reason being that the IDS has the potential to alleviate many of the problems facing current network security.
A review of the issues connected with single IDSs has offered a critical analysis for understanding the need for further work in this field of research. This chapter is integral to the whole thesis, supporting the correctness of the track and reinforcing that there is a contribution to be made in this field. It demonstrates that the existing work in the field of intrusion detection has been understood critically, along with the most important issues and their relevance to this work, its controversies, and its omissions.
In this chapter, issues connected with single IDSs are discussed. The problem associated with data skewness is exemplified. The need for improving the performance of individual IDSs using advanced techniques is established. This chapter makes certain inferences about the intrusion detection environment. The normal traffic in any environment comprises a majority of non-attacks and a minority of attacks. The cost of missing an attack is higher than the cost of a false positive. Within the attack traffic, some attacks are even rarer, and these rarer attacks may also cause significant damage. IDSs are normally characterized by the overall accuracy, and the imbalance in the data degrades the prediction accuracy. Though an IDS can give very high overall accuracy, its performance for the class of rarer attacks has been found to be less than acceptable. Hence, it is not appropriate to evaluate IDSs using predictive accuracy when the data is imbalanced and/or the costs of different errors vary markedly. The data skewness in the network traffic demands an extremely low false positive rate, of the order of the prior probability of attack, for an acceptable value of the Bayesian attack detection rate.
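To make this base-rate argument concrete, the following minimal sketch applies Bayes' rule with assumed illustrative numbers (the prior, detection rate and false positive rates below are not results from this thesis):

    # Illustration of the base-rate effect on the Bayesian detection rate
    # P(attack | alarm); the prior, TPR and FPR values are assumed.
    def bayesian_detection_rate(p_attack, tpr, fpr):
        p_alarm = tpr * p_attack + fpr * (1.0 - p_attack)
        return (tpr * p_attack) / p_alarm

    p_attack, tpr = 2e-5, 0.9
    for fpr in (1e-2, 1e-4, 2e-5):
        bdr = bayesian_detection_rate(p_attack, tpr, fpr)
        print(f"FPR = {fpr:.0e} -> P(attack | alarm) = {bdr:.3f}")

Only when the false positive rate approaches the prior probability of attack does P(attack | alarm) become acceptable, which is exactly the demand stated above.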
The trends of F-score and precision/recall for IDSs over a period of 10 years are analyzed, and a model is proposed to characterize the attack-detector behaviors and formalize the attack-detector interactions. The modeling is based on deduction rules that are used to model the capabilities of the attacker and the detector. The proposed model is validated with the empirical values. This modeling helps in enriching the understanding and in furthering the design of and research on IDSs.

Also, the level of severity in a network environment due to the exponentially growing Internet attacks is understood. This knowledge could then potentially be used by a security analyst to understand and respond more effectively to future intrusions. Deploying more detectors and also improving the detector performance are seen to bring down the attacks, as understood from this modeling.
The modeling also shows that as the intrusion detection performance improves with time, the slope of the F-score is positive and becomes steeper, which causes the effect of attacks to disappear. However, it is not possible to get that type of growth rate with a single IDS. In order that the effect of attacks is not felt in the information systems, it is necessary for the IDS performance to rise steeply and approach an F-score value of 1. Since none of the IDSs available in the literature can achieve this, it is felt necessary to make use of multiple IDSs, benefiting from the advantages of each one of them. The modeling is realistic in an environment of a network with multiple IDSs for protection, looking at the system as a whole instead of at the individual responses to an attack. Thus, the modeling of the attack-detection scenario also partially establishes the limitations of a single IDS in attack detection.
The attack field contains a multitude of attack and detection mechanisms, which obscures a global view of the attack problem. This model is an attempt to cut through the obscurity and structure the knowledge in this field. The model is intended to help the security research community think about the threats we face and the possible countermeasures. One benefit we foresee from this study is that of fostering easier cooperation among researchers. Attackers cooperate to exchange attack code and information about vulnerable machines, and to organize their agents into coordinated networks of immense power and survivability. The Internet community must be equally cooperative within itself to counter the threat. It should look at how different mechanisms are likely to work in concert, and identify areas of remaining weakness that require additional work. There is a pressing need for the research community to develop common metrics and benchmarks for detector evaluation. It is clear that under the pressures of a highly competitive global research environment, the field of

IDS will re-mould rapidly and overcome many of the existing limitations and hurdles. As the field grows, the attack-detection scenario will also be refined. For more proactive defense, it is essential to understand the network defensive and offensive strategies. With the attack-detector scenario better understood, the future evolution of attacks can be estimated to a certain extent, thereby aiding better attack detection and, in turn, reducing false negatives. This knowledge helps the security community to become proactive rather than reactive with respect to incident response.

Chapter 3
Evaluation and Test-bed of Intrusion
Detection Systems
The strongest arguments prove nothing so long as the conclusions are not verified by experience. Experimental Science is the queen of sciences and the goal
of all speculation.
Roger Bacon

3.1 Introduction
The poor understanding of the performance of the IDSs available in the literature may be in part caused by the shortage of an effective, unbiased evaluation and testing methodology that is both scientifically rigorous and technically feasible.
The choice of IDSs for a particular environment is a general problem, more
concisely stated as the intrusion detection evaluation problem, and its solution
usually depends on several factors. The most basic of these factors are the false
alarm rate and the detection rate, and their tradeoff can be intuitively analyzed
with the help of the Receiver Operating Characteristic (ROC) curve [43], [57],
[12], [58], [59]. However, as pointed out by earlier investigators [49], [60], [61], the information provided by the detection rate and the false alarm rate alone might not be enough to provide a good evaluation of the performance of an IDS. Hence, the evaluation metrics need to consider the environment the IDS is going to operate in, such as the maintenance costs and the hostility of the operating environment (the likelihood of an attack). In an effort to provide such an evaluation method, several performance metrics, such as the Bayesian detection rate
[49], expected cost [60], sensitivity [62] and intrusion detection capability [63], have been proposed in the literature. These metrics usually assume knowledge of some uncertain parameters, like the likelihood of an attack or the costs of false alarms and missed detections. Yet despite the fact that each of these performance metrics makes its own contribution to the analysis of IDSs, they are rarely applied in the literature when a new IDS is proposed.
In Appendix D, we review the method of evaluation and also describe the evaluation methodology used in this thesis. Appendix D introduces some new metrics for IDS evaluation. Classification accuracy in IDSs deals with such fundamental problems as how to compare two or more IDSs, how to evaluate the performance of an IDS, and how to determine the best configuration of an IDS. In an effort to analyze and solve these related problems, evaluation metrics such as the Area Under the ROC Curve, precision, recall, and F-score have been introduced. Additionally, we introduce the P-test [36], which is a more intuitive way of comparing two IDSs and is also more relevant to the intrusion detection evaluation problem. We also introduce a formal framework for reasoning about the performance of an IDS and the proposed metrics against adaptive adversaries.
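As a quick reference for the metrics named above, the following minimal sketch computes precision, recall and F-score from alert counts; the counts are assumed for illustration only:

    # Minimal sketch of the evaluation metrics named above; the alert
    # counts (true positives, false positives, false negatives) are assumed.
    def precision_recall_fscore(tp, fp, fn):
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)      # the detection rate
        fscore = 2 * precision * recall / (precision + recall)
        return precision, recall, fscore

    p, r, f = precision_recall_fscore(tp=95, fp=40, fn=95)
    print(f"precision = {p:.2f}, recall = {r:.2f}, F-score = {f:.2f}")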
We provide simulations and experimental results with these metrics using the real-world traffic data as well as the DARPA 1999 data set, in order to illustrate the benefits of the algorithms proposed in chapters five to eight of this thesis. The main reason for using the DARPA data set is that we need relevant data that can easily be shared with other researchers, allowing them to duplicate and improve our results. The common practice in intrusion detection of claiming good performance with real-time traffic makes it difficult to verify and improve previous research results, as the traffic is never quantified or released, owing to privacy concerns. We use both the DARPA data sets and the real-world traffic data; doing so, and being able to compare and contrast the results, should help alleviate most of the criticism against work based solely on the DARPA data, and still allow work to be directly compared. The DARPA data set being the only comprehensive data set that can be shared for IDS evaluation, it becomes reasonable to analyze its shortcomings as well as its importance and strengths for such a critical evaluation. Since this data set was made publicly available nine years back, the IDSs

that were developed after this time were taken for analyzing whether the data set has become obsolete. The analysis shows that the inability of the IDSs far outweighs the limitations of the data set. This section is intended to give enough support to IDS researchers using the DARPA data set in their evaluations. This chapter also highlights the inability of single IDSs to make a complete coverage of the entire attack domain. This clearly establishes the need for multiple and heterogeneous IDSs for a wide coverage of present-day attacks.

3.2 Data set


The MIT Lincoln Laboratory, under DARPA and AFRL sponsorship, has collected and distributed the first standard corpora for the evaluation of computer network intrusion detection systems [48]. This DARPA evaluation data set [42] is used for the purpose of training as well as testing intrusion detectors. These evaluations contributed significantly to intrusion detection research by providing direction for research efforts and an objective calibration of the technical state of the art. They are of interest to all researchers working on the general problem of workstation and network intrusion detection [60].
In the DARPA IDS evaluation data set, all the network traffic, including the entire payload of each packet, was recorded in tcpdump format and provided for evaluation. Taking the DARPA 1999 data set for further discussion, the data set consists of weeks one, two and three of training data and weeks four and five of test data. In the training data, weeks one and three consist of normal traffic and week two consists of labeled attacks.
The DARPA 1999 test data consisted of 190 instances of 57 attack types, which included 37 probes, 63 DoS attacks, 53 R2L attacks and 37 U2R/Data attacks, with details of the attack types given in Table 3.1. Even with its serious drawbacks, as can be seen in [64] and [65], and the potential questions about the adequacy of the data for its intended purpose, there is still no data set better than the DARPA data set for IDS evaluation. The DARPA data has certainly been useful in the development of the system proposed in this thesis.

Table 3.1: Attacks present in the DARPA 1999 data set

Attack Class  Attack Type
Probe         portsweep, ipsweep, lsdomain, ntinfoscan, mscan, illegal-sniffer, queso, satan
DoS           apache2, smurf, neptune, dosnuke, land, pod, back, teardrop, tcpreset, syslogd, crashiis, arppoison, mailbomb, selfping, processtable, udpstorm, warezclient
R2L           dict, netcat, sendmail, imap, ncftp, xlock, xsnoop, sshtrojan, framespoof, ppmacro, guest, netbus, snmpget, ftpwrite, httptunnel, phf, named
U2R           sechole, xterm, eject, ntfsdos, nukepw, secret, perl, ps, yaga, fdformat, ppmacro, ffbconfig, casesen, loadmodule, sqlattack

The details of the usefulness of the DARPA data set are included in section 3.3. Some of the publicly available data sets [66] have been investigated, but they are not entirely suitable for the analysis, mainly due to the absence of the application payload. Two anomaly detectors, PHAD [67] and ALAD [68], which give an extremely low false alarm rate of the order of 0.00002, a third IDS, the popularly used open source IDS Snort [69], and a fourth, the commercially accepted Cisco IDS 4215 [70], are considered in this study.
To improve the performance of the IDSs PHAD and ALAD, more data has been incorporated in their training. Normal data was collected from a secured University internal network, and this has been randomly divided into two parts. PHAD is trained on week three of the data set and one portion of the internal network traffic data, and ALAD is trained on week one of the data set and the other portion of the internal network traffic data. Hence the two anomaly-based IDSs PHAD and ALAD are trained on disjoint sets of training data. The correlation among the classifiers is lowered due to factors like more training data, the disjointness of that data, and more training time.

3.3 Usefulness of DARPA data set for IDS evaluation


With the increase in network traffic and the introduction of new applications and attacks over time, continuous improvement of the IDS evaluation data set is required to keep it valuable for researchers. User behavior also shows great unpredictability and changes over time. Modeling the network traffic is an immensely challenging undertaking because of the complexity and intricacy of human behaviors. The DARPA data set models the synthetic traffic at the session level. Evaluating a proposed IDS with the DARPA 1999 data set may not be representative of its performance with more recent attacks, or with other attacks against different types of machines, routers, firewalls or other network infrastructure. All these reasons have caused a lot of criticism of this IDS evaluation data set.
A paper that discusses lines similar to those presented in this section is by Brugger [71]. He analyzed the DARPA 1998 data set using Snort and concluded that any sufficiently advanced IDS should be able to achieve good detection and false positive performance on the DARPA IDS evaluation data set.
3.3.1 Criticisms against the DARPA IDS evaluation data set
The main criticism against the DARPA IDS evaluation data set is that the test-bed traffic generation software is not publicly available, and hence it is not possible to determine how accurate the background traffic inserted into the evaluation is. Also, the evaluation criteria do not account for the system resources used, the ease of use, or the type of system [72].
The other popular critiques of the DARPA IDS evaluation data set are by McHugh [64] and by Mahoney and Chan [65]. McHugh [64] criticizes the procedures used in building the data set and in performing the evaluation. In his critique of the DARPA evaluation, McHugh questioned a number of its results, starting from the usage of synthetic simulated data for the background and the use of attacks implemented via scripts and programs collected from a variety of sources. In addition, the background data does not contain background noise like packet storms, strange packets, etc. Hence, the models used to generate background traffic were too simple in the DARPA data set, and if real background traffic were used, the false positive rate would be much higher. Mahoney and Chan [65] comment on the irregularities in the data, like the obvious difference in the TTL values for attack and normal packets, which makes even a trivial detector show an appreciable detection rate. They conducted an evaluation of anomaly-based network IDSs with an enhanced version of the DARPA data set created by injecting benign traffic from a single host.
All the above criticisms are well-researched comments, and these works have made it clear that several issues remain unsolved in the design and modeling of the resultant data set. However, we cannot agree with the comment made by Pickering [72] that benchmarking, testing and evaluating with the DARPA data set is useless unless serious breakthroughs are made in machine learning. The DARPA data set has the drawback that it was not recorded on a network connected to the Internet. Internet traffic usually contains a fairly large amount of anomalous traffic that is not caused by any malicious behavior [73]. Hence the DARPA data set, recorded on a network isolated from the Internet, might not include these types of anomalies. The unsolved problems clearly remain. However, in the absence of better benchmarks, a vast amount of research is based on experiments performed on the DARPA data set. The fact that, even with all the criticisms, the DARPA data set is still rigorously used by the research community for the evaluation of IDSs brings to the fore the motivation for this section.
3.3.2 Facts in support of the DARPA IDS evaluation data set
A data set other than the DARPA data set that is seen to be used for IDS evaluation is the Defcon Capture The Flag (CTF) data set. Defcon is a yearly hacker competition and convention. However, this data set has several properties that make it very different from real-world network traffic. The differences include an extremely high volume of attack traffic, the absence of background traffic, and the availability of a very small number of IP addresses. The non-availability of any other data set that includes the complete network traffic was probably the initial reason for a researcher in IDS to make use of the DARPA data set for evaluation. Also, the experience of trying to work with real traffic data was not good, the main reason being the lack of information regarding the status of the traffic. Even with intense analysis, the prediction can never be 100 percent accurate, because of the stealthiness and sophistication of the attacks and the unpredictability of the non-malicious user. It involves high cost to attempt to properly label the network connections in raw data. Hence most of the research work that used real network data was not able to report the detection rate or other evaluation metrics for comparison purposes. Mahoney and Chan [65] comment that if an advanced IDS could not perform well on the DARPA data set, it could also not perform acceptably on realistic data. Hence, before thinking of junking the DARPA data set, it is wise to see whether the state-of-the-art IDSs perform well on it, in the sense that they detect all the attacks of the DARPA data set.
With the general impression that the data set used was old and hence not appropriate for IDS evaluation, the poor performance of some of the evaluated IDSs was expected and hence acceptable. Assuming that the data set is not generalized, and counting that as a drawback of the data set, fine-tuning of the IDSs to the data set was considered. Snort has a main configuration file that allows one to add and remove preprocessor requirements as well as the included rules files. The limit of fragmentation to be taken notice of and the requirement of packet reconstruction are typically specified in this file. Snort can be customized to perform better in certain situations with the DARPA data set by improving the Snort rule-set. Thus, we tried to manipulate the benchmark system.
3.3.3 Results and discussion
Test setup

The test setup for the experimental evaluation consisted of three Pentium machines running the Linux operating system. The experiments were conducted with the simulated IDSs Snort version 2.3.4, PHAD and ALAD, and also the Cisco IDS 4215, distributed across a single subnet observing the same domain. This collection of heterogeneous IDSs was used to examine how the different IDSs perform in detecting the attacks of the DARPA 1999 data set.

Chapter 3

60

Experimental evaluation

The IDS Snort was evaluated with the DARPA 1999 data set, and the results are shown in Table 3.2. It can be noted in Table 3.2 that some of the attacks of a certain attack type may get detected whereas other attacks of the same attack type may not. Hence some of the attack types appear in both rows of Table 3.2.
Table 3.2: Attacks detected by Snort from the DARPA 1999 data set

Attacks detected by Snort: teardrop, dosnuke, portsweep, sshtrojan, sechole, ftpwrite, yaga, phf, netcat, land, satan, nc-setup, imap, nc-breakin, ncftp, guessftp, tcpreset, secret, selfping, dosnuke, crashiis, sqlattack, ntinfoscan, neptune, httptunnel, udpstorm, ls, xlock, xsnoop, named, loadmodule, ppmacro

Attacks not detected by Snort: ps, portsweep, crashiis, sendmail, netcat, nfsdos, sshtrojan, ftpwrite, back, guesspop, xsnoop, pod, snmpget, eject, dict, guesstelnet, syslogd, guestftp, netbus, crashiis, secret, smurf, httptunnel, loadmod, ps, ntfsdos, arppoison, sqlattack, sechole, mailbomb, secret, queso, processtable, sqlattack, fdformat, apache2, warez, arppoison, ffbconfig, named, casesen, land, xterm

The performance of PHAD and ALAD on the same data set is given in Tables 3.3 and 3.4 respectively. The duplication in both rows, as it appeared in Table 3.2, is avoided in the rest of the tables to the maximum extent possible by making the entry depend on the majority of detections or misses for a certain attack type. The attacks detected by the Cisco 4215 IDS are given in Table 3.5.

Table 3.3: Attacks detected by PHAD from the DARPA 1999 data set

Attacks detected by PHAD: fdformat, teardrop, dosnuke, portsweep, phf, land, satan, neptune

Attacks not detected by PHAD: loadmodule, anypw, casesen, ffbconfig, eject, ntfsdos, perl, ps, sechole, sqlattack, sendmail, nfsdos, sshtrojan, xlock, guesspop, xsnoop, snmpget, guesstelnet, guestftp, netbus, crashiis, secret, smurf, httptunnel, loadmod, arppoison, land, mailbomb, processtable, ppmacro, fdformat, warez, arppoison, named

Table 3.4: Attacks detected by ALAD from the DARPA 1999 data set

Attacks detected by ALAD: casesen, eject, fdformat, ffbconfig, sechole, xterm, yaga, phf, ncftp, guessftp, crashiis, ps

Attacks not detected by ALAD: loadmodule, anypw, nfsdos, perl, sqlattack, sendmail, sshtrojan, xlock, guesspop, xsnoop, snmpget, netbus, secret, smurf, httptunnel, loadmodule, arppoison, sqlattack, sechole, land, mailbomb, processtable, sqlattack, ppmacro, warez, arppoison, named
Table 3.5: Attacks detected by the Cisco IDS from the DARPA 1999 data set

Attacks detected by the Cisco IDS: portsweep, land, crashiis, ppmacro, mailbomb, netbus, sechole, sshtrojan, imap, phf

Attacks not detected by the Cisco IDS: teardrop, dosnuke, ps, ftpwrite, yaga, sendmail, nfsdos, xlock, guesspop, xsnoop, snmpget, guesstelnet, guestftp, secret, smurf, httptunnel, loadmod, ps, ntfsdos, arppoison, sqlattack, processtable, sqlattack, fdformat, warez, arppoison, named, satan, nc-setup, nc-breakin, ncftp
Discussion

The experimental evaluation gave rise to certain questions:

Since the DARPA attack signatures were known, this being the most popular data set for IDSs at the time of their design and development, why is 100% detection impossible?

Why is it not possible to have zero false alarms with a signature-based IDS like Snort?

The anomaly detectors are also inferior in attack detection and high in false alarms, even when thoroughly trained on the normal data set. Why is it that none of the learning algorithms that learn from normal traffic behavior learn successfully, when there is no shortage of normal traffic data from the data set or otherwise?
Snort is designed as a network IDS; it is extremely good at detecting distributed port scans and also fragmented attacks, which hide malicious packets by fragmentation. The pre-processor of Snort is highly capable of defragmenting the packets. Matching the alerts produced by Snort with the packets in the data set by means of timestamps might sometimes cause misses, mainly because of a time gap of up to 10 seconds between the two. However, Table 3.2 shows that the DARPA 1999 data set does in fact model attacks that Snort has trouble detecting, or for which Snort's signature database is still not updated. Isn't it reasonable to think that the attacks for which the signatures are not available to an IDS like Snort, which has its rule set regularly updated, are the ones that still exist undetected? The attackers are also vigilant of the detection trend, and hence can't we think that some of the latest attacks are variants of those undetected attacks, since those attacks were successful in terms of detection avoidance? Or can't we say that if an IDS is capable of detecting those attacks in addition to the ones detected by Snort, it is a better-performing IDS than Snort? Or is it reasonable to think of changing the test bed when the IDS is suboptimal in performance on that test bed?
In a study made by Sommers et al. [74], after comparing the two IDSs Snort and Bro, they comment that Snort's drop rates seem to degrade less intensely with volume for the DARPA data set. They have also concluded in the paper that Snort's signature set has been tuned to detect the DARPA attacks. Even then, if we cannot detect all the attacks of this nine-year-old data set, it clearly shows the inability of a signature-based IDS to reproduce the signatures of all the attacks available in the data set. This shows the inability of the IDSs rather than the deficiency of the data set.
Preprocessing of the DARPA data set is required before applying it to any machine learning algorithm. With the anomaly-based IDSs PHAD and ALAD, we tried to train them by mixing the normal data from an isolated network with week one and week three respectively of the training data set. Even then, the algorithms produce less than 50% detection and around 100 false alarms for the entire DARPA test data set. Again, there are enough reasons to think of a failure on the part of the learning algorithms.
The usual reasoning for the poor performance of the anomaly detectors is that the training and the test data are not correlated; but that happens in real-world network traffic as well. The normal user behavior changes so drastically from what the algorithm has been trained with that we expect the machine learning algorithms to be extremely sophisticated and to learn the changing behavior. Hence, the uncorrelated test bed is good for evaluating the performance of learning algorithms. Then again, it is a failure on the part of the learning algorithms rather than of the data set if the anomaly detectors perform poorly. Hence it can be concluded that the DARPA data set, even though old, still carries a lot of novelty and sophistication in its attacks.
The Cisco IDS is a network-based intrusion detection system that uses a signature database to trigger intrusion alarms. Like any other network IDS, the Cisco IDS has only a local view. This feature gap is pointed out indirectly in [75]: "...does not operate properly in an asymmetrically routed environment."
Thus, the main reasons for the poor performance of the IDSs with the DARPA 1999 IDS evaluation data set are the following:

The training and test data sets are not correlated for R2L and U2R attacks, and hence most pattern recognition and machine learning algorithms, except for the anomaly detectors that learn only from normal data, will perform badly in detecting the R2L and the U2R attacks.

The normal traffic in real networks and in the data set are not correlated, and hence the trainable algorithms are expected to generate a lot of false alarms.

None of the network-based systems did very well against the host-based U2R attacks [76].

The DoS and the R2L attacks have a very low variance, and hence are difficult to detect with a unique signature by a signature-based IDS or to observe as an anomaly by an anomaly detector [14].

Several of the surveillance attacks probe the network and retrieve significant information, and they go undetected by limiting the speed and scope of the probes [76].

The data set provides a large sample of computer attacks embedded in normal background traffic; several realistic intrusion scenarios are conducted in the midst of normal background data.

Many threats, and thereby the exploits that are available on computer systems and networks, are undefined and open-ended.
The above limitations have to be overcome by sophisticated detection techniques for an improved and acceptable IDS performance. We have also seen that Snort performs exceptionally well in detecting the U2R attacks and the DoS attacks, PHAD performs well in detecting the probes, and ALAD performs well in detecting the R2L attacks. This clearly shows that each IDS is designed to focus on a limited region of the attack domain rather than on the entire attack domain. Hence IDSs are limited in their performance at the design stage itself.
On analyzing certain IDS alerts, the doubt arises as to whether it is justifiable to say that the IDS detects a particular attack. Consider, for instance, an attacker executing the command $./exploit. In the real data set, especially for a per-packet model, it will get translated into many packets, with the first packet containing '$', the second packet containing '.', the third packet containing '/', and the fourth packet containing 'e'. Is it justifiable to say that the IDS detects the particular attack when the IDS detects the fourth packet as anomalous? It depends on the implementation of the IDS. Some IDSs buffer the data before matching it against the stored patterns; in that case, such an IDS is able to see the whole string $./exploit and hence detects the anomaly. An IDS that analyzes on a per-packet basis is able to find some anomalous pattern in one packet before the connection is terminated, and then flags it as an anomalous connection. If the aim is to find intrusive connections, then any packet corresponding to an intrusive connection, detected as malicious, should be good enough.

3.4 Choice and the performance improvement of individual IDSs
The acceptable false alarm rate has been established to be extremely low, almost as low as the prior probability. Hence two IDSs, namely PHAD [67] and ALAD [68], which give an extremely low false alarm rate of the order of 0.00002, are chosen for this work. The third IDS, the popularly used open source IDS Snort [69], is also considered in this work. With the first two IDSs, the Bayesian detection rates are of the order of 35% and 38% respectively. Thus one of the

primary reasons for choosing the IDSs PHAD and ALAD was the requirement of acceptability in terms of a number of false alerts that does not overload a system analyst. The other reason for the choice of PHAD and ALAD was that most of the existing IDS algorithms neglect the minority attack types, R2L and U2R, in comparison with the majority attack types, probes and DoS. ALAD is highly successful in detecting these rare attack types. Also, Snort detects the U2R/Data attacks exceptionally well. All the above IDSs are average in terms of detection performance. Hence an attempt was made to improve the performance of the individual IDSs.
3.4.1 Snort: Improvements by adding new rules
Snort has been identified to have a lot of rules that are named differently from those in the DARPA 99 data set. For example, the land attack, which comes under the DoS attack class, is found in the bad-traffic rules folder of Snort and not in the DoS rules. The attack warezclient, which downloads illegal copies of software, has been identified by Snort with a rule that looks for executable code on the FTP port. Also, many of the rules are very generic, and hence the chances of false positives were very high. However, it was identified that it requires tremendous effort to modify those generic rules, and we have succeeded only to a very small extent. We seek a higher recall objective in the first phase, and the fusion is expected to reduce the false alarms to some extent. Snort rules were modified for the DoS attacks like land, dosnuke and selfping. When incorporating the rules, care has been taken not to overfit and not to make them very generic; this avoids FNs and FPs to the maximum extent possible. For example, when the signature is connection type = ftp, the misclassification should not happen, because the connection can also be due to DoS in a flooding attempt. Hence, the R2L rule has to be refined for the absence of DoS attacks [110]. The rules may thus incorporate more conditions for refinement and thus avoid misclassification, and hence also the misclassification cost. This has increased the Snort detection of DoS.

Chapter 3

66

3.4.2 PHAD/ALAD
PHAD was highly reliable in detecting all the probes except for the stealthy slow scans that have been included in the DARPA 99 data set. The stealthy probes that PHAD missed are ipsweep, lsdomain, portsweep and resetscan. However, Snort was effective in identifying those stealthy ones by waiting for longer than one minute between successive network transmissions. PHAD has the disadvantage that it classifies attacks based on a single packet. We have improved PHAD by examining a session and detecting the anomalies in the connection rather than only at the packet level. A connection (record) is a sequence of TCP packets starting and ending at some well-defined times, between which data flows from the source IP address to the target IP address under some well-defined protocol.
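A minimal sketch of this per-connection extension: packets are grouped by their connection tuple, and the per-packet anomaly scores are then summarized per connection. The packet representation (a (src, dst, sport, dport, score) tuple) and the use of the maximum score are assumptions for illustration, not the exact implementation:

    # Sketch of grouping per-packet anomaly scores into connection-level
    # scores; the packet format and the max-score summary are assumptions.
    from collections import defaultdict

    def connection_scores(packets):
        conns = defaultdict(list)
        for src, dst, sport, dport, score in packets:
            conns[(src, dst, sport, dport)].append(score)
        # flag each connection with its most anomalous packet
        return {conn: max(scores) for conn, scores in conns.items()}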
The detection performance of the anomaly detectors PHAD and ALAD can be improved further by training them on additional normal traffic other than the traffic of weeks one and three of the DARPA 1999 data set. To improve the performance of the IDSs PHAD and ALAD, more data has been incorporated in their training. Normal data was collected from a University internal network, and this has been randomly divided into two parts. PHAD was trained on week three of the data set and one portion of the internal network traffic data, and ALAD on week one of the data set and the other portion of the internal network traffic data. Hence, the two anomaly-based IDSs PHAD and ALAD are trained on disjoint sets of training data. The correlation among the classifiers is lowered by incorporating more training data, and disjoint data at that. The disjoint data sets given to PHAD and ALAD for training have also helped to an extent in feature selection and in reducing the correlation between the two IDSs. Both PHAD and ALAD look into almost disjoint features of the traffic: PHAD detects anomalies based on the intrinsic features of the TCP, UDP, IP, ICMP and Ethernet headers, whereas ALAD detects anomalies based on almost disjoint features of the traffic by looking at the inbound TCP stream connections to well-known server ports.
There are a number of DoS as well as R2L attacks that are difficult to detect, since they exploit a large number of different network or system services. There is no regular pattern in such attacks for detection by misuse detection systems. The anomaly detection systems are also unable to detect them, since they may look like normal traffic because the attacker evades detection via some trusted hosts and uses them for the attack. These attacks are highly sophisticated and need a thorough analysis by a specialized detector. In addition, there is an observable imbalance in the intrusion results due to DoS having more connections than any other attack. Most of the IDSs will try to minimize the overall error rate, but this leads to an increase in the error rate of the rare classes. Hence, more effort should be made to improve the detection rate of the rare classes.
This section has highlighted that, even with an effort to improve the available IDSs PHAD, ALAD and Snort, these IDSs still remain suboptimal, with detection rates less than 50%.

3.5 Summary
The whole world has a growing interest in network security. DARPA's sponsorship, AFRL's evaluation and the MIT Lincoln Laboratory's support in security tools have resulted in a world-class IDS evaluation setup that can be considered ground-breaking intrusion detection research. The DARPA evaluation data set has the required potential in modeling the attacks that are commonly found in network traffic. Hence we conclude by commenting that it can be used to evaluate IDSs in the present scenario, even though any effort to make the data set more real, and therefore fairer for IDS evaluation, is to be welcomed. If a system is evaluated on the DARPA data set, then it cannot claim anything more in terms of its performance on real network traffic. Hence this data set can be considered the baseline of any research.
In an effort to analyze and solve the IDS evaluation problems, evaluation metrics such as the Area Under the ROC Curve, precision, recall, and F-score have been introduced in Appendix D. Additionally, the P-test, which is a more intuitive way of comparing two IDSs and is also more relevant to the intrusion detection evaluation problem, has been included in Appendix D. The metrics used for IDS evaluation, like the F-score and the P-test, are highly effective for an objective comparison of IDSs.

Chapter 4
Mathematical Basis for Sensor Fusion
Mathematics possesses not only truth, but supreme beauty - a beauty cold and
austere like that of sculpture, and capable of stern perfection, such as only great
art can show.
Bertrand Russell

4.1 Introduction
Chapter two and chapter three established the issues and the limitations of a single IDS respectively. Sensor fusion was identified as a viable solution for enhancing the performance of IDSs. The primary objective of the proposed thesis is to develop a theoretical and practical basis for enhancing the performance of intrusion detection systems using advances in sensor fusion with easily available IDSs. This chapter introduces the mathematical basis for sensor fusion in order to provide enough support for the acceptability of sensor fusion in intrusion detection applications. Clearly, sensor fusion for the performance enhancement of IDSs requires very complex observations, combinations of decisions, and inferences via scenarios and models. The basic problem involves selecting IDSs and choosing the appropriate sensor fusion algorithms that provide sufficient enhancement in the performance of the fused IDS. Although fusion in the context of enhancing intrusion detection performance has been discussed earlier in the literature, there is still a lack of theoretical analysis and understanding, particularly with respect to the correlation of detector decisions. In this chapter, we formulate the problem of fusion of multiple heterogeneous IDSs and examine whether an improvement in performance can be achieved through sensor
fusion. This chapter describes the central concept underlying the work and the theme that ties together all the arguments in this work. It provides an answer, at a conceptual level, to the questions posed in the introduction. With a precise understanding as to why, when, and how particular sensor fusion methods can be applied successfully, progress can be made towards a powerful new tool for intrusion detection: the ability to automatically exploit the strengths and weaknesses of different IDSs. The theoretical modeling is undertaken initially without any knowledge of the available detectors or the monitoring data. The empirical evaluation that augments the mathematical analysis is presented in chapters five to eight using two data sets: 1) the real-world network traffic and 2) the DARPA 1999 data set. The results in those chapters confirm the analytical findings of this chapter.
This chapter is organized as follows: Section 4.2 discusses the sensor fusion algorithms. Section 4.3 and section 4.4 survey the related work in sensor fusion in general and in intrusion detection applications, respectively. Section 4.5 presents the theoretical formulation and section 4.6 the solution approaches for intrusion detection using sensor fusion. The chapter is summarized in section 4.7.

4.2 Sensor fusion algorithms


In this section, we provide a state-of-the-art review of intrusion detection based on sensor fusion approaches. The aim is to help choose an appropriate sensor fusion algorithm for a given data set by making it easy to compare the utility of different sensor fusion algorithms on the specific data set of interest. Several approaches have been proposed for sensor fusion, such as the weighted average, fuzzy logic, neural networks, Bayesian inference and probability techniques, Dempster-Shafer evidence theory, and Kalman filters. Intrusion detection using machine learning algorithms has the advantage of identifying new or unknown data or signals that the machine learning system was not aware of during training. We also investigate and compare the performance achieved by different machine learning algorithms for sensor fusion, namely the statistical

approaches, Artificial Neural Networks (ANN), Radial Basis Functions (RBF), Support Vector Machines (SVM) and Naive Bayes (NB) trees. The conditions under which each of these techniques operates efficiently are identified, and the detection effectiveness of these strategies is compared.
4.2.1 Machine Learning for intrusion detection
Machine learning for intrusion detection is a problem that has been researched for the last 12 years. The most prominent works on data mining for intrusion detection have been conducted at the University of New Mexico (S. Forrest and S. A. Hofmeyr), Purdue University (T. Lane and C. E. Brodley), Reliable Software Technologies (A. K. Ghosh, A. Schwartzbard, and M. Schatz), the University of Minnesota (V. Kumar, P. Dokas, L. Ertoz, and A. Lazarevic), Columbia University (S. Stolfo and E. Eskin), North Carolina State University (W. Lee), the Florida Institute of Technology (P. Chan and M. Mahoney), George Mason University (S. Jajodia, D. Barbara, and N. Wu), and Arizona State University (Nong Ye). Of course, this list is not exhaustive.
It is quite intriguing that both Bagging and Boosting worked quite badly with the DARPA data set. We use the classification method to solve an artificial classification problem to which we have reduced the original outlier detection problem. This reduced classification problem tends to be highly noisy because the artificial examples lie in the background of the real ones. As is known from the literature, Boosting tends to work poorly in the presence of high noise because it puts too much weight on the incorrectly labeled examples.
Statistical approaches

Statistical approaches are mostly based on modeling the data by its statistical properties and using this information to estimate whether a test sample comes from the same distribution or not. The simplest approach constructs a density function for data of a known class and then, assuming the data to be normally distributed, computes the probability that a test sample belongs to that class. The probability estimate can be thresholded to signal an intrusion. Two main approaches exist for the estimation of the probability density function: parametric and non-parametric methods. The parametric approach assumes that the data comes from a family of known distributions, such as the normal distribution, and certain parameters are calculated to fit this distribution. However, in most real-world situations the underlying distribution of the data is not known, and hence such techniques have limited practical importance. In non-parametric methods the overall form of the density function is derived from the data, as are the parameters of the model. As a result, non-parametric methods give greater flexibility in general systems.
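As an illustration of this thresholding idea, the following minimal Python sketch (not part of the thesis experiments; the data, threshold and library choice are illustrative assumptions) fits a parametric Gaussian to known-normal scores and flags low-likelihood samples as intrusive:

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
normal_scores = rng.normal(loc=0.0, scale=1.0, size=1000)    # known-normal traffic scores
mu, sigma = normal_scores.mean(), normal_scores.std(ddof=1)  # parametric (Gaussian) fit

def is_intrusion(x, threshold=1e-3):
    # Flag x when its density under the fitted normal-class model is too low.
    return norm.pdf(x, loc=mu, scale=sigma) < threshold

print(is_intrusion(0.1))   # dense region of the normal model -> False
print(is_intrusion(5.0))   # far tail -> True (signalled as intrusion)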
Neural networks

Some issues for sensor fusion, such as the ability to generalize, the computational expense during training, and the further expense when retraining is needed, are critical for neural networks in comparison to statistical methods. A subjective view supports the use of neural networks for sensor fusion in order to achieve novelty detection in intrusion detection applications. A neural network gains experience by training the system to correctly identify preselected examples of the problem. The back-propagation algorithm can be used in the learning phase to adapt the weights of the neural network. The computational complexity of neural networks has always been an important consideration for practical applications. One important consideration with neural networks is that they cannot be retrained as easily as statistical models. Retraining is done when new class data is to be added to the training set or when the training data no longer reflects the environmental conditions.
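The following hedged sketch illustrates such a trainable fusion unit: a small feed-forward network trained by back-propagation on synthetic binary decisions from three hypothetical IDSs (all data and error rates are assumptions for illustration):

import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(1)
y = rng.integers(0, 2, size=500)                   # ground truth: 0 = normal, 1 = attack
error_rates = [0.10, 0.20, 0.30]                   # assumed per-IDS error rates
# Each simulated IDS flips the true label with its own error rate.
S = np.column_stack([np.where(rng.random(500) < e, 1 - y, y) for e in error_rates])

fusion = MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000, random_state=1)
fusion.fit(S, y)                                   # weights adapted by back-propagation
print(fusion.predict([[1, 1, 0]]))                 # fused decision for one decision vector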
Support Vector Machines

The support vector machine (SVM) is a supervised classification system that minimizes an upper bound on its expected error. It attempts to find the hyperplane separating two classes of data that will generalize best to future data. Such a hyperplane is the so-called maximum margin hyperplane, which maximizes the distance to the closest points from each class. Generally, SVMs work well even when the number of features is orders of magnitude larger than the number of available training samples. They also sidestep the curse of dimensionality: they generalize well to unseen data and they are efficient, as they avoid the explicit use of higher-dimensional spaces.
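Analogously, a maximum-margin SVM can itself serve as the fusion rule; a brief sketch, reusing the synthetic S and y assumed in the previous example:

from sklearn.svm import SVC

svm_fusion = SVC(kernel="linear", C=1.0)           # maximum-margin separating hyperplane
svm_fusion.fit(S, y)                               # S, y: synthetic data from the sketch above
# Signed distance from the hyperplane, usable as a rough confidence measure.
print(svm_fusion.decision_function([[1, 1, 0]]))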

Bayesian classifiers

A Bayes estimator is an estimator or decision rule that maximizes the posterior expected value of a utility function or, equivalently, minimizes the posterior expected value of a loss function. An estimator that minimizes the posterior expected loss, for example the mean squared error, also minimizes the Bayes risk and is therefore a Bayes estimator.
Decision tree and Naive Bayes

Decision trees (D-trees) dominate SVMs, which in turn dominate NB, in both precision and recall. However, D-trees show a much larger fluctuation in accuracy in the initial stages. This is to be expected because decision trees are known to be unstable classifiers. SVMs are better in the initial stages of active learning when the training data is small, but they lose out later. SVMs are known to excel in accuracy, but the uncertainty value measured as the distance from the SVM separator is perhaps not too meaningful. D-trees turn out to be better in the combined metric.
An intuitive method for measuring uncertainty for separator-based classifiers like SVMs is to make it inversely proportional to the distance of the instance from the separator. Similarly, for Bayesian classifiers, the posterior probabilities of the classes can be used as an estimate of certainty. For decision trees, uncertainty is typically derived from the error of the leaf into which the instance falls.
NB tree

The complementary behavior of NB and D-trees has given rise to their hybrid, which outperforms most of the earlier methods for the intrusion detection application.
4.2.2 Evidence Theory
The Dempster-Shafer (DS) method is a powerful tool that can deal with subjective hypotheses as evidence as well as with statistical data combination. The DS method does not require, as Bayesian methods do, that the sensor set be predefined and that the sensors' joint observation probability distribution be known beforehand. The DS rule corresponds to a conjunction operator: it builds the belief induced by accepting two pieces of evidence, i.e., by accepting their conjunction. Shafer developed the DS theory of evidence based on the model that all the hypotheses in the Frame of Discernment (FoD) are exclusive and that the frame is exhaustive, which is true of the decisions of multiple IDSs that are to be fused.
The DS method infers the true state of the system without having an explicit model of the system. It is based only on observations that can be considered as hints (with some uncertainty) towards some system states. DS theory makes the distinction between uncertainty and ignorance, so it is a very useful way to reason under uncertainty based on incomplete and possibly contradictory information extracted from a stochastic environment.
4.2.3 Kalman filter
The Kalman filter is an efficient recursive filter that estimates the state of a dynamic system from a series of noisy measurements. It is a linear system in which the mean squared error between the desired output and the actual output is minimized when the input is a random signal generated by white noise.
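A minimal scalar Kalman filter sketch, assuming illustrative process- and measurement-noise values, shows the recursive predict/update cycle on a noisy detection score:

import numpy as np

def kalman_1d(measurements, q=1e-4, r=0.05):
    # q: process-noise variance, r: measurement-noise variance (both assumed).
    x, p = 0.0, 1.0                      # initial state estimate and its variance
    for z in measurements:
        p = p + q                        # predict: variance grows by process noise
        k = p / (p + r)                  # Kalman gain
        x = x + k * (z - x)              # update: blend prediction with measurement
        p = (1 - k) * p
    return x

noisy = 0.8 + 0.2 * np.random.default_rng(2).standard_normal(50)
print(kalman_1d(noisy))                  # settles near the true underlying score 0.8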
4.2.4 Bayesian network
A Bayesian network or belief network is a probabilistic graphical model that represents a set of variables and their probabilistic independencies. Formally, Bayesian networks are directed acyclic graphs whose nodes represent variables and whose missing edges encode conditional independencies between the variables. Nodes can represent any kind of variable. Efficient algorithms exist that perform inference and learning in Bayesian networks.

4.3 Related work in sensor fusion


Blum [81] suggests that analytical studies of fusion performance can augment existing experimental studies by addressing some aspects that are difficult to study using experimental methods. In his work, an estimation theory approach is employed, using a mathematical model based on the observation that each different sensor can provide a different quality when viewing a given object in the scene.
Dasarathy [82] considers a generalized input-output (I/O) descriptor pair based
characterization of the sensor fusion process. The fusion system design philosophy expounded in his work is that an exhaustive exploitation of the sensor
fusion potential should explore fusion under all of the different I/O-based fusion
modes conceivable under such a characterization. Fusion system architectures
designed to permit such exploitation offer the requisite flexibility for developing the most effective fusion system designs for a given application.
Cohen et al. [83] present a method for evaluating sensor fusion algorithms
based on a quantitative comparison, which is independent of the data acquired
and the sensors used. The sensor fusion performance measures and performance
analysis procedure provide a basis for modeling, analyzing, experimenting, and
comparing different sensor fusion algorithms. The statistical analysis provides
a systematic method for comparing sensor fusion algorithms. Quantitative procedures are developed to ensure that specific environmental conditions do not
influence the evaluation.
Iyengar and Brooks in their book [77] comment that understanding multi-sensor fusion helps in achieving the most sophisticated way to deliver accurate real-world data to computer systems. Li et al. [84] compare three different fusion rules for an arbitrary number of sensors, with complete, incomplete or no prior information about the estimate. Krogh and Vedelsby [85] prove that at a single data point the quadratic error of the ensemble estimator is guaranteed to be less than or equal to the average quadratic error of the component estimators. The qualitative benefits of sensor fusion, like the increased confidence of detection, improved system reliability and reduced ambiguity of inferences, result in operational advantages. The uncertainty of the estimated detection on sensor fusion is also expected to be smaller than the uncertainty of any individual detection alone. Thus the qualitative benefits in turn give a detailed description of the improved performance with sensor fusion.


Hall and McMullen [86] state that if the tactical rules of detection require that a particular certainty threshold be exceeded for attack detection, then the fused decision provides a detection up to 25% greater than the detection at which any individual IDS alone exceeds the threshold. This added detection equates to increased tactical options and to an improved probability of true negatives [86]. Another attempt to illustrate the quantitative benefit of sensor fusion is provided by Nahin and Pokoski [87]. Their work demonstrates the benefits of multisensor fusion, and their results also provide some conceptual rules of thumb.
Chair and Varshney [35] present an optimal data fusion structure for a distributed sensor network, which minimizes the cumulative average risk. The structure weights the individual decisions depending on the reliability of the sensor. The weights are functions of the probability of false alarm and the probability of detection. The maximum a posteriori (MAP) test or the Likelihood Ratio (L-R) test requires either exact knowledge of the a priori probabilities of the tested hypotheses or the assumption that all the hypotheses are equally likely. This limitation is overcome in the work of Thomopoulos et al. [88], who use the Neyman-Pearson test to derive an optimal decision fusion rule. Baek and Bommareddy [89] present optimal decision rules for problems involving n distributed sensors and m target classes.
Aalo and Viswanathan [90] perform numerical simulations of the correlation problem to study the effect of error correlation on the performance of a distributed detection system. The system performance is shown to deteriorate when the correlation between the sensor errors is positive and increasing, while the performance improves considerably when the correlation is negative and increasing. Drakopoulos and Lee [91] derive an optimum fusion rule for the Neyman-Pearson criterion, and use simulation to study its performance for a specific type of correlation matrix. Kam et al. [92] consider the case in which the class-conditioned sensor-to-sensor correlation coefficients are known, and express the result in compact form. Their approach is a generalization of the
method adopted by Chair and Varshney [35] for solving the data fusion problem for fixed binary local detectors with statistically independent decisions. Kam et al. [92] use the Bahadur-Lazarsfeld expansion of the probability density functions. Blum and Kassam [93] study the problem of locally most powerful detection for correlated local decisions. Their approach to optimal data fusion for individual decisions that are correlated is in terms of the conditional correlation coefficients of all orders.

4.4 Related work using sensor fusion in intrusion detection applications
Siaterlis and Maglaris [79] present the use of data fusion in the field of DoS anomaly detection. The Dempster-Shafer theory of evidence is used as the mathematical foundation for the development of a novel DoS detection engine. The detection engine is evaluated using real network traffic. Tim Bass [94] presents a framework to improve the performance of intrusion detection systems based on data fusion. A few first steps towards developing the engineering requirements, using the art and science of multi-sensor data fusion as an underlying model, are provided in his work. Giacinto et al. [95] propose an approach to intrusion detection based on the fusion of multiple classifiers. With each member of the classifier ensemble trained on a distinct feature representation of patterns, the individual results are combined using a number of fixed and trainable fusion rules.
Didaci et al. [28] formulate the intrusion detection problem as a pattern recognition task using a data fusion approach based on multiple classifiers. Their work confirms that the combination reduces the overall error rate, but may also reduce the generalization capabilities. Wang et al. [96] bring out the superiority of data fusion technology applied to intrusion detection systems. Their method collects information from network and host agents and applies the Dempster-Shafer theory of evidence. Another work incorporating the Dempster-Shafer theory of evidence is by Hu et al. [97]. The Dempster-Shafer theory of evidence in data fusion is observed to solve the problem of how to analyze uncertainty in a quantitative way. In the
evaluation, the ingoing and outgoing traffic ratio and the service rate are selected as the detection metrics, and prior knowledge of the DDoS domain is used to assign probabilities to the evidence.
Siraj et al. [98] discuss a Decision Engine for an Intelligent Intrusion Detection System (IIDS) that fuses information from different intrusion detection sensors using an artificial intelligence technique. The Decision Engine uses Fuzzy Cognitive Maps (FCMs) and fuzzy rule-bases for causal knowledge acquisition and to support the causal knowledge reasoning process. Thomopoulos in one of his works [88] concludes that, with the individual sensors being independent, the optimal decision scheme that maximizes the probability of detection at the fusion unit for a fixed false alarm probability consists of a Neyman-Pearson test at the fusion unit and likelihood ratio tests at the sensors. Lee et al. [51] note that the best way to make intrusion detection models adaptive is to combine existing models with new models trained on new intrusion data or new normal data. In that work, they combined rule sets that were inductively generated on separate days to produce a more accurate composite rule set.
Other, more distantly related works are the alarm clustering method of Perdisci et al. [99], the aggregation of alerts by Valdes et al. [100], the combination of alerts into scenarios by Dain et al. [101], the alert correlation by Cuppens et al. [102], the correlation of intrusion symptoms with an application of chronicles by Morin et al. [103], and the aggregation and correlation of intrusion-detection alerts by Debar et al. [104]. The correlation of alerts mainly groups alerts that are part of the same attack trend and hence completely avoids duplicate alerts. The aggregation of alerts uses certain criteria to aggregate severity levels, reveal trends, and clarify attackers' intentions. The work of Valeur et al. [105] presents a general correlation model that includes a comprehensive set of components and a framework based on this model. These works address the issue of efficiently managing the large number of alerts by providing a unified description of the alerts from individual IDSs.
Considering the literature on the various sensor fusion techniques used for intrusion detection applications, it is seen that many machine learning algorithms do not handle skewed data sets well. To counter the effect of data skewness, either downsampling of the normal events or upsampling of the attack events is normally done. Sampling the normal set might reduce the information content and present only a subset of all available normal events, in turn leading to false positives being reported by the system. This again establishes that sensor fusion is the only promising approach for the performance enhancement of IDSs. The mathematical basis of sensor fusion is attempted in the remaining sections of this chapter.

4.5 Theoretical formulation


Sensor fusion can be defined as the process of collecting information from multiple and possibly heterogeneous sources and combining it to obtain a more descriptive, intuitive and meaningful result [79]. The choice of when to perform the fusion depends on the types of sensor data available and the types of preprocessing performed by the sensors. The fusion can occur at various levels:
1. raw data level, prior to feature extraction,
2. feature vector level, prior to identity declaration,
3. decision level, after each sensor has made an independent declaration of identity.
In data-level fusion, data from individual sensors are fused directly, with subsequent feature extraction and identity declaration from the fused data. To perform data-level fusion, the sensors must either be identical or commensurate. Association is performed on the raw data to ensure that the data being fused relate to the same object. The identification process then proceeds identically to the process for a single sensor. In feature-level fusion, each sensor observes an object and feature extraction is performed. The result is a separate feature vector representing the object from each sensor. An association process must then be used to sort the feature vectors into meaningful groups. These feature vectors are fused and an identity declaration is made based on the joint feature vector. In decision-level fusion, each sensor performs feature extraction to obtain an independent declaration of identity. Association is then performed to partition the identity declarations into groups representing observations belonging to the same observed entity. The associated declarations of identity from each sensor are subsequently fused; a minimal sketch of this last case is given below.
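The simplest possible instance of decision-level fusion is a majority vote over the associated binary declarations, as in the following illustrative sketch (not the fusion rule used later in this work):

def decision_level_fusion(decisions):
    # decisions: associated 0/1 declarations, one per IDS, for one connection.
    return int(sum(decisions) > len(decisions) / 2)

print(decision_level_fusion([1, 0, 1]))  # two of three IDSs alert -> fused alert (1)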
Sensor fusion is expected to result in both qualitative and quantitative benefits for the intrusion detection application. The primary aim of sensor fusion is to detect the intrusion and to make reliable inferences, which may not be possible with a single sensor alone. The particular quantitative improvement in estimation that results from using multiple IDSs depends on the performance of the specific IDSs involved. The fused estimate thus takes advantage of the relative strengths of each IDS, resulting in an improved estimate of the intrusion detection. Error analysis techniques also provide a means for determining the specific quantitative benefits of sensor fusion in the case of intrusion detection. These quantitative benefits reveal phenomena that are systematic rather than mere chance occurrences.
Consider a single detector decision with multiple error sources. The overall error estimate is given as:

\[ e_{est} = \Big( \sum \text{Component Errors}^2 \Big)^{1/2} \qquad (4.1) \]

The overall error estimate of a single detector, as given in equation 4.1, is (reasonably) larger than the largest single error source, and it is often dominated by the largest one. On the contrary, when multiple detector decisions are made of the same observation with different detectors, their individual contributions are weighted by the reciprocals of the squares of their individual error estimates, so that the overall error estimate is given as:

\[ e_{est} = \Big( \sum \frac{1}{\text{Component Errors}^2} \Big)^{-1/2} \qquad (4.2) \]

The overall error estimate of the fused sensor, as given in equation 4.2, is (reasonably) smaller than the smallest individual error estimate, and it is often dominated by the smallest one. The reduction in the overall error estimate clarifies to an extent the need for sensor fusion. While none of these are compelling arguments for the benefits of multi-sensor fusion, they do illustrate that there are both qualitative and quantitative benefits to be derived from sensor fusion.
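A short numeric sketch, assuming the reconstructed forms of equations 4.1 and 4.2 above, makes the contrast concrete:

import numpy as np

errors = np.array([0.5, 0.8, 1.2])             # per-source error estimates (assumed)
single = np.sqrt(np.sum(errors**2))            # eq. 4.1: one detector, several error sources
fused = np.sum(1.0 / errors**2) ** -0.5        # eq. 4.2: inverse-variance weighted fusion
print(single, fused)                           # single > max(errors); fused < min(errors)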

In this section, an attempt is made to study the performance of the theoretically best fusion approach using mathematical analysis. The motivation is that before any empirical evaluation is attempted, it is necessary to establish the acceptability of sensor fusion for the performance enhancement of IDSs. The analytical evaluation can be extremely useful, as it addresses the problem completely with sound mathematical and logical concepts. The mathematical analysis of decision fusion thus develops a rational basis that is independent of the particular fusion techniques used. This is later augmented with the empirical evaluation.
A system of n sensors $IDS_1, IDS_2, ..., IDS_n$ is considered, corresponding to an observation with parameter x, $x \in \mathbb{R}^m$. The sensor $IDS_i$ yields an output $s_i$, $s_i \in \mathbb{R}^m$, according to an unknown probability distribution $p_i$. The decisions of the individual IDSs that take part in the fusion are expected to depend on the input, and hence the output of $IDS_i$ in response to the input $x_j$ can be written more specifically as $s_i^j$. A successful operation of a multiple sensor system critically depends on the methods that combine the outputs of the sensors, where the errors introduced by the various individual sensors are unknown and not controllable. With such a fusion system available, the fusion rule for the system has to be obtained. The problem is to estimate a fusion rule $f : \mathbb{R}^{nm} \rightarrow \mathbb{R}^m$, independent of the sample or the individual detectors that take part in the fusion, such that the expected square error is minimized over a family of fusion rules.
To perform the theoretical analysis, it is necessary to model the process under consideration. Consider a simple fusion architecture, as given in Fig. 4.1, with n individual IDSs combined by means of a fusion unit. To start with, consider a two-dimensional problem with the detectors responding in a binary manner. Each local detector collects an observation $x_j \in \mathbb{R}^m$ and transforms it to a local decision $s_i^j \in \{0, 1\}$, $i = 1, 2, ..., n$, where the decision is 0 when the traffic is detected as normal and 1 otherwise. Thus $s_i^j$ is the response of the $i$th detector to a network connection belonging to class $j \in \{0, 1\}$, where the classes correspond to normal traffic and attack traffic respectively. These local decisions $s_i^j$ are fed to the fusion unit to produce a unanimous decision $s^j$, which is supposed to
minimize the overall cost of misclassification and improve the overall detection rate.

[Figure 4.1: Fusion architecture with decisions from n IDSs. The input x is observed by IDS_1, ..., IDS_n, whose decisions S_1, ..., S_n are combined by the fusion unit into the output y.]

The fundamental problem of network intrusion detection can be viewed
as a detection task: to decide whether a network connection x is a normal one or an attack. Assume a set of unknown features $e = \{e_1, e_2, ..., e_m\}$ that are used to characterize the network traffic. The feature extractor is given by $e_f(x) \subseteq e$. It is assumed that this observed variable has a deterministic component and a random component, and that their relation is additive. The deterministic component is due to the fact that the class is discrete in nature, i.e., during detection it is known that the connection is either normal or an attack. The imprecise component is due to some random processes, which in turn affect the quality of the extracted features. Indeed, it has a distribution governed by the extracted feature set, often in a nonlinear way. By ignoring the source of distortion in the extracted network features $e_e(x)$, it is assumed that the noise component is random (while in fact it may not be, if it were possible to systematically incorporate all possible variations into the base-expert model).
In a statistical framework, the probability that x is identified as normal or as attack after a detector observes the network connection can be written as:

\[ s_i = s_i(e_f(x), \theta_i) \qquad (4.3) \]

where x is the sniffed network traffic, $e_f$ is the feature extractor, and $\theta_i$ is a set of parameters associated with the sensor indexed i. There exist several types of intrusion detectors, all of which can be represented by equation 4.3. Sensor fusion results in the combination of data from sensors competent on partially overlapping frames. The output of a fusion system is characterized by a variable s, which is a function of the uncertain variables $s_1, ..., s_n$, being the outputs of the
individual IDSs, and given as:

\[ s = f(s_1, ..., s_n) \qquad (4.4) \]

where $f(\cdot)$ corresponds to the fusion function. The variables $s_1, ..., s_n$ (independent in the sense that information about any group of variables does not change the belief about the others) are imprecise and dependent on the class of the observation, and hence given as:

\[ s^j = f(s_1^j, ..., s_n^j) \qquad (4.5) \]

where j refers to the class of the observation.
The variance of the IDSs determines the average quality when each IDS acts individually; lower variance corresponds to better performance. The covariance among detectors measures the dependence of the detectors: the greater the dependence, the smaller the gain obtained from fusion. Let us consider two cases. In the first case, for each access, n responses are available and are used independently of each other. The average of the variances of $s_i^j$ over all $i = 1, 2, ..., n$, denoted $(\sigma_{av}^j)^2$, is given as:

\[ (\sigma_{av}^j)^2 = \frac{1}{n} \sum_{i=1}^{n} (\sigma_i^j)^2 \qquad (4.6) \]
In the second case, all n responses are used together and are combined using the mean operator; the variance over many accesses, denoted $(\sigma_{fusion}^j)^2$ and called the variance of the average, can be calculated as follows:

\[ (\sigma_{fusion}^j)^2 = \frac{1}{n^2} \sum_{i=1}^{n} (\sigma_i^j)^2 + \frac{1}{n^2} \sum_{i=1}^{n} \sum_{k=1, k \neq i}^{n} \rho_{i,k}^j \, \sigma_i^j \sigma_k^j = \frac{1}{n} (\sigma_{av}^j)^2 + \frac{1}{n^2} \sum_{i=1}^{n} \sum_{k=1, k \neq i}^{n} \rho_{i,k}^j \, \sigma_i^j \sigma_k^j \qquad (4.7) \]

where $\rho_{i,k}^j$ is the correlation coefficient between the ith and kth detectors, with j taking the different class values. The first term is the average variance of the base-experts, while the second term is the covariance between the ith and kth detectors for $i \neq k$; the term $\rho_{i,k}^j \sigma_i^j \sigma_k^j$ is by definition the covariance. On analysis, it is seen that:

\[ (\sigma_{fusion}^j)^2 \leq (\sigma_{av}^j)^2 \qquad (4.8) \]

It can be observed that the resultant variance of the final score is reduced with respect to the average variance of the two original scores when two detector scores are merged by a simple mean operator. Since $0 \leq \rho_{i,k}^j \leq 1$,

\[ \frac{1}{n} (\sigma_{av}^j)^2 \leq (\sigma_{fusion}^j)^2 \qquad (4.9) \]

The two equations 4.8 and 4.9 give the upper and lower bounds of $(\sigma_{fusion}^j)^2$, attained with full correlation and no correlation respectively. Any positive correlation results in a variance between these bounds. Hence, by combining responses using the mean operator, the resultant variance is assured to be smaller than the average (not the minimum) variance. Fusion of the scores reduces the variance, which in turn results in a reduction of the error (with respect to the case where the scores are used separately). To measure explicitly the factor of reduction in variance, since

\[ \frac{1}{n} (\sigma_{av}^j)^2 \leq (\sigma_{fusion}^j)^2 \leq (\sigma_{av}^j)^2, \]

the factor of reduction in variance is

\[ v_r = \frac{(\sigma_{av}^j)^2}{(\sigma_{fusion}^j)^2}, \qquad 1 \leq v_r \leq n \qquad (4.10) \]

This clearly indicates that the reduction in variance is greater when more detectors are used, i.e., the larger n is, the better the combined system, even if the hypotheses of the underlying IDSs are correlated. This comes at the cost of increased computation, proportional to the value of n. The reduction in the variance of the individual classes results in less overlap between the class distributions. Thus the chance of error reduces, which in turn results in improved detection. This forms the argument in this work for why fusion using multiple detectors works for the intrusion detection application. Experimental results provide strong evidence to support this claim, and the simulation sketch below illustrates the variance bounds numerically.
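The following simulation sketch (the correlation value and dimensions are illustrative assumptions) reproduces the bounds of equations 4.8-4.10 empirically:

import numpy as np

n, rho, sigma2 = 5, 0.3, 1.0                   # assumed detector count and correlation
cov = np.full((n, n), rho * sigma2) + np.diag([sigma2 * (1 - rho)] * n)
rng = np.random.default_rng(3)
scores = rng.multivariate_normal(np.zeros(n), cov, size=100_000)

var_avg = scores.var(axis=0).mean()            # average per-detector variance
var_fused = scores.mean(axis=1).var()          # variance of the mean operator
print(var_avg / var_fused)                     # reduction factor: 1 <= v_r <= n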
The following common possibilities encountered on combining two detectors are analyzed:
1. combining two uncorrelated experts with very different performances;
2. combining two highly correlated experts with very different performances;
3. combining two uncorrelated experts with very similar performances;
4. combining two highly correlated experts with very similar performances.
Fusing IDSs of similar and of different performances is encountered in almost all practical fusion problems. Considering the first case, without loss of generality it can be assumed that system 1 is better than system 2, i.e., $\sigma_1 < \sigma_2$, with $\rho = 0$. Hence, for the combination to be better than the best system, i.e., system 1, it is required that

\[ (\sigma_{fusion}^j)^2 < (\sigma_1^j)^2 \]
\[ \frac{(\sigma_1^j)^2 + (\sigma_2^j)^2 + 2\rho \, \sigma_1^j \sigma_2^j}{4} < (\sigma_1^j)^2 \]
\[ (\sigma_2^j)^2 < 3(\sigma_1^j)^2 - 2\rho \, \sigma_1^j \sigma_2^j \]

The covariance is zero in general for cases 1 and 3. Hence, the combined system will benefit from the fusion when the variance of one system, $(\sigma_2^j)^2$, is less than 3 times the variance of the other, $(\sigma_1^j)^2$, since $\rho = 0$. Furthermore, correlation (or equivalently covariance, one being proportional to the other) between the two systems penalizes this margin of $3(\sigma_1^j)^2$. This is particularly true for the second case, since $\rho > 0$. Also, it should be noted that $\rho < 0$ (which implies negative correlation) would allow a larger $(\sigma_2^j)^2$. As a result, adding another system that is negatively correlated, but with large variance (hence large error), will improve fusion ($(\sigma_{fusion}^j)^2 < \frac{1}{n}(\sigma_{av}^j)^2$). Unfortunately, with intrusion detection systems, two systems are either positively correlated or uncorrelated, unless these systems are jointly trained by algorithms such as negative correlation learning [80]. For a given detector i, the outputs $s_i$, $i = 1, ..., n$, will tend to agree with each other (hence positive correlation) more often than to disagree with each other (hence negative correlation). By fusing scores obtained from IDSs that are trained independently, one can almost be certain that $0 \leq \rho_{i,k} \leq 1$. For the third and fourth cases, we have $(\sigma_1^j)^2 \approx (\sigma_2^j)^2$, and the condition $(\sigma_2^j)^2 < 3(\sigma_1^j)^2 - 2\rho \, \sigma_1^j \sigma_2^j$ reduces to $\rho < 1$. Note that for the third case, with $\rho \approx 0$, the constraint is satisfied. Therefore, fusion will definitely lead to better performance. On the other hand, for the fourth case, where $\rho \approx 1$, fusion may not necessarily lead to better performance.
From the above analysis using the mean operator as the fusion rule, the conclusions drawn are the following. The analysis shows that fusing two systems of different performances is not always beneficial: if the weaker IDS has a (class-dependent) variance more than three times that of the best IDS, the gain due to fusion breaks down. This is all the more true for correlated base-experts, as correlation penalizes this limit further. It is also seen that fusing two uncorrelated IDSs of similar performance always results in improved performance. Finally, fusing two correlated IDSs of similar performance is beneficial only when the covariance of the two IDSs is less than their variance. It remains to show that a lower bound on accuracy results in the case of sensor fusion. This can be shown as follows.
Given the fused output $s = \sum_i w_i s_i$, the quadratic error of a sensor indexed i, $e_i$, and of the fused sensor, $e_{fusion}$, are given by:

\[ e_i = (s_i - c)^2 \qquad (4.11) \]

and

\[ e_{fusion} = (s_{fusion} - c)^2 \qquad (4.12) \]

respectively, where $w_i$ is the weight on the ith detector and c is the target. The ambiguity of the sensor is defined as:

\[ a_i = (s_i - s)^2 \qquad (4.13) \]

The squared error of the fused sensor is seen to equal the weighted average squared error of the individual sensors, minus a term which measures the average correlation. This allows for non-uniform weights (with the constraint $\sum_i w_i = 1$), so that the general form of the ensemble output is $s = \sum_i w_i s_i$. The ambiguity of the fused sensor is given as:

\[
\begin{aligned}
a_{fusion} &= \sum_i w_i a_i = \sum_i w_i (s_i - s)^2 \\
&= \sum_i w_i (s_i - c + c - s)^2 \\
&= \sum_i w_i \big( (s_i - c) - (s - c) \big)^2 \\
&= \sum_i w_i \big( (s_i - c)^2 - 2(s_i - c)(s - c) + (s - c)^2 \big) \\
&= \sum_i w_i e_i - e_{fusion}
\end{aligned} \qquad (4.14)
\]

On rearranging equation 4.14, the error due to the combination of several detectors is obtained as the difference between the weighted average error of the individual detectors and the ambiguity among the fusion member decisions:

\[ e_{fusion} = \sum_i w_i (s_i - c)^2 - \sum_i w_i (s_i - s)^2 \qquad (4.15) \]
The ambiguity among the fusion member decisions is always non-negative, and hence the combination of several detectors is expected to be better than the weighted average over the individual detectors. This result turns out to be very important for the focus of this work; a numerical check is sketched below.
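A numeric check of equation 4.15 on hypothetical scores and weights (values assumed for illustration):

import numpy as np

s = np.array([0.9, 0.6, 0.4])      # hypothetical detector scores
w = np.array([0.5, 0.3, 0.2])      # weights summing to one
c = 1.0                            # target (attack label)

s_bar = np.dot(w, s)               # fused (ensemble) output
lhs = (s_bar - c) ** 2                                        # e_fusion
rhs = np.dot(w, (s - c) ** 2) - np.dot(w, (s - s_bar) ** 2)   # weighted error - ambiguity
print(lhs, rhs)                    # identical up to floating-point rounding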

4.6 Solution approaches


In the case of the fusion problem, the solution approaches depend on whether there is any knowledge regarding the traffic and the intrusion detectors. This section initially assumes no knowledge of the IDSs or of the intrusion detection data, and later assumes knowledge of the available IDSs and the evaluation data set. There is an arsenal of different theories of uncertainty, and of methods based on these theories, for making decisions under uncertainty. There is no consensus as to which method is most suitable for problems with epistemic uncertainty, where information is scarce and imprecise. The choice of heterogeneous detectors is expected to result in decisions that conflict or are in consensus, completely or partially. The detectors can be categorized by their output $s_i$: a probability (in the range [0, 1]), a Basic Probability Assignment (BPA) m (in the range [0, 1]), a membership function (in the range [0, 1]), a distance metric (greater than or equal to zero), or a log-likelihood ratio (a real number).
Consider a body of evidence (F, m), where F represents the set of all focal elements and m their corresponding basic probability assignments. The preceding analysis, conducted without any knowledge about the system or the data in order to establish the acceptability of sensor fusion for improving intrusion detection performance, is unlimited in scope. With such an analysis favoring the use of sensor fusion for enhancing the performance of IDSs, the Dempster-Shafer fusion operator is used here, since it is well suited to intrusion detection applications. Dempster-Shafer theory considers two types of uncertainty: 1) that due to imprecision and 2) that due to conflict in the evidence. Non-specificity and strife measure the uncertainty due to imprecision and conflict, respectively. The larger the focal elements of a body of evidence, the more imprecise the evidence and, consequently, the higher the non-specificity. When the evidence is precise (all the focal elements consist of a single member), the non-specificity is zero. In the challenge problems, the broader the interval given by the experts, the higher the non-specificity. Strife measures the degree to which pieces of evidence contradict each other. Consonant (nested) focal elements imply little or no conflict; disjoint elements imply high conflict in the evidence. For example, if the experts' intervals are disjoint, the experts contradict each other, and strife is large. For finite sets, when the evidence is precise, strife reduces to Shannon's entropy, which measures conflict in probability theory. Non-specificity measures the epistemic (reducible) uncertainty, the uncertainty associated with the sizes (cardinalities) of the relevant sets of alternatives.
It is required to model the uncertainty in the independent variables, derive a model of the uncertainty in the performance variable s, and assess the performance enhancement of the fusion system. This is attempted in the next section. The importance of Dempster-Shafer theory in intrusion detection is that, in order to track statistics, it is necessary to model the distribution of decisions. If these decisions are probabilistic assignments over the set of labels, then the distribution function will be too complicated to retain precisely. The Dempster-Shafer theory of evidence solves this problem by simplifying the opinions to Boolean decisions, so that each detector decision lies in a space having $2^{|\Theta|}$ elements, where $\Theta$ defines the working space or the Frame of Discernment (FoD). In this way, the full set of statistics can be specified using $2^{|\Theta|}$ values.
4.6.1 Dempster-Shafer combination method
Dempster-Shafer (DS) theory is required to model situations in which a classification algorithm cannot classify a target or cannot exhaustively list all of the classes to which it could belong. This is most applicable in the case of unknown or novel attacks, or the case of zero a priori knowledge of the data distribution. DS theory does not attempt to formalize the emergence of novelties, but it is a suitable framework for reconstructing the formation of beliefs when novelties appear. An application of decision making in the field of intrusion detection illustrates the potential of DS theory, as well as its shortcomings.
The DS rule corresponds to a conjunction operator, since it builds the belief induced by accepting two pieces of evidence, i.e., by accepting their conjunction. Shafer developed the DS theory of evidence based on the model that all the hypotheses in the FoD are exclusive and the frame is exhaustive. The purpose is to combine/aggregate several independent and equally reliable sources of evidence expressing their belief on the set. The Dempster-Shafer theory is a mathematical theory of evidence based on belief functions and plausible reasoning, which is used to combine separate pieces of information (evidence) to calculate the probability of an event. Fusion should result in a reasoned judgment on the decisions and not merely a response that aggregates them. The aim of using the DS theory of fusion is that, with any set of decisions from heterogeneous detectors, sensor fusion can be modeled as utility maximization.
DS theory of combination conceives novel categories that classify empirical evidence in a novel way and, possibly, are better able to discriminate the relevant aspects of emergent phenomena. Novel categories detect novel empirical evidence that may be fragmentary, irrelevant, contradictory, or supportive of particular hypotheses. The DS approach for quantifying the uncertainty in the performance of a detector, and for assessing the improvement in system performance, consists of three steps:
1. Model the uncertainty by considering each variable separately; then derive a model that considers all the variables together.
2. Propagate the uncertainty through the system, which results in a model of the uncertainty in the performance of the system.
3. Assess the system performance enhancement.
In the case of Dempster-Shafer theory, the FoD is expected to contain all propositions of which the information sources (IDSs) can provide evidence. When a proposition corresponds to a subset of the frame of discernment, it is said that the frame discerns that proposition. The elements of the frame of discernment are assumed to be exclusive propositions. This constraint is always satisfied in the intrusion detection application because of the discrete nature of the detector decisions. The belief in the likelihood of the traffic being in an anomalous state is expressed by the various IDSs by assigning mass to subsets of the FoD.
The DS theory is a generalization of classical probability theory, with the additivity axiom excluded or modified. The probability mass function (p) is a mapping which indicates how the probability mass is assigned to the elements. The Basic Probability Assignment (BPA) function (m), on the other hand, is a set mapping, and the two can be related, $\forall A \subseteq \Theta$, as $m(A) = \sum_{B \subseteq A} p(B)$; hence m(A) relates to a belief structure. The mass m is very close to the probabilistic mass p, except that it is shared not only by the single hypotheses but also by the unions of hypotheses. In DS theory, rather than knowing exactly how the probability is distributed over each element $B \in \Theta$, we only know by the BPA function m that a certain quantity of probability mass is somehow divided among the focal elements. Because of this less specific knowledge about the allocation of the probability mass, it is difficult to assign exactly the

probability associated with the subsets of the FoD; instead we assign two measures: (1) the belief (Bel) and (2) the plausibility (Pl), which correspond to the lower and upper bounds on the probability, i.e.,

\[ Bel(A) \leq p(A) \leq Pl(A) \]

where the belief function, Bel(A), measures the minimum uncertainty value about proposition A, and the plausibility, Pl(A), reflects the maximum uncertainty value about proposition A.
The following are the key assumptions made in the fusion of intrusion detectors:
. If some of the detectors are imprecise, the uncertainty about an event can be quantified by the maximum and minimum probabilities of that event. The maximum (minimum) probability of an event is the maximum (minimum) of all probabilities that are consistent with the available evidence.
. The process of asking an IDS about an uncertain variable is a random experiment whose outcome can be precise or imprecise. There is randomness because every time a different IDS observes the variable, a different decision can be expected. An IDS can be precise and provide a single value, or imprecise and provide an interval. Therefore, if the information about the uncertainty consists of intervals from multiple IDSs, then there is uncertainty due to both imprecision and randomness.
If all IDSs are precise, they give pieces of evidence pointing precisely to specific values. In this case, a probability distribution of the variable can be built. However, if the IDSs provide intervals, such a probability distribution cannot be built, because it is not known which specific values of the random variables each piece of evidence supports.
In DS theory, the additivity axiom of probability theory, $p(A) + p(\bar{A}) = 1$, is modified as $m(A) + m(\bar{A}) + m(\Theta) = 1$, with uncertainty introduced by the term $m(\Theta)$. Here m(A) is the mass assigned to A, $m(\bar{A})$ is the mass assigned to all the propositions that are not A in the FoD, and $m(\Theta)$ is the mass assigned to the union of all hypotheses when the detector is ignorant. This clearly shows the advantage of evidence theory in handling uncertainty, since the detectors' joint probability distribution is not required.
The equation $Bel(A) + Bel(\bar{A}) = 1$, which is equivalent to $Bel(A) = Pl(A)$, holds for all subsets A of the FoD if and only if Bel's focal elements are all singletons. In this case, Bel is an additive probability distribution. Whether normalized or not, the DS method satisfies the two axioms of combination: $0 \leq m(A) \leq 1$ and $\sum_{A \subseteq \Theta} m(A) = 1$. The third axiom, $m(\emptyset) = 0$, is not satisfied by the unnormalized DS method. Independence of the evidence is yet another requirement of the DS combination method. The problem is formalized as follows. Considering the network traffic, assume a traffic space $\Theta$ which is the union of the different classes, namely attack and normal. The attack class contains different types of attacks, and the classes are assumed to be mutually exclusive. Each IDS assigns to an observed traffic sample x a detection, denoting the traffic sample to come from a class which is an element of the FoD $\Theta$. With n IDSs used for the combination, the decision of each one of the IDSs is considered for the final decision of the fused IDS.
This chapter presents a method to detect unknown traffic attacks with an increased degree of confidence by making use of a fusion system composed of detectors. Each detector observes the same traffic on the network and detects the attack traffic with an uncertainty index. The frame of discernment consists of singletons that are exclusive ($A_i \cap A_j = \emptyset$, $i \neq j$) and exhaustive, since the FoD consists of all the expected attacks which the individual IDSs detect, or else a detector fails to detect an attack by recognizing the traffic as normal. All the constituent IDSs that take part in the fusion are assumed to have a global point of view about the system, rather than being separate detectors introduced to give a specialized opinion about a single hypothesis.
The DS combination rule gives the combined mass of two pieces of evidence $m_1$ and $m_2$ on any subset A of the FoD as m(A), given by:

\[ m(A) = \frac{\displaystyle\sum_{X \cap Y = A} m_1(X) \, m_2(Y)}{1 - \displaystyle\sum_{X \cap Y = \emptyset} m_1(X) \, m_2(Y)} \qquad (4.16) \]

The numerator of the Dempster-Shafer combination equation 4.16 represents the influence of the aspects of the second evidence that confirm the first one. The denominator represents the influence of the aspects of the second evidence that contradict the first one. The denominator of equation 4.16 is $1 - k$, where k is the conflict between the two pieces of evidence. This denominator is for normalization, which spreads the resultant uncertainty of any evidence, with a weight factor, over all focal elements and results in an intuitive decision; i.e., the effect of normalization consists of eliminating the conflicting pieces of information between the two sources to combine, consistently with the intersection operator. The Dempster-Shafer rule does not apply if the two pieces of evidence are completely contradictory; it only makes sense if $k < 1$. If the two pieces of evidence are completely contradictory, they can be handled as one single piece of evidence over alternative possibilities, whose BPA must be re-scaled in order to comply with equation 4.16. The Dempster-Shafer rule says that compatible evidence on a possibility must be evaluated as a fraction of the total compatible evidence. The meaning of the Dempster-Shafer rule 4.16 can be illustrated in the simple case of two pieces of evidence on an observation A. Suppose that one piece of evidence is $m_1(A) = p$, $m_1(\Theta) = 1 - p$ and that another is $m_2(A) = q$, $m_2(\Theta) = 1 - q$. The total evidence in favor of A is then $1 - (1-p)(1-q)$, and the fraction of it supported by both bodies of evidence is $\frac{pq}{1 - (1-p)(1-q)}$.
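As a numerical illustration (values assumed here for exposition, not taken from the evaluation chapters), take p = 0.8 and q = 0.6:

\[ 1 - (1-p)(1-q) = 1 - (0.2)(0.4) = 0.92, \qquad \frac{pq}{1 - (1-p)(1-q)} = \frac{0.48}{0.92} \approx 0.52 \]

so the two IDSs together raise the mass committed to A from 0.8 (the stronger single piece of evidence) to 0.92.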

Specifically, suppose a particular detector indexed i taking part in the fusion has probability of detection $m_i(A)$ for a particular winning class A. It is expected that fusion results in a probability m(A) for that class that is greater than $m_i(A)$, $\forall i$ and $\forall A$. Thus the confidence in detecting a particular winning class is improved, which is the key aim of sensor fusion. Dempster-Shafer theory for sensor fusion thus helps attain an increased value of confidence in detection by means of an increased probability of detection of the individual classes. Note that the Dempster-Shafer rule is independent of the order in which the evidence is combined.
The above analysis is simple, since it considers only one class at a time. The variances of the two classes can be merged, the resultant variance being the sum of the normalized variances of the individual classes; hence the class label can be dropped. A small computational sketch of the combination rule follows.
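The following Python sketch implements equation 4.16 over the frame {normal, attack}; the BPA values are illustrative assumptions, chosen to match the worked example above:

from itertools import product

THETA = frozenset({"normal", "attack"})        # frame of discernment

def ds_combine(m1, m2):
    # BPAs are dicts mapping frozenset subsets of THETA to mass.
    combined, conflict = {}, 0.0
    for (x, mx), (y, my) in product(m1.items(), m2.items()):
        inter = x & y
        if inter:
            combined[inter] = combined.get(inter, 0.0) + mx * my
        else:
            conflict += mx * my                # mass falling on empty intersections
    return {a: v / (1.0 - conflict) for a, v in combined.items()}

m1 = {frozenset({"attack"}): 0.8, THETA: 0.2}  # IDS 1: p = 0.8, rest is ignorance
m2 = {frozenset({"attack"}): 0.6, THETA: 0.4}  # IDS 2: q = 0.6
print(ds_combine(m1, m2))                      # mass on {attack} rises to 0.92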
4.6.2 Analysis of detection error assuming traffic distribution
The previous sections analyzed the system without any knowledge of the underlying traffic or detectors. In this section, a Gaussian distribution is assumed for both the normal and the attack traffic, owing to its acceptability in practice. Often, the data available in databases is only an approximation of the true data. When information about the goodness of the approximation is recorded, the results obtained from the database can be interpreted more reliably. Any database value is associated with a degree of accuracy, which is denoted by a probability density function whose mean is the value itself. Formally, each database value is a random variable; the mean of this variable becomes the stored value, and is interpreted as an approximation of the true value; the standard deviation of this variable is a measure of the level of accuracy of the stored value.
Assume the attack-connection and normal-connection scores to have mean values $\mu^{j=1} = 1$ and $\mu^{j=0} = 0$ respectively, $\mu^{j=1} > \mu^{j=0}$ without loss of generality. Let $\sigma_1$ and $\sigma_0$ be the standard deviations of the attack-connection and normal-connection scores. The two types of errors committed by IDSs are often measured by the False Positive Rate ($FP_{rate}$) and the False Negative Rate ($FN_{rate}$). $FP_{rate}$ is calculated by integrating the normal-score distribution from a given threshold T in the score space to $\infty$, while $FN_{rate}$ is calculated by integrating the attack-score distribution from $-\infty$ to the given threshold T:

\[ FP_{rate} = \int_{T}^{\infty} p^{k=0}(y) \, dy \qquad (4.17) \]

\[ FN_{rate} = \int_{-\infty}^{T} p^{k=1}(y) \, dy \qquad (4.18) \]

The threshold T is the unique point where the error is minimized, i.e., where the difference between $FP_{rate}$ and $FN_{rate}$ is minimized, by the criterion:

\[ T = \arg\min_{T} \left| FP_{rate} - FN_{rate} \right| \qquad (4.19) \]

At this threshold the resultant error due to $FP_{rate}$ and $FN_{rate}$ is at a minimum. This is because $FN_{rate}$ is an increasing function (a cumulative density function, cdf) and $FP_{rate}$ is a decreasing function (1 - cdf); T is the point where these two functions intersect. Decreasing the error introduced by the $FP_{rate}$ and the $FN_{rate}$ implies an improvement in the performance of the system. The fusion algorithm accepts decisions from many IDSs, where a minority of the decisions are false positives or false negatives. A good sensor fusion system is expected to give a result that accurately represents the decisions of the correctly performing individual sensors, while minimizing the influence of the decisions from erroneous IDSs. Approximate agreement emphasizes precision, even when this conflicts with system accuracy. Sensor fusion, however, is concerned solely with the accuracy of the readings, which is appropriate for sensor applications. This is true despite the fact that increased precision within known accuracy bounds would be beneficial in most cases. Hence the following strategy is adopted:
. The false alarm rate $FP_{rate}$ is fixed at an acceptable value $\alpha_0$, and the detection rate is then maximized. Based on this criterion a lower bound on accuracy can be derived.
. The detection rate is always higher than the false alarm rate for every IDS, an assumption that is trivially satisfied by any reasonably functional sensor.
. Determine whether the accuracy of the IDS after fusion is indeed better than the accuracy of the individual IDSs, in order to support the performance enhancement of the fused IDS.
. Discover the weights on the individual IDSs that give the best fusion.
Given the desired acceptable false alarm rate, $FP_{rate} = \alpha_0$, the threshold T is chosen to maximize the $TP_{rate}$ and thus minimize the $FN_{rate}$:

\[ TP_{rate} = Pr[\,\text{alert} \mid \text{attack}\,] = Pr\Big[ \sum_{i=1}^{n} w_i s_i \geq T \,\Big|\, \text{attack} \Big] \qquad (4.20) \]

\[ FP_{rate} = Pr[\,\text{alert} \mid \text{normal}\,] = Pr\Big[ \sum_{i=1}^{n} w_i s_i \geq T \,\Big|\, \text{normal} \Big] = \alpha_0 \qquad (4.21) \]

The fusion of IDSs becomes meaningful only when $FP \leq FP_i \ \forall i$ and $TP \geq TP_i \ \forall i$. In order to satisfy these conditions, an adaptive or dynamic weighting of the IDSs is the only possible alternative. The model of the fusion output is given as:

\[ s = \sum_{i=1}^{n} w_i s_i \]

with

\[ TP_i = Pr[\,s_i = 1 \mid \text{attack}\,], \qquad FP_i = Pr[\,s_i = 1 \mid \text{normal}\,] \]

where $TP_i$ and $FP_i$ are the detection rate and the false positive rate of the individual IDS indexed i. It is required to give a low weight to any individual sensor that is unreliable, hence meeting the constraint on the false alarm rate as
given in equation 4.21. Similarly, the fusion improves the $TP_{rate}$, as the detectors get appropriately weighted according to their performance. One justification for this evaluation metric is that, when searching large databases like network traffic, it is more reasonable to require the results to be mostly relevant (precision), without caring whether all the relevant examples are seen (recall). We choose the number $\alpha_0$ depending on the proportion of attacks in the normal traffic (the base rate). This threshold is of course adjustable, and one may vary the scale of the measured performance numbers by adjusting it. It also happens that these precision-at-$\alpha_0$ scores are quite distinct from one another, facilitating meaningful comparison; the sketch below illustrates the procedure.
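A hedged sketch of this procedure (weights, error rates and the value of $\alpha_0$ are illustrative assumptions; since binary decisions make the fused score discrete, the false-alarm budget is met only approximately):

import numpy as np

rng = np.random.default_rng(4)
y = rng.integers(0, 2, size=20_000)            # ground truth for simulated flows
error_rates = np.array([0.1, 0.2, 0.3])        # assumed per-IDS error rates
S = np.column_stack([np.where(rng.random(20_000) < e, 1 - y, y) for e in error_rates])

w = (1.0 / error_rates) / (1.0 / error_rates).sum()   # reliable IDS -> larger weight
score = S @ w                                  # fused score of eqs. 4.20-4.21
alpha0 = 0.01                                  # acceptable false-alarm budget
T = np.quantile(score[y == 0], 1.0 - alpha0)   # threshold with FP roughly alpha0
print((score[y == 1] >= T).mean())             # resulting detection (TP) rate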
Fusion of the decisions from various IDSs is expected to produce a single decision that is more informative and accurate than any of the decisions from the individual IDSs. The question then arises as to whether it is optimal. Towards that end, this work presents a lower bound on the variance for the fusion of independent sensors, and an upper bound on the false positive rate along with a lower bound on the detection rate for the fusion of dependent sensors.
Fusion of Independent Sensors

The decisions from the various IDSs are assumed to be statistically independent for the sake of simplicity, so that the combination of IDSs will not diffuse the detection. In sensor fusion, improvements in performance are related to the degree of error diversity among the individual IDSs. The successful operation of a multiple sensor system critically depends on the methods that combine the outputs of the sensors. A suitable rule can be inferred using the training examples, where the errors introduced by the various individual sensors are unknown and not controllable. The choice of the sensors has been made, the system is available, and the fusion rule for the system has to be obtained. A system of n sensors $IDS_1, IDS_2, ..., IDS_n$ is considered; corresponding to an observation with parameter x, $x \in \mathbb{R}^m$, sensor $IDS_i$ yields the output $s_i$, $s_i \in \mathbb{R}^m$, according to an unknown probability distribution $p_i$. A training l-sample $(x_1, y_1), (x_2, y_2), ..., (x_l, y_l)$ is given, where $s^j = (s_1^j, s_2^j, ..., s_n^j)$ and $s_i^j$ is the output of $IDS_i$ in response to the input $x_j$. The problem is to estimate a fusion rule $f : \mathbb{R}^{nm} \rightarrow \mathbb{R}^m$, based on the sample, such that the expected square error is minimized over a family of fusion rules based on the given l-sample.
Variance and Mean Square Error of the estimate of fused output

Consider $n$ independent IDSs, with the decision of each being a random variable with a Gaussian distribution of zero mean vector and diagonal covariance matrix $\mathrm{diag}(\sigma_1^2, \sigma_2^2, \ldots, \sigma_n^2)$. Assume $s$ to be the expected fusion output, which is the unknown deterministic scalar quantity to be estimated, and $\hat{s}$ to be the estimate of the fusion output. In most cases the estimate is a deterministic function of the data. The mean square error ($MSE$) associated with the estimate $\hat{s}$ for a particular test data set is given as $E[(\hat{s} - s)^2]$. For a given value of $s$, there are two basic kinds of errors:

• Random error, which is also called precision or estimation variance;
• Systematic error, which is also called accuracy or estimation bias.

Both kinds of errors can be quantified by the conditional distribution of the estimates $pr(\hat{s} - s)$. The $MSE$ of a detector is the expected value of the squared error, and is due to randomness or to the estimator not taking into account information that could produce a more accurate result:

$$MSE = E[(\hat{s} - s)^2] = Var(\hat{s}) + \big(Bias(\hat{s}, s)\big)^2$$
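As a quick numerical check of this decomposition, the following sketch compares the empirical $MSE$ of a hypothetical biased Gaussian estimator against the sum of its variance and squared bias; the estimator and its parameters are illustrative only, not part of the evaluated systems.

```python
# Minimal numerical check of MSE = Var + Bias^2 for a hypothetical estimator.
import numpy as np

rng = np.random.default_rng(0)
s_true = 1.0                                    # the unknown deterministic quantity
# hypothetical estimator: bias of 0.1 and standard deviation of 0.2
estimates = s_true + 0.1 + rng.normal(0.0, 0.2, size=100_000)

mse = np.mean((estimates - s_true) ** 2)
var = np.var(estimates)
bias = np.mean(estimates) - s_true
print(f"MSE          = {mse:.4f}")
print(f"Var + Bias^2 = {var + bias ** 2:.4f}")  # agrees up to sampling noise
```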
The $MSE$ is the absolute error used to assess the quality of the sensor in terms of its variation and unbiasedness. For an unbiased sensor, the $MSE$ is the variance of the estimator, and the root mean squared error ($RMSE$) is the standard deviation. The standard deviation measures the accuracy of a set of probability assessments. The lower the value of $RMSE$, the better the estimator in terms of both precision and accuracy. Thus, reduced variance can be considered an index of improved accuracy and precision of any detector. Hence, this section proves the reduction in variance of the fusion IDS to show its improved performance. The Cramer-Rao inequality can be used for deriving the lower bound on the variance of a sensor.
The Cramer-Rao lower bound gives the best achievable estimation performance; any sensor fusion approach which achieves this performance is optimum in this regard. The Cramer-Rao inequality states that the reciprocal of the Fisher information is an asymptotic lower bound on the variance of any unbiased estimator $\hat{s}$. Fisher information is a method for summarizing the influence of the parameters of a generative model on a collection of samples from that model. In this case, the parameters considered are the means of the Gaussians. Fisher information is the variance of the score (the partial derivative of the logarithm of the likelihood function of the network traffic with respect to the parameter $\sigma^2$).
Cramer-Rao Bound (CRB) for fused output


$$score = \frac{\partial}{\partial \sigma^2} \ln L(\sigma^2; s)$$

Basically, the score tells us how sensitive the log-likelihood is to changes in its parameters. It is a function of the variance $\sigma^2$ and the detection $s$, and this score is a sufficient statistic for the variance. The expected value of the score is zero, and hence the Fisher information is given as:

$$I(\sigma^2) = E\left[\Big(\frac{\partial}{\partial \sigma^2} \ln L(\sigma^2; s)\Big)^2 \,\middle|\, \sigma^2\right]$$
Fisher information is thus the expectation of the squared score. A random variable carrying high Fisher information implies that the absolute value of the score
is often high. The Cramer-Rao inequality expresses a lower bound on the variance of an unbiased statistical estimator, based on the Fisher information:
$$Var(\hat{s}) \geq \frac{1}{\text{Fisher information}} = \frac{1}{E\left[\Big(\frac{\partial}{\partial \sigma^2} \ln L(\sigma^2; X)\Big)^2 \,\middle|\, \sigma^2\right]}$$

If the prior detection probabilities of the various IDSs are known, the weights $w_i|_{i=1,\ldots,n}$ can be assigned to the individual IDSs. The idea is to estimate the local accuracy of the IDSs: the decision of the IDS with the highest local accuracy estimate gets the highest weight in the aggregation. The best fusion algorithm is supposed to choose the correct class if any of the individual IDSs did so; this is a theoretical upper bound for all fusion algorithms. Of course, the best individual IDS is a lower bound for any meaningful fusion algorithm. Depending on the data, the fusion may sometimes be no better than Bayes. In such cases, the upper and lower performance bounds are identical and there is no point in using a fusion algorithm. A further insight into the CRB can be
gained by understanding how each IDS affects it. With the architecture shown in Fig. 4.1, the model is given by $s = \sum_{i=1}^{n} w_i s_i$. The bound is calculated from the effective variance of each one of the IDSs as $\tilde{\sigma}_i^2 = \frac{\sigma_i^2}{w_i^2}$, and these are then combined to give the CRB as $\frac{1}{\sum_{i=1}^{n} 1/\tilde{\sigma}_i^2}$.


The weight assigned to an IDS is inversely proportional to its variance: if the variance is small, the IDS is expected to be more dependable. The bound on the smallest variance of an estimate $\hat{s}$ is given as:

$$\sigma^2 = E[(\hat{s} - s)^2] \geq \frac{1}{\sum_{i=1}^{n} \frac{w_i^2}{\sigma_i^2}} \qquad (4.23)$$

It can be observed from equation 4.23 that any IDS decision that is not reliable will have a very limited impact on the bound. This is because the non-reliable IDS will have a much larger variance than the other IDSs in the group, $\sigma_n^2 \gg \sigma_1^2, \ldots, \sigma_{n-1}^2$, and hence $\frac{1}{\sigma_n^2} \ll \frac{1}{\sigma_1^2}, \ldots, \frac{1}{\sigma_{n-1}^2}$. The bound can then be approximated as

$$\frac{1}{\sum_{i=1}^{n-1} \frac{1}{\sigma_i^2}}$$
Also, it can be observed from equation 4.23 that the bound shows asymptotically optimum behavior of minimum variance. If $\sigma_i^2 > 0$ and $\sigma_{min}^2 = \min[\sigma_1^2, \ldots, \sigma_n^2]$, then

$$CRB = \frac{1}{\sum_{i=1}^{n} \frac{1}{\sigma_i^2}} < \sigma_{min}^2 \qquad (4.24)$$

From equation 4.24, it can be shown that perfect performance is apparently possible with enough IDSs: the bound tends to zero as more and more individual IDSs are added to the fusion unit.

$$CRB_n = \lim_{n \to \infty} \frac{1}{\frac{1}{\sigma_1^2} + \cdots + \frac{1}{\sigma_n^2}} \qquad (4.25)$$

For simplicity, assume homogeneous IDSs with variance $\sigma^2$:

$$CRB_n = \lim_{n \to \infty} \frac{1}{\frac{n}{\sigma^2}} = \lim_{n \to \infty} \frac{\sigma^2}{n} = 0 \qquad (4.26)$$
From equations 4.25 and 4.26, it can be interpreted that increasing the number of IDSs to a sufficiently large number drives the performance bound towards perfect estimates. Also, due to the monotone decreasing nature of the bound, the IDSs can be chosen to make the performance as close to perfect as desired.
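To make the behavior of the bound concrete, the following sketch evaluates the CRB of equations 4.24-4.26 for hypothetical IDS variances; it is an illustration only, and the variance values are assumptions.

```python
# Sketch: the CRB 1 / sum_i (1 / sigma_i^2) falls below the smallest individual
# variance and tends to zero as IDSs are added (equations 4.24-4.26).
import numpy as np

def crb(variances):
    """Lower bound on the variance of the fused estimate."""
    variances = np.asarray(variances, dtype=float)
    return 1.0 / np.sum(1.0 / variances)

for n in (2, 5, 10, 100):
    print(n, crb(np.full(n, 0.25)))   # homogeneous IDSs: bound = 0.25 / n
print(crb([0.25, 0.30]))              # two reliable IDSs
print(crb([0.25, 0.30, 100.0]))       # an unreliable IDS barely moves the bound
```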
Fusion of Dependent Sensors

In most sensor fusion problems, the individual sensor errors are assumed to be uncorrelated, so that the sensor decisions are independent. While independence of sensors is a convenient assumption, it is often unrealistic in practice. Considering the general case of statistically dependent decisions, the Bahadur-Lazarsfeld expansion of probability density functions can be used for the analysis.
Bahadur-Lazarsfeld polynomials

Consider $s = [s_1, \ldots, s_n]$ to be the vector of correlated decisions from the individual sensors, and let $P(s)$ be the probability density function of $s$, with the prior probabilities of normal and attack traffic being $P_0$ and $P_1$ respectively. For the conditional pdf $P(s|\text{attack})$, the normalized random variables $r_i^1 = \frac{s_i - p_i}{\sqrt{p_i q_i}}$ are introduced, where $p_i = P(s_i = 1|\text{attack})$ and $q_i = 1 - p_i$, $i = 1, 2, \ldots, n$; for the pdf $P(s|\text{normal})$, the normalized random variables are $r_i^0 = \frac{s_i - p_i}{\sqrt{p_i q_i}}$, where $p_i = P(s_i = 1|\text{normal})$ and $q_i = 1 - p_i$, $i = 1, 2, \ldots, n$. The normalized random variables $r_i^1$ and $r_i^0$ have zero mean and unit variance. The Bahadur-Lazarsfeld polynomials are defined as $\phi_i(s) = [1, r_1, r_2, \ldots, r_n, r_1 r_2, r_1 r_3, \ldots, r_1 r_2 \cdots r_n]$ for the respective values of $i = 0, 1, 2, \ldots, n, n+1, n+2, \ldots, 2^n - 1$. Recalling that each Bahadur-Lazarsfeld polynomial is a product of normalized variables $r_i$, the correlation coefficients of $\{r_i\}_{i=1}^{n}$ are defined order by order as $\gamma_{ij} = \sum_s r_i r_j P(s)$ (second-order correlation coefficient) up to $\gamma_{ij\ldots n} = \sum_s r_i r_j \cdots r_n P(s)$ ($n$th-order correlation coefficient).
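As an illustration of these definitions, the sketch below estimates the second-order coefficients $\gamma_{ij}$ from synthetic correlated binary decisions of three hypothetical IDSs under the attack hypothesis; the latent covariance used to generate the decisions is an assumption for the example.

```python
# Sketch: empirical second-order Bahadur-Lazarsfeld coefficients
# gamma_ij = sum_s r_i r_j P(s), from synthetic correlated decisions.
import numpy as np

rng = np.random.default_rng(1)
latent_cov = [[1.0, 0.5, 0.2], [0.5, 1.0, 0.4], [0.2, 0.4, 1.0]]
z = rng.multivariate_normal(np.zeros(3), latent_cov, size=10_000)
s = (z > 0).astype(float)              # correlated binary decisions of 3 IDSs

p = s.mean(axis=0)                     # p_i = P(s_i = 1 | attack)
r = (s - p) / np.sqrt(p * (1 - p))     # normalized variables: zero mean, unit variance

for i in range(3):
    for j in range(i + 1, 3):
        gamma_ij = np.mean(r[:, i] * r[:, j])
        print(f"gamma_{i + 1}{j + 1} = {gamma_ij:.3f}")
```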

Using the decisions of the local sensors as its input, the fusion unit performs
a likelihood ratio test in order to make a global decision. The optimal fusion
rule of the fusion unit is given by the likelihood ratio as below:

$$L(S) = \frac{P(S|\text{attack})}{P(S|\text{normal})} = \prod_{i=1}^{n} \left(\frac{1 - FN_i}{FP_i}\right)^{s_i} \left(\frac{FN_i}{1 - FP_i}\right)^{1 - s_i} \times \frac{1 + \sum_{i<j} \gamma_{ij}^1 r_i^1 r_j^1 + \sum_{i<j<k} \gamma_{ijk}^1 r_i^1 r_j^1 r_k^1 + \cdots + \gamma_{12\ldots n}^1 r_1^1 r_2^1 \cdots r_n^1}{1 + \sum_{i<j} \gamma_{ij}^0 r_i^0 r_j^0 + \sum_{i<j<k} \gamma_{ijk}^0 r_i^0 r_j^0 r_k^0 + \cdots + \gamma_{12\ldots n}^0 r_1^0 r_2^0 \cdots r_n^0}$$

The log-likelihood ratio for the problem of deciding between the hypotheses attack and normal is given by:
$$\log L(s) = \sum_{i=1}^{n} s_i \left[\log \frac{(1 - FN_i)(1 - FP_i)}{FN_i \, FP_i}\right] + \sum_{i=1}^{n} \log\left(\frac{FN_i}{1 - FP_i}\right) + \log \frac{1 + \sum_{i<j} \gamma_{ij}^1 r_i^1 r_j^1 + \sum_{i<j<k} \gamma_{ijk}^1 r_i^1 r_j^1 r_k^1 + \cdots + \gamma_{12\ldots n}^1 r_1^1 r_2^1 \cdots r_n^1}{1 + \sum_{i<j} \gamma_{ij}^0 r_i^0 r_j^0 + \sum_{i<j<k} \gamma_{ijk}^0 r_i^0 r_j^0 r_k^0 + \cdots + \gamma_{12\ldots n}^0 r_1^0 r_2^0 \cdots r_n^0}$$

where the global detections are given in terms of r1 , ..., rn . The log-likelihood
ratio gives the data fusion rule for a distributed detection system with correlated
local decisions. If the conditional correlation coefficients above a certain order
can be neglected, as in many practical applications, the computational burden
can be reduced. If most correlation coefficients of the local decisions are zero,
the computation gets simplified to the optimal data fusion rule developed by
Chair and Varshney [35] for independent local decisions.
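For concreteness, a minimal sketch of the Chair-Varshney rule for independent local decisions is given below; the $FN_i$ and $FP_i$ values are hypothetical, and the log prior ratio is set to zero for simplicity.

```python
# Sketch of the Chair-Varshney fusion rule for independent local decisions:
# decide "attack" when the log-likelihood ratio of the decision vector is positive.
import math

def chair_varshney(decisions, fn, fp, log_prior_ratio=0.0):
    llr = log_prior_ratio
    for s_i, fn_i, fp_i in zip(decisions, fn, fp):
        if s_i == 1:
            llr += math.log((1 - fn_i) / fp_i)       # detector fired
        else:
            llr += math.log(fn_i / (1 - fp_i))       # detector stayed quiet
    return llr > 0

# three hypothetical IDSs; the first two fire, the third does not
print(chair_varshney([1, 1, 0], fn=[0.4, 0.3, 0.6], fp=[0.01, 0.02, 0.05]))  # True
```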
Setting bounds on false positives and true positives

As an illustration, let us consider a system with three individual IDSs, with a joint density at the IDSs having a covariance matrix of the form:

$$V = \begin{bmatrix} 1 & \rho_{12} & \rho_{13} \\ \rho_{21} & 1 & \rho_{23} \\ \rho_{31} & \rho_{32} & 1 \end{bmatrix}$$

With fusion doing an aggregation of the individual decisions, the false alarm rate ($\alpha$) at the fusion center can be written as:

$$\alpha_{max} = 1 - Pr(s_1 = 0, s_2 = 0, s_3 = 0 \,|\, \text{normal}) = 1 - \int P_s(s|\text{normal})\, ds \qquad (4.29)$$
where $P_s(s|\text{normal})$ is the density of the sensor observations under the hypothesis normal and is a function of the correlation coefficient $\rho$. Assuming a single threshold $T$ for all the sensors, and the same correlation coefficient $\rho$ between different sensors, a function $F_n(T|\rho) = Pr(s_1 = 0, s_2 = 0, s_3 = 0)$ can be defined:
$$F_n(T|\rho) = \int_{-\infty}^{\infty} F^n\left(\frac{T - \sqrt{\rho}\, y}{\sqrt{1 - \rho}}\right) g(y)\, dy \qquad (4.30)$$

where $g(\cdot)$ and $F(\cdot)$ are the standard normal density and cumulative distribution function respectively, and

$$F^n(X) = [F(X)]^n \qquad (4.31)$$

Equation 4.29 can be written, depending on whether $\rho > \frac{-1}{n-1}$ or not, as:

$$\alpha_{max} = 1 - \int_{-\infty}^{\infty} F^3\left(\frac{T - \sqrt{\rho}\, y}{\sqrt{1 - \rho}}\right) f(y)\, dy, \qquad 0 \leq \rho < 1 \qquad (4.32)$$

and

$$\alpha_{max} = 1 - F^3(T|\rho), \qquad -0.5 \leq \rho < 1 \qquad (4.33)$$

With this threshold $T$, the probability of detection at the fusion unit can be computed as:

$$TP_{min} = 1 - \int_{-\infty}^{\infty} F^3\left(\frac{T - S - \sqrt{\rho}\, y}{\sqrt{1 - \rho}}\right) f(y)\, dy, \qquad 0 \leq \rho < 1 \qquad (4.34)$$

and

$$TP_{min} = 1 - F^3(T - S\,|\,\rho), \qquad -0.5 \leq \rho < 1 \qquad (4.35)$$

The above equations 4.32, 4.33, 4.34, and 4.35 show the performance improvement of sensor fusion when the upper bound on the false positive rate and the lower bound on the detection rate are fixed. The system performance was shown to deteriorate when the correlation between the sensor errors is positive and increasing, while the performance improves considerably when the correlation is negative and increasing.
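Equations 4.30 and 4.32 can be evaluated numerically. The sketch below, assuming three sensors with a hypothetical common threshold $T$ and equicorrelation $\rho$, computes the false alarm rate at the fusion center for a few values of $\rho$.

```python
# Numerical sketch of equations 4.30 and 4.32 for n = 3 equicorrelated sensors:
# an alert is raised at the fusion center unless all three sensors stay quiet.
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

def alpha_max(T, rho, n=3):
    # Pr(all n sensors quiet) for equicorrelated standard normal observations
    integrand = lambda y: (
        norm.cdf((T - np.sqrt(rho) * y) / np.sqrt(1.0 - rho)) ** n * norm.pdf(y)
    )
    prob_all_quiet, _ = quad(integrand, -np.inf, np.inf)
    return 1.0 - prob_all_quiet

for rho in (0.0, 0.3, 0.6):
    print(f"rho = {rho}: alpha_max = {alpha_max(T=2.0, rho=rho):.4f}")
```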

4.7 Summary
One of the common reasons for avoiding IDSs as the second and last line of defense is their less than satisfactory performance. Consequently, improving IDS performance is a significant research challenge. In this chapter, we prove that it is possible to improve the performance with multiple IDSs using advances in sensor fusion. The chapter includes the mathematical basis for sensor fusion in IDS, with a theoretical formulation and analysis of the acceptability of sensor fusion in intrusion detection. The sensor fusion system is characterized and modeled with no knowledge of the IDSs and the intrusion detection data, and the need for sensor fusion in IDS is established. The evidence theory is the method most suited for the fusion of IDSs, as seen in this chapter. Having chosen the sensor fusion method, we address the issues related to sensor fusion, namely choosing the threshold bounds, rule-based fusion and Data-dependent Decision fusion, and the modified evidence theory, in chapters 5, 6, and 7 respectively.
The study undertaken in this chapter contributes to the fusion field in several aspects. It is expected that positive correlation improves the reliability of fusion, while negative correlation improves fusion by means of improved coverage. In this theoretical study, independent as well as dependent detectors were considered, and the study clarifies the intuition that independence of detectors is crucial in determining the success of the fusion operation. When the detectors are dependent, fusion still leads to improved results, but the gain is smaller. This is explained by the variance reduction due to the combination. The latter half of the chapter takes into account the analysis of the sensor fusion system with knowledge of the network traffic distribution. This analysis also resulted in the acceptance of sensor fusion for enhancing the performance of intrusion detection. These results are further supported by empirical evidence in the later chapters.

Chapter 5
Selection of Threshold Bounds for Effective Sensor Fusion

I have not failed. I have just found 10,000 ways that won't work.
Thomas Alva Edison

5.1 Introduction
In this chapter, we prove the distinct advantages of sensor fusion over individual IDSs. Fusion threshold bounds are derived using the Chebyshev inequality at the fusion center, from the false positive rates and detection rates of the IDSs. The goal is to achieve the best fusion performance with the least amount of model knowledge, in a computationally inexpensive way. Anomaly-based IDSs detect anomalies beyond a set threshold level in the features they monitor. Threshold bounds, instead of a single threshold, give more freedom in steering system properties: any threshold within the bounds can be chosen, depending on the preferred trade-off between detection and false alarms.
All the related work in the field of sensor fusion has been carried out mainly with one of the methods of probability theory, evidence theory, voting fusion theory, fuzzy logic theory or neural networks, in order to aggregate information. The Bayesian theory is the classical method for statistical inference problems; the fusion rule is expressed for a system of independent learners, with the distribution of hypotheses known a priori. The Dempster-Shafer decision theory
is considered a generalized Bayesian theory. It does not require a priori knowledge or a probability distribution on the possible system states like the Bayesian approach does, and it is mostly useful when modeling of the system is difficult or impossible [106]. An attempt to prove the distinct advantages of sensor fusion over individual IDSs is made in the next section using the Chebyshev inequality, as an extension to the work of Zhu et al. [107].

5.2 Modeling the fusion IDS by defining proper threshold bounds
Every IDS participating in the fusion has its own detection rate $D_i$ and false positive rate $F_i$, due to the preferred heterogeneity of the sensors in the fusion process. Each IDS indexed $i$ gives an alert or no-alert, indicated by $s_i$ taking a value of one or zero respectively. The fusion center collects these local decisions and forms a binomially distributed sum $s$ given by

$$s = \sum_{i=1}^{n} s_i$$

where $n$ is the total number of IDSs taking part in the fusion.

Let $D$ and $F$ denote the unanimous detection rate and the false positive rate respectively. The mean and variance of $s$ in the case of attack and no-attack are given by the following equations:

$$E[s\,|\,\text{attack}] = \sum_{i=1}^{n} D_i, \qquad Var[s\,|\,\text{attack}] = \sum_{i=1}^{n} D_i(1 - D_i)$$

$$E[s\,|\,\text{no-attack}] = \sum_{i=1}^{n} F_i, \qquad Var[s\,|\,\text{no-attack}] = \sum_{i=1}^{n} F_i(1 - F_i)$$

The fusion IDS is required to give a high detection rate and a low false positive rate. Hence the threshold $T$ has to be chosen well above the mean of the false alerts and well below the mean of the true alerts. Figure 5.1 shows a typical case where the threshold $T$ is chosen at the point of overlap of the two parametric curves for normal and attack traffic. Consequently, the threshold bounds are given as:


Figure 5.1: Parametric curve showing the choice of threshold T

$$\sum_{i=1}^{n} F_i < T < \sum_{i=1}^{n} D_i$$

The detection rate and the false positive rate of the fusion IDS are desired to surpass the corresponding weighted averages, and hence:

$$D > \frac{\sum_{i=1}^{n} D_i^2}{\sum_{i=1}^{n} D_i} \qquad (5.1)$$

and

$$F < \frac{\sum_{i=1}^{n} (1 - F_i) F_i}{\sum_{i=1}^{n} (1 - F_i)} \qquad (5.2)$$

Now, using simple range comparison,

$$D = Pr\{s \geq T \,|\, \text{attack}\} \geq Pr\Big\{\Big|s - \sum_{i=1}^{n} D_i\Big| \leq \Big(\sum_{i=1}^{n} D_i - T\Big) \,\Big|\, \text{attack}\Big\}$$

Using the Chebyshev inequality on the random variable $s$, with mean $E[s] = \sum_{i=1}^{n} D_i$ and variance $Var[s] = \sum_{i=1}^{n} D_i(1 - D_i)$,


$$Pr\{|s - E(s)| \geq k\} \leq \frac{Var(s)}{k^2}$$

With the assumption that the threshold $T$ is greater than the mean of the normal activity,

$$Pr\Big\{\Big|s - \sum_{i=1}^{n} D_i\Big| \leq \Big(\sum_{i=1}^{n} D_i - T\Big) \,\Big|\, \text{attack}\Big\} \geq 1 - \frac{\sum_{i=1}^{n} D_i(1 - D_i)}{\Big(\sum_{i=1}^{n} D_i - T\Big)^2}$$

From equation 5.1 it follows that

$$1 - \frac{\sum_{i=1}^{n} D_i(1 - D_i)}{\Big(\sum_{i=1}^{n} D_i - T\Big)^2} \geq \frac{\sum_{i=1}^{n} D_i^2}{\sum_{i=1}^{n} D_i}$$

The upper bound of $T$ is derived from the above equation as:

$$T \leq \sum_{i=1}^{n} D_i - \sqrt{\sum_{i=1}^{n} D_i}$$

Similarly, for the false positive rate, $F = Pr\{s \geq T \,|\, \text{no-attack}\}$, in order to derive the lower bound of $T$, from equation 5.2 it follows that

$$\frac{\sum_{i=1}^{n} F_i(1 - F_i)}{\Big(T - \sum_{i=1}^{n} F_i\Big)^2} \leq \frac{\sum_{i=1}^{n} F_i(1 - F_i)}{\sum_{i=1}^{n} (1 - F_i)}$$

The lower bound of $T$ is derived from the above equation as:

$$T \geq \sum_{i=1}^{n} F_i + \sqrt{\sum_{i=1}^{n} (1 - F_i)}$$

The threshold bounds for the fusion IDS are thus:

$$\sum_{i=1}^{n} F_i + \sqrt{\sum_{i=1}^{n} (1 - F_i)} \;\leq\; T \;\leq\; \sum_{i=1}^{n} D_i - \sqrt{\sum_{i=1}^{n} D_i}$$
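The derived bounds are straightforward to compute. The sketch below, with hypothetical per-IDS detection rates $D_i$ and false positive rates $F_i$, returns the admissible interval for the fusion threshold $T$.

```python
# Sketch: Chebyshev-derived threshold bounds for the fusion IDS,
# sum(F_i) + sqrt(sum(1 - F_i)) <= T <= sum(D_i) - sqrt(sum(D_i)).
import math

def threshold_bounds(D, F):
    lower = sum(F) + math.sqrt(sum(1 - f for f in F))
    upper = sum(D) - math.sqrt(sum(D))
    return lower, upper

# hypothetical rates for ten individual IDSs
D = [0.90, 0.85, 0.92, 0.88, 0.91, 0.87, 0.90, 0.86, 0.89, 0.93]
F = [0.02, 0.03, 0.01, 0.02, 0.04, 0.02, 0.03, 0.01, 0.02, 0.02]
lo, hi = threshold_bounds(D, F)
print(f"choose T in [{lo:.2f}, {hi:.2f}]")   # about [3.35, 5.93] for these rates
```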

Since the threshold $T$ is assumed to be greater than the mean of the normal activity, the upper bound of the false positive rate $F$ can be obtained from the Chebyshev inequality as:

$$F \leq \frac{Var[s]}{(T - E[s])^2} \qquad (5.3)$$
In a statistical intrusion detection system, a false positive is caused by the variance of network traffic during normal operations. Hence, to reduce the false positive rate, it is important to reduce the variance of the normal traffic; in the ideal case, the variance of normal traffic is zero. Equation 5.3 shows that as the variance of the normal traffic approaches zero, the false positive rate also approaches zero. Also, since the threshold $T$ is assumed to be less than the mean of the intrusive activity, the lower bound of the detection rate $D$ can be obtained from the Chebyshev inequality as:

$$D \geq 1 - \frac{Var[s]}{(E[s] - T)^2} \qquad (5.4)$$

For intrusive traffic, the factor $D_i(1 - D_i)$ remains almost steady, and hence the variance, given as $Var[s] = \sum_{i=1}^{n} D_i(1 - D_i)$, is an appreciable value. Since the variance of the attack traffic is above a certain detectable minimum, it is seen from equation 5.4 that the correct detection rate can approach an appreciably high value. Similarly, the true negatives will also approach a high value, since the false positive rate is reduced with IDS fusion.
It has been shown above that with IDS fusion the variance of the normal traffic drops towards zero while the variance of the intrusive traffic stays above a detectable minimum. This additionally supports the proof that the fusion IDS gives a better detection rate and a very low false positive rate.


5.3 Results and discussion

5.3.1 Experimental evaluation
The fusion IDS and all the IDSs that form part of it were separately evaluated with the same two data sets, namely 1) the real-world network traffic and 2) the DARPA 1999 data set. The real traffic within a protected University campus network was collected during the working hours of a day. This traffic of around two million packets was divided into two halves, one for training the anomaly IDSs and the other for testing. The test data was injected with 45 HTTP attack packets using the HTTP attack traffic generator tool called libwhisker [108]. The test data set thus had a base rate of 0.0000225, which is relatively realistic. The test data of the DARPA data set consisted of 190 instances of 57 attack types, which included 37 probes, 63 DoS attacks, 53 R2L attacks, and 37 U2R/Data attacks, with details on attack types given in Table 4.1. The large observational data set was analyzed to find unsuspected relationships and was summarized in novel ways that were both understandable and useful for the detector evaluation. There are many types of attacks in the test set, many of them not present in the training set; hence, the selected data also challenged the ability to detect unknown intrusions. When a discrete IDS is applied to a test set, it yields a single confusion matrix and thus produces only a single point in the ROC space, whereas scoring IDSs can be used with a threshold to produce different points in the ROC space.
The fusion IDS was initially evaluated with the DARPA 1999 data set. The individual IDSs chosen in this work are PHAD and ALAD, two research IDSs that are anomaly-based and have an extremely low false alarm rate, of the order of 0.00002. The other reason for the choice of PHAD and ALAD is that the two are almost complementary in attack detection, which helps in achieving the best results from the fusion process. The analysis of PHAD and ALAD has resulted in a clear understanding of which individual IDS is expected to succeed or fail under a particular attack. On combining the two sensor alerts and removing the duplicates, an improved rate of detection is achieved, as shown in Table 5.3. The performance in terms of F-score of PHAD, ALAD, and the combination of PHAD and ALAD is shown in Tables 5.4, 5.5 and 5.6 respectively


Table 5.1: Types of attacks detected by PHAD at 0.00002 FP rate (100 FPs)

Attack type | Total attacks | Attacks detected | % detection
Probe       | 37            | 22               | 60%
DOS         | 63            | 24               | 38%
R2L         | 53            | 6                | 11%
U2R/Data    | 37            | 2                | 5%
Total       | 190           | 54               | 28%

Table 5.2: Types of attacks detected by ALAD at 0.00002 FP rate (100 FPs)

Attack type | Total attacks | Attacks detected | % detection
Probe       | 37            | 6                | 16%
DOS         | 63            | 19               | 30%
R2L         | 53            | 25               | 47%
U2R/Data    | 37            | 10               | 27%
Total       | 190           | 60               | 32%

Table 5.3: Types of attacks detected by the combination of ALAD and PHAD at 0.00004 FP rate (200 FPs)

Attack type | Total attacks | Attacks detected | % detection
Probe       | 37            | 24               | 65%
DOS         | 63            | 39               | 62%
R2L         | 53            | 26               | 49%
U2R/Data    | 37            | 10               | 27%
Total       | 190           | 99               | 52%

Table 5.4: F-score of PHAD for different choices of false positives

FP  | TP | Precision | Recall | Overall Accuracy | F-score
50  | 33 | 0.39      | 0.17   | 0.99             | 0.24
100 | 54 | 0.35      | 0.28   | 0.99             | 0.31
200 | 56 | 0.22      | 0.29   | 0.99             | 0.25
500 | 56 | 0.10      | 0.29   | 0.99             | 0.15

Table 5.5: F-score of ALAD for different choices of false positives

FP  | TP | Precision | Recall | Overall Accuracy | F-score
50  | 42 | 0.45      | 0.21   | 0.99             | 0.29
100 | 60 | 0.37      | 0.31   | 0.99             | 0.34
200 | 66 | 0.25      | 0.34   | 0.99             | 0.29
500 | 72 | 0.12      | 0.38   | 0.99             | 0.18


Table 5.6: F-score of fused IDS for different choices of false positives

FP  | TP  | Precision | Recall | Overall Accuracy | F-score
50  | 44  | 0.46      | 0.23   | 0.99             | 0.31
100 | 73  | 0.42      | 0.38   | 0.99             | 0.40
200 | 99  | 0.33      | 0.52   | 0.99             | 0.40
500 | 108 | 0.18      | 0.57   | 0.99             | 0.27

Figure 5.2: Detection rate vs Threshold

for various values of false positives, by setting the threshold appropriately. The improved performance of the combination of the alarms from each system can be observed in Table 5.6, corresponding to false positives between 100 and 200, by fixing the threshold bounds appropriately. Thus the combination works best above a false positive count of 100 and much below a false positive count of 200. For each of the individual IDSs, the number of detections was observed at false positive counts of 50, 100, 200 and 500, when trained on inside week 3 and tested on weeks 4 and 5. Figures 5.2, 5.3, 5.4 and 5.5 show the selected thresholds for the false positive counts of 50, 100, 200 and 500. The fusion IDS shows better performance than the single IDSs for all the threshold values; the performance is optimized within the bounds of 100 to 200 false positives.
The improved performance of the fusion IDS over some of the fusion alternatives, using the real-world network traffic, is shown in Table 5.7.


Figure 5.3: Precision vs Threshold

Figure 5.4: F-score vs Threshold

Figure 5.5: False Negative Rate vs Threshold


Table 5.7: Comparison of the evaluated IDSs using the real-world network traffic

Detector/Fusion Type | Total Attacks | TP | FP | Precision | Recall | F-score
PHAD       | 45 | 10 | 45 | 0.18 | 0.22 | 0.20
ALAD       | 45 | 18 | 45 | 0.29 | 0.40 | 0.34
OR         | 45 | 22 | 77 | 0.22 | 0.49 | 0.30
AND        | 45 | 9  | 29 | 0.24 | 0.20 | 0.22
SVM        | 45 | 19 | 49 | 0.30 | 0.42 | 0.35
ANN        | 45 | 19 | 68 | 0.22 | 0.42 | 0.29
Fusion IDS | 45 | 20 | 37 | 0.35 | 0.44 | 0.39

5.4 Summary
A simple theoretical model was initially illustrated in this chapter for the purpose of showing the improved performance of the fusion IDS. The detection rate and the false positive rate quantify the performance benefit obtained through the fixing of threshold bounds. Also, the more independent and distinct the attack spaces of the individual IDSs are, the better the fusion IDS performs. The theoretical proof was supplemented with an experimental evaluation, in which the detection rates, false positive rates, and F-scores were measured. In order to understand the importance of thresholding, the anomaly-based IDSs PHAD and ALAD were individually analyzed. Preliminary experimental results confirm the correctness of the theoretical proof. The chapter demonstrates that our technique is more flexible than, and also outperforms, other existing fusion techniques such as OR, AND, SVM, and ANN on the real-world network traffic embedded with attacks. The experimental comparison using the real-world traffic has thus confirmed the usefulness and significance of the method. The unconditional combination of alarms avoiding duplicates, as shown in Table 5.3, results in a detection rate of 52% at 200 false positives with an F-score of 0.4. The combination of the highest scoring alarms, as shown in Table 5.6 using the DARPA 1999 data set, results in a detection rate of 38% with the threshold fixed at 100 false positives and an F-score of 0.4.

Chapter 6
Performance Enhancement of IDS using Rule-based Fusion and Data-dependent Decision Fusion
There is no greatness where there is no simplicity, goodness and truth.
Leo Tolstoy

6.1 Introduction
In the previous chapter, the utility of sensor fusion for improved sensitivity and reduced false alarm rate was illustrated. In this chapter we further explore the general problem of poorly detected attacks, which turn out to be characterized by features that do not discriminate them much. This chapter discusses the improved performance of multiple IDSs using rule-based fusion and Data-dependent Decision fusion (or DD fusion for the purposes of this document). The DD fusion approach gathers an in-depth understanding of the input traffic and of the behavior of the individual IDSs by means of a neural network learner unit. This information is used to fine-tune the fusion unit, since the fusion depends on the input feature vector; fusion thus implements a function that is local to each region in the feature space. It is well known that the effectiveness of sensor fusion improves when the individual IDSs are uncorrelated, and the training methodology adopted in this work takes note of this fact. The performance of Snort has been improved by enhancing its rule base.


enhancing its rule base. The overall performance of the fused IDSs using rulebased fusion shows an overall enhancement in the performance with respect to
the performance of individual IDSs. For illustrative purposes two different data
sets, namely the DARPA 1999 data set as well as the real-world network traffic
embedded with attacks, have been used. The DD fusion shows a significantly
better performance with respect to the performance of individual IDSs.
The related work on sensor fusion in the intrusion detection application is discussed in chapter 4. The problem of designing IDSs to work effectively and yield higher accuracies for minority attacks like R2L and U2R, even in the midst of data skewness, has been receiving serious attention in recent times. Other than the related work discussed in chapter 4, predictive classifier models for rare events are given in [33] and [109]. However, none of these attempts has shown any significant contribution in overcoming the data skewness problems. Hence, in spite of all the earlier attempts, there is still room for a significant improvement in the detection of rare attacks.
The chapter is organized as follows. Section 6.2 discusses the rule-based fusion of IDSs. Section 6.3 explains the proposed Data-dependent Decision (DD) fusion architecture. Section 6.4 describes the algorithm of the proposed data-dependent decision fusion architecture, and illustrates and discusses its results. In section 6.5 the conclusion of the chapter is drawn.

6.2 Rule-based fusion

In sensor fusion, if the sources overlap in their decisions and are independently maintained, the chances of inconsistent decisions are high. Confronted with an inconsistent set of decisions, there are many solutions which have already been researched, as seen in the previous chapter. The different fusion schemes are: (i) combining all the alarms from each system while avoiding duplicates, (ii) taking the alarms from each system and fixing threshold bounds on the fusion unit, and (iii) rule-based fusion with prior knowledge of the individual sensor performance.
The architecture followed in all of these methods is given in Figure 4.1. The appropriate adjustment of the fusion threshold optimizes the performance of the resultant IDS in terms of a high detection rate and a low false alarm rate. The upper bound of the fusion threshold is obtained by the Chebyshev inequality, assuming the fusion threshold to be greater than the mean of the normal activity; similarly, the lower bound of the threshold is obtained by assuming the threshold to be less than the mean of the intrusive activity. In the rule-based method, the performance of the combined detector is enhanced using simple rule-based fusion, with the fusion making use of the objective certainty of a hypothesis given a particular sensor as the component in fusion. The rules were introduced with the knowledge of the optimal IDS under different attack conditions.
An observation of the poorly detected attacks reveals that they are characterized by features that do not discriminate them much. This is established by investigating the relevance of each feature in the 1999 DARPA IDS evaluation data set. The data set is consolidated into network connections with 41 features identified per connection. The probes in the traffic are identified by host-based traffic features, whereas DoS attacks are identified by the basic features derived from the packet header as well as the time-based traffic features. Thus, both the anomaly-based detectors PHAD and ALAD perform with sufficiently high detection rates for DoS attacks, and PHAD outperforms ALAD in detecting the probes. The attacks R2L and U2R are characterized by the basic features derived from the packet headers and also the content-based features; hence, ALAD performs better than PHAD in detecting the R2L and U2R attacks. The rule-based combination makes use of the fact that, given a sensor, the objective certainty of a hypothesis can be used for improving the performance of the combined sensor. The rules are as follows:
If any category of Probe except for ntinfoscan, then PHAD
If Probe is ntinfoscan, then ALAD
If any category of R2L except for xlock, then ALAD
If R2L is xlock, then PHAD
If any category of U2R, then ALAD
If DoS is due to (fragmentation || checksum error || URG-FIN flag set || small packet size), then PHAD
If DoS is due to malicious payload, then ALAD
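A minimal sketch of these rules as code is given below, assuming each alert carries a hypothetical attack category, subtype, and DoS cause; it only mirrors the rule list above and is not the implementation used in this work.

```python
# Sketch of the rule-based fusion: pick which IDS's decision to trust
# for a given traffic record, following the rules listed above.
def preferred_ids(category, subtype=None, dos_cause=None):
    if category == "Probe":
        return "ALAD" if subtype == "ntinfoscan" else "PHAD"
    if category == "R2L":
        return "PHAD" if subtype == "xlock" else "ALAD"
    if category == "U2R":
        return "ALAD"
    if category == "DoS":
        header_causes = {"fragmentation", "checksum error",
                         "URG-FIN flag set", "small packet size"}
        return "PHAD" if dos_cause in header_causes else "ALAD"
    return None  # category not covered by the rules

print(preferred_ids("Probe", subtype="ntinfoscan"))     # -> ALAD
print(preferred_ids("DoS", dos_cause="fragmentation"))  # -> PHAD
```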

Table 6.1: Types of attacks detected by the rule-based combination of ALAD and PHAD at a FP rate of 0.000025 (125 FPs)

Attack type | Total attacks | Attacks detected | % detection
Probe       | 37            | 24               | 65%
DOS         | 63            | 39               | 62%
R2L         | 53            | 26               | 49%
U2R/Data    | 37            | 10               | 27%
Total       | 190           | 99               | 52%

Experimental results show that the rule-based fusion, with a detection rate of 52% at 125 false positives and an F-score of 0.48, performs better than the other two combinations in the previous chapter. The rule-based fusion works significantly well compared to the threshold-based fusion in the detection of known attacks; however, the threshold-based approach offers an advantage in identifying unknown attacks. The rule-based fusion IDS can detect some of the well-known intrusions with a high detection rate, but it is difficult for it to detect novel intrusions, and its rule set has to be updated manually and frequently [7]. Thus, while the results were encouraging, it was realized that rule-based fusion has no possibility of generalizing from previously observed behavior. As a result, the research was pursued further to generalize rule-based fusion and also to overcome its other disadvantages.

6.3 Data-dependent decision fusion

An IDS is expected to work with very large input data. Rule-based fusion works with only small input data, and a machine learning algorithm is needed to handle the type of data appearing in the network traffic. Rule-based fusion also has the disadvantage of being dependent on the individual IDSs that are used for sensor fusion. It is necessary to incorporate an architecture that considers a method for improving the detection rate by gathering an
in-depth understanding of the input traffic and also of the behavior of the individual IDSs. This helps in automatically learning the individual weights for the combination when the IDSs are heterogeneous and differ in performance. The architecture should thus be data-dependent, and hence the rule set has to be developed dynamically.
A new data-dependent architecture underpinning sensor fusion to significantly enhance the IDS performance was introduced and implemented in this work. To this end, the decisions of various IDSs were combined with weights derived using a machine learning approach. This architecture is different from conventional fusion architectures and guarantees improved performance in terms of detection rate and false alarm rate, works well even for large data sets, is capable of identifying novel attacks since the rules are dynamically updated, and has improved scalability.
6.3.1 Motivation
After the 1998 DARPA IDS evaluation, the MIT Lincoln Laboratory reported that if the best performing systems against each one of the different categories of attacks were combined into a single system, then roughly 60 to 70 percent of the attacks would have been detected with a false positive rate lower than 0.01%, i.e., fewer than 10 false positives a day. However, none of the previous work on sensor fusion in IDS has reached the Lincoln Laboratory prediction. None of these approaches can avoid the effect of the systematic errors of the individual IDSs, and they are also prone to mistakes due to the unrealistic confidence of certain IDSs. The availability of a large volume of experimental data has motivated us to use machine learning concepts to fuse the data. The individual weights of the IDSs can be obtained by learning the behavior of the various IDSs for different attack classes, and these weighted decisions can be combined in efficient ways.

6.3.2 Data-dependent decision fusion architecture


This section introduces a better architecture which explicitly incorporates data dependence in the fusion technique. The disadvantage of the commonly used fusion techniques, which are either implicitly data-dependent or data-independent, is due to the unrealistic confidence of certain IDSs. The idea in the proposed architecture is to properly analyze the data and understand when the individual IDSs fail. The fusion unit should incorporate this learning from the input as well as from the output of the detectors to make an appropriate decision. We
Figure 6.1: Data-dependent Decision fusion architecture (the input $x$ feeds $IDS_1, IDS_2, \ldots, IDS_n$; their decisions $s_1, s_2, \ldots, s_n$, together with the input, feed a neural network learner that supplies the weights $w_1, w_2, \ldots, w_n$ to the fusion unit, which produces the output $y$)

proposed a three-stage architecture, with optimization of the individual IDSs as the first stage, the neural network learner determining the weights of the individual IDSs as the second stage, and the fusion unit performing the weighted aggregation as the final stage. The neural network learner can be considered a pre-processing stage to the fusion unit. A neural network is most appropriate for weight determination, since it becomes difficult to define the rules clearly as more IDSs are added to the fusion unit. When a record is correctly classified by one or more detectors, the neural network accumulates this knowledge as a weight, and with more iterations the weight gets stabilized. The architecture is independent of the data set and the structures employed, and can be used with any real-valued data set, which is not the case with rule-based aggregation. Thus, it is reasonable to make use of the neural network learner unit to understand the performance of, and to assign weights to, the various individual IDSs in the case of a large data set. The neural network has the capability to generalize from past observed behavior to identify novel attack inputs and hence to give the proper weighting to the individual IDSs.
The weight assigned to any IDS depends not only on the output of that IDS, as in the case of probability theory or the Dempster-Shafer theory, but also on the input traffic which causes this output. The neural network unit is fed with the output of the IDSs along with the respective input, for an in-depth understanding of the reliability estimation of the IDSs. The alarms produced by the
different IDSs when they are presented with a certain attack clearly indicate which sensor generated the more precise result. The output of the neural network unit corresponds to the weights which are assigned to each one of the individual IDSs. The IDSs can then be fused to produce an improved resultant output with these improved weight factors. Thus, the proposed architecture refers to a collection of diverse IDSs that respond to an input traffic, and the weighted combination of their predictions. The weights are learned by looking at the response of the individual sensors for every input traffic connection. The fusion output can be represented as:

$$y = F_j\big(w_i^j(x, s_i), s_i\big)$$

where the weights $w_i^j$ are dependent on both the input $x$ and the individual IDS outputs $s_i$; the subscript $i$ refers to the IDS index and the superscript $j$ refers to the class label. The fusion unit gives a value of one or zero depending on whether the weighted aggregation of the decisions of the individual IDSs is higher or lower than the set threshold.
In the case of the intrusion detectors ALAD and PHAD, the training is done by considering more of the data, and at the same time optimally, which will likely decrease the bias of the individual detectors. The individual IDSs chosen thus have low bias, comparatively high variance (which gets reduced on fusion), and a low error correlation (i.e., they make different errors, or have a high variance component of error). Hence the proposed data-dependent architecture allows the IDSs to develop diversity while being trained. What is required of the final fusion unit is that it generalizes well after training (reduced bias) on an unexpected traffic stream and additionally avoids over-fitting, which ensures variance reduction. Thus the proposed architecture exploits the experimental observation made in the work of Giacinto et al. [95] that training is done in different feature subspaces. The test has been conducted on the entire test set, and then the evidence is combined to produce the final decision. The neural network learner was introduced to process the entire available feature set, to extract more effective signatures than the ones hand-coded by the rule-based fusion. The algorithm of the proposed data-dependent architecture is given in section 6.4.3.


6.3.3 Detection of rarer attacks

In most of the available literature, the imbalance in the network data is overcome by resampling the training distribution. Resampling is commented upon here, both in general and in particular for the experiment conducted in this thesis, in the following manner. There is no point in reducing the normal data in the training data set, since this data set is an expected replication of the real-world data and has a distribution more or less like the naturally occurring class distribution. Additionally, changing the data distribution complicates the analysis of an IDS, because it will result in the IDSs behaving in unexpected or unpredictable ways. Varying the size of the training set affects the accuracy of the IDS in predicting the class of test samples that belong to each of these classes. Hence, it is a good idea to learn from the data in the same form as it is available. In the case of anomaly detectors learning from the normal data, the more the normal data, the more efficient they are in detecting the attacks; hence, while using anomaly detectors that learn from the normal traffic, the normal samples in the data should not be reduced. Also, the base-rate fallacy is not a factor that can be avoided. The only counter-measure is to set the acceptable false alarm rate to be extremely low, almost as low as the prior probability, as established in chapter 2.
Several of the detection algorithms present results with a high detection rate and a low false positive rate without considering the real impact of the huge number of false alerts generated due to the skewness in network traffic. Since one of the main goals of this work is to prevent the misinterpretation of the metrics used, a good estimate for a low false alarm rate is to set it almost equal to the prior probability of attack; in that case the precision approximates 0.5, assuming the detection rate to be very high. The new evaluation criterion is defined as:

$$\max \; TP_{rate} \quad \text{s.t.} \quad precision \geq p_{min}, \; (FP_{rate}, TP_{rate}) \in \text{ROC space}$$

The ROC space is defined by $FP_{rate}$ and $TP_{rate}$ as the $x$ and $y$ axes respectively, and depicts the relative trade-offs between the $TP$s and the $FP$s.
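A minimal sketch of this criterion follows: it scans hypothetical ROC operating points, converts each to a precision at the given base rate, and reports the best $TP_{rate}$ among the points meeting the precision constraint. All numbers are illustrative.

```python
# Sketch of the evaluation criterion: max TP rate subject to precision >= p_min.
def best_tp_rate(points, base_rate, p_min=0.5):
    """points: iterable of (fp_rate, tp_rate) pairs in ROC space."""
    best = None
    for fp, tp in points:
        # precision derived from the rates via Bayes' rule at the given base rate
        denom = tp * base_rate + fp * (1 - base_rate)
        precision = (tp * base_rate / denom) if denom > 0 else 0.0
        if precision >= p_min and (best is None or tp > best):
            best = tp
    return best

roc = [(0.00001, 0.90), (0.0000225, 0.95), (0.0002, 0.99)]  # hypothetical points
print(best_tp_rate(roc, base_rate=0.0000225))               # -> 0.9
```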


6.4 Results and discussion

6.4.1 Test setup
The weight analysis of the IDS data coming from the three IDSs, PHAD, ALAD, and Snort, was carried out by the neural network supervised learner before being fed to the fusion element. The detectors PHAD and ALAD produce an IP address along with an anomaly score, whereas Snort produces an IP address along with a severity score for the alert. The alerts produced by these IDSs are converted to a standard binary form. The neural network learner takes these decisions as input, along with the particular traffic input which was monitored by the IDSs. The traffic input to the neural network was in terms of the connection-based features, namely source IP, destination IP, source port, destination port, transport layer protocol, session duration, bytes exchanged, and the throughput of the session.
The neural network learner was designed as a feed-forward back propagation network with a single hidden layer of 25 sigmoidal hidden units. Experimental evidence exists that the neural network performs best with the number of hidden units set to $\log(T)$, where $T$ is the number of training samples in the data set [111]. In order to train the neural network, it is necessary to expose it to both normal and anomalous data. Hence, during the training, the network was exposed to weeks 1, 2, and 3 of the training data, and the weights were adjusted using the back propagation algorithm. An epoch of training consisted of one pass over the training data. The training proceeded until the total error made during each epoch stopped decreasing, or 1000 epochs had been reached.
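For illustration only, a rough equivalent of this learner can be sketched with scikit-learn in place of the MATLAB toolbox used in this work; the feature and target arrays here are synthetic stand-ins, not the DARPA data.

```python
# Sketch of the learner: one hidden layer of 25 sigmoidal units,
# trained by backpropagation for at most 1000 epochs.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(2)
X = rng.random((500, 11))  # stand-in: 8 connection features + 3 IDS decisions
W = rng.random((500, 3))   # stand-in targets: per-IDS weights

learner = MLPRegressor(hidden_layer_sizes=(25,), activation="logistic",
                       solver="sgd", max_iter=1000, random_state=0)
learner.fit(X, W)                   # training stops when error stops improving
weights = learner.predict(X[:1])    # per-IDS weight estimates for one record
print(weights)
```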
The fusion unit performed the weighted aggregation of the IDS outputs for the purpose of identifying the attacks in the test data set. It used binary fusion, giving an output value of 1 or 0 depending on the value of the weighted aggregation of the decisions from the IDSs. The packets were identified by their timestamp on aggregation. A value of 1 at the output of the fusion unit indicated that the record was under attack, and a 0 indicated the absence of an attack.


6.4.2 Data set

All the intrusion detection systems that form part of the fusion IDS were separately evaluated with the same two data sets, namely 1) the real-world network traffic and 2) the DARPA 1999 data set, as discussed in chapter 5. The empirical evaluation of the data-dependent decision fusion method was also carried out with the same two data sets.
With the majority of the IDSs, it was observed that probes and DoS attacks have high detection rates, whereas attacks like R2L and U2R have lower detection rates. The reason is evident from the training data of week two, which had an appreciably high proportion of probes and DoS attacks. Also, the R2L and the U2R attacks in the training and testing data sets represent dissimilar target hypotheses. ALAD and PHAD being anomaly detectors, the evaluations avoid this biasing by training on the attack-free training data of week one and week three respectively. Since the third IDS, Snort, performs misuse detection, it too was unaffected by this disproportionate R2L and U2R traffic in the training and test data. It is important to mention at this point that the proposed architecture can be generalized beyond the data set or the IDSs that were used: the proposed method is independent of the input traffic and of the individual IDSs that take part in the fusion.
6.4.3 Data-dependent decision fusion algorithm
Training of IDSs
Input:
• The DARPA 1999 training data set $\{(x_n, y_n)\}$, where $n$ refers to the number of the record in the data set.
• The two anomaly IDSs are trained, ALAD on week one and PHAD on week three.
Testing of IDSs
Input:


• The DARPA 1999 test data set $\{(x_j)\}$, where $j$ refers to the number of the record in the data set.
• The testing of the three IDSs is done on the test data of weeks four and five.
Output:
• The IDS outputs $\{s_i\}$, where $i$ corresponds to the IDS index.
Training of Neural Network learner
Input:
• The IDS outputs $\{s_i\}$, where $i$ corresponds to the IDS index.
• The DARPA training data set $\{(x_n, y_n)\}$, where $n$ refers to the number of the record in the data set.
The IDS outputs as well as the training class labels are such that $s_i, y_i \in C_k$, where the $C_k$ are the 58 class labels and $k$ varies from 1 to 58. With the IDSs used in this experiment, this was simplified to a binary detector with class labels either zero or one, depending on the anomaly score of the anomaly detectors or the severity of the Snort alert.
Training:
The MATLAB Neural Network toolbox is used.
Algorithm: Feed Forward Back Propagation
: Four input neurons
: One hidden layer with 25 sigmoidal units
: Output layer of three neurons
The three inputs correspond to the outputs of the three constituent IDSs, and the fourth input neuron is a vector which corresponds to a single record of the DARPA data set, over which all values of the vector are run.


Testing of the Neural Network learner


Input:
• The IDS outputs $\{s_i\}$, where $i$ corresponds to the IDS index.
• The DARPA test data set $\{(x_j)\}$, where $j$ refers to the number of the record in the data set.
The IDS outputs are such that $s_i \in C_k$, where the $C_k$ are the 58 class labels and $k$ varies from 1 to 58. With the IDSs used in this experiment, this was simplified to a binary detector with class labels either zero or one, depending on the anomaly score of the anomaly detectors or the severity of the Snort alert.

Testing:
The MATLAB Neural Network toolbox was used.
Algorithm: Feed Forward Network
: Four input neurons
: One hidden layer with 25 sigmoidal units
: Output layer of three neurons
Output:
• $\{w_i^j\}$ is the output of the NN learner, which is expected to give a measure of the reliability of each IDS $i$, depending on the observed data class type $j$.
Fusion Unit
Input:
• The IDS outputs $\{s_i\}$, where $i$ corresponds to the IDS index.


Table 6.2: Types of attacks detected by PHAD at a false positive rate of 0.00002 (100 FPs)

Attack Type | Total attacks | Attacks detected | % detection
Probe       | 37            | 26               | 70%
DoS         | 63            | 27               | 43%
R2L         | 53            | 6                | 11%
U2R/Data    | 37            | 4                | 11%
Total       | 190           | 63               | 33%

• The weight factors for the fusion process, $\{w_i^j\}$, which are the output of the NN learner and give a measure of the reliability of each IDS $i$, depending on the observed data class type $j$.
Output:
• The binary fusion output, which is one if the weighted linear aggregation of the outputs from all the IDSs is greater than zero, and zero otherwise:

$$y = \begin{cases} 1, & \text{if } \sum_i w_i^j s_i > 0 \\ 0, & \text{otherwise} \end{cases}$$
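A minimal sketch of this decision step, with hypothetical learner-supplied weights, is given below.

```python
# Sketch of the fusion unit: weighted linear aggregation of binary decisions.
def fuse(weights, decisions):
    """Return 1 (attack) if the weighted aggregation is positive, else 0."""
    return int(sum(w * s for w, s in zip(weights, decisions)) > 0)

print(fuse([0.7, 0.5, -0.2], [1, 0, 1]))   # -> 1 (0.7 - 0.2 > 0)
print(fuse([0.1, 0.2, 0.3], [0, 0, 0]))    # -> 0 (no detector fired)
```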

6.4.4 Experimental evaluation

All the IDSs that take part in the fusion were modified and separately evaluated with the same data set, and then the empirical evaluation of the proposed data-dependent decision fusion method was also carried out. It can be observed from Tables 6.2, 6.3 and 6.4 that the attacks detected by the different IDSs were not necessarily the same, and also that no single IDS was able to provide acceptable values for all the performance measures. A quantitative analysis provides the correlation coefficients among the different sensors as follows:
Correlation coefficient of PHAD and ALAD: -0.36
Correlation coefficient of PHAD and Snort: -0.42
Correlation coefficient of ALAD and Snort: 0.59
The results as seen from Table 6.5 and Figure 6.2 support the validity of the proposed approach compared to the various existing fusion methods for IDSs. The results in Table 6.6 show that the accuracy and AUC were heavily biased to favor the majority class. The ROC semilog curves of the individual


Table 6.3: Types of attacks detected by ALAD at a false positive rate of 0.00002 (100 FPs)

Attack Type | Total attacks | Attacks detected | % detection
Probe       | 37            | 9                | 24%
DoS         | 63            | 23               | 37%
R2L         | 53            | 31               | 59%
U2R/Data    | 37            | 15               | 31%
Total       | 190           | 78               | 41%
Table 6.4: Types of attacks detected by Snort at a false positive rate of 0.0002 (1000 FPs)

Attack Type | Total attacks | Attacks detected | % detection
Probe       | 37            | 15               | 41%
DoS         | 63            | 35               | 56%
R2L         | 53            | 30               | 57%
U2R/Data    | 37            | 34               | 92%
Total       | 190           | 115              | 61%
Table 6.5: Types of attacks detected by the DD fusion IDS at a false positive rate of 0.00002 (100 FPs)

Attack Type | Total attacks | Attacks detected | % detection
Probe       | 37            | 28               | 76%
DoS         | 63            | 40               | 64%
R2L         | 53            | 34               | 64%
U2R/Data    | 37            | 34               | 92%
Total       | 190           | 136              | 70%
Table 6.6: Comparison of the evaluated IDSs with various evaluation metrics

Detector              | Precision | Recall | F-score | Accuracy | AUC
PHAD                  | 0.39      | 0.33   | 0.36    | 0.99     | 0.66
ALAD                  | 0.44      | 0.41   | 0.42    | 0.99     | 0.71
Snort                 | 0.10      | 0.61   | 0.17    | 0.99     | 0.80
Data-dependent fusion | 0.42      | 0.70   | 0.53    | 0.99     | 0.85
Table 6.7: Detection of different attack types by single IDSs and the data-dependent decision fusion IDS

Attack Type      | PHAD   | ALAD   | Snort | Data-dependent decision fusion
Probe            | 70%    | 24%    | 41%   | 76%
DoS              | 43%    | 37%    | 56%   | 64%
R2L              | 11%    | 59%    | 57%   | 64%
U2R/Data         | 11%    | 31%    | 92%   | 92%
False Positive % | 0.002% | 0.002% | 0.02% | 0.002%


IDSs and the DD fusion IDS are given in Figure 6.3. A log scale was used for the x-axis to separate the points which would otherwise be crowded near the y-axis. The results presented in Table 6.7 and Figure 6.4 indicate that the DD fusion method performs significantly better, with high recall as well as high precision, as against achieving high accuracy alone, using the DARPA data set. In the case of an IDS, there are both security requirements and acceptability requirements: the security requirement is determined by the $TP_{rate}$, and the acceptability requirement is decided by the number of $FP$s, because of the low base rate of network traffic. The hypothesis that the proposed model is suitable for the detection of rare classes of attacks is empirically evaluated in the next section using the DARPA 1999 data set. It may be noted that the false positive rates differ in the case of Snort, as it was extremely difficult to attempt a fair comparison with equal false positive rates for all the IDSs, given the unacceptable ranges of the detection rate under such circumstances.
Figure 6.2: Performance of evaluated systems

Figure 6.3: Semilog ROC curve of single and DD fusion IDSs


(The plot shows the true positive rate against the false positive rate on a log scale, with curves for PHAD, ALAD, Snort, and the DD fusion IDS.)


Figure 6.4: Comparison of evaluated systems

Figure 6.5: Detection of Attack Types


Table 6.8: Comparison of the evaluated IDSs using the real-world data set

Detector/Fusion Type           | Total Attacks | TP | FP  | Precision | Recall | F-score
PHAD                           | 45 | 11 | 45  | 0.20 | 0.24 | 0.22
ALAD                           | 45 | 20 | 45  | 0.31 | 0.44 | 0.36
Snort                          | 45 | 13 | 400 | 0.03 | 0.29 | 0.05
OR                             | 45 | 34 | 470 | 0.07 | 0.76 | 0.13
AND                            | 45 | 9  | 29  | 0.22 | 0.20 | 0.22
SVM                            | 45 | 23 | 44  | 0.24 | 0.51 | 0.33
ANN                            | 45 | 25 | 94  | 0.21 | 0.56 | 0.31
Data-dependent Decision Fusion | 45 | 27 | 42  | 0.39 | 0.60 | 0.47
Table 6.9: Performance comparison of single IDSs and the DD fusion IDS

Z-number | DD fusion and PHAD | DD fusion and ALAD | DD fusion and Snort
Z_R      | 7.2                | 5.7                | 1.9
Z_P      | 0.3                | -0.2               | 4.1

Table 6.8 demonstrates that the DD fusion method outperforms other existing fusion techniques such as OR, AND, SVM, and ANN on the real-world network traffic.
The comparison of IDSs with the F-score metric has the limitation that tests of significance cannot be directly applied to it in order to determine the confidence level of the comparison. The primary goal of this work is to achieve an improvement in recall as well as precision for the rare classes. Hence, an improved IDS comparison test called the P-test [110], which takes into account the improvement in both recall and precision, was included. The result in Table 6.9 shows that the DD fusion performs significantly better than any of the individual IDSs: it performs better than PHAD and ALAD in terms of recall, is comparable to PHAD and ALAD in terms of precision, and works exceptionally better than Snort in terms of both precision and recall. Hence the proposed approach outperforms the existing state-of-the-art techniques of its class for an optimum performance in terms of both recall and precision.


In a real-world network environment, rare attacks like U2R and R2L are more dangerous than probe and DoS attacks. Hence, it is essential to improve the detection performance for these rare classes of attacks while maintaining a reasonable overall detection rate. The results presented in Table 6.7 and Figure 6.4 indicate that the proposed method performs significantly better for rare attack types, with a high recall as well as a high precision, as against achieving high accuracy alone. The claim that the proposed method performs better is supported by a statement from Kubat et al. [1998], which notes that a classifier that labels all regions as the majority class will achieve an accuracy of 96%, while a system achieving 94% on the minority class and 94% on the majority class will have worse accuracy yet be deemed highly successful. With the proposed method, an intrusion detection rate of 70% with a false positive rate as low as 0.002% has been achieved, and the F-score has been improved to 0.53.
6.4.5 Discussion
Most of the U2R attacks like loadmodule, perl and sqlattack are made stealthy by running the attack over multiple sessions. These attacks are detected by Snort, but at the expense of a higher false positive rate. Snort was comparably better in the detection of all the new attacks like queso, arppoison, dosnuke, selfping, tcpreset, ncftp, netbus, netcat, sshtrojan, ntfsdos, and sechole, for which Snort had the signatures available. Snort identifies the attack warezclient, which downloads illegal copies of software, by the addition of a rule looking for executable code on the FTP port. Thus each rule in the rule set uses the most discriminating feature values for classifying a data item into one of the class types.
Although the research discussed in this thesis has thus far focused on the three IDSs PHAD, ALAD and Snort, the algorithm works well with any IDS. The proposed system provides great benefit to a security analyst. The computational complexity introduced by the proposed method can be justified by the gains illustrated above. The result of the data-dependent decision fusion method is better than what was predicted by the Lincoln Laboratory after the DARPA IDS evaluation.


With the fusion architecture proposed in this chapter, an improved intrusion detection rate of 70%, with a false positive rate as low as 0.002% and an F-score of 0.53, was achieved.

6.5 Summary
We have adapted and extended notions from the field of multisensor data fusion for rule-based fusion and data-dependent decision fusion. An enhancement in the performance of the combined detector using simple rule-based fusion is demonstrated in the initial part of this chapter, with the fusion making use of the objective certainty of a hypothesis given a particular sensor as the component in fusion. The extensions are principally in the area of generalizing feature similarity functions to comprehend observances in the intrusion detection domain. The approach has the ability to fuse decisions from multiple, heterogeneous and sub-optimal IDSs.
In the proposed data-dependent decision fusion architecture, a neural network unit was used to generate a weight factor depending on the input as well as the IDSs' outputs. The method assigns appropriate weights to the various individual IDSs that take part in fusion. This results in a more accurate and precise detection for a wider class of attacks. If the individual sensors are complementary and look at different regions of the attack domain, then DD fusion enriches the analysis of the incoming traffic to detect attacks with appreciably low false alarms. The individual IDSs that are components of this architecture in this particular work were PHAD, ALAD and Snort, with detection rates of 0.33, 0.41 and 0.61 respectively after modifications to these IDSs. The false positive rates of PHAD and ALAD were acceptable, whereas that of Snort was exceptionally high. The results obtained by the proposed architecture illustrate that the DD approach improves on the existing fusion approaches, with the best performance in terms of accuracy. The marginal increase in the computational requirement introduced by the data-dependency can be justified by the acceptable range of false alarms and an overall detection rate of 0.7, obtained with an exceptionally large data set and suboptimal constituent IDSs.
It is also shown that our technique is more flexible and outperforms other existing fusion techniques such as OR, AND, SVM, and ANN. The experimental comparison of the DD fusion method using real-world traffic has confirmed its usefulness and significance.
The data skewness in the network traffic demands an extremely low false positive rate, of the order of the prior probability of attack, for an acceptable value of the Bayesian attack detection rate. The research and development efforts in the field of IDS, and the state-of-the-art IDSs, still have marginal detection rates and high false positive rates, especially in the case of stealthy, novel and R2L attacks. In the environment in which an IDS is expected to operate, attacks are the minority, requiring very low false positive rates for acceptable detection. Basic domain knowledge about network intrusions tells us that U2R and R2L attacks are intrinsically rare. The poor performance of the detectors has been improved by discriminative training of the anomaly detectors and by incorporating additional rules into the misuse detector. This chapter proposes a new machine learning approach where the corresponding learning problem is characterized by a number of features, skewness in the data, the class of interest being the minority class and the minority attack type, and a non-uniform misclassification cost. The proposed method has successfully demonstrated that the neural network learner encapsulates expert knowledge for the weighted fusion of individual detector decisions. This creates an adaptable algorithm that can substantially outperform state-of-the-art methods for minority class detection in both coverage and precision. The evaluations show the strength and ability of the proposed approach to perform very well, with 64% detection for R2L attacks and 92% detection for U2R attacks at an overall false positive rate of 0.002%. The experimental comparison of this method has confirmed its usefulness and significance.

Chapter 7
Modified Dempster-Shafer Theory for
Intrusion Detection Systems
A consensus means that everyone agrees to say collectively what no one believes individually.
Abba Eban

7.1 Introduction
Sensor fusion using heterogeneous IDSs is employed to aggregate different
views of the same event. This helps in achieving improved detection through
detector reinforcement or complementarity. There is a factor of uncertainty in
the results of most of the IDSs available in literature. The main reasons for uncertainty are vagueness and imprecision. One of the techniques of sensor fusion
is the Dempster-Shafer evidence theory [112, 113, 114], which can be used to
characterize and model various forms of uncertainty. In DS theory, evidence
can be associated with multiple possible events, e.g., sets of events. As a result,
evidence in DS theory can be meaningful at a higher level of abstraction without
having to resort to assumptions about the events within the evidential set.
The use of data fusion in the field of DoS anomaly detection is presented by
Siaterlis and Maglaris [79]. The Dempster-Shafer theory of evidence is used
as the mathematical foundation for the development of a novel DoS detection
engine. The detection engine is evaluated using real network traffic. The
superiority of data fusion technology applied to intrusion detection systems is

presented in the work of Wang et al. [96]. Their method collects information from the network and host agents and applies the Dempster-Shafer theory of evidence. Another work incorporating the Dempster-Shafer theory of evidence is by Hu et al. [97]. The Dempster-Shafer theory of evidence in data fusion is observed to solve the problem of how to analyze uncertainty in a quantitative way. In their evaluation, the ingoing/outgoing traffic ratio and the service rate are selected as the detection metrics, and prior knowledge in the DDoS domain is used to assign probability to evidence.
The most prominent of the alternative combination rules to the Dempster-Shafer
method is a class of unbiased operators developed by Ron Yager [115]. Yager
points out that an important feature of combination rules is the ability to update
an already combined structure when new information becomes available. This
is frequently referred to as updating and the algebraic property that facilitates
this is associativity. Dempster's rule is an example of an associative combination
operation and the order of the information does not impact the resulting fused
structure. Yager [116] points out that in many cases a non-associative operator is
necessary for combination. A familiar example of this is the arithmetic average.
The arithmetic average is not itself associative, i.e., one cannot update the information by averaging an average of a given body of data and a new data point to
yield a meaningful result. However, the arithmetic average can be updated by
adding the new data point to the sum of the pre-existing data points and dividing by the total number of data points. This is the concept of a quasi-associative
operator that Yager introduced in his work [116]. Quasi-associativity means that
the operator can be broken down into associative sub-operations. Through the
notion of quasi-associative operator, Yager develops a general framework to
look at combination rules where associative operators are a proper subset.
The Transferable Belief Model (TBM) is an elaboration on the Dempster-Shafer
theory of evidence developed by Smets [117], based on the intuition that in the
case of conflict, the result should allocate most of the belief weight to the empty
set. Technically, this would be done by using the TBM conjunction rule for
non-interactive sources of information, which is the same as Dempster's rule
of combination without renormalization. While most other theories adhere to


the axiom that the probability (or belief mass) of the empty set is always zero,
Smets has an intuitive reason to drop this axiom: the open-world assumption. It
applies when the frame of reference is not exhaustive, when there are reasons
to believe that an event not described in the frame of reference will occur.
Murphy [118] presents another problem of the classical Dempster's combination rule: its failure to balance multiple bodies of evidence. Averaging best solves the normalization problems and has many attractive features, such as identifying combination problems, showing the distribution of the beliefs and preserving a record of ignorance. However, averaging does not offer convergence toward certainty. The fusion technique proposed in this chapter is expected to combine the output of various IDSs with subjective judgements. The feasibility of this idea has been demonstrated via an analysis case study with several IDSs distributed over a LAN and using the replayed DARPA data set. The resulting sensor fusion architecture is expected to face the new challenges in sensor fusion, and can later be generalized beyond any specific application.
This chapter is organized as follows: In section 7.2, we briefly recall the Dempster-Shafer theory of evidence, its weighted extension and also its disadvantages. Section 7.3 illustrates the disjunctive combination of evidence, which helps in evidence aggregation. Section 7.4 discusses the modified evidence approach, with a more detailed observation on the performance of this approach; a general discussion on the proposed approach for the particular application of sensor fusion in intrusion detection is also included. Section 7.5 presents the experimental evaluation with a brief discussion on the impact of this work. Section 7.6 summarizes the chapter.

7.2 Dempster-Shafer combination method


The Dempster-Shafer theory is explained in detail in chapter 4. In this chapter we formalize the problem as follows: Considering the DARPA data set, assume a traffic space Θ = {DoS, Portsweep, R2L, U2R, Normal} of five mutually exclusive classes. Each IDS assigns to a traffic sample x the detection of the class from which it is deemed to come, an element of the FoD, Θ. With n IDSs used for the combination, the decision of each one of the IDSs is considered for the final decision of the fusion IDS.
7.2.1 Motivation for choosing the Dempster-Shafer combination method
Even though the research work discussed in chapter 5 gave encouraging results, we realized that there was no possibility of detecting novel attacks, because of the difficulty in generalizing from any previously observed behavior. As a result, we pursued further by using a neural network learner that understands the reliability of each of the IDSs corresponding to the data, accordingly provides a weight to every IDS decision, and then makes use of an appropriate fusion operator. Specifically, we are interested in the capability of the neural network to learn the confidence to be assigned to every IDS, and then in the fusion unit that optimally fuses the IDSs.
This thesis presents a method to detect traffic attacks with an increased degree of confidence by a fusion system composed of different detectors. Each IDS observes the same network traffic from various points on the network and detects the attack traffic with an uncertainty index. The frame of discernment consists of singletons that are exclusive (Aᵢ ∩ Aⱼ = ∅, i ≠ j) and exhaustive, since the FoD consists of all the expected attacks which the individual IDSs detect; otherwise a detector fails to detect an attack by recognizing it as normal traffic. The DS rule corresponds to a conjunction operator: it builds the belief induced by accepting two pieces of evidence, i.e., by accepting their conjunction. Shafer developed the DS theory of evidence based on the model that all the hypotheses in the FoD are exclusive and that the frame is exhaustive. The purpose is to combine/aggregate several independent and equi-reliable sources of evidence expressing their belief on the set. The DS combination rule gives


the combined mass of two pieces of evidence, m₁ and m₂, on any subset A of the FoD:

$$m(A) = \frac{\sum_{X \cap Y = A} m_1(X)\, m_2(Y)}{1 - \sum_{X \cap Y = \emptyset} m_1(X)\, m_2(Y)} \qquad (7.1)$$
The denominator of equation 7.1 is of the form 1 − k, where k is the conflict between the two pieces of evidence. This denominator performs normalization, which spreads the resultant uncertainty of any evidence, with a weight factor, over all focal elements, and results in an intuitive decision. Thus, the effect of normalization consists of eliminating the conflicting pieces of information between the two sources to be combined, consistently with the intersection operator. Whether normalized or not, the DS method satisfies the two axioms of combination: 0 ≤ m(A) ≤ 1 and Σ_{A⊆Θ} m(A) = 1. The third axiom, m(∅) = 0, is not satisfied by the unnormalized DS method. Also, independence of evidence is yet another requirement for the DS combination method. The classical DS theory treats all sensors democratically, but this is not realistic, since some are more precise and accurate than others. Hence we have the weighted Dempster-Shafer method. In simplified situations this weight factor matches the prior probability in the classical Bayesian inference method. The weighted and extended DS can be used to:
realize a differential trust scheme on the sensors;
mitigate conflicts that cause counter-intuitive results under the classical DS evidence combination rule.
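As a concrete illustration of equation 7.1, the following minimal Python sketch implements the conjunctive combination; the data structure (masses as dictionaries over frozensets of hypotheses) and the function names are ours, not the thesis's implementation. Applied to the conflicting evidence of Table 7.2 below, it reproduces the counter-intuitive DS result discussed in section 7.2.2.

```python
from itertools import product

def dempster_combine(m1, m2):
    """Dempster's rule (equation 7.1): conjunctive combination of two
    basic probability assignments, each a dict {frozenset: mass}."""
    combined, conflict = {}, 0.0
    for (x, mx), (y, my) in product(m1.items(), m2.items()):
        inter = x & y
        if inter:
            combined[inter] = combined.get(inter, 0.0) + mx * my
        else:
            conflict += mx * my          # k: mass lost to the empty set
    if conflict == 1.0:
        raise ValueError("total conflict: Dempster's rule undefined")
    return {a: v / (1.0 - conflict) for a, v in combined.items()}

# The conflicting example of Table 7.2 (k = 0.99):
m1 = {frozenset("A"): 0.9, frozenset("B"): 0.1}
m2 = {frozenset("B"): 0.1, frozenset("C"): 0.9}
print(dempster_combine(m1, m2))   # m(B) ~ 1.0: the counter-intuitive result
```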
7.2.2 Limitations of the Dempster-Shafer combination
In the case of full contradiction between the bodies of evidence, k = 1; such a case occurs when there exists A such that Bel₁(A) = 1 and Bel₂(Ā) = 1, as in Table 7.1. The combined evidence is computed below for the DS method and for its alternatives: the Yager method [115, 116], Smets' TBM [117], and Murphy's averaging [118].


      A   B
m1    1   0
m2    0   1

Table 7.1: Evidence with total conflict

DS method: m(A) = 0, m(B) = 0
Yager method: m(A) = 0, m(B) = 0, m(Θ) = 1
Smets' TBM: m(A) = 0, m(B) = 0, m(∅) = 1
Murphy's averaging: m(A) = 0.5, m(B) = 0.5
One more case of contradiction between the bodies of evidence is shown with a different example in Table 7.2.

      A     B     C
m1    0.9   0.1   0
m2    0     0.1   0.9

Table 7.2: Evidence with conflict

DS method: m(A) = 0, m(B) = 1, m(C) = 0
Yager method: m(A) = 0, m(B) = 0.01, m(C) = 0, m(Θ) = 0.99
Smets' TBM: m(A) = 0, m(B) = 0.01, m(C) = 0, m(∅) = 0.99
Murphy's averaging: m(A) = 0.45, m(B) = 0.1, m(C) = 0.45
The conflict in evidence either gives non-intuitive results, as with DS, or is ported over to uncertainty, as in Yager, or to the null set, as in Smets' TBM, or is averaged, as in Murphy's. However, none of these seems intuitive or reasonable from the point of view of improving the belief. We conclude that, under conflict, a clear conclusion cannot be drawn in one step; the final decision has to be made after the collection of additional evidence. Hence we should not forcefully converge the conflicting evidence until we obtain more evidence. It is better to aggregate evidence with the union operator, without suppressing any of the available evidence, as happens in the case of DS.


             Normal   Probe   DoS   U2R   R2L
PHAD (m1)    0.4      0.6     0     0     0
ALAD (m2)    0.1      0.1     0     0     0.8
Snort (m3)   0        0.3     0     0     0.7

Table 7.3: Evidence from three sensors with one unreliable, using the DS method

Another major drawback of DS and its alternatives (except for Murphy's) is that, since they use conjunctive combination, if one or more sensors fail to give evidence on a particular class, the evidence from the other sensors on that class has no effect and the intersection becomes a null set. A sensor might fail to give evidence when it is not tuned for that particular class of attack, due to shortcomings of the technology used or for some other reason. This disadvantage is overcome by Murphy's averaging, but the result still looks counter-intuitive, since if one piece of evidence fails, the belief in that hypothesis gets weakened. DS combination with one of the sensors in the fusion being totally unreliable gives rise to counter-intuitive results, as illustrated with an example in Table 7.3.
DS method: m(Normal) = 0, m(Probe) = 1, m(DoS) = 0, m(U2R) = 0, m(R2L) = 0
Yager method: m(Normal) = 0, m(Probe) = 0.018, m(DoS) = 0, m(U2R) = 0, m(R2L) = 0, m(Θ) = 0.982
Smets' TBM: m(Normal) = 0, m(Probe) = 0.018, m(DoS) = 0, m(U2R) = 0, m(R2L) = 0, m(∅) = 0.982
Murphy's averaging: m(Normal) = 0.17, m(Probe) = 0.33, m(DoS) = 0, m(U2R) = 0, m(R2L) = 0.5
In this particular case, with the evidence m1 unreliable, the result turns out to be counter-intuitive when an equal weight factor is given to all the evidence. If a proper weight factor is given to the evidence depending on its reliability, Murphy's method gives acceptable results, as in the modified averaging method. We propose to use the union operator for aggregating the evidence in all the above cases where DS fails. This is because, in case of conflict, further evidence aids convergence. Also, in case of zero evidence from one or more of the sensors, the union operator works the same as the averaging operator, which is the best that can be hoped for. Thus, if the intersection of the evidence is not empty, the sources overlap and the combination rule can be intersection. Otherwise, at least one of the sources is necessarily wrong, and a more natural combination rule is union, which assumes only that not all of the sources are wrong.

7.3 Disjunctive combination of evidence


The union approach can be considered in situations where different IDSs are specialized in different types of attack detection and hence may not respond to a certain attack. Thus the union approach focuses on the best-case behavior of each IDS. The combination of IDSs is done so as to utilize the strength of each IDS. Since the hypothesis includes each IDS in the binary state of recognizing the traffic as a particular attack or as normal, all the hypotheses are singletons. Hence it is simpler than the generalized case, where the hypotheses may take any possible subset of the power set of the FoD. The equation for the mass of singletons is given by:
$$m(A) = \frac{\sum_{X = Y = A} \big(m_1(X) + m_2(Y)\big)}{\sum_{X = Y} \big(m_1(X) + m_2(Y)\big)} \qquad (7.2)$$

The numerator of the above equation is the disjunctive combination, and the final mass is calculated by normalization with respect to the entire power set 2^Θ, which is closed under union, intersection and complement and hence is a sigma algebra. This normalization allows the disjunctive combination equation to satisfy all the axioms of evidence theory.
Additionally, conflicts can be attributed to uncertainty, whereby an IDS cannot take the decision correctly and the collective information is ambiguous. Hence only if the reliability of the sensors is known can a conclusion be drawn by suppressing some evidence in favor of other evidence; otherwise it is better to aggregate all the evidence and combine it conjunctively with other evidence agreeable to the aggregated evidence. The normalization is postponed until the conjunctive combination is done, in order to converge the results.
The properties of associativity and commutativity are satisfied by a disjunctive combination if normalization is done only at the final combination stage. In the intrusion detection application, with singletons as the expected hypotheses, Bel(A ∪ B) = Bel(A) + Bel(B), which is the same as the Bayesian method, since in the case of singletons DS simplifies to the Bayesian approach. However, the advantage of evidence combination is that more evidence can be combined in a single step, without knowledge of the associated probability distribution.
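A minimal Python sketch of the disjunctive combination of equation 7.2 follows, assuming singleton hypotheses represented as plain class labels; normalization is applied only once, at the final stage, as required for quasi-associativity. The names and example masses are illustrative.

```python
def disjunctive_combine(masses):
    """Sketch of equation 7.2 for singleton hypotheses: aggregate evidence
    by summation (union operator); normalize only at the final stage."""
    classes = set().union(*masses)
    raw = {c: sum(m.get(c, 0.0) for m in masses) for c in classes}
    total = sum(raw.values())
    return {c: v / total for c, v in raw.items()} if total else raw

# Two IDSs, one of which reports zero evidence on R2L:
m1 = {"Probe": 0.6, "Normal": 0.4}
m2 = {"Probe": 0.1, "R2L": 0.8, "Normal": 0.1}
print(disjunctive_combine([m1, m2]))
# -> Probe 0.35, Normal 0.25, R2L 0.40: no evidence is suppressed.
```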

7.4 Context-dependent operator


The DS operator is the most acceptable, except for the two disadvantages highlighted in section 7.2.2. Hence we require a context-dependent operator for the decision fusion, which is supposed to utilize enough of the available information before making a final decision. This operator is expected to be:
Conjunctive, if the sources have very low conflict and all masses are non-zero. The fusion should then behave as a severe operator, where the common or redundant part gets chosen and the mass of the less certain information gets reduced.
Disjunctive, if the sources conflict and also when one or more beliefs happen to be zero.
Compromise or average, in case of partial conflict.


The context-dependent operator is expected to have an adaptive feature of combining the information related to one class in one way, and the information related to another class in another way.
The proposed hybrid operator works the same way as the DS operator, except in the case of conflict, when any belief mass happens to be zero, and when varying reliability needs to be introduced on the different sensors. The combined operator has a mass l, referring to the modified mass, since it can exceed one at intermediate stages due to para-consistency, and is given by:

$$l = \begin{cases} \dfrac{\sum_{i=1}^{n} w_i l_i}{1 - k} - \prod_{i=1}^{n} w_i l_i, & k \neq 0,\ l_i \neq 0 \\[1.5ex] \dfrac{\sum_{i=1}^{n} w_i l_i}{1 - k}, & k = 0 \ \text{or any} \ l_i = 0 \end{cases} \qquad (7.3)$$

where wi is the weight associated to each sensor, k is the conflict between the
combining sensors and li is the mass associated with each sensor. The conditions and requirements of using this operator are the following:
The proportionate sensor weighting factor is used since the intrusion detection systems used for the combination are binary in nature. The axiom of
combination; i.e.,

m(A) = 1 gets satisfied only when exponential

A
weighting factors are made use of.
The weights assigned to the sensors should add to one;
n
P
i.e.,
wi = 1.
i=1


The value of k lies between 0 and 1 and is the parameter that controls the degree of compensation between the intersection and union parts. The value of the conflict factor k between any two sensors can be calculated as the Euclidean distance between the two sensors' masses. k takes a value of zero if the sensors are in consensus and a non-zero value in case of conflict.
The method proposed with this operator gives the most intuitive result and works as follows (a code sketch of this procedure follows the list):
Disjunctive combination is done on diverse pairs of IDSs (averaging out without suppressing any evidence). Pair-wise disjunctive combination is done on all the IDSs which are not redundant (since it is intuitive to expect stronger evidence in the case of redundancy, rather than an averaging out which gives no additional support even though both IDSs support the hypothesis).
The results are then conjunctively combined, if not totally contradicting after the pair-wise aggregation, since at this stage suppression of evidence does not happen to a great extent, and the suppression of weak evidence helps in faster convergence.
In the case of redundancy, if we use disjunctive combination, it is required to work without normalization in all the intermediate combinations, for the sake of making the strong evidence still stronger.
In order to satisfy all the axioms of evidence theory, normalization is required at the final stage, so that all the masses of a particular piece of evidence sum to one.
It can be concluded that the proposed method combines IDSs reasonably well under all conditions, by disjunctively combining diverse or contradicting pairs and finally suppressing the weak hypotheses by a pair-wise conjunctive combination.
In the very specific case of binary evidence on singletons, it can be observed that no additional support results from the addition of redundant evidence. Hence, it is better to use disjunctive combination in all the intermediate combinations until the final step, where the conjunctive combination helps in a faster convergence.
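The sketch below illustrates the procedure just described, under simplifying assumptions: singleton hypotheses, and unit sensor weights (the wᵢ of equation 7.3 are dropped). On the evidence of Table 7.3, it reproduces the context-dependent result reported in section 7.4.1.

```python
import math

def conflict(m1, m2, classes):
    """Conflict factor k as the Euclidean distance between two bodies of
    evidence; zero when the sensors are in consensus."""
    return math.sqrt(sum((m1.get(c, 0.0) - m2.get(c, 0.0)) ** 2
                         for c in classes))

def union(m1, m2, classes):
    """Disjunctive aggregation; normalization is deliberately delayed."""
    return {c: m1.get(c, 0.0) + m2.get(c, 0.0) for c in classes}

def conjunction(m1, m2, classes):
    """Conjunctive (consensus) combination on singletons, normalized."""
    raw = {c: m1.get(c, 0.0) * m2.get(c, 0.0) for c in classes}
    total = sum(raw.values())
    return {c: round(v / total, 2) for c, v in raw.items()}

CLASSES = ("Normal", "Probe", "DoS", "U2R", "R2L")
phad  = {"Normal": 0.4, "Probe": 0.6}
alad  = {"Normal": 0.1, "Probe": 0.1, "R2L": 0.8}
snort = {"Probe": 0.3, "R2L": 0.7}

k = conflict(phad, alad, CLASSES)        # ~0.99: a strongly conflicting pair
aggregated = union(phad, alad, CLASSES)  # para-consistent: masses sum to 2.0
print(conjunction(aggregated, snort, CLASSES))
# -> Probe 0.27, R2L 0.73, all other classes 0
```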


7.4.1 Performance of the proposed combination operator


The proposed operator is:
the same as the DS (consensus) operator when the sources agree; this consensus operator cannot provide information from a set of measures among which one or more are zero;
the union operator in case of conflict and also when one or more masses are zero; the union operator is a para-consistent combination operator and hence the combined mass can exceed one.
The operator satisfies commutativity, continuity, monotonicity (after normalization) and quasi-associativity (if normalization is done only at the final stage of combination), but not idempotence (owing to the para-consistency of the union operator).
The reason for using the context-dependent operator is that the union operator is totally acceptable in case of diversity or conflict due to uncertainty. However, disjunction gives averaging with redundant observations. Even though it is reasonable to give an average value where an additional observation by one more IDS does not add to the belief, intuition suggests that some method which increases the belief of the strong hypothesis is required. The context-dependent operator subsumes the celebrated DS operator except for the cases of conflict and zero evidence, and hence all the axioms of DS theory hold for the context-dependent operator as well: Σ_{A⊆Θ} m(A) = 1, m(∅) = 0 and 0 ≤ m(A) ≤ 1. Also, independence of the combining sources is another assumption for the applicability of the combination.
Most of the commercial IDSs are signature-based, and fail to identify zero-day attacks when working with real-time traffic. In such cases, misclassification or a false negative (which is again a misclassification, since the FoD contains the hypothesis normal, which is an expected output of the IDS) is expected. Hence the combination operator can assume a closed-world assumption, as in the case of the DS method.


The same example used to illustrate the performance of the DS and other operators is taken again to illustrate the performance of the context-dependent operator; the evidence is repeated in Table 7.4.

             Normal   Probe   DoS   U2R   R2L
PHAD (m1)    0.4      0.6     0     0     0
ALAD (m2)    0.1      0.1     0     0     0.8
Snort (m3)   0        0.3     0     0     0.7

Table 7.4: Evidence from three sensors with one unreliable, using the context-dependent operator

Applying the context-dependent method, with aggregation of information from the conflicting evidence and subsequent convergence by the consensus operator, ((m1 ∪ m2) ∩ m3) gives:
m(Normal) = 0, m(Probe) = 0.27, m(DoS) = 0, m(U2R) = 0, m(R2L) = 0.73
The operator gives an additional advantage when a lower weighting factor is given to the first piece of evidence, since it is observed to conflict with the other two. However, the complexity of choosing a weight factor is very high. Hence, even though the operator equation incorporates the weight factor, we obtained good results even without applying it, which simplifies the method appreciably.
Weighted disjunctive combination

If an IDS is known to be more dependable than the rest of the IDSs in detecting a particular attack, then that IDS is weighted high for that particular class. Even in the worst case, with all other detectors unable to identify the particular attack and all of them recognizing it as something else, the correct detector, optimized for the particular attack, has a high weight factor, which makes the correct detection possible.
Even with the substitution of weight factors, it is necessary to delay the normalization until the final stage of disjunctive combination, in order to satisfy the axioms of evidence theory. Then all the advantages of disjunctive combination, like the associativity property, are also retained until the final stage of combination.
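A minimal sketch of this per-class weighting follows; the weight values are purely illustrative (in the DD architecture they would come from the neural network learner).

```python
def weighted_disjunctive(masses, weights):
    """Per-sensor weighted union, normalized only at the final stage.
    The weights are assumed to sum to one, as section 7.4 requires."""
    classes = set().union(*masses)
    raw = {c: sum(w * m.get(c, 0.0) for m, w in zip(masses, weights))
           for c in classes}
    total = sum(raw.values())
    return {c: v / total for c, v in raw.items()}

# Illustrative weights only: the third detector is trusted most for slow
# probes, so its lone alert survives two "Normal" verdicts (cf. the
# slow-scan experiment described below).
masses  = [{"Normal": 1.0}, {"Normal": 1.0}, {"Probe": 1.0}]
weights = [0.05, 0.05, 0.90]
print(weighted_disjunctive(masses, weights))  # -> Probe 0.9, Normal 0.1
```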


Consider an experiment of observing a slow scan in network traffic with three sensors: PHAD, ALAD and Snort. Snort responded with an alert, whereas the other two sensors could not detect the slow scan. We have made use of a data-dependent decision learner using a neural network to obtain the weight for each of the IDSs under various input traffic. The weights assigned by the neural network learner are fed to the fusion unit along with the output from the individual IDSs. The fusion is done by the proposed context-dependent operator, and the resultant assignment values corresponding to each of the hypotheses are as follows:
m(Probe) = 1, m(DoS) = 0, m(R2L) = 0, m(U2R) = 0, m(Normal) = 0
whereas the DS method of combination would have resulted in:
m(Probe) = 0, m(DoS) = 0, m(R2L) = 0, m(U2R) = 0, m(Normal) = 0.
Advantages of the proposed operator

This operator can combine evidence from two IDSs with different FoDs; the combined FoD is then the union of the FoDs.
The closure property is satisfied, so as to stay within a given mathematical framework.
This operator works under all conditions and states of the individual IDSs.
This operator has been developed quite intuitively and hence the result is most intuitive. The conjunctive operator is acceptable when all sources happen to be reliable and similar, whereas the union operator corresponds to data aggregation from sources of weaker reliability. Thus conjunctive combination makes sense when the mass distributions significantly overlap; if not, at least one of the sources is wrong and it is better to choose disjunctive combination. Also, our intuition was that a certain diversity among classifiers assures versatility, whereas a certain redundancy assures reliability.
The combination operation is simple and easy.
Chapter 7

148

The combination takes care that no information is unnecessarily suppressed,


but at the same time convergence is assured.
The non-idempotence property is counted as an advantage in sensor fusion,
since the same observation from two sensors should improve the belief in
that observation, rather than idempotence. Bel Bel 6= Bel; even though
Bel Bel will favour the same subsets as Bel but with, as it were, twice
the weight of evidence. If each source supports a hypothesis for independent reasons, it is natural to conclude that the hypothesis is supported
strongly since we have different reasons for considering it as such. Also,
adopting idempotence is a matter of context; acceptable when sensors are
homogeneous.
Other properties like commutative, continuity, and distributive properties
are satisfied.
Associativity is not absolutely required, since our combination algorithm
is not associative since it considers an ordering of sources. A weaker property such as quasi-associativity is often sufficient, if we delay the normalization till the end of the combination.
This operator subsumes the most celebrated DS method of evidence and
hence all axioms of DS theory of evidence is incorporated as it is.
This operator has the property of dipolarity, which means that the more
one proposition is supported by all the evidence, the more it can obtain
belief masses after combination. This property is seen to be satisfied by
DS operator also.
This operator is relatively tolerant of inaccurate, incomplete or inconsistent
evidence.
This operator aggregates the conflicting evidence and then as the next stage
comprehend the aggregated pairwise results for an improved sensitivity
and for a false alarm suppression.


Disadvantages of the proposed operator

The choice of the operator function depends on the context and hence has to be made carefully.
Sources need to be independent for the combination. Dempster's idea on the combination of independent sources of information can be stated as follows: suppose there are n pieces of evidence which are given in the form of n probability spaces (Θᵢ, Γᵢ, mᵢ), where Γᵢ is a subset of P(Θ), the power set of the FoD, each of which has a mapping relation with the same space S through a multivalued mapping. These n sources are independent; the explanation by Dempster is that opinions of different sensors based on overlapping experiences could not be regarded as independent sources, whereas different measurements by different observers on different equipment would often be regarded as independent.
For a parallel combination of any model, the basic requirement is that the combination should be associative.
7.4.2 Discussion
1. The selection of the IDSs has been done by choosing the sensors with minimum correlation among them (a small sketch of this bookkeeping appears after this list). The correlation coefficient ρₙ of the available sensors is given by the formula:
$$\rho_n = \frac{n\, N^f}{N - N^t - N^f + n\, N^f},$$
where n is the number of sensors, N is the number of experiments, N^f is the number of experiments where all classifiers fail to detect, and N^t is the number of experiments where all IDSs detect correctly. We refer to redundancy of classifiers when the correlation coefficient is one, similarity when the correlation coefficient is greater than 0.5, diversity of classifiers when the correlation coefficient is less than or equal to 0.5, and total contradiction when the correlation coefficient is zero.


2. It is quite intuitive that the fusion method should work with a minimum number of IDSs and still obtain the advantages of fusion, whatever fusion technique is used. Every good IDS, when merged, should improve the confidence of the existing evidence and thereby converge faster; i.e., the fusion method makes strong evidence stronger, so that confusion (uncertainty) is eliminated.
3. There are inherent advantages in using the best IDSs in the fusion scheme. This also makes sense intuitively in the case of the evidence theory of fusion: if one sensor gives its evidence and a second gives similar evidence, the belief is reinforced more strongly, while contradiction in the evidence reduces the belief.
4. The DS method of combination implicitly makes a closed-world assumption, i.e., the set of possible hypotheses is perfectly known. We assume that Θ represents a set of states that are mutually exclusive and exhaustive. In intrusion detection, when dealing with real-time traffic, Θ need not necessarily be exhaustive, since the traffic may contain many novel attacks not included in Θ. However, in such cases the intrusion detection systems may be unable to detect the attack, so it appears as normal, which is also a hypothesis included in the FoD. Hence an additional label denoting "none of the above" is not needed, because such attacks either get included in the hypothesis normal by the evidence, or else get misclassified as some other attack.
5. The sources are assumed to be independent in DS (m(A) = m1(A) · m2(A)). Even though we may use IDSs that are trained on the same training set, the two are independent of each other, and they in turn depend on the training data set. Decisions acquired from multiple IDSs are more likely to be independent when they look at entirely different features of the traffic. If instead the two IDSs make use of the same features for detection, the detectors may reach a consensus in their decisions, which is different from dependence between the sensors. When fusing by means of mathematical decision rules, it is necessary to have independent detectors, because this simplifies construction of the rule and enhances its efficiency.
6. It is important to note that when we apply the union operator (m(A) = m1(A) + m2(A)), we do not assume mutual exclusiveness. The event that one IDS alerts with {DoS} and the event that another IDS alerts with {DoS} are not mutually exclusive. The elements in the FoD are mutually exclusive, whereas the two sets which are the sample spaces of IDS1 and IDS2 are not. Since they are not mutually exclusive, the result of the union operator will be para-consistent. The union of the two events, IDS1 alerting DoS and IDS2 alerting DoS, denoted IDS1 ∪ IDS2, is the event containing all elements belonging to IDS1 alerting DoS, or to IDS2 alerting DoS, or to both. The idea is the aggregation of the support for the events so that the belief improves. At the same time, we have tried to provide disjoint training sets to the two anomaly-based IDSs, PHAD and ALAD, by training them on week one and week three of the DARPA99 data set respectively.
7. Shafer [112] describes the requirement that the different IDSs be independent and non-interacting, which is just that all their interaction should be in terms of the issues discerned by the FoD. That clearly says that the FoD should discern the interaction of the evidence (hence if we have the singletons A, B and C, then the power set of the FoD consists of the eight possibilities which give the total interaction within the FoD).
8. We concede that for highly conflicting cases, this method is the same as an averaging operator. However, we argue that the method still has considerable applicability, due to the weight factors and the data-dependency, which result in highly intuitive results.
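The bookkeeping behind the correlation-based selection of item 1 can be sketched as follows; the detection outcomes are hypothetical, and the coefficient follows the formula as reconstructed above.

```python
def correlation_coefficient(outcomes):
    """rho_n over N experiments, per the (reconstructed) formula of item 1.
    `outcomes` is a list of per-experiment tuples (True = IDS detected)."""
    n, N = len(outcomes[0]), len(outcomes)
    Nf = sum(1 for o in outcomes if not any(o))   # all IDSs fail together
    Nt = sum(1 for o in outcomes if all(o))       # all IDSs detect together
    denom = N - Nt - Nf + n * Nf
    return 1.0 if denom == 0 else n * Nf / denom  # full agreement -> 1

def diversity_class(rho):
    """Categories used in item 1 for choosing fusion candidates."""
    if rho == 1.0:
        return "redundant"
    if rho > 0.5:
        return "similar"
    if rho > 0.0:
        return "diverse"
    return "totally contradicting"

# Three IDSs over five experiments (hypothetical outcomes):
outcomes = [(True, True, True), (True, True, True), (False, False, False),
            (True, False, False), (False, True, True)]
rho = correlation_coefficient(outcomes)   # = 3*1 / (5 - 2 - 1 + 3) = 0.6
print(diversity_class(rho))               # -> "similar"
```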

7.5 Experimental evaluation


All the intrusion detection systems that form part of the fusion IDS were separately evaluated with the same two data sets, namely 1) the real-time traffic and 2) the DARPA 1999 data set. The fusion output for the R2L attack of Table 7.5 is m(Normal) = 0, m(Probe) = 0, m(DoS) = 0, m(R2L) = 1, m(U2R) = 0; for the stealthy probe of Table 7.6 it is m(Normal) = 0, m(Probe) = 1, m(DoS) = 0, m(R2L) = 0, m(U2R) = 0.


                Normal   Probe   DoS   R2L   U2R/Data
PHAD            1        0       0     0     0
ALAD            0        0       0     1     0
Snort           0        0       0     1     0
Fusion output   0        0       0     1     0

Table 7.5: Belief of each of the IDSs for an R2L attack

                Normal   Probe   DoS   R2L   U2R/Data
PHAD            0        1       0     0     0
ALAD            1        0       0     0     0
Snort           1        0       0     0     0
Fusion output   0        1       0     0     0

Table 7.6: Belief of each of the IDSs for a stealthy probe


Attack type   Total attacks   Attacks detected   % detection
Probe         37              26                 70%
DoS           63              27                 43%
R2L           53              6                  11%
U2R/Data      37              4                  11%
Total         190             63                 33%

Table 7.7: Type of attacks detected by PHAD at 100 false alarms


Attack type   Total attacks   Attacks detected   % detection
Probe         37              9                  24%
DoS           63              23                 37%
R2L           53              31                 59%
U2R/Data      37              15                 31%
Total         190             78                 41%

Table 7.8: Type of attacks detected by ALAD at 100 false alarms


Attack type   Total attacks   Attacks detected   % detection
Probe         37              15                 41%
DoS           63              35                 56%
R2L           53              30                 57%
U2R/Data      37              34                 92%
Total         190             115                61%

Table 7.9: Type of attacks detected by Snort at 1000 false alarms


Attack type   Total attacks   Attacks detected   % detection
Probe         37              31                 84%
DoS           63              44                 70%
R2L           53              34                 64%
U2R/Data      37              34                 92%
Total         190             143                75%

Table 7.10: Type of attacks detected by context-dependent fusion at 100 false alarms

[Figure 7.1: Detection of Attack Types]

7.5.1 Impact of this work


The fusion technique adopted in this work is expected to combine IDS outputs with subjective judgements. This concept is well suited to intrusion detection, where the concern usually involves the activity and intention of human subjects. So the solution is to freely use subjective sensors, i.e., the sensors' outputs can depend not only on observation of a statistical process, but also on rational human reasoning. Since there are multiple sensors, we need to coordinate them and combine their results. The combination operator proposed in this chapter for sensor fusion is used mainly because it is difficult to represent the information supplied by the sensors by means of single probability distributions, due to imprecision and/or lack of statistical evidence. The context-dependent operator, which functions either as a conjunctive or as a disjunctive operator depending on the context, is particularly suitable when the sources are heterogeneous.


Detector/Fusion Type             Total Attacks   TP   FP    Precision   Recall   F-score
PHAD                             45              11   45    0.20        0.24     0.22
ALAD                             45              20   45    0.31        0.44     0.36
Snort                            45              13   400   0.03        0.29     0.05
OR                               45              34   470   0.07        0.76     0.13
AND                              45              9    29    0.24        0.2      0.22
Data-dependent Decision Fusion   45              31   39    0.44        0.69     0.54

Table 7.11: Comparison of the evaluated IDSs using the real-world data set

The feasibility of this idea was demonstrated via an analysis case study with several IDSs distributed over a LAN and using the replayed DARPA data set. The technique gives a performance better than any of the individual intrusion detection systems which were fused. Even though it was validated for a particular application, it should be a generalizable solution beyond any specific application.

7.6 Summary
Different IDSs have different detection rates and false alarm rates, and these may be complementary, competitive or cooperative. Sensor fusion is all about how to combine multiple sensor outputs to reveal the best truth regarding the objects of interest in terms of practical utility. The context-dependent operator proposed in this chapter was demonstrated to be feasible for sensor fusion. The research in this thesis improves over the existing DS alternatives in that it can better handle uncertainty and ambiguity in the sensed context. The individual IDSs that are components of this architecture in this particular work were PHAD, ALAD and Snort, with detection rates of 0.33, 0.41 and 0.61 respectively. The false alarm rates of PHAD and ALAD were acceptable, whereas that of Snort was exceptionally high. Our algorithm has resulted in an acceptable range of false alarms and a significant improvement in the detection rate for all types of attacks, with an overall detection rate of 0.75, obtained with an exceptionally large data set and suboptimal constituent intrusion detection systems. The detection rate for the real-world network traffic improved to 0.69. The F-score improved to 0.66 and 0.54 for the DARPA data set and the real-world traffic respectively. The evaluations show the strength and ability of the data-dependent decision fusion approach using the modified evidence theory to perform very well for the real-world network traffic as well as for the DARPA data set. It is also shown that our technique is more flexible and outperforms other existing fusion techniques such as OR and AND. The experimental comparison of this method using the real-world traffic has confirmed its usefulness and significance.
The experiments in this work used only three IDSs; it is possible that the use of more sensors will lead to a higher performance improvement of the fusion IDS. Also, the context-dependent operator can provide a generalizable solution for a wide range of applications. This supports the claim that a synergistic interaction between sensor fusion and intrusion detection facilitates improved detection.

Chapter 8
Modeling of Intrusion Detection Systems
and Sensor Fusion
I find that the harder I work, the more luck I seem to have.
Thomas Jefferson

8.1 Introduction
This chapter addresses the problem of optimizing the performance of intrusion detection systems using sensor fusion. Considering the utility of sensor fusion for improved sensitivity and false alarm reduction, as demonstrated in the earlier chapters, we explore the general problem of deciding the threshold for differentiating malicious traffic from normal traffic, and of modeling the individual components of the proposed sensor fusion architecture. In the proposed method, the performance optimization of the individual IDSs is addressed first. The neural network supervised learner has been designed to determine the weights of the individual IDSs, which incorporates data-dependency into the architecture. A sensor fusion unit performing the weighted aggregation in order to make an appropriate decision forms the final stage of the data-dependent decision fusion architecture. This chapter theoretically models the fusion of intrusion detection systems for the purpose of demonstrating the improvement in performance, in order to supplement the empirical evaluation in the previous two chapters.


The remainder of this chapter is organized as follows. In section 8.2, the motivation for this chapter is discussed. In section 8.3, the model of the proposed DD fusion architecture is presented by modeling its constituent parts; algorithms for optimizing the local detectors, along with a data-dependent decision fusion architecture for optimizing the fusion criterion, are also presented there. Finally, concluding comments are presented in section 8.4.

8.2 Motivation
This chapter builds on the observation that there exist more effective means of analyzing the information provided by existing IDSs using sensor fusion, resulting in an effective data refinement for knowledge recovery. The improved performance of the DD fusion architecture is shown experimentally, with an approach adopted for optimizing both the local sensors and the fusion unit with respect to the error rate. The optimal performance, along with the complexity of the task, brings to the fore the need for a theoretically sound basis for sensor fusion techniques in IDSs. The theoretical analysis of the improved performance of the architecture has been presented in chapter 4.
The motivation for the present work is the fact that the empirical evaluation seen in the previous chapter was extremely promising for DD fusion. Modeling can be extremely useful for a complete treatment of the problem with sound mathematical and logical concepts, as introduced in chapter 4. Thus the present work employs modeling to augment the mathematical analysis of the improved performance of sensor fusion and to develop a rational basis which is independent of the various techniques used.

8.3 Modeling of data-dependent decision fusion system


The data-dependent decision fusion approach proposed in chapter 6 is extended to include modeling, with optimization done at every stage, thereby arriving at an optimum architecture showing better performance than what has been reported so far in the literature. The architecture has three stages: optimizing the individual IDSs as the first stage, the neural network learner determining the weights of the individual IDSs as the second stage, and the fusion unit performing the weighted aggregation as the final stage.
8.3.1 Modeling of Intrusion Detection Systems
Consider an IDS that monitors either the network traffic connections on the network or the audit trails on the host. The network traffic connections or audit trails monitored are given as x ∈ X, where X is the entire domain of network traffic features or audit trails respectively. The model is based on the hypothesis that security violations can be detected by monitoring the network for traffic connections of malicious intent, in the case of a network-based IDS, and a system's audit records for abnormal patterns of system usage, in the case of a host-based IDS. The model is independent of any particular operating system, application, system vulnerability or type of intrusion, thereby providing a framework for a general-purpose IDS.
When making an attack detection, a connection pattern is given by x_j ∈ ℜ^{j·k}, where j is the number of features from k consecutive samples used as input to an IDS. As seen in the DARPA data set, for many of the features the distributions are difficult to describe parametrically, as they may be multi-modal or very heavy-tailed. These highly non-Gaussian distributions led to the investigation of non-parametric statistical tests as a method of intrusion detection in the initial phase of IDS development. The detection of an attack in the event x is observed as an alert. In the case of a network-based IDS, the elements of x can be the fields of the network traffic, like the raw IP packets; pre-processed basic attributes, like the duration of a connection, the protocol type, the service, etc.; or specific attributes selected with domain knowledge, such as the number of failed logins or whether a superuser command was attempted. In a host-based IDS, x can be the sequence of system calls, the sequence of user commands, connection attempts to the local host, the proportion of accesses in terms of TCP or UDP packets to a given port of a machine over a fixed period of time, etc. Thus an IDS can be defined as a function that maps the data input into a normal or an attack event, by means of the absence of an alert (0) or the presence of an alert (1) respectively, and is given by:


IDS : X → {0, 1}
To detect attacks in the incoming traffic, the IDSs are typically parameterized by a threshold T. The IDS uses a theoretical basis for deciding the thresholds for analyzing the network traffic to detect intrusions. Changing this threshold changes the performance of the IDS. If the threshold is very low, the IDS tends to be very aggressive in flagging traffic as intrusive; however, there is a potentially greater chance of the detections being irrelevant, which results in many false alarms. A large value of the threshold, on the other hand, has the opposite effect, being more conservative in detecting attacks; however, some potential attacks may be missed this way. Using a 3σ-based statistical analysis, the higher threshold (T_h) is set at +3σ and the lower threshold (T_l) at −3σ, with the assumption that the traffic signals are normally distributed. In general, the traffic detection, with s being the sensor output, is given by:

$$\text{Sensor Detection} = \begin{cases} \text{attack}, & T_l < s < T_h \\ \text{normal}, & s \leq T_l \ \text{or} \ s \geq T_h \end{cases}$$

A signature-based IDS functions by looking at the event feature x and checking whether it matches any of the records in the signature database D_b:

$$\text{Signature-based IDS} : X \to \{1\} \ \text{if} \ x \in D_b; \qquad X \to \{0\} \ \text{if} \ x \notin D_b$$

An anomaly-based IDS generates an alarm when the input traffic deviates from the established models or profiles P_f:

$$\text{Anomaly-based IDS} : X \to \{1\} \ \text{if} \ x \notin P_f; \qquad X \to \{0\} \ \text{if} \ x \in P_f$$
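The three detector models above can be mirrored in a short Python sketch; the signature database, profile set and sample events are illustrative placeholders, and the threshold rule follows the piecewise definition given earlier in this section.

```python
def threshold_ids(s, t_low, t_high):
    """Parameterized detector of section 8.3.1: following the text's rule,
    alert (1) when the sensor output s lies strictly between T_l and T_h."""
    return 1 if t_low < s < t_high else 0

def signature_ids(x, signature_db):
    """Signature-based IDS: alert iff the event matches the database."""
    return 1 if x in signature_db else 0

def anomaly_ids(x, profiles):
    """Anomaly-based IDS: alert iff the event deviates from the profiles."""
    return 0 if x in profiles else 1

# Hypothetical events and databases, for illustration only:
print(signature_ids("GET /cgi-bin/phf?", {"GET /cgi-bin/phf?"}))  # -> 1
print(anomaly_ids("telnet:root", {"http:80", "smtp:25"}))         # -> 1
```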

8.3.2 Modeling the fusion IDS


Consider the case where n IDSs monitor a network for attack detection; each IDS makes a local decision s_i, and these decisions are aggregated in the fusion unit f. This architecture is often referred to as the parallel decision fusion network and is shown in Figure 8.1. The fusion unit makes a global decision, s, about the true state of the hypothesis based on the collection of the local decisions gathered from all the sensors. The problem is cast as a binary detection
[Figure 8.1: Parallel Decision Fusion Network — the input x feeds n detectors, IDS1 ... IDSn, whose local decisions S1 ... Sn are combined by the fusion unit into the output y.]

problem with the hypotheses Attack and Normal. Every IDS participating in the fusion has its own detection rate D_i and false positive rate F_i, due to the preferred heterogeneity of the sensors in the fusion process. Each IDS, indexed i, gives an alert or no-alert, indicated by s_i taking the value one or zero respectively, depending on the observation x:

$$s_i = \begin{cases} 0, & \text{normal is declared to have been detected} \\ 1, & \text{attack is declared to have been detected} \end{cases}$$

The fusion center collects these local decisions s_i and forms a binomial distribution s, given by s = Σᵢ₌₁ⁿ s_i, where n is the total number of IDSs that take part in the fusion.

Theorem 1
The output of a binary fusion unit is decided by a function f given by f : s_1 × s_2 × ⋯ × s_n × x → {0, 1}, where the decisions of the individual detectors, s_i, are deterministic and the data x is a random parameter.
Lemma 1
The decision rule used by each of the individual detectors is deterministic and can be expressed as a function f_i : X → {0, 1}, defined as:

$$f_i(x^j) = \begin{cases} 0, & \text{if } p(s_i^j = 0 \mid x^j) = 1 \\ 1, & \text{otherwise} \end{cases}$$

where j corresponds to the class of the network traffic on which the fusion rule, as well as the respective sensor outputs, depend. Since the fusion center makes the final decision, the assumption is made that the output of the fusion rule is binary, i.e., either Normal or Attack. It is the same with all the individual IDSs: each IDS classifies the incoming traffic as Normal or Attack.
8.3.3 Statement of the problem
The problem statement is defined in the following steps:
The random variable x represents the observation to be made. This observation belongs to one of the two hypotheses, Normal or Attack, with probabilities p and q = 1 − p respectively.
A set of n IDSs monitors the random variable x and detects the presence of attacks in the traffic. The set of detections by the n sensors is given by {s_1, s_2, ..., s_n}, where s_i is the output of the IDS indexed i. Each s_i is a function of the input x, i.e., s_i = f_i(x).
The problem of optimum detection with n IDSs selecting either of the two possible hypotheses is considered from the decision-theory point of view. The loss function ℓ is defined in terms of the decisions made by each IDS along with the observation, and is given by:

$$\ell : \{0,1\} \times \{0,1\} \times \cdots \times \{0,1\} \times \{\mathrm{Normal}, \mathrm{Attack}\} \to \mathbb{R}. \qquad (8.1)$$

The average of the loss is then minimized. The objective of the decision strategy is to minimize the expected penalty (loss function) incurred: min E[ℓ(s_1, s_2, ..., s_n, H)], where H is the hypothesis and the minimization is over the decision rules of each detector.
Here ℓ(s_1, s_2, ..., s_n, H) = k is the cost incurred when k of the IDSs decide incorrectly. The minimum of this cost function occurs when all the sensors make the correct decisions, ℓ(0, 0, ..., 0, Normal) = ℓ(1, 1, ..., 1, Attack) = 0; the cost increases to 1 if exactly one IDS is incorrect, and so on. Thus the cost function ℓ takes its maximum value of n when none of the IDSs makes the correct decision. This is a trivial case, where every combination of k errors is penalized by the same amount, and the function ℓ can be reduced by affine transformations. From the cost matrix of the KDD IDS evaluations [52], it is clear that ℓ(0, s_2, ..., s_n, Attack) > ℓ(1, s_2, ..., s_n, Normal); that is, it is more costly for any detector to miss an attack than to raise a false alarm, regardless of the decisions of the other detectors. The minimization of the loss leads to sets of coupled inequalities in terms of the likelihood ratio of each IDS and the decisions made at the other sensors.
As the penalty for a double error decreases from 2 to 1, the thresholds change in a way which increases the probability of error, as double errors are discounted to single ones. As it increases beyond 2, double errors become prohibitively expensive, so it is to be expected that some mechanism will emerge to reduce their likelihood. Thus, for k varying from 1 to n, there are n solutions minimizing equation 8.1, one of which is the global minimum and thus yields the optimal threshold pair.

8.3.4 The effect of setting threshold


To detect attacks in the incoming traffic, the IDSs are typically parameterized with a threshold T. Changing this threshold changes the performance of the IDS. If the threshold is very large, some potentially dangerous attacks are missed. A small threshold, on the other hand, results in more detections, with a potentially greater chance that they are not relevant. The final step in the approach towards solving the fusion problem is taken by noticing that the decision function f_i(.) is characterized by the threshold T_i and the likelihood ratio (if independence is assumed). Thus the necessary condition for an optimal fusion decision occurs if the thresholds (T_1, T_2, ..., T_n) are chosen optimally. However, this does not satisfy the sufficient condition: there are many local minima, each of which needs to be checked to assure the global minimum.
The counter-intuitive results at the individual sensors under a proper choice of thresholds are advantageous in obtaining an optimum value for the fusion result. They are excellent paradigms for studying distributed decision architectures, for understanding the impact of the limitations, and even for suggesting empirical experiments on IDS decisions.
The structure of the fusion rule plays a crucial role in the overall performance of the fusion IDS, since the fusion unit makes the final decision about the state of the environment. While a few inferior IDSs might not greatly impact the overall performance, a badly designed fusion rule can lead to poor performance even if the local IDSs are well designed. The fusion IDS can be optimized by searching the space of fusion rules and optimizing the local thresholds for each candidate rule. Other than in some simple cases, the complexity of such an approach is prohibitive, due to the exponential growth of the set of possible fusion rules with respect to the number of IDSs. Searching for the fusion rule that leads to the minimum probability of error is the main bottleneck, due to the discrete nature of this optimization process and the exponentially large number of fusion rules. In our experiment we maximize the true positive rate by fixing the false positive rate at α₀; α₀ determines the threshold T by trial and error, and we have observed convergence within two or three trials in our case. This is done with the training data and hence off line.
The computation of thresholds couples the choice of the local decision rules so that the system-wide performance is optimized, rather than the performance of the individual detector. This requirement is taken care of by the DD fusion architecture proposed and discussed in the previous chapter. The weights assigned to the individual sensors are determined by the neural network learner.

Chapter 8

164

The neural network learner can be considered as a pre-processing stage to the


fusion unit. The neural network is the most appropriate for weight determination as it is difficult to define the rules clearly, mainly when more number
of IDSs are added to the fusion unit. When a record is correctly classified by
one or more detectors, the neural network will accumulate this knowledge as
a weight and with more number of iterations, the weight gets stabilized. This
information is used to fine-tune the fusion unit, since the fusion depends on the
input feature vector. The fusion output is represented as:
y = F^j(w_i^j(x_j, s_i^j), s_i^j),

where the weights w_i^j depend on both the input x_j and the individual IDS outputs s_i^j; the index j refers to the class label and the index i to the IDS. The fusion unit gives a value of 1 or 0 depending on whether the weighted aggregation of the IDS decisions reaches the set threshold. The fusion unit is optimized in this set-up through proper weighting of each of its inputs, while the individual IDSs are optimized by the proper choice of threshold, which is decided by the detection-false alarm trade-off. ROC curves are used to evaluate IDS performance over a range of trade-offs between the detection rate and the false positive rate. Each IDS has an operating point on the ROC curve, and the optimum operating point lies towards the top-left. The optimal decision fusion detection rule is obtained by forming the output of the fusion unit as

y = s = \sum_{i=1}^{n} w_i^j s_i^j.

The architecture is independent of the data set and the structures employed, and can be used with any real-valued data set.
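To make the fusion rule concrete, a minimal Python sketch of the weighted decision fusion described above is given below. It is an illustration only: the weights, decisions and threshold are hypothetical placeholders standing in for the values produced by the neural network learner and the threshold optimization.

```python
import numpy as np

def fuse_decisions(decisions, weights, threshold):
    """Weighted decision fusion: output 1 (attack) when the weighted
    aggregation y = sum_i w_i * s_i of the IDS decisions reaches T."""
    y = float(np.dot(weights, decisions))
    return 1 if y >= threshold else 0

# Hypothetical values: three IDSs (e.g. PHAD, ALAD, Snort) whose binary
# decisions are combined with weights learned by the neural network unit.
weights = np.array([0.2, 0.3, 0.5])
decisions = np.array([0, 1, 1])          # 0 = normal, 1 = attack
print(fuse_decisions(decisions, weights, threshold=0.6))   # prints 1
```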
8.3.5 Modeling of the neural network learner unit
The neural network unit in the data-dependent architecture is a supervised learning system which learns from a training data set. The training of the neural network unit by back propagation involves three stages: the feedforward of the outputs of all the IDSs along with the input training pattern, which together form the training pattern for the neural network learner unit; the calculation and back propagation of the associated error; and the adjustment of the weights. After training, the neural network is used for the computations of the feedforward phase. Learning can be defined over an input space X, an output space Y and a loss function ℓ. The training data can be specified as {(x_i, y_i)}, where x_i ∈ X and y_i ∈ Y. The output is a hypothesis function f_w : X → Y, chosen from a hypothesis space F to minimize the prediction error given by the loss function. The hypothesis function is that of the neural network and represents the non-linear mapping from the input space X to the output space Y. Stationarity is assumed, i.e., the distribution of data points encountered in the future is taken to be the same as that of the training set. For simplicity, the DARPA data set is assumed to represent the real-time traffic pattern distribution. Stationarity allows us to reduce the predictive learning problem to a minimization of the sum of the loss over the training set:

f_w = argmin \sum_i ℓ(f_w(x_i), y_i)   subject to f_w ∈ F and (x_i, y_i) ∈ S    (8.2)

Loss functions are typically defined to be non-negative over all inputs and zero when f_w(x_i) = y_i.
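A minimal sketch of such a learner follows, assuming a single hidden layer, a squared-error loss and synthetic data in place of the DARPA-derived training patterns; all sizes, names and values are illustrative, not the exact configuration used in this work.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Synthetic stand-in for the training set: each x combines traffic features
# with the individual IDS outputs; y is the label (0 normal, 1 attack).
X = rng.random((200, 6))
y = (X[:, :3].sum(axis=1) > 1.5).astype(float).reshape(-1, 1)

W1 = rng.normal(scale=0.5, size=(6, 8))   # input -> hidden weights
W2 = rng.normal(scale=0.5, size=(8, 1))   # hidden -> output weights

lr = 0.5
for epoch in range(2000):
    h = sigmoid(X @ W1)                    # stage 1: feedforward
    out = sigmoid(h @ W2)
    err = out - y                          # stage 2: back propagate the error
    d2 = err * out * (1 - out)
    d1 = (d2 @ W2.T) * h * (1 - h)
    W2 -= lr * h.T @ d2 / len(X)           # stage 3: adjust the weights
    W1 -= lr * X.T @ d1 / len(X)

print("mean squared training loss:", float(np.mean((out - y) ** 2)))
```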
8.3.6 Dependence on the data and the individual IDSs
Often, the data in the databases is only an approximation of the true data. When information about the goodness of the approximation is recorded, the results obtained from the database can be interpreted more reliably. Any database is associated with a degree of accuracy, which is denoted by a probability density function whose mean is the value itself.
In order to maximize the detection rate it is necessary to fix the false alarm rate at an acceptable value, taking into account the trade-off between the detection rate and the false alarm rate. The threshold T that maximizes the TP_rate and thus minimizes the FN_rate is given by:

FP_rate = P[alert | normal] = P[\sum_{i=1}^{n} w_i s_i ≥ T | normal] = α_0    (8.3)

TP_rate = P[alert | attack] = P[\sum_{i=1}^{n} w_i s_i ≥ T | attack]    (8.4)

The fusion of IDSs becomes meaningful only when FP ≤ FP_i ∀i and TP ≥ TP_i ∀i, where FP and TP correspond to the false positives and true positives of the fused IDS, and FP_i and TP_i to those of the individual IDS indexed i. It is necessary to assign a low weight to any individual sensor that is unreliable, hence meeting the constraint on the false alarm rate given in equation 8.3. Similarly, the fusion improves the TP_rate, as the detectors get weighted according to their performance.
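One simple way to realize this constraint empirically is sketched below: among all thresholds whose false positive rate on labelled training data stays within the bound α_0, the smallest is selected, which maximizes the true positive rate in the spirit of equations 8.3 and 8.4. The scores and labels here are synthetic placeholders, not the DARPA data.

```python
import numpy as np

def pick_threshold(scores, labels, alpha0):
    """Smallest T with empirical FP_rate = P(score >= T | normal) <= alpha0;
    among feasible thresholds this one maximizes TP_rate."""
    normal = scores[labels == 0]
    attack = scores[labels == 1]
    for T in np.sort(scores):              # candidate thresholds, ascending
        fp = np.mean(normal >= T)          # FP_rate shrinks as T grows
        if fp <= alpha0:
            return T, fp, np.mean(attack >= T)
    return None                            # no feasible threshold found

# Synthetic fused scores: normal traffic around 0.3, attacks around 0.7.
rng = np.random.default_rng(1)
scores = np.concatenate([rng.normal(0.3, 0.1, 1000),
                         rng.normal(0.7, 0.1, 50)])
labels = np.concatenate([np.zeros(1000), np.ones(50)])
T, fp, tp = pick_threshold(scores, labels, alpha0=0.01)
print(f"T = {T:.3f}, FP_rate = {fp:.4f}, TP_rate = {tp:.2f}")
```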
8.3.7 Threshold optimization
Tenney and Sandell in their work [119] establish the optimum strategy that minimizes a global cost in the case where the a priori probabilities of the hypotheses, the distribution functions of the local observations, the cost functions, and the fusion rule are given. They concluded that each local detector is optimally a likelihood ratio detector, but that the computation of the optimum thresholds for these local detectors is complicated due to cross coupling. The global optimization criterion for a distributed detection system encompasses the local decision statistics, the local decision thresholds, the fusion center decision statistic, and the fusion center decision threshold.
For each input traffic observation x, the set of n local thresholds should be optimized with respect to the probability of error. With a fusion rule given by a function f, the average probability of error at the fusion unit is given by the weighted sum of the false positive and false negative errors:
P_e(T, f) = p · P(s = 1 | Normal) + q · P(s = 0 | Attack),

where p and q are the respective weights of the false positive and false negative errors.
Assuming independence between the local detectors, the likelihood ratio is given by:

P(s | Attack) / P(s | Normal) = P(s_1, s_2, ..., s_n | Attack) / P(s_1, s_2, ..., s_n | Normal) = \prod_{i=1}^{n} P(s_i | Attack) / P(s_i | Normal)

The optimum decision rule for the fusion unit follows:

f(s) = log [ P(s | Attack) / P(s | Normal) ]
The decision is made for the hypothesis Attack when f(s) is greater than or equal to the decision threshold T, and for Normal otherwise. Thus the decisions from the n detectors are coupled through a cost function. It is shown that the optimal decision is characterized by thresholds as in the decoupled case. As far as the optimum criterion is concerned, the first step is to minimize the loss function of equation 8.1. This leads to sets of simultaneous inequalities in terms of the generalized likelihood ratios at each detector, the solutions of which determine the regions of optimum detection.
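The following sketch illustrates likelihood ratio fusion under the independence assumption above. The detection and false positive rates of the local detectors are assumed known from training; the numbers used here are illustrative only (the detection rates echo table 8.1, the false positive rates are invented).

```python
import math

def llr_fusion(decisions, pd, pf):
    """f(s) = log prod_i P(s_i|Attack)/P(s_i|Normal) for binary local
    decisions s_i, with pd[i] = P(s_i=1|Attack), pf[i] = P(s_i=1|Normal)."""
    f = 0.0
    for s, d, p in zip(decisions, pd, pf):
        f += math.log(d / p) if s == 1 else math.log((1 - d) / (1 - p))
    return f

pd = [0.33, 0.41, 0.61]    # local detection rates (cf. table 8.1)
pf = [0.01, 0.02, 0.005]   # local false positive rates (assumed)
s = [1, 0, 1]              # one observed decision vector
T = 0.0                    # decision threshold on f(s)
print("Attack" if llr_fusion(s, pd, pf) >= T else "Normal")
```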

8.4 Results and discussion

The model is validated for the data-dependent decision fusion algorithms that have been developed in this work. The false positive rate α_0 is initially set at an acceptable value of 0.00002 in all the cases, and the true positive rate is then maximized. Table 8.1 shows the enhanced performance of the different models obtained by appropriate threshold optimization.


Fusion method                                Detection rate    Average probability of error
PHAD                                         0.33              0.39
ALAD                                         0.41              0.35
Snort                                        0.61              0.23
DD Fusion                                    0.72              0.17
DD Fusion with modified evidence theory      0.75              0.15

Table 8.1: Average probability of error with DD fusion algorithms


Figure 8.2: Average probability of error

8.5 Summary
Sensor fusion techniques work effectively by gathering complementary information that can improve the overall detection rate without adversely affecting the false alarm rate. A simple theoretical model is illustrated in this chapter for the purpose of showing the improved performance of an IDS using sensor fusion. The detection rate and the false positive rate quantify the performance benefit obtained through the optimization. The theoretical model is also validated.

Chapter 9
Conclusions
Whether you think you can, or that you can't, you are usually right.
Henry Ford

Effective intrusion detection is a critical component of cyber infrastructure, as it is at the forefront of the battle against cyber-terrorism. The ingress traffic to a network can be used for network intrusion detection as well as prevention, whereas the egress traffic can reveal malware on a corporate network. The individual approaches to intrusion detection for the sake of network security, as presented in the literature, provide a partial solution to the overall issue of identifying all the possible attacks on a network. Each intrusion detection system is tailored to detect a particular attack class. However, none of the available IDSs can offer the full protection of the sensor fusion approach, which draws on the strengths of all the individual systems that take part in fusion to surmount their respective weaknesses in a symbiotic manner.
In this thesis, it is brought out that relatively high probabilities of intrusion detection, at acceptable false alarm levels, can potentially be achieved in inclement monitoring, heavy traffic, and sophisticated attack environments. Decisions acquired from multiple sensor systems are more likely to be independent when the sensors look at entirely different parameters of the traffic. These independent decisions from multiple sensors expand the total information that can be gathered about a particular connection and aid in effective sensor fusion.

In this thesis, we presented a framework for the performance enhancement of intrusion detection systems using advances in sensor fusion. The data-dependent decision fusion architecture has been implemented, and the results of the implementation have been observed to be better than those of the individual detectors that take part in fusion, thus validating the approach. We have also demonstrated the importance of detecting the rarer and the most significant attacks. The fusion algorithm used modified evidence theory, which aids in providing better-than-the-best protection. This chapter presents the conclusions drawn from this thesis work and discusses the directions for future research.

9.1 Results and discussion

A fair performance comparison of the proposed architecture is almost impossible, because each detection scheme is constructed for the detection of a specific class of attack. As an illustration of this point we consider the three IDSs, namely PHAD, ALAD and Snort, introduced in chapter 3. Each IDS is designed and developed for a specific class of attack: PHAD exhibits superior performance in detecting the probes and the DoS attacks, while exhibiting sub-optimal performance in detecting attacks belonging to the R2L and U2R classes. Similarly, the other IDSs have their own preferences for attack detection. The proposed algorithm, however, claims better-than-the-best protection: given the best IDSs, the fusion results in a detection better than that of the best detector if the detectors are uncorrelated; otherwise, in the worst case, fusion yields at least the performance of the best IDS.
The results presented in chapters five to eight show that the data-dependent decision fusion using the modified evidence theory was successful in generating an accurate empirical behavioral model from training data, and could then apply this empirical knowledge to data never seen before. Starting with three single IDSs, the performance of attack detection was enhanced through the various sensor fusion algorithms developed in this work. The final model developed has a high overall accuracy level, showing both a high detection rate of 0.75 and an extremely low false positive rate of 0.00002. The F-score obtained is 0.66. From these results, it was concluded that data-dependent decision fusion using modified evidence theory has evolved in this thesis work as a viable method for empirical model generation for intrusion detection. If deployed fully, the demonstrated improvement in IDS performance would contribute to a 53% reduction of successful attacks in two years and a 66% reduction of successful attacks in four years. This is a step in the right direction towards making cyber space safer over a period of time, with the proper deployment of highly efficient and sophisticated detectors.

9.2 Future work

This thesis has explored the feasibility of extracting information about the behavior of a network system that is more complete and reliable than any data that had been available before, in the form of the decisions of various intrusion detection systems. This availability opens multiple possibilities for future exploration and research, and may lead to the design and development of more efficient, reliable and effective intrusion detection systems. Some of these possibilities are listed below:
- In this thesis the fusion architecture has been developed in order to offer better-than-the-best protection, which is a requirement for future security solutions. Future improvements in the individual IDSs can also be easily incorporated into this technique. The approach developed in this thesis is expected to find applications in defense-in-depth security architectures.
- The main reason for using the DARPA 1999 data sets in the majority of our evaluations was the need for relevant data that can easily be shared with other researchers, allowing them to duplicate and improve our results. The common practice in intrusion detection of claiming good performance on real-world network traffic makes it difficult to verify and improve previous research results, as that traffic is never quantified or released, owing to privacy concerns.
- As future work, the fusion IDS can be made more efficient by incorporating more individual IDSs, since it can easily be shown that the more individual components make up the fusion IDS, the better the fusion IDS performs. Different classes of intrusion detection systems, such as signature-based, anomaly-based, flow-based and packet-based, should all be included for the purpose of an enhanced fusion output. The architecture developed in this thesis can easily be expanded by adding any number of new IDSs to the fusion unit.
- Multiple threshold levels were set for each of the IDSs that are components of the fusion process illustrated in this work. We approximated the output of the IDSs to binary values: zero for deciding normal and one for deciding attack. This simplification can be avoided in future work: the severity or anomaly score can be normalized, multiplied by the respective weights, and used as the basic probability assignments (bpa), as in the sketch following this list.
- One of the intriguing properties of all the sensor fusion algorithms is their ability to invent new features that are not explicit in the input to the unit. In particular, the learner comes to represent intermediate features that are useful for learning the target function and that are only implicit in its input. With the increasing incidence of cyber attacks, building effective intrusion detection models with good accuracy and real-time performance is essential. More data mining techniques should hence be investigated for more effective feature extraction.
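As a rough sketch of the bpa suggestion in the fourth item above, the following assumes a simple min-max normalization of the raw anomaly score and a Dempster-Shafer style assignment in which the mass withheld from an unreliable sensor is left as uncertainty on the whole frame; all names and values are hypothetical.

```python
def to_bpa(score, smin, smax, weight):
    """Normalize a raw anomaly score into [0, 1], scale it by the sensor
    weight, and spread the remaining mass over the whole frame (theta)."""
    norm = min(max((score - smin) / (smax - smin), 0.0), 1.0)
    return {
        "attack": weight * norm,
        "normal": weight * (1.0 - norm),
        "theta": 1.0 - weight,   # uncertainty kept back from this sensor
    }

# Hypothetical sensor with weight 0.8 reporting a raw score of 7.2 in [0, 10].
print(to_bpa(score=7.2, smin=0.0, smax=10.0, weight=0.8))
# -> masses for {attack}, {normal} and the whole frame, summing to 1
```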

9.3 Summary
This thesis discussed the assertion that it is possible to perform intrusion detection for both rare and new attacks using advances in sensor fusion. The previous chapters described the theoretical and experimental work done to show its validity, and the results of the experiments provide evidence in support of this thesis. The experiments emphasize a proof of concept, demonstrating the viability of the technique and also its efficiency in comparison to existing methods. The proposed approaches are shown to significantly improve the detection rate and reduce the false alarm rate, and hence result in an acceptable and usable intrusion detection system.
While experimenting with a simple treatment for enhanced intrusion detection, it was found that data-dependent decision fusion using modified evidence theory was highly successful. Hence, it is worthwhile to devise investigations of other applications, since it is often useful to cast the net a bit wider to give the argument presented in this thesis further support, or a comparative focus.

Appendix A
Attacks on the Internet: A study
In all science, error precedes the truth, and it is better it should go first than
last.
Hugh Walpole

A.1 Introduction
An attack is the realization of a threat: a harmful action aiming to find and exploit a system vulnerability. Computer attacks may involve destroying or accessing data, subverting the computer or degrading its performance. Traditionally, attacks on computers have included methods such as viruses, worms, buffer-overflow exploits and denial of service attacks. Network attacks, on the other hand, are mostly attacks on computers that use a network in some way. A network could be used to send the attack (such as a worm), or it could be the means of attack (such as a Distributed Denial of Service attack). In general, network attacks are a subset of computer attacks. However, there are several types of network attacks that do not attack computers, but rather the network they are attached to. Flooding a network with packets does not attack an individual computer, but clogs up the network. Although a computer may be used to initiate the attack, both the target and the means of attacking the target are network related. There are many known computer system attack classifications and taxonomies available in the literature [120, 15, 122, 123, 125, 126, 127, 128].
Howard [120] classified attacks according to Attackers, Tools, Access, Results and Objectives. Kumar, in his Ph.D. thesis [15], introduced a classification based on the attack signatures used within the IDS IDIOT. This classification is based on the type of observation required to detect a given attack. Lindqvist and Jonsson [122] presented an attack taxonomy using two dimensions of an attack. Probably one of the best known taxonomies is the Defense Advanced Research Projects Agency (DARPA) attack taxonomy. This taxonomy was developed in 1998 for classifying attacks in order to simplify the process of evaluating IDSs [123]. Work done by Chris Rodgers [124] covers many computer and network attacks with regard to TCP/IP networking. His research was carried out in 2001 and provides a good overview of the threats and attacks that face TCP/IP networking, as well as attacks such as viruses, worms, trojans and denial of service attacks.
In Snort, the most widely used open source network intrusion prevention and detection system, attack classification is based on the impact on the computer system. The attacks whose effect is the most critical have the highest priority. The priority levels are divided into high, medium and low. High-priority attacks include the attempted administrator privilege gain, the network Trojan, and the web application attack. Medium-priority attacks are the Denial of Service (DoS) attacks, a non-standard protocol or event, potentially bad traffic, an attempted log-in using a suspicious username, etc. Low-priority attacks are the ICMP event, the network scan, the generic protocol command, etc. [130].

A.2 History of Internet attacks

Computer and network attacks have evolved greatly over the last few decades. The attacks are increasing in number and also improving in strength and sophistication. Figure A.1 is the well celebrated plot by Julia Allen, which shows this trend together with some other trends in the history of attacks. A few of the developments in the history of computer and network attacks are discussed below.


Figure A.1: Plot of Attack sophistication vs Intruder Knowledge over the years

In 1978, the concept of a worm [131] was invented by researchers at the Xerox Palo Alto Research Center; a decade later the Morris Worm [132] became the first worm to spread widely across the Internet. The first viruses were released in 1981, among them Apple Viruses 1, 2 and 3, which targeted the Apple II operating system. In 1983, Fred Cohen was the first person to formally introduce the term computer virus in his thesis [133], which was published in 1985. More recently, new attacks such as denial of service (DoS) (mid 1990s), distributed DoS (DDoS) attacks (in 1999), botnets and storm botnets have been developed. Two major recent developments in computer and network attacks are blended attacks and information warfare. Blended attacks first appeared in 2001 with the release of Code Red [134], followed by Nimda [135], Slammer [136] and Blaster [137]. Blended attacks contain two or more attacks merged together to produce a more potent attack.

A.3 Attack motivation and objectives

Attack motivation can be understood by identifying what attackers do and how they can be classified. Icove et al. [141] present a simple classification of attackers as hackers, criminals (spies, terrorists, corporate raiders, professional criminals) and vandals. The main motivation of a hacker is to gain access to a system or data; the main motivation of the criminal is financial or political gain; and the main motivation of the vandal is to cause damage. In the thesis work of Howard [142], the problem with classifying attackers into these three categories is highlighted, since all three categories describe criminal behavior.
The incidents of cyberattacks that were serious and harmful in nature can be seen to be motivated by political and social reasons, as pointed out by Denning [143]. The potential threat of cyberterrorism has become unavoidable because critical infrastructures are potentially vulnerable; studies show that the vulnerabilities have been steadily increasing, while the costs of attack have been decreasing. The statistics of attacks in recent years appear on the web site for Web Server Intrusion Statistics [144].

A.4 Attack taxonomy

Internet attacks can be classified in various ways, namely:
- by the goal of the attacker
- by the effect on the system
- by the operating system of the target host
- by the attacked service
Some of the very commonly encountered attacks are listed below.
A.4.1 Viruses
Viruses are self-replicating programs that infect and propagate through files. Usually they attach themselves to a file, which causes them to be run when the file is opened. There are several main types of viruses, as identified in the thesis of Rodgers [124]; these are examined below.


File infectors

File infector viruses infect files on the victim's computer by inserting themselves into a file. Usually the file is an executable, such as a .EXE or .COM file in Windows. When the infected file is run, the virus executes as well.
System and boot record infectors

System and boot record infectors were the most common type of virus until the
mid 1990s. These types of viruses infect system areas of a computer such as the
Master Boot Record (MBR) on hard disks and the DOS boot record on floppy
disks. By installing itself into boot records, the virus can run itself every time
the computer is booted up.
Macro viruses

Macro viruses are simply malicious macros for popular programs, such as Microsoft Word. For example, they may delete information from a document or insert phrases into it. Propagation is usually through infected files. If a user opens a document that is infected, the virus may install itself so that any subsequent documents are also infected. Often the macro virus will be attached as an apparently benign file to fool the user into infecting themselves. The Melissa virus [145] is the best known macro virus. The virus worked by emailing a victim with an email that appeared to come from an acquaintance. The email contained a Microsoft Word document as an attachment that, if opened, would infect Microsoft Word; if the victim used the Microsoft Outlook 97 or 98 email client, the virus would be forwarded to the first 50 contacts in the victim's address book. Melissa caused a significant amount of damage, as the email sent by the virus flooded email servers.
Virus properties

Viruses often have additional properties, beyond being an infector or a macro virus. A virus may also be multi-partite, stealth, encrypted or polymorphic. Multi-partite viruses are hybrid viruses that infect both files and system and/or boot records; this gives them the potential to be more damaging and more resistant. A stealth virus is one that attempts to hide its presence. This may involve attaching itself to files that are not usually seen by the user. Viruses can use encryption to hide their payload; a virus using encryption knows how to decrypt itself in order to run. As the bulk of the virus is encrypted, it is harder to detect and analyze. Some viruses have the ability to change themselves either as time goes by or when they replicate themselves. Such viruses are called polymorphic viruses. Polymorphic viruses can usually avoid being eradicated longer than other types of viruses, as their signature changes.
A.4.2 Worms
A worm is a self-replicating program that propagates over a network in some
way. Unlike viruses, worms do not require an infected file to propagate. There
are two main types of worms: mass-mailing worms and network-aware worms.
Mass-mailing Worms

A mass-mailing worm is a worm that spreads through email. Once the email
has reached its target it may have a payload in the form of a virus or trojan.
Network-aware Worms

Network-aware worms generally follow a four stage propagation model. The first step is target selection: the compromised host targets a new host. The compromised host then attempts to gain access to the target host by exploitation. Once the worm has access to the target host, it can infect it. Infection may include loading trojans onto the target host, creating back doors or modifying files. Once infection is complete, the target host is itself compromised and can be used by the worm to continue propagation. Examples are Blaster, SQL Slammer, etc.
A.4.3 Trojans
Trojans appear to the user to be benign programs, but actually have some malicious purpose. Trojans usually carry a payload such as remote access methods, viruses or data destruction. Trojans provide a back door for the malicious attacker, giving them abilities such as session logging, keystroke logging, file transfer, program installation, remote rebooting, registry editing, and process management.
Logic bombs

Logic bombs are a special form of trojan that only release their payload once a certain condition is met. If the condition is not met, the logic bomb behaves as the program it is attempting to simulate.
A.4.4 Buffer overflows
Buffer overflows are probably the most widely used means of attacking a computer or network. They are rarely launched on their own, and are usually part of a blended attack. Buffer overflows exploit flawed programming, in which buffers are allowed to be overfilled. If a buffer is filled beyond its capacity, the data filling it can overflow into the adjacent memory, where it can either corrupt data or be used to change the execution of the program. The two main types of buffer overflows are described below.
Stack buffer overflow

A stack is an area of memory that a process uses to store data such as local
variables, method parameters and return addresses. Often buffers are declared
at the start of a program and so are stored in the stack. Each process has its
own stack, and its own heap. Overflowing a stack buffer was one of the first
types of buffer overflows and is one that is commonly used to gain control of
a process. In this type of buffer overflow, a buffer is declared with a certain
size. If the process controlling the buffer does not make adequate checks, an
attacker can attempt to put in data that is larger than the size of the buffer. An
attacker may place malicious code in the buffer. Part of the adjacent memory
will often contain the pointer to the next line of code to execute. Thus, the buffer
overflow can overwrite the pointer to point to the beginning of the buffer, and
hence the beginning of the malicious code. Thus, the stack buffer overflow can
give control of a process to an attacker.


Heap overflows

Heap overflows are similar to stack overflows but are generally more difficult to create. The heap is similar to the stack, but stores dynamically allocated data. The heap does not usually contain return addresses like the stack, so it is harder to gain control over a process than when the stack is used. However, the heap contains pointers to data and to functions. A successful heap overflow allows the attacker to manipulate the process's execution. An example would be to overflow a string buffer containing a filename, so that the filename becomes that of an important system file. The attacker could then use the process to overwrite the system file (if the process has the correct privileges).
A.4.5 Denial of Service attacks
Denial of Service (DoS) attacks [146], sometimes known as nuke attacks, are designed to prevent legitimate users of a system from accessing or using the system in a satisfactory manner. DoS attacks usually disrupt the service of a network or a computer, so that it is either impossible to use or its performance is seriously degraded. There are three main types of DoS attacks: host-based, network-based and distributed.
Host-based DoS

Host-based DoS attacks aim at attacking computers. A vulnerability in the operating system, in application software or in the configuration of the host is targeted. Resource hogging is one possible way of mounting a DoS attack on a host; resources such as CPU time and memory are the most common targets. Crashers are a form of host-based DoS simply designed to crash the host system so that it must be restarted. Crashers usually target a vulnerability in the host's operating system. Many crashers work by exploiting the way various operating systems implement network protocols: some operating systems cannot handle certain packets, which on receipt cause the operating system to hang or crash.


Network-based DoS

Network-based DoS attacks target network resources in an attempt to disrupt legitimate use. They usually flood the network and the target with packets. To succeed in flooding, more packets than the target can handle must be sent; or, if the attacker is attacking the network itself, enough packets must be flooded so that the bandwidth left for legitimate users is severely reduced. Three main methods of flooding have been identified:
- TCP floods: TCP packets are streamed to the target.
- ICMP echo request/reply floods: ICMP packets are streamed to the target.
- UDP floods: UDP packets are streamed to the target.
In addition to a high volume of packets, the packets often have certain flags set to make them more difficult to process. If the target is the network, the broadcast address of the network is often targeted. One simple way of reducing network bandwidth is through a ping flood. Ping floods can be created by sending large ICMP request packets to a large number of addresses (perhaps through the broadcast address) at a fast rate.
Distributed DoS

Distributed DoS (DDoS) attacks are a recent development in computer and network attack methodologies. The DDoS attack methodology was first seen in 1999 with the introduction of attack tools such as The DoS Project's Trinoo [36, 21], The Tribe Flood Network [1, 21] and Stacheldraht [37]. DDoS attacks work by using a large number of attack hosts to direct a simultaneous attack on a target or targets. Figure A.2 shows the process of a DDoS attack. Firstly, the attacker commands the master nodes to launch the attack. The master nodes then order all daemon nodes under them to launch the attack. Finally the daemon nodes attack the target simultaneously, causing a denial of service. With enough daemon nodes, even a simple web page request will stop the target from serving legitimate user requests.
A DDoS attack takes place when many compromised machines infected by malicious code act simultaneously and are coordinated under the control of a single attacker in order to break into the victim's system, exhaust its resources, and force it to deny service to its customers.


Figure A.2: Distributed Denial of Service

There are mainly two kinds of DDoS attacks [10]: typical DDoS attacks and distributed reflector DoS (DRDoS) attacks. In a typical DDoS attack, the army of the attacker consists of master zombies and slave zombies. The hosts of both categories are compromised machines that have arisen during the scanning process and are infected by malicious code. The attacker coordinates and orders the master zombies and they, in turn, coordinate and trigger the slave zombies. More specifically, the attacker sends an attack command to the master zombies and activates all the attack processes on those machines, which lie in hibernation, waiting for the appropriate command to wake up and start attacking. Then the master zombies, through those processes, send attack commands to the slave zombies, ordering them to mount a DDoS attack against the victim. In that way, the agent machines (slave zombies) begin to send a large volume of packets to the victim, flooding its system with useless load and exhausting its resources.
Unlike typical DDoS attacks, in DRDoS attacks the army of the attacker consists of master zombies, slave zombies, and reflectors [11].


Figure A.3: Distributed Reflector DoS (DRDoS)

The scenario of this type of attack is the same as that of typical DDoS attacks up to a specific stage: the attackers have control over the master zombies, which, in turn, have control over the slave zombies. The difference is that the slave zombies are led by the master zombies to send a stream of packets, with the victim's IP address as the source IP address, to other uninfected machines (known as reflectors), exhorting these machines to connect with the victim. The reflectors then send the victim a greater volume of traffic, as a reply to its apparent request for the opening of a new connection, because they believe that the victim was the host that asked for it. Therefore, in DRDoS attacks, the attack is mounted by non-compromised machines, which mount the attack without being aware of the action. Comparing the two scenarios, a DRDoS attack is more detrimental than a typical DDoS attack, for two reasons: it has more machines to share the attack, and hence the attack is more distributed; and it creates a greater volume of traffic because of its more distributed nature. Figure A.3 graphically depicts a DRDoS attack. The general taxonomy of DDoS attacks is shown in figure A.4.


Figure A.4: Taxonomy of Distributed DoS

A.4.6 Network-based attacks
This section describes several kinds of attacks that operate on networks and the protocols that run them. Network spoofing is the process by which an attacker passes themselves off as someone else. There are several ways of spoofing in the standard TCP/IP protocol stack, including MAC address spoofing at the data-link layer and IP spoofing at the network layer. By spoofing who they are, an attacker can pretend to be a legitimate user or can manipulate existing communications from the victim host.
Session hijacking

Session hijacking is the process by which an attacker takes over a session taking place between two victim hosts. The attacker essentially cuts in and takes over the place of one of the hosts. Session hijacking usually takes place at the TCP layer, and is used to take over sessions of applications such as Telnet and FTP. TCP session hijacking involves the use of IP spoofing, as mentioned above, and TCP sequence number guessing. To carry out a successful TCP session hijacking, the attacker attempts to predict the TCP sequence number that the session being hijacked has reached. Once the sequence number has been identified, the attacker can spoof their IP address to match that of the host they are cutting out and send a TCP packet with the correct sequence number. The other host will accept the TCP packet, as the sequence number is correct, and will start sending packets to the attacker. The cut-out host will be ignored by the other host, as it no longer has the correct sequence number. Sequence number prediction is most easily done if the attacker has access to the IP packets passing between the two victim hosts: the attacker simply needs to capture packets and analyze them to determine the sequence number. If the attacker does not have access to the IP packets, then the sequence number must be guessed.
A.4.7 Password attacks
An attacker wishing to gain control of a computer, or of a user's account, will often use a password attack to obtain the needed password. Many tools exist to help the attacker uncover passwords.
Password guessing and dictionary attacks

Password guessing is the simplest of password attacks. It simply involves the attacker attempting to guess the password. Often the attacker will use a form of social engineering to gain clues as to what the password is. A dictionary attack is similar, but is more automated: the attacker uses a dictionary of words containing possible passwords and a tool that tests whether any of them is the required password. Brute force attacks work by generating every possible combination that could make up a password and testing each one to see if it is the correct password.
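As a minimal illustration of the automated guessing just described, the sketch below tests candidate passwords from a word list against a stolen hash, here assumed to be an unsalted SHA-256 digest; the word list and hash are hypothetical.

```python
import hashlib

def dictionary_attack(target_hash, wordlist):
    """Hash each candidate word and compare it with the stolen digest;
    return the matching password, or None if the list is exhausted."""
    for word in wordlist:
        if hashlib.sha256(word.encode()).hexdigest() == target_hash:
            return word
    return None

words = ["letmein", "password", "hunter2"]            # tiny word list
stolen = hashlib.sha256(b"hunter2").hexdigest()       # stolen unsalted hash
print(dictionary_attack(stolen, words))               # -> hunter2
```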
A.4.8 Information gathering attacks
The attack process usually involves information gathering: the process by which the attacker gains valuable information about potential targets, or gains unauthorized access to some data, without launching an attack. Information gathering is passive in the sense that no attacks are explicitly launched. Instead, networks and computers are sniffed, scanned and probed for information.
Sniffing

Packet sniffers are a simple but invaluable tool for anyone wishing to gather information about a network or computer. For the attacker, packet sniffers provide a way to glean information about the host or person they wish to attack, and even to gain access to unauthorized information. Traditional packet sniffers work by putting the attacker's Ethernet card into promiscuous mode. An Ethernet card in promiscuous mode accepts all traffic from the network, even packets not addressed to it. This means the attacker can gain access to any packet traversing the network they are on. By gathering enough of the right packets, the attacker can obtain information such as login names and passwords. Other information can also be gathered, such as MAC and IP addresses and which services and operating systems are being run on specific hosts. This form of attack is very passive: the attacker does not send any packets out, but only listens to packets on the network.
Mapping

Mapping is used to gather information about hosts on a network. Information such as which hosts are online, what services are running and what operating system a host is using can all be gathered via mapping. Thus potential targets, and the layout of the network, are identified. Host detection is achieved through a variety of methods. Simple ICMP queries can be used to determine whether a host is online. TCP SYN messages can be used to determine whether or not a port on a host is open and thus whether or not the host is online. After detecting that a host is online, mapping tools can be used to determine what operating system and what services are running on the host. Running services are usually identified by attempting to connect to the host's ports. Port scanners are programs that an attacker can use to automate this process. Basic port scanners work by connecting to every TCP port on a host and reporting back which ports were open, as in the sketch below. Either the attacker has to choose an attack using the information gathered, or more information needs to be gathered through security scanning, discussed below.
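A minimal sketch of such a TCP connect scan follows; the host and port range are hypothetical, and such probing should only ever be run against hosts one is authorized to test.

```python
import socket

def scan_ports(host, ports, timeout=0.5):
    """Attempt a TCP connection to each port and report those that accept."""
    open_ports = []
    for port in ports:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            s.settimeout(timeout)
            if s.connect_ex((host, port)) == 0:   # 0 means connection made
                open_ports.append(port)
    return open_ports

print(scan_ports("127.0.0.1", range(20, 120)))
```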
Security scanning

Security scanning is similar to mapping, but is more active and gathers more information. Security scanning involves testing a host for known vulnerabilities or weaknesses that could be exploited by the attacker. For example, a security scanning tool may be able to tell the attacker that port 80 of the target is running an HTTP server with a specific vulnerability.


A.4.9 Blended attacks

While blended attacks are not a new development, they have recently become prominent with attacks such as Code Red and Nimda. Blended attacks are attacks that contain multiple threats, for example multiple means of propagation or multiple attack payloads. Many of the attacks mentioned previously in this appendix can be considered blended. The first instance of a blended attack occurred in 1988 with the first Internet worm, the Morris Worm. The Internet is especially susceptible to blended threats, as was shown by the recent SQL Slammer attack, in which the Internet suffered a significant loss of performance.

A.5 Top ten cyber security menaces for 2008

A list of the attacks most likely to cause substantial damage during 2008, compiled by experts [138] in ranked order, is provided below.
1. Increasingly sophisticated web site attacks that exploit browser vulnerabilities, especially on trusted web sites
Web site attacks on browsers increasingly target components, such as Flash and QuickTime, that are not automatically patched when the browser is patched. Placing better attack tools on trusted sites gives attackers a huge advantage over the unwary public.
2. Increasing sophistication and effectiveness in botnets
The Storm worm started spreading in January 2007 with an email saying "230 dead as storm batters Europe", and was followed by subsequent variants. Within a week it accounted for one out of every twelve infections on the Internet, installing rootkits and making each infected system a member of a new type of botnet. Previous botnets used centralized command and control; the Storm worm uses peer-to-peer control.
3. Cyber espionage efforts by well-resourced organizations looking to extract large amounts of data, particularly using targeted phishing
Economic espionage will be increasingly common as nation-states use cyber theft of data to gain economic advantage in multinational deals. The attack of choice involves targeted spear phishing with attachments, using well-researched social engineering methods to make the victim believe that an attachment comes from a trusted source, and using newly discovered Microsoft Office vulnerabilities and hiding techniques to circumvent virus checking.
4. Mobile phone threats, especially against iPhones and Android-based phones, plus VoIP
Mobile phones are general purpose computers, so worms, viruses, and other malware will increasingly target them. A truly open mobile platform will usher in completely unforeseen security nightmares, as the developer toolkits provide easy access for hackers. Attacks on VoIP systems are on the horizon and may surge in the coming years. VoIP phones and IP PBXs have had numerous published vulnerabilities, and attack tools exploiting these vulnerabilities have been written and are available on the Internet.
5. Insider attacks
Insider attacks are initiated by rogue employees, consultants and/or contractors of an organization. Insider-related risk has long been exacerbated by the fact that insiders usually have been granted some degree of physical and logical access to the systems, databases, and networks that they attack, giving them a significant head start. More recently, however, security perimeters have broken down, allowing insiders to attack both from inside and from outside an organization's network boundaries.
6. Advanced identity theft from persistent bots
A new generation of identity theft is being powered by bots that stay on machines for three to five months, collecting passwords, bank account information, surfing history, frequently used email addresses, and more. They gather enough data to enable extortion attempts and advanced identity theft attempts in which criminals have enough data to pass basic security checks.
7. Increasingly malicious spyware
Tools that increasingly target and dodge anti-virus, anti-spyware, and anti-rootkit defences, in order to preserve the attacker's control of a victim machine for as long as possible, will become more common.
8. Web application security exploits
Large percentages of web sites have cross-site scripting, SQL injection, and other vulnerabilities resulting from programming errors. Web 2.0 applications are vulnerable because user-supplied data cannot be trusted; a script running in the user's browser still constitutes user-supplied data.
9. Increasingly sophisticated social engineering, including blending phishing with VoIP and event phishing
Blended approaches will amplify the impact of many more common attacks. For example, the success of phishing is being radically increased by first stealing the IDs of users of other technologies. Tax filing scams and scams based on the U.S. Presidential elections were widely used this year, and many of them succeeded. A second area of blended phishing combines email and VoIP: an inbound email, apparently sent by a credit card company, asks recipients to re-authorize their credit cards by calling a 1-800 number. The number leads them (via VoIP) to an automated system in a foreign country that, quite convincingly, asks them to key in their credit card number, CVV, and expiration date.
10. Supply chain attacks infecting consumer devices (USB thumb drives, GPS systems, photo frames, etc.) distributed by trusted organizations
Retail outlets are increasingly becoming unwitting distributors of malware. Devices with USB connections, and the CDs packaged with those devices, sometimes contain malware that infects victims' computers and connects them into botnets.

A.6 Conclusion
Even though many attacks have been listed above, the categories tend not to be mutually exclusive. For example, a virus may contain a logic bomb, so that the categories overlap. Also, any successful attack may be classified into multiple categories, since attackers use multiple methods. This makes the classification ambiguous and difficult to repeat.
We conclude by quoting Cohen [133]: "...a complete list of the things that can go wrong with information systems is impossible to create. People have tried to make comprehensive lists, and in some cases have produced encyclopedic volumes on the subject, but there are a potentially infinite number of different problems that can be encountered, so any list can only serve a limited purpose."

Appendix B
Intrusion Detection Systems: A survey
If I have been able to see farther than others, it was because I stood on the shoulders of giants.
Sir Isaac Newton

B.1 Introduction
Intrusion detection is a rapidly evolving and changing technology. Even though the field bloomed in the early 1980s, all the early intrusion detection work was done as research projects for US government and military organizations. The major work in intrusion detection happened in the mid and late 1990s, along with the explosion of the Internet. The early research work in the field of intrusion detection often focused on host-based solutions, but the drastic growth of networking caused later efforts to concentrate on network-based systems. The tools discussed here reflect a core of active research over the last two decades. Several surveys have indeed been published in the past [147, 148, 149, 150, 152, 155], but the growth of IDSs has been such that many IDSs have appeared in the meantime. This survey hence tries to present an updated view, starting with the historical developments in the field of intrusion detection from the perspective of the people who did the initial research and development and their projects, providing a better insight into the motivation behind it.


B.2 History of Intrusion Detection Systems

James P. Anderson is acknowledged as the first person to document the need for automated audit trail review to support security goals, for the US Department of Defense in 1978. He published the Reference Monitor concept in Computer Security Technology Planning Study 2, a planning study for the US Air Force, and this report is considered to be the seminal work on intrusion detection. Anderson also published a paper, Computer Security Threat Monitoring and Surveillance [154], in 1980, and this is widely considered to be the first real work in the area of intrusion detection. The paper proposes a taxonomy for classifying internal and external threats to computer systems. He points out that when a violation occurs in which the attacker attains the highest level of privilege, such as root or super user in UNIX, there is no reliable remedy. He also comments on the problems associated with masqueraders, for which he suggests that some sort of statistical analysis of user behavior, capable of determining unusual patterns of system use, might represent a way of detecting them. This suggestion was tested in the next milestone in intrusion detection, the IDES project.
The US Navy's Space and Naval Warfare Systems Command (SPAWARS) in 1984 funded a project to research and develop a model for a real-time intrusion detection system, and Dorothy Denning and Peter Neumann came up in 1988 with the Intrusion Detection Expert System (IDES) model. The rare or unusual traces of traffic were referred to as anomalous, and the assumptions made in this project served as the basis for much intrusion detection research and many system prototypes of the late 1980s. The IDES model is based on the use of statistical metrics and models to describe the behavior of benign users. The IDES prototype used a hybrid architecture, comprising an anomaly detector and an expert system. The anomaly detector used statistical techniques to characterize abnormal behavior. The expert system used a rule-based approach to detect known security violations. The expert system was included to mitigate the risk that a patient intruder might gradually change his behavior over a period of time to defeat the anomaly detector. This situation was possible because the anomaly detector adapted to gradual changes in behavior to minimize false alarms.
Denning's paper An Intrusion Detection Model [7], published in 1986, illustrates the model of a real-time intrusion-detection expert system capable of detecting break-ins, penetrations, and other forms of computer abuse. The model is based on the hypothesis that security violations can be detected by monitoring a system's audit records for abnormal patterns of system usage. The model includes profiles for representing the behavior of subjects with respect to objects in terms of metrics and statistical models, and rules for acquiring knowledge about this behavior from audit records and for detecting anomalous behavior. The model is independent of any particular system, application environment, system vulnerability, or type of intrusion, thereby providing a framework for a general-purpose intrusion-detection expert system. This paper is considered to be the stepping-stone for all further work in this field. In the following years, an ever-increasing number of research prototypes were explored. Several of these efforts are looked at briefly below; more details are available in [182].
B.2.1 The emergence of intrusion detection systems
In 1984, the US Navy's SPAWARS funded a research project, Audit Analysis, at Sytek; the prototyped system utilized data collected at the shell level of a UNIX machine running in a research environment. The data was then analyzed using database tools. This research helped in distinguishing normal system usage from abnormal system usage. The researchers were Lawrence Halme, Teresa Lunt and John Van Horne.
In 1985 an internal research and development project named Discovery started at TRW; it monitored TRW's online credit database application, rather than the operating system, for intrusions and misuse. Discovery used a statistical engine to locate patterns in the input data and an expert system for detecting and deterring problems in TRW's online credit database. The principal investigator was William Tener. Haystack [158] was developed for the US Air Force in 1988 to help security officers detect insider abuse of Air Force Standard Base Level Computers. Haystack was implemented on an Oracle database management system and performed anomaly detection in batch mode. Haystack characterized the information from system audit trails as sets of features such as session duration, number of files opened, number of pages printed, amount of CPU resources consumed in the session and the number of sub-processes created in the session. It used a two-stage statistical analysis to detect anomalies in system activities. The first stage checked each session for unusual activity and the second stage used a statistical test to detect trends across sessions. The combination of the two techniques was designed to allow detection of both out-of-bounds activities and activities that gradually deviated from normal over a period of time. The principal investigator was Stephen Smaha.
At almost the same time, the Multics Intrusion Detection and Alerting System (MIDAS) was developed by the National Computer Security Center to monitor NCSC's Dockmaster system, a highly secure operating system. MIDAS was designed to take data from Dockmaster's answering system audit log and used a hybrid analysis strategy, combining statistical anomaly detection with expert system rule-based approaches. In 1989, Wisdom and Sense from Los Alamos National Laboratory and the Information Security Officer's Assistant (ISOA) from Planning Research Corporation were developed.
In 1990 Susan Kerr reported on the experimental as well as the actually implemented IDSs in the Datamation report titled Using AI to improve security. In the same year, an audit trail analysis tool, Computer Watch, was developed by AT&T, designed to consume operating system audit trails generated by the UNIX system. An expert system was used to summarize system security-relevant events, and a statistical analyzer and query mechanism allowed statistical characterization of system-wide events. The Network System Monitor (NSM) was developed at the University of California at Davis in 1990, to run on a Sun UNIX workstation. NSM was the first system to monitor network traffic and use that traffic as the primary data source. NSM was a significant milestone in intrusion detection research because it was the first attempt to extend intrusion detection to heterogeneous network environments. The principal researchers were Levitt, Heberlein and Mukherjee.
The Network Audit Director and Intrusion Reporter (NADIR) was developed in 1991 by the Computing division of Los Alamos National Laboratory to monitor user activities on the Integrated Computing Network (ICN) at Los Alamos. NADIR performs a combination of expert rule-based analysis and statistical profiling.


NADIR, being a successful intrusion detection system, has been extended to monitor systems beyond the ICN at Los Alamos. Shieh et al. in 1991 presented a paper, A pattern oriented ID model and its applications, with an entirely new approach: a pattern-oriented ID model that can analyze object privilege and data flows in secure computer systems to detect operational security problems. This model addresses context-dependent intrusion and complements the then popular statistical approaches to ID. In the same year, Steven Snapp, in the paper A system for distributed ID [164], presented a proposed architecture consisting of the following components: a host manager with a collection of processes running in the background, a LAN manager for monitoring each LAN in the system, and a central manager that receives reports from the various host and LAN managers, processes and correlates these reports, and detects intrusions.
The US Air Force in 1992 funded the research for the Distributed Intrusion
Detection System (DIDS) [169], a major initiative to integrate host- and network-based monitoring approaches. Until 1990, intrusion detection systems were mostly host-based; then, in 1990, NSM extended intrusion detection to the network environment. Integrating host- and network-based IDS brings the advantages and disadvantages of both approaches. It resolves many of the
problems associated with promiscuous network monitoring, while maintaining
the ability to observe the entire communication between victim and attacker.
The principal investigator for DIDS was Steve Smaha.
USTAT, a real-time IDS for Unix [196], was introduced in 1993 by Koral Ilgun. USTAT is a state-transition analysis tool for Unix and is a Unix-specific implementation of a generic design, STAT (State Transition Analysis Tool). In
STAT, a penetration is identified as a sequence of state changes that take the
computer system from some initial state to a target compromised state. This
approach differs from other rule-based penetration identification tools that pattern-match sequences of audit records. Helman and Liepins in 1993 came up with a paper on Statistical foundations of audit trail analysis for the detection of computer misuse, in which computer transactions are modeled as being generated
by two stationary stochastic processes, the normal process and the misuse process. Misuse detection is then the identification of the transactions most likely to have been generated by the misuse process.
In 1994, Crosbie and Spafford suggested the use of autonomous agents in order to improve the scalability, maintainability, efficiency and fault tolerance of
an Intrusion Detection System [165]. The Next-generation Intrusion Detection Expert System (NIDES) [157], developed in 1995, is the successor to
the IDES project. It has a strong anomaly detection foundation using innovative
statistical algorithms, complemented with a signature based expert system component that encodes known intrusion scenarios. NIDES is highly modularized
and is designed to operate in real time to detect intrusions as they occur.
Gary Christoph in 1995 expanded NADIR to include processing of audit and activity records for the Cray UNICOS operating system, calling the result UNICORN: misuse detection for UNICOS [197]. An approach to address the
scalability deficiencies in most contemporary intrusion detection systems was
proposed with the design and implementation of GrIDS. The Graph-based Intrusion Detection System for large networks (GrIDS) [166] was developed in 1996, with graphs typically codifying hosts on the network as nodes and connections between hosts as edges between these nodes. The choice of traffic taken to represent activity in the form of edges is made on the basis of user-supplied rule sets. The graph and the edges have global and local attributes, such as the time of connection, that are computed by the same rule sets. These graphs present network events in a graphic fashion that enables the viewer to determine if suspicious network activity is taking place.
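The graph representation lends itself to a very small sketch: hosts as nodes, connections as edges, and a rule over the graph's shape. The fan-out rule below is an illustrative stand-in for GrIDS' user-supplied rule sets, not the system's actual rule language.

```python
# A minimal sketch in the spirit of GrIDS: hosts become nodes, observed
# connections become edges, and a rule over the resulting graph flags
# suspicious structure such as worm-like fan-out.
from collections import defaultdict

edges = defaultdict(set)          # adjacency: source host -> set of targets

def add_connection(src, dst):
    edges[src].add(dst)

def wide_fanout(threshold=20):
    """Rule: a node contacting many distinct hosts may indicate worm spread."""
    return [h for h, targets in edges.items() if len(targets) >= threshold]

for i in range(25):               # one host sweeping many targets
    add_connection("10.0.0.5", f"10.0.1.{i}")
print(wide_fanout())              # -> ['10.0.0.5']
```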
Kosoresow and Hofmeyr in 1997 published a paper on Intrusion Detection via
System Call Traces [204]. A computer user leaves trails of activity that can reveal signatures of misuse as well as of legitimate activity. Depending on the audit method used, one can record a user's keystrokes, the system resources
used, or the system calls made by some collection of processes. Event Monitoring Enabling Responses to Anomalous Live Disturbances (EMERALD) [30]
is a framework for scalable, distributed, inter-operable computer and network
intrusion detection. It was developed in 1997 and targets both external and internal threat agents that attempt to misuse system or network resources. It is
an advanced, highly engineered software environment that combines signature-
based and statistical analysis components with a resolver that interprets analysis
results, all of which can be used iteratively and hierarchically.
In 1998, Anderson and Khattak offered an innovative approach to intrusion
detection by incorporating information retrieval techniques into intrusion detection tools. Bonifacio et al. in 1998 were among the first to introduce the application of neural networks in IDSs [199]. The system works by capturing packets and uses a neural network to identify intrusive behavior within the analyzed data stream. The identification is based on previously known intrusion profiles. The system is adaptive, since new profiles can be added to the
database and the neural network re-trained to consider them. The paper presents
the proposed model, the results achieved and the analysis of an implemented
prototype. In 1998 a stand-alone system named Bro [200] for detecting network intruders in real-time by passively monitoring a network link over which
the intruder's traffic transits was introduced by Vern Paxson. Bro made high-speed, large-volume monitoring of network traffic possible without dropping packets, and it also provides real-time notification of ongoing or attempted attacks. The system is extensible, since it is easy to add knowledge of new types of attack. Bro contains mechanisms to withstand attacks on the monitor itself and was among the first to incorporate that idea into practice.
Ming-Yuh Huang in 1999 introduced a large-scale distributed ID architecture based on IDS agents and collaborative attack strategy analysis, which creates an opportunity for IDS agents to proactively look ahead for the data most pertinent to current case development. This look-ahead adaptive behavior focuses limited system resources on collecting and auditing those events which
are most likely to reveal intrusions.
In 2000, Ning et al. presented the paper on Modelling requests among cooperating IDSs [201]. IDSs have to share information in order to discover attacks
involving multiple sites. This paper proposes a formal framework for modeling requests among the cooperating IDSs. John Dickerson had a different idea, presented in the paper on Fuzzy network profiling for Intrusion Detection in 2000. The Fuzzy Intrusion Recognition Engine (FIRE) is an anomaly-based IDS that uses fuzzy logic to assess whether malicious activity is taking
place on the network. It uses simple data mining techniques to process the
network input data and helps expose metrics that are particularly significant to
anomaly detection. These metrics are then evaluated as fuzzy sets. Stephen Kent presented the paper On the trail of intrusions into information systems around the same time.
Luo and Bridges in 2000 published their paper on Mining fuzzy association
rules and fuzzy frequency episodes for Intrusion Detection [203]. Lee, Stolfo
and Mok previously reported the use of association rules and frequency episodes
for mining audit data to gain knowledge for intrusion detection. Experimental
results show the utility of fuzzy association rules and fuzzy frequency episodes
for intrusion detection. Luo published another paper on Fuzzy
frequent episodes for real-time intrusion detection in 2001. Data mining methods including association rule mining and frequent episode mining have been
applied to the intrusion detection problem.
In 2001, Balajinath in his paper Intrusion detection through learning behavior model observed that users normally exhibit regularities in their usage of system commands, as they tend to pursue the same or similar objectives. Command sequences can therefore be used to characterize user behavior, and deviations from the characteristic behavior pattern of a user can be used to detect potential intrusions.
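One simple way to realize such a behavior model is to summarize a user's habitual command sequences as n-grams and flag sessions dominated by unseen n-grams. The sketch below is purely illustrative; the window size and threshold are assumptions, not values from Balajinath's paper.

```python
# A minimal sketch of the command-sequence behavior model: a user's habitual
# command sequences are summarised as n-grams, and sessions containing many
# unseen n-grams are treated as potential intrusions.

def ngrams(commands, n=3):
    return {tuple(commands[i:i + n]) for i in range(len(commands) - n + 1)}

# Profile learned from the user's normal command history.
profile = ngrams(["ls", "cd", "ls", "vi", "make", "ls", "cd", "ls", "vi"])

def anomaly_fraction(session, n=3):
    grams = ngrams(session, n)
    unseen = [g for g in grams if g not in profile]
    return len(unseen) / max(len(grams), 1)

session = ["ls", "cd", "wget", "chmod", "gcc"]
if anomaly_fraction(session) > 0.5:      # illustrative threshold
    print("behaviour deviates from the learned profile")
```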
In 2002, Srinivas Mukkamala suggested the use of neural networks and support vector machines in intrusion detection. The paper on Intrusion detection
using neural networks and support vector machines describes these approaches
to intrusion detection and also compares the two methods. Lichodzijewski and
Peter described the Host-based intrusion detection using self-organizing maps
in 2002. Hierarchical SOMs are applied to the problem of host based intrusion
detection on computer networks. Unlike systems based on operating system
audit trails, the approach operates on real-time data without extensive off-line
training and with minimal expert knowledge.
Forrest et al. [188] presented one of the first papers analyzing sequences of
system calls issued by a process for intrusion detection. In 2002, Dasgupta and
Gonzalez in their paper An immunity-based technique to characterize intrusions
in computer networks [192] present a technique inspired by the negative selection mechanism of the immune system that can detect foreign patterns in the
complement (nonself) space. In particular, the novel pattern detectors (in the
complement space) are evolved using a genetic search, which could differentiate varying degrees of abnormality in network traffic. The paper demonstrates
the usefulness of such a technique to detect a wide variety of intrusive activities on networked computers. Also a positive characterization method based
on a nearest-neighbor classification is used. Experiments are performed on intrusion detection data sets for validation. Alexander Seleznyov
presented the paper Learning temporal patterns for anomaly intrusion detection
in 2002. By accurately recognizing its legitimate users, a system may effectively detect masqueraders.
In 2002, Christopher Kruegel presented a paper Service specific anomaly detection for network intrusion detection. The paper presents an approach that utilizes application-specific knowledge of the network services that should be protected. This information helps to extend current, simple network traffic models to form an application model that allows the detection of malicious content hidden in single network packets. The features of the proposed model are described, and experimental data that underline the efficiency of the system are also presented.
Bo Gao introduced an HMM (hidden Markov model) based anomaly intrusion detection method, in which the key idea is to use HMMs to learn the (normal and abnormal) patterns of Unix processes.
In 2002, Sung-Bae Cho incorporated soft computing techniques into a probabilistic intrusion detection system. There are a lot of industrial applications
that can be solved competitively by hard computing, while still requiring the
tolerance for imprecision and uncertainty that can be exploited by soft computing. This paper presents a novel intrusion detection system (IDS) that models
normal behaviors with hidden Markov models and attempts to detect intrusions
by noting significant deviations from the models. At almost the same time,
Nasser Abouzakhar came up with An intelligent approach to prevent distributed systems attacks. This paper proposes an innovative way to counteract distributed protocol attacks such as distributed denial of service (DDoS) attacks using intelligent fuzzy agents. Adriano Cansian in the paper An attack signature model to computer security intrusion detection mentions that internal and external computer network attacks or security threats occur according to standards and follow a set of subsequent steps, allowing profiles or patterns to be established. This well-known behavior is the basis of signature-analysis intrusion detection systems. This work presents a new attack signature model to be applied in the engines of network-based intrusion detection systems.
Jun-Zhong Zhao in the paper An intrusion detection system based on data mining and immune principles describes a framework for an immune-based intrusion detection system (IDS). Here data mining techniques are used to discover frequently occurring patterns. Ming-Guang Ouyang presented A fuzzy comprehensive evaluation based distributed intrusion detection. The Fuzzy Decision Engine (FDE), which is a component of the detection agent in a distributed intrusion detection system, can consider various factors based on fuzzy comprehensive evaluation when an intrusion behavior is judged. Kumar in the paper on Detection of port-scans and OS fingerprinting using clustering explains how port-scanning and OS fingerprinting exploit vulnerabilities of TCP/IP for intrusion into a computer network.
In 2002, Sekar in his paper Specification-based anomaly detection: A new approach for detecting network intrusions introduced a different idea. Specification-based techniques have been shown to produce a low rate of false alarms, but are
not as effective as anomaly detection in detecting novel attacks, especially when
it comes to network probing and denial-of-service attacks. This paper presents
a new approach that combines specification-based and anomaly-based intrusion
detection, mitigating the weaknesses of the two approaches while magnifying
their strengths. The approach begins with state-machine specifications of network protocols and augments these state machines with information about
statistics that need to be maintained to detect anomalies.
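A minimal sketch of this combination might look as follows: a state machine encodes the legal transitions of a drastically simplified TCP handshake, and statistics are layered on top by counting transitions the specification does not allow. The transition table and the SYN-flood reading are illustrative assumptions, not Sekar's actual specification language.

```python
# A minimal sketch of specification-based anomaly detection: a state machine
# encodes the legal TCP handshake, and counters attached to the machine
# accumulate statistics (here, invalid transitions per peer) from which
# anomalies such as SYN flooding can be read off.
from collections import Counter

HANDSHAKE = {("CLOSED", "SYN"): "SYN_RCVD",
             ("SYN_RCVD", "ACK"): "ESTABLISHED",
             ("ESTABLISHED", "FIN"): "CLOSED"}

state = {}                      # per-peer protocol state
violations = Counter()          # statistics layered on the specification

def observe(peer, segment):
    current = state.get(peer, "CLOSED")
    nxt = HANDSHAKE.get((current, segment))
    if nxt is None:
        violations[peer] += 1   # transition not allowed by the specification
    else:
        state[peer] = nxt

for _ in range(100):            # repeated SYNs with no ACK look anomalous
    observe("10.0.0.9", "SYN")
print(violations.most_common(1))
```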
Hajime Inoue in the paper on Anomaly intrusion detection in dynamic execution environments describes an anomaly intrusion detection system for platforms that incorporate dynamic compilation and profiling. This approach, called dynamic sandboxing, gathers information about the behavior of applications that is usually unavailable to other anomaly intrusion detection systems, and is able to
detect anomalies at the application layer. This implementation is shown to be
both effective and efficient at stopping a backdoor and a virus, and has a low
false positive rate. Carol Taylor in 2002 presented a paper on An empirical analysis of NATE - Network Analysis of Anomalous Traffic Events. This
paper presents results of an empirical analysis of NATE (Network Analysis of
Anomalous Traffic Events), a lightweight, anomaly based intrusion detection
tool.
Mahoney and Chan have done credible work in detecting novel attacks and
presented a paper on Learning nonstationary models of normal network traffic
for detecting novel attacks in 2002. The paper proposes a learning algorithm
that constructs models of normal behavior from attack-free network traffic. Behavior that deviates from the learned normal model signals possible novel attacks. This IDS is unique in two respects. First, it is nonstationary, modeling
probabilities based on the time since the last event rather than on average rate.
This prevents alarm floods. Second, the IDS learns protocol vocabularies (at
the data link through application layers) in order to detect unknown attacks that
attempt to exploit implementation errors in poorly tested features of the target
software.
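The flavor of such nonstationary scoring can be conveyed in a few lines: a never-before-seen attribute value scores in proportion to the time since that attribute last produced a novel value, so a burst of related anomalies yields one strong alarm rather than a flood. This is a deliberate simplification of Mahoney and Chan's scheme, and the field names are illustrative.

```python
# A minimal sketch of nonstationary scoring: instead of an average event rate,
# the score for a never-before-seen attribute value grows with the time since
# that attribute last produced a novel value, damping alarm floods.

seen = {"src_port": set(), "ttl": set()}
last_novel = {"src_port": 0.0, "ttl": 0.0}

def score(field, value, now):
    if value in seen[field]:
        return 0.0                       # previously observed: not anomalous
    seen[field].add(value)
    s = now - last_novel[field]          # long quiet period -> high score
    last_novel[field] = now
    return s

print(score("ttl", 64, now=10.0))        # first value: modest score
print(score("ttl", 63, now=1000.0))      # novel after a long normal period: high
print(score("ttl", 62, now=1000.5))      # novel again immediately: damped
```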
Richard Kemmerer in 2003 presented a paper on Internet security and intrusion detection, which highlights the principal attack techniques that are used
in the Internet today and possible countermeasures. In particular, intrusion detection techniques are analyzed in detail. This paper mixes a practical character
with a discussion of the current research in the field. Hanping Feng came
up with a paper on Anomaly detection using call stack information in 2003. The
call stack of a program execution can be a very good information source for
intrusion detection. There was no prior work on dynamically extracting information from the call stack and effectively using it to detect exploits. In this paper, a new method is proposed to do anomaly detection using call stack information. The basic idea is to extract return addresses from the call stack and generate an abstract execution path between two program execution points. Experiments
show that this method can detect some attacks that cannot be detected by other
approaches, while its convergence and false positive performance is comparable
to or better than the other approaches.
In 2003, Ling in the paper Novel immune system model and its application to network intrusion detection analyzes the techniques and architecture of existing network Intrusion Detection Systems and probes into the fundamentals of the Immune System (IS); a novel immune model is presented and applied to network IDS, which is helpful in designing an effective IDS. Besides, the paper suggests a scheme to represent the self profile of the network, and an automated extraction algorithm is provided to extract the self profile from packets.
At almost the same time, Juan Tapiador in the paper on NSDF: A computer
network system description framework and its application to network security
describe a general framework, termed NSDF, for describing network systems.
Both entities and relationships are the basis underlying the concept of system
state. The dynamics of a network system can be conceived of as a trajectory in
the state space. The term action is used to describe every event which can produce a transition from one state to another. These concepts (entity, relationship,
state, and action) are enough to construct a model of the system. Evolution and
dynamism are easily captured, and it is possible to monitor the behavior of the
system.
In 2003, Xiang in the paper Generating IDS attack pattern automatically based on attack tree illustrates the automatic generation of attack patterns based on attack trees. An extended definition of the attack tree is proposed and an algorithm for generating attack trees is presented. The method of automatically generating attack patterns based on attack trees is shown, and it is tested
by concrete attack instances. The results show that the algorithm is effective and
efficient. The efficiency of generating attack patterns is improved and the attack trees can be reused. In 2003, Meimei Gao worked on a paper Fuzzy intrusion detection based on fuzzy reasoning Petri Nets. The fuzzy rule-based technique, combining fuzzy logic and expert system methodology, is not only capable of dealing with uncertainty in intrusion detection but also allows the most flexible reasoning about the widest variety of information possible. It can be used in both anomaly and misuse detection. This paper presents a method for detecting intrusions based on the fuzzy rule-based technique. The Fuzzy Reasoning Petri Net (FRPN) model is used to represent the fuzzy rule base and, acting as an inference engine, to derive the final detection decision. FRPNs have parallel reasoning ability and are readily used in real-time detection.
In 2003, Sunita Sarawagi in the paper on Sequence data mining techniques
and applications comment that many interesting real-life mining applications
rely on modeling data as sequences of discrete multi-attribute records. Mining
models for network intrusion detection view data as sequences of TCP/IP packets. Robert Erbacher in 2003 presented a paper on Analysis and Application
of Node Layout Algorithms for Intrusion Detection. The proposed monitoring
environment aids system administrators in keeping track of the activities on
such systems with much lower time requirements than that of perusing typical
log files. With many systems connected to the network the task becomes significantly more difficult. If an attack is identified on one system then all systems
have likely been attacked. The ability to correlate activity among multiple machines is critical for complete analysis and monitoring of the environment. This
paper discusses the layout techniques experimented with and their effectiveness.
Shao-Chun Zhong presented a paper on A safe mobile agent system for distributed intrusion detection, in which some applications of mobile agent (MA) technology in intrusion detection systems are developed. MA technology can bring IDS flexibility and enhanced distributed detection ability. The MA-IDS architecture and detailed methods of local intrusion detection and distributed intrusion detection are presented. Maheshkumar Sabhnani in his work on Application of Machine Learning Algorithms to KDD Intrusion
Detection Dataset within Misuse Detection Context in 2003 comments that a
small subset of machine learning algorithms, mostly inductive learning based,
applied to the KDD 1999 Cup intrusion detection dataset resulted in dismal
performance for user-to-root and remote-to-local attack categories as reported
in the recent literature. This paper evaluates performance of a comprehensive
set of pattern recognition and machine learning algorithms on four attack categories as found in the KDD 1999 Cup intrusion detection data set. Results
of a simulation study implemented to that effect indicated that certain classification algorithms perform better for certain attack categories: a specific algorithm
specialized for a given attack category. Consequently, a multi-classifier model,
where a specific detection algorithm is associated with an attack category for
which it is the most promising, was built. Empirical results obtained through
simulation indicate that noticeable performance improvement was achieved for
probing, denial of service, and user-to-root attacks.
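The multi-classifier idea can be sketched as a simple dispatch: each attack category gets its own specialized detector, and the verdict is the category whose detector is most confident. The per-category detectors below are illustrative stubs standing in for the trained models reported in the paper; the feature names and thresholds are assumptions.

```python
# A minimal sketch of a multi-classifier model: each attack category is handled
# by the detector that works best for it, and the final verdict is the category
# whose specialised detector reports the highest confidence.

def probe_detector(x):   return 0.9 if x["distinct_ports"] > 50 else 0.1
def dos_detector(x):     return 0.8 if x["pkts_per_sec"] > 1000 else 0.1
def u2r_detector(x):     return 0.7 if x["root_shell"] else 0.05

detectors = {"probe": probe_detector, "dos": dos_detector, "u2r": u2r_detector}

def classify(record, threshold=0.5):
    category, confidence = max(((c, d(record)) for c, d in detectors.items()),
                               key=lambda pair: pair[1])
    return category if confidence >= threshold else "normal"

print(classify({"distinct_ports": 120, "pkts_per_sec": 5, "root_shell": 0}))
```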
Sabhnani continued with the work and in another paper on Formulation of a heuristic rule for misuse and anomaly detection for U2R attacks in Solaris operating system environment proposed a heuristic rule for the detection of user-to-root (U2R) attacks against the Solaris operating system. Relevant
features for developing heuristic rules were manually mined using Solaris Basic
Security Module audit data. Results show that all user-to-root attacks exploiting the suid program were detected with 100% probability and with zero false
alarms. The rule can detect both successful and unsuccessful U2R attempts
against the Solaris operating system. The proposed rule is general enough to
detect any U2R attack that leverages the buffer overflow technique. Empirical
results indicate that the rule also detected novel user-to-root attacks in DARPA
1998 intrusion detection dataset.
Young-Jun Heo presented a paper on Defeating DoS attacks using wavelet analysis, which proposes a new approach for the detection of DoS and DDoS attacks. An LRU cache filter and a wavelet approach are used to analyze the characteristics of network traffic anomalies, treating marked changes in the wavelet variance as a potential DoS attack and comparing the wavelet variance with the flow
profile to validate the attack. Sandra de Amo in 2003 presented the paper on Mining generalized sequential patterns using genetic programming, proposing a new kind of sequential pattern called the Generalized Sequential Pattern and introducing the problem of mining generalized sequential patterns over temporal
databases.
Nong Ye in 2004 had a paper on Robustness of the Markov-chain model for
cyber-attack detection. This paper presents a cyber-attack detection technique
through anomaly detection, and discusses the robustness of the modeling technique employed. In this technique, a Markov-chain model represents a profile of
computer-event transitions in a normal/usual operating condition of a computer
and network system. The Markov-chain model of the norm profile is generated
from historic data of the system's normal activities. The observed activities of
the system are analyzed to infer the probability that the Markov-chain model of
the norm profile supports the observed activities. The lower the probability the observed activities receive from the Markov-chain model of the norm profile, the
more likely the observed activities are anomalies resulting from cyber-attacks,
and vice versa.
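A minimal sketch of such a Markov-chain norm profile is given below: transition probabilities are estimated from event sequences recorded under normal operation, and a new sequence is scored by the average log-probability the chain assigns to it. The event names and smoothing constant are illustrative assumptions, not taken from Ye's paper.

```python
# A minimal sketch of a Markov-chain norm profile: transition probabilities
# are estimated from normal event sequences, and a new sequence receives low
# support when it contains transitions the norm profile rarely produced.
import math
from collections import Counter

def fit(sequences):
    pairs, totals = Counter(), Counter()
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            pairs[(a, b)] += 1
            totals[a] += 1
    # Smoothed transition probability estimate.
    return lambda a, b: (pairs[(a, b)] + 1e-3) / (totals[a] + 1.0)

p = fit([["open", "read", "close"], ["open", "read", "read", "close"]])

def support(seq):
    """Average log-probability; low support suggests attack activity."""
    return sum(math.log(p(a, b)) for a, b in zip(seq, seq[1:])) / (len(seq) - 1)

print(support(["open", "read", "close"]))   # high: matches the norm profile
print(support(["open", "exec", "close"]))   # low: contains an unseen transition
```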
Ming Xu in the paper Anomaly detection based on system call classification aims to create a new anomaly detection model based on rules. A detailed
classification of the LINUX system calls according to their function and level
of threat is presented. The detection model only aims at critical calls (i.e. the
threat level 1 calls). In the learning process, the detection model dynamically
processes every critical call, but does not use data mining or statistics from static
data. Therefore, incremental learning could be implemented. Based on some
simple predefined rules and refining, the number of rules in the rule database
could be reduced, so that the rule match time can be reduced effectively during
detection processing. The experimental results demonstrate that the detection
model can detect R2L and U2R attacks. The detected anomaly is limited in the
corresponding requests, not in the entire trace. The detection model is suited to privileged processes, especially those based on request-response.
In 2004, Hongyu Yang introduced a different idea, with a Decision Support
Module added to Intrusion Detection. The paper on An application of decision support to network intrusion detection, presented in 2004, briefly describes a network intrusion system and the design of a decision support module (DSM) for an intrusion detection system, which can provide active detection and
automated response support during intrusions. The primary function of the decision support module is to provide recommended actions and alternatives and
the implications of each recommended action. In the decision support module, the GA (genetic algorithm) was run over a subset of the data, called the
training data, and then tested over the entire data set to test real-world performance. Zhang and Lian-Hua in their paper on Intrusion detection using rough
set classification in 2004 comment that recently machine learning-based intrusion detection approaches have been subjected to extensive researches because
they can detect both misuse and anomaly. In this paper, rough set classification
(RSC), a modern learning algorithm, is used to rank the features extracted for
detecting intrusions and generate intrusion detection models.
Kosuke Imamura in 2004 presented a paper on Potential application of training based computation to intrusion detection, commenting that without detection of a network intrusion, a system is not capable of properly defending itself.
Therefore, the first step in preserving system integrity is to detect whether or not
the system is under attack. Packet analysis approaches are effective at detecting known attacks, but fail at unknown attack detection. In order to protect the
system from unknown attacks, a classifier system which is independent of the
signatures found in network packets is developed. One of the promising ways
to perform this classification is to profile kernel level activities. A probabilistically optimal classifier ensemble method is used to monitor kernel activity, and
ultimately to predict whether or not the system is under attack.
Yan-Hui Du in the paper on Formalized description of distributed denial of service attack tries to analyze, check and judge DDoS attacks. Based on a careful
study of the attack principles and characteristics, an object-oriented formalized
description is presented, which contains a three-level framework and offers full
specifications of all kinds of DDoS modes and their features and the relations
between one another. Its greatest merit lies in that it contributes to analyzing,
checking and judging DDoS. Shaohua Teng in the paper on Scan attack detection model by combining feature and statistic analysis in 2004 remarks that attackers often locate a target host on the Internet by scanning, so many attacks can be prevented if such scan attacks are detected. Presently, there are mainly two kinds of methods for detecting scan attacks: statistics-based detection and feature-based detection, but high false negative and false positive rates make them not very effective. In this study, a new method for detecting scan attacks is presented that combines feature with statistical analysis; it can efficiently detect scan attacks with lower false positive and false negative rates.
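The combination can be sketched compactly: a feature test keeps only scan-typical probes (here, half-open connections), and a statistical test thresholds the number of distinct ports a source touches within a time window. The window size and port limit are illustrative assumptions, not values from Teng's paper.

```python
# A minimal sketch combining a feature test with a statistical test for scan
# detection: the feature test filters for scan-typical half-open probes, and
# the statistical test thresholds distinct ports per source within a window.
from collections import defaultdict

WINDOW, PORT_LIMIT = 60.0, 30
probes = defaultdict(list)                 # src -> [(time, port)]

def observe(src, port, t, completed_handshake):
    if completed_handshake:                # feature test: keep half-open probes
        return False
    probes[src] = [(ts, p) for ts, p in probes[src] if t - ts <= WINDOW]
    probes[src].append((t, port))
    distinct = {p for _, p in probes[src]} # statistical test over the window
    return len(distinct) > PORT_LIMIT

for port in range(1, 40):
    alarm = observe("192.0.2.7", port, t=float(port), completed_handshake=False)
print(alarm)                               # True once the port count exceeds the limit
```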
Teng again, in a different paper on Case reasoning and state transition analysis for intrusion detection in 2004, remarks that when a new intrusion scenario is developed, many intrusion methods can be derived by exchanging the command sequences or replacing commands with functionally similar commands, which makes the detection of the derived intrusion very difficult. To overcome this problem, Case Reasoning And State Transition Analysis (CRASTA) is proposed in this paper. For an intrusion case, all the possible derived intrusions are generated as an intrusion base and, based on this intrusion base, an efficient algorithm to detect such intrusions using finite automata is presented. A derived intrusion can be seen as an unknown intrusion; in this sense the technique presented can detect some unknown intrusions.
Yuming Zhao in the paper on Study of anomaly detection based on system call and data mining technology in 2004 introduces the categories of intrusion detection and the methods of data mining applied in anomaly detection. The paper also describes the design and implementation of an anomaly IDS based on system calls and data mining algorithms.
Abraham [171] in 2004 investigated the suitability of the linear genetic programming (LGP) technique for modeling fast and efficient intrusion detection systems. The performance and accuracy of LGP were compared to results obtained
by ANN and regression tree methods. Experiments performed over the popular
DARPA IDS data set showed that LGP outperformed decision trees and support
vector machines in terms of detection accuracy (except for one class). Decision trees were considered the second best, especially for the detection of
U2R attacks.
Ming Xu in the paper Two-layer Markov chain anomaly detection model in 2005 proposes, on the basis of the current single-layer Markov chain anomaly detection model, a new two-layer model. Two distinctly different processes, the different requests and the system call sequence within the same request section, are classified as two layers and dealt with by different Markov chains respectively. The two-layer frame can depict the dynamic activity of the protected process more exactly than the single-layer frame, so that the two-layer detection model can improve the detection rate and reduce the false alarm rate. Furthermore, the detected anomaly will be limited to the corresponding request sections where the anomaly happens. The new detection model is suitable for privileged processes, especially for those based on request-response.
Zhao et al. [172] in 2005 proposed a misuse detection system and anomaly detection system that encode an expert's knowledge of known patterns of attack and system vulnerabilities as if-then rules. Normal connections and intruded connections are divided into different clustering sets and, to distinguish them, the researchers integrate a GA to detect intrusive actions. Their system combines two stages (a clustering stage and a genetic optimizing stage) into the process. The GA was successfully applied and trained on a real-world test case.
At almost the same time, Gong et al. [173] chose the GA approach to network misuse detection because it is robust to noise, requires no gradient information to find a globally optimal or suboptimal solution, and is self-learning. Kim et al. in 2005
[174] proposed a Genetic Algorithm to improve a Support Vector Machine based IDS. They fused GA and SVM in order to improve the overall performance of the IDS, and an optimal detection model for the SVM classifier was determined. As a result of the fusion, the SVM-based IDS selected not only optimal parameters for the SVM but also an optimal feature set from among the whole feature set.
Abraham and Grosan [175] evaluated the performance of two Genetic Programming techniques for IDS, Linear Genetic Programming (LGP) and Multi-Expression Programming (MEP), and provided a comprehensive comparison
of obtained results with selected nonevolutionary machine learning techniques
such as Support Vector Machines (SVM) and Decision Trees (DT). Based on
numerical experiments and comparisons, they showed that Genetic Programming techniques outperformed the reference machine learning methods. In
detail, MEP outperformed LGP for three of the considered classes and LGP outperformed MEP for two of the classes. MEP classification accuracy was greater than 95% for all considered classes and for three of them was greater than 99.75%. Moreover, they suggested that for real-time intrusion detection systems, MEP and LGP would be the ideal candidates because of their simple implementation.

B.3 Taxonomy of Intrusion Detection Systems


We have made use of a large number of concepts to classify the IDSs. The classification is presented in Figure B.1 with a detailed discussion in this section.

Figure B.1: Taxonomy of Intrusion Detection Systems
B.3.1 Intrusion detection methods
The basic intrusion detection methods are the two complementary approaches to
detecting intrusions, namely the anomaly detection (behavior-based) approaches
and the knowledge-based approaches (misuse detection). Both methods have
their distinct advantages and disadvantages as well as suitable application areas
of intrusion detection.
Anomaly detection methods

Anomaly detection, behavior-based detection, or heuristic detection methods use information about repetitive and usual behavior on the systems they
monitor, and this approach identifies events that deviate from expected usage
patterns as malicious. Most anomaly detection approaches attempt to build
some kind of a model over the normal data and then check to see how well
new data fits into that model. In other words, anything that does not correspond
to a previously learned behavior is considered intrusive; a minimal sketch of this idea follows the two learning variants below. Therefore, the intrusion detection system might not miss any attacks, but its accuracy is a difficult
issue, since it can generate a lot of false alarms. Examples of anomaly detection systems are IDES, NIDES, EMERALD and Wisdom and Sense. Anomaly
detection can be either by unsupervised learning techniques or by supervised
learning techniques.
1. Unsupervised learning systems Unsupervised or self-learning systems learn
the normal behavior of the traffic by observing the traffic for an extended
period of time and building some model of the underlying process. Examples include techniques such as the Hidden Markov Model (HMM) and the
Artificial Neural Network (ANN). More details are available in the work
of Sundaram [176].
2. Supervised Systems In the programmed systems or the supervised learning method, the system has to be taught to detect certain anomalous events.
The supervised anomaly detection approaches build predictive models, provided labeled training data (normal or abnormal user or application behavior) are available. Thus the user of the system forms an opinion on
what is considered abnormal for the system to signal a security violation.
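The sketch promised above: the model of normal is just a per-feature mean and standard deviation learned from normal sessions (here, Haystack-style session features), and a new session is flagged when any feature lies too many standard deviations out. The features and cut-off are illustrative assumptions.

```python
# A minimal sketch of "build a model over normal data, then check fit": the
# model is a per-feature mean and standard deviation, and new sessions are
# flagged when any feature deviates too far from the learned normal range.
import statistics

def fit(normal_sessions):
    model = {}
    for feature in normal_sessions[0]:
        values = [s[feature] for s in normal_sessions]
        model[feature] = (statistics.mean(values), statistics.stdev(values))
    return model

def is_anomalous(session, model, cutoff=3.0):
    return any(abs(session[f] - mu) > cutoff * (sigma or 1.0)
               for f, (mu, sigma) in model.items())

normal = [{"duration": 30, "files_opened": 4},
          {"duration": 35, "files_opened": 5},
          {"duration": 28, "files_opened": 4}]
model = fit(normal)
print(is_anomalous({"duration": 400, "files_opened": 60}, model))  # True
```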
Advantages of behavior-based approaches
Detects new and unforeseen vulnerabilities.
Less dependent on operating system-specific mechanisms.
Detects abuse-of-privilege types of attacks that do not actually involve exploiting any security vulnerability.
Disadvantages of behavior-based approaches
The high false alarm rate is generally cited as the main drawback of
behavior-based techniques because:
The entire scope of the behavior of an information system may not be
covered during the learning phase.
Behavior can change over time, introducing the need for periodic online retraining of the behavior profile.
The information system can undergo attacks at the same time the intrusion detection system is learning the behavior. As a result, the behavior profile contains intrusive behavior, which is not detected as
anomalous.
It must be noted that very few commercial tools today implement such
an approach, leaving anomaly detection to research systems, even if the
founding paper by Denning [7] recognizes this as a requirement for IDS
systems.
Knowledge-based detection methods

Knowledge-based detection or misuse detection or signature detection methods
use information about known security policy, known vulnerabilities, and known
attacks on the systems they monitor. This approach compares network activity
or system audit data to a database of known attack signatures or other misuse
indicators, and pattern matches produce alarms of various sorts. All commercial
systems use some form of knowledge-based approach. Thus, the effectiveness
of current commercial IDS is based largely on the validity and expressiveness of
their database of known attacks and misuse, and the efficiency of the matching
engine that is used. It requires frequent updates to keep up with the new stream
of vulnerabilities discovered, this situation being aggravated by the requirement
to represent all possible facets of the attacks as signatures. This leads to an
attack being represented by a number of signatures, at least one for each operating system to which the intrusion detection system has been ported. Examples
of product prototypes are Discovery, IDES, Haystack and Bro. The work of
Gordeev [180] discusses these methods in detail.
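At its core, a knowledge-based engine is a matcher over a signature database, as in the minimal sketch below. The two signatures are invented illustrative patterns, not entries from any real rule set.

```python
# A minimal sketch of knowledge-based (misuse) detection: network payloads are
# matched against a database of known attack signatures, and any match raises
# an alarm. The patterns here are illustrative only.
import re

signatures = [
    ("php-remote-include", re.compile(rb"GET /\S+\.php\?\S*=https?://")),
    ("shellcode-nop-sled", re.compile(rb"\x90{16,}")),
]

def match(payload):
    """Return the names of all signatures that fire on this payload."""
    return [name for name, pattern in signatures if pattern.search(payload)]

print(match(b"GET /index.php?page=http://evil.example/x HTTP/1.0"))
print(match(b"ordinary traffic"))
```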
B.3.2 Deployment techniques
The effectiveness of Intrusion Detection Systems depends on their internal design and, even more importantly, on their position within the corporate architecture. Generally, IDSs can be classified into different categories depending on
their deployment.
Host-based monitoring

A host-based IDS is deployed on devices that have other primary functions such
as Web servers, database servers and other host devices. Host logs, comprising the combination of audit, system and application logs, offer an easily accessible and non-intrusive source of information on the behavior of a system. In addition, logs generated by high-level entities can often summarize many lower-level events, such as a single HTTP application log entry covering many system calls, in a context-aware fashion. A host-based IDS provides information such as user authentication, file modifications/deletions and other host-based information, and is thus designated as secondary protection for devices on the network. Examples of HIDS products are EMERALD, NFR etc.
Advantages of Host-based Intrusion Detection Systems
Although overall host-based IDS is not as robust as network-based IDS, hostbased IDS does offer several advantages over network-based IDS:
More detailed logging:- HIDS can collect much more detailed information
regarding exactly what occurs during the course of an attack.
Increased recovery:- Because of the increased granularity of tracking events
in the monitored system, recovery from a successful incident is usually
more complete.
Detects unknown attacks:- Since the attack affects the monitored host,
HIDS detects unknown attacks more readily than network-based IDS.
Fewer false positives:- The way HIDS works provides fewer false alerts
than produced by Network-based IDS.
Disadvantages of Host-Based IDS
Indecipherable information:- Because of network heterogeneity and the
profusion of operating systems, no single host-based IDS can translate all
operating systems, network applications, and file systems. In addition,
in the absence of something like a corporate key, no IDS can decipher
encrypted information.
Indirect information:- Rather than monitor activity directly (as do network-based IDS), host-based IDS usually rely heavily or completely on an audit
record of activity that is created by a system or application. This audit
record varies widely in quality and quantity between different systems and
applications, thus dramatically affecting IDS effectiveness.
Complete coverage:- Host-based IDS are installed on the system being
monitored. On very large networks this can comprise many thousands
of workstations. Providing IDS on this scale is both very expensive and
difficult to manage.
Outsiders:- A host-based IDS can potentially detect an outside intruder
only after the intruder has reached the monitored host system, not before,
as can network-based IDS. To reach a host system, the intruder must have
already bypassed network security measures.
Host interference:- Host-based IDS places such a load on the host CPU as
to interfere with normal host operations. On some systems, just invoking
an audit record sufficient for the IDS can result in unacceptable loading.
Network-based monitoring

The sole function of network-based IDS is to monitor the traffic of that network.
This ensures that the IDS can observe all communication between a network
attacker and the victim system, resolving many of the problems associated with
log monitoring. Typical Network-based IDS are Microsoft Network Monitor,
Cisco Secure IDS (formerly NetRanger), Snort etc.
Advantages of network-based intrusion detection
Ease of deployment:- Passive nature and hence few performance or compatibility issues in the monitored environment.
Cost:- Strategically placed sensors can be used to monitor a large organizational environment, whereas a host-based IDS requires software on each
monitored host.
Range of detection:- The variety of malicious activities able to be detected
through the analysis of network traffic is wider than the variety able to be
detected in host-based IDS.
Forensics integrity:- Since the network-based IDS sensors run on a host
separate from the target, they are more impervious to tampering.
Detects all attempts, even failed ones:- Host-based IDS detects only successful attacks because unsuccessful attacks do not affect the monitored
host directly.
Disadvantages of Network-based IDS
Direct attack susceptibility:- A recently released study by Secure Networks, Inc. of leading network-based IDS products found that network-based IDS are susceptible to:
i. Packet spoofing, which tricks the IDS into thinking packets have come
from an incorrect location.
ii. Packet fragmentation attacks that retransmit sequence numbers so that
the IDS sees only what a hacker wants it to see.
Indecipherable packets:- Because of network heterogeneity and the relative profusion of protocols, network-based IDSs often cannot decipher the
packets they capture. In addition, in the absence of something like a corporate key, no IDS can decipher encrypted information.
Failure when loaded:- A recent evaluation of leading network-based commercial products found that products that detect all tested attacks successfully on an empty or moderately utilized network start missing at least some attacks when the monitored network is heavily
loaded.
Failure at wire speed:- While network-based IDS can process packets on
low-speed networks (10Mbps), few claim to be able to keep up and miss
no information at 100Mbps or higher.
Complete coverage:- Most sensors are designed to be installed on shared-access segments, and can monitor only that traffic running through those
segments. To provide coverage, the IDS user must select key shared-access
segments for IDS sensors. Most frequently they place sensors in the demilitarized zone and, in some cases, in front of port and server farms. To
monitor distributed ports, internal attack points, distributed Ethernet connections, and desktops, many sensors must be installed. Even then, elastic
or unauthorized connections such as desktop dial-ins and modems will not
be monitored.
Switched networks:- To make matters worse, switching has replaced shared/routed networks as the architecture of choice. Switching effectively hides
traffic from shared-access network-based IDS products. Switched networks fragment communication and divide a network into myriad micro
segments that make deploying shared-access IDS prohibitively expensive
since to provide coverage, very many sensors must be deployed. Alternatives could be attaching hubs to switches wherever switched traffic must be
monitored or mirroring selected information such as that moving to specific critical devices, to a sensor for processing. None of these are easy or
ideal solutions.
Insiders:- The focus of network-based IDS is on detecting attacks from outside, rather than attempting to detect insider abuse and violations of local security policy.
Host network monitoring

Host network monitoring is also called network-node or hybrid intrusion detection. This approach, used in personal firewalls and some IDS probe designs, combines network monitoring with host-based probes. By observing data at all levels of the host's network protocol stack, the ambiguities
of platform-specific traffic handling and the problems associated with cryptographic protocols can be resolved. The data and event streams observed by
the probe are those observed by the system itself. This approach offers advantages and disadvantages similar to both alternatives listed above. It resolves
many of the problems associated with promiscuous network monitoring, while
maintaining the ability to observe the entire communication between victim
and attacker. Like all host-based approaches, however, this approach implies a
performance impact on every monitored system, requires additional support to
correlate events on multiple hosts, and is subject to subversion when the host is
compromised. Sometimes this hybrid intrusion detection system is considered
as a subtype of network-based intrusion detection system because it relies primarily upon network traffic analysis for detection. An example of a hybrid IDS is Prelude.
Target-based monitoring

An attempt to resolve the ambiguities inherent in protecting multiple platforms
lies in combining network knowledge with traffic reconstruction. These target-based ID systems typically use scanning techniques to form an image of what
systems exist in the protected network, including such details as host operating
system, active services, and possible vulnerabilities. Using this knowledge, a
probe can reconstruct network traffic in the same fashion as would be the case
on the receiver system, preventing attackers from injecting or obscuring attacks.
In addition, this approach allows an IDS to automatically differentiate attacks
that are a threat to the targeted system, from those that target vulnerabilities not
present - thus refining generated alerts. Whether attacks that cannot succeed
should be reported is something of a contentious issue - a trade-off between fewer security alerts being generated and the possibility of recognizing novel attacks when combined with known sequences. In addition, the need
to maintain an accurate map of the protected network - including valid points of
vulnerability - may reduce the ability of this class of system to recognize novel
attacks.
B.3.3 Information source
The information that an IDS product can access is determined by where it is
deployed. Network-based IDS always capture and analyze network packets,
while host-based IDS products potentially have many information sources on
the hosts where they are installed. The IDS classification based on the data
source is listed below:
Network packets

The IDS includes a network-based sensor designed to capture and process network packets and decipher at least one network protocol (e.g. TCP/IP).
Audit trail

The IDS includes a host-based agent designed to process the audit record of at
least one specific operating system (e.g., Solaris, Ultrix, Unicos).
B.3.4 Architecture
The IDS should provide a distributed capability, since this component of scalability is vital for effective deployment of IDS in the vast majority of corporate
networks. A distributed capability means that both a central manager or managers and local collection/processing agents placed as needed throughout the monitored network provide the IDS functionality. However, some products are available in both local and distributed versions.
Monolithic systems

The simplest model of IDS is a single application, containing probe, monitor,
resolver and controller all in one, called the monolithic or centralised system.
This focuses on a specific host or system - with no correlation of actions that
cross system boundaries. Such systems are conceptually simple, and relatively
easy to implement. Their major weakness lies in the ability for an attack to be
implemented using a sequence of individually innocuous steps. The alerts generated by such systems may in fact be aggregated centrally - but this architecture
offers no synergy between IDS instances.
Hierarchic systems

If one considers the alerts generated by an IDS instance to be events in themselves, suitable for feeding into a higher-level IDS structure, an intrusion detection hierarchy results. At the root of the hierarchy, lie a resolver unit and
controller. Below this lie one or more monitor components, with subsidiary
probes distributed across the protected systems. Effectively, the whole hierarchy forms a macro-scale IDS. The use of a centralized controller unit allows
information from different subsystems to be correlated, potentially identifying
transitive or distributed attacks. For example, a simple address range probe,
while difficult to detect using a network of monolithic host IDS instances, can
be trivial to observe when correlating connections using a hierarchic structure.
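The address-range example can be made concrete with a small sketch: per-host monitors forward their local alerts upward, and the root resolver correlates them by source address, so a sweep that looks like a single odd connection at each host surfaces as one distributed probe. The structure and threshold are illustrative assumptions.

```python
# A minimal sketch of hierarchic correlation: per-host monitors forward local
# alerts to a root resolver, which correlates them by source address to expose
# an address sweep that no single host-level IDS instance would notice.
from collections import defaultdict

class Resolver:
    def __init__(self, sweep_threshold=10):
        self.by_source = defaultdict(set)
        self.threshold = sweep_threshold

    def receive(self, host, alert):
        # alert = (source_ip, event); each monitor sees only its own host.
        source, _ = alert
        self.by_source[source].add(host)
        if len(self.by_source[source]) >= self.threshold:
            print(f"distributed probe: {source} touched "
                  f"{len(self.by_source[source])} monitored hosts")

root = Resolver()
for i in range(12):                       # each host reports one local event
    root.receive(f"host-{i}", ("203.0.113.9", "unexpected connection"))
```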
Agent-based systems

A more recent model of IDS architecture divides the system into distinct functional units: probes, monitors, resolver and controller units. These may be
distributed across multiple systems, with each component receiving input from
a series of subsidiaries, and reporting to one or more higher-level components.
Probes report to monitors, which may report to resolver units or higher-level
monitors, and so forth. This architecture, implemented in systems such as
EMERALD, allows great flexibility in the placement and application of individual components. In addition, this architecture offers greater survivability in
the face of overload or attack, high extensibility, and multiple levels of reporting
throughout the structure. FIRE is a product prototype that uses the agent-based approach to intrusion detection. The paper that discusses this method in detail is [186].
Distributed systems

All the IDS architectural models described so far consider attacks in terms of
events on individual systems. A recent development, typified by the GrIDS
system, lies in regarding the whole system as a unit. Attacks are modeled as
interconnection patterns between systems, with each link representing network
activity. The graphs that form can be viewed at different scales, ranging from
small systems to the interconnection between large and complex systems (where
sub-networks are collapsed into points). This novel approach promises high
scalability and the potential to recognize widely distributed attack patterns such
as worm behavior. This architecture is also implemented in DIDS.
B.3.5 Analysis frequency


The classification depending on execution frequency or periodicity is based on
how often an IDS analyzes data from its information sources. Most commercial
IDS claim real-time processing capability, and a few provide the capability for
batch processing of historical data.
Dynamic execution

IDS is designed to perform concurrent and continuous automated processing
and analysis, implying real-time operation or on-the-fly processing. IDS deployable in real-time environments are designed for online monitoring and analysis of system events and user actions.
Static execution

IDS is designed to perform periodic processing and analysis implying batch or
other sporadic operation. This may be effective against the low-intensity probes an attacker makes to hide his presence by spreading his attack over a very long period with appreciable gaps between the consecutive attacks. Audit
trail analysis is the prevalent method used by periodically operated systems.
B.3.6 Response
The behavior on detection of an attack describes the response of the IDS. An
IDS may respond to an identified attack, misuse, or anomalous activity in the
following three ways:
Passive

In passive response, the IDS simply generates alarms to inform responsible personnel of an event by way of console messages, email, paging, and report updates. Passive or indirect gathering of information aids in identifying the source
of attack using techniques such as DNS lookups, passive fingerprinting etc.
Reactive

It is an active response to critical events, where it takes corrective action that
stops the attacker from gaining further access to resources, thus mitigating the
effects of the attack. These responses are executed after the attack has been
detected by the IDS. Reactive responses change the surrounding system environment, either in the host on which the IDS resides or outside in the surrounding network. For example, the IDS may reconfigure another system such as a firewall or router to block out the attacker, use TCP reset frames to tear down connection attempts, correct a system vulnerability, log off a user, selectively increase monitoring, or disconnect a port as specified by the user.
The main goal of these responses is to stop the attacker from gaining further
access to resources, thus mitigating the effects of the attack.
Proactive

This is an active response to critical events, where it takes proactive action by
intervening and actively stopping an attack from taking place. The only difference between proactive and reactive responses is when they are executed. A
proactive response could be to drop a network packet before it has reached its
destination, thereby intervening and stopping the actual attack. A reactive response would have been able to terminate the ongoing connection, but it would
not have stopped the packet that triggered the IDS from reaching its destination.
A more exhaustive taxonomy of intrusion detection systems is available in the
work of Sabahi [187].

B.4 Latest Intrusion Detection Software


Along with the intrusion detection systems mentioned above that have made significant contributions to the ongoing research in the field, there are a few other products that deserve a special discussion. Most of the currently used
Open-Source and free Software Packages, Commercial Software Packages and
Academic Software Packages are included below:
AirCERT Automated Incident Reporting (AirCERT) is a scalable distributed
system for sharing security event data among administrative domains. Using AirCERT, organizations can exchange security data ranging from raw
alerts generated automatically by network intrusion detection systems (and
related sensor technology), to incident reports based on the assessments of
human analysts.
ISS Real Secure This IDS works satisfactorily at Gigabit speed. The high speed is made possible by integrating the IDS into the switch or by using a specific port called the span port, which mirrors all the traffic on the switch. The BlackICE technology of this sensor includes protocol analysis and anomaly detection combined with Real Secure's library of signature-based detection capabilities.
Real Secure Server Sensor It is a hybrid IDS, which resides on one host and
still monitors the network traffic and detects attacks in the network layer
of the protocol stack. However the sensor also detects attacks at higher
layers and therefore it can detect attacks hidden in encrypted sessions such as IPsec or SSL. The sensors can also monitor application
and operating system logs.
Snort Snort is an Open Source Network Intrusion Detection System that
keeps track of intrusion attempts, signs of possible bad behavior or hacking exploits. It is capable of performing real-time traffic analysis and
packet logging on IP networks. It can perform protocol analysis, content searching/matching and can be used to detect a variety of attacks and
probes, such as buffer overflows, stealth port scans, CGI attacks, SMB
probes, OS fingerprinting attempts, and much more. It is non-intrusive,
easily configured, utilizes familiar methods for rule development, and currently includes the ability to detect more than 1200 potential vulnerabilities.
Sourcefire Founded by the creators of Snort, the most widely deployed Intrusion Detection technology worldwide, Sourcefire has been recognized
throughout the industry for enabling customers to quickly and effectively
address security risks. Today, Sourcefire is redefining the network security industry by combining enhanced Snort with sophisticated proprietary
technologies to offer the first ever unified security monitoring infrastructure, delivering all of the capabilities needed to proactively identify threats
and defend against intruders.


Shadow Shadow is an intrusion detection system developed on inexpensive PC hardware running open-source, public domain, or freely available software. A SHADOW system consists of at least two pieces: a sensor located at a point near an organization's firewall, and an analyzer inside the firewall. Shadow performs traffic analysis; the sensor collects packet headers from all IP packets that it sees; the analyzer examines the collected data and displays user-defined interesting events on a web page.
Entercept Entercept is a HIDS that prevents and detects attacks; uses a combination of signatures and behavioral rules; safeguards the server, applications and resources from known and unknown worms and buffer-overflow
attacks; reduces false positives and protects customer data.
McAfee Desktop Firewall It is a HIDS which provides firewall protection and intrusion detection for the desktop; it guards against threats from internal and external intruders, malicious code, and silent attacks.
OKENA StormWatch It is an HIDS that intercepts all system calls to file, network, COM and registry resources and correlates behaviors of such system
requests to make real-time allow or deny decisions; supports XP, Win2K
and UNIX systems; scalable to 5000 intelligent agents manageable from
one console.
Symantec Host IDS It is a HIDS that detects unauthorized and malicious
activity like access to critical files and bad logins, alerts administrators and
takes precautionary action to prevent information theft or loss, without any
overhead to the deployed monitoring machine. It has the advantage that it
supports all the popular operating systems.
SMART Watch It is a HIDS that performs file-change detection; provides a
restoration tool that reacts in near-real time without polling.
GFI LANguard It is a HIDS that monitors the security event logs of all Windows XP, Windows 2000, and Windows NT servers and workstations on
your network; alerts administrators in real time about possible intrusions
and attacks.
NetRanger NetRanger is a network-based IDS that monitors network traffic with special hardware devices that can be integrated into Cisco routers and switches or act as stand-alone boxes. In addition to network packets, router log files can also be used as an additional source of information. The system
consists of Sensors, centralized data processing units called Directors and
a proprietary communication subsystem called Post Office. NetRanger is
integrated into Cisco Secure Intrusion Detection System.
Network Flight Recorder NFR is a network-based IDS that uses filters for misuse detection. NFR did not start as an IDS, but it provides an architecture to monitor and filter network packets, log results, perform statistical evaluation, and initiate alarms when certain conditions are met, and it can therefore be used to detect intrusions as well. NFR is designed to provide post-mortem analysis capability for networks after malicious activities have happened. This can be used to shorten the lifetime of new attacks by quickly adding their signatures to the detection unit. Additionally, the system also performs statistics gathering and provides information about the usage growth of applications or the traffic peaks of certain protocol types. The architecture is built in a modular fashion, with interfaces between the main components to easily add new subsystems. NFR Security's intelligent intrusion management system not only detects and deters network attacks, but also integrates with popular firewall providers to prevent future attacks.
Fuzzy Intrusion Recognition Engine FIRE is a network intrusion detection system that uses fuzzy systems to assess malicious activity against computer networks. The system uses an agent-based approach to separate monitoring tasks. Individual agents perform their own fuzzy processing of input data sources. All agents communicate with a fuzzy evaluation engine that combines the results of individual agents using fuzzy rules to produce alerts that are true to a certain degree. The results show that fuzzy systems can easily identify port scanning and denial-of-service attacks. The system can be effective at detecting some types of backdoor and Trojan horse attacks. The paper [191] gives more details on this product.
Intelligent intrusion detection system IIDS is being developed to demonstrate
the effectiveness of data mining techniques that utilize fuzzy logic. This
system combines two distinct intrusion detection approaches: Anomaly
based intrusion detection using fuzzy data mining techniques, and Misuse detection using traditional rule-based expert system techniques. The
anomaly-based components look for deviations from stored patterns of
normal behavior. The misuse detection components look for previously
described patterns of behavior that are likely to indicate an intrusion. Both
network traffic and system audit data are used as inputs. This prototype is described in [193].
DERBI DERBI is a computer security tool aimed at diagnosing and recovering from network-based break-ins. The technology adopted has the
ability to handle multiple methods (often with different costs) of obtaining
desired information, and the ability to work around missing information.
The prototype will not be an independent program, but will invoke and
coordinate a suite of third-party computer security programs (COTS or
public) and utility programs.
MINDS MINDS (Minnesota Intrusion Detection System) project is developing a suite of data mining techniques to automatically detect attacks against
computer networks and systems. It uses an unsupervised anomaly detection system that assigns a score to each network connection that reflects
how anomalous that connection is.
NetSTAT NetSTAT is a tool aimed at real-time network-based intrusion detection. The NetSTAT approach extends the state transition analysis technique (STAT) to network-based intrusion detection in order to represent
attack scenarios in a networked environment. NetSTAT is oriented towards the detection of attacks in complex networks composed of several
subnetworks.
BlackICE The BlackICE IDS scans network traffic for hostile signatures in
much the same way that virus scanners examine files for virus signatures.
BlackICE runs at 148,000 packets per second, checks all 7 layers of the
stack and rates each attack on a scale of 1 to 100 so that only attacks
it considers serious are alerted. There are two versions: desktop agent
(BlackICE Defender) and network agent (BlackICE Sentry). The desktop agent runs on a Win95/WinNT desktop. The network agent runs just like any
other sniffer-type IDS.
Cyclops
Snort-based Cyclops IDS provides advanced and flexible intrusion detection at
Gigabit speeds and secures networks by performing high-speed packet analysis
to detect malicious activities in real-time and automatically launch preventive
measures before security can be compromised.
Dragon Sensor
Dragon sensor detects suspicious activity with both signature based and anomaly
based techniques. Its library of attacks detects thousands of potential network
attacks and probes, and also hundreds of successful system compromises and
backdoors.
E-Trust
eTrust Intrusion Detection delivers state-of-the-art network protection, including protection against DDoS attacks. All incoming and outgoing traffic is checked against a categorized list of web sites to ensure compliance. It is then checked for content, malicious code, and viruses, and the administrator is notified of offending payloads.
ManHunt
Symantec ManHunt provides high-speed network intrusion detection, real-time analysis and correlation, and proactive prevention and response to protect enterprise networks against internal and external intrusions and denial-of-service attacks. The ability to detect unknown threats using protocol anomaly detection helps in eliminating the network exposure and vulnerability inherent in signature-based intrusion detection systems. ManHunt's traffic rate monitoring capability allows for the detection of stealth scans and denial-of-service attacks that can cripple even the most sophisticated networks.
NetDetector
NetDetector is a network surveillance system for IP networks that provides non-intrusive, continuous traffic recording and real-time traffic analysis. NetDetector records network traffic, analyzes every packet, detects the activities of intruders, sets alarms for real-time alerting, and gathers evidence for post-event
analysis.

B.5 Review of the data processing techniques used in IDS


It is clear from the discussions on IDSs in the above sections that various processing methods are employed on the network traffic for detection by different
IDSs. A brief review of the various systems is presented in this section.
Anomaly detection methods
1. Statistical analysis In the statistical analysis approach, the user or system behavior, characterized by a set of attributes, is measured over a period of time by a number of variables, such as user logins and logouts, the number of files accessed in a period of time, and the usage of disk space, memory, CPU, etc. The system stores mean values for each variable and gives an alert when an observed value exceeds a predefined threshold (a minimal sketch of this idea follows this list). A sophisticated model of user behavior has been developed using short- and long-term user profiles. These profiles are regularly updated to keep up with the changes in user behavior. Statistical methods are often used in implementations of normal user behavior profile-based intrusion detection systems. IDES, NIDES, EMERALD, SECURENET, and SPADE use the statistical analysis approach.
2. Artificial Neural Networks Artificial Neural networks use their learning
algorithms to learn about the relationship between input and output vectors
and to generalize them to extract new input/output relationships. With
the neural network approach to intrusion detection, the main purpose is
to learn the behavior of actors in the system (e.g., users, daemons). The
advantage of using neural networks over statistics lies in the simple way of expressing nonlinear relationships between variables, and in learning about such relationships automatically.
3. User intention identification This technique models normal behavior of
users by the set of high-level tasks they have to perform on the system in
relation to the user's functions. These tasks are taken as a series of actions,
which in turn are matched to the appropriate audit data. The analyzer keeps
a set of tasks that are acceptable for each user. Whenever a mismatch is
encountered, an alarm is produced. SECURENET uses this technique for
intrusion detection.
4. Computer immunology Analogies with immunology have led to the development of a technique that constructs a model of the normal behavior of UNIX network services, rather than that of individual users. This model consists of short sequences of system calls made by the processes. Attacks that exploit flaws in the application code are very likely to take unusual execution paths. First, a set of reference audit data is collected which represents the appropriate behavior of services; then the knowledge base is populated with all the known good sequences of system calls. These patterns are then used for continuous monitoring of system calls to check whether the sequence generated is listed in the knowledge base; if not, an alarm is generated.
This technique has a potentially very low false alarm rate provided that the
knowledge base is fairly complete. Its drawback is the inability to detect
errors in the configuration of network services. Whenever an attacker uses
legitimate actions on the system to gain unauthorized access, no alarm is
generated.
5. Machine learning This is an artificial intelligence technique that stores the user-input stream of commands in vector form and uses it as a reference profile of normal user behavior. Profiles are then grouped in a library of user commands having certain common characteristics.
6. Data mining Data mining generally refers to a set of techniques for extracting previously unknown but potentially useful information from large stores of data. Data mining methods excel at processing large system logs of audit data. However, they are less useful for stream analysis of network traffic. One of the fundamental data mining techniques used in
intrusion detection is associated with decision trees. Decision tree models
allow one to detect anomalies in large databases. Another technique refers
to segmentation, allowing extraction of patterns of unknown attacks. This
is done by matching patterns extracted from a simple audit set with those
of warehoused unknown attacks. A typical data mining technique is associated with finding association rules. With data mining, it is easy to correlate data related to alarms with mined audit data, thereby considerably reducing the rate of false alarms. Examples include ADAM (Audit Data Analysis and Mining), IDDM, and MINDS.
Misuse detection methods
1. Expert system Expert systems work with the previously defined set of rules
describing an attack. All security-related events incorporated in an audit
trail are translated in terms of if-then-else rules. Examples are IDES, Wisdom & Sense and ComputerWatch.
2. Signature analysis Signature analysis detects attacks by capturing features
of the attack in the audit trail. Thus, attack signatures can be found in logs as a sequence of audit events that a given attack generates, or as input data streams or patterns of searchable data that are captured in the audit trail.
This method uses abstract equivalents of audit trail data. Detection is accomplished by using common text string matching mechanisms. Examples
are Real Secure, Haystack, NetRanger, and Emerald.
3. State-transition analysis In state-transition analysis, an attack is represented on state-transition diagrams, whereby a set of transitions that an intruder must complete to compromise a system is identified. Examples are
USTAT and NetSTAT.
4. Colored Petri Nets The Colored Petri Nets approach is often used to generalize attacks from expert knowledge bases and to represent attacks graphically. With this technique, it is easy for system administrators to add new
signatures to the system. However, matching a complex signature to the
audit trail data may be time-consuming and hence not used in commercial
systems. Purdue University's IDIOT system uses Colored Petri Nets.
5. Data Mining Data Mining is the non-trivial process of identifying valid
and novel attack patterns in the network traffic. Examples are the Mining
Audit Data for Automated Models for Intrusion detection (MADAM ID),
and JAM.
B.6 Current Intrusion Detection research


In any network environment, the firewall takes the role of protection, while detection is handled by the IDS. The IDS can be used to assess the effectiveness of the firewall rule sets and policies. While the role of reaction has traditionally been assumed by the system or network manager, an IDS that can operate online and in real time can also be programmed to behave either reactively or proactively. A reactive IDS would respond to the detection of an intrusion by, say, stopping the suspect process, disconnecting the suspicious user, or modifying a router access control list. A proactive IDS will instead take pre-emptive countermeasures, such as actively interrogating all extant user processes and stopping all processes which did not originate from bona fide users at approved sites. Thus, the proactive IDS, which can also be called an Intrusion Prevention System, combines the functionality of a firewall, which can block packets depending on where they came from, with that of an IDS performing deep packet inspection.
B.6.1 Intrusion Prevention System
Intrusion Prevention Systems (IPS) actively search a computer or network of computers for security flaws and alert the administrator about security problems before those problems are exploited by an attacker. However, as new attack methods are discovered, they must be updated with the information about
the attacks. COPS (Computer Oracle and Password System) is an example of
the intrusion prevention system. It is a collection of shell scripts which check
for a variety of security flaws, including checking whether the files have the
correct permissions and scanning the system for any files with the setuid bit set.
Present day attacks spread at tremendous speed. These fast moving attacks
can infiltrate a network before conventional tools such as anti-virus software
have time to formulate a signature to prevent infection. IPS, with their behavioral analysis and speed, operate fast enough to detect such attacks without
performance degradation. Thus an IPS can be properly configured to prevent
intrusions and also worm or virus attacks. The main problem identified with
the Intrusion Prevention System is the critical need to minimize false positives,
failing which a legitimate user may be disconnected or a network service shut down unnecessarily. Hence the greatest challenge for an IPS is to allow legitimate traffic while blocking attacks, and to do this without adversely affecting performance.
Types of IPS

Host-based Intrusion Prevention System (HIPS) OKENA's StormWatch uses a kernel-based approach and works on servers and workstations. It has four interceptors: the Network interceptor provides address and port blocking like a firewall; the File system and Configuration interceptors monitor and prevent changes to critical files or registry keys; and the Network and File system interceptors together provide worm prevention.
By correlating events from multiple systems at the management station,
StormWatch not only blocks the threat but also pushes out a new policy
to all agents and blocks future attacks. This reduces the number of false
positives and false negatives.
Network-based Intrusion Prevention System (NIPS)
NIPS uses several detection methods: stateful signature detection, protocol anomaly detection, and some proprietary methods to block specific attacks. Stateful signature detection looks at the relevant portions of the traffic, where the attack can be perpetrated. It does this by tracking state, and it detects an attack based on the context specified by the user.
IPS adds to the defense in depth approach to security and is an evolution of
IDS technology. Its proactive capabilities will help to keep our networks safer
from more sophisticated attacks. Even though a NIPS will prevent attacks, if something slips through, a HIPS would stop it. HIPS, being the last line of defense, provides operating system hardening with greater granularity and application-specific control.
Snort Inline is a mode of operation for Snort that provides it with intrusion prevention capabilities. It uses the Netfilter/iptables software to provide detection at the application layer to the iptables firewall, so that it can respond dynamically to real-time attacks that take advantage of vulnerabilities at the
application level. It is the Netfilter/iptables software that allows for the implementation of the response mechanism, while Snort Inline provides the policies based on which iptables makes the decision to allow or deny packets. After an incoming packet to the network is provided by iptables, Snort performs rule matching against the packet. Thus Snort Inline provides a more proactive and dynamic capability against today's attacks. However, the rule matching is against a statically created rule base, which requires a prior estimate of the kinds of attacks that will be seen, and the action is taken at the site of detection.
McAfee Internet Security Suite (ISS) has been developed for the Windows operating system platform; it integrates many security technologies to protect
desktop computers from malicious code, spam and unwanted or unauthorized
access. Thus it functions both as an antivirus as well as a firewall. The antivirus subsystem allows for the detection of viruses, worms, and other types of
malicious code by using a signature-based approach along with a heuristic engine for unknown attacks. The firewall component scans multiple points of data
entry. McAfee IntruShield IPS is a network prevention product for encrypted attacks, botnets, and VoIP vulnerability-based attacks. It delivers unique forensic features to analyze key characteristics of known and zero-day threats and
intrusions.

B.7 Intrusion detection using multi-sensor fusion


The motivation for applying sensor fusion in enhancing the performance of intrusion detection systems is that a better analysis of existing data gathered by
various individual IDSs can detect many attacks that currently go undetected.
This points to the broadest solution in advanced intrusion detection technology: providing ubiquitous coverage through individual IDSs observing the same network traffic. The essential components are (a minimal illustration of the fusion unit follows this list):
Intrusion detection systems that perform real-time monitoring of network packets
A fusion unit that aggregates the decisions generated by the IDSs
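As a minimal illustration of the second component, a decision-level fusion unit can be sketched in Python as below; the rule names are illustrative, and the fusion methods developed in this thesis are more sophisticated than simple voting.

    def fuse_decisions(decisions, rule="majority"):
        """Combine binary alerts (1 = attack, 0 = normal) from several IDSs."""
        votes = sum(decisions)
        if rule == "or":                         # alert if any sensor fires
            return int(votes >= 1)
        if rule == "and":                        # alert only if all sensors fire
            return int(votes == len(decisions))
        return int(votes > len(decisions) / 2)   # default: majority vote

    # Three IDSs examine the same traffic and vote:
    print(fuse_decisions([1, 0, 1]))  # majority vote -> 1 (raise an alert)

The OR rule maximizes detection at the cost of false alarms, the AND rule does the opposite, and majority voting sits between the two.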
When the individual IDS performance is suboptimal, distributed decision making systems and the subsequent fusion of the decisions create a variety of new
circumstances that may exacerbate or ameliorate the problems. The IDS fusion
offers the following advantages over a single IDS:
Analytically proven [220] higher system detection rates and lower system false alarm rates than those of a single IDS or a weighted average.
Error probabilities of the fusion system are significantly reduced and approach zero [220].
As Axelsson highlights in [221], "In reality there are many different types of intrusions, and different detectors are needed to detect them." The same argument is made by Lee et al. [222], who additionally mention that "combining evidence from multiple base classifiers ... is likely to improve the effectiveness in detecting intrusions." As such, analyzing the data from multiple sensors should increase the accuracy of the IDS [222]. Kumar [15] observes that "correlation of information from different sources has allowed additional information to be inferred that may be difficult to obtain directly." Such correlation is also useful in assessing the severity of other threats, be it severe because an attacker is making a concerted effort to break in to a particular host, or severe because the source of the activity is a worm with the potential to infect a large number of hosts in a short amount of time.
Multisensor correlation has long been a theme in intrusion detection, driven both by the fact that most of the early IDS work took place in the wake of the Morris Worm, and by the need to centrally manage the alerts from a network of host-based IDSs. Recently, a great deal of work has been done to standardize the protocols that IDS components use to communicate with each other. The first solid protocol to do this is the Common Intrusion Detection Framework (CIDF) [223]. CIDF
spurred additional work in protocols for multisensor correlation, for example,
Ning et al. [224] extended CIDF with a query mechanism to allow IDSs to
query their peers to obtain more information on currently observed suspicious
activity.
B.7.1 Existing fusion IDSs
Some of the IDSs that make use of multisensor correlation and various fusion
techniques are covered in this section.
Research IDSs

The first couple of IDSs that performed data fusion and cross-sensor correlation were the Information Security Officer's Assistant (ISOA) [225] and the Distributed Intrusion Detection System (DIDS) [226]. ISOA used the audit information from numerous hosts, whereas DIDS used the audit information from numerous host- and network-based IDSs. Both made use of a rule-based expert
system to perform the centralized analysis. The primary difference between the
two was that ISOA was more focused on anomaly detection and DIDS on misuse detection. Additional features of note were that ISOA provided a suite of
statistical analysis tools that could be employed either by the expert system or
a human analyst, and the DIDS expert system featured a limited learning capability.
EMERALD was an extension to NIDES [7, 227] with a hierarchical analysis system. The various levels (host, network, enterprise, etc.) would each perform some level of analysis and pass any interesting results up the chain for correlation [228, 229, 230]. It provided a feedback system such that the higher levels could request more information for a given activity. Of particular interest is the analysis done at the top level, which monitored the system for "network-wide threats such as Internet worm-like attacks, attacks repeated against common network services across domains, or coordinated attacks from multiple domains against a single domain" [228]. The EMERALD architects employed numerous approaches such as statistical analysis, an expert system, and modular analysis engines, as they believed "no one paradigm can cover all types of threats."
Commercial IDSs

RealSecure SiteProtector does advanced data correlation and analysis by interoperating with other RealSecure products [231]. Symantec ManHunt [232]
and nSecure nPatrol [233] integrate the means to collect alarms. Cisco IDS
[234] and Network Flight Recorder (NFR) [235] provide a means to do centralized sensor configuration and alarm collection. The problem with all of these
systems is that they are designed more for prioritizing what conventional intrusion (misuse) detection systems already detect, and not for finding new threats.
Other products, such as Computer Associates eTrust Intrusion Detection Log
View [236], and NetSecure Log [237] are more focused on capturing log information to a database and doing basic analysis on it. Such an approach seems to be oriented more towards ensuring the integrity of the audit trail (itself an important activity in an enterprise environment) than towards data correlation and analysis.
B.7.2 Current status of applying sensor fusion in IDS
Despite the proven utility of multiple classifier systems, no general answer to the original question about the possibility of exploiting the strengths while avoiding the weaknesses of different IDS designs has yet emerged. Many fundamental issues are a matter of ongoing research in different research communities. The results achieved during the past few years are also spread over different
research communities, and this makes it difficult to exchange such results and
promote their cross-fertilization.

B.8 Conclusion
Intrusion detection is currently gaining considerable interest from both the research community and commercial companies. It has become an indispensable and integral component of any comprehensive enterprise security program, the reason being that the intrusion detection system has the potential to alleviate many of the problems facing current network security. A number of the techniques and solutions found in current systems and literature are outlined in this work. As evidenced by recent events, however, network security has some way to go before any network can be considered safe, and hence the near-term future of intrusion detection is very promising. It is clear, though, that under the pressures of a highly competitive global research environment, the field of intrusion detection will re-mould rapidly and overcome many current limitations and hurdles.

Appendix C
Modeling of the Internet Attacks and the
Countermeasure for Detection
Success is the ability to go from one failure to another with no loss of enthusiasm.
Winston Churchill

C.1 Introduction
This appendix introduces dynamic models for the attack-detector interactions
with the simple Nicholson-Bailey precursor, in which the detector is randomly
searching for the attack on the network traffic independent of the attack distribution. The dependence between the detectors and their heterogeneity is introduced as a subsequent step. The heterogeneity is incorporated by the use of negative binomial distribution as introduced in chapter 2, which also accounts for
the non-randomness in the attacks and the detectors. The attack-detector models that incorporate the attack carrying capacity, detector improvement with the
attacks detected, detector correlation, and the non-randomness of attacks and detectors have been derived in this appendix. The proposed modeling idea is new; related works, other than Shimeall and Williams [54] and Browne et al. [55], are discussed here.
Ravishankar Iyer et al. [238] combine an analysis of data on security vulnerabilities and a focused source-code examination to develop a Finite State Machine
(FSM) model to depict and reason about security vulnerabilities and also to extract characteristics shared by a large class of commonly seen vulnerabilities.
This information is used to devise a generic, randomization-based technique
for protecting against a wide range of security attacks. Jonsson and Olovsson
[239, 240] try to quantitatively model the security intrusion process based on
attacker behavior. This model presents the phases in performing attacks on a system in the presence of a detection system. They discuss the three phases in the security intrusion process, namely the learning phase, the standard attack phase
security intrusion process namely the learning phase, the standard attack phase
and the innovative attack phase.
Ed Skoudis [241] in his book Counter Hack: A step-by-step guide to computer attacks and effective defenses presents a model of an attack using five
phases: reconnaissance, scanning, gaining access, maintaining access, and covering tracks. McDermott [242] mentions that most of the quantitative models of security or survivability have been defined on a range of probable intruder behavior. This measures survivability as a statistic such as the mean time to breach. Such purely stochastic quantification is not suitable for high-consequence systems. Detailed aspects of the intruder's attack potential can have significant impact on the expected survivability of an approach.
This section also surveys the different research efforts related to the field of
intrusion correlation. IBM has developed a prototype called the aggregation
and correlation component (ACC) [243]. The purpose of the aggregation and
correlation algorithm is to form groups of related alerts using a small number
of relationships. M2D2 uses a formal data model to include external information in the alert correlation process [244]. Four different information types are
handled: information about the monitored system, information about known
vulnerabilities, information about security tools (vulnerability scanners and intrusion detection systems), and information generated by the security tools, e.g.
scans and alerts. A relational database is used to store information from IDS
and scanners, together with product information from the ICAT vulnerability
database [245].
SRI has introduced a probabilistic approach to alert correlation [246, 247, 248].
To be able to handle heterogeneous alerts, a generic alert template is used. The correlation is then performed hierarchically. Threads are used to correlate alerts
relating to the same incident on the same sensor. Security incidents are then
composed of the same incidents correlated over several sensors. It is then possible to create correlated attack reports by correlating over several alert classes.
A similar approach to the one developed at SRI has been chosen by MIT Lincoln Laboratory [249]. To perform correlation, alerts are partitioned into five attack categories called discovery, scan, escalation, denial-of-service, and stealth.
New alerts are possibly added to existing intrusion scenarios after the evaluation of the probability that one attack category is followed by another, the time
difference between alerts, and the proximity of source IP addresses.
One of the sources that significantly supported numerous concepts developed
in this work is the VulDa, a database of collected attacks and vulnerabilities
from the IBM site. VulDa provides the necessary and profound knowledge of
practical security issues. This database is used for categorizing a large number
of attacks, which yielded results that were highly valuable to the IDS analysis
approach developed in this work. It categorizes more than 350 attacks and analyzes the IDS scopes. It collects information from security-relevant material
like Bugtraq, CERT CC, SANS and NIAP.
The rest of the appendix is organized as follows. Section C.2 models the attack-detector relationship taking into account the various possibilities of interaction
between the two groups of population. Finally, section C.3 summarizes the
developed model.

C.2 Nicholson-Bailey model


In addition to the assumptions introduced in section 2.5, the following assumption is made in the initial phase of working with the model that defines the attack-detector population dynamics:
Detectors detect attacks randomly and cause the attack to be ineffective or
unsuccessful, at a rate proportional to the detector density. (As an initial
assumption, it is reasonable to consider the case with no prior knowledge
on the probable attacks that happen on the Internet. Hence, it is reasonable to consider that the detectors search randomly for attacks; the greater the number of detectors, the greater the chance of attacks becoming ineffective or unsuccessful.)
The logistic model simulates the effect of the limiting resources on the growth
of the two interacting populations. Such a model can be used to incorporate the
effect of attacks for a detector, or the detector effect for an attack. The trivial
growth rate of both the attack and the detector can be given by the functions
defined by Nicholson-Bailey. Preliminary investigations have been carried out
using the Nicholson-Bailey model [?, ?] to explain the attack-detector growth
rate. This model takes care of the first four of the assumptions given in section
2.5.
Let $A_t$ and $D_t$ denote the number of attacks and detectors at any time $t$. Let $d$ denote the detection efficiency constant of the IDS on the attack, and let $a$ denote the attack increase rate ignoring detection. The number of encounters between the detector system and the attack is given by:
$$N_e = d A_t D_t \quad \left(\text{since } d = \frac{N_e}{A_t} \text{ when } D_t = 1\right).$$

To distribute encounters among attacks, the assumption is made that an IDS searches at random and thus would re-encounter attacks previously detected. Taking into consideration the stochastic nature of the network traffic, it is easy to assume that the detector does a random search. Poisson's model can hence be used to distribute encounters among attacks as:
$$P(X) = \exp(-G_x)\,\frac{G_x^X}{X!},$$
where $P(X)$ is the probability of $X$ occurrences and $G_x$ is the average occurrence of $X$. Solving for zero detections, the probability of not being detected (the proportion of attacks undetected) is given by $P(0) = \exp(-G_x)$. Setting $G_x = \frac{N_e}{A_t}$ gives $P(0) = \exp\left(-\frac{N_e}{A_t}\right)$.
Figure C.1: Attack-Detector relationship using the Nicholson-Bailey model


Since $\frac{N_e}{A_t}$ defines the detection efficiency constant, multiplying it by the detector density $D_t$ gives the probability of an attack escaping detection by $D_t$ detectors: $P(0) = \exp(-dD_t)$, where $dD_t$ is the effect of the detector density $D_t$, in other words the mean or Poisson rate. The probability of being detected is then $P(X > 0) = 1 - \exp(-dD_t)$. Hence the attack and the detector growth can be given by:
$$A_{t+1} = a A_t \exp(-dD_t) \qquad \text{(C.1)}$$
and
$$D_{t+1} = A_t\,[1 - \exp(-dD_t)] \qquad \text{(C.2)}$$
respectively. Figure C.1 shows the attack-detector relationship using the Nicholson-Bailey model with typical values and initial conditions $a = 0.25$, $A(1) = 20000$, $d = 0.9$, $D(1) = 1$, and $t$ varying from 1 to 5. This basic Nicholson-Bailey model showing the detector performance over the years was in agreement with the figure of merit of IDSs over the years from 1995 to 2004, as shown in figure 2.6. The decrease in the growth rate of attacks over the years, as seen in Figure C.1, depends on the number of detectors initially deployed and also on the efficiency of the deployed detectors. The actual attacks that happen on the Internet are expected to be more than what gets reported, since some of the attacks may not be detected, and only a small portion of the detected attacks gets reported.
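The recursion in equations C.1 and C.2 is straightforward to simulate; the following minimal Python sketch uses the parameter values quoted for Figure C.1 (the function name is illustrative, not from any thesis software).

    import math

    def nicholson_bailey(a, d, A0, D0, steps):
        """Iterate the basic Nicholson-Bailey attack-detector model,
        equations (C.1) and (C.2)."""
        A, D = A0, D0
        history = [(A, D)]
        for _ in range(steps):
            A_next = a * A * math.exp(-d * D)      # attacks escaping detection, eq. (C.1)
            D_next = A * (1.0 - math.exp(-d * D))  # detected attacks feed the detectors, eq. (C.2)
            A, D = A_next, D_next
            history.append((A, D))
        return history

    # Values quoted for Figure C.1: a = 0.25, A(1) = 20000, d = 0.9, D(1) = 1
    for t, (A, D) in enumerate(nicholson_bailey(0.25, 0.9, 20000.0, 1.0, 4), start=1):
        print(t, round(A, 2), round(D, 2))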
With the available experimental data to test the model, and also with the practical data in the work of Shimeall et al. [54], the Nicholson-Bailey model is seen to be in good agreement. It provides a quantitative possibility of oscillations in the attack-detector interactions. The attack-detector interactions are
often characterized by very strong fluctuations from year-to-year, and then the
complete extinction of either the attack or the detector. Moreover, during certain
time span, detector levels become so low that the model predicts the eventual
extinction of the detector, when the attacks increase exponentially or vice-versa.
In this appendix, an attempt has been made to model the dynamic relationship
existing between the detectors and the attacks and this knowledge can be used
to enrich the design and development of IDSs. For each combination of a and
d there is an unstable attack-detector equilibrium, with the slightest disturbance
leading to expanding population oscillations. With Nicholson-Bailey equations
C.1 and C.2, it is shown that depending on the initial state, the system can evolve
towards a simple steady state or a limit cycle, in which the attack-detector populations oscillate periodically in time. The attack-detector relationship may thus
exhibit coupled oscillations. The aim is to study the oscillatory behavior of an
attack-detector model with intelligent pursuit and evasion rules. Usually detectors respond to attack distribution. A constant searching efficiency is more
difficult to accept. Searching efficiency depends on the speed of the traffic,
attack density on a priori grounds and also on the detector density.
C.2.1 Attack/Detection as they stand alone
This section investigates the fate of the attack and the detector in the absence of the other, in order to assess whether modeling the attack-detector relationship using the Nicholson-Bailey model is reasonable.
$$A_{t+1} = a A_t \exp(-dD_t)$$
and
$$D_{t+1} = A_t\,[1 - \exp(-dD_t)]$$
In the absence of detectors, $A_{t+1} = aA_t$ and $D_{t+1} = 0$; attacks increase exponentially, and at any time $t+n$, $A_{t+n} = a^n A_t$. With this simple model, it
is clear that when the detector density $D_t = 0$, the attack density will follow the logistic function. It is reasonable to set an attack carrying capacity $k$ for the attacks beyond a certain limit if the detectors are totally absent.
In the absence of any attack, it is reasonable to assume that the presence of detectors is of no use. So the existing detection systems will also die out at the next instant of time if there are no attacks; i.e., if $A_t = 0$, then $D_{t+1} = 0$ and $A_{t+1} = 0$. To explain in detail, if the beginning years of the Internet are taken as years with no attacks, the succeeding year will not have any detectors. This continues until attacks are found and, with a latency, detectors evolve. The detector density is a function of the attack density weighted by the efficiency and the density of the detector. Even when one of these detector parameters becomes very large, the detector density will approach, but never overcome, the attack density.
The basic model of Nicholson-Bailey can be extended by incorporating additional features such as density-dependence in the attacks, interference among
detectors, and the refuges. The classification of the detectors spans a wide
range of complexity. The general statistics show that the more the detectors are
successful in detecting the attacks, the more are the chances of highly sophisticated detectors emerging, possibly learning from the detected attacks. Thus
the Nicholson-Bailey model for attack-detector modeling is based on the data
that reflect the knowledge that one has about the system and/or the potential
attacks, but it does not express all the different possibilities that are encountered
in the attack-detector interaction. The following sections look into the different
possibilities in order to generalize the attack-detector interactions.
C.2.2 Attack carrying capacity
The Nicholson-Bailey model suffers from the important defect of having the
attacks with a constant rate of increase and thus a potentially unlimited number
of attacks. It is necessary to take into account the fact that the attack density
does not exceed some carrying capacity. The first definitive theoretical treatment of this relationship is that while detectors and attacks grow geometrically, the resources on which they depend may not follow such a fast rate of increase. Thus the demand for resources must eventually exceed the supply,
and population growth being dependent on the resource supply, must then cease.
This is mathematically modeled as a logistic equation with the attack remaining
at a saturation value equal to the attack carrying capacity. Practically there are
technical bottlenecks for the attacks to increase beyond this value; other reasons
of the sated state of the attacks can be the ineffectiveness of detectors, or that
the existing attacks serve all the malicious intents. Hence the attacks should saturate at the attack carrying capacity, given by $A_{t+1} = A_t$ when $A_t = k$, where $k$ is the attack equilibrium density or the attack carrying capacity. This condition is substituted in the attack equation
$$A_{t+1} = a A_t \exp(-dD_t).$$
Attacks saturate as $A_{t+1} = A_t = k$. This simplifies the attack equation to $\exp(-dD_t) = \frac{1}{a}$, or $dD_t = \ln(a)$. Hence, to incorporate the attack carrying capacity, the attack-detector equations can be modified as:
$$A_{t+1} = a A_t \exp\left(-\ln(a)\,\frac{A_t}{k} - dD_t\right)$$
and
$$D_{t+1} = A_t\,[1 - \exp(-dD_t)]$$
respectively, where $\ln(a)$ is made density-dependent through the expression $\ln(a)\,\frac{A_t}{k}$, such that as $A_t$ approaches $k$, the growth rate of the attack approaches zero. The
introduction of the carrying capacity is to cause the system to be stable, thereby
making the Nicholson-Bailey model more realistic. The impact of the attack
getting sated on the detector is that it will vary depending on the probability of
detecting attacks. The detectors pick up to a stage of maximum detection during
this time when the attacks are sated. If given enough time lag, detectors also
will stabilize. Since k depends on the technology bottleneck, it is expected to
increase every year. Expecting around 500 varieties of attacks in year 2000, it
can be several fold higher now.
The stability of this density-dependent model is determined by the attack increase rate a and also by the detector searching efficiency d as shown in Figure
C.4 with k = 500. If the detector is extremely efficient with a large value of d,
Figure C.2: Attack-Detector relationship with attack carrying capacity

then it is expected to hold the attacks below their carrying capacity. The dynamics are determined most strongly by the unstable attack-detector interactions. If
the detector is inefficient with a small value of d, attack dynamics are determined largely by the density-dependent feedback. Thus the density-dependent
attack growth rate can be applied to populations facing limited resources, a
situation that is unlikely to occur in successful cases of stable systems where
equilibria occur at very low levels where resources are not limited. Thus the
detections and the increase rate can be considered as a means of stabilizing the
interactions.
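The density-dependent update derived above can be simulated by a small modification of the basic recursion. The Python sketch below is written under the stated assumptions (a fixed carrying capacity $k$ and the unchanged detector equation); note that the damping term presumes an attack increase rate $a > 1$, so that $\ln(a) > 0$.

    import math

    def nicholson_bailey_capacity(a, d, k, A0, D0, steps):
        """Nicholson-Bailey iteration with a logistic attack carrying capacity k
        (assumes a > 1 so that the damping term -ln(a)*A/k is negative)."""
        A, D = A0, D0
        out = [(A, D)]
        for _ in range(steps):
            # Damped so that A_{t+1} = A_t when A_t = k and D_t = 0
            A_next = a * A * math.exp(-math.log(a) * A / k - d * D)
            D_next = A * (1.0 - math.exp(-d * D))
            A, D = A_next, D_next
            out.append((A, D))
        return out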
C.2.3 Stability in attack-detector model
With $D_t$ detectors searching for $A_t$ attacks, the ones that are not detected or survive patch fixing, along with the new attacks generated in the interval between $t$ and $t+1$, are given by $A_{t+1}$. Similarly, there will be detectors that learn from the attacks detected, and there will be detectors that remain effective even with new vulnerabilities or service closures at any point of time; hence $D_{t+1}$ denotes the number of detectors at time $t+1$. This is a cyclic pickup of attacks and detectors and hence is oscillatory in nature, without any overdamping as such.
It is shown in section 2.5.2 that simple detector models when aggregated in
a network of high attack density contribute to the stability of an attack-detector
interaction. For the detection of external intrusion activities, if there are multiple
paths to the Internet, an IDS needs to be present at every entry point, whereas
for the detection of internal intrusion activities, an IDS is required in every
network segment. This specifies the broadest solution of advanced intrusion
detection technology to provide ubiquitous coverage through individual IDSs
spread everywhere on the network. The success of a security system depends
on the detector or the security measures in reducing the attack population and
maintaining it at a new lower level in a stable interaction. These equilibrium
levels depend on the following two factors:
1. the effective rate of increase of the attack unaffected by detection.
2. the average proportion of the attacks detected, which in turn depends on the number of detectors and all factors affecting the searching efficiency $\left(\frac{N_e}{A_t D_t}\right)$.
The IDSs that are likely to stabilize the attack population at low levels have the
following characteristics:
high intrinsic searching efficiency
small attack handling time
detector interference to a certain level
high level of detector aggregation using the techniques of sensor fusion
C.2.4 Inclusion of stealthy attacks
If both the attack and the detector were randomly and independently distributed in Nicholson-Bailey fashion, then the proportion of the attacks escaping detection at time $t$ is given by $e^{-dD_t}$. If a proportion $b$ of the attacks that are at the risk of detection are allowed to hide themselves, for example with the fragmentation of packets or even tunneling, then the proportion of the attacks escaping detection is raised to:
$$e^{-dD_t} + b\,(1 - e^{-dD_t}), \qquad 0 \le b \le 1$$
The equations for the number of attacks and detectors at any instant of time $t$ are:
$$A_{t+1} = a A_t\,\left[b + (1-b)\exp(-dD_t)\right]$$
and
$$D_{t+1} = A_t - A_{t+1}/a.$$
The equilibrium solutions of the above equations are:
$$dD^{*} = \ln\left[\frac{a(1-b)}{1-ab}\right]$$
and
$$A^{*} = \frac{a}{a-1}\,D^{*}.$$
It is necessary for the value of b to lie between 0 and 1 for a solution to exist.
These solutions are stable against small disturbances. When b tends towards
zero, the system resembles the Nicholson-Bailey model.
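As a quick numerical check of these equilibrium expressions, a short Python sketch with hypothetical parameter values (a solution requires $a > 1$ and $b < 1/a$):

    import math

    def stealth_equilibrium(a, b, d):
        """Equilibrium of the Nicholson-Bailey model with a hidden (stealthy)
        proportion b of attacks; valid for a > 1 and 0 < b < 1/a."""
        D_star = math.log(a * (1 - b) / (1 - a * b)) / d
        A_star = a / (a - 1) * D_star
        return A_star, D_star

    # Hypothetical values: attack increase rate a = 2, stealthy fraction b = 0.3,
    # detection efficiency d = 0.05
    print(stealth_equilibrium(2.0, 0.3, 0.05))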
C.2.5 Modeling of non-random attacks and detection
The random search is a mathematically convenient assumption, but not a realistic one. When a network intrusion happens, the sequence of attacks does not
take place in a totally random order. Intruders come with a set of tools trying
to achieve a specific goal. The selection of the cracking tools and the order of
application depends heavily on the situation as well as the responses from the
targeted system. Typically there are multiple ways to invade a system. Nevertheless, it usually requires several actions/tools to be applied in a particular
logical order to launch a sequence of effective attacks to achieve a particular
goal. It is this logical partial order that reveals the short and long-term goals
of the invasion. Random search is an exception rather than a rule. Real attacks
are likely to be distributed in a patchwork of high and low densities, and the
detectors can be expected to respond to the attacks by orienting towards high
density patches. It is natural that the detector searches for certain traffic features
for signs of attack. This provides a strong selective advantage for detectors that
result in a more focused search. The modeling of non-random behavior of the
attack-detector interactions is provided in detail in chapter 2.
C.3 Summary
The modeling shows the restricted growth rates of both the attacks and their detection. With the existing IDSs, it is not possible to attain a growth rate such that the effect of attacks is not felt in the information systems. Hence, it is required to look at advanced techniques for performance enhancement of the available IDSs. The level of severity of an alert is also understood with this modeling. This
knowledge could then potentially be used by a security analyst to understand
and respond more effectively to future intrusions. As seen from the model, the
existing as well as emerging attacks are not expected to totally evade the detectors monitoring the network. The modeling is realistic in a network environment with multiple IDSs for protection, looking at the system as a whole,
instead of the individual responses to an attack. For more proactive defense, it
is essential to understand the network defensive and offensive strategies. With
the attack-detector scenario better understood, the future evolution of attacks
can be estimated in a certain way thereby aiding better attack detection and in
turn reduced false negatives. This knowledge helps the security community to
become proactive rather than reactive with respect to incident response.

Appendix D
Methodology for Evaluation of Intrusion
Detection Systems
Make everything as simple as possible, but not simpler.
Albert Einstein

D.1 Introduction
The poor understanding of the performance of intrusion detection systems available in the literature may be caused in part by the shortage of an effective,
unbiased evaluation and testing methodology that is both scientifically rigorous
and technically feasible. The choice of intrusion detection systems for a particular environment is a general problem, more concisely stated as the intrusion
detection evaluation problem, and its solution usually depends on several factors. The most basic of these factors are the false alarm rate and the detection
rate, and their tradeoff can be intuitively analyzed with the help of the Receiver
Operating Characteristic (ROC) curve [14], [57], [12], [58], [59]. However, as pointed out by earlier investigators [21], [60], [61], the information provided by the detection rate and the false alarm rate alone might not be enough to provide a good evaluation of the performance of an IDS. Hence, the evaluation
metrics need to consider the environment the IDS is going to operate in, such as
the maintenance costs and the hostility of the operating environment (the likelihood of an attack). In an effort to provide such an evaluation method, several
performance metrics such as Bayesian detection rate [21], expected cost [60],
sensitivity [62] and intrusion detection capability [63], have been proposed in
literature. These metrics usually assume the knowledge of some uncertain parameters like the likelihood of an attack, or the costs of false alarms and missed
detections. Yet despite the fact that each of these performance metrics makes its own contribution to the analysis of intrusion detection systems, they are
rarely applied in the literature when proposing a new IDS.
This Appendix introduces a framework for evaluating IDSs along with some
new metrics for IDS evaluation. Classification accuracy in intrusion detection
systems deals with such fundamental problems as how to compare two or more
IDSs, how to evaluate the performance of an IDS, and how to determine the
best configuration of an IDS. In an effort to analyze and solve these related
problems, evaluation metrics such as Area Under ROC Curve, precision, recall, and F-score, have been introduced. Additionally, we introduce the P-test,
which is more of an intuitive way of comparing two IDSs and also more relevant to intrusion detection evaluation problem. We also introduce a formal
framework for reasoning about the performance of an IDS and the proposed
metrics against adaptive adversaries. We provide simulations and experimental
results with these metrics using real-world network traffic and the DARPA 1999 data set, in order to illustrate the benefits of the algorithms proposed in chapters five to nine.

D.2 Metrics for performance evaluation


This section introduces the metrics for IDS performance evaluation, with their merits and demerits, and analyzes them in a unified framework.
D.2.1 Detection rate and false alarm rate
Let $TP$ be the number of attacks that are correctly detected, $FN$ be the number of attacks that are not detected, $TN$ be the number of normal traffic packets/connections that are correctly classified, and $FP$ be the number of normal traffic packets/connections that are incorrectly detected as attacks. In the case of an IDS, there are both security requirements and usability requirements. The security requirement is determined by the $TP$ rate and the usability requirement is decided by the
number of $FP$s. There is a natural trade-off between these two metrics. The concept of finding the optimal trade-off of the metrics used to evaluate an IDS
is an instance of the more general problem of multi-criteria optimization. In
this setting, we want to maximize (or minimize) two quantities that are related
by a trade-off, which can be done via two approaches. The first approach is to
directly compare the two metrics via a trade-off curve. The second approach
is to find a suitable way of combining these two metrics in a single objective
function to optimize. We therefore classify the above defined metrics into two
general approaches that will be explored in the rest of this section: the tradeoff
approach and the maximization of a figure-of-merit value.
D.2.2 Receiver Operating Characteristic (ROC) Curve
ROC curves are used to evaluate classifier performance over a range of trade-offs between $TP_{rate}$ and $FP_{rate}$. A ROC curve is a plot with the false alarm rate on the x-axis and the detection rate on the y-axis, i.e., ROC $= \{\langle TP_{rate}, FP_{rate} \rangle\}$. One of the benefits of ROC graphs is the ability to separate error cost considerations from the IDS performance. Additionally, ROC curves remain invariant under changing class distributions. However, the disadvantage of the ROC curve is that even small changes in the false alarm rate may cause drastic differences in the detection rate when normal traffic abounds in comparison to the attack traffic.
D.2.3 The Area Under the ROC Curve ($AUC$)
$AUC$ is a convenient way of comparing IDSs; it is the summary performance metric for the ROC curve. A random IDS has an area of 0.5, whereas an ideal one has an area of one.
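A common way to compute the $AUC$ from a finite set of ROC operating points is trapezoidal integration; a minimal Python sketch, assuming the points are given as (false alarm rate, detection rate) pairs:

    def auc_trapezoid(roc_points):
        """Area under a ROC curve by the trapezoidal rule.
        roc_points: list of (fp_rate, tp_rate) pairs, including (0,0) and (1,1)."""
        pts = sorted(roc_points)
        area = 0.0
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            area += (x1 - x0) * (y0 + y1) / 2.0
        return area

    # A random detector's diagonal gives 0.5:
    print(auc_trapezoid([(0.0, 0.0), (0.5, 0.5), (1.0, 1.0)]))  # 0.5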
D.2.4 Accuracy
The commonly used IDS evaluation metric on a test data set is the overall accuracy:
$$\text{Overall Accuracy} = \frac{TP + TN}{TP + FP + TN + FN}$$
Overall Accuracy is not a good metric for comparison in the case of network
traffic data since the true negatives abound.

D.2.5 Precision
Precision ($P$) is a measure of what fraction of the test data detected as attack is actually from the attack class:
$$P = \frac{TP}{TP + FP}$$

D.2.6 Recall
Recall ($R$) is a measure of what fraction of the attack class is correctly detected:
$$R = \frac{TP}{TP + FN}$$

There is a trade-off between the two metrics precision and recall. As the number of detections increases by lowering the threshold, the recall will increase,
while precision is expected to decrease. A plot showing the recall-precision
characterization of a particular IDS is used to analyze the relative and absolute
performance of an IDS over a range of operating conditions.
D.2.7 F-score
The F-score captures the balance between precision and recall, and is a measure of the accuracy of a test. The F-score can be considered as the harmonic mean of recall and precision, and is given as:
$$F\text{-score} = \frac{2PR}{P + R}$$

The standard measures, namely precision, recall, and F-score, are grounded in a probabilistic framework and hence allow one to take into account the intrinsic variability of performance estimation. Comparing IDSs with the F-score metric has the limitation that tests of significance cannot be directly applied to it in order
to determine the confidence level of the comparison. The primary goal was to
achieve improvement in both precision as well as recall, and hence P-test [110]
was used for IDS comparison.
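All of the metrics defined so far follow directly from the four confusion-matrix counts; a minimal Python sketch with hypothetical counts:

    def ids_metrics(tp, fp, tn, fn):
        """Accuracy, precision, recall, and F-score from confusion-matrix counts."""
        accuracy = (tp + tn) / (tp + fp + tn + fn)
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        f_score = 2 * precision * recall / (precision + recall)
        return accuracy, precision, recall, f_score

    # Hypothetical counts: 90 detected attacks, 10 missed, 20 false alarms,
    # 9880 correctly classified normal connections
    print(ids_metrics(tp=90, fp=20, tn=9880, fn=10))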
D.2.8 P-test
To compare two IDS X and Y , let (RX , PX ) and (RY , PY ) be the values of recall and precision with respect to attack respectively. Let IDS X and Y predict
NXP os and NYP os positives respectively and N P os be the total number of positives
in the test sample. Then the P-test is applied as follows:

Z_R = \frac{R_X - R_Y}{\sqrt{2R(1 - R)/N^{Pos}}}

Z_P = \frac{P_X - P_Y}{\sqrt{2P(1 - P)(1/N_X^{Pos} + 1/N_Y^{Pos})}}

where R = \frac{R_X + R_Y}{2} and P = \frac{N_X^{Pos} P_X + N_Y^{Pos} P_Y}{N_X^{Pos} + N_Y^{Pos}}.

If Z_R \geq 1.96, then R_X can be regarded as being significantly better than
R_Y at the 95% confidence level.
If Z_R \leq -1.96, then R_X can be regarded as being significantly poorer than
R_Y at the 95% confidence level.
If |Z_R| < 1.96, then R_X can be regarded as being comparable to R_Y.
Similar tests are applied to compare P_X and P_Y.
Now, in order to compare the two IDSs X and Y, IDS X is regarded as better than
IDS Y if any of the following criteria is satisfied:
R_X is significantly better than R_Y and P_X is significantly better than P_Y;
R_X is significantly better than R_Y and P_X is comparable to P_Y;
R_X is comparable to R_Y and P_X is significantly better than P_Y.
If R_X is comparable to R_Y and P_X is comparable to P_Y, then X and Y are
regarded as comparable.
It may so happen that one metric is significantly better and the other metric
is significantly worse. In such cases of conflict, the non-probabilistic metric
F-score can be used instead of applying the significance test.
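The complete comparison procedure can be summarized in a short sketch (illustrative Python implementing the formulas above; the argument names are ours, and 0 < R < 1 and 0 < P < 1 are assumed so that the denominators are nonzero):

    import math

    def p_test(r_x, p_x, n_x_pos, r_y, p_y, n_y_pos, n_pos, z_crit=1.96):
        # Compare IDS X against IDS Y on recall and precision at the
        # 95% confidence level (z_crit = 1.96).
        r = (r_x + r_y) / 2
        p = (n_x_pos * p_x + n_y_pos * p_y) / (n_x_pos + n_y_pos)
        z_r = (r_x - r_y) / math.sqrt(2 * r * (1 - r) / n_pos)
        z_p = (p_x - p_y) / math.sqrt(
            2 * p * (1 - p) * (1 / n_x_pos + 1 / n_y_pos))

        def verdict(z):
            if z >= z_crit:
                return "significantly better"
            if z <= -z_crit:
                return "significantly poorer"
            return "comparable"

        return verdict(z_r), verdict(z_p)

IDS X is then judged better than IDS Y when at least one verdict is "significantly better" and neither is "significantly poorer", mirroring the criteria listed above.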

D.3 Test setup


The test setup for the experimental evaluation undertaken in this thesis work
consists of three Pentium machines running the Linux operating system. A
combination of shallow and deep sensors distributed across a single subnet and
observing the same domain is required for good protection. Intrusion detection
systems can extract the information used to detect attacks from different
layers, such as the packet headers (shallow), the packet payload (deep), or
both. To take advantage of such a complementary collection, the following three
IDSs are chosen:
1. PHAD [67], which detects attacks by extracting packet header information,
2. ALAD [68], which is based on the application payload,
3. Snort [69], which collects information from both the header and the payload
part of every packet, in a time-based as well as a connection-based manner.
This choice of sensors that are heterogeneous in functionality is intended to
exploit the advantages of a fusion IDS [94]. In addition, complementary IDSs
provide versatility, and similar IDSs ensure reliability. An experimental Packet
Header Anomaly Detector (PHAD) [67] that monitors the 33 fields of the Ethernet,
TCP, UDP and ICMP protocols is chosen as one of the IDSs for the combination.
Observing the header fields makes it effective in detecting Probe and DoS
attacks. The second sensor chosen is the Application Layer Anomaly Detector
(ALAD) [68], which complements PHAD by monitoring incoming TCP connections to
well-known server ports. ALAD uses six attributes for detection, namely the
source IP address, destination IP address, destination port, TCP flags,
application keywords, and the application argument. It detects R2L attacks with
a high detection rate, since R2L attacks normally exploit the application layer.
Apart from the diversity of the chosen IDSs, yet another reason for choosing the
two anomaly detectors PHAD and ALAD was their acceptably low false alarm rates.
Snort is an open source network intrusion prevention and detection system
utilizing a rule-driven language, which combines the benefits of signature-,
protocol- and anomaly-based inspection methods. Snort is the most widely
deployed intrusion detection and prevention technology worldwide and has become
the de facto standard for the industry. Snort is efficient in detecting DoS and
U2R attacks with a high detection rate.

D.4 Summary
In an effort to analyze and solve the IDS evaluation problems identified in this
thesis, evaluation metrics such as the Area Under the ROC Curve, precision,
recall, and F-score have been introduced in this appendix. Additionally, the
P-test, which offers a more intuitive way of comparing two IDSs and is more
relevant to the intrusion detection evaluation problem, has been included.
Metrics such as the F-score, complemented by the significance testing of the
P-test, together provide an effective basis for comparing IDSs.

References
[1] M. McLuhan, Letters of Marshall McLuhan, Oxford University Press,
1987, pp. 254.
[2] Internet Domain Survey Host Count, https://www.isc.org/solutions/survey
[3] J. McHugh, A. Christie, J. Allen, Defending Yourself: The Role of Intrusion Detection Systems, IEEE Software, Sep./Oct. 2000.
[4] Losses due to cyber crime can be as high as $40 billion, Business Line, Business Daily from THE HINDU group of publications, Monday, May 21, 2007.
[5] CSI/FBI Computer Crime and Security Survey, http://www.gocsi.com/press/20020407

[6] J.P. Anderson, Computer Security Threat Monitoring and Surveillance,


Technical report, James P. Anderson Co., Fort Washington, PA., April
1980.
[7] D.E. Denning, An Intrusion-Detection Model, IEEE Transactions on Software Engineering, vol. SE-13, pp. 222-232, 1987.
[8] P. Helman, G. Liepins, Statistical Foundations of Audit Trail Analysis for the Detection of Computer Misuse, IEEE Transactions on Software Engineering, vol. 19, no. 9, pp. 886-901, 1993.
[9] H.S. Javitz, A. Valdes, The NIDES Statistical Component Description and
Justification, Technical report, SRI International, Menlo Park, CA, March
1994.


[10] C. Ko, M. Ruschitzka, K. Levitt, Execution Monitoring of Security-Critical Programs in Distributed Systems: A Specification-based Approach, In Proceedings of the 1997 IEEE Symposium on Security and Privacy, pp. 175-187, May 1997.
[11] D. Wagner, D. Dean, Intrusion Detection via Static Analysis, In Proceedings of the IEEE Symposium on Security and Privacy, IEEE Press, 2001.
[12] C. Warrender, S. Forrest, B.A. Pearlmutter, Detecting intrusions using system calls: Alternative data models, In IEEE Symposium on Security and
Privacy, pages 133-145, 1999.
[13] DEF CON 8 conference. Las Vegas, NV, 2000. www.defcon.org
[14] W. Lee, S.J. Stolfo, P.K. Chan, E. Eskin, W. Fan, M. Miller, S. Hershkop, and J. Zhang, Real time data mining-based intrusion detection, In Proc. Second DARPA Information Survivability Conference and Exposition, IEEE Computer Society, pp. 85-100.
[15] S. Kumar, Classification and Detection of Computer Intrusions, PhD thesis, West Lafayette, IN: Purdue University, Computer Sciences, 1995.
[16] T.D. Lane, Machine Learning Techniques for the computer security domain of anomaly detection, Ph. D. thesis, Purdue Univ., West Lafayette,
IN, 2000.
[17] F. Neri, Comparing local search with respect to genetic evolution to detect intrusion in computer networks, In Proc. of the 2000 Congress on Evolutionary Computation CEC00, IEEE Press, pp. 238-243.
[18] P.K. Chan, S. Stolfo, Toward parallel and distributed learning by meta-learning, In Working Notes AAAI Workshop on Knowledge Discovery in Databases, Portland, OR, pp. 227-240, AAAI Press, 1993.
[19] A.L. Prodromidis and S.J. Stolfo, Cost complexity-based pruning of ensemble classifiers, Knowledge and Information Systems, 3(4), pp. 449-469, 2001.


[20] P.L. Carbone, Data mining or knowledge discovery in databases: An


overview, In Data Management Handbook, New York: Auerbach Publications, 1997.
[21] S. Axelsson, A preliminary attempt to apply detection and estimation theory to intrusion detection, Technical Report 00-4, Chalmers Univ. of Technology, Goteborg, Sweden, 2000.
[22] W. Lee, A Data Mining Framework for Constructing Features and Models
for Intrusion Detection Systems, Ph. D. thesis, Columbia University.
[23] W. Lee, R. A. Nimbalkar, K. K. Yee, S. B. Patil, P. H. Desai, T. T. Tran,
and S. J. Stolfo, A data mining and CIDF based approach for detecting
novel and distributed intrusions, 2000.
[24] W. Lee, and S. J. Stolfo, Data mining approaches for intrusion detection,
In Proc. of the 7th USENIX Security Symp., San Antonio, TX. USENIX,
1998.
[25] M.V. Mahoney, P.K. Chan, Learning non stationary models of normal network traffic for detecting novel attacks, SIGKDD, 2002.
[26] W. Fan, Cost-Sensitive, Scalable and Adaptive Learning Using Ensemblebased Methods. Ph. D. thesis, Columbia University, 2001.
[27] K. Kendall, A database of computer attacks for the evaluation of intrusion detection systems, Thesis, MIT, 1999.
[28] L. Didaci, G. Giacinto, F. Roli, Intrusion detection in computer networks
by multiple classifiers systems, International Conference on Pattern recognition, 2002.
[29] G. Giacinto, and F. Roli, Intrusion detection in computer networks by multiple classifier systems, In Proc. of the 16th International Conference on
Pattern Recognition (ICPR), Volume 2, Quebec City, Canada, pp. 390-393,
IEEE press, 2002.
[30] P.A. Porras and P.G. Neumann, EMERALD: Event Monitoring Enabling Responses to Anomalous Live Disturbances, Proc. 20th NISSC, pp. 353-365, 1997.


[31] M. Kubat, R.C. Holte, S. Matwin, Learning when negative examples abound: One-sided selection, Proceedings of the Ninth European Conference on Machine Learning, pp. 146-153, 1997.
[32] L. Breiman, J.H. Friedman, R.A. Olshen, C.J. Stone, Classification and regression trees, Belmont, CA: Wadsworth, 1984.
[33] P.K. Chan, S. Stolfo, Towards scalable learning with non-uniform class and cost distributions: A case study in credit card fraud detection, Proceedings of the 4th International Conference on Knowledge Discovery and Data Mining (KDD-98), pp. 164-168, 1998.
[34] K. McCarthy, B. Zabar, G. Weiss, Does cost-sensitive learning beat sampling for classifying rare classes?, Proceedings of the 1st International
workshop on utility-based data mining, pp. 69-77, 2005.
[35] Z. Chair, P.K. Varshney, Optimal data fusion in multiple sensor detection
systems, IEEE Transactions on Aerospace and Electronics systems, 22, 1,
98-101, 1986.
[36] M.V. Joshi, On evaluating performance of classifiers for rare classes, Proceedings of the 2002 IEEE International Conference on data mining, pp.
641-644, 2002.
[37] W.W. Cohen, Fast effective rule induction, Proceedings of 12th International conference on machine learning, California, 1995.
[38] J.R. Quinlan, C4.5: Programs for machine learning, Morgan Kaufmann,
1993
[39] K. Lan, A. Hussain, D. Dutta, Effect of malicious traffic on the network,
Proceedings of PAM, 2003.
[40] S. Bay, A framework for discovering anomalous regimes
in multivariate time-series data with local models, 2004,
http://cll.stanford.edu/symposia/anomaly/abstracts.html
[41] T. Fawcett, Activity monitoring: anomaly detection as an on-line classification, 2004.

[42] DARPA intrusion detection evaluation, http://www.ll.mit.edu/IST/ideval/data/data_index.html
[43] W. Lee, S.J.Stolfo, A Data Mining framework for building intrusion detection models, IEEE Symposium on Security and Privacy, 1999.
[44] T. Lane, C.E. Brodley, Temporal sequence learning and data reduction for
anomaly detection, ACM Trans. Inform. Syst. Secur. 2 (3), 1999.
[45] M. Thottan, C. Ji, Anomaly detection in IP networks, IEEE Trans. Signal Processing, 51(8), pp. 2191-2204, 2003.
[46] S. Jin, D. Yeung, A covariance analysis model for DDoS attack detection, IEEE International Communication Conference (ICC04), vol. 4, June 2004, pp. 20-24.
[47] S. Jin, D.S. Yeung, Xizhao Wang, Network intrusion detection in covariance feature space, Pattern Recognition, vol. 40, pp. 2185-2197, 2007.
[48] DARPA intrusion detection evaluation, http://www.ll.mit.edu/IST/ideval/
[49] S. Axelsson, The base-rate fallacy and its implications for the difficulty of intrusion detection, In Proceedings of the 6th ACM Conference on Computer and Communications Security (CCS 99), pages 1-7, November 1999.
[50] D.E. Denning, Information Warfare and Security, Addison Wesley, 1999.
[51] W. Lee, W. Fan, M. Miller, S. Stolfo, E. Zadok, Toward cost-sensitive modeling for intrusion detection and response, Technical report CUCS-002-00, Computer Science, Columbia University, 2000.
[52] C. Elkan, Results of the KDD99 classifier learning, SIGKDD Explorations, Vol. 1, Issue 2, pp. 63-64, Jan 2000.
[53] CERT report of vulnerabilities, http://www.cert.org/stats/cert stats.htm/
#vulnerabilities
[54] T. Shimeall, P. Williams, Models of Information Security Trend Analysis,
www.cert.org/archive/pdf/info-security.pdf


[55] H.K. Browne, W.A. Arbaugh, J. McHugh, W.L. Fithen, A Trend Analysis of Exploitations, www.cs.umd.edu/~waa/pubs/CS-TR-4200.pdf
[56] L. Hamza, K. Adi, K. El Guemhioui, Automatic generation of attack scenarios for intrusion detection systems, Proceedings of the advanced International Conference on Telecommunications and International Conference
on Internet and Web applications and services, AICT/ICIW, 2006.
[57] S.J. Stolfo and K. Mok, A data mining framework for building intrusion detection models, In Proceedings of the IEEE Symposium on Security and Privacy, pages 120-132, Oakland, CA, USA, 1999.
[58] E. Eskin, A. Arnold, M. Prerau, L. Portnoy, and S. Stolfo. A geometric
framework for unsupervised anomaly detection: Detecting intrusions in
unlabeled data. In D. Barbara and S. Jajodia, editors, Data Mining for
Security Applications. Kluwer, 2002.
[59] C. Kruegel, D. Mutz, W. Robertson, and F. Valeur, Bayesian event classification for intrusion detection, In Proceedings of the 19th Annual Computer Security Applications Conference (ACSAC), pp. 14-24, 2003.
[60] J.E. Gaffney and J.W. Ulvila, Evaluation of intrusion detectors: A decision theory approach, In Proceedings of the 2001 IEEE Symposium on Security and Privacy, pages 50-61, Oakland, CA, USA, 2001.
[61] G. Gu, P. Fogla, D. Dagon, W. Lee, and B. Skoric. Measuring intrusion
detection capability: An information-theoretic approach. In Proceedings
of ACM Symposium on Information, Computer and Communications Security (ASIACCS 06), Taipei, Taiwan, March 2006.
[62] G. Di Crescenzo, A. Ghosh, and R. Talpade, Towards a theory of intrusion detection, In ESORICS 2005, 10th European Symposium on Research in Computer Security, Springer Lecture Notes in Computer Science, 3679, pp. 267-286, 2005.
[63] G. Gu, P. Fogla, D. Dagon, W. Lee, B. Skoric, Towards an InformationTheoretic Framework for Analyzing Intrusion Detection Systems, In Proceedings of the 11th European Symposium on Research in Computer Security.


[64] J. McHugh, Testing Intrusion Detection Systems: A Critique of the 1998


and 1999 DARPA IDS evaluations as performed by Lincoln Laboratory,
ACM Transactions on Information and System Security, vol.3, No.4, Nov.
2000.
[65] M. V. Mahoney, P. K. Chan, An analysis of the 1999 DARPA /Lincoln Laboratory evaluation data for network anomaly detection, Technical Report
CS-2003-02.
[66] V. Paxson, The Internet Traffic Archive, http://ita.ee.lbl.gov/, 2002.
[67] M.V. Mahoney, P.K. Chan, Detecting Novel attacks by identifying anomalous Network Packet Headers, Florida Institute of Technology Technical
Report CS-2001-2.
[68] M.V. Mahoney, P.K. Chan, Learning non stationary models of normal network traffic for detecting novel attacks, SIGKDD, 2002.
[69] Snort Manual, www.snort.org/docs/snort_htmanuals/htmanual_260
[70] Cisco IDS4215 manual, http://www.cisco.com/en/US/products/hw/vpndevc/
ps4077/index.html
[71] S. T. Brugger, J. Chow, An assessment of the DARPA IDS evaluation
dataset using Snort, Tech. Report, CSE-2007-1, 2005.
[72] K. J. Pickering, Evaluating the viability of intrusion detection system
benchmarking, Bachelor Thesis, University of Virginia, US, 2002.
[73] S. M. Bellovin, Packets found on an Internet, Technical report, AT&T Bell
Laboratories, May, 1992.
[74] J. Sommers, V. Yegneswaran, P. Barford, Toward comprehensive traffic
generation for online IDS evaluation, Technical Report, University of Wisconsin.
[75] SAFE: A security blueprint for enterprise networks, White paper, Cisco
Systems, 2000.


[76] R. Durst, T. Champion, B. Witten, E. Miller, L. Spagnuolo, Testing and


evaluating computer intrusion detection systems, Communications of the
ACM, vol.42, No.7, Jul. 1999.
[77] S.S. Iyengar, R.R. Brooks, Multi-Sensor Fusion: Fundamentals and Applications with Software, Prentice Hall, 1998.
[78] W. Fan, W. Lee, S. J. Stolfo, and M. Miller, A multiple model cost sensitive
approach for intrusion detection. In R. L. de Mantaras and E. Plaza (Eds.),
Proc. of Machine Learning: ECML 2000, 11th European Conference on
Machine Learning, Volume 1810, Springer Lecture Notes in Computer
Science, Barcelona, Spain, pp. 142-153.
[79] C. Siaterlis, B. Maglaris, Towards Multisensor Data Fusion for DoS detection, ACM Symposium on Applied Computing, 2004.
[80] G. Brown, Diversity in Neural Network ensembles, PhD thesis, The University of Birmingham, B15 2TT United Kingdom, 2004.
[81] R.S. Blum, On multisensor image fusion performance limits from an estimation theory perspective, Information Fusion Journal, vol. 7, 3, pp. 250-263, 2006.
[82] B.V. Dasarathy, Sensor fusion potential exploitation-innovative architectures and illustrative applications, Proceedings of the IEEE, Vol. 85, 1,
pp.24-38, 1997.
[83] O. Cohen, Y. Edan, E. Schechtman, Statistical Evaluation Method for
Comparing Grid Map Based Sensor Fusion Algorithms, International Journal of Robotic Research, Vol. 25, No. 2, pp.117-133, 2006.
[84] X. R. Li, Y.-M. Zhu, and C.-Z. Han, Unified optimal linear estimation
fusion-part I: Unified models and fusion rules, Proc. 2000 International
Conf. Information Fusion, pp. MoC2-10-MoC2-17, 2000.
[85] A. Krogh and J. Vedelsby, Neural network ensembles, cross validation,
and active learning, NIPS, 7, pp.231-238, 1995.
[86] D.L. Hall, S.A.H. McMullen, Mathematical Techniques in Multi-Sensor Data Fusion, Second Edition, Artech House.


[87] P.J. Nahin, J.L. Pokoski, NCTR Plus Sensor Fusion Equals IFFN or can
Two Plus Two Equal Five?, IEEE Transactions on Aerospace and Electronic Systems,vol. AES-16, 3, pp.320-337, 1980.
[88] S.C.A. Thomopoulos, R. Vishwanathan, D.C. Bougoulias, Optimal decision fusion in multiple sensor systems, IEEE Transactions on Aerospace
and Electronics Systems, vol. 23, 5, pp.644-651, 1987.
[89] W. Baek and S. Bommareddy, Optimal m-ary data fusion with distributed
sensors, IEEE Transactions on Aerospace and Electronics Systems, vol.31,
3, pp.1150-1152, 1995.
[90] V. Aalo, R. Viswanathan, On distributed detection with correlated sensors: Two examples, IEEE Trans. Aerospace Electron. Syst., vol. 25, pp. 414-421.
[91] E. Drakopoulos, C.C. Lee, Optimum multisensor fusion of correlated local decisions, IEEE Trans. Aerospace Electron. Syst., vol. 27, pp. 593-606.
[92] M. Kam, Q. Zhu, W. Gray, Optimal data fusion of correlated local decisions in multiple sensor detection systems, IEEE Trans. Aerospace Electron. Syst., vol. 28, pp. 916-920.
[93] R. Blum, S. Kassam, H. Poor, Distributed detection with multiple sensors
- Part II: Advanced topics, Proceedings of IEEE, pp.64-79.
[94] T. Bass, Multisensor Data Fusion for Next Generation Distributed Intrusion Detection Systems, IRIS National Symposium, 1999.
[95] G. Giacinto, F. Roli, L. Didaci, Fusion of multiple Classifiers for Intrusion Detection in Computer Networks, Pattern Recognition Letters, 24,
pp. 1795-1803, 2003.
[96] Y. Wang, H. Yang, X.Wang, R. Zhang, Distributed intrusion detection
system based on data fusion method, Intelligent control and automation,
WCICA 2004.
[97] W. Hu, J. Li, Q. Gao, Intrusion Detection Engine on Dempster-Shafer's Theory of Evidence, Proceedings of International Conference on Communications, Circuits and Systems, vol. 3, pp. 1627-1631, Jun 2006.


[98] A. Siraj, R.B. Vaughn, S.M. Bridges, Intrusion Sensor Data Fusion in an
Intelligent Intrusion Detection System Architecture, Proceedings of the
37th Hawaii international Conference on System Sciences, 2004.
[99] R. Perdisci, G. Giacinto, F. Roli, Alarm clustering for intrusion detection systems in computer networks, Engineering Applications of Artificial Intelligence, Elsevier, March 2006.
[100] A. Valdes, K. Skinner, Probabilistic alert correlation, Springer Verlag
Lecture notes in Computer Science, 2001.
[101] O.M. Dain, R.K. Cunningham, Building Scenarios from a Heterogeneous
Alert Stream, IEEE Workshop on Information Assurance and Security,
2001.
[102] F. Cuppens, A. Miege, Alert correlation in a cooperative intrusion detection framework, Proceedings of the 2002 IEEE symposium on security and
privacy, 2002.
[103] B. Morin, H. Debar, Correlation of Intrusion Symptoms : an Application
of Chronicles, RAID 2003.
[104] H. Debar, A. Wespi, Aggregation and Correlation of Intrusion-Detection
Alerts, RAID 2001.
[105] F. Valeur, G. Vigna, C. Kruegel, R. Kemmerer, A Comprehensive Approach to Intrusion Detection Alert Correlation, In IEEE Transactions on
Dependable and Secure Computing, 2004.
[106] H. Wu, M. Siegel, R. Stiefelhagen, J. Yang, Sensor Fusion using Dempster-Shafer Theory, IEEE Instrumentation and Measurement Technology Conference, 2002.
[107] M. Zhu, S. Ding, R. R. Brooks, Q. Wu, S. S. Iyengar, N. S. V. Rao, Decision making-based multiple sensor data Fusion, Report, US Department
of Energy.
[108] rfp@wiretrip.net/libwhisker


[109] R.C. Holte, N. Japkowicz, C.X. Ling, Learning from imbalanced data
sets, Technical Report WS-00-05, AAAI Press, Menlo Park, CA.
[110] R. Agarwal, M.V. Joshi, PNrule: A new framework for learning classifier
models in data mining (a case-study in network intrusion detection), Tech.
Rep. RC 21719, IBM Research report, Computer Science/Mathematics,
2000.
[111] R.P. Lippmann, An introduction to computing with Neural Nets, IEEE ASSP Magazine, Vol. 4, pp. 4-22, April 1987.
[112] G. Shafer, A Mathematical Theory of Evidence, Princeton University
Press.
[113] G. Shafer, Perspectives on the theory and practice of belief functions, International Journal of Approximate Reasoning, 31-40, 1990.
[114] P. Smets, What is Dempster-Shafer's model?, in Advances in the Dempster-Shafer Theory of Evidence, pp. 5-34, John Wiley & Sons, 1994, iridia.ulb.ac.be/~psmets/WhatIsDS.pdf
[115] G. Pasi, R.R. Yager, Modeling the concept of majority opinion in group decision making, Information Sciences, 176, pp. 390-414, 2006.
[116] R.R. Yager, On the determination of strength of belief for decision support under uncertainty - Part II: fusing strengths of belief, Fuzzy Sets and Systems, 142, pp. 129-142, 2004.
[117] P. Smets, The combination of evidence in the transferable belief model, IEEE Transactions on Pattern Analysis and Machine Intelligence, 12(5): 447-458, May 1990.
[118] D. Yong, S. WenKang, Z. ZhenFu, L. Qi, Combining belief functions based on distance of evidence, Science Direct, Volume 38, Issue 3, Pages 489-493, Dec. 2004.
[119] R.R. Tenney, N.R. Sandell, Detection with distributed sensors, IEEE Trans. Aerospace Electronic Systems, 23(4), pp. 501-509, 1981.


[120] J.D. Howard, An analysis of security incidents on the Internet, 1989-1995, PhD thesis, Carnegie Mellon University, Department of Engineering and Public Policy, April 1997.
[121] www.cert.org/research/JHThesis/table of contents.html
[122] U. Lindqvist, E. Jonsson, How to systematically classify computer security intrusions, IEEE Symposium on Security and Privacy, pp. 154-163, Los Alamitos, CA, 1997.
[123] D.J. Weber, A taxonomy of computer intrusions, Master's thesis, Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, June 1998.
[124] Chris Rodgers. Threats to TCP/IP Network Security. 2001.
[125] G. Alvarez, S. Petrovic, A new taxonomy of web attacks suitable for efficient encoding, Computers and Security, 22(5): pp. 435-449, July 2003.
[126] M.A. Bishop, A taxonomy of Unix and network security vulnerabilities,
Technical report, Department of Computer Science, University of California at Davis, May 1995.
[127] I.V. Krsul, Software Vulnerability Analysis. PhD thesis, Comp. Sci.
Dept., Purdue University, May 1998.
[128] C.E. Landwehr, A.R. Bull, A taxonomy of computer program security flaws, with examples, ACM Computing Surveys, 26(3), pp. 211-254, 1994.
[129] J. Korba, Windows NT Attacks for the Evaluation of Intrusion Detection
Systems, M. Eng. Thesis, MIT Department of Electrical Engineering and
Computer Science, June 2000.
[130] A. Baker, J.B. Beale, Snort 2.1 Intrusion Detection (Second Edition)
pp.751, 2004.
[131] Xerox Palo Alto Research Center, PARC history, 2003, http://www.parc.xerox.com/about/history/default.html
[132] E. Spafford, The Internet Worm Program: An Analysis, Technical report, Department of Computer Sciences, Purdue University, 1988.


[133] Fred Cohen. Computer Viruses. PhD thesis, University of Southern California, 1985.
[134] CERT Coordination Center, Advisory CA-2001-19 Code Red Worm Exploiting Buffer Overflow In IIS Indexing Service DLL, July 2001, http://www.cert.org/advisories/CA-2001-19.html
[135] CERT Coordination Center, Advisory CA-2001-26 Nimda Worm, September 2001, http://www.cert.org/advisories/CA-2001-26.html
[136] CERT Coordination Center. Advisory CA-2003-04 MS-SQL Server
Worm. January 2003. http://www.cert.org/advisories/CA-2003-04.html.
[137] CERT Coordination Center. Advisory CA-2003-20 W32/Blaster Worm.
August 2003. http://www.cert.org/advisories/CA-2003-20.html.
[138] Top Ten Cyber Security Menaces for 2008, http://www.sans.org/2008menaces/

[139] Symantec, Symantec Internet Security Threat Report Volume III, February 2003, http://enterprisesecurity.symantec.com/content.cfm?articleid=1539&EID=0
[140] Symantec, Symantec Internet Security Threat Report Volume IV, September 2003, http://enterprisesecurity.symantec.com/content.cfm?articleid=1539&EID=0
[141] D. Icove, K. Seger, W. VonStorch, A Crimefighter's Handbook, O'Reilly & Associates, 1995.
[142] http://www.cert.org/research/JHThesis/chapter6.html
[143] D.E. Denning, Cyberterrorism, http://www.cs.georgetown.edu/~denning/infosec/cyberterror.html
[144] 2004 web Server Intrusion Statistics, www.zone-h.org
[145] CERT Coordination Center. Advisory CA-1999-04 Melissa Macro Virus.
March 1999. http://www.cert.org/advisories/CA-1999-04.html


[146] CERT Coordination Center. Denial of Service Attacks. 1997.


http://www.cert.org/tech tips/denial of service.html
[147] H. Debar, M. Dacier, and A. Wespi, A revised taxonomy of Intrusion
Detection Systems, Research Report, IBM, 1999.
[148] M. Esmaili, R. S. Naini, B. Balachandran, J. Pieprzyk, Case-based reasoning for intrusion detection, 12th annual computer security applications
conference, pp. 214-223, 1996.
[149] S. Northcutt, J. Novak, Network Intrusion Detection, New Riders/Pearson, Indianapolis, IN, third edition, 2003.
[150] D.E. Denning, An intrusion detection model, IEEE Trans. Software Engineering, SE-13(2), pp. 222-232, 1987.
[151] H. Debar, M. Dacier, A. Wespi, Towards a taxonomy of intrusion detection systems, Computer Networks, vol. 31, pp. 805-822, 1999.
[152] R. Weber, Information systems control and audit, Upper Saddle River,
NJ: Prentice Hall, 1999.
[153] R. P. Lippmann, R. K. Cunningham, Improving intrusion detection performance using keyword selection and neural networks, Computer Networks, vol. 34, pp. 597-603, 2000.
[154] J.P. Anderson, Computer Security Threat Monitoring and Surveillance, Technical report, James P. Anderson Co., Fort Washington, PA., April 1980.
[155] T. F. Lunt, A survey of intrusion detection techniques, Comput. Security,
vol. 12, no. 4, pp. 405-418, June 1993.
[156] Teresa Lunt et al., IDES: The enhanced prototype, Technical report, SRI
International, Computer Science Lab, October 1988.
[157] D. Anderson, T. Frivold, A. Valdes, Next-generation intrusion detection
expert system (NIDES), Technical report, SRI-CSL-95-07, SRI International, Computer Science Lab, May 1995.


[158] S. E. Smaha, Haystack: An Intrusion Detection System, Proceedings of


the IEEE Fourth Aerospace Computer Security Applications Conference,
Orlando, FL., December 1988.
[159] M. Sebring et al., Expert systems in intrusion detection: A case study, Proceedings of the 11th National Computer Security Conference, Baltimore,
MD., October 1988.
[160] H. S. Vaccaro, G. E. Liepins, Detection of anomalous computer session
activity, Proceedings of the 1989 Symposium on Research in Security and
Privacy, Oakland, CA., May 1989.
[161] J. R. Winkler, W. J. Page, Intrusion and Anomaly Detection in Trusted
Systems, Proceedings of the Fifth Annual Computer Security Applications
Conference, Tucson, AZ., December 1989.
[162] L. T. Heberlein et al., A network security monitor, Proceedings of the
IEEE Symposium on Research in Security and Privacy, Oakland, CA.,
May 1990.
[163] K. Jackson, D. DuBois, C. Stallings, An expert system application for
network intrusion detection, Proceedings of the 14th Department of Energy Computer Security Group Conference, 1991.
[164] S. R. Snapp et al., A system for distributed intrusion detection, Proceedings of the IEEE COMPCON 91, San Francisco, CA., February 1991.
[165] Mark Crosbie, Gene Spafford, Defending a Computer System using Autonomous Agents, Technical report No. 95-022, COAST Laboratory, Department of Computer Sciences, Purdue University, March 1994.
[166] S. Staniford-Chen, S. Cheung, R. Crawford, M. Dilger, J. Frank, J.
Hoagland, K. Levitt, C. Wee, R. Yip, D. Zerkle, GrIDS - A Graph-Based
Intrusion Detection System for Large Networks, The 19th National Information Systems Security Conference, Baltimore, MD., October 1996.
[167] R. Anderson, A. Khattak, The Use of Information Retrieval Techniques for Intrusion Detection, Proceedings of RAID 98, Louvain-la-Neuve, Belgium, September 1998.


[168] Biswanath Mukherjee, L. Todd Heberlein, Karl N. Levitt, Network Intrusion Detection, IEEE Network, May/June 1994.
[169] May Grance, The DIDS (Distributed Intrusion Detection System) prototype, Proceedings of the Summer USENIX Conference, 227-233, San
Antonio, Texas, 8-12 June 1992.
[170] Herve Debar, Marc Dacier and Andreas Wespi, Towards a taxonomy of
Intrusion-Detection Systems, Computer Networks, 31(8): 805-822, April
1999.
[171] A. Abraham, Evolutionary Computation in Intelligent Network Management, Evolutionary Computing in Data Mining, Springer, pp. 189-210, 2004.
[172] J. L. Zhao, J. F. Zhao, and J. J. Li, Intrusion Detection Based on Clustering Genetic Algorithm, International Conference on Machine Learning
and Cybernetics IEEE, Guangzhou, pp. 3911-3914, 2005.
[173] R.H. Gong, M. Zulkernine, and P. Abolmaesumi, A Software Implementation of a Genetic Algorithm Based Approach to Network Intrusion Detection, SNPD/SAWN'05, IEEE, 2005.
[174] Dong Seong Kim, Ha-Nam Nguyen, Jong Sou Park, Genetic Algorithm
to Improve SVM Based Network Intrusion Detection System, AINA05,
IEEE, 2005.
[175] A. Abraham and C. Grosan: Evolving Intrusion Detection Systems, Studies in Computational Intelligence (SCI) 13, 57-79, 2006
[176] A. Sundaram, An Introduction to Intrusion Detection, http://www.acm.org, 2001.
[177] Koral Ilgun, Richard A Kemmerer, and Phillip A. Porras, State Transition
Analysis: A rule-based Intrusion Detection Approach, IEEE Transactions
on Software Engineering, 21(3): 181-199, March 1995.
[178] T. Verwoerd and R. Hunt, Intrusion Detection Techniques and Approaches, 2001, http://www.elsevier.com


[179] Jean Phillippe, Application of Neural Networks to Intrusion Detection,


2004, http://www.sans.org
[180] M. Gordeev, Intrusion Detection Techniques and Approaches, 2004, http://www.ict.tuwien.ac.at
[181] S. Axelsson, Intrusion Detection Systems: A Survey and Taxonomy, Technical Report 99-15, Department of Computer Engineering, Chalmers University of Technology, Sweden, March 2000.
[182] Bace R., Intrusion Detection, Macmillan Technical Publishing, 2002.
[183] R. Lippmann, D. Fried, I. Graf, J. Haines, K. Kendall, D. McClung, D.
Weber, S. Webster, D. Wyschogrod, R. Cunningham and M. Zissman,
Evaluating Intrusion Detection Systems: The 1998 DARPA Off-line Intrusion Detection Evaluation, IEEE Computer Society Press, 2000.
[184] S. Axelsson, Intrusion Detection Systems: A Survey and Taxonomy, Chalmers University 99-15, 2000, http://citeseer.nj.nec.com/axelsson00intrusion.html
[185] S. Axelsson, Research in Intrusion-Detection Systems: A survey, Chalmers University of Technology, 1998, revised 1999.

[186] C. Kruegel and T. Toth, A Survey on Intrusion Detection Systems, TUV-1841-00-11, Technical University of Vienna, Information Systems Institute, Distributed Systems Group, December 12, 2000.
[187] F. Sabahi, Intrusion Detection: A Survey, The Third International Conference on Systems and Networks Communications, IEEE Computer Society.
[188] S. Forrest, S. A. Hofmeyr, and A. Somayaji, Computer immunology,
Commun. ACM, vol. 40, no. 10, pp. 88-96, Oct. 1997.
[189] H. Debar, M. Dacier, and A. Wespi, Toward a taxonomy of Intrusion Detection Systems, Comput. Networks, vol. 31, pp. 805-822, 1999.
[190] S. Upadhyaya, R. Chinchani, and K. Kwiat, An Analytical Framework for Reasoning About Intrusions, http://ieeexplore.ieee.org/iel5/7654/20915/00969760.pdf


[191] J.E. Dickerson, J. Juslin, O. Koukousoula, J.A. Dickerson, Fuzzy Intrusion Detection, Proceedings of the IFSA World Congress and 20th North American Conference, 2001, http://ieeexplore.ieee.org/iel5/7506/20427/00943772.pdf
[192] Dipankar Dasgupta and Fabio Gonzalez, An Immunity-Based Technique
to Characterize Intrusions in Computer Networks, IEEE Transactions on
Evolutionary Computation, Vol. 6, No. 3, June 2002.
[193] Susan M. Bridges and Rayford B. Vaughn, Intrusion Detection via Fuzzy
Data Mining, The Twelfth Annual Canadian Information Technology Security Symposium June 19-23, 2000.
[194] Kristopher Kendall, A Database of Computer attacks for the evaluation
of intrusion detection systems, Thesis Report, Department of Electrical
Engineering and Computer Science at MIT, June 1999.
[195] Shieh et al., A pattern oriented ID model and its applications, Proceedings
of the Symposium on security and privacy, 1991.
[196] K. Ilgun, USTAT: a real-time IDS for Unix, Proceedings of the 1993 IEEE Computer Society Symposium on Research in Security and Privacy, 1993.
[197] Christoph, Gray G, UNICORN: misuse detection for UNICOS, Proceedings of the 1995 ACM/IEEE Supercomputing Conference, Dec. 1995.
[198] G.B. White, PEER-based hardware protocol for IDSs, Journal of Engineering and Applied Science, v 2, 1996.
[199] Bonifacio, Jose Mauricio, Neural Networks applied in IDSs, IEEE International Conference on neural networks, 1998.
[200] V. Paxson, Bro: A system for detecting network intruders in real-time, Computer Networks, v 31, n 23, Dec 1999.
[201] P. Ning, X.S. Wang, S. Jajodia, Modelling requests among cooperating IDSs, Computer Communications, v 23, n 17, Nov 2000.


[202] Dickerson, John E, Fuzzy network profiling for intrusion detection, Annual Conference of the North American Fuzzy information processing society, 2000.
[203] Luo, JianXiong, Mining Fuzzy association rules and fuzzy frequent
episodes for intrusion detection, International Journal of Intelligent systems, v15, n 8, Aug, 2000.
[204] A.P. Kosoresow and S.A. Hofmeyr, Intrusion Detection via System Call Traces, IEEE Software, 14(5), pp. 24-42, Sept./Oct. 1997.
[205] A. Bergman, Intrusion Detection with Neural Networks, Technical Report, SRI International, Number A012, February 1993.
[206] D. Gunetti and G. Ruffo, Intrusion Detection through Behavioral Data,
Proc. of The Third Symposium on Intelligent Data Analysis, Lecture Notes
in Computer Science, Springer-Verlag, 1999.
[207] G.E. Liepins and H.S. Vaccaro, Intrusion Detection: Its Role and Validation, Computers & Security, 11(4), pp. 347-355, 1992.
[208] Guy Helmer and Johnny Wong and Vasant Honavar and Les Miller, Feature Selection Using a Genetic Algorithm for Intrusion Detection, Proceedings of the Genetic and Evolutionary Computation Conference, Vol.
2, p. 1781, Morgan Kaufmann, 13-17 July 1999.
[209] Jake Ryan and Meng-Jang Lin and Risto Miikkulainen, Intrusion Detection with Neural Networks, Advances in Neural Information Processing
Systems 10 (Proceedings of NIPS97, Denver, CO), MIT Press, 1998.
[210] K. Ilgun, USTAT : A Real-Time Intrusion Detection System for UNIX,
Proceedings of the IEEE Symposium on Security and Privacy, pp. 16-29,
1993.
[211] Koral Ilgun and Richard A. Kemmerer and Phillip A. Porras, State Transition Analysis: A Rule-Based Intrusion Detection Approach, IEEE Transactions on Software Engineering, 21(3), pp. 181-199, March 1995.


[212] Mark Crosbie and Eugene H. Spafford, Applying Genetic Programming


to Intrusion Detection, Working Notes for the AAAI Symposium on Genetic Programming, pp. 1-8, AAAI, 10-12 November 1995.
[213] Phillip Andrew Porras, A State Transition Analysis Tool For Intrusion
Detection, Technical Report, University of California, Santa Barbara,
1992.
[214] S. Kumar and E. Spafford, A pattern-matching model for intrusion detection, Proceedings National Computer Security Conference, pp. 11-21,
1994.
[215] Wenke Lee and Salvatore J. Stolfo, Data Mining Approaches for Intrusion Detection, Proceedings of the 7th USENIX Security Symposium
(SECURITY-98), pp. 79-94, Usenix Association, January 26-29 1998.
[216] J. McHugh, The 1998 Lincoln Laboratory IDS Evaluation (A Critique), Proceedings of the Recent Advances in Intrusion Detection, pp. 145-161, Toulouse, France, 2000.
[217] Network Computing, Security Feature, November 15, 1999, http://www.nwc.com/1023/1023f19.html

[218] P. Mell, V. Hu, R. Lippmann, An Overview of Issues in Testing Intrusion Detection Systems, http://csrc.nist.gov/publications/nistir/nistir-7007.pdf
[219] CERT report of vulnerabilities, http://www.cert.org/stats/cert stats.htm/
#vulnerabilities
[220] Zhu, Ding, Brooks, Wu, Iyengar, Rao, Decision making-based multiple
sensor data Fusion, Report, US Department of Energy.
[221] S. Axelsson, A preliminary attempt to apply detection and estimation theory to intrusion detection, Technical report 00-4, Chalmers University of Technology, Goteborg, Sweden.
[222] W. Lee, S. J. Stolfo, Data mining approaches for intrusion detection, In
Proc of 7th USENIX security symposium, San Antonio, TX. USENIX.

[223] B. Tung, Common intrusion detection framework, http://www.isi.edu/gost/cidf/

[224] P. Ning, X.S. Wang, and S. Jajodia, Modeling requests among cooperating intrusion detection systems, Computer Communications, 23(17), pp. 1702-1716.
[225] J.R. Winkler and W.J. Page, Intrusion and anomaly detection in trusted systems, In Fifth Annual Computer Security Applications Conf., 1989, Tucson, AZ, pp. 39-45.
[226] S.R. Snapp, J. Brentano, G.V. Dias, T.L. Goan, T. Grance, L.T. Heberlein, C.L. Ho, K.N. Levitt, B. Mukherjee, D. Mansur, K.L. Pon, and S.E. Smaha, A system for distributed intrusion detection, In COMPCON Spring 91, Digest of Papers, San Francisco, pp. 170-176.
[227] NIDES: Next-generation Intrusion Detection Expert System, http://www.sdl.sri.com/projects/nides/

[228] P.A. Porras and P.G. Neumann, EMERALD: conceptual overview statement, http://www.sdl.sri.com/papers/emerald-position1/
[229] P.A. Porras, and A. Valdes, Live traffic analysis of TCP/IP gateways, In
Proc. of the 1998 ISOC Symp. on Network and Distributed Systems Security (NDSS98), San Diego, 1998.
[230] P.A. Porras, Experience with EMERALD to date, In First USENIX Workshop on Intrusion Detection and Network Monitoring, Santa Clara, CA, pp. 73-80, April 1999.
[231] Internet Security Systems, RealSecure SiteProtector, http://www.iss.net/products_services/enterprise_protection/rssite_protector/siteprotector.php
[232] Symantec Enterprise Solutions: ManHunt, http://enterprisesecurity.symantec.com/products/products.cfm?ProductID=156
[233] nSecure Software, nSecure nPatrol, http://www.nsecure.net/features.htm


[234] Cisco intrusion detection,


http://www.cisco.com/warp/public/cc/pd/sqsw/sqidsz/index.shtml
[235] NFR Intrusion Management System, http://www.nfr.net/products/
[236] eTrust Intrusion Detection Log View, http://www.cai.com/solutions/enterprise/etrust/intrusion_detection/product_info/sw3_log_view.htm
[237] NetSecure Log, http://www.netsecuresoftware.com/netsecurenew/Products/NetSecure_Log/netsecure_log.html
[238] R.K. Iyer, S. Chen, J. Xu, Z. Kalbarczyk, Security Vulnerabilities: from analysis to detection and masking techniques, Proceedings of the Ninth International Workshop on Object-Oriented Real-Time Dependable Systems, 2004.
[239] E. Jonsson, T. Olovsson, An empirical model of the security intrusion process, Proc. of the 11th Annual Conference on Computer Assurance, Systems Integrity and Software Safety, pp. 176-186, 1996.
[240] E. Jonsson, T. Olovsson, A quantitative model of the security intrusion process based on attacker behavior, IEEE Trans. on Software Engineering, 23(4): pp. 235-245, 1997.
[241] E. Skoudis, Counter Hack: A step-by-step guide to computer attacks and
effective defenses, Prentice Hall, 2002.
[242] J. McDermott, Attack-Potential-Based Survivability Modeling for High-Consequence Systems, Proc. of the Third IEEE International Workshop on Information Assurance, 2005.
[243] H. Debar and A. Wespi, Aggregation and Correlation of Intrusion-Detection Alerts, In Proceedings of the 4th International Symposium, Recent Advances in Intrusion Detection (RAID) 2001, Springer-Verlag LNCS, 2001.
[244] F. Cuppens, Managing alerts in multi-intrusion detection environments, In 17th Annual Computer Security Applications Conference (ACSAC), New Orleans, 2001.


[245] ICAT METABASE, http://icat.nist.gov/icat.cfm. Aug. 2003.


[246] A. Valdes and K. Skinner, Probabilistic Alert Correlation, In Proceedings
of the 4th International Symposium, Recent Advances in Intrusion Detection (RAID) 2001, Springer-Verlag, LNCS.
[247] A. Valdes and K. Skinner, Adaptive, Model-Based Monitoring for Cyber Attack Detection, In Proceedings of the third International Workshop,
Recent Advances in Intrusion Detection (RAID) 2000, Springer-Verlag
LNCS, 2000.
[248] D. Andersson, M. Fong, and A. Valdes, Heterogeneous Sensor Correlation: A Case Study of Live Traffic Analysis, In IEEE Information Assurance Workshop, 2002.
[249] O. M. Dain and R. K. Cunningham, Building Scenarios from a Heterogeneous Alert Stream. In IEEE Workshop on Information Assurance and
Security, 5-6, 2001.
