
Evaluation criteria

• Accuracy
• Performance
• Completeness
• Timely response
• Adaptation and cost sensitivity
• Intrusion tolerance and attack resistance
Accuracy
• How correctly an IDS classifies events.
• It is a measure of the detection and failure percentages, as well as the number of false alarms that the system produces.
• There are two target classes (normal and abnormal/intrusion).
• The actual percentage of abnormal data is much smaller than the percentage of normal data, which makes intrusions harder to detect.
• Excessive false alarms are the biggest problem facing IDSs.

False positive and negative
In intrusion detection, positive data is considered to be attack data, while negative data is considered to be normal data.

Thus, the aim of an IDS is to produce as many TPs and TNs as possible, while trying to reduce the number of both FPs and FNs.
• The big circle defines the space of the whole data (i.e., normal and intrusive data).
• The small ellipse defines the space of all intrusions predicted by the classifier.
– Thus, it is shared by both TP and FP.
• The ratio between the real normal data and the intrusions is graphically represented by a horizontal line.
Confusion Matrix
• It is a ranking method applicable to any kind of classification problem. The size of the matrix is determined by the number of distinct classes that are to be detected.
• A Confusion Matrix for intrusion detection is defined as a 2-by-2 matrix.
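The 2-by-2 matrix above can be sketched in Python (an illustrative helper, not from the source; labels are 1 = intrusion, 0 = normal):

```python
# Sketch: counting the four cells of the 2-by-2 confusion matrix for a
# binary IDS, where positive (1) = intrusion and negative (0) = normal.
def confusion_matrix(actual, predicted):
    """Return (TP, FP, FN, TN) counts for binary intrusion labels."""
    tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
    return tp, fp, fn, tn

actual    = [1, 0, 0, 1, 0, 1, 0, 0]   # ground-truth labels
predicted = [1, 0, 1, 0, 0, 1, 0, 0]   # IDS verdicts
print(confusion_matrix(actual, predicted))  # (2, 1, 1, 4)
```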
Precision, Recall, and F-Measure
Under normal operating conditions there is a big difference between the rates of normal and intrusion data. The Precision, Recall, and F-Measure metrics ignore the normal data that has been correctly classified by the IDS (TN), and focus on both the intrusion data (TP+FN) and the FPs (also known as False Alarms) generated by the IDS.
Precision
• It is a metric defined with respect to the intrusion class. It shows how many examples predicted by an IDS as intrusive are actual intrusions.
• The aim of an IDS is to obtain a high Precision, meaning that the number of false alarms is minimized.

precision = TP / (TP + FP), where precision ∈ [0, 1]

• The main disadvantage of the metric is that it cannot express the percentage of predicted intrusions versus all the real intrusions that exist in the data.
Recall
• This metric measures what Precision misses; namely, the percentage of the real intrusions covered by the classifier. Consequently, it is desired for a classifier to have a high Recall value.

recall = TP / (TP + FN), where recall ∈ [0, 1]

• This metric does not take into consideration the number of False Alarms. Thus, a classifier can have both a good Recall and a high False Alarm rate at the same time.
The disadvantage of using only Recall as a metric: the figure shows two classifiers (IDSs) that have almost the same Recall (i.e., very good detection rate) but different Precisions. While in the first case (a) the Precision is low (because of the high number of false alarms), in the second case (b), even though the Recall is a little lower, the number of false alarms is improved.

• Furthermore, a classifier that blindly predicts all the data as being intrusive
will have a 100% Recall (but a very low precision).
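The last point can be demonstrated numerically (illustrative data; the helper names are ours, not from the source):

```python
# Sketch: a classifier that blindly flags everything as intrusive
# achieves 100% Recall but very low Precision.
def precision(tp, fp):
    return tp / (tp + fp) if tp + fp else 0.0

def recall(tp, fn):
    return tp / (tp + fn) if tp + fn else 0.0

actual = [1] * 10 + [0] * 990   # 10 real intrusions among 1000 records
blind  = [1] * 1000             # predict "intrusion" for every record
tp = sum(1 for a, p in zip(actual, blind) if a == 1 and p == 1)
fp = sum(1 for a, p in zip(actual, blind) if a == 0 and p == 1)
fn = sum(1 for a, p in zip(actual, blind) if a == 1 and p == 0)
print(recall(tp, fn))     # 1.0  -- every intrusion "detected"
print(precision(tp, fp))  # 0.01 -- 990 false alarms
```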
F-Measure
• The F-Measure combines the properties of the previous two metrics, being defined as the harmonic mean of Precision and Recall:

F-Measure = 2 × precision × recall / (precision + recall)

• The F-Measure is preferred when only one accuracy metric is desired as an evaluation criterion.
• Note that when Precision and Recall both reach 100%, the F-Measure is maximal (i.e., 1), meaning that the classifier has 0% false alarms and detects 100% of the attacks.
• Thus, the F-Measure of a classifier is desired to be as high as possible.
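As a sketch, the harmonic-mean definition can be computed directly (the function name is ours):

```python
# Sketch: F-Measure as the harmonic mean of Precision and Recall.
def f_measure(precision, recall):
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

print(f_measure(1.0, 1.0))  # 1.0 -- 0% false alarms, 100% detection
print(f_measure(0.5, 1.0))  # ~0.667 -- high Recall alone is not enough
```

Because the harmonic mean is dominated by the smaller of the two values, a classifier cannot hide a poor Precision behind a perfect Recall (or vice versa).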
ROC Curves
• In intrusion detection, ROC curves are used on the one hand to
visualize the relation between the TP and FP rate of a certain
classifier while tuning it, and on the other hand, to compare the
accuracy of two or more classifiers.
• The lower-left point (0,0) characterizes an IDS that classifies all the data as normal all the time. Obviously, in such a situation the classifier will have a zero false alarm rate, but at the same time will not be able to detect anything.

• The upper-right point (1,1) characterizes an IDS that generates an alarm for every data item encountered. Consequently, it will have a 100% detection rate and a 100% false alarm rate as well.

• The line defined by connecting the two previous points represents any classifier that uses a randomized decision engine for detecting the intrusions. Any point on this line can be obtained by a linear combination of the two previously mentioned strategies. Thus, the ROC curve of a useful IDS should always reside above this diagonal.

• The upper-left point (0,1) represents the ideal case: a 100% detection rate with a 0% false alarm rate. Thus, the closer a point in the ROC space is to the ideal case, the more efficient the classifier is.
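The reference points above can be placed in ROC space and compared by their distance to the ideal corner (an illustrative sketch; the operating-point values are made up):

```python
# Sketch: ranking IDS operating points in ROC space by their Euclidean
# distance to the ideal corner (FP rate = 0, TP rate = 1).
import math

def distance_to_ideal(fp_rate, tp_rate):
    return math.hypot(fp_rate - 0.0, tp_rate - 1.0)

operating_points = {
    "all-normal":    (0.0, 0.0),  # lower-left: never raises an alarm
    "all-intrusive": (1.0, 1.0),  # upper-right: alarms on everything
    "tuned-ids":     (0.1, 0.9),  # a hypothetical tuned classifier
}
for name, (fpr, tpr) in operating_points.items():
    print(name, round(distance_to_ideal(fpr, tpr), 3))
```

Both degenerate strategies are equally far (distance 1.0) from the ideal point, while the tuned classifier is much closer, matching the geometric intuition in the bullets above.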
Performance
• The quality of a NIDS is described by the percentage of true attacks
detected combined with the number of false alerts. However, even a
high-quality NIDS algorithm is not effective if its processing cost is
too high, since the resulting loss of packets increases the probability
that an attack is not detected.

• Besides the IDS configuration, a number of architectural and system parameters also influence performance: the operating system structure, main memory bandwidth and latency, as well as the processor microarchitecture all contribute to a system's suitability.

• Performance depends not only on the processor, but to a large extent also on the memory system.

• For a NIDS, performance can be evaluated as the system's ability to process traffic on a high-speed link with minimum packet loss while working in real time.
• Schaelicke et al. propose a methodology to measure the performance of rule-based NIDSs. This study measures and compares two major components of the NIDS processing cost on a number of diverse systems to pinpoint performance bottlenecks and to determine the impact of operating system and architecture differences.

• Given a fixed traffic load, the processing capability of a NIDS depends on the type of the rules (header and payload) and the packet size.

• Since the header size is generally fixed, the overall processing cost of applying header rules depends on the number of packets to be processed.

• For payload rules, the overall processing cost is determined by the size of the packets.
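This dependence can be captured in a rough, illustrative cost model (our own simplification, not the measurement methodology of Schaelicke et al.; the per-unit costs are arbitrary):

```python
# Illustrative sketch (assumed model): header-rule cost scales with the
# number of packets, payload-rule cost with the number of bytes scanned.
def processing_cost(n_packets, avg_payload_bytes,
                    n_header_rules, n_payload_rules,
                    header_cost=1.0, payload_cost=0.01):
    # header_cost: per packet per header rule
    # payload_cost: per payload byte per payload rule
    header_part = n_packets * n_header_rules * header_cost
    payload_part = n_packets * avg_payload_bytes * n_payload_rules * payload_cost
    return header_part + payload_part

# Same byte volume split into smaller packets means more packets,
# so the header-rule component of the cost grows.
print(processing_cost(10_000, 512, 50, 10))
print(processing_cost(20_000, 256, 50, 10))
```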
• This example demonstrates that for small numbers of rules nearly no packets are lost, but when the number of rules exceeds the maximum processing capability of the system, the number of dropped packets increases drastically.

• The measurements by Schaelicke et al. were performed for four different packet payload sizes: 64 bytes, 512 bytes, 1000 bytes, and 1452 bytes. The NIDS under test ran on a 100 Mbit link, which was nearly saturated during the evaluation.
• According to this paper, the hardware platform is very important for NIDS performance; general-purpose systems are generally inadequate as NIDS hardware platforms even on moderate-speed networks, since the maximum number of rules they support is much smaller than the total number of applicable rules.

• Different system parameters contribute to performance improvement to different degrees. Memory bandwidth and latency are the most significant contributors, while the CPU alone is not a suitable predictor of NIDS performance.

• The operating system also affects the performance of a NIDS; the experimental results presented in this paper show that Linux significantly outperformed FreeBSD because of its more efficient interrupt handling.
Completeness
• It represents the space of the vulnerabilities and attacks that can be covered by an IDS.
• This criterion is very hard to assess because having global knowledge about attacks or abuses of privileges is impossible.
Timely response
• An IDS that performs its analysis as quickly as possible enables the security officer or the response engine to react promptly before much damage is done.
• But there will always be a delay between the actual moment of the attack and the response of the system (i.e., the total delay) due to computation time (data capture, feature extraction).
• There is no point in having a good detection rate if detection takes hours or days.
Evaluation of Anomaly Detection Systems (Dokas et al.)
• The first derived metric corresponds to the surface area between the real attack curve and the predicted attack curve. The smaller this surface, the better the intrusion detection algorithm.
• The burst detection rate (bdr) is defined for each burst; it represents the ratio between the number of intrusive network connections n_di whose score is higher than a pre-specified threshold within the bursty attack and the total number of intrusive network connections within the attack interval.
• Response time represents the time elapsed from the beginning of the attack until the moment when the first network connection has a score value higher than the pre-specified threshold (t_response).
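The two burst metrics can be sketched as follows, assuming each network connection is a (timestamp, anomaly score, is-intrusive) tuple; the data layout and function names are our assumption, not taken from Dokas et al.:

```python
# Sketch of the burst detection rate and response time metrics.
# Each connection: (timestamp, anomaly_score, is_intrusive).
def burst_detection_rate(connections, threshold):
    """Fraction of intrusive connections scored above the threshold."""
    intrusive = [c for c in connections if c[2]]
    detected = [c for c in intrusive if c[1] > threshold]
    return len(detected) / len(intrusive) if intrusive else 0.0

def response_time(connections, threshold, attack_start):
    """Time from attack start to the first connection scored above threshold."""
    for t, score, _ in sorted(connections):
        if t >= attack_start and score > threshold:
            return t - attack_start
    return None  # attack never detected

burst = [(0, 0.2, True), (1, 0.4, True), (2, 0.9, True), (3, 0.95, True)]
print(burst_detection_rate(burst, 0.5))  # 0.5 -- 2 of 4 connections detected
print(response_time(burst, 0.5, 0))      # 2  -- first detection at t = 2
```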
Adaptation and cost sensitivity (Lee et al.)
• Intrusion detection systems (IDSs) must maximize the realization of
security goals while minimizing costs.
• The major cost factors associated with an IDS:
– Damage cost (the amount of damage to the targeted resource)
– Response cost (the cost of responding to an attack)
– Operational cost (the cost of analyzing events using an IDS)
– Confidence metric
– Consequential cost (the cost associated with the prediction of an IDS). For example, the false negative cost of an event e is equal to the damage cost associated with e.
• Detecting an attack is pointless if the operational cost of detection is larger than the damage cost. Likewise, an intrusion whose response cost exceeds its damage cost would not be responded to.
• The total cost of intrusion detection is defined as the sum of the consequential and operational costs.
• A cost model is needed to estimate the total expected cost of intrusion detection and to show the trade-offs among all relevant cost factors.
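These cost relations can be sketched as simple predicates (illustrative only, with made-up numbers; Lee et al.'s actual cost model is richer than this):

```python
# Sketch: the cost trade-offs described above, per event.
def should_respond(damage_cost, response_cost):
    """Responding is worthwhile only if damage exceeds the response cost."""
    return damage_cost > response_cost

def should_detect(damage_cost, operational_cost):
    """Detection is worthwhile only if damage exceeds the operational cost."""
    return damage_cost > operational_cost

def total_cost(consequential_cost, operational_cost):
    """Total cost of intrusion detection = consequential + operational cost."""
    return consequential_cost + operational_cost

print(should_respond(damage_cost=100, response_cost=20))        # True
print(should_detect(damage_cost=5, operational_cost=50))        # False
print(total_cost(consequential_cost=120, operational_cost=30))  # 150
```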
