Professional Documents
Culture Documents
net/publication/2521253
CITATIONS READS
83 117
2 authors:
Some of the authors of this publication are also working on these related projects:
All content following this page was uploaded by Fabio A. González on 02 January 2014.
Abstract. The paper describes the design of a genetic classifier-based intrusion detection
system, which can provide active detection and automated responses during intrusions. It is
designed to be a sense and response system that can monitor various activities on the network
(i.e. looks for changes such as malfunctions, faults, abnormalities, misuse, deviations,
intrusions, etc.). In particular, it simultaneously monitors networked computer’s activities at
different levels (such as user level, system level, process level and packet level) and use a
genetic classifier system in order to determine a specific action in case of any security
violation. The objective is to find correlation among the deviated values (from normal) of
monitored parameters to determine the type of intrusion and to generate an action accordingly.
We performed some experiments to evolve set of decision rules based on the significance of
monitored parameters in Unix environment, and tested for validation.
1. Introduction
The problem of detecting anomalies, intrusions, and other forms of computer abuses can be
viewed as finding non-permitted deviations (or security violations) of the characteristic
properties in the monitored (network) systems [12]. This assumption is based on the fact that
intruders' activities must be different (in some ways) from the normal users' activities. However,
in most situations, it is very difficult to realize or detect such differences before any damage
occur during break-ins.
When a hacker attacks a system, the ideal response would be to stop his activity before he
can cause any damage or access to any sensitive information. This would require recognition of
the attack as it takes place. Different models of intrusion detection have been developed [6], [9],
[10], [11], and many IDS software are available for use. Commercial IDS products such as
NetRanger, RealSecure, and Omniguard Intruder alert work on attack signatures. These
signatures needed to be updated by the vendors on a regular basis in order to protect from new
types of attacks. However, no detection system can catch all types of intrusions and each model
has its strengths and weaknesses in detecting different violations in networked computer
systems. Recently, researchers started investigating artificial intelligence [3], genetic approaches
[1], [6] and agent architectures [4], [5] for detecting coordinated and sophisticated attacks.
This paper describes the design and implementation of a classifier-based decision support
component for an intrusion detection system (IDS). This classifier-based IDS monitors the
activities of Unix machines at multiple levels (from packet to user-level) and determines the
correlation among the observed parameters during intrusive activities. For example, at user level
-- searches for an unusual user behavior pattern; at system level -- looks at resource usage such
as CPU, memory, I/O use etc.; at process level -- checks for invalid or unauthenticated
processes and priority violations; at packet level – monitors number, volume, and size of
packets along with source and type of connections. We developed a Java-based interface to
visualize the features of the monitored Unix environment. We used some built-in tools (such as
vmstat, iostat, mpstat, netstat, snoop, etc.), syslog files and shell commands for simultaneously
monitoring relevant parameters at multiple levels. As the data collector sensors observe the
deviations, the information is sent to the classifier system [7], [8] in order to determine
appropriate actions.
Our prototype system currently monitors the parameters listed below, some of these parameters
are categorical in nature, (e.g. type of user, type of connections) which are represented
numerically for interpretation. However, the selection of these parameters is not final and may
vary (based on their usefulness) in our future implementation. Various Unix commands are used
and filtered the output to get the values of the selected parameters [10].
To monitor user-level activities, the following parameters are recorded as audit trail and
analyzed by statistical methods to develop profiles of the normal behavior pattern of users:
U1. Type of user and user privileges
U2. Login/Logout period and location
U3. Access of resources and directories
U4. Type of software/programs use
U5. Type of Commands used
User parameters are collected based upon the user id that started the session. This allows the
separation of an account into each session. This is important to allow the detection of a session
where a hacker is masquerading as a well-known user. Once the user id of the session is
established an audit trail of that session is recorded.
The system-level parameters that provide indication of resource usage include:
S1. Cumulative per user CPU usage
S2. Usage of real and virtual memory
S3. Amount of swap space currently available
S4. Amount of free memory
S5. I/O and disk usage
All these parameters provide information of the system resource of usage. These parameters
are initially monitored over a period of time to obtain relatively accurate measure of the normal
usage of the system there by providing indication of intrusion on the network.
Historical data of relevant parameters are initially collected over a period of time during
normal usage (with no intrusive activities) to obtain relatively accurate statistical measure of
normal behavior patterns. Accordingly, different threshold values are set for different
parameters.
It is very difficult to develop intelligent decision support component for intrusion detection
systems, as uncertainties and ambiguities. The best approach may be to design an evolvable
system that can adapt to environment. A classifier system is an adaptive learning system that
evolves a set action selection rules to cope with the environment. The condition-action rules are
coded as fixed length strings (classifiers) and are evolved using a genetic search. These
classifiers are evolved based on the security policy – this rule set forms a security model with
which the current system environment needs to be compared. In our approach, the security
policies are embedded while defining the normal behavior. In a classifier rule, the condition-part
represent the amount of deviation (or degree of violation) in the monitored parameters from the
normal values (0-3) and the action-part represent a specific response (0-7) according to the type
of attack. Figure 2 shows different components of the prototype system. The data fusion
module combines (discussed in section 3.2.2) the parameter values and put as an input template
to the classifier system.
Decision
Support
Subsystem
The degree of importance of each level (of monitored parameters) is hypothesized based on
the domain knowledge. The purpose is to generate rules from a general knowledge base
designed by experts. Though the accuracy of this knowledge base will result in more realistic
actions, the heuristic rule set that we used can provide similar detection ability. Table 2, gives
an example of such a knowledge base, a higher value of action indicates stronger response.
Table 2. Estimates of affects of intrusive activities at different levels and proposed actions.
Hypothesis User System Process Packet Action
Number Level Level Level Level
1 0.2 0.0 0.0 0.1 1
2 0.4 0.0 0.0 0.4 3
3 0.0 0.1 0.2 0.8 6
Symbolically, each hypothesis is a 5-tuple (k1, k2, k3, k4, a), where ki∈[0.0, 1.0] and a∈
{0,…,7}. Suppose that the input template to the classifier contain the message (m1,m2,m3,m4),
this messages match the hypothesis rule (k1,k2,k3,k4,a) if and only if:
k1 <= m1 and k2 <= m2 and k3 <= m3 and k4 <= m4,
That is, the condition part of a hypothesis expresses minimum values of deviation at different
level.
Accordingly, if we have two hypotheses: K =(k1,k2,k3,k4,ak) and L=(l1,l2,l3,l4,al), and for
i=1,2,3,4 ki <= li. That means that hypothesis L is more specific than hypothesis K, therefore,
any message that matches L will match K. In order to maintain the consistency of the knowledge
base, the action of the hypothesis L (al) has to be stronger than the action of the hypothesis K
(ak): al >= ak.
This characteristic gives to the knowledge base a hierarchical structure and increases the
robustness of the system.
Classifier systems are dynamical systems that evolve stimulus-response rules or classifiers
(using Genetic Algorithms), each of which can interact with the problem-solving environment.
A classifier system receives inputs (as messages) from the monitored environment in the form
of binary strings and determines an action (as output). The environment then provides some
feedback in terms of a scalar reinforcement for the action and the classifier system updates that
rule’s strength accordingly. When a classifier system gets a message from the environment it
places that message on a message board, it then searches the classifier list to find a rule that
matches the message. If there exist a number of competing rules, the winner is selected based on
the strength of the rules (tie is broken probabilistically). However, if no rule in the classifier list
matches the current message, the classifier system evolves new classifiers using genetic
algorithms. A classifier rule has the form:
<condition> : <action>
The condition part is a string from the alphabet {‘0’, ’1’, ‘#’}, where ‘#’ is a wildcard that
can match either ‘0’ or ‘1’. The action part indicates the suggested response if the rule is
activated, in our case, 7 possible actions are considered. Here is an example of a rule:
Figure 3 illustrates the use of the Classifier-Based Decision Support Subsystem. When an input
arrives to the subsystem (information about of the level of deviation of the different parameters),
it goes to the message list; then, the Rule Selection Module determines the matching rule using
the bucket brigade algorithm, and finally the action of the winner rule is sent to the
environment.
Messagge List
Rule Base
Input 10010… : action 1 Rule Selection Output
111001010….
101010101001… 10001… : action 5 Module Action 5
110101010….
(bucket brigade
100101001…. 00010… : action 2
algorithm)
Genetic Rule
Generation Module
Classifier System
Knowledge Base
0.1 0.3 0.4 0.1 1
0.4 0.8 0.2 0.1 2
Population
Genetic
Operators 1##10#01..
Design 100##101..
Parameters Best Rules Set
1##10#01… : 5
Fitness
100##101… : 4
Evaluation
Knowledge Base
0.1 0.3 0.4 0.1 1
0.4 0.8 0.2 0.1 2
Sequential
Niching
Technique
While evolving the rule set in the classifier system, the genetic algorithm evaluates each rule
to determine a fitness value based on the performance (as shown in figure 4). To generate a
diverse set of rules, we use a sequential niching technique, which is discussed in a subsequent
section.
Parameters U1 U2 U3 U4 U5 S1 S2 S3 S4 S5 P1 P2 P3 P4 P5 N1 N2 N3 N4 N5
An Example 00 11 10 11 11 10 01 00 11 10 00 01 10 11 11 10 11 10 01 00
Fig. 5. Chromosome layout (in binary) -- encoding different monitored parameters at multiple levels.
P1 P2 P3 P4
1
P1 ….. P15 2
P1 ….. P25 P13
….. P35 4
P1 ….. P45
c1 c2 … c9 c10 c11 c12 … c19 c20 c21 c22 … c29 c30 c31 c32 … c39 c40
where,
Pi : Portion of the chromosome corresponding to the level i (i=1 user; i=2 system; i=3
process; i=4 packet)
Pij : Portion of the chromosome corresponding to the parameter j of the level i
(i=1,..,4)
For designing the fitness function we also need the following definitions:
Min(Pij) : Minimum value that Pij can match, as shown in table 3.
i
1 5 Min( Pj )
MinValue( P ) = ∑i
(1)
5 j =1 3
1 4
KnowledgeDist (C ) = ∑
4 i =1
| k i − MinValue( P i ) | (2)
The Generality function measures the difference between the number of wildcards (‘#”) at
each level in the rule compare to the optimal value (optNumWC), where GenCoef is a value
between 0.0 and 1.0 and specifies the importance of the generality on the evaluation of the
fitness.
To appear in Lecture Notes in Computer Science (publisher: Springer-Verlag) as the proceedings of
International Workshop on Mathematical Methods, Models and Architectures for Computer Networks
Security (MMM-ACNS), May 21-23, 2001, St. Petersburg, Russia.
1 4
Generality (C ) = GenCoef * ∑ | OptNumWC − NumWC ( P i ) |
4 i =1
(3)
The fitness is now calculated by subtracting the KnowledgeDis and the Generality values
from 2, since the maximum fitness value is considered as 2.0.
Fitness (C ) = 2 − KnowledgeDist (C ) − Generality (C ) (4)
where,
M 0 = Fitness ( f )
M i = M i −1 * G ( f , bi )
(6)
(dist ( f , bi ) / R) alpha if dist ( f , bi ) < R
G ( f , bi ) =
1 otherwise
R and alpha are the parameters of the sequential niching algorithm.
We performed several experiments with different data sets in order to test the performance of
the classifier-based decision support system. These experiments are conducted in a simulated
laboratory environment. In all experiments, we used the high level knowledge (as discussed in
section 3.1) based on the following hypothesis:
1. If U>= 0.2 or S>= 0.2 or P>= 0.2 or N>= 0.2 then execute action 1
2. If N>= 0.5 or U>= 0.5 or P>= 0.5 then execute action 2
3. If (U>= 0.4 and S>= 0.4) or (P>= 0.4 and N>= 0.4) or (P>= 0.4 and N>= 0.4) then action 3
4. If U>= 0.8 or S>= 0.8 or P>= 0.8 or N>= 0.8 then execute action 4
5. If (U>= 0.4 and P>= 0.6) or (S>= 0.4 and N>= 0.6) then execute action 5
6. If (U>= 0.5 and S>= 0.6) or (P>= 0.5 and N>= 0.6) then execute action 6
7. If (U>= 0.6 and S>= 0.6 and P>= 0.6) or (N>= 0.6 and P>= 0.6) then execute action 7
8. If U>= 0.7 and S>= 0.7 and P>= 0.7 and N>= 0.7 then execute action 8
Some of the above hypotheses are experimented with to generate a set of classifier rules. In
particular, the first four hypotheses are used which generated 1440 rules from 240 possibilities,
however, rules for other hypotheses can be evolved in a similar fashion. The goal of the
experiment is to test the performance of this evolved rule set to detect any arbitrary input
pattern. To accomplish this, we generated a random list of 40-bit messages (representing values
of monitored parameters) as input to the classifier system. For each message, we compared the
output of the classifier (an action) with the action suggested for the knowledge base
(hypothesis).
Accordingly, the system is tested with simulated data (10.000 messages) having different
types of deviation to examine the coverage and robustness in detection. In our preliminary
experiments four types of actions were chosen. The results of these experiments are shown in
the Table 4.
Table 4. Experimental results showing the performance of the classifier based decision supprot system.
The shaded cells represent the correct answers of the system.
Classifier System Output
Correct 1 2 3 4
Action
1 78% 22% 0% 1%
2 2% 96% 0% 2%
3 7% 38% 53% 3%
4 0% 4% 17% 79%
In table 4, the first row (correct action 1) establishes that the classifier system identified
action 1 on the 78% of the test messages, action 2 on the 22% of the messages, action 3 on the
0% of the messages and action 4 on the 1% of the messages. The higher values on the diagonal
indicate that the classifier system could correctly determine the appropriate action in most cases.
Figure 6 also reflects that in most of the test cases, the classifier rules depicted the correct
action.
100%
80%
60% 1
40% 2
20% 3
0% 4
1 4
2 3 Correct
Classifier 3 2
4 1 Action
System
Output
However, the higher level of incorrect responses was for the action 3, where the classifier
system outputs 53% of the cases action 3, and 38% of the cases action 2. This fact indicates that
a high number of rules are necessary to represent more specific rules.
We are currently experimenting to evolve a comprehensive set of rules based on the
significance of monitored parameters at the various levels. Accordingly, we predefined eight
different action types, but it is the classifier system, which will choose the appropriate action
depending on the input message. The classifier decision is to be passed to response agents to
undertake one or more of the following actions (A0 -- A7):
A0. Take no action
A1. Informing the system administrator via e-mail or messaging system
To appear in Lecture Notes in Computer Science (publisher: Springer-Verlag) as the proceedings of
International Workshop on Mathematical Methods, Models and Architectures for Computer Networks
Security (MMM-ACNS), May 21-23, 2001, St. Petersburg, Russia.
A2. Change the priority of user processes
A3. Change access privileges of certain user
A4. Block a particular IP address or sender
A5. Disallow establishing a remote connection request
A6. Termination of existing network connection
A7. Restarting of a particular machine
The results reported (in table 4 and figure 6) are the part of our on-going research in
developing intelligent decision support subsystems for intrusion detection. We expect some
interesting classifier rules to evolve that are more meaningful, for example,
• If connection is external and CPU usage > Threshold3 and free memory significantly low
swap space is minimum then perform Action A4
• If connection is external and average number of packets received >Threshold2 and duration
of the connection >Threshold 2 then perform Action A6
• If number of connections > Threshold1 and number of processes running > Threshold1 and
amount of swap space in Minimum then perform Action A5
• If connection is internal and CPU usage > Threshold1 and memory usage > Threshold2 and
average number of packets sent/ received > Threshold1 respectively then perform Action A1
• If CPU usage > Threshold1 and Memory usage is > Threshold2 and number of blocked
processes > Threshold2 then perform Action A2
6. Concluding Remarks
Most existing intrusion detection systems either use packet-level information or user
activities to make decisions on intrusive activities [6], [9], [10]. In this paper, we described an
intrusion detection system that can simultaneously monitor network activities at different levels
(such as packet level, process level system level and user level), it can detect both inside misuse
and outside attacks. The main emphasis of this work is to examine the feasibility of using a
classifier-based intelligent decision support subsystem for robust intrusion detection.
The proposed system has some unique features of simultaneous monitoring at multi-level to
detect both known and unknown intrusions and generate specific response. The developed
system will perform real-time monitoring, analyzing, and generating appropriate response to
intrusive activities. This work is a part of a larger research project on developing an intelligent
intrusion detection system [2]. In this paper, we emphasized on the design and implementation
of classifier system as decision support component. We are currently experimenting in a
simulated environment as a part of an early development. We anticipate that a more effective
and practical rule base will emerge after the implementation and observation of the network
activity over a period of time during the testing phase.
References
1. Crosbie, M., Spafford, G.: Applying Genetic Programming to Intrusion Detection. COAST
Laboratory, Purdue University, (1997) (also published in the proceeding of the Genetic
Programming Conference).
2. Dasgupta, D.: Immunity-Based Intrusion Detection System: A General Framework. In the
proceedings of the National Information Systems Security Conference, October, (1999)
3. Frank, J.: Artificial Intelligence and Intrusion Detection: Current and future directions. In
Proceedings of the 17th National Computer Security Conference, October, (1994)
4. Balasubramaniyan, J., Fernandez, J.O.G., Isacoff, D., Spafford, E., Zamboni, D.: An
Architecture for Intrusion Detection using Autonomous Agents, COAST Technical report 98/5,
Purdue University, (1998)
5. Crosbie, M., Spafford, E.: Defending a computer system using autonomous agents. In
Proceedings of the 18th National Information Systems Security Conference, October, (1995)
6. Me, L., GASSATA,: A Genetic Algorithm as an Alternative Tool for Security Audit Trail
Analysis. in Proceedings of the First International Workshop on the Recent Advances in
Intrusion Detection, Louvain-la-Neuve, Belgium, September, (1998) 14-16
7. Zhang, Z., Franklin, S., Dasgupta D.: Metacognition in Software Agents using Classifier
Systems. In the proceedings of the National Conference on Artificial Intelligence (AAAI),
Madison, July, (1998) 14-16
8. Boer, B.: Classifier Systems, A useful approach to machine learning? Masters thesis, Leiden
University, August 31, (1994)
9. Mukherjee, B., Heberline, L.T., Levit, K.: Network Intrusion Detection. IEEE Network (1994)
10. Axelsson, S., Lindqvist, U., Gustafson, U., Jonsson, E.: An Approach to UNIX security Logging,
Technical Report IEEE Network (1996)
11. Lunt, T.F.: Real-Time Intrusion Detection. Technical Report Computer Science Journal (1990)
12. Debar, H., Dacier, M., Wepspi, A.: A Revised Taxonomy for Intrusion Detection Systems.
Technical Report Computer Science/Mathematics (1999)
13. Goldberg, D.E.: Genetic Algorithms in Search, Optimization & Machine Learning. Addison–
Wesley, Reading, Mass. (1989)
14. Back, T., Fogel, D.B., Michalewicz, Z.: Handbook of Evolutionary computation. Institute of
Physics Publishing and Oxford university press (1997)
15. Dasgupta, D., Michalewicz, Z. (eds): Evolutionary Algorithms in Engineering and Applications.
Springer-Verlag (1997)