
The following paper was originally published in the

Proceedings of the 8th USENIX Security Symposium


Washington, D.C., USA, August 23–26, 1999

A Study in Using Neural Networks for Anomaly and Misuse Detection

Anup K. Ghosh and Aaron Schwartzbard

THE ADVANCED COMPUTING SYSTEMS ASSOCIATION

© 1999 by The USENIX Association


All Rights Reserved
For more information about the USENIX Association:
Phone: 1 510 528 8649 FAX: 1 510 548 5738
Email: office@usenix.org WWW: http://www.usenix.org
Rights to individual papers remain with the author or the author's employer. Permission is granted for noncommercial
reproduction of the work for educational or research purposes. This copyright notice must be included in the reproduced paper.
USENIX acknowledges all trademarks herein.
A Study in Using Neural Networks for Anomaly and Misuse
Detection
Anup K. Ghosh & Aaron Schwartzbard
Reliable Software Technologies
21351 Ridgetop Circle, Suite 400
Dulles, VA 20166
POC: aghosh@rstcorp.com
http://www.rstcorp.com

Abstract

Current intrusion detection systems lack the ability to generalize from previously observed attacks to detect even slight variations of known attacks. This paper describes new process-based intrusion detection approaches that provide the ability to generalize from previously observed behavior to recognize future unseen behavior. The approach employs artificial neural networks (ANNs), and can be used both for anomaly detection, in order to detect novel attacks, and for misuse detection, in order to detect known attacks and even variations of known attacks. These techniques were applied to a large corpus of data collected by Lincoln Labs at MIT for an intrusion detection system evaluation sponsored by the U.S. Defense Advanced Research Projects Agency (DARPA). Results from applying these techniques for both anomaly and misuse detection against the DARPA evaluation data are presented.

1 Introduction

Results from a recent U.S. Defense Advanced Research Projects Agency (DARPA) study highlight the strengths and weaknesses of current research approaches to intrusion detection.* The DARPA scientific study is the first of its kind to provide independent third-party evaluation of intrusion detection tools against such a large corpus of data. The findings from this study indicate that a fundamental paradigm shift in intrusion detection research is necessary to provide reasonable levels of detection against novel attacks and even variations of known attacks. Central to this goal is the ability to generalize from previously observed behavior to recognize future unseen, but similar, behavior. To this end, this paper describes a study in using neural networks for both anomaly detection and misuse detection.

* This work was funded by the Defense Advanced Research Projects Agency (DARPA) under Contract DAAH01-97-C-R095. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the Defense Advanced Research Projects Agency or the U.S. Government.

Research in intrusion detection has begun to shift from analyzing user behavior to analyzing process behavior. Initial work in analyzing process behavior has already shown promising results in providing very high levels of detection against certain classes of attacks. In particular, process-based anomaly detection approaches have shown very good performance against novel attacks that result in unauthorized local access and attacks that result in elevated privileges, a vulnerable area for most intrusion detection tools [6]. In spite of the good detection capability of process-based anomaly detection approaches, the results indicate high rates of false alarms that can make these tools unusable for the practical security administrator. Current wisdom is that false alarm rates must be reduced to the level of one to two false alarms per day in order to make the system usable by administrators.

One of the largest challenges for today's intrusion detection tools is being able to generalize from previously observed behavior (normal or malicious) to recognize similar future behavior. This problem is acute for signature-based misuse detection approaches, but also plagues anomaly detection tools that must be able to recognize future normal behavior that is not identical to past observed behavior,
in order to reduce false positive rates.

To address this shortcoming, we utilize a simple neural network that can generalize from past observed behavior to recognize similar future behavior. In the past, we have applied backpropagation networks, in addition to other neural networks, with good performance to the problem of anomaly detection [8]. Here we present the use of a neural network for both anomaly and misuse detection. The approach is evaluated against the DARPA intrusion detection evaluation data.

2 Prior Art in Intrusion Detection

Some of the earliest work in intrusion detection was performed by Jim Anderson in the early 1980s [1]. Anderson defines an intrusion as any unauthorized attempt to access, manipulate, modify, or destroy information, or to render a system unreliable or unusable. Intrusion detection attempts to detect these types of activities. In this section we establish the foundations of intrusion detection techniques in order to determine where they are strong and where they need improvement.

2.1 Anomaly detection vs. misuse detection

Intrusion detection techniques are generally classified into two categories: anomaly detection and misuse detection. Anomaly detection assumes that misuse or intrusions are highly correlated to abnormal behavior exhibited by either a user or the system. Anomaly detection approaches must first baseline the normal behavior of the object being monitored, then use deviations from this baseline to detect possible intrusions. The initial impetus for anomaly detection was suggested by Anderson in his 1980 technical report, when he noted that intruders can be detected by observing departures from established patterns of use for individual users. Anomaly detection approaches have been implemented in expert systems that use rules for normal behavior to identify possible intrusions [15], in establishing statistical models for user or program profiles [6, 4, 22, 19, 16, 18, 17], and in using machine learning to recognize anomalous user or program behavior [10, 5, 2, 14].

Misuse detection techniques attempt to model attacks on a system as specific patterns, then systematically scan the system for occurrences of these patterns. This process involves a specific encoding of previous behaviors and actions that were deemed intrusive or malicious. The earliest misuse detection methods involved off-line analysis of audit trails normally recorded by host machines. For instance, a security officer would manually inspect audit trail log entries to determine if failed root login attempts were recorded. Manual inspection was quickly replaced by automated analysis tools that would scan these logs based on specific patterns of intrusion. Misuse detection approaches include expert systems [15, 3], model-based reasoning [13, 7], state transition analysis [23, 12, 11, 21], and keystroke dynamics monitoring [20, 13]. Today, the vast majority of commercial and research intrusion detection tools are misuse detection tools that identify attacks based on attack signatures.

It is important to establish the key differences between anomaly detection and misuse detection approaches. The most significant advantage of misuse detection approaches is that known attacks can be detected fairly reliably and with a low false positive rate. Since specific attack sequences are encoded into misuse detection systems, it is very easy to determine exactly which attacks, or possible attacks, the system is currently experiencing. If the log data does not contain the attack signature, no alarm is raised. As a result, the false positive rate can be reduced very close to zero. However, the key drawback of misuse detection approaches is that they cannot detect novel attacks against systems that leave different signatures. So, while the false positive rate can be made extremely low, the rate of missed attacks (false negatives) can be extremely high, depending on the ingenuity of the attackers. As a result, misuse detection approaches provide little defense against novel attacks until they can learn to generalize from known signatures of attacks.

Anomaly detection techniques, on the other hand, directly address the problem of detecting novel attacks against systems. This is possible because anomaly detection techniques do not scan for specific patterns, but instead compare current activities against statistical models of past behavior. Any activity sufficiently deviant from the model will be flagged as anomalous, and hence considered a possible attack.
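The contrast between the two approaches can be illustrated with a toy sketch. The signature, event names, baseline values, and deviation threshold below are invented for illustration; they are not drawn from any system discussed in this paper.

```python
# Toy contrast between misuse detection (signature scanning) and anomaly
# detection (deviation from a statistical baseline). All signatures, event
# names, and numbers here are hypothetical illustrations.
from statistics import mean, stdev

SIGNATURES = {("login_fail", "login_fail", "login_root")}  # known-attack pattern

def misuse_detect(events, n=3):
    """Scan an event stream for exact occurrences of known attack signatures."""
    return any(tuple(events[i:i + n]) in SIGNATURES
               for i in range(len(events) - n + 1))

def anomaly_detect(sample, baseline, threshold=3.0):
    """Flag a measurement that deviates too far from the baseline model."""
    mu, sigma = mean(baseline), stdev(baseline)
    return abs(sample - mu) > threshold * sigma

# Misuse detection fires only on the encoded pattern (low false positives,
# but novel attacks with different signatures are missed entirely).
print(misuse_detect(["login_ok", "login_fail", "login_fail", "login_root"]))  # True
# Anomaly detection fires on any sufficiently deviant activity, even one
# never seen before (novel attacks caught, but false alarms are possible).
print(anomaly_detect(90.0, baseline=[10, 12, 11, 9, 13]))  # True
```

The asymmetry between the two calls mirrors the trade-off described above: near-zero false positives for misuse detection versus novel-attack coverage for anomaly detection.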
Furthermore, anomaly detection schemes build their internal models from actual user histories and system data rather than from pre-defined patterns. Though anomaly detection approaches are powerful in that they can detect novel attacks, they have their drawbacks as well. For instance, one clear drawback of anomaly detection is its inability to identify the specific type of attack that is occurring. However, probably the most significant disadvantage of anomaly detection approaches is their high rate of false alarms. Because any significant deviation from the baseline can be flagged as an intrusion, non-intrusive behavior that falls outside the normal range will also be labeled as an intrusion, resulting in a false positive. Another drawback of anomaly detection approaches is that if an attack occurs during the training period for establishing the baseline data, then this intrusive behavior will be established as part of the normal baseline. In spite of the potential drawbacks of anomaly detection, having the ability to detect novel attacks makes anomaly detection a requisite if future, unknown, and novel attacks against computer systems are to be detected.

2.2 Assessing the Performance of Current IDSs

In 1998, the U.S. Defense Advanced Research Projects Agency (DARPA) initiated an evaluation of its intrusion detection research projects.¹ To date, it is the most comprehensive scientific study known for comparing the performance of different intrusion detection systems (IDSs). MIT's Lincoln Laboratory set up a private controlled network environment for generating and distributing sniffed network data and audit data recorded on host machines. Network traffic was synthesized to replicate normal traffic as well as attacks seen on example military installations. Because all the data was generated, the laboratory has a priori knowledge of which data is normal and which is attack data. The simulated network represented thousands of internal Unix hosts and hundreds of users. Network traffic was generated to represent the following types of services: http, smtp, POP3, FTP, IRC, telnet, X, SQL/telnet, DNS, finger, SNMP, and time. This corpus of data is the most comprehensive set known to be generated for the purpose of evaluating intrusion detection systems, and represents a significant advancement in the scientific community for independently and scientifically evaluating the performance of any given intrusion detection system.

¹ See www.ll.mit.edu/IST/ideval/index.html for a summary of the program.

TCP/IP data was collected using a network sniffer, and host machine audit data was collected using Sun Microsystems' Solaris Basic Security Module (BSM). In addition, dumps of the file system from one of the Solaris hosts were provided. This data was distributed to participating project sites in two phases: training data and test data. The training data is labeled as normal or attack and is used by the participating sites to train their respective intrusion detection systems. Once trained, the test data is distributed to participating sites in unlabeled form. That is, the participating sites do not know a priori which data in the test data is normal or attack. The data is analyzed off-line by the participating sites to determine which sessions are normal and which constitute intrusions. The results were sent back to MIT's Lincoln Labs for evaluation.

The attacks were divided into four categories: denial of service, probing/surveillance, remote to local, and user to root attacks. Denial of service attacks attempt to render a system or service unusable to legitimate users. Probing/surveillance attacks attempt to map out system vulnerabilities and usually serve as a launching point for future attacks. Remote to local attacks attempt to gain local account privilege from a remote and unauthorized account or system. User to root attacks attempt to elevate the privilege of a local user to root (or super user) privilege. There were a total of 114 attacks in 2 weeks of test data, including 11 types of DoS attacks, 6 types of probing/surveillance attacks, 14 types of remote to local attacks, 7 types of user to root attacks, and multiple instances of all types of attacks.

The attacks in the test data were also categorized as old versus new and clear versus stealthy. An attack is labeled as old if it appeared in the training data and new if it did not. When an attempt was made to veil an attack, it was labeled as stealthy; otherwise it was labeled as clear.

The reason we present this evaluation study is that we believe it represents the true state of the art in intrusion detection research. As such, it represents the foundation of more than 10 years of intrusion detection research upon which all future work in intrusion detection should improve. From this study, we can learn the strengths of current intrusion detection approaches and, more importantly, their weaknesses. Rather than identifying which systems performed well and which did not, we simply summarize the results of the overall best
combination system.

Lincoln Laboratory reported that if the best performing systems against all four categories of attacks were combined into a single system, then roughly between 60 to 70 percent of the attacks would have been detected, with a false positive rate of lower than 0.01%, or lower than 10 false alarms a day. This result summarizes the combination of best systems against all of the attacks simulated in the data. It shows that even in the best case scenario, over 30% of the simulated attacks would go by undetected. However, the good news is that the false alarm rate is acceptably low, low enough that the techniques can scale well to large sites with lots of traffic. The bad news is that with over 30% of attacks going undetected in the best combination of current intrusion detection systems, the state of the art in intrusion detection does not adequately address the threat of computer-based attacks.

Further analysis showed that most of the systems reliably detected old attacks that occurred within the training data with low false alarm rates. These results apply primarily to the network-based intrusion detection systems that processed the TCP/IP data. This result is encouraging, but not too surprising, since most of the evaluated systems were network-based misuse detection systems. The results were mixed in detecting new attacks. In two categories of attacks, probing/surveillance and user to root attacks, the performance in detecting new attacks was comparable to detecting old attacks. In the other two categories, denial of service and remote to local attacks, the performance of the top three network-based intrusion systems was roughly 20% detection for new denial of service attacks and less than 10% detection for new remote to local attacks. Thus, the results show that the best of today's network-based intrusion detection systems detect neither novel denial of service attacks nor novel remote to local attacks, arguably two of the most concerning types of attacks against computer systems today.

3 Monitoring Process Behavior for Intrusion Detection

In the preceding section, intrusion detection methods were categorized into either misuse detection or anomaly detection approaches. In addition, intrusion detection tools can be further divided into network-based or host-based intrusion detection. The distinction is useful because network-based intrusion detection tools usually process completely different data sets and features than host-based intrusion detection. As a result, the types of attacks that are detected with network-based intrusion detection tools are usually different from those detected with host-based intrusion detection tools. Some attacks can be detected by both network-based and host-based IDSs; however, the "sweet spots", or the types of attacks each is best at detecting, are usually distinct. As a result, it is difficult to make direct comparisons between the performance of a network-based IDS and a host-based IDS. A useful corollary of distinct sweet spots, though, is that in combination both techniques are more powerful than either one by itself.

Recent research in intrusion detection techniques has shifted the focus from user-based intrusion detection to process-based intrusion detection. Process-monitoring intrusion detection tools analyze the behavior of executing processes for possible intrusive activity. The premise of process monitoring for intrusion detection is that most computer security violations are made possible by misusing programs. When a program is misused, its behavior will differ from its normal usage. Therefore, if the behavior of a program can be adequately captured in a compact representation, then the behavioral features can be used for intrusion detection.

Two possible approaches to monitoring process behavior are: instrumenting programs to capture their internal states, or monitoring the operating system to capture external system calls made by a program. The latter option is more attractive in general because it does not require access to source code for instrumentation. As a result, analyzing external system calls can be applied to commercial off-the-shelf (COTS) software directly. Most modern operating systems provide built-in instrumentation hooks for capturing a particular process's system calls. On Linux and other variants of Unix, the strace(1) program allows one to observe system calls made by a monitored process as well as their return values. On Sun Microsystems' Solaris operating system, the Basic Security Module (BSM) produces an event record for individual processes. BSM recognizes 243 built-in system signals that can be made by a process. Thus, on Unix systems, there is good built-in support for tracing processes' externally observable behavior.
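As a rough sketch of the second option, the externally observable behavior of a process can be reduced to the sequence of system-call names it issues. The sample lines below are fabricated, but they follow strace's usual `name(args) = return` output format.

```python
# Sketch: reduce strace-style output to the sequence of system-call names
# that a process-based detector would consume. The sample lines are
# fabricated examples in strace's "name(args) = return" format.
import re

CALL_RE = re.compile(r"^(\w+)\(")  # system-call name at the start of a line

def syscall_sequence(strace_lines):
    calls = []
    for line in strace_lines:
        m = CALL_RE.match(line)
        if m:  # skip lines (signal notes, exit status) that are not call records
            calls.append(m.group(1))
    return calls

trace = [
    'open("/etc/passwd", O_RDONLY) = 3',
    'read(3, "root:x:0:0", 4096)   = 10',
    'close(3)                      = 0',
]
print(syscall_sequence(trace))  # ['open', 'read', 'close']
```

Sequences of this form are the raw material for the profiles and detectors discussed in the remainder of this section.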
Windows NT currently lacks a built-in auditing facility that provides such fine-grain resolution of program behavior.

Most process-based intrusion detection tools are based on anomaly detection. A normal profile for program behavior is built during the training phase of the IDS by capturing the program's system calls during normal usage. During the detection phase, the profile of system calls captured during on-line usage is compared against the normal profile. If a significant deviation from the normal profile is noted, then an intrusion flag is raised.

Early work in process monitoring was pioneered by Stephanie Forrest's research group out of the University of New Mexico. This group uses the analogy of the human immune system to develop intrusion detection models for computer programs. As in the human immune system, the problem of anomaly detection can be characterized as the problem of distinguishing between self and dangerous non-self [6]. Thus, the intrusion detection system needs to build an adequate profile of self behavior in order to detect dangerous behavior such as attacks. Using strace(1) on Linux, the UNM group analyzed short sequences of system calls made by programs to the operating system [6].

More recently, a similar approach was employed by the authors in analyzing BSM data provided under the DARPA 1998 Intrusion Detection Evaluation program [9]. The study compiled normal behavior profiles for approximately 150 programs. The profile for each program is stored in a table that consists of short sequences of system calls. During on-line testing, short sequences of system calls captured by the BSM auditing facility are looked up in the table. This approach is known as equality matching. That is, if an exact match of the sequence of system calls captured during on-line testing exists in the program's table, then the behavior is considered normal. Otherwise, an anomaly counter is incremented.

The data is partitioned into fixed-size windows in order to exploit a property of attacks: they tend to leave their signature in temporally co-located events. That is, attacks tend to cause anomalous behavior to be recorded in groups. Thus, rather than averaging the number of anomalous events recorded over the entire execution trace (which might wash out an attack in the noise), a much smaller window of events is used for counting anomalous events.

Several counters are kept at varying levels of granularity, from a counter for each fixed window of system calls to a counter for the number of windows that are anomalous. Thresholds are applied at each level to determine at which point anomalous behavior is propagated up to the next level. Ultimately, if enough windows of system calls in a program are deemed anomalous, the program behavior during a particular session is deemed anomalous, and an intrusion detection flag is raised.

The results from the study showed a high rate of detection, if not a low false positive rate [9]. Despite the simplicity of the approach and the high levels of detection, there are two main drawbacks to the equality matching approach: (1) large tables of program behavior must be built for each program, and (2) the equality matching approach does not have the ability to recognize behavior that is similar, but not identical, to past behavior. The first problem becomes an issue of storage requirements for program behavior profiles and is also a function of the number of programs that must be monitored. The second problem results from the inability of the algorithm to generalize from past observed behavior. The problem is that behavior that is normal, yet slightly different from past recorded behavior, will be recorded as anomalous. As a result, the false positive rate could be artificially elevated. Instead, it is desirable to be able to recognize behaviors that are similar to normal, but not necessarily identical to past normal behavior, as normal. Likewise, the same can be said for a misuse detection system. Many misuse detection systems are trained to recognize attacks based on exact signatures. As a result, slight variations of a given attack can result in missed detections, leading to a lower detection rate. It is desirable for misuse detection systems to be able to generalize from past observed attacks to recognize future attacks that are similar.

To this end, the research described in the rest of the paper employs neural networks to generalize from previously observed behavior. We develop an anomaly detection system that uses neural networks to learn normal behavior for programs. The trained network is then used to detect possibly intrusive behavior by identifying significant anomalies. Similarly, we developed a misuse detection system to learn the behavior of programs under attack scenarios. This system is then used to detect future attacks against the system. The goal of these approaches is to be able to recognize known attacks and detect novel attacks in the future.
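The equality matching scheme with fixed-size windows, described earlier in this section, can be sketched as follows. The sequence length of six follows the text; the window size and per-window threshold are illustrative choices, not the parameters used in the study.

```python
# Sketch of equality matching over short system-call sequences: a profile is
# the set of length-6 sequences seen during normal runs; at detection time,
# sequences absent from the profile are counted per fixed-size window so that
# temporally co-located anomalies are not washed out by a long trace.
# Window size and threshold below are illustrative, not the study's values.

def ngrams(events, n=6):
    return [tuple(events[i:i + n]) for i in range(len(events) - n + 1)]

def build_profile(normal_traces, n=6):
    profile = set()
    for trace in normal_traces:
        profile.update(ngrams(trace, n))
    return profile

def anomalous_windows(trace, profile, n=6, window=10, threshold=3):
    """Count windows in which too many sequences have no exact match."""
    misses = [seq not in profile for seq in ngrams(trace, n)]
    return sum(1 for start in range(0, len(misses), window)
               if sum(misses[start:start + window]) >= threshold)

profile = build_profile([list("abcdefgabcdefg")])
print(anomalous_windows(list("abcdefgabcdefg"), profile))  # 0: seen in training
print(anomalous_windows(list("abczzzzzzzdefg"), profile))  # 1: burst of unseen sequences
```

Because matching is exact, a trace that is merely similar to normal still counts as a miss, which is precisely the generalization failure the neural-network approach is meant to address.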
By using the associative connections of the network, we can generalize from past observed behavior to recognize future similar behavior. A comparison of the two systems against the DARPA intrusion data is provided in Section 5.

4 Using Neural Networks for Intrusion Detection

Applying machine learning to intrusion detection has been developed elsewhere as well [5, 2, 14]. Lane and Brodley's work uses machine learning to distinguish between normal and anomalous behavior. However, their work differs from ours in that they build user profiles based on sequences of each individual's normal user commands and attempt to detect intruders based on deviations from the established user profile. Similarly, Endler's work [5] used neural networks to learn the behavior of users based on BSM events recorded from user actions. Rather than building profiles on a per-user basis, our work builds profiles of software behavior and attempts to distinguish between normal software behavior and malicious software behavior. The advantage of our approach is that the vagaries of individual behavior are abstracted away, because program behavior rather than individual usage is studied. This can be of benefit for defeating a user who slowly changes his or her behavior to foil a user profiling system. It can also protect the privacy interests of users from a surveillance system that monitors a user's every move.

The goal in using artificial neural networks (ANNs) for intrusion detection is to be able to generalize from incomplete data and to be able to classify on-line data as being normal or intrusive. An artificial neural network is composed of simple processing units, or nodes, and connections between them. The connection between any two units has some weight, which is used to determine how much one unit will affect the other. A subset of the units of the network acts as input nodes, and another subset acts as output nodes. By assigning a value, or activation, to each input node, and allowing the activations to propagate through the network, a neural network performs a functional mapping from one set of values (assigned to the input nodes) to another set of values (retrieved from the output nodes). The mapping itself is stored in the weights of the network.

In this work, a classical feed-forward multi-layer perceptron network was implemented: a backpropagation neural network. The backpropagation network has been used successfully in other intrusion detection studies [10, 2]. The backpropagation network, or backprop, is a standard feedforward network. Input is submitted to the network, and the activations for each level of neurons are cascaded forward.

Our previous research in intrusion detection with BSM data used an equality matching technique to look up currently observed program behavior that had been previously stored in a table. While the results were encouraging, we also realized that the equality matching approach had no possibility of generalizing from previously observed behavior. As a result, we are pursuing research in using artificial neural networks to accomplish the same goals, albeit with better performance. Specifically, we are interested in the capability of ANNs to generalize from past observed behavior to detect novel attacks against systems. To this end, we constructed two different ANNs: one for anomaly detection and one for misuse detection.

To use the backprop networks, we had to address five major issues: how to encode the data for input to the network, what network topology should be used, how to train the networks, how to perform anomaly detection with a supervised training algorithm, and what to do with the data produced by the neural network.

Encoding the data to be used with the neural network is, in general, a difficult problem. Previous experiments indicated that strings of six consecutive BSM events carried enough implicit information to be accurately distinguished as anomalous or normal for programs in general. One possible encoding technique was simply to enumerate all observed strings of six BSM events, and use the enumeration as an encoding. However, part of the motivation for using neural nets was their ability to classify novel inputs based on similarity to known inputs. A simple enumeration will fail to capture information about the strings. Therefore, a neural net will be less likely to be able to correctly classify novel inputs. In order to capture the necessary information in the encoding, we devised a distance metric for strings of events. The distance metric took into account the events common to two strings, as well as the difference in positions of common events.
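The paper does not give the exact form of the distance metric, so the sketch below uses a plausible stand-in: each event shared by two strings earns credit that shrinks with the difference in the event's positions. The exemplar strings are invented for illustration.

```python
# Sketch of encoding an event string as its distances to exemplar strings.
# The metric here is an assumed stand-in: it rewards events common to both
# strings and discounts the reward by their difference in position.

def distance(a, b):
    """Smaller when the strings share events at nearby positions."""
    d = float(len(a) + len(b))  # worst case: nothing shared
    for event in set(a) & set(b):
        d -= 2.0 / (1 + abs(a.index(event) - b.index(event)))
    return d

def encode(string, exemplars):
    """Map a string to a point whose coordinates are exemplar distances."""
    return [distance(string, e) for e in exemplars]

exemplars = [("open", "read", "close"), ("fork", "exec", "wait")]
print(encode(("open", "read", "close"), exemplars))  # [0.0, 6.0]
print(encode(("open", "read", "write"), exemplars))  # [2.0, 6.0]
```

Similar strings land near each other in the resulting space, which is what lets the network classify novel inputs by their proximity to known ones.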
To encode a string of data, the distance metric was used to measure the distance from the data string to each of several "exemplar" strings. The encoding then consisted of a set of measured distances. A string could then be thought of as a point in a space where each dimension corresponds to one of the exemplar strings, and the point is mapped into the space by plotting the distance along each dimension.

Once an appropriate encoding method was developed, an appropriate network topology had to be employed. We had to determine how many input and output nodes were necessary, and, if a hidden layer was to be used, how many nodes it should contain. Because we seek to determine whether an input string is anomalous or normal, we use a single continuously valued output node to represent the extent to which the network believes the input is normal or anomalous. The more anomalous the input is, the closer to 1.0 the network computes its output. Conversely, the closer to normal the input is, the closer to 0.0 the output node computes.

The number of input nodes has to be equal to the number of exemplar strings (since each exemplar produced a distance for input to the network). With an input layer, a hidden layer, and an output layer, a neural network can be constructed to compute any arbitrarily complex function. Thus, a single hidden layer was used in our networks. A different network must be constructed, tuned, and trained for each program to be monitored, since what might have been quite normal behavior for one program might have been extremely rare in another. The number of hidden nodes varied based on the performance of each trained network.

During training, many networks were trained for each program, and the network that performed the best was selected. The remaining networks were discarded. Training involved exposing the networks to four weeks of labeled data, and performing the backprop algorithm to adjust weights. An epoch of training consisted of one pass over the training data. For each network, the training proceeded until the total error made during an epoch stopped decreasing, or 1,000 epochs had been reached. Since the optimal number of hidden nodes for a program was not known before training, for each program, networks were trained with 10, 15, 20, 25, 30, 35, 40, 50, and 60 hidden nodes. Before training, network weights were initialized randomly. However, initial weights can have a large, but unpredictable, effect on the performance of a trained network. In order to avoid poor performance due to bad initial weights, for each program, for each number of hidden nodes, 10 networks were initialized differently and trained. Therefore, for each program, 90 networks were trained. To select which of the 90 to keep, each was tested on two weeks of data that were not part of the four weeks of data used for training. The network that classified data most accurately was kept.

4.1 Anomaly detection

In order to train the networks, it is necessary to expose them to normal data and anomalous data. Randomly generated data was used to train the network to distinguish between normal and anomalous data. The randomly generated data, which were spread throughout the input space, caused the network to generalize that all data were anomalous by default. The normal data, which tended to be localized in the input space, caused the network to recognize a particular area of the input space as non-anomalous.

After training and selection, a set of neural networks was ready to be used. However, a neural network can only classify a single string (a sequence of BSM events) as anomalous or normal, and our intention was to classify entire sessions (which are usually composed of executions of multiple programs) as anomalous or normal. Furthermore, our previous experiments showed that it is important to capture the temporal locality of anomalous events in order to recognize intrusive behavior. As a result, we desired an algorithm that provides some memory of recent events.

The leaky bucket algorithm fits this purpose well. The leaky bucket algorithm keeps a memory of recent events by incrementing a counter according to the neural network's output, while slowly leaking the counter's value. Thus, as the network computes many anomalies, the leaky bucket algorithm will quickly accumulate a large value in its counter. Similarly, as the network computes normal output, the bucket will "leak" its anomaly counter away, back down to zero. As a result, the leaky bucket emphasizes anomalies that are closely temporally co-located and diminishes the values of those that are sparsely located.

Strings of BSM events are passed to a neural network in the order they occurred during program execution.
into a leaky bucket. During each timestep, the level of the bucket is decreased by a fixed amount. If the level in the bucket rises above some threshold at any point during execution of the program, the program is flagged as anomalous. The advantage of using a leaky bucket algorithm is that it allows occasional anomalous behavior, which is to be expected during normal system operation, but it is quite sensitive to large numbers of temporally co-located anomalies, which one would expect if a program were really being misused. If a session contains a single anomalous execution of a program, the session is flagged as anomalous.

4.2 Misuse detection

Having developed a system for anomaly detection, we chose to evaluate how well the same techniques could be applied to misuse detection. Our system is designed to recognize some type of behavior. Thus, it should not matter whether the behavior it is learning was normal system usage or attack behavior. Aside from trivial changes to the way the leaky bucket is monitored, our system should not require any modification to perform misuse detection. Having made that trivial modification to the leaky bucket, we tested our system as a misuse detector.

Unfortunately, two issues particular to our data set made misuse detection difficult. The first issue was a lack of data. In the DARPA data, there was two to three orders of magnitude less intrusion data than normal data. This made it quite difficult to train networks to learn what constituted an attack. The second issue was related to the labeling of intrusions. Intrusion data were labeled on a session-by-session basis. Whereas several programs might be executed during an intrusive session, as few as one might be anomalous. Thus, while all data labeled non-intrusive could be assumed to be normal, not all data labeled intrusive could be assumed to be anomalous. Despite these stumbling blocks, we configured our neural network system for misuse detection.

5 Experimental Results

The anomaly and misuse detection systems were tested on the same test data. The test data consisted of 139 non-intrusive sessions and 22 intrusive sessions. Although it would have been preferable to use a larger number of intrusive sessions for testing, there were so few intrusive sessions in the DARPA data that all other intrusion data were used to train the misuse detection system.

The performance of any intrusion detection system must account for both the detection ability and the false positive rate. We observed both of these factors while varying the leak rate used by the leaky bucket algorithm. A leak rate of 0 results in all prior timesteps being retained in memory. A leak rate of 1 results in all timesteps but the current one being forgotten. We varied the leak rate from 0 to 1.

The performance of the IDS should be judged in terms of both the ability to detect intrusions and the false positive rate, that is, the rate at which secure behavior is incorrectly classified as insecure. We used receiver operating characteristic (ROC) curves to compare intrusion detection ability to false positives. A ROC curve is a parametric plot, where the parameter is the sensitivity of the system to what it perceives to be insecure behavior. The curve plots the likelihood that an intrusion is detected against the likelihood that a non-intrusion is misclassified, for a particular parameter such as a threshold. The ROC curve can be used to determine the performance of the system for any possible operating point. It allows the end user of an intrusion detection system to assess the trade-off between detection ability and false alarm rate in order to properly tune the system for acceptable tolerances.

Different leak rates produced different ROC curves. Figure 1 displays two ROC curves: one for a low leak rate, and one for a high leak rate. For the leak rate of .2, to achieve detection better than 77.3%, one must be willing to accept a dramatic increase in false positives. At 77.3% detection, the false positive rate is only 3.6%. When the leak rate is .7, a detection rate of 77.3% can be achieved with a false positive rate of only 2.2%.

ROC curves were also produced for the performance of our misuse detection system. While the performance was not nearly as good as that of the anomaly detection system in terms of false positives (which were as high as 5% even at low sensitivity settings), the misuse detection system displayed very high detection ability, which is especially surprising given the small number of sessions used to train the system. As illustrated in Figure 2, with a leak rate of 0.7, the sys-
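As a minimal sketch, the session-flagging procedure described above can be expressed as follows. The network scores, leak rate, and threshold here are illustrative assumptions, not values taken from our experiments.

```python
# Sketch of the leaky bucket session classifier. Per-string network
# outputs lie in [0, 1]; the closer to 1.0, the more anomalous.
def classify_session(scores, leak_rate=0.7, threshold=1.5):
    level = 0.0
    for score in scores:
        level += score  # pour the network's output into the bucket
        if level > threshold:
            return True  # temporally co-located anomalies filled the bucket
        level = max(0.0, level - leak_rate)  # leak a fixed amount per timestep
    return False  # anomalies leaked away; session looks normal

# Sparse anomalies leak away, while a dense burst fills the bucket.
print(classify_session([0.9, 0.0, 0.0, 0.9, 0.0, 0.0]))  # prints False
print(classify_session([0.9, 0.9, 0.9, 0.9, 0.9, 0.9]))  # prints True
```

Raising the leak rate shortens the bucket's memory, which is exactly the trade-off explored by varying the leak rate from 0 to 1 above.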
[Figure 1: Anomaly detection results for two different leak rates. Two ROC panels (leak rates .2 and .7) plotting Percentage Of Intrusions Detected against False Positive Percentage.]

[Figure 2: Misuse detection results for two different leak rates. Two ROC panels (leak rates .2 and .7) plotting Percentage Of Intrusions Detected against False Positive Percentage.]
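Points on ROC curves such as those in Figures 1 and 2 are obtained by sweeping the detection threshold. A minimal sketch follows, using made-up session-level anomaly scores (e.g., peak bucket levels) rather than our experimental data.

```python
# Sketch of computing ROC operating points by sweeping the flagging
# threshold over session-level anomaly scores.
def roc_points(normal_scores, intrusion_scores, thresholds):
    points = []
    for t in thresholds:
        tpr = sum(s > t for s in intrusion_scores) / len(intrusion_scores)
        fpr = sum(s > t for s in normal_scores) / len(normal_scores)
        points.append((fpr, tpr))  # one operating point per threshold
    return points

normal    = [0.1, 0.3, 0.2, 0.8, 0.4]  # scores for non-intrusive sessions
intrusive = [0.9, 0.7, 0.95, 0.6]      # scores for intrusive sessions
for fpr, tpr in roc_points(normal, intrusive, [0.0, 0.5, 1.0]):
    print(f"FPR={fpr:.2f}  TPR={tpr:.2f}")
```

Each threshold yields one (false positive rate, detection rate) pair; plotting the pairs over a fine threshold sweep traces out the curve.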
tem was able to detect as much as 90.9% of all intrusions with a false positive rate of 18.7%. Other host-based misuse detection systems can currently provide similar detection capabilities with lower false positive rates. Thus, this approach to misuse detection may not be suitable for detecting attacks in comparison to signature-based approaches. However, our technique demonstrated the ability of the system to detect novel attacks by generalizing from previously observed behavior.

While that false positive rate is clearly unacceptable, it should be remembered that the misuse detection system was trained on data which contained not only intrusion data, but also normal data. This would naturally lead the system to produce a large number of false positives. By eliminating the non-intrusion data from the training data, it is believed that significantly lower false positive rates could be achieved, without lowering the detection ability.

The results of our experiments indicate that neural networks are suited to perform intrusion detection and can generalize from previously observed behavior. Currently, the false positive rates are too high to be practical for commercial users. In order to be a useful tool, false positive rates need to be between one and three orders of magnitude smaller. We continue to investigate how to improve the performance of our neural networks. Tests of variations on our techniques indicate that we have not yet achieved optimal performance.

6 Conclusions

This paper began with an examination of current intrusion detection systems. In particular, the DARPA 1998 intrusion detection evaluation study found that novel attacks against systems are rarely detected by most IDSs that use signatures for detection. On the other hand, well-known attacks for which signatures from network data or host-based
intrusion data can be formed are detected very reliably, with low rates of false alarms. However, even slight variations of known attacks escape detection by signature-based IDSs. Similarly, program-based anomaly detection systems have performed very well in detecting novel attacks, albeit with high false alarm rates. To overcome the problems in current misuse detection and anomaly detection approaches, a key necessity for IDSs is the ability to generalize from previously observed behavior to recognize future similar behavior. This capability will permit detection of variations of known attacks as well as reduce false positive rates for anomaly-based IDSs.

In this paper, we presented the application of a simple neural network to learning previously observed behavior in order to detect future intrusions against systems. The results from our study show the viability of our approach for detecting intrusions. Future work will apply other neural networks more suited toward the problem domain of analyzing temporal characteristics of program traces. For instance, applying recurrent, time delay neural networks to program-based anomaly detection has proved to be more successful than using backpropagation networks for the same purpose [8]. Our next step is to apply these networks to misuse detection as well.

References

[1] J.P. Anderson. Computer security threat monitoring and surveillance. Technical report, James P. Anderson Co., Fort Washington, PA, April 1980.

[2] J. Cannady. Artificial neural networks for misuse detection. In Proceedings of the 1998 National Information Systems Security Conference (NISSC'98), pages 443-456, Arlington, VA, October 5-8 1998.

[3] W.W. Cohen. Fast effective rule induction. In Machine Learning: Proceedings of the Twelfth International Conference. Morgan Kaufmann, 1995.

[4] P. D'haeseleer, S. Forrest, and P. Helman. An immunological approach to change detection: Algorithms, analysis and implications. In IEEE Symposium on Security and Privacy, 1996.

[5] D. Endler. Intrusion detection: Applying machine learning to Solaris audit data. In Proceedings of the 1998 Annual Computer Security Applications Conference (ACSAC'98), pages 268-279, Scottsdale, AZ, December 1998. IEEE Computer Society Press.

[6] S. Forrest, S.A. Hofmeyr, and A. Somayaji. Computer immunology. Communications of the ACM, 40(10):88-96, October 1997.

[7] T.D. Garvey and T.F. Lunt. Model-based intrusion detection. In Proceedings of the 14th National Computer Security Conference, October 1991.

[8] A.K. Ghosh, A. Schwartzbard, and M. Schatz. Learning program behavior profiles for intrusion detection. In Proceedings of the 1st USENIX Workshop on Intrusion Detection and Network Monitoring. USENIX Association, April 11-12 1999. To appear.

[9] A.K. Ghosh, A. Schwartzbard, and M. Schatz. Using program behavior profiles for intrusion detection. In Proceedings of the SANS Intrusion Detection Workshop, February 1999. To appear.

[10] A.K. Ghosh, J. Wanken, and F. Charron. Detecting anomalous and unknown intrusions against programs. In Proceedings of the 1998 Annual Computer Security Applications Conference (ACSAC'98), December 1998.

[11] K. Ilgun. USTAT: A real-time intrusion detection system for Unix. Master's thesis, Computer Science Dept., UCSB, July 1992.

[12] K. Ilgun, R.A. Kemmerer, and P.A. Porras. State transition analysis: A rule-based intrusion detection system. IEEE Transactions on Software Engineering, 21(3), March 1995.

[13] S. Kumar and E.H. Spafford. A pattern matching model for misuse intrusion detection. The COAST Project, Purdue University, 1996.

[14] T. Lane and C.E. Brodley. An application of machine learning to anomaly detection. In Proceedings of the 20th National Information Systems Security Conference, pages 366-377, October 1997.

[15] W. Lee, S. Stolfo, and P.K. Chan. Learning patterns from Unix process execution traces for intrusion detection. In Proceedings of the AAAI97 Workshop on AI Methods in Fraud and Risk Management, 1997.
[16] T.F. Lunt. IDES: An intelligent system for detecting intruders. In Proceedings of the Symposium: Computer Security, Threat and Countermeasures, Rome, Italy, November 1990.

[17] T.F. Lunt. A survey of intrusion detection techniques. Computers and Security, 12:405-418, 1993.

[18] T.F. Lunt and R. Jagannathan. A prototype real-time intrusion-detection system. In Proceedings of the 1988 IEEE Symposium on Security and Privacy, April 1988.

[19] T.F. Lunt, A. Tamaru, F. Gilham, R. Jagannathan, C. Jalali, H.S. Javitz, A. Valdes, P.G. Neumann, and T.D. Garvey. A real-time intrusion-detection expert system (IDES). Technical report, Computer Science Laboratory, SRI International, February 1992.

[20] F. Monrose and A. Rubin. Authentication via keystroke dynamics. In 4th ACM Conference on Computer and Communications Security, April 1997.

[21] P.A. Porras and R.A. Kemmerer. Penetration state transition analysis: A rule-based intrusion detection approach. In Eighth Annual Computer Security Applications Conference, pages 220-229. IEEE Computer Society Press, November 1992.

[22] P.A. Porras and P.G. Neumann. EMERALD: Event monitoring enabling responses to anomalous live disturbances. In Proceedings of the 20th National Information Systems Security Conference, pages 353-365, October 1997.

[23] G. Vigna and R.A. Kemmerer. NetSTAT: A network-based intrusion detection approach. In Proceedings of the 1998 Annual Computer Security Applications Conference (ACSAC'98), pages 25-34, Scottsdale, AZ, December 1998. IEEE Computer Society Press.
