Deterministic Memory-Efficient String Matching Algorithms for Intrusion Detection
Nathan Tuck 
Timothy Sherwood
Brad Calder
George Varghese
Department of Computer Science and Engineering, University of California, San Diego
Department of Computer Science, University of California, Santa Barbara
Intrusion Detection Systems (IDSs) have become widely recognized as powerful tools for identifying, deterring and deflecting malicious attacks over the network. Essential to almost every intrusion detection system is the ability to search through packets and identify content that matches known attacks. Space- and time-efficient string matching algorithms are therefore important for identifying these packets at line rate. In this paper we examine string matching algorithms and their use for intrusion detection. In particular, we focus our efforts on providing worst-case performance that is amenable to hardware implementation. We contribute modifications to the Aho-Corasick string-matching algorithm that drastically reduce the amount of memory required and improve its performance on hardware implementations. We also show that these modifications do not drastically affect software performance on commodity processors, and therefore may be worth considering in these cases as well.
Keywords: System Design, Network Algorithms
I. INTRODUCTION
With each passing day there is more critical data accessible in some form over the network. Any publicly accessible system on the Internet today will be rapidly subjected to break-in attempts. These attacks can range from email viruses, to corporate espionage, to general destruction of data, to attacks that hijack servers from which to spread additional attacks. Even when a system cannot be directly broken into, denial-of-service attacks can be just as harmful to individuals, and can cause nearly equal damage to the reputations of companies that provide services over the Internet. Because of the increasing stakes held by the various users of the Internet, there has been widespread interest in combating these attacks at every level, from end hosts and network taps to edge and core routers.
Intrusion Detection Systems (IDSs) are emerging as one of the most promising ways of providing protection to systems on the network. The Aberdeen Group has estimated the IDS market at $100 million, with expectations that it will double in 2004 and keep growing in future years. By automatically monitoring network traffic in real time, intrusion detection systems can alert administrators to suspicious activities, keep logs to aid in forensics, and assist in the detection of new worms and denial-of-service attacks.
As with firewalls, intrusion detection systems are growing in popularity because they give a site resilience to attacks without modifying end-node software. While firewalls only limit entry to a network based on packet headers, intrusion detection systems go further by identifying possible attacks that use valid packet headers and therefore pass through firewalls. Intrusion detection systems gain this capability by searching both packet headers and payloads to identify attack signatures.
To define suspicious activities, an IDS uses a set of rules that are applied to matching packets. A rule consists, at minimum, of a type of packet to search, a string of content to match, a location where that string is to be searched for, and an associated action to take if all the conditions of the rule are met. An example rule might match packets that look like a known buffer overflow exploit in a web server; the corresponding action might be to log the packet information and alert the administrator.
Because of their utility, IDSs are beginning to be deployed in a wide range of operating environments. End hosts use them to monitor and prevent attacks from incoming traffic. They can be found in network-tap devices that are inserted into key points of the network for diagnostic purposes. They will soon even find their way into edge and core routers to protect the network infrastructure from distributed attacks.
The challenge is that increasing line rates, an explosion in the number of attacks mounted, and plummeting unit costs have made cost-effective deployment a serious issue. In addition, as IDSs move from end hosts into edge and core routers, the demands placed on algorithms for intrusion detection will change. While common-case performance can be an acceptable metric for end hosts based on commodity processors, algorithms that are to succeed inside the network infrastructure must satisfy stringent worst-case performance bounds and tight constraints on memory.
0-7803-8356-7/04/$20.00 (C) 2004 IEEE IEEE INFOCOM 2004
At the heart of almost every modern intrusion detection system is a string matching algorithm. String matching is crucial because it allows detection systems to base their actions on the content that is actually flowing to a machine. From this sea of packets, string matching identifies those packets that contain data matching the fingerprint of a known attack. Essentially, the string matching algorithm compares the set of strings in the rule set to the data seen in the packets that flow across the network.
String matching is computationally intensive. For example, the string matching routines in Snort account for up to 70% of total execution time and 80% of instructions executed on real traces [2]. Because string matching dominates performance in this and many other IDSs, in this paper we concentrate our efforts on building smaller and faster string matching algorithms.
We present optimized techniques for matching large sets of strings in incoming packets in the context of network intrusion detection. Our optimizations draw upon parallels between the well-studied problem of IP lookup and the nascent problem of detecting suspicious strings in packets. We show that, as in IP lookup, most of the memory used by modern string matching algorithms goes towards the storage of pointers.
By formulating a novel compressed pointer methodology for string matching data structures, we can reduce the amount of memory required to 2% of the original (from 53.1 MB down to 1.09 MB) for a current rule set used in a modern IDS. This result is important because we provide this compression while at the same time providing worst-case performance guarantees for the string matching algorithm.
We present the results of our techniques as applied to the open-source IDS software Snort [14]. We characterize the properties of a real set of IDS string matching rules and examine both how the rules have changed over time and the effect of those changes on the data structures used. These characteristics are then exploited to produce a new string matching technique within an actual implementation of Snort.
We examine the amount of memory saved by our string matching memory optimizations and the improvement in throughput that we obtain for both commodity hardware and proposed next-generation network processors. By addressing worst-case performance in both the algorithms and the architecture, we ensure that it is impossible for an adversary to construct an attack based on flooding the IDS with packets that it performs poorly on. An important contribution of this work is the development of an algorithm that performs well, requires little memory, and has useful bounds on worst-case performance.
The contributions of this paper can be summarized as:
• We characterize the need for and use of string matching in intrusion detection systems, and show how certain properties of the data lend themselves well to optimizations somewhat similar to those applied to IP lookup. We also characterize the growth and properties of the database of known attacks.
• New Algorithms: Based on these characterizations, we design two new string matching algorithms that can reduce memory usage to as low as 2% of that required by existing algorithms while maintaining bounded worst-case performance.
• We evaluate these new algorithms in two operational contexts: first by examining the worst-case performance of hardware implementations, and second in a commonly used intrusion detection system, Snort, running on a commodity processor for an example trace. We show hardware performance more than 30% greater than that of other algorithms and only slightly degraded software performance.
We begin by characterizing the place of string matching in intrusion detection systems such as Snort and discussing relevant prior work in string matching algorithms in Section II. We then discuss our proposed optimizations based on these observations in Section III. A detailed evaluation of the results of our techniques can be found in Section IV. Our contributions are summarized in Section V.
II. S
In the Introduction we motivated the need for string matching in Intrusion Detection Systems. In this section we further demonstrate how string matching is used in an actual intrusion detection system. We also examine the state of the art in string matching as it relates to intrusion detection, and note some interesting parallels between the problem of string matching and the problem of IP lookup.
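As a point of reference for the state of the art discussed in this section, here is a minimal Python sketch of the classic Aho-Corasick multi-pattern matcher, which the abstract names as the starting point for the paper's modifications. It builds a trie of all rule strings (the goto function), computes failure links breadth-first, and then scans a payload once. This is the textbook algorithm, not the paper's memory-efficient variant; the names `build_automaton` and `search` are hypothetical, and transitions are kept in per-state Python dicts purely for readability.

```python
from collections import deque
from typing import Dict, Iterable, List, Set, Tuple

Automaton = Tuple[List[Dict[int, int]], List[int], List[Set[bytes]]]


def build_automaton(patterns: Iterable[bytes]) -> Automaton:
    """Build a textbook Aho-Corasick automaton: goto edges, failure links, outputs."""
    goto: List[Dict[int, int]] = [{}]  # goto[s][byte] -> next state (sparse)
    fail: List[int] = [0]              # failure link of each state
    out: List[Set[bytes]] = [set()]    # patterns recognized on reaching each state
    for p in patterns:                 # 1) insert every pattern into a trie
        s = 0
        for b in p:
            if b not in goto[s]:
                goto.append({})
                fail.append(0)
                out.append(set())
                goto[s][b] = len(goto) - 1
            s = goto[s][b]
        out[s].add(p)
    queue = deque(goto[0].values())    # 2) depth-1 states keep failure link 0 (root)
    while queue:                       # 3) breadth-first: set deeper failure links
        r = queue.popleft()
        for b, s in goto[r].items():
            queue.append(s)
            f = fail[r]
            while f and b not in goto[f]:
                f = fail[f]
            fail[s] = goto[f].get(b, 0)
            out[s] |= out[fail[s]]     # also report shorter patterns ending here
    return goto, fail, out


def search(automaton: Automaton, data: bytes) -> List[Tuple[int, bytes]]:
    """Scan data once; return sorted (start_offset, pattern) for every occurrence."""
    goto, fail, out = automaton
    s, hits = 0, []
    for i, b in enumerate(data):
        while s and b not in goto[s]:  # follow failure links on a mismatch
            s = fail[s]
        s = goto[s].get(b, 0)
        for p in out[s]:
            hits.append((i - len(p) + 1, p))
    return sorted(hits)
```

For example, searching `b"ushers"` with the rule strings `he`, `she`, `his` and `hers` reports `she` at offset 1 and `he` and `hers` at offset 2. Because of the inner `while` loop, the number of failure transitions is bounded only in aggregate (at most one per input byte on average), not per byte; designs that need a fixed cost per byte can instead precompute a transition for every (state, byte) pair, which is one source of the pointer-heavy data structures the introduction describes.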
 A. Quantifying the Use of String Matching
We asserted earlier that string matching is the most critical component of an Intrusion Detection System (IDS). It is important to further examine exactly how the matching is being exercised. To do this analysis we use the freely available and widely used IDS tool, Snort.
1) Snort - An Intrusion Detection System: Snort uses a set of rules that are derived from known attacks or other suspicious behavior. The rules are generated manually by experts who extract relevant (presumably unusual) strings from the payload and header of known attacks. If all the conditions of the rule are met (which include matching the string, its location within the packet, and several other possible conditions), then the action specified by the rule is applied. This action can include logging the packet, alerting a system administrator via email, ignoring the packet, or dynamically activating other rules.
The distribution of Snort includes a set of rules that cover known attacks, such as the exploit that allowed CodeRed [5] to spread or buffer overflows in POP3 servers. Rules are usually added to Snort as new vulnerabilities are discovered. Each of these rules contains a content string, associated rules for its location, and the type of packet it can appear in. To the best of our knowledge, this default rule set is used, with minor modifications, in most production deployments of Snort, as it represents best practices and knowledge of the Internet community.
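The rule structure described above (packet type, content string, location constraint, action) can be sketched as a small data type. The Python below is a simplified, hypothetical model for illustration only; it is not Snort's rule syntax or detection engine. The field names `offset` and `depth` are chosen by analogy with Snort's content-location options, and the semantics used here (search `depth` bytes starting at `offset`) are an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Rule:
    # Hypothetical, simplified rule; not Snort's actual rule format.
    proto: str                   # type of packet the rule applies to
    content: bytes               # content string to look for
    action: str                  # action taken when every condition holds
    offset: int = 0              # first payload byte to search (assumed semantics)
    depth: Optional[int] = None  # bytes to search from `offset`; None = to the end


@dataclass(frozen=True)
class Packet:
    proto: str
    payload: bytes


def rule_matches(rule: Rule, pkt: Packet) -> bool:
    """All conditions must hold: packet type, then content inside the location window."""
    if pkt.proto != rule.proto:
        return False
    end = None if rule.depth is None else rule.offset + rule.depth
    return rule.content in pkt.payload[rule.offset:end]


def actions_for(rules: List[Rule], pkt: Packet) -> List[str]:
    """Return the actions of every rule whose conditions are all met, in rule order."""
    return [r.action for r in rules if rule_matches(r, pkt)]


rules = [
    Rule("tcp", b"/default.ida?", "alert"),          # illustrative web-server exploit pattern
    Rule("tcp", b"USER", "log", offset=0, depth=4),  # only at the very start of the payload
    Rule("udp", b"USER", "ignore"),
]
```

With these rules, a TCP payload `b"GET /default.ida?NNNN HTTP/1.0"` triggers only `alert`, while `b"USER root"` triggers only `log`. Checking each rule's content separately repeats a scan of the payload per rule; a practical IDS can instead locate all content strings in one multi-pattern pass, which is why the string matcher carries most of the load.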
2) Scalability of the Intrusion Detection System Database: An Intrusion Detection System (IDS) contains a set of rules with corresponding actions; the set of rules supported by the IDS is called a database. In order to understand the hardware issues behind building an IDS, we first need to understand how the IDS database scales over time.
We begin by examining the actual data within the strings contained in the rules of a typical IDS. The current standard distribution of Snort comes with over 1500 rules enabled by default. Figure 1 shows a histogram of the number of bytes in the character portion of each unique rule in the default database. Rules can also match non-letter characters such as IP addresses (an IPv4 address is 4 bytes), which partially explains the large number of 4-byte rules. As can be seen in Figure 1, the bulk of the rules are on the order of 15 bytes long, but many fall well above and below that length. We also see from this figure that there are many rules with very long lengths. This implies that it is beneficial to avoid any string matching technique whose run-time is proportional to the length of the rules in the database.
New attacks are being created all the time, and as they are, new rules are being added to the Snort database to detect or combat them. Figure 2 shows how the size of the Snort rule database has grown over time. We characterize the size of the database in terms of the number
[Figure: histogram; x-axis: Length of String in Rule; y-axis: Number of Rules]
Fig. 1. Distribution of the lengths of the unique strings found in the default Snort database.
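To make concrete why run-time proportional to rule length is undesirable, the Python sketch below (hypothetical function name `naive_match_cost`) tries every rule string at every payload offset and counts byte comparisons. This brute-force approach serves only as a contrast; it is not an algorithm the paper proposes. A payload that nearly matches a long rule at every offset forces roughly n·m comparisons for an n-byte payload and an m-byte rule.

```python
from typing import List, Sequence, Tuple


def naive_match_cost(patterns: Sequence[bytes],
                     data: bytes) -> Tuple[List[Tuple[int, bytes]], int]:
    """Try every pattern at every offset; return (matches, byte comparisons made)."""
    comparisons = 0
    hits: List[Tuple[int, bytes]] = []
    for i in range(len(data)):          # every starting offset in the payload
        for p in patterns:              # every rule string
            j = 0
            while j < len(p) and i + j < len(data):
                comparisons += 1
                if data[i + j] != p[j]:
                    break               # mismatch: give up on this offset
                j += 1
            if j == len(p):             # compared all of p without a mismatch
                hits.append((i, p))
    return hits, comparisons
```

For the 4-byte rule `aaab` against eight `a` bytes, the function makes 26 comparisons and finds nothing; for a 16-byte rule (fifteen `a` bytes followed by `b`) against 64 `a` bytes it makes 904, close to 64 × 16 = 1024. The work grows with rule length, whereas a matcher with a bounded number of transitions per input byte does not depend on how long the rules are.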
[Figure: x-axis: date, Nov 2000 to Jun 2003; y-axis: Increase in Number (from Nov 2000); series: Number of Characters, Number of Unique Strings]
Fig. 2. The growth of the Snort rule database over the last three years.
[Figure: y-axis: Total Data Structure Size (MB) for the Nov 2000 and Jun 2003 rule sets; algorithms shown include Wu-Manber and Aho-Corasick]
Fig. 3. Sizes of string matching data structures for known algorithms and our work.
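Figure 3 compares data structure sizes, and the introduction states that most of that memory goes to pointers. The Python sketch below is an illustration under stated assumptions, not the paper's compressed-pointer scheme: it counts the states and edges of the trie built from a set of rule strings and compares a dense layout (one pointer for each of the 256 possible input bytes in every state) with a sparse list holding only the edges that exist. The 4-byte pointer and 1-byte edge label are assumptions, and `trie_stats` and `table_bytes` are hypothetical names. For scale, the reduction reported in the introduction, from 53.1 MB to 1.09 MB, is 1.09 / 53.1 ≈ 2.05% of the original.

```python
from typing import Dict, Iterable, List, Tuple


def trie_stats(patterns: Iterable[bytes]) -> Tuple[int, int]:
    """Return (states, edges) of the trie (goto function) built from the patterns."""
    children: List[Dict[int, int]] = [{}]  # children[s] maps an input byte to a state
    for p in patterns:
        s = 0
        for b in p:
            if b not in children[s]:
                children.append({})
                children[s][b] = len(children) - 1
            s = children[s][b]
    states = len(children)
    edges = sum(len(c) for c in children)
    return states, edges


def table_bytes(states: int, edges: int, ptr_bytes: int = 4) -> Tuple[int, int]:
    """Estimate (dense, sparse) storage in bytes under the assumptions stated above."""
    dense = states * 256 * ptr_bytes  # a pointer slot for every possible next byte
    sparse = edges * (1 + ptr_bytes)  # a 1-byte label plus a pointer per real edge
    return dense, sparse
```

For the four strings `he`, `she`, `his` and `hers`, the trie has 10 states and 9 edges, giving 10,240 bytes dense versus 45 bytes sparse. The sparse form saves memory but turns each transition into a search through a state's edge list, which weakens the per-byte worst-case bound; the introduction's goal of compressing pointers while keeping worst-case guarantees targets exactly this trade-off.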