is important to further examine exactly how the matching is being exercised. To do this analysis we use the freely available and widely used IDS tool, Snort.
1) Snort - An Intrusion Detection System: Snort uses a set of rules that are derived from known attacks or other suspicious behavior. The rules are generated manually by experts who extract relevant (presumably unusual) strings from the payload and header of known attacks. If all the conditions of the rule are met (which include matching the string, its location within the packet, and several other possible conditions) then the action specified by the rule is applied. This action can include logging the packet, alerting a system administrator via email, ignoring the packet, or dynamically activating other rules.

The distribution of Snort includes a set of rules which cover known attacks, such as the exploit that allowed CodeRed to spread or buffer overflows in POP3 servers. Rules are usually added to Snort as new vulnerabilities are discovered. Each of these rules contains a content string, associated rules for its location, and the type of packet it can appear in. To the best of our knowledge, this default rule set is used in most production uses of Snort with minor modifications, as it represents the best practices and knowledge of the Internet community.
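To make the matching conditions concrete, the following is a minimal sketch of how a rule's content string and location constraints might be checked against a packet payload. This is not Snort's actual engine; the `Rule` fields, the `offset`/`depth` semantics, and the action names are simplified assumptions for illustration.

```python
# Simplified sketch of IDS-style rule matching (NOT Snort's real engine).
# A rule pairs a content string with location constraints and an action.

from dataclasses import dataclass
from typing import Optional

@dataclass
class Rule:
    content: bytes               # string extracted from a known attack
    offset: int = 0              # searching begins this many bytes into the payload
    depth: Optional[int] = None  # search only this many bytes past offset (None = to end)
    action: str = "alert"        # e.g. "alert", "log", "ignore"

def match(rule: Rule, payload: bytes) -> bool:
    """Return True if the rule's content appears within its allowed region."""
    end = len(payload) if rule.depth is None else rule.offset + rule.depth
    return payload.find(rule.content, rule.offset, end) != -1

# Hypothetical rules: a CodeRed-style URI fragment and a POP3-style command.
rules = [
    Rule(content=b"/default.ida?", action="alert"),
    Rule(content=b"USER", offset=0, depth=16, action="log"),
]

packet = b"GET /default.ida?NNNN HTTP/1.0"
fired = [r.action for r in rules if match(r, packet)]
print(fired)  # -> ['alert']
```

A real rule also constrains the packet type (protocol, ports, direction) before any payload inspection; the sketch covers only the string and location checks discussed above.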
2) Scalability of the Intrusion Detection System Database:
An Intrusion Detection System (IDS) contains a set of rules with corresponding actions; the set of rules supported by the IDS is called a database. In order to understand the hardware issues behind building an IDS we need to first understand the scalability of the IDS database over time.

We begin by examining the actual data within the strings contained in the rules of a typical IDS. The current standard distribution of Snort comes with over 1500 rules enabled by default. Figure 1 shows a histogram of the number of bytes in the character portion of each unique rule in the default database. Rules can match non-letter characters such as IP addresses; this partially explains the large number of 4-byte rules. As can be seen in Figure 1, the bulk of the rules have a length on the order of 15 bytes, but there is a large distribution above and below. We also see from this figure that there are many rules with very long lengths. This implies that it is beneficial to avoid any string-matching technique which has run time proportional to the length of the rules in the database.

New attacks are being created all of the time, and as they are, new rules are being added to the Snort database to detect or combat them. Figure 2 shows how the size of the Snort rule database has grown over time. We characterize the size of the database in terms of the number of characters and the number of unique strings.
Fig. 1. Distribution of the lengths of the unique strings found in the default Snort database. (Histogram; x-axis: length of string in rule, in bytes, from 0 to 50+; y-axis: number of rules.)
Fig. 2. The growth of the Snort rule database over the last three years. (x-axis: dates from Nov 2000 to Jun 2003; y-axis: increase, relative to Nov 2000, in the number of characters and the number of unique strings.)
Fig. 3. Sizes of string matching data structures for known algorithms and our work. (Total data structure size in MB for the Nov 2000 and Jun 2003 rule sets, comparing Wu-Manber, Aho-Corasick, SFKsearch, and the bitmapped and path-compressed structures.)
0-7803-8356-7/04/$20.00 (C) 2004 IEEE. IEEE INFOCOM 2004.