
Improving the Efficiency of Network Intrusion Detection Systems

B. Tech Project Report Submitted in partial fulfillment of the requirements for the degree of Bachelor of Technology

Nakul Aggarwal Roll No: 02005022

under the guidance of Prof. Om Damani & Prof. Krithi Ramamritham

Department of Computer Science and Engineering Indian Institute of Technology Bombay May 3, 2006


BTech Project Approval Sheet
I hereby state that the contents of this work are mine. Any substantially borrowed material (cut-pasted or otherwise), including figures, tables and sketches, has been duly acknowledged.

Nakul Aggarwal (Roll no: 02005022) Date :

I hereby give my approval for the B.Tech Project Report titled “Improving The Efficiency of Intrusion Detection Systems” by Nakul Aggarwal (02005022) to be submitted.

Prof. Om Damani Prof. Krithi Ramamritham Date :


Acknowledgments

I would like to express my sincere gratitude towards my guides, Prof. Om Damani and Prof. Krithi Ramamritham, for their invaluable and consistent support and guidance. They have been generous enough to let me pursue the work of my interest.

Nakul Aggarwal, May, 2006 IIT Bombay.


Abstract

Network intrusion detection systems have become standard components in security infrastructures. The elements central to intrusion detection are: the resources to be protected in a target network, i.e., computer systems, file systems, network information, etc.; models that characterize the normal or legitimate behavior of the network; and techniques that compare the actual network activities with the established models and identify those that are abnormal or intrusive. There are two approaches to combating intrusion, depending on whether we have prior information about the attacks. In one, we use knowledge of earlier intrusions to decide whether new flows are intrusive in nature. In the other, we first learn the normal behavior of a network and then classify new flows as normal or intrusive. Here we look at some of the approaches, the algorithms, and the issues still unsolved. We then examine the issue of evading IDSs by overflowing their network buffers with out-of-order packets, and propose a solution. Also, implementing inline and adaptive clustering mechanisms for anomaly detection at high traffic rates has been a limitation of anomaly detection approaches. ADWICE was the first effort in this field, but since it uses a distance-based clustering mechanism it suffers from inefficient clustering. We propose additional density-based statistical variables with each cluster so as to improve the efficiency.


Contents

1 Introduction
2 Misuse Detection
  2.1 Approaches to Misuse Detection
3 Algorithms in Misuse Detection
  3.1 Boyer Moore
  3.2 Knuth-Morris-Pratt
  3.3 Aho Corasick
  3.4 Bloom Filters
  3.5 NFA/DFA at hardware level
4 Snort
  4.1 Introduction
  4.2 Snort Rules
  4.3 Architecture of String Matching
  4.4 Working Model of the code
  4.5 Some More about Snort Powers
    4.5.1 Preprocessors
    4.5.2 Inline Mode
  4.6 Multi-Pattern String matching Algorithms in Snort
    4.6.1 Boyer-Moore Multi-pattern String Matching
    4.6.2 Wu-Manber Multi-pattern String Matching
    4.6.3 Aho-Corasick Multi-pattern Matching
    4.6.4 Aho/Corasick with Sparse Matrix Implementation
    4.6.5 SFKSearch using Tries
5 Bro
6 Issues with Pattern Matching
7 Issue of Out of Order packets
  7.1 Solution
  7.2 Example
  7.3 Limitations
8 Anomaly Detection
  8.1 Approaches to Anomaly Detection
9 Clustering Algorithms for Anomaly Detection
  9.1 BIRCH - Balanced Iterative Reducing and Clustering
  9.2 DBSCAN - Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise
10 ADWICE-TRAD
11 Conclusion and Future Work
  11.1 Future Work

Chapter 1

Introduction

There has been a significant rise in the number of network attacks: hacking into systems using something as simple as buffer overflows, new worms bringing whole networks down, attacks on web servers via exploitation of software bugs, or DoS attacks. Tens of vulnerabilities in various softwares are exposed each day on security-related lists and newsgroups such as the Bugtraq mailing lists. Because of the increasing amount of personal information at stake in networks and the ever-expanding internet/intranet threats, there is much work going on in combating these attacks.

Intrusion Detection is primarily concerned with the detection of illegal activities and acquisitions of privileges that cannot be detected with information flow and access control modules. Intrusion detection can be of two types: Pattern Matching or Anomaly Detection. Pattern matching is a method where the system inspects network traffic for matches against exact, precisely described patterns. Anomaly Detection learns the normal network traffic and then detects network intrusions by classifying the real traffic as being normal or anomalous.

Signature Matching is the core of malicious traffic/event detection engines, independent of the level at which it is implemented in the network, i.e., at the network level (NIDS) or at the host level (HIDS), and of whether it is deployed at the network perimeter (typically known as the Demilitarized Zone, or DMZ) or inside it. Implementations exist at both levels, as software products, hardware chips or pattern matching engines. Some of the most popular software NIDS include Snort, Bro, Dragon IDS etc.; Snort is one of the most widely deployed IDS tools. Signature matching engines at the hardware level implement the signatures with the help of LookUp Tables (LUTs), TCAMs, or NFA/DFAs, and pattern matching is done in the router itself for each packet while maintaining the session flow information per IP.

Statistics say that signature matching is the most computationally intensive part of an IDS. In Snort, up to 70% of the total execution time goes into this process, which clearly reflects the vast amount of work that still needs to be done. Studies on empirical data indicate that the number of signatures (each of which represents one or another unique malicious activity) has grown around 2.5 times in the last 3 years [20]. The increasing network utilization and the weekly increase in the number of critical application-layer exploits imply that IDS designers must find ways to speed up their attack analysis techniques when monitoring a fully saturated network, with fewer false positives. Also, beyond pattern matching, stateful pattern matching brings issues of out-of-memory conditions and excessive CPU usage; therefore much work still needs to be done in this field. Pattern matching for network security and

Intrusion Detection demands exceptionally high performance. Pattern matching started with the use of the most common string matching algorithms like Boyer-Moore, Knuth-Morris-Pratt (KMP), Aho/Corasick etc. Over time, researchers have designed new and efficient algorithms, including improvements over these existing approaches: hash table mapping for each pattern over Boyer-Moore, given by Wu-Manber; sparse-vector implementations over Aho-Corasick; and tries and suffix trees (a kind of linked-list implementation of state-machine matching). People have adopted various techniques for string matching over the years, and this technology is still evolving, with new optimizations and heuristics every day. But here the need is for fast multiple-pattern string matching. String matching is of high interest in the theoretical aspect also.

Anomaly Detection is currently in its infancy as far as real-world implementations are concerned, but it is more powerful than pattern matching because of its capability of identifying new attacks. Also, less human effort is involved once it is set up and running, compared to the former approach, where one constantly needs to update the signature database. While there are a lot of limitations in this approach, listed in Chapter 8, the lack of an efficient and fast clustering mechanism has been one of the most important. In ADWICE[4], an adaptive anomaly detection system, the authors implemented a scalable and efficient anomaly detection system which uses a clustering algorithm, namely BIRCH, with some modifications. We have proposed a fix in the BIRCH algorithm which will make clustering more robust and efficient, especially in intrusion detection applications. Also, we have looked at the issue of "out of order" packets in NIDS.

The rest of the report is arranged as follows.

• In Chapter 2, we begin with a general introduction to Misuse Detection and its approaches.
• In Chapter 3, we will see some of the most common algorithms of pattern matching.
• In Chapter 4, we will be explaining the design and architecture of the most widely used intrusion detection system, Snort.
• In Chapter 5, Bro will be looked at.
• In Chapter 6, we will list the issues with most of the detection systems.
• In Chapter 7, we look at the issue of out-of-order packets, a proposed solution, and its proof.
• In Chapter 8, we will go into Anomaly Detection Systems, starting with a general overview.
• Chapter 9 follows with a discussion of clustering algorithms for large-scale data and their analysis for implementation in anomaly detection systems.
• In Chapter 10, we propose a fix for ADWICE.
• We conclude with Chapter 11.

Chapter 2

Misuse Detection

Misuse detection aims to detect well-known attacks, as well as slight variations of them, by characterizing the rules that govern these attacks. Systems based on this approach use different models like state transition analysis, association rules, or a more formal pattern classification. By its nature, misuse detection tends to have a low number of false positives, but it is unable to detect attacks that lie beyond its knowledge.

2.1 Approaches to Misuse Detection

Misuse detection approaches can be classified into the following categories:

• Signature Analysis
• Association Rules
• State Transition Analysis
• Data Mining Approaches

Misuse detection systems have knowledge of both the normal and the anomalous data, and new flows are classified into one of the two categories depending upon which of the above-mentioned approaches is used. The anomalous data is represented by the signatures; all data matching no such signature is considered to be normal.

Signature Analysis or Pattern Matching is the technique of matching the data against a set of predefined rules or signatures with any of the pattern matching algorithms, which will be discussed in Chapter 3. Some examples being:

1. IP packets that exceed the maximum legal length (65535 octets)
2. /User-Agent\:[^\n]+ PassSickle/i (an example signature for capturing the packets containing the trojan horse PassSickle)

Association Rules or Expert Systems define the intrusions as a set of rules and corresponding actions, which are fired whenever a match with some rule takes place.
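
As a concrete illustration, the PassSickle signature above is an ordinary case-insensitive regular expression. The sketch below uses Python's re module rather than Snort's actual matching engine, and the payloads are made up for illustration:

```python
import re

# The example signature from above: a case-insensitive match on the
# User-Agent header carried by the PassSickle trojan.
passsickle = re.compile(r"User-Agent\:[^\n]+ PassSickle", re.IGNORECASE)

def is_malicious(payload: str) -> bool:
    """Return True if the payload matches the signature."""
    return passsickle.search(payload) is not None

# Hypothetical payloads, for illustration only.
print(is_malicious("GET / HTTP/1.0\nUser-Agent: evil passsickle\n"))  # True
print(is_malicious("GET / HTTP/1.0\nUser-Agent: Mozilla/5.0\n"))      # False
```

A real NIDS evaluates thousands of such content patterns per packet, which is why the multi-pattern algorithms of Chapter 3 matter.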

State Transition Analysis: here the known intrusions are defined as definite finite state machines with some end nodes. Each matching of some signature triggers the correlation engine, which makes a transition on the state machine; every event takes you to the next state depending upon the transitional input, and reaching an end node signals an intrusion. Bro (refer to Chapter 5) is an example of this kind of approach.

Data Mining Approaches use statistical classification techniques like Naive Bayes, Decision Trees, Neural Networks, genetic algorithms etc. Here some learning data, in which flows are pre-labelled as normal and/or anomalous, is fed into the machine initially to build up models, and these learned models are then used to classify new events/flows as normal or anomalous. Being statistical, this requires some data to build up the models against which new data is matched.

While misuse detection is the most widely deployed mechanism for NIDS, it suffers from the following flaws, which have led to the search for more efficient techniques for intrusion detection. Since it uses a pre-defined set of signatures, it is not able to detect new threats/intrusions, and over the last few years networks have seen a large variety of intrusions. Some of the limitations being:

1. Updating of signatures. These systems need to be regularly updated with the newest rule-set from the respective sites to combat each day's new attacks.
2. Latency in development. A signature has to be developed, typically within hours, whenever a new vulnerability, exploit or worm is detected.
3. Signature obfuscation. Given a signature "blaster", the NIDS can be easily evaded by the malicious packet[s] if it contains the string "mlaster" etc. Here an attack eludes the NIDS by exploiting the fact that the signature doesn't cover all the attack instances.
4. A large signature set, leading to a large number of false positives and requiring large human effort in optimizing the signature set as per one's network needs and requirements.
5. Evading the signature matching rule set by inserting an additional packet, with an arbitrary string and a low TTL, in between the packets which contain the main string. This substring prevents the matching engine from matching, but the end host still gets affected, since it never receives the additional packet with the low TTL value.
6. Other IDS evasion and insertion techniques. There is a large set of these, which has been thoroughly discussed in [12].
7. High manual involvement lifelong: from installation onwards, regularly updating the signatures, optimizing the rule set, checking the alerts and hence the intrusions.

Association rules suffer from all of the above, with the additional overhead of the clumsiness which comes in as the number of attributes to match keeps on increasing.

Hence, rather than looking for a superset of misuse detection able to detect every intrusion, people looked at removing these limitations, which gave rise to anomaly detection techniques, which are able to detect new intrusions and do not suffer from large-signature-set issues (since they do not use any signature set).
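
The low-TTL insertion trick described above can be illustrated with a toy simulation. The hop counts and payloads are made up, and a naive monitor is modeled as simply concatenating every payload it sees; a real TCP stack and IDS are of course far more involved:

```python
SIGNATURE = "blaster"

# (payload, ttl): the junk packet's TTL is chosen so that it reaches the
# nearby IDS but expires before the distant end host (hop counts are made up).
packets = [("bla", 64), ("XXXX", 5), ("ster", 64)]
HOPS_TO_IDS, HOPS_TO_HOST = 2, 10

def stream_seen_at(hops, pkts):
    """Naively reassemble every payload that survives `hops` router hops."""
    return "".join(payload for payload, ttl in pkts if ttl > hops)

ids_stream = stream_seen_at(HOPS_TO_IDS, packets)    # "blaXXXXster"
host_stream = stream_seen_at(HOPS_TO_HOST, packets)  # "blaster"
print(SIGNATURE in ids_stream)   # False: the monitor never sees "blaster"
print(SIGNATURE in host_stream)  # True: the host reassembles the attack
```

The monitor and the end host reconstruct different byte streams from the same wire traffic, which is exactly the gap the evasion exploits.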

Signatures can match on flags, IP addresses, ports etc., as well as on content in the payload (which is present in most of the signatures). Misuse detection via signature matching is the most widely accepted approach because of the large research base, which provides a constant and updated flow of signature sets. One of the most widely deployed tools for NIDS is Snort, which also uses this approach. A detailed study of the Snort architecture, techniques, algorithms and code is given in Chapter 4. Pattern matching matches the input flow against a given set of signatures; for this, string matching algorithms such as Boyer-Moore etc. are deployed as part of these NIDS. Let us look at some of the common pattern matching algorithms.

Chapter 3

Algorithms in Misuse Detection

Here, we will be discussing some of the basic must-know algorithms of string or pattern matching. These include:

1. Simple string matching
• Boyer-Moore
• Knuth-Morris-Pratt (KMP)
2. State Machine Matching
• Aho/Corasick
3. Hardware Solutions
• Bloom Filters and Extended Bloom Filters
• NFA/DFA implementation at hardware level

3.1 Boyer Moore

Main features:
• performs the comparisons from right to left.
• preprocessing phase in O(m + σ) time and space complexity, where σ is the character set size of the pattern.
• searching phase in O(mn) time complexity in the worst case (for example, searching for a^m in a^n).
• O(n/m) best-case performance (for example, searching for a^(m-1)b in b^n).
• 3n text character comparisons in the worst case when searching for a non-periodic pattern, and n in the average case.

The Boyer-Moore algorithm[5] uses two different heuristics for determining the maximum possible shift distance in case of a mismatch: the "bad character" and the "good suffix" heuristics. The first heuristic, referred to as the bad character heuristic, works as follows: if the search pattern contains a mismatching character (that is, one different from the corresponding character in the given text), the pattern is shifted so that the mismatching character is aligned with the rightmost position at which it appears inside the pattern. The second heuristic, the good suffix heuristic, works as follows: if a mismatch is found in the middle of the pattern, the search pattern is shifted to the next occurrence of the matched suffix in the pattern.

Both heuristics can lead to a shift distance of m. For the bad character heuristic this is the case if the first comparison causes a mismatch and the corresponding text symbol does not occur in the pattern at all. For the good suffix heuristic this is the case if only the first comparison was a match, but that symbol does not occur elsewhere in the pattern. With the help of the preprocessed "bad character" and "good suffix" values, one finds the value of the shift needed as the maximum of these two. The preprocessing for the good suffix heuristic is rather difficult to understand and to implement. Therefore, some versions of the Boyer-Moore algorithm are found in which the good suffix heuristic is left away. The argument is that the bad character heuristic would be sufficient and the good suffix heuristic would not save many comparisons. However, this is not true for small alphabet sets.

3.2 Knuth-Morris-Pratt

Main features:
• performs the comparisons from left to right.
• preprocessing phase in O(m) space and time complexity.
• searching phase in O(n + m) time complexity (independent of the alphabet size).

It pre-calculates an auxiliary function π (of dimension m) which contains the information about jumping from the current state to the next state; π holds the optimum shifts needed in the case of a mismatch. The optimum shift depends on the prefix of the pattern which is also a suffix of the matched part of the pattern.

3.3 Aho Corasick

Main features:
• performs the comparisons from left to right.
• searching phase in O(n) time complexity.
• preprocessing phase has O(m|Σ|) space requirements, where |Σ| is the alphabet set size.

This was a significant improvement in memory requirements over finite state machine based string matching. The Aho/Corasick String Matching Automaton for a given finite set P of patterns is a (deterministic) finite automaton G accepting the set of all words containing a word of P. G consists of the following components:

1. a finite set Q of states
2. a finite alphabet Σ
3. a transition function δ : Q × Σ → Q + fail
4. an initial state q0 in Q
5. a set F of final states

The transition table is built during the preprocessing part: at each state, the transition function tells which state to jump to for each character in Σ. During the string matching process, the engine just traverses the string to be matched, making transitions via δ. Whenever we reach a state in F, a match is reported by the engine.
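
The Aho/Corasick automaton described above can be sketched in a few lines of Python. This is a simplified illustration (not Snort's implementation): the goto table forms the keyword trie, failure links are computed by breadth-first search, and matching then makes exactly one transition per input character:

```python
from collections import deque

def build_automaton(patterns):
    """Build the goto, fail and output tables for a set of patterns."""
    goto, fail, out = [{}], [0], [set()]
    for pat in patterns:                     # phase 1: keyword trie
        state = 0
        for ch in pat:
            if ch not in goto[state]:
                goto.append({}); fail.append(0); out.append(set())
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].add(pat)
    queue = deque(goto[0].values())          # phase 2: failure links (BFS)
    while queue:
        s = queue.popleft()
        for ch, t in goto[s].items():
            queue.append(t)
            f = fail[s]
            while f and ch not in goto[f]:   # follow failure links upward
                f = fail[f]
            fail[t] = goto[f].get(ch, 0) if goto[f].get(ch, 0) != t else 0
            out[t] |= out[fail[t]]           # inherit matches ending here
    return goto, fail, out

def search(text, patterns):
    """Report (end_index, pattern) for every match: one transition per char."""
    goto, fail, out = build_automaton(patterns)
    state, hits = 0, []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        hits.extend((i, p) for p in out[state])
    return hits
```

For example, search("ushers", ["he", "she", "his", "hers"]) reports "she" and "he" ending at index 3 and "hers" ending at index 5, after scanning the text exactly once.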

3.4 Bloom Filters

A Bloom filter is a space-efficient randomized data structure used for concisely representing a set in order to support approximate membership queries. Given a string 'X', the Bloom filter computes 'k' hash functions on it, producing 'k' hash values ranging from 1 to 'm'. It then sets 'k' bits in an 'm'-bit long vector at the addresses corresponding to the 'k' hash values. This process is called programming of the filter, and the same procedure is repeated for all the members of the set. The query process is similar to programming: a string whose membership is to be verified is input to the filter, which generates 'k' hash values using the same hash functions it used for programming. The bits in the 'm'-bit long vector at the locations corresponding to the 'k' hash values are looked up. If at least one of these 'k' bits is found not set, the string is declared to be a non-member of the set. If all the bits are found to be set, the string is said to belong to the set with a certain probability.

The space efficiency is achieved at the cost of a small probability of false positives, where an item is accepted while it does not actually belong to the set. This means that a Bloom filter could wrongly accept some entry even if it does not belong to the set under consideration. However, wise selection of the filter's parameters can guarantee a small false positive probability. For simple string matching cases it does not perform very well, but when there are multiple patterns, or when pattern matching is done at the regular expression level, it is one of the best options. Lately, there have been many improvements in this technology as well, with modifications leading to Counting Bloom Filters, Compressed Bloom Filters, Bloomier Filters etc.

3.5 NFA/DFA at hardware level

Sidhu and Prasanna in [18] were the first to implement NFA matching on programmable logic, in O(n^2) logic while still providing O(1) access time. For fitting the existing patterns into logic, first a DFA is formed and then an NFA, and in this way they are able to map the patterns to FPGAs. They implemented a One-Hot Encoding (OHE) scheme, where one flip-flop is associated with each state and at any time only one is active, with LUTs used for comparing the input character. Each transition is mapped onto this flip-flop structure: combinational logic associated with each flip-flop ensures that the 1-bit is transferred to the flip-flop corresponding to the next state. The transitions of the NFA are taken care of by providing the same input to the next state also.

The reported times are impressive: for an 11MB file, the string matching time had a reported CPU time and maximum memory usage of 0.34sec and 580KB respectively, while the same matching, when done by a DFA matching engine in software, reportedly took 87309.38sec and 229MB respectively. There have been a lot of modifications and advancements in this approach as well after this initial effort.
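
The programming and query operations described above can be sketched as follows. This is a toy illustration, deriving k indices by salting a cryptographic hash from Python's hashlib; hardware implementations use much cheaper hash functions:

```python
import hashlib

class BloomFilter:
    """An m-bit vector with k hash functions: false positives are possible,
    false negatives are not."""

    def __init__(self, m, k):
        self.m, self.k = m, k
        self.bits = [False] * m

    def _hashes(self, item):
        # Derive k indices in [0, m) by salting a hash with the index i.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(digest, 16) % self.m

    def add(self, item):
        """Program the filter: set the k bits for this item."""
        for idx in self._hashes(item):
            self.bits[idx] = True

    def __contains__(self, item):
        """Approximate membership query: all k bits must be set."""
        return all(self.bits[idx] for idx in self._hashes(item))

# Illustrative signature fragments (made up for this sketch).
bf = BloomFilter(m=1024, k=4)
for sig in ["attack-signature-1", "attack-signature-2"]:
    bf.add(sig)
print("attack-signature-1" in bf)  # True: members are always found
```

A member always answers True; a non-member usually answers False, but can collide with the set bits of other members, which is precisely the false-positive case discussed above.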

Chapter 4

Snort

4.1 Introduction

Snort can perform real-time packet logging, protocol analysis, and content searching/matching. It can be used to detect a variety of attacks and probes, such as stealth port scans, CGI-based attacks, Address Resolution Protocol (ARP) spoofing, and attacks on daemons with known weaknesses. Snort utilizes descriptive rules to determine what traffic it should monitor, and a modularly designed detection engine to pinpoint attacks in real time. When an attack is identified, Snort can take a variety of actions to alert the systems administrator to the threat.

In its first releases, Snort used brute-force matching, which was very slow. The first boost to signature matching came with the implementation of the Boyer-Moore pattern matching algorithm. Things have come a long way after these initial implementations, as we will see in the next section.

4.2 Snort Rules

A sample Snort rule can be written as:

alert udp $EXTERNAL_NET any -> $HOME_NET 177 (msg:"MISC xdmcp query"; content:"|00 01 00 03 00 01 00|"; reference:arachnids,476; classtype:attempted-recon; sid:517; rev:1;)

This rule is broken down into 2 parts: the rule header (everything up to the first parenthesis) and the rule options (everything within the parentheses). Rule headers form the RTNs (Rule Tree Nodes) in the Snort matching architecture, while rule options form the OTNs (Optional Tree Nodes).

4.3 Architecture of String Matching

Snort contains a RuleList global variable which has four RTN head nodes, corresponding to each of the four protocols: TCP, ICMP, IP and UDP. These head nodes are the head nodes of the RTN linked lists, and each rule in the rules file is added to the respective list. Since many of the rules contain the same rule header, each RTN node contains a pointer to the head node of the OTN linked list which contains all the rules with the same

RTN header. Each OTN node further contains some other flags that need to be matched (like "the ACK flag should be set" etc.), and these checks are performed before the string matching, to avoid unnecessary pattern matching in cases where even the flags do not match. When the flags also match, the engine calls the stored function pointer to do the other necessary checks (if any) and the string matching, using one of the string matching algorithms. This RuleList is built up during the initialization of the engine.

But lately the number of rules in the Snort rule DB has exceeded even the 3000 mark, such that the above-mentioned 3-dimensional linked lists [RTN, OTN, function pointers] are not able to work at high speed in the network. Therefore, one more optimization has been done: a fast packet classification engine adding a 4th dimension to the above structure. This fourth dimension is a "port"-based classification of rules, and it is done before the RTN lists are created. The authors have assumed that, given the port values (of both source and destination), we can drop each rule into one of the following classes:

1. Unique Source Port
2. Unique Destination Port
3. Unique Source and Destination Port
4. Generic (source and destination port can take "any" value)

Now each structure has a linked-list array of MAX PORT size (64*1024). This allows O(1) mapping of a rule, on the basis of the port value, to its appropriate list. This additional dimension speeds up the process of string matching, since the number of rules to be matched against the incoming traffic is now greatly reduced.

By default, the Wu-Manber string matching algorithm is used. But Snort also contains implementations of a large number of other pattern matching engines, including the Modified Wu-Manber Style Multi-Pattern Matcher, Aho/Corasick, optimizations on Aho/Corasick such as the Sparse Matrix implementation, and the SFK matching engine. We will be discussing some of these algorithms in a later section.

4.4 Working Model of the code

Snort's architecture is focused on performance, simplicity, and flexibility, keeping the amount of per-packet processing to the minimum required to achieve the base program functionality. There are three primary subsystems that make up Snort: the packet decoder, the detection engine, and the logging and alerting subsystem. These subsystems ride on top of the libpcap promiscuous packet sniffing library, which provides a portable packet sniffing and filtering capability. Program configuration, rules parsing, and data structure generation take place before the sniffer section is initialized.
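
The port-based classification described in Section 4.3 can be sketched roughly as follows. This is a simplified Python model of the idea, not Snort's actual data structures: the rule fields are illustrative, and the "unique source and destination port" class is folded into the destination bucket for brevity:

```python
MAX_PORT = 64 * 1024

class PortRuleTable:
    """Bucket rules by port so that per-packet lookup is O(1):
    only rules whose ports can match the packet are ever inspected."""

    def __init__(self):
        self.by_src = [[] for _ in range(MAX_PORT)]  # unique source port
        self.by_dst = [[] for _ in range(MAX_PORT)]  # unique destination port
        self.generic = []                            # src and dst are "any"

    def add_rule(self, rule):
        src, dst = rule["src_port"], rule["dst_port"]
        if dst != "any":
            self.by_dst[dst].append(rule)
        elif src != "any":
            self.by_src[src].append(rule)
        else:
            self.generic.append(rule)

    def candidates(self, src_port, dst_port):
        # O(1) bucket lookup; string matching then runs only on these rules.
        return self.by_dst[dst_port] + self.by_src[src_port] + self.generic

# Hypothetical rules, for illustration only.
table = PortRuleTable()
web_rule = {"src_port": "any", "dst_port": 80, "msg": "example web rule"}
any_rule = {"src_port": "any", "dst_port": "any", "msg": "example generic rule"}
table.add_rule(web_rule)
table.add_rule(any_rule)
print(len(table.candidates(31337, 80)))  # 2: the web rule plus the generic one
```

With thousands of rules, most buckets stay small, so the expensive content matching runs against a fraction of the rule set for any given packet.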

4.5 Some More about Snort Powers

4.5.1 Preprocessors

With the arrival of the term "Anomaly Detection", there was high demand for it in Snort also. Since rule-based matching is done in the detection engine, the authors thought of adding this functionality as modular "plug-ins", something similar to modules in the Linux kernel, which can be deactivated whenever not needed or when they are affecting Snort's performance. Hence, protocol anomaly detection and many other functionalities which are independent of rules come under this category.

Preprocessors, introduced since version 1.5, are pluggable components of Snort. They are "located" just after the module of protocol analysis and before the detection engine, and do not depend on rules; they are called whenever a packet arrives, but just once. The detection plugins, on the other hand, do depend on rules and may be applied many times to a single packet. SPPs (Snort Preprocessors) can be used in different ways: they can look for a specific behavior (portscan, telnet decode etc.), or just collect certain information to support further analysis, like the flow and flowportscan preprocessors. Hola Anonimo has given a very basic tutorial [1] on how to write a preprocessor plugin for Snort. Each added preprocessor, however, demands more processing time, affecting the main strength of Snort, i.e., fast rule-based matching.

Some of the major achievements or goals of preprocessors were:

• to decrease the number of false positives,
• adding anomaly detection techniques to Snort, and, last but not least,
• improving the pattern matching when a pattern extends over multiple segments or fragments.

Now we will look at the last two of the above-mentioned achievements.

Anomaly Detection. Anomaly detection preprocessors include both protocol anomaly detection (via protocol-specific preprocessors like arpspoof, perfmonitor, telnet decode etc.) and the more advanced statistical approaches to anomaly detection via the Spade plugin (a brief description is given in Appendix A).

Pattern Matching over Multiple Packets. This is achieved through the Stream4 and Frag2 preprocessors. The former adds TCP statefulness and session reassembly, so that connection status and information can be stored, providing more information on alerts, removing unnecessary checks, and also allowing checks on patterns which extend over multiple packets. The latter preprocessor prevents IDS evasion and insertion via fragmented packets [15].

4.5.2 Inline Mode

Inline Mode is an optional argument to Snort which actually increases its processing speed. Snort inline obtains packets from iptables instead of libpcap, and then uses new rule types to help iptables pass or drop packets based on Snort rules. Since at this level it works at the same level as iptables, each packet is first accessed by Snort and then passed to the Linux kernel, hence preventing the significant overheads involved in kernel processing in cases when the packet needs to be dropped.

4.6 Multi-Pattern String Matching Algorithms in Snort

1. Boyer-Moore
2. Wu-Manber
3. Aho-Corasick
4. Aho-Corasick with a sparse-matrix implementation
5. Tries in SFKSearch

4.6.1 Boyer-Moore Multi-pattern String Matching

This is the same algorithm we have already discussed in section 3.1, except that the patterns are quite large in number rather than just one. Matching here is done sequentially for each pattern.

4.6.2 Wu-Manber Multi-pattern String Matching

This is the default string-matching algorithm used in Snort. It improves over Boyer-Moore in two aspects (assuming all patterns are of the same length, with each pattern broken into substrings of the same length b):

• Rather than matching all the patterns individually, it exploits the power of hash functions: blocks of characters are matched by mapping them to unique integral values. During initialization a HASH table is built, and every pattern is categorized into the appropriate table entry. The building of the hash table is quite interesting here, since only a b-length substring of each pattern is considered for the calculation of its hash value.
• A SHIFT table, a b-byte shift table preprocessed during initialization (all possible b-strings over the given alphabet are considered), is used to determine how many characters of the text can be shifted (skipped) as the text is scanned. Since this is preprocessed during initialization, for each new character of the text we have to take just one step. Only when a shift value indicates a matched fragment of a pattern (i.e. value ≤ 0) is actual pattern matching done. An ambiguity lies in the case where the SHIFT table reports a match but there is no entry in the hash table (since the hash depends only on one b-character substring of the pattern); in that case only one character is skipped.
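The SHIFT/HASH construction and scan just described can be sketched as follows. This is a minimal, illustrative Wu-Manber (B = 2, shifts computed over the first m characters of each pattern, where m is the shortest pattern length); Snort's production implementation differs in details.

```python
def build_wm(patterns, B=2):
    """Build the SHIFT and HASH tables for a minimal Wu-Manber matcher."""
    m = min(len(p) for p in patterns)
    default = m - B + 1                      # safe shift for unseen blocks
    shift, table = {}, {}
    for p in patterns:
        for q in range(B - 1, m):            # each B-gram ending at position q
            block = p[q - B + 1:q + 1]
            shift[block] = min(shift.get(block, default), m - 1 - q)
        table.setdefault(p[m - B:m], []).append(p)   # hash on the last block
    return m, default, shift, table

def wm_search(text, patterns, B=2):
    m, default, shift, table = build_wm(patterns, B)
    hits, pos = [], m - 1
    while pos < len(text):
        block = text[pos - B + 1:pos + 1]
        s = shift.get(block, default)
        if s:                                # non-zero shift: skip ahead
            pos += s
            continue
        start = pos - m + 1                  # candidate: verify each pattern fully
        for p in table.get(block, []):
            if text.startswith(p, start):
                hits.append((start, p))
        pos += 1                             # shift of zero: advance one character
    return hits
```

Running `wm_search("ushers", ["hers", "his", "she"])` skips over blocks with non-zero shift values and only verifies candidates when a zero shift lands on a hashed block.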
If the patterns are of length m and k in number, fragments of length b = log(mk) are formed (in practice, however, b = 2 or 3). The scanning operation was also shown to run in O(bN/m) time, where N is the size of the input. The reported matching times are nearly two times faster than GNU grep.

4.6.3 Aho-Corasick Multi-pattern Matching

The Aho-Corasick implementation first forms a single combined DFA for all the patterns, and for each new character of input we have to take just one step. Hence there is no overhead of DFA formation for each pattern, and no traversal of individual patterns (or sets of patterns). The power of the algorithm is that it is unaffected by the variance in the sizes of the patterns, and its worst-case and average-case performance are the same. Had a separate automaton been kept per pattern, the state held at each step would be huge, because there would be multiple copies of active DFAs: a new DFA gets activated at each new input character in addition to the existing ones (of course some go out also, but the difference is huge). The combined DFA avoids this. But its memory overheads are huge.
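As a concrete illustration of the combined-automaton idea, here is a minimal textbook-style Aho-Corasick sketch (not Snort's implementation): a goto trie over all patterns at once, failure links computed breadth-first, and a scan that takes exactly one net transition per input character.

```python
from collections import deque

def build_automaton(patterns):
    """Build the combined goto/fail/output automaton for all patterns at once."""
    goto, fail, out = [{}], [0], [set()]
    for pat in patterns:
        state = 0
        for ch in pat:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append(set())
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].add(pat)
    queue = deque(goto[0].values())          # breadth-first failure-link pass
    while queue:
        s = queue.popleft()
        for ch, t in goto[s].items():
            queue.append(t)
            f = fail[s]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[t] = goto[f].get(ch, 0)
            out[t] |= out[fail[t]]           # inherit matches ending at fail state
    return goto, fail, out

def search(text, goto, fail, out):
    """One pass over the text, reporting (start_index, pattern) for each match."""
    state, matches = 0, []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for pat in out[state]:
            matches.append((i - len(pat) + 1, pat))
    return matches
```

On the classic pattern set {"he", "she", "his", "hers"}, scanning "ushers" reports all three overlapping matches in a single left-to-right pass.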

4.6.4 Aho-Corasick with Sparse Matrix Implementation

The enhanced design on Aho-Corasick uses an optimized vector-storage design for the transition table. This memory-efficient variant uses sparse-matrix storage to reduce the memory requirements and further improve performance on large pattern groups. For each DFA state, rather than keeping a 256-entry vector of which most entries are 0, we store only the non-zero transition elements and their corresponding values. For example:

Full row vector: 0 0 0 2 4 0 0 0 6 0 7 0 0 0 0 0 0
Sparse-row storage: 8 4 2 5 4 9 7 11 7

The memory requirements go down by about four times, which is quite significant. Clearly, we can no longer have O(1) transition time in this implementation, since we need to traverse the sparse vector to find the transition element.
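The trade-off just described, smaller rows at the cost of a scan per transition, can be sketched as follows (an illustration of the (index, value)-pair idea only; the actual Snort storage layout also carries format and count words):

```python
def to_sparse(row):
    """Keep only the non-zero transitions of a full next-state row."""
    return [(sym, nxt) for sym, nxt in enumerate(row) if nxt != 0]

def sparse_next(pairs, symbol):
    """O(k) scan of the stored pairs instead of an O(1) array index."""
    for sym, nxt in pairs:
        if sym == symbol:
            return nxt
    return 0        # absent entries all denote a transition to state 0

row = [0, 0, 0, 2, 4, 0, 0, 0, 6, 0, 7, 0, 0, 0, 0, 0, 0]
pairs = to_sparse(row)   # 4 pairs replace a 17-entry (or 256-entry) vector
```

For a state with k non-zero transitions, the row shrinks from 256 entries to 2k, which is where the roughly four-fold memory reduction comes from.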
Some other compact representations have also been discussed by the author, namely the Compressed Sparse Vector format, the Banded-Row format and the CSR matrix format. The author [13] has reported 1.2 to 1.7 times faster performance with the usage of the sparse-matrix version relative to the full-matrix version.

4.6.5 SFKSearch using Tries

The term trie comes from the word "retrieval". A trie is a k-ary position tree, constructed from input strings: the input is a set of n strings S1, S2, ..., Sn, where each Si consists of symbols from a finite alphabet and has a unique terminal symbol $. This algorithm is used in Snort for low-memory situations: while its worst-case performance is quite poor in comparison to Aho-Corasick, its low memory requirements make it an appropriate substitute at times.

The algorithm uses a bad-character shift table to advance through the search text until it encounters a possible start of a match string, at which point it traverses the trie looking for matches. Each level in the trie is a sequential list of sibling nodes, each of which contains a pointer to matching rules, a character that must be matched to traverse to its child node, and a pointer to the (next) sibling node. If there is a match between the character in the current node and the current character in the packet, the algorithm follows the child pointer and increments the packet's character pointer; otherwise, it follows the sibling pointer until it reaches the end of the list, at which point it recognizes that no further matches are possible. In the case that matching fails, the algorithm backtracks to the point at which the match started and considers matches starting from the next character in the packet.
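The sibling-list trie traversal described above can be sketched as follows (a simplified illustration of the node layout only; the bad-character shift table and rule pointers of the real SFKSearch are omitted):

```python
class TrieNode:
    def __init__(self, ch):
        self.ch = ch          # character matched at this node
        self.child = None     # next character of the pattern
        self.sibling = None   # alternative character at this level
        self.pattern = None   # set when a full pattern ends here

def insert(root, pattern):
    node = root
    for ch in pattern:
        cur, prev = node.child, None      # scan the sibling list at this level
        while cur is not None and cur.ch != ch:
            prev, cur = cur, cur.sibling
        if cur is None:                   # character not present: append a node
            cur = TrieNode(ch)
            if prev is None:
                node.child = cur
            else:
                prev.sibling = cur
        node = cur
    node.pattern = pattern

def match_at(root, text, pos):
    """Patterns starting at text[pos], found by walking child/sibling pointers."""
    found, node = [], root.child
    while node is not None and pos < len(text):
        if node.ch == text[pos]:          # match: descend and advance in text
            if node.pattern:
                found.append(node.pattern)
            node, pos = node.child, pos + 1
        else:                             # mismatch: try the next sibling
            node = node.sibling
    return found

root = TrieNode("")
for p in ["he", "hers", "his"]:
    insert(root, p)
```

A failed `match_at` simply returns, after which the caller restarts from the next character of the packet, which is exactly the backtracking behaviour described above.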

Chapter 5

Bro

Bro [2] is a Unix-based network intrusion detection system (IDS). Bro monitors network traffic and detects intrusion attempts based on the traffic's characteristics and content. It detects intrusions by comparing network traffic against rules describing events that are deemed troublesome. These rules might describe activities (e.g., certain hosts connecting to certain services), what activities are worth alerting on (e.g., attempts to a given number of different hosts constituting a "scan"), or signatures describing known attacks or access to known vulnerabilities. If Bro detects something of interest, it can be instructed to either issue a log entry or initiate the execution of an operating system command, rather than only generating an alert.

The main aim of this IDS is to combat two major shortcomings of the Snort engine, namely high false-alarm rates and the string-matching time. For the former, they designed the concept of context-based pattern matching, eliminating a large number of false positives, where additional context is provided by:
1. regular expressions for signatures rather than plain strings; and
2. providing the alert engine a notion of connection state and knowledge.

In their design, for every matched pattern or rule an event is generated and passed to another component, named the policy-script component, which at an abstract level correlates these events to find the possibility of an attack. They have implemented DFA matching as the pattern-matching algorithm, which also provides additional strength to their patterns, since the patterns are now more robust to false positives. But matching a large number of patterns each time is quite intensive, especially with two engines running simultaneously, and the DFA construction has quite large memory requirements. For combating this problem they have used the approach of on-the-fly generation of the DFA, as given in [7], and have also implemented a memory-bounded DFA engine, so that an algorithmic attack on the engine itself does not affect the other engine.

They have compared their approach with Snort and reported some interesting results:
• The reported matching time was quite similar for the without-cache implementation of the Bro engine and Snort.
• The number of alerts and signatures in Bro was much more informative as compared to Snort.
• Because of its context-based matching engine, Bro has an inbuilt capability to fight back TCP reassembly and fragmentation issues.

The important question, then, is why Snort is the most widely used tool. There is no definitive answer available anywhere, but the following arguments are my inferences:
• With the implementation of efficient string-matching algorithms, the running speed of Snort exceeds Bro's by a large margin.
• Snort has a large and regularly updated signature database, which is the most important reason for its usage.
• Bro's memory requirements are quite high, since it uses a DFA matching engine.
• Even though Bro's signatures are more context-specific, without regular updating of signatures and more proper categorization (with the ever-increasing signature set), its performance goes down.

Chapter 6

Issues with Pattern Matching

Other than the decision of which pattern-matching "algorithm" to use, there are a lot of other issues that also need to be considered before choosing any one of them:
• Memory vs speed
• Signature format
• Session-based and application-level signature matching
• State-holding issues in cases of a pattern extending over multiple packets
• Packet fragmentation issues
• Getting packet dumps or testing data sets (other than attack tools and the DARPA set)

Some of the issues with respect to the choice of algorithm, and the limitations of signature matching, are stated below.

Memory vs speed. One always needs to compromise between memory requirements and speed, as we can see in the existing algorithms themselves: Aho-Corasick provides O(1)-per-character pattern matching but requires quite large memory for the storage of the state machine, while other string-matching algorithms such as Boyer-Moore can lead to O(mn) time requirements in cases of algorithmic attacks. One must trade off one against the other depending upon the constraints.

Signature format. Most IDSs, except a few, use byte- or character-based strings as the pattern-representation format, which is also natural given that the most common algorithms used are Boyer-Moore, KMP, etc. But if state-machine matching is being deployed, then a regular expression can provide a better pattern, one which is more informative and more unique to the attack it identifies. Bro, since it uses a DFA matching engine, contains its patterns in regex (regular expression) format itself. Also, most Snort rules contain multiple patterns with different offset and depth values, which can be very well expressed as a single regular expression using basic regular-expression operators like "." and "*".

Of course, fast matching is the natural need driving the decision, but some other issues must be kept in mind, like fighting false positives. For example, in some cases a payload may contain a pattern for a buffer-overflow attack via the telnet application protocol, but what if there was no active telnet session between the two hosts? Then, another issue is what happens if a pattern is split over multiple packets; [19] provides some examples. [19] also discusses stateful packet matching, where the IDS stores information about the context of the traffic between two peers, providing more accurate pattern-matching results.
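The claim that multiple content/offset/depth checks can collapse into a single regular expression can be illustrated as follows. The rule fields here are hypothetical examples, not drawn from the actual Snort rule set:

```python
import re

# Mimics hypothetical rule options:
#   content:"USER"; depth:4;   followed by   content:"admin";
def multi_content(payload):
    """Two sequential content checks with explicit position constraints."""
    return (payload.find(b"USER", 0, 4) != -1        # "USER" within first 4 bytes
            and payload.find(b"admin", 4) != -1)     # "admin" somewhere after it

# The same constraint folded into one regular expression:
# \A anchors at offset 0 (the 4-byte depth for a 4-byte content),
# .*? allows arbitrary payload bytes before the second content.
rule_re = re.compile(rb"\AUSER.*?admin", re.S)
```

Both checks accept `b"USER admin\r\n"` and reject a payload that lacks the anchored first content, which is the sense in which the offset/depth rule options and the single regex express the same constraint.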

State holding. One of the most important issues with IDS systems is state holding, i.e. the amount of information that needs to be stored for each flow passing through the IDS. In the case of pattern matching over individual packets this is not of much concern, since it does not even come into the picture. But with the advent of attacks split over multiple packets, "pattern matching" has become "packet-stream matching", since a pattern now needs to be matched across multiple packets, demanding more memory for storing information about session flows and the packets flowing: the partially matched patterns, classifications as per the statistics, other flow-specific data structures, and so on. Such stateful matching provides better results, but the overheads involved are massive, because the information that needs to be stored is specific to the content of the traffic and can vary from flow to flow. Several questions follow: for how much time does the information need to be stored before being dropped? What is the maximum number of sessions that can be stored? (It should not be the case that the IDS declares a timeout and drops the session information while the destination host still keeps waiting, or vice versa.) Although there is a Snort preprocessor, namely Stream4, as a counter to this issue, these issues exist with that plugin as well. With matching over a packet stream, some new issues come into the picture, like:
• out-of-order arrival of TCP segments;
• re-transmitted or overlapping TCP segments, and hence issues with reassembly;
• missing fragments in between, or losing the state of a connection while the connection is still alive;
• how much data should be buffered (the TCP window); and
• varying TTLs of the fragments for evasion of the NIDS.

Packet fragmentation. Continuing the above discussion, there is the issue of fragmented packets [9], [12], [15]. Snort contains a preprocessor plugin, namely Frag2, for most of these issues, with some assumptions, e.g. if the next few fragments do not arrive within 30 seconds, the fragment is dropped. [14] complicates the situation even more: because different operating systems have unique methods of fragment reassembly, an intrusion detection system that uses a single "one size fits all" reassembly method may not reassemble and process the packets the same way the destination host does. If the NIDS believes a packet was received when in fact it did not reach the end-system, then its model of the end-system's protocol state will be incorrect; if the attacker can find ways to systematically ensure that some packets will be received and some not, the attacker may be able to evade the NIDS. An attack that successfully exploits these differences in fragment reassembly can cause the IDS to miss the malicious traffic and fail to alert. Alternatively, one can specify the end host-system's OS so that OS-specific reassembly is done for that session. While much of this has been solved in existing tools heuristically, over and above such matching one can also provide application-level pattern matching to obtain even better results.

The authors of [17] have examined the character and effects of fragmented IP traffic as monitored on highly aggregated Internet links, showing the amount of fragmented packets in normal Internet traffic and their characterizations. They show that the amount of "fragmented packet" traffic at Internet links is less than 1%, but there are two caveats: first, they are measuring at the Internet level with good connection speeds; and secondly, what if the traffic carries attack-specific fragmentation?
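A common defence, used with variations by tools such as Frag2, is to cap the number of buffered fragments per flow and to expire fragments on a timeout. A minimal sketch, with assumed constants (the cap value here is illustrative; the 30-second timeout mirrors the assumption mentioned above):

```python
import time

MAX_FRAGS_PER_FLOW = 16   # assumed cap, analogous to a Frag2-style per-flow limit
FRAG_TIMEOUT = 30.0       # seconds, matching the 30-second drop rule above

class FragBuffer:
    """Per-flow fragment buffer with a count cap and timeout-based eviction."""

    def __init__(self):
        self.flows = {}   # flow_id -> list of (arrival_time, offset, payload)

    def add(self, flow_id, offset, payload, now=None):
        now = time.monotonic() if now is None else now
        frags = self.flows.setdefault(flow_id, [])
        # evict fragments whose timeout has passed since arrival
        frags[:] = [f for f in frags if now - f[0] < FRAG_TIMEOUT]
        if len(frags) >= MAX_FRAGS_PER_FLOW:
            return False          # cap reached: drop instead of holding more state
        frags.append((now, offset, payload))
        return True
```

Both knobs bound the state an attacker can force the IDS to hold, at the price of the evasion windows discussed above (a fragment dropped by the IDS may still be accepted by the end host).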

Some tools even use bifurcating analysis [12]: if the NIDS does not know which of two possible interpretations the end-system may apply to incoming packets, it splits its analysis context for that connection into multiple threads, one for each possible interpretation, and analyzes each context separately from then onwards. Some other methodologies have also been discussed in the same paper.

Testing data. One of the major issues we have come across is the testing of existing approaches. While the MIT DARPA datasets exist, there are two issues with them: firstly, they contain very few attacks, and secondly, they are from the 1998-99 period, and attack technologies have advanced a lot since then. Even the attack tools are too specific, producing individual attacks rather than generic traffic with attack packets embedded in between. Recently, [16] has designed a new tool for IDS testing, namely AGENT, which, besides producing "pattern strings", also generates other types of traffic, like the ones described in [12]; but then, that too is always synthetic.

Currently, IDSs handle the fragmentation issue by limiting the number of fragments per flow and also setting a timeout value for each fragmented packet, so that a fragment is dropped as soon as the timeout has passed after its arrival. But then one can easily evade the IDS by constant bombardment of never-ending fragments.

Chapter 7

Issue of Out-of-Order Packets

In the last chapter we saw some of the limitations of existing NIDS systems; the handling of out-of-order packets is one of them. In most current implementations of intrusion detection systems, out-of-order packets need to be stored until all the fragments/segments have been received; only then is pattern matching done, and the packet re-assembled and transmitted to the destination. Since this involves temporary storage of the fragments, the logging of packets (and hence the blockage of network buffers) also affects the other modules of the system.

7.1 Solution

We propose a solution such that one need not store an out-of-order packet: as we keep getting fragments, they are pushed to the destination instantly, without storing the fragmented packets. Consider the Aho-Corasick algorithm of pattern matching, which involves making a deterministic finite automaton of the signature set and then traversing it for the incoming traffic payload (we call this simply the DFA). Now consider another DFA; let us call it the RDFA. Define a new signature set, formed by reversing all the signatures of the original signature set; the RDFA is constructed in the same way as the original DFA, just that this new signature set is used. For each input packet payload, do the transitions on the original DFA and, for the reverse of the packet payload, do the transition jumps on the RDFA. For each flow, store pointers to the intermediate states of both of these DFAs (this is stored anyway in the stream-based pattern-matching methodology). We make one assumption: the fragment size should always be greater than the largest signature in the signature set.

How this works. We claim that, using the two DFAs, we can do the matching (under the above assumption).

When the next fragment comes, we move on the respective DFAs from the stored states.

Implications of the assumption. Our assumption, that the fragment payload size is greater than the largest signature size in our signature set, implies that no signature (if it exists in the flow) will extend across more than two fragments. Proof: let the largest signature size be k; then the fragment payload size is ≥ k bytes, and the total payload of two consecutive fragments is ≥ 2k. If a pattern starts matching at some index of fragment n (fragment 0 is used without loss of generality), it extends at most k - 1 bytes past the end of that fragment; since fragment n + 1 itself carries at least k bytes, the pattern ends within fragment n + 1 at the latest. Hence, if a matching occurrence exists that is not contained within a single fragment, it definitely lies in two consecutive fragments.

Now, there are the following possible cases:
• The fragments arrive in order.
• We have seen fragments up to sequence number n, and now some fragment of sequence number n + i (where i > 1) arrives.
• We have seen fragments with sequence numbers n and n + 2, and now the fragment with sequence number n + 1 arrives.

Handling the three cases.
Case 1: Here the matching in the RDFA becomes redundant; normal DFA transition analysis works.
Case 2: Here we keep storing the state pointers into both of the DFAs for each of the flows (since packets are out of order). When we get a packet whose sequence number is the successor of one of the packets already seen, we start making transitions from the stored states; otherwise, we start the transitions from state 0 on both DFAs again.
Case 3: If a possible match for a signature spans the boundary between fragments n and n + 1, then: if fragment n was seen earlier, the DFA transitions will match it; while if fragment n + 1 was seen earlier, the RDFA, moving in the reverse direction, ensures that the matching does take place. In either case a notification is sent to the appropriate action-taking engine.

7.2 Example

Let us consider an example:
Signature set = {"hello", "she"}
Reverse signature set = {"olleh", "ehs"}
Stream flow (payloads of two packets) = {"whatshel", "lomg"}

The DFAs will be as shown in the figures below.

Figure 7.2: DFA and RDFA (respectively) for the above example.

Now, if the first packet comes first, the DFA reports the match for "she" and ends in state 3 (having seen the partial match "hel"), while the RDFA is in state 0. As the second packet arrives, the DFA reports another match as it crosses the 'o' in the payload, and at the end it is back in state 0. Otherwise, if the second packet comes first, the DFA will be in state 0 while the RDFA will be in state 2. Then, as the first packet arrives, the RDFA (run over the reversed payload) reports the matches for both the signatures and ends up in state 0, as does the DFA.

7.3 Limitations

• The assumption is itself a limitation. While [17] shows that fragmented packets are quite rare, so that our assumption will hold in most cases, we still regard it as a limitation.
• Snort with one DFA ends up using around 58 MB of memory for the DFA; with two DFAs this almost doubles. The trade-off of network buffers thus turns into a requirement for more memory. We looked at different ways of optimizing the huge memory requirements of our proposed solution, like merging the two DFAs, keeping two transition tables rather than two DFAs, or using suffix trees, but none of them worked: some are inefficient in terms of matching speed, while some can lead to wrong results.
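The boundary-spanning check that the DFA and RDFA state pointers summarise can be made explicit as follows. This sketch verifies splits directly rather than storing automaton states, so it is an illustration of what the two stored states jointly certify, not the proposed implementation itself:

```python
def spans_boundary(frag_a, frag_b, pattern):
    """True if `pattern` straddles the boundary frag_a | frag_b.

    A split at k means: the DFA, after frag_a, has matched pattern[:k]
    (a suffix of frag_a), and the RDFA, after reversed frag_b, has matched
    the reverse of pattern[k:] (a prefix of frag_b)."""
    for k in range(1, len(pattern)):
        if frag_a.endswith(pattern[:k]) and frag_b.startswith(pattern[k:]):
            return True
    return False
```

On the example above, `spans_boundary("whatshel", "lomg", "hello")` is true via the split "hel" + "lo", while "she" does not straddle the boundary since it lies wholly inside the first fragment.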

Chapter 8

Anomaly Detection

Anomaly detection is a key element of intrusion detection in which perturbations of normal behaviour suggest the presence of intentionally or unintentionally induced attacks, faults, defects, etc. Anomaly detection applied to intrusion detection and computer security has been an active area of research since it was originally proposed in '87. Anomaly-detection approaches build models of normal data and detect deviations from the normal model in observed data. Most anomaly-detection algorithms require a set of purely normal data to train the model, and they implicitly assume that anomalies can be treated as patterns not observed before. Since an outlier may be defined as a data point which is very different from the rest of the data, based on some measure (which can be distance-based or density-based), this field has seen a large number of clustering algorithms from the fields of databases and data mining being employed.

Some of the commonly employed algorithms belonging to this class are Nearest Neighbour (NN), distance to the k-th nearest neighbour, Mahalanobis-distance-based outliers, density-based local outliers (LOF) [3], unsupervised support vector machines, and Balanced Iterative Reducing and Clustering using Hierarchies (BIRCH). [10] has discussed all of these in good detail, with their relative pros and cons and their performance on the DARPA [8] as well as real data sets, and indicated that LOF is the best among all these approaches. Indeed, one of the best-known anomaly-detection tools, MINDS [6], has also used this approach in its NIDS.

Some popularly used anomaly-detection tools/products:
• MINDS (Minnesota Network Intrusion Detection System), using the LOF approach for its learning model;
• ADWICE [4], in collaboration with SafeGuard, using the BIRCH clustering algorithm;
• LANCOPE (a commercial behavioural network-anomaly-detection product); and
• the SPADE plug-in for the open-source IDS Snort, which inspects recorded data for anomalous behaviour based on a computed score.
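As a flavour of the distance-based measures listed above (here, distance to the k-th nearest neighbour), a minimal scoring sketch follows. This is illustrative only; the LOF computation used by MINDS is considerably more involved:

```python
import math

def knn_outlier_score(point, normal_data, k=3):
    """Score = distance to the k-th nearest neighbour in the normal profile.

    Larger scores suggest anomalies. The brute-force scan costs
    O(N log N) per query point, which hints at the scalability
    concerns discussed below."""
    dists = sorted(math.dist(point, q) for q in normal_data)
    return dists[k - 1]

normal = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
```

A point inside the normal cloud, such as (0.5, 0.5), scores far lower than a distant point like (5, 5), which is the basic signal these detectors threshold on.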

8.1 Approaches to Anomaly Detection

Anomaly detection involves two parts, namely building the normal profile of the network and scoring new flows on a scale of 0 to 100 (0 being normal, 100 being anomalous). Here, "normal" is defined by the regular traffic features of the network. Building the normal profile can be done in one of two ways:
1. Using one of the clustering algorithms, like BIRCH: all the learning data points are clustered first, and then, when new data comes, it is tested for the possibility of dropping into one of the clusters, else it is declared an outlier.
2. Using measures such as Local Outlier Factor, nearest neighbours, etc.: rather than clustering the data points, some features or statistics are calculated over each of the points, and when new data arrives it is matched with the nearest data points (where "nearest" is also defined by these measures) and scored as normal or anomalous. These techniques have mainly been deployed in anomaly-detection systems at the host level, where intrusion on a single host is the prime concern, though some have tried to use them at the network level also. Their shortcoming is that they require storing all the data points from the learning data, and when new data points arrive, heavy computations on both the existing data and the new data point are required to get good results.

Some of the limitations of anomaly-detection systems are:
1. Since these are based on statistical analysis, false-positive rates are much higher than with pattern-matching techniques.
2. The algorithms or techniques used are not scalable and fast enough to keep up with the gigabit-network requirements of these days. They are not fast enough because the statistical processing involves heavy computations on each incoming packet, and a large feature set (which implies a high-dimensional data set) makes the computation even more expensive. Efficient data structures are deployed to avoid these computations for the existing data points, but at least for new data points they remain quite heavy.
3. They work best when properly labelled normal data is available, but gathering or capturing the normal data for a network is not always feasible, for a variety of reasons: the topology of the network, the possibility of an attack or scan going on while the "normal" data is being captured, etc.
4. Scalability is an issue, since these systems depend on network-traffic behaviour, and the networks of today are diverse, with differing requirements at times.

We will look at some of the algorithms and their scalability issues in the next chapter.

5. The selection of features for defining network behaviour from the packets is still developing; a proper set which can be said to properly and completely define the network behaviour is still not available. (Most commonly defined features capture the network behaviour from the headers or the flags of the packets.)
6. Application-level exploit detection in network anomaly-detection systems is still in a developing phase (i.e., no product does this); hence any new buffer-overflow, SQL-injection or similar exploit is still undetectable by these systems.
7. Lack of adaptiveness to changing network behaviour.

People do try to provide solutions for these various shortcomings. For the normal-data problem, one can collect data over large periods of time, since networks may have different requirements at different times of day or on different weekdays, and check for scans, attacks, etc., if any, going on during collection. ADWICE has looked at the last flaw and developed a system which is adaptive to network behaviour; its clustering algorithm is, moreover, a very popular scalable clustering algorithm from the databases field. People have also worked on the expensive (in CPU and memory terms) and time-consuming computations of these systems: clustering algorithms have been proposed which work by exploring the "dense" sub-dimensions of the data rather than working on the large data set in all its dimensions, and the results are positive. There are also applications of techniques such as SVD (singular value decomposition) and PCA (principal component analysis), which reduce the dimensionality of the data in such a way that the results do not vary much from those obtained with the complete set of dimensions. And rather than seeking a superset of misuse detection able to detect every intrusion, people have looked at removing the limitations which gave rise to anomaly-detection techniques in the first place: one can use the pattern-matching engine to detect known network attacks, while anomaly detection catches new intrusions and does not suffer from large-signature-set issues (since it does not use any signature set). Event-correlation engines have been developed for correlating the various events/alerts after a threshold is crossed or a rule is found violated.

When looking for a clustering algorithm for anomaly detection, we are looking for one with the following properties:
• The clustering should be unique, which implies that the clusters returned as output should be independent of the order of the input data points.
• The clustering should be as accurate as possible. Accuracy with a distance-based approach comes at the cost of time, while a density-based approach (whose trade-off is large memory requirements) provides much accuracy and can be deployed to cover all cases, i.e. data with all equal-sized clusters, data with some very dense and some sparse clusters, data with some dense and small clusters and some large and sparse ones, and vice versa.
• The clustering algorithm should be adaptive, that is, new data points can be fed into the appropriate clusters and/or clusters can be modified even at testing time.

• It should be able to classify new input points as normal or anomalous efficiently and quickly (keeping in mind gigabit requirements).
• (This is optional.) A memory- and space-efficient clustering algorithm would be helpful for converting the product into an inline one.

Chapter 9

Clustering Algorithms for Anomaly Detection

While a lot of research has been going on in the fields of databases and data mining on large-scale, scalable and efficient clustering algorithms, here in anomaly detection we have the additional requirement of speed, which should be much higher than what database workloads demand. Clustering algorithms can be broadly classified into three classes:
1. Partitioning algorithms (e.g. K-Means). Divisive partitioning approaches try to optimize a function such that the space is divided into k partitions and each point lies in the best possible partition.
2. Hierarchical algorithms, agglomerative or divisive (e.g. BIRCH, DBSCAN, etc.). Divisive hierarchical algorithms group all data points into the same cluster initially and then keep partitioning them as some dense clusters start forming; the agglomerative approach works vice versa.
3. Grid-based algorithms (e.g. CLIQUE), which slice the n-dimensional space into small cells and then form the dense clusters from them (this even helps with dimensionality reduction).

Let us look at some of the clustering algorithms, with their pluses and minuses.

9.1 BIRCH - Balanced Iterative Reducing and Clustering

BIRCH [21] is one of the fastest-running clustering algorithms, with a running time of order O(N). It is divided into four phases, of which the last three are optional and serve only to fine-tune the phase-1 clustering. The following constants are fed to the algorithm:
• T, the threshold on the size of a cluster;
• B, the branching factor of the tree;

To insert a data point, the algorithm traverses the tree to find the appropriate leaf node into which the point can fit, and then looks for the best match among the clusters in that leaf node. The fitting of the data point is defined by a distance-based measure (which can be Manhattan, Euclidean, etc.). If the point can fit into one of the clusters, it is inserted there; else a new cluster is formed. Initially the tree is empty; let us say T = 0. As new data points keep coming, T is increased when needed, so that the cluster sizes are increased and more points can be fitted into each of the clusters. If the formation of a new cluster increases the leaf's child count beyond L, the leaf is split into two leaves with a parent above them, and the clusters are designated to the appropriate leaf nodes. The algorithm iterates just once over all the data points, with an additional log N step each time, making it an O(N log N) procedure; the additional phases clean up some of the errors from phase 1.

Positive points of this algorithm:
• The running time is O(N), which is much better than other algorithms.
• It is memory-efficient (hence it can easily be built into an inline product).
• Because of the efficient tree data structure, classifying new data points is easy.

Negative points of this algorithm:
• The clustering is not unique. This is because it is possible that two small dense clusters are joined to form one cluster if the data points arrive alternately from the two clusters, or in certain orders (assuming T is large enough to encapsulate both clusters).
• It uses distance-based measures for all calculations, which are known to be less accurate when clusters with different densities and sizes exist.
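BIRCH summarises each cluster by sufficient statistics rather than by its raw points; in the original paper [21] this is the clustering feature CF = (N, LS, SS). A minimal sketch of these statistics, showing the O(d) insertion and the radius test against the threshold T (an illustration of the idea, not the CF-tree itself):

```python
import math

class CF:
    """Clustering Feature (N, LS, SS): sufficient statistics of one cluster."""

    def __init__(self, point):
        self.n = 1
        self.ls = list(point)                    # linear sum per dimension
        self.ss = sum(x * x for x in point)      # sum of squared norms

    def add(self, point):
        # CF additivity: absorbing a point is O(d); raw points are never kept
        self.n += 1
        self.ls = [a + b for a, b in zip(self.ls, point)]
        self.ss += sum(x * x for x in point)

    def radius(self):
        """RMS distance of the cluster's points from its centroid."""
        centroid_sq = sum((x / self.n) ** 2 for x in self.ls)
        return math.sqrt(max(self.ss / self.n - centroid_sq, 0.0))

    def fits(self, point, threshold):
        """Would absorbing `point` keep the cluster within the threshold T?"""
        trial = CF(point)
        trial.n, trial.ls, trial.ss = self.n + 1, \
            [a + b for a, b in zip(self.ls, point)], \
            self.ss + sum(x * x for x in point)
        return trial.radius() <= threshold
```

Because a CF is tiny and additive, raising T and re-merging clusters when the memory cap is reached (as described for P below) requires only combining these triples, never re-reading the data.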
at sometime if memory cap i.Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise DBSCAN [11] an O(N ∗ log N ) time clustering algorithm has density as the similarity measure between data points rather than the distance based formulas. • Some data points may be classified to wrong clusters because of the limitations of distance based calculations in measurement. All the clusters are at the leaf nodes of the tree. Also. • L. the clustering results depends upon the order of the data points. it has just two input parameters namely 28 . 9. compared to BIRCH. may lead to large false positives. • Clusters formed are spherical. the maximum number of clusters at each leaf node. i. P is reached. also additional phases cleans up some of the errors from phase 1. The fitting of the data point is defined by distance based measure (which can be manhattan. • Requires large numbers of the input params.e.

Eps . 29 . this step goes into recursion (each of the merged cluster points. within which one should look at for finding its neighbours. checks for their points and hence their cluster possibilities) and using the definition of connectedness. Minpts . where. then clusters are merged based on definition of reachibility. Following are the positive points of this algorithm: • Algorithm is density based statistics for clustering which is far more accurate than distance based. the density reachable and density connected for a pair of clusters. then this cluster is named as a cluster of its own and assigned a new clusterID.which is the measure of distance. • Anomaly detection with new data points won’t be much efficient (since for each of the new input) program has to find the Eps-neighbourhood. • Algorithm results in unique clustering results.which define the number of points which must lie within a Epsneighbourhood of a point for it to be core-point. paper also defines the merging mechanism for the clusters which should form one cluster i. one more loop is ran for each of the points in the neighbourhood of this point. checking is done if they also form their own clusters. this is not true though. One may think here that each of the points which has more than Minpts data points within its neighbourhood would be a separate cluster. While these are the negative points of this algorithm: • It also (as in BIRCH) requires two input parameters from the user. • Is able to detect the clusters of any size and shapes. Since. • It is not capable for differentiating clusters with different sizes and different densities since the Eps is pre-defined and fixed all the time. Now. 2. clusters are kept merging unless no more clusters can be merged.e. if yes. For each of the points. In the step where the neighbourhood is find and clusterID is assigned. first find the Eps-neighbourhood of that point (this step takes O(log N ) time. 
then if there exists more than Minpts data points within this region. using efficient R∗ trees).1.
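The neighbourhood-expansion loop described in this section can be sketched in a few lines. This is a minimal illustration only: the function and variable names (dbscan, region_query, labels) and the convention of -1 for noise are our own, and the linear-scan region query stands in for the R∗-tree index that gives the real algorithm its O(N log N) bound.

```python
import math

def region_query(points, p, eps):
    """Indices of all points within distance eps of points[p].
    A linear scan here; DBSCAN uses an R*-tree to make this O(log N)."""
    return [q for q in range(len(points))
            if math.dist(points[p], points[q]) <= eps]

def dbscan(points, eps, minpts):
    """Label each point with a cluster id, or -1 for noise."""
    labels = [None] * len(points)   # None = not yet visited
    cluster_id = 0
    for p in range(len(points)):
        if labels[p] is not None:
            continue
        neighbours = region_query(points, p, eps)
        if len(neighbours) < minpts:
            labels[p] = -1          # noise (may be re-labelled as a border point later)
            continue
        cluster_id += 1             # p is a core point: start a new cluster
        labels[p] = cluster_id
        seeds = list(neighbours)
        while seeds:                # expand the cluster via density-reachability
            q = seeds.pop()
            if labels[q] == -1:
                labels[q] = cluster_id       # former noise becomes a border point
            if labels[q] is not None:
                continue                     # already assigned; do not expand again
            labels[q] = cluster_id
            if len(region_query(points, q, eps)) >= minpts:
                seeds.extend(region_query(points, q, eps))  # q is also core
    return labels
```

Border points (reachable from a core point but not themselves core) get absorbed into the cluster without triggering further expansion, which is what keeps every dense neighbourhood from becoming its own cluster.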

Chapter 10

ADWICE-TRAD

ADWICE [4] is an adaptive anomaly detection algorithm which uses BIRCH [21] as its clustering algorithm for learning normal data and then classifying new data as anomalous or normal. As already seen in the last section, BIRCH suffers from a number of shortcomings. Here we try to reduce the number of false positives by modifying the threshold calculation and the cluster bounds.

BIRCH uses distance based measures in its clustering algorithm: for a new point to be included in a cluster, its distance from the center of the cluster has to be less than T. Mathematically, define the 'inclusion region' as the spherical region of radius T around the center of the cluster. The original BIRCH algorithm uses the same constant threshold T for every cluster, increased whenever we run out of the given amount of memory so as to merge some clusters and free some memory. Hence the inclusion region is independent of the current density of the cluster and is the same for all clusters.

Fixing the same threshold for all clusters is unfair to many of them. For example, consider a dense cluster with all points near the center of the cluster and threshold T: it can still include some bad points which lie near the boundary of its inclusion region. Instead, the inclusion of a new point in a cluster should depend on the density of the cluster, i.e. on the number of points in the cluster and its current radius. If a cluster is dense, the inclusion region should be small and should depend on the current radius of the cluster rather than on some predefined fixed threshold, while for a sparse cluster the inclusion region should be relatively large. So fixing the same threshold for all clusters is not right; rather, it should depend on cluster properties like the point distribution, the density of the cluster, etc. Hence we propose a density based mechanism for deciding the cluster size and threshold, which we name ADWICE-TRAD. The measurements are made on the basis of two more variables, t' and R', both explained below.

• R' (an additional statistical variable stored with each cluster's Cluster Feature set) is different for each cluster and depends on the current number of points in it and its current radius R(CF_i):

    R'(CF_i) = R(CF_i) * (1 + c / fn(n, d))

  where n = number of points inside this cluster, d = dimension of the data points, c = some constant (this can be kept fairly small), and fn(n, d) = some function of n and d; fn can be log_d(n) or just log(n). In words, R' is the current radius plus the current radius multiplied by a constant and divided by some function of n. So the threshold requirement should be R(CF_i) <= R'(CF_i).

• But using the above expression alone, clustering will suffer in the case of one or very few points in the cluster, since R' is automatically calculated from the dataset points. Hence we define t' as the threshold for handling the base cases, and the threshold requirement becomes R(CF_i) <= max(R'(CF_i), t').

• Also, for large sparse clusters we want an upper bound on the radius of the cluster, so as to prevent explosion by some of the clusters. So the threshold requirement in ADWICE-TRAD would be R(CF_i) <= min(max(R'(CF_i), t'), T).

ADWICE-TRAD thus requires one additional variable, t', as input to the clustering algorithm.
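The three requirements above combine into a single effective threshold per cluster. The following is an illustrative sketch, not part of ADWICE-TRAD proper: the helper names, the default values of c, t' and T, and the choice of log_d(n) for fn(n, d) are all assumptions made for the example.

```python
import math

def fn(n, d):
    # fn(n, d): here we pick log base d of n, guarded for the base
    # cases (n <= 1 or d <= 1) where the logarithm is zero or undefined.
    if n <= 1 or d <= 1:
        return 1.0
    return max(math.log(n, d), 1.0)

def effective_threshold(radius, n, d, c=0.5, t_prime=0.05, T=2.0):
    """Effective cluster threshold min(max(R'(CF_i), t'), T), where
    R'(CF_i) = R(CF_i) * (1 + c / fn(n, d)).

    radius  -- current radius R(CF_i) of the cluster
    n, d    -- number of points in the cluster, data dimension
    c, t_prime, T -- constant, base-case threshold t', global cap
                     (illustrative default values).
    """
    r_prime = radius * (1 + c / fn(n, d))
    return min(max(r_prime, t_prime), T)
```

The behaviour matches the three bullets: a brand-new cluster (radius 0) falls back to t', a dense cluster with many points gets a threshold close to its own radius, and a large sparse cluster is capped at T.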

Chapter 11

Conclusion and Future Work

Both methodologies of intrusion detection, namely misuse and anomaly detection, have been widely researched. Since we have already listed the major issues of both types of systems, it is quite clear that neither can work without the other: a robust intrusion detection system needs both of them. We have looked at two problems. The first is handling out-of-order packets in a NIDS, where we have proposed a solution but are still far away from a workable or optimized model. Second, we worked on decreasing the false positive rate of ADWICE by introducing additional statistical parameters for the clusters, which bring a component of density into the clustering algorithm.

11.1 Future Work

There are a number of avenues for pursuing the work of optimizing the pattern matching and anomaly detection engines:

• Optimizing the proposed solution for handling out-of-order packets.
• Looking for solutions to the other issues of NIDSs.
• Checking the effectiveness of the proposed fix in ADWICE.
• Providing solutions for any of the issues in anomaly detection, or combating the limitations of such systems.

Bibliography

[1] http://afrodita.unicauca.edu.co/∼cbedon/snort/spp_kickstart.html.
[2] http://bro-ids.org.
[3] Markus M. Breunig, Hans-Peter Kriegel, Raymond T. Ng, and Jörg Sander. LOF: identifying density-based local outliers. In SIGMOD '00: Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data, pages 93–104, New York, NY, USA, 2000. ACM Press.
[4] Kalle Burbeck and Simin Nadjm-Tehrani. ADWICE - anomaly detection with real-time incremental clustering. In ICISC, pages 407–424, 2004.
[5] T. Cormen, C. Leiserson, and R. Rivest. Introduction to Algorithms. MIT Press, 1990.
[6] L. Ertoz, E. Eilertson, A. Lazarevic, P. Tan, V. Kumar, J. Srivastava, and P. Dokas. The MINDS - Minnesota Intrusion Detection System. In "Next Generation Data Mining". MIT/AAAI Press, 2004.
[7] J. Heering, P. Klint, and J. Rekers. Incremental generation of lexical scanners. ACM Trans. Program. Lang. Syst., 14(4):490–520, 1992.
[8] DARPA Intrusion Detection Evaluation. http://www.ll.mit.edu/IST/ideval/, 1999.
[9] C. Kent and J. Mogul. Fragmentation considered harmful. WRL Technical Report 87/3, 1987.
[10] Aleksandar Lazarevic, Levent Ertoz, Vipin Kumar, Aysel Ozgur, and Jaideep Srivastava. A comparative study of anomaly detection schemes in network intrusion detection. In SIAM International Conference on Data Mining, 2003.
[11] M. Ester, H.-P. Kriegel, J. Sander, and X. Xu. A density-based algorithm for discovering clusters in large spatial databases with noise. In Proc. 2nd Int. Conf. on Knowledge Discovery and Data Mining (KDD 96). AAAI Press, 1996.
[12] C. Kreibich, M. Handley, and V. Paxson. Network intrusion detection: Evasion, traffic normalization, and end-to-end protocol semantics. In Proc. of the 10th USENIX Security Symposium (Security '01), 2001.
[13] Marc Norton. Optimizing pattern matching for intrusion detection. 2004.
[14] Judy Novak. Target-based fragmentation reassembly. Technical report, April 2005.
[15] Thomas H. Ptacek and Timothy N. Newsham. Insertion, evasion, and denial of service: Eluding network intrusion detection. Secure Networks, Inc., 1201 5th Street S.W., Suite 330, Calgary, Alberta, Canada T2R-0Y6, 1998.
[16] Shai Rubin, Somesh Jha, and Barton P. Miller. Automatic generation and analysis of NIDS attacks. In ACSAC '04: Proceedings of the 20th Annual Computer Security Applications Conference (ACSAC'04), pages 28–38, Washington, DC, USA, 2004. IEEE Computer Society.
[17] C. Shannon, D. Moore, and K. Claffy. Characteristics of fragmented IP traffic on internet links. 2001.
[18] R. Sidhu and V. Prasanna. Fast regular expression matching using FPGAs. 2001.
[19] R. Sommer and V. Paxson. Enhancing byte-level network intrusion detection signatures with context. 2003.
[20] N. Tuck, T. Sherwood, B. Calder, and G. Varghese. Deterministic memory-efficient string matching algorithms for intrusion detection. 2004.
[21] Tian Zhang, Raghu Ramakrishnan, and Miron Livny. BIRCH: an efficient data clustering method for very large databases. In SIGMOD '96, pages 103–114, 1996.
