Regular Expressions

Published by vigyaan
Regular expressions are at the heart of most security solutions. This article describes several architectural considerations when designing a good regular expression system. Read other similar articles on http://www.vigyaan.com


Published by: vigyaan on Mar 25, 2010
Copyright: Attribution Non-commercial


This is a technical brief on regular expressions, which are at the heart of all signature and pattern
matching systems such as Anti-Virus, Anti-Spam, Intrusion Prevention Systems, Data Loss
Prevention, and Web Application Firewalls.

Vatsal Mehta - http://www.vigyaan.com

Regular Expression (RegEx) matching, i.e. the technology by which you match a given set of expressions against a larger set of data, is a technical term used in defining many of today's security technologies, such as Gateway Anti-Virus, Intrusion Detection and Prevention Systems, Data Loss Prevention systems, Unified Threat Management systems, Cross Site Scripting Protection, and URL Filtering.

There is a constant stream of data that is continuously checked for known threats or attacks. While RegEx matching is at the heart of these techniques, more advanced techniques, such as correlating data from multiple sites or analyzing multiple connections, can also be used to prevent or detect attacks.


From a user's perspective, the biggest complaint about a RegEx matching system is false positives. A false positive is like the story of the boy who cried wolf: if your system generates too many alerts, the accurate alerts lose relevance. Another issue is false negatives, where the system misses some attacks. The third issue is signature updates: how long do they take, how frequently are they required, and what does the system do with active sessions while they are applied?

These are all valid topics and we will discuss them in this article after
discussing the basics of RegEx matching.
There are primarily two types of RegEx matching technologies:
a) Deterministic Finite Automata - DFA



b) Non-Deterministic Finite Automata - NFA
We'll go into the details of these in subsequent articles.

For any kind of RegEx matching, the most important ingredient is the rule set: the regular expression rules that the input is matched against. At today's speeds, matching for a single personal computer can be done in software, but for multiple machines behind a gateway it has to be done in hardware.

The signature rule set is then "compiled" into a format that is easily used by the RegEx matching engine. This
might remind you of your anti-virus engine downloading signature updates frequently.
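
As a rough illustration of the compile step, the sketch below uses Python's standard `re` module. The rule names and patterns are made up for this example and are not real signatures, and a production engine would compile into its own internal format rather than into Python pattern objects.

```python
import re

# Hypothetical signatures for illustration only; not real attack patterns.
RULES = {
    "script-tag": r"<script\b",
    "path-traversal": r"\.\./\.\./",
}

def compile_rules(rules):
    """Compile each rule once so that matching does not re-parse the patterns."""
    return {name: re.compile(pattern, re.IGNORECASE) for name, pattern in rules.items()}

def scan(compiled, data):
    """Return the names of all rules that match somewhere in the data."""
    return [name for name, regex in compiled.items() if regex.search(data)]

compiled = compile_rules(RULES)
print(scan(compiled, "GET /../../etc/passwd"))
print(scan(compiled, "<SCRIPT>alert(1)</SCRIPT>"))
```

The point of the split is that `compile_rules` runs once per signature update, while `scan` runs on every piece of traffic.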

These RegEx patterns are specific to the application. For example, a URL Filtering system will be looking for URLs in a stream of data. It may need to understand the different protocols that go through the system, and as soon as it sees a defined protocol, it will start the state transition process.
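
A minimal sketch of the URL filtering idea, again with Python's `re` module. The blocklist, the host names and the `blocked_urls` helper are assumptions made for this example. It only looks at one chunk of bytes at a time, so it does not handle a URL that is split across two chunks, and it does no real protocol parsing.

```python
import re

# Matching starts once a known protocol prefix is seen; the host is captured.
URL_RE = re.compile(rb"https?://([A-Za-z0-9.-]+)", re.IGNORECASE)

# Hypothetical blocklist for the example.
BLOCKED_HOSTS = {b"bad.example"}

def blocked_urls(stream_chunk: bytes):
    """Return the blocked hosts found in one chunk of traffic."""
    hits = []
    for match in URL_RE.finditer(stream_chunk):
        host = match.group(1).lower()
        if host in BLOCKED_HOSTS:
            hits.append(host)
    return hits

chunk = b"GET http://Bad.Example/index.html HTTP/1.1\r\nReferer: https://ok.example/\r\n"
print(blocked_urls(chunk))
```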

For a system that does specific text matching, the rules may become more complex. Remember that to the engine the data is a stream of bits and bytes, and everything is a set of 0s and 1s. So let's say we are interested in looking for the word "cooking" in a set of web pages.

First we need to understand web pages. Web pages have different encoding schemes, and they can be in different languages or in different representations of the same language (Unicode). Last but not least, the text can be case sensitive.

A good RegEx engine will take all this into consideration when it compiles the rule set, so that the rule set doesn't grow exponentially. With this approach it is not expensive to catch CooKing as well as CooKING.
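
To make the "grow exponentially" point concrete, the sketch below counts how many separate rules would be needed if every upper/lower-case spelling of "cooking" were listed explicitly, and compares that with a single case-insensitive rule. It uses Python's `re.IGNORECASE` flag as a stand-in for whatever case folding a real engine performs at compile time.

```python
import itertools
import re

word = "cooking"

# Listing every upper/lower-case spelling doubles the rule count per letter.
variants = {
    "".join(chars)
    for chars in itertools.product(*[(c.lower(), c.upper()) for c in word])
}
print(len(variants))  # 2 ** 7 == 128 spellings

# A single case-insensitive rule covers all of them.
rule = re.compile(word, re.IGNORECASE)
print(all(rule.fullmatch(v) for v in variants))  # True
```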
So the state transitions would be similar to the following:
State 1: c or C
State 2: o or O (if not, go back to State 1)
and so on ...
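
The state transitions above can be sketched as a small hand-written state machine. This is an illustration under simplifying assumptions, not how a production engine is built: states are numbered from 0 rather than 1, the function name `contains_cooking` is made up, and the simple "go back to the start" rule is only correct because the letter "c" appears nowhere else in "cooking". A general pattern needs a proper DFA construction to compute the fallback states.

```python
def contains_cooking(text: str) -> bool:
    """Scan text one character at a time for "cooking", ignoring case."""
    pattern = "cooking"
    state = 0  # number of pattern characters matched so far
    for ch in text:
        lower = ch.lower()
        if lower == pattern[state]:
            state += 1  # advance to the next state
        elif lower == pattern[0]:
            state = 1   # mismatch, but this character can start a new match
        else:
            state = 0   # mismatch: go back to the start state
        if state == len(pattern):
            return True
    return False

print(contains_cooking("I love CooKing"))
print(contains_cooking("COCOOKING"))
print(contains_cooking("cookin"))
```

Each input character is examined exactly once, which is the property that makes this style of matching attractive for a continuous stream of data.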

Now comes the tough part of signature updates.
Let's say you have just discovered a serious new attack and you created a signature for it. You'll need to push
this to the field with the least disruption and maximum protection possible.

What are all the decision factors?
- What processing power is needed to compile the new rule set? This helps decide whether the compilation takes place in the field or at headquarters.
- Can the new signature be an incremental update, or does the whole rule set need to be recompiled? Of course we would all like an incremental update, but that depends on the technology and the trade-offs made in the regular expression engine.
- How are existing sessions handled? Do they need to be dropped? This probably depends on the severity of the vulnerability. Typically a RegEx matching system is a parallel processing engine, and some engines can be turned off to do updates while other engines process the traffic...
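
One possible policy for existing sessions can be sketched in software as follows. The `Engine` and `Session` classes are hypothetical names for this example: the new rule set is compiled off to the side, then swapped in, and sessions that were already open keep the rule set they started with. For a severe vulnerability you might instead drop or re-bind those sessions, which this sketch does not do.

```python
import re
import threading

class Session:
    """A session pinned to the rule set that was active when it opened."""

    def __init__(self, version, rules):
        self.version = version
        self._rules = rules

    def matches(self, data):
        return any(rule.search(data) for rule in self._rules)

class Engine:
    """Holds the active compiled rule set and swaps it on update."""

    def __init__(self, patterns):
        self._lock = threading.Lock()
        self._version = 1
        self._rules = self._compile(patterns)

    @staticmethod
    def _compile(patterns):
        return tuple(re.compile(p, re.IGNORECASE) for p in patterns)

    def update(self, patterns):
        # Compile outside the lock so traffic keeps flowing during compilation.
        new_rules = self._compile(patterns)
        with self._lock:
            self._rules = new_rules
            self._version += 1

    def open_session(self):
        with self._lock:
            return Session(self._version, self._rules)

engine = Engine(["old-attack"])
old_session = engine.open_session()
engine.update(["old-attack", "new-attack"])  # full recompile, then swap
new_session = engine.open_session()

print(old_session.matches("NEW-ATTACK payload"))
print(new_session.matches("NEW-ATTACK payload"))
```

Note that `update` here recompiles the whole rule set; whether an incremental update is possible depends on the engine, as discussed above.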

If you want to play with a RegEx engine, I would suggest the following:
- Snort (http://www.snort.org)
- ModSecurity (http://www.modsecurity.org/)

Now let's talk about False Positives:
