Efficient Regular Expression Evaluation: Theory To Practice: Michela Becchi and Patrick Crowley

Efficient Regular Expression Evaluation: Theory to Practice
Michela Becchi and Patrick Crowley
ANCS08
Motivation
Size and complexity of rule-sets have increased in recent years

Snort, as of November 2007
8536 rules, 5549 Perl Compatible Regular Expressions
99% with character ranges ([c1-ck], \s, \w)
16.3% with dot-star terms (.*, [^c1..ck]*)
44% with counting constraints (.{n,m}, [^c1..ck]{n,m})
Several proposals to accelerate regular expression matching

FPGA
Memory-centric architecture
Michela Becchi 2/27/2008 11/06/2008
Objectives
Can we combine distinct algorithmic techniques into a single proposal that also works for large data-sets?
Can we apply techniques intended for memory-centric architectures also on FPGAs?
Provide a tool that allows anybody to implement a high-throughput DPI system on the architecture of choice
Target Architectures
Regex-Matching Engine
Memory-centric architectures
FPGA logic
General purpose processors
Network processors
FPGA / ASIC + memory
available parallelism
Challenges
NFA on FPGA logic:
Logic cell utilization
Clock frequency
DFA on memory-centric architectures (general purpose processors, network processors, FPGA / ASIC + memory):
Memory space
Memory bandwidth
D2FA: default transition compression
Observations:
DFA state: set of |Σ| next state pointers
Transition redundancy
Idea:
Differential state representation through use of non-consuming default transitions
[Figure: example DFA with states s1-s6 over {a, b, c} and the same automaton with default transitions. In general: a default path along which states keep only labeled transitions such as c1, c2, c3, c4]
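The differential representation can be illustrated with a minimal Python sketch. The automaton below is a hypothetical toy D2FA for the pattern "abc" (it is not the automaton from the slide): each state stores only the labeled transitions that differ from its default target, and a lookup follows non-consuming default transitions until a labeled transition is found.

```python
# Toy D2FA for the single pattern "abc" over the alphabet {a, b, c}.
# All state numbers and transitions are hypothetical, chosen for illustration.
labeled = {
    0: {"a": 1, "b": 0, "c": 0},  # the root keeps its full transition set
    1: {"b": 2},                  # only the transitions that differ from state 0
    2: {"c": 3},
    3: {},                        # identical to state 0, so nothing is stored
}
default = {1: 0, 2: 0, 3: 0}      # non-consuming default transitions


def step(state, ch):
    """Consume one character; return (next_state, memory_accesses)."""
    accesses = 1
    while ch not in labeled[state]:
        state = default[state]    # follow the default path, input is not consumed
        accesses += 1
    return labeled[state][ch], accesses


def run(text, start=0):
    """Process a whole string; return (final_state, total_memory_accesses)."""
    state, total = start, 0
    for ch in text:
        state, n = step(state, ch)
        total += n
    return state, total


stored = sum(len(t) for t in labeled.values())  # labeled transitions kept
full = len(labeled) * 3                         # transitions in the plain DFA
```

In this toy case 5 labeled transitions replace the 12 of the uncompressed DFA, at the price of an extra memory access whenever a default transition is taken.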
D2FA algorithms
Problem: set default transitions so as to
1. Maximize memory compression
2. Minimize memory bandwidth overhead
[Kumar et al, SIGCOMM06]

Bound dpMAX on max default path length
O(dpMAX+1) memory accesses per input char
Better compression for higher dpMAX
[Becchi et al, ANCS07]

Only backward-directed default transitions (skipping k levels)
Amortized memory bandwidth O(((k+1)/k)N) on N input chars
Depth-first traversal at DFA creation
[Kumar et al, SIGCOMM06]: Memory bandwidth = O((dpMAX+1)N), Time complexity = O(n^2 log n), Space complexity = O(n^2)
vs.
[Becchi et al, ANCS07]: Memory bandwidth = O(((k+1)/k)N), Time complexity = O(n^2), Space complexity = O(n)
Compression w/ k=1 ~ compression w/ dpMAX=
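The amortized bound follows from a depth argument: each consumed character raises the depth of the current state by at most 1, while each backward default transition lowers it by at least k, so N characters allow at most N/k default transitions and N + N/k state traversals in total. The sketch below checks this on a small example. It is only an illustration under stated assumptions: the DFA is a textbook KMP automaton, and `compress` is a simplified greedy choice of default targets, not the algorithm of either cited paper.

```python
from collections import deque


def kmp_dfa(pattern, alphabet):
    """Full DFA (one dict per state) tracking the longest matched prefix."""
    m = len(pattern)
    dfa = [{c: 0 for c in alphabet} for _ in range(m + 1)]
    dfa[0][pattern[0]] = 1
    x = 0  # restart state
    for j in range(1, m + 1):
        for c in alphabet:
            dfa[j][c] = dfa[x][c]
        if j < m:
            dfa[j][pattern[j]] = j + 1
            x = dfa[x][pattern[j]]
    return dfa


def depths(dfa):
    """Breadth-first distance of every state from the start state 0."""
    depth, queue = {0: 0}, deque([0])
    while queue:
        s = queue.popleft()
        for t in dfa[s].values():
            if t not in depth:
                depth[t] = depth[s] + 1
                queue.append(t)
    return depth


def compress(dfa, k=1):
    """Greedy sketch: default targets lie at least k levels closer to the root."""
    depth = depths(dfa)
    labeled, default = [], []
    for s, row in enumerate(dfa):
        cands = [t for t in range(len(dfa)) if depth[t] <= depth[s] - k]
        best = max(cands, default=None,
                   key=lambda t: sum(row[c] == dfa[t][c] for c in row))
        default.append(best)
        if best is None:
            labeled.append(dict(row))  # no candidate: keep the full row
        else:
            labeled.append({c: n for c, n in row.items() if n != dfa[best][c]})
    return labeled, default


def run(labeled, default, text):
    """Return (final_state, number_of_state_traversals)."""
    state, accesses = 0, 0
    for ch in text:
        accesses += 1
        while ch not in labeled[state]:
            state = default[state]
            accesses += 1
        state = labeled[state][ch]
    return state, accesses
```

For any input of N characters, `run` should report at most N + N/k traversals, which is the O(((k+1)/k)N) figure quoted above.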

DFA alphabet reduction

Effective for:
Ignore-case regex
Char-ranges
Never used chars
[Figure: example DFA before and after alphabet reduction, with an alphabet translation table mapping the character classes [a-z], A, [B-Z], [0-9] and [^0-9a-zA-Z] to the five symbols 0-4]
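One way to picture alphabet reduction is as a grouping of characters that behave identically in every state. The following Python sketch builds such a translation table by comparing transition columns. It is a simplified illustration (exact-equality grouping on a hypothetical toy DFA), not necessarily the clustering algorithm used by the authors.

```python
def reduce_alphabet(dfa, alphabet):
    """Merge characters whose transition column is identical in every state.

    Returns (translation, reduced): translation maps each character to a
    class id, reduced[s][cls] is the next state of state s on class cls.
    """
    classes = {}       # transition column -> class id
    translation = {}   # character -> class id
    for ch in alphabet:
        column = tuple(row[ch] for row in dfa)
        translation[ch] = classes.setdefault(column, len(classes))
    reduced = [[None] * len(classes) for _ in dfa]
    for s, row in enumerate(dfa):
        for ch in alphabet:
            reduced[s][translation[ch]] = row[ch]
    return translation, reduced


def toy_dfa(alphabet):
    """Hypothetical DFA: case-insensitive search for "ab" (state 2 accepts)."""
    rows = []
    for state in range(3):
        row = {}
        for ch in alphabet:
            if ch.lower() == "a":
                row[ch] = 1
            elif ch.lower() == "b" and state == 1:
                row[ch] = 2
            else:
                row[ch] = 0
        rows.append(row)
    return rows


alphabet = [chr(code) for code in range(128)]
dfa = toy_dfa(alphabet)
translation, reduced = reduce_alphabet(dfa, alphabet)
```

Here the 128 input characters collapse into three classes: {a, A}, {b, B} and everything else, which matches the cases listed above (ignore-case regex and never used chars).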
Multiple-stride DFAs

[Brodie et al, ISCA 2006]
Idea:
Process stride input chars at a time
[Figure: example DFA and its equivalent DFA with stride 2, whose transitions are labeled with character pairs such as [a-f]a, da, dd, ab, bc, [b-f]b]
Observations:
Mechanism used on small DFAs (1-2 regex)
No distinct accepting state handling
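Stride doubling can be sketched in a few lines of Python: the stride-2 transition on a character pair (c1, c2) is the composition of two stride-1 transitions. The DFA below is a hypothetical toy example, and the sketch deliberately omits any handling of accepting states crossed in the middle of a stride.

```python
from itertools import product


def double_stride(dfa, alphabet):
    """Return delta2 with delta2[s][(c1, c2)] == dfa[dfa[s][c1]][c2]."""
    return [
        {(c1, c2): dfa[row[c1]][c2] for c1, c2 in product(alphabet, repeat=2)}
        for row in dfa
    ]


def run_stride1(dfa, text):
    state = 0
    for ch in text:
        state = dfa[state][ch]
    return state


def run_stride2(dfa2, text):
    """Assumes len(text) is even; padding of odd-length input is not handled."""
    state = 0
    for i in range(0, len(text), 2):
        state = dfa2[state][(text[i], text[i + 1])]
    return state


# Hypothetical DFA searching for "ab" over {a, b, x}; state 2 is accepting.
alphabet = "abx"
dfa = [
    {"a": 1, "b": 0, "x": 0},
    {"a": 1, "b": 2, "x": 0},
    {"a": 1, "b": 0, "x": 0},
]
dfa2 = double_stride(dfa, alphabet)
```

On the input "xabx" the stride-2 automaton goes from state 1 straight to state 0 on the pair (b, x), so the accepting state 2 is only crossed mid-stride and never becomes visible. This is the accepting state issue noted in the observations.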
Multiple stride + alphabet reduction
Stride s → alphabet Σ^s
Σ = ASCII alphabet: |Σ^2| = 256^2 = 65,536; |Σ^4| = 256^4 ≈ 4,294M
Effective alphabet much smaller

Char grouping: [a-cef]a, [b-f]b
[Figure: example DFA and the equivalent 2-DFA with grouped character pairs]
Alphabet reduction may be necessary to make stride doubling feasible on large DFAs
DFA → alphabet reduction (TxTable1) → stride doubling → alphabet reduction (TxTable2,1) → 2-DFA → stride doubling → alphabet reduction (TxTable4,2,1) → 4-DFA
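The chain of translation tables can be made concrete with a small Python sketch: reduce the alphabet, double the stride on the reduced symbols, then reduce again on symbol pairs. The names `table1` and `table2` are hypothetical stand-ins for TxTable1 and TxTable2,1 as read from the slide, the toy DFA is an assumption, and mid-stride accepting states are again ignored.

```python
from itertools import product


def reduce_alphabet(rows):
    """Merge symbols with identical transition columns; return (table, rows)."""
    classes, table = {}, {}
    for sym in rows[0]:
        column = tuple(row[sym] for row in rows)
        table[sym] = classes.setdefault(column, len(classes))
    reduced = [{table[sym]: nxt for sym, nxt in row.items()} for row in rows]
    return table, reduced


def double_stride(rows):
    """Compose two transitions; the new symbols are pairs of old symbols."""
    symbols = list(rows[0])
    return [
        {(a, b): rows[row[a]][b] for a, b in product(symbols, repeat=2)}
        for row in rows
    ]


def toy_dfa():
    """Hypothetical DFA: case-insensitive search for "ab" over 7-bit ASCII."""
    rows = []
    for state in range(3):
        row = {}
        for code in range(128):
            ch = chr(code)
            if ch.lower() == "a":
                row[ch] = 1
            elif ch.lower() == "b" and state == 1:
                row[ch] = 2
            else:
                row[ch] = 0
        rows.append(row)
    return rows


table1, dfa1 = reduce_alphabet(toy_dfa())            # char -> symbol
table2, dfa2 = reduce_alphabet(double_stride(dfa1))  # symbol pair -> symbol


def run_2dfa(text):
    """Translate each character, then each pair, then take one 2-DFA step."""
    state = 0
    for i in range(0, len(text), 2):
        pair = (table1[text[i]], table1[text[i + 1]])
        state = dfa2[state][table2[pair]]
    return state
```

Because doubling operates on the 3 reduced symbols, only 9 pairs are ever enumerated instead of 128^2 = 16,384, which is why reducing first keeps stride doubling tractable.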
Multiple stride + default transitions
Compression
Default transitions eliminate transition redundancy
In multiple stride DFAs:
# of states does not substantially change
# of transitions per state increases exponentially (|Σ|^stride)
Fraction distinct/total transitions decreases
Increased potential for compression!
Accepting state handling

[Figure: example DFA and the equivalent 2-DFA, in which a duplicated accepting state (0/1) is added for matches that complete in the middle of a stride]
Duplicated states have same outgoing transitions as original states but different depth
Default transition will remove all outgoing transitions from new accepting states
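A possible reading of the duplicated-state idea is sketched below in Python. When a stride crosses an accepting state at its first character, the destination is replaced by a duplicate that remembers the match. The pair encoding `(state, mid)` and the toy DFA are assumptions made for this illustration and are not taken from the slides.

```python
from itertools import product


def double_stride(dfa, accepting, alphabet):
    """Stride-2 DFA whose states are (state, mid) pairs.

    mid is None for an original state, or the accepting state crossed in the
    middle of the stride for a duplicated state.
    """
    delta2, todo = {}, [(0, None)]
    while todo:
        node = todo.pop()
        if node in delta2:
            continue
        state, _ = node
        delta2[node] = {}
        for c1, c2 in product(alphabet, repeat=2):
            mid = dfa[state][c1]
            target = (dfa[mid][c2], mid if mid in accepting else None)
            delta2[node][(c1, c2)] = target
            todo.append(target)
    return delta2


def is_match(node, accepting):
    """A 2-DFA state reports a match at either character of the stride."""
    state, mid = node
    return mid is not None or state in accepting


# Hypothetical DFA searching for "ab" over {a, b, x}; state 2 is accepting.
alphabet = "abx"
accepting = {2}
dfa = [
    {"a": 1, "b": 0, "x": 0},
    {"a": 1, "b": 2, "x": 0},
    {"a": 1, "b": 0, "x": 0},
]
delta2 = double_stride(dfa, accepting, alphabet)

node = (0, None)
for pair in [("x", "a"), ("b", "x")]:   # the input "xabx"
    node = delta2[node][pair]
```

After "xabx" the automaton sits in the duplicate `(0, 2)`, which reports the match. Its outgoing transitions equal those of `(0, None)`, so a default transition to the original state would leave the duplicate with no labeled transitions at all.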
Multiple stride + default transitions (contd)
Problem:
For large |Σ| and stride, the uncompressed DFA may be infeasible
Out of memory when generating a 2K node, stride 4 DFA on a Linux machine w/ 4GB memory
Solution
Perform default transition compression during DFA creation
Use [Becchi et al, ANCS07] compression algorithm
In the situation above, only 10% memory used
DFA → alphabet reduction (TxTable1) → stride doubling + compression → alphabet reduction (TxTable2,1) → compressed 2-DFA → stride doubling + compression → alphabet reduction (TxTable4,2,1) → compressed 4-DFA
Putting everything together

Data-sets: 1-22 regex, 48-1,940 states

DFA → alphabet reduction (|Σ|=25-44) → default transition compression (96.3-98.5% transitions removed) → Compressed DFA (avg 1-2 labeled tx/state)

DFA → stride-2 transformation → Stride-2 DFA → alphabet reduction (|Σ^2|=53-470) → default transition compression (97.9-99.5% transitions removed) → Compressed Stride-2 DFA (avg 3-5 labeled tx/state)

Same memory bandwidth requirement
Initial size = 40X-80X final size

NFA
Example rule-set:
1. ab+cd
2. ab+ce
3. ab+c.*f
4. b[d-f]a
5. bdc
[Figure: NFA for the five rules and an equivalent reduced NFA; accepting states are labeled state/rule, e.g. 4/1, 8/2, 12/3, 15/4, 18/5]
Multiple stride + alphabet reduction
Stride doubling
Avoid new state creation
Keep multiple transitions on the same symbol separated
[Figure: example NFA and the equivalent 2-NFA obtained by stride doubling]
Alphabet reduction:
Clustering-based algorithm as for DFA, but sets of target states are compared
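Both steps can be sketched for an NFA in Python. Stride doubling keeps the original states and composes target sets, and alphabet reduction compares, for each symbol, the sets of target states across all states. The toy NFA is hypothetical, exact-equality grouping stands in for the clustering algorithm, and accepting states crossed mid-stride are not handled.

```python
from itertools import product


def double_stride(nfa, alphabet):
    """2-NFA on the same states: follow one transition on c1, then one on c2."""
    return [
        {
            (c1, c2): frozenset(
                t for m in row.get(c1, ()) for t in nfa[m].get(c2, ())
            )
            for c1, c2 in product(alphabet, repeat=2)
        }
        for row in nfa
    ]


def reduce_alphabet(nfa, symbols):
    """Group symbols whose sets of target states agree in every state."""
    classes, table = {}, {}
    for sym in symbols:
        column = tuple(frozenset(row.get(sym, ())) for row in nfa)
        table[sym] = classes.setdefault(column, len(classes))
    return table


# Hypothetical NFA for an unanchored search of "ab": state 0 loops on any
# character, state 2 is accepting and has no outgoing transitions.
alphabet = "abxy"
nfa = [
    {"a": {0, 1}, "b": {0}, "x": {0}, "y": {0}},
    {"b": {2}},
    {},
]
table1 = reduce_alphabet(nfa, alphabet)
nfa2 = double_stride(nfa, alphabet)
table2 = reduce_alphabet(nfa2, list(product(alphabet, repeat=2)))
```

In this example the 4 characters fall into 3 classes and the 16 character pairs of the 2-NFA also fall into 3 classes, without any new state being created.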
FPGA implementation
[Figure: FPGA block diagram with INIT, INPUT and CLK signals, an alphabet translation (Tx) stage, a decoder, the NFA logic, and a MATCH output]
Alphabet translation and decoder: Quine-McCluskey like minimization scheme
NFA: one-hot encoding [Sidhu & Prasanna] + logic reduction schemes
[Figure: one-hot state logic for states S1, S2, S3 and examples of logic reduction, e.g. Σ-{bBcCdD}={aA}, (c1=b OR c1=B) AND NOT (c2=a OR c2=A), and a transition on Σ-{ci,ck}]
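One-hot encoding assigns one flip-flop to each NFA state, and a state becomes active in the next cycle when any of its predecessors is active and the input matches the edge label. The Python sketch below mimics this with an integer bitmask. The edge list and the always-active start state are assumptions for illustration and do not model the actual hardware design.

```python
def one_hot_step(active, ch, edges, always_on):
    """One clock cycle: bit i of `active` plays the role of state i's flip-flop."""
    nxt = always_on                      # start state stays active (unanchored)
    for src, sym, dst in edges:
        if (active >> src) & 1 and sym == ch:
            nxt |= 1 << dst              # OR of (source active AND char match)
    return nxt


def run(text, edges, always_on=0b1):
    """Feed the text one character per cycle; return the final bitmask."""
    active = always_on
    for ch in text:
        active = one_hot_step(active, ch, edges, always_on)
    return active


# Hypothetical NFA for "ab": 0 --a--> 1 --b--> 2 (accepting).
edges = [(0, "a", 1), (1, "b", 2)]
ACCEPT = 1 << 2
```

All edges are evaluated against the same input character in every step, which corresponds to the parallel evaluation that the FPGA logic performs in a single clock cycle.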
FPGA Results - throughput

[Figure: throughput (Gbps, 0-8 scale) per rule-set (any_99, mail_79, http_406) for stride 1 with full alphabet, stride 1 with reduced alphabet, and stride 2 with reduced alphabet]
FPGA Results - logic utilization

[Figure: number of slices (0-4000 scale) per rule-set (any_99, mail_79, http_406) for stride 1 with full alphabet, stride 1 with reduced alphabet, and stride 2 with reduced alphabet. Annotations: #s=7,864, |Σ^1|=64, |Σ^2|=2,206; #s=2,086, |Σ^1|=78, |Σ^2|=1,969; #s=2,147, |Σ^1|=68, |Σ^2|=1,640]
Utilization:
8-46% on XC5VLX50 device (7,400 slices)
XC5VLX330 device has 51,840 slices
ASIC projected results

Regex partitioning into multiple DFAs

Rule-set        Stride = 1           Stride = 2
                |Σ|    #states       |Σ^2|   #states
k-NFA  any      78     2,086         1969    2,091
k-DFA  any1     59     23,846        850     28,223
k-DFA  any2     45     86,977        579     102,940
k-DFA  any3     60     14,084        627     19,344

Memory footprint (compressed states / full states), values in source order:
Stride = 1: 505KB, 2.9 MB, 299MB, 200 KB, 55 KB, 48 KB
Stride = 2: 356KB, 1.27MB, 244KB, 32MB, 81MB, 16 MB

Content addressing w/ 64 bit words:
98% states compressed w/ stride 1
82% states compressed w/ stride 2
Throughput: SRAM@500 MHz
2-4 Gbps for stride 1
4-8 Gbps for stride 2
Alternative representation: decoders in ASIC or instruction memory
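The throughput ranges are consistent with a simple back-of-the-envelope model, sketched here in Python. The assumptions are not stated on the slide: one SRAM access per clock cycle, 8-bit input characters, and between one and two memory accesses per automaton step (two being the amortized worst case when every step also takes a default transition with k=1).

```python
def throughput_gbps(clock_mhz, stride, accesses_per_step):
    """Input bits consumed per second, in Gbps, under the stated assumptions."""
    bits_per_step = 8 * stride                     # 8-bit characters
    steps_per_second = clock_mhz * 1e6 / accesses_per_step
    return steps_per_second * bits_per_step / 1e9


stride1 = (throughput_gbps(500, 1, 2), throughput_gbps(500, 1, 1))
stride2 = (throughput_gbps(500, 2, 2), throughput_gbps(500, 2, 1))
```

Under this model `stride1` evaluates to (2.0, 4.0) and `stride2` to (4.0, 8.0), in line with the 2-4 Gbps and 4-8 Gbps figures above.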
Conclusion
Algorithm:
Combination of default transition compression, alphabet reduction and stride multiplying on potentially large DFAs
Extension of alphabet reduction and stride multiplying to NFAs
FPGA Implementation:
Use of one-hot encoding w/ incremental improvement schemes
Logic minimization scheme for alphabet reduction & decoding
Additional aspects:
Multiple flow handling: FPGA vs. memory-centric architectures
Design improvements tailored to specific architectures and data-sets:
Clustering into smaller NFAs and DFAs to allow smaller alphabets w/ larger strides
Thank you!
Questions?
http://regex.wustl.edu
