ANCS08
Motivation
Objectives
Can we converge distinct algorithmic techniques into a single proposal that also works for large data-sets?
Can we apply techniques intended for memory-centric architectures on FPGAs as well?
Provide a tool that allows anybody to implement a high-throughput DPI system on the architecture of their choice
Target Architectures
Regex-Matching Engine
Memory-centric architectures
FPGA logic
Network processors
available parallelism
Challenges
Automaton types: DFA, NFA
Memory-centric architectures
FPGA logic: logic cell utilization, clock frequency
Network processors
Observations:
DFA state: set of $|\Sigma|$ next-state pointers
Transition redundancy
Idea:
Differential state representation through use of non-consuming default transitions
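A minimal sketch of how a lookup with non-consuming default transitions can work. The three-state automaton, the `labeled`/`default` dictionaries and the function names are hypothetical and chosen only for illustration; they are not taken from the talk's figure.

```python
# Toy automaton (hypothetical): each state stores only the transitions that
# differ from those of its default target.
labeled = {
    "s1": {"a": "s2", "b": "s3", "c": "s1"},  # root: keeps a full row
    "s2": {"c": "s3"},                        # 'a' and 'b' inherited from s1
    "s3": {},                                 # everything inherited from s1
}
default = {"s2": "s1", "s3": "s1"}            # the root has no default transition


def step(state, symbol):
    """Consume one symbol; return (next_state, number of default hops)."""
    hops = 0
    while symbol not in labeled[state]:
        state = default[state]  # default transition: the symbol is NOT consumed
        hops += 1
    return labeled[state][symbol], hops


def run(text, start="s1"):
    """Process a whole string and count the default transitions taken."""
    state, total_hops = start, 0
    for ch in text:
        state, hops = step(state, ch)
        total_hops += hops
    return state, total_hops


result = run("acb")
```

The sketch assumes every default path ends in a state with a complete row, so the loop in `step` always terminates. The memory saved by the sparse rows is paid for with the extra hops counted in `total_hops`.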
[Figure: example DFA over symbols a, b, c with states s1-s6, shown with and without default transitions. In general: labeled transitions c1-c4 along a default path.]
Michela Becchi 2/27/2008 11/06/2008
D2FA algorithms
[Figure: two D2FA construction algorithms compared side by side on example automata with transitions on character ranges.]
Multiple-stride DFAs
DFA w/ stride 2
[Figure: example DFA and the corresponding stride-2 DFA, whose transitions are labeled with symbol pairs such as ab, bc, da, dd and [b-f]b.]
Observations:
Mechanism used on small DFAs (1-2 regex)
No distinct accepting state handling
Stride $s$ → alphabet $\Sigma^s$
$\Sigma$ = ASCII alphabet: $|\Sigma^2| = 256^2 = 65{,}536$; $|\Sigma^4| = 256^4 \approx 4{,}294\text{M}$
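The growth in alphabet size can be seen in a small stride-doubling sketch. The two-state DFA and the function name `double_stride` are hypothetical. Accepting-state handling (matches that end in the middle of a symbol pair) is deliberately left out here.

```python
from itertools import product


def double_stride(delta, alphabet):
    """Build a stride-2 transition table from a complete stride-1 DFA.

    delta maps state -> {symbol -> next state}; each symbol is one character.
    The stride-2 table is indexed by two-character strings.
    """
    return {
        s: {a + b: delta[delta[s][a]][b] for a, b in product(alphabet, repeat=2)}
        for s in delta
    }


# Hypothetical DFA over {a, b}: state 1 means "the last symbol was a".
delta = {0: {"a": 1, "b": 0}, 1: {"a": 1, "b": 0}}
delta2 = double_stride(delta, "ab")

transitions_per_state = len(delta2[0])  # |alphabet| ** 2
```

With a two-symbol alphabet each state goes from 2 to 4 outgoing transitions; with the 256-symbol ASCII alphabet the same construction yields 65,536 transitions per state.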
[Figure: example DFA and stride-2 DFA illustrating alphabet reduction.]
Alphabet reduction may be necessary to make stride doubling feasible on large DFAs
DFA → alphabet reduction → stride doubling → alphabet reduction → 2-DFA → stride doubling → alphabet reduction → 4-DFA
Translation tables produced along the way: TxTable1, TxTable2,1, TxTable4,2,1
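As an illustration of what an alphabet translation table does, here is a simplified sketch that merges only symbols whose transition-table columns are identical. This is an assumption made for brevity: the clustering algorithm used in the talk may group symbols differently. The names `reduce_alphabet` and `tx_table` are hypothetical.

```python
def reduce_alphabet(delta, alphabet):
    """Map symbols with identical behavior in every state to one class id.

    Returns (tx_table, reduced), where tx_table maps symbol -> class id and
    reduced is the DFA re-indexed by class id.
    """
    states = sorted(delta)
    classes = {}   # column signature -> class id
    tx_table = {}  # symbol -> class id
    for sym in alphabet:
        column = tuple(delta[s][sym] for s in states)
        tx_table[sym] = classes.setdefault(column, len(classes))
    reduced = {
        s: {tx_table[sym]: delta[s][sym] for sym in alphabet} for s in states
    }
    return tx_table, reduced


# Hypothetical DFA in which 'b' and 'c' always lead to the same next state.
delta = {
    0: {"a": 1, "b": 0, "c": 0},
    1: {"a": 1, "b": 2, "c": 2},
    2: {"a": 1, "b": 0, "c": 0},
}
tx_table, reduced = reduce_alphabet(delta, "abc")
```

At run time each input character is first translated through `tx_table`, and the resulting class id indexes the smaller table. Doing this before stride doubling squares the reduced alphabet rather than the full one.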
Compression
Default transitions eliminate transition redundancy
In multiple-stride DFAs:
# of states does not substantially change
# of transitions per state increases exponentially ($|\Sigma|^{\text{stride}}$)
Fraction distinct/total transitions decreases
Increased potential for compression!
[Figure: 2-DFA example with duplicated accepting states.]
Duplicated states have same outgoing transitions as original states but different depth
Default transition will remove all outgoing transitions from new accepting states
Problem:
For large $|\Sigma|$ and stride, the uncompressed DFA may be infeasible
Out of memory when generating a 2K node, stride 4 DFA on a Linux machine w/ 4GB memory
Solution
Perform default transition compression during DFA creation
Use [Becchi et al, ANCS 2006] compression algorithm
In the situation above, only 10% memory used
DFA → alphabet reduction → stride doubling + compression → alphabet reduction → compressed 2-DFA → stride doubling + compression → alphabet reduction → compressed 4-DFA
Translation tables produced along the way: TxTable1, TxTable2,1, TxTable4,2,1
[Figure: DFA, stride-2 DFA and compressed DFA, with alphabet reduction; $|\Sigma|$ = 53-470.]
NFA
[Figure: example NFAs with numbered states and accepting states.]
Stride doubling
[Figure: example NFA and the corresponding 2-NFA obtained by stride doubling; 2-NFA transitions are labeled with symbol pairs such as ab, bc, cc and ce.]
Alphabet reduction:
Clustering-based algorithm as for DFA, but sets of target states are compared
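The difference from the DFA case can be sketched as follows: two symbols are equivalent only if, in every state, they lead to the same set of target states. This sketch merges exactly equivalent symbols only, which is a simplification of a clustering approach. The NFA and the name `reduce_nfa_alphabet` are hypothetical.

```python
def reduce_nfa_alphabet(delta, alphabet):
    """Group symbols that reach identical target-state sets from every state.

    delta maps state -> {symbol -> set of target states}; a missing symbol
    means the state has no transition on it.
    """
    states = sorted(delta)
    classes, tx_table = {}, {}
    for sym in alphabet:
        signature = tuple(frozenset(delta[s].get(sym, ())) for s in states)
        tx_table[sym] = classes.setdefault(signature, len(classes))
    return tx_table


# Hypothetical NFA: 'b' and 'c' are interchangeable, 'a' is not.
nfa = {
    0: {"a": {0, 1}, "b": {0}, "c": {0}},
    1: {"b": {2}, "c": {2}},
    2: {},
}
nfa_tx = reduce_nfa_alphabet(nfa, "abc")
```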
FPGA implementation
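[Figure: FPGA block diagram with INIT, INPUT, CLK and MATCH signals; an alphabet translation (Alphabet Tx) block and a decoder feed the NFA logic. Signal labels: $k \log|\Sigma|$, $\log|\Sigma|$, $|\Sigma|$.]
[Figure: NFA states S1, S2, S3 and transitions on ci, ck, cm, cn mapped to logic, including a reset of S1. Example character-class logic: $\Sigma$-{bBcCdD}={aA}; (c1=b OR c1=B) AND NOT (c2=a OR c2=A).]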
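A software model of one-hot state encoding may help in reading the diagram: each NFA state corresponds to one flip-flop, and a state's next value is the OR of all (source state active AND input symbol matches) terms. The pattern, the edge list and the function names below are hypothetical, and the sketch models a stride-1 NFA without alphabet translation.

```python
def one_hot_step(active, symbol, edges, start_mask):
    """Compute the next one-hot state vector.

    active is an int bitmask: bit i set means the flip-flop of state i is 1.
    edges is a list of (source state, accepted characters, destination state).
    """
    nxt = 0
    for src, chars, dst in edges:
        if (active >> src) & 1 and symbol in chars:
            nxt |= 1 << dst
    # Keep the start state active so a match can begin at any input position.
    return nxt | start_mask


# Hypothetical case-insensitive NFA for "ab": 0 -[aA]-> 1 -[bB]-> 2 (accepting)
edges = [(0, "aA", 1), (1, "bB", 2)]
ACCEPT = 1 << 2


def scan(text):
    """Return the number of input positions at which the pattern matches."""
    active, matches = 1, 0
    for ch in text:
        active = one_hot_step(active, ch, edges, start_mask=1)
        matches += bool(active & ACCEPT)
    return matches


match_count = scan("xabAB")
```

In hardware every edge is evaluated in parallel within one clock cycle, whereas the loop over `edges` here is sequential. Only the resulting state vector is meant to correspond.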
[Figure: throughput (Gbps, axis 0 to 6) by rule-set: any_99, mail_79, http_406.]
[Figure: # slices (axis 0 to 3000) by rule-set: any_99, mail_79, http_406.]
Utilization:
8-46% on XC5VLX50 device (7,400 slices)
XC5VLX330 device has 51,840 slices
Throughput: SRAM@500 MHz
2-4 Gbps for stride 1
4-8 Gbps for stride 2
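One reading that reproduces these ranges, offered as an assumption rather than as the talk's stated model: one memory access per clock cycle, 8 bits per input character, and between one and two memory accesses per processed stride (the second access being a default transition). The function name and parameters are hypothetical.

```python
def throughput_gbps(clock_mhz, stride, accesses_per_step):
    """Throughput in Gbps for a memory-based engine (assumed model).

    Each step consumes `stride` 8-bit characters and costs
    `accesses_per_step` memory accesses at one access per clock cycle.
    """
    bits_per_step = 8 * stride
    return clock_mhz * 1e6 * bits_per_step / accesses_per_step / 1e9


# Worst case (2 accesses per step) and best case (1 access per step).
stride1 = (throughput_gbps(500, 1, 2), throughput_gbps(500, 1, 1))
stride2 = (throughput_gbps(500, 2, 2), throughput_gbps(500, 2, 1))
```

Under these assumptions, doubling the stride doubles both ends of the range, because each memory access then accounts for twice as many input bits.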
Conclusion
Algorithm:
Combination of default transition compression, alphabet reduction and stride multiplying on potentially large DFAs
Extension of alphabet reduction and stride multiplying to NFAs
FPGA Implementation:
Use of one-hot encoding w/ incremental improvement schemes
Logic minimization scheme for alphabet reduction & decoding
Additional aspects:
Multiple flow handling: FPGA vs. memory-centric architectures
Design improvements tailored to specific architectures and data-sets:
Clustering into smaller NFAs and DFAs to allow smaller alphabets w/ larger strides
Thank you!
Questions?
http://regex.wustl.edu