Efficient Regular Expression Evaluation: Theory to Practice

Michela Becchi and Patrick Crowley

ANCS’08

Motivation

Size and complexity of rule-set increased in recent years
» Snort, as of November 2007
– 8536 rules, 5549 Perl Compatible Regular Expressions
    • 99% with character ranges ([c1-ck], \s, \w…)
    • 16.3% with dot-star terms (.*, [^c1..ck]*)
    • 44% with counting constraints (.{n,m}, [^c1..ck]{n,m})

Several proposals to accelerate regular expression matching
» FPGA
» Memory centric architecture

Michela Becchi – 2/27/2008 11/06/2008


Objectives

• Can we converge distinct algorithmic techniques into a single proposal, also for large data-sets?
• Can we apply techniques intended for memory-centric architectures also on FPGAs?
• Provide a tool that allows anybody to implement a high-throughput DPI system on the architecture of choice

Target Architectures

[Figure: targets for the regex-matching engine. Memory-centric architectures: general purpose processors, network processors, FPGA / ASIC + memory. Logic-based: FPGA logic. Axis label: available parallelism.]

Challenges

• DFA on memory-centric architectures (general purpose processors, network processors, FPGA / ASIC + memory)
» Memory space
» Memory bandwidth
• NFA on FPGA logic
» Logic cell utilization
» Clock frequency

D2FA: default transition compression

• Observations:
» DFA state: set of |∑| next state pointers
» Transition redundancy
• Idea:
» Differential state representation through use of non-consuming default transitions
• In general: states along a DEFAULT PATH keep only their distinct labeled transitions (c1, c2, c3, c4, …)

[Figure: example DFA over {a, b, c} with states s1-s6, shown with and without default transitions.]
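A minimal sketch of how a lookup with non-consuming default transitions can work. The three-state automaton, the `labeled`/`default` dictionaries and the function name are hypothetical illustrations, not taken from the slides.

```python
def d2fa_step(labeled, default, state, ch):
    """Consume one character; return (next_state, memory_accesses)."""
    accesses = 1
    # Follow default transitions until a state stores a labeled transition on ch.
    # Default transitions do not consume input, so ch stays the same.
    while ch not in labeled[state]:
        state = default[state]
        accesses += 1
    return labeled[state][ch], accesses

# Hypothetical automaton over {a, b, c}. State 0 stores all |∑| transitions;
# states 1 and 2 store only the transitions that differ from state 0.
labeled = {
    0: {"a": 1, "b": 0, "c": 0},
    1: {"b": 2},
    2: {"c": 2},
}
default = {1: 0, 2: 0}

print(d2fa_step(labeled, default, 1, "b"))  # labeled transition: one access
print(d2fa_step(labeled, default, 1, "a"))  # via default to state 0: two accesses
```

Here 5 transitions are stored instead of 9, at the cost of an extra memory access whenever a default transition is taken.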

D2FA algorithms

• Problem: set default transitions so as to
1. Maximize memory compression
2. Minimize memory bandwidth overhead

• [Kumar et al. SIGCOMM’06]
» Bound dpMAX on max default path length
» O(dpMAX+1) memory accesses per input char
» Better compression for higher dpMAX
» Memory bandwidth = O((dpMAX+1)N)
» Time complexity = O(n² log n)
» Space complexity = O(n²)

vs.

• [Becchi et al. ANCS’07]
» Only backward-directed default transitions (skipping k levels)
» Amortized memory bandwidth O(((k+1)/k)N) on N input chars
» Depth-first traversal → at DFA creation
» Memory bandwidth = O(((k+1)/k)N)
» Time complexity = O(n²)
» Space complexity = O(n)
» Compression w/ k=1 ~ compression w/ dpMAX=∞
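The backward-directed rule can be sketched as follows. This is a simplified greedy illustration under stated assumptions, not the published algorithm: depth is taken as the BFS distance from the start state, every state is assumed reachable, and each state picks the shallower state (at least k levels up) that shares the most transitions. All names and the example DFA are hypothetical.

```python
from collections import deque

def depths(dfa, start=0):
    """BFS distance of every state from the start state."""
    depth = {start: 0}
    queue = deque([start])
    while queue:
        s = queue.popleft()
        for t in dfa[s].values():
            if t not in depth:
                depth[t] = depth[s] + 1
                queue.append(t)
    return depth

def compress_backward(dfa, start=0, k=1):
    """Give each state a default transition to a state at least k levels shallower."""
    depth = depths(dfa, start)
    labeled, default = {}, {}
    for s, row in dfa.items():
        best, best_shared = None, 0
        for t, other in dfa.items():
            if depth[s] - depth[t] < k:
                continue  # only backward-directed defaults are allowed
            shared = sum(1 for c in row if other[c] == row[c])
            if shared > best_shared:
                best, best_shared = t, shared
        if best is None:
            labeled[s] = dict(row)  # no useful default: keep the full row
        else:
            default[s] = best
            labeled[s] = {c: t for c, t in row.items() if dfa[best][c] != t}
    return labeled, default

def run(labeled, default, text, state=0):
    """Process text; return (final_state, total_memory_accesses)."""
    accesses = 0
    for ch in text:
        accesses += 1
        while ch not in labeled[state]:
            state = default[state]
            accesses += 1
        state = labeled[state][ch]
    return state, accesses

# Hypothetical DFA for .*abc over {a, b, c}.
dfa = {
    0: {"a": 1, "b": 0, "c": 0},
    1: {"a": 1, "b": 2, "c": 0},
    2: {"a": 1, "b": 0, "c": 3},
    3: {"a": 1, "b": 0, "c": 0},
}
labeled, default = compress_backward(dfa, k=1)
print(default, run(labeled, default, "abcabc"))
```

With k=1, a consumed character raises the depth by at most one and each default transition lowers it by at least one, so N input characters trigger at most N default transitions, i.e. at most 2N memory accesses.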

DFA alphabet reduction

Effective for:
• Ignore-case regex
• Char-ranges
• Never used chars

[Figure: example DFA with transitions labeled by character classes ([a-z], [a-zA-Z], [0-9B-Z], A, [B-Z]) and the same DFA over a reduced alphabet of symbols 0-4. The alphabet translation table maps the classes [a-z], A, [B-Z], [0-9] and [^0-9a-zA-Z] to the reduced symbols.]
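A minimal sketch of the idea, assuming the simplest grouping rule: two characters fall into the same class when they lead to the same next state from every state. The slides refer to a clustering-based algorithm whose details are not given here, so this exact-match grouping, the function name and the example DFA are illustrative assumptions.

```python
def reduce_alphabet(dfa, alphabet):
    """Return (translation table char -> class id, DFA over class ids)."""
    classes = {}      # column of next states -> class id
    translation = {}  # original character -> class id
    states = sorted(dfa)
    for ch in alphabet:
        column = tuple(dfa[s][ch] for s in states)
        translation[ch] = classes.setdefault(column, len(classes))
    # Rewrite every row over the reduced alphabet.
    reduced = {s: {} for s in dfa}
    for ch, cls in translation.items():
        for s in dfa:
            reduced[s][cls] = dfa[s][ch]
    return translation, reduced

# Hypothetical ignore-case DFA for "ab" over the characters a, A, b, B, x.
alphabet = "aAbBx"
dfa = {
    0: {"a": 1, "A": 1, "b": 0, "B": 0, "x": 0},
    1: {"a": 1, "A": 1, "b": 2, "B": 2, "x": 0},
    2: {"a": 1, "A": 1, "b": 0, "B": 0, "x": 0},
}
translation, reduced = reduce_alphabet(dfa, alphabet)
print(translation)
print(reduced[1])
```

The five input characters collapse into three classes; at run time each input character goes through the translation table before indexing the reduced DFA.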

Multiple-stride DFAs

• [Brodie et al. ISCA 2006]
• Idea:
» Process stride input chars at a time
• Observations:
» Mechanism used on small DFAs (1-2 regex)
» No distinct accepting state handling

[Figure: example DFA (states 0-8, accepting states 4/1 and 8/2) and the corresponding DFA w/ stride 2, whose transitions are labeled with character pairs such as ab, bc, da, dd, [a-f]a, [a-cef]a and [b-f]b.]
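A minimal sketch of stride doubling on a complete DFA: every pair of characters is mapped to the state reached after two single-character steps. Recording the accepting state crossed in the middle of a pair is one simple way to keep such matches visible; it is an illustrative choice, not the accepting-state handling of the slides. Names and the example DFA are hypothetical.

```python
from itertools import product

def double_stride(dfa, alphabet, accepting):
    """Return (stride-2 DFA, accepting states crossed mid-pair)."""
    dfa2, mid_hits = {}, {}
    for s, row in dfa.items():
        dfa2[s] = {}
        for c1, c2 in product(alphabet, repeat=2):
            mid = row[c1]                     # state after the first character
            dfa2[s][c1 + c2] = dfa[mid][c2]   # state after the second character
            if mid in accepting:
                mid_hits[(s, c1 + c2)] = mid  # match ends on the first character
    return dfa2, mid_hits

# Hypothetical DFA for .*abc over {a, b, c}; state 3 is accepting.
dfa = {
    0: {"a": 1, "b": 0, "c": 0},
    1: {"a": 1, "b": 2, "c": 0},
    2: {"a": 1, "b": 0, "c": 3},
    3: {"a": 1, "b": 0, "c": 0},
}
dfa2, mid_hits = double_stride(dfa, "abc", accepting={3})
print(dfa2[0]["ab"], dfa2[1]["bc"], mid_hits)
```

The state count is unchanged, while each state now has |∑|² = 9 outgoing transitions instead of 3.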

Multiple stride + alphabet reduction

• Stride s → Alphabet ∑^s
» ∑ = ASCII alphabet ► |∑²| = 256² = 65,536, |∑⁴| = 256⁴ ≈ 4,294M
• Effective alphabet much smaller
» Char grouping: [a-cef]a, [b-f]b
• Alphabet reduction may be necessary to make stride doubling feasible on large DFAs

[Figure: the example DFA and its 2-DFA with grouped pair labels. Processing flow: DFA → alphabet reduction (TxTable1) → stride doubling → alphabet reduction (TxTable2.1) → 2-DFA → stride doubling → alphabet reduction (TxTable4.2.1) → 4-DFA.]
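The alphabet growth can be checked with a few lines of arithmetic. The function name is hypothetical; 44 is the upper end of the reduced stride-1 alphabet sizes reported on the "Putting everything together…" slide.

```python
def stride_alphabet_size(symbols, stride):
    """Number of distinct stride-long symbol sequences over an alphabet."""
    return symbols ** stride

print(stride_alphabet_size(256, 2))  # full ASCII alphabet, stride 2
print(stride_alphabet_size(256, 4))  # full ASCII alphabet, stride 4
print(stride_alphabet_size(44, 2))   # 44 character classes, stride 2
```

Doubling the stride after alphabet reduction starts from a few thousand symbol pairs rather than 65,536, and a second reduction can shrink that set again before the next doubling.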

Multiple stride + default transitions

• Compression
» Default transitions eliminate transition redundancy
» In multiple stride DFAs
– # of states does not substantially change
– # of transitions per state increases exponentially (|∑|^stride)
► Fraction distinct/total transitions decreases
► Increased potential for compression!
• Accepting state handling
» Duplicated states have same outgoing transitions as original states but different depth
– Default transition will remove all outgoing transitions from new accepting states

[Figure: example DFA (accepting states 4/1 and 8/2) and its 2-DFA, in which duplicated accepting states (e.g. 0/1, 1/1) are added.]

Multiple stride + default transitions (cont’d)

• Problem:
» For large ∑ and stride, uncompressed DFA may be unfeasible
– Out of memory when generating a 2K node, stride 4 DFA on a Linux machine w/ 4GB memory
• Solution
» Perform default transition compression during DFA creation
– Use [Becchi et al. ANCS’07] compression algorithm
• In the situation above, only 10% memory used

[Figure: processing flow. DFA → alphabet reduction (TxTable1) → stride doubling + compression → alphabet reduction (TxTable2.1) → compressed 2-DFA → stride doubling + compression → alphabet reduction (TxTable4.2.1) → compressed 4-DFA.]

Putting everything together…

• DFA: 1-22 regex, 48-1,940 states
• DFA → alphabet reduction → |∑|=25-44 → default transition compression (96.3-98.5% transitions removed) → Compressed DFA
» Avg 1-2 labeled tx/state
• Stride-2 transformation → Stride-2 DFA → alphabet reduction → |∑|=53-470 → default transition compression (97.9-99.5% transitions removed) → Compressed Stride-2 DFA
» Same memory bandwidth requirement
» Initial size = 40X-80X final size
» Avg 3-5 labeled tx/state

NFA

1. ab+cd
2. ab+ce
3. ab+c.*f
4. b[d-f]a
5. bdc

[Figure: NFAs built from the five rules above; accepting states are labeled state/rule (e.g. 4/1, 8/2, 12/3, 15/4, 18/5).]

Multiple stride + alphabet reduction

• Stride doubling
» Avoid new state creation
» Keep multiple transitions on the same symbol separated
• Alphabet reduction:
» Clustering-based algorithm as for DFA, but sets of target states are compared

[Figure: example NFA (states 0-6, accepting states 4/1 and 6/2) and the corresponding 2-NFA with transitions on character pairs such as ab, ac, bc, cc, cd and ce.]
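A minimal sketch of stride doubling on an NFA that keeps the original state set: each pair of characters maps a state to the set of states reachable in two steps. The representation (a dictionary of character to set of target states), the example NFA and the function name are assumptions for illustration; accepting states crossed mid-pair and odd-length inputs are not handled.

```python
def double_stride_nfa(nfa):
    """Return a stride-2 NFA over character pairs, using the same states."""
    nfa2 = {s: {} for s in nfa}
    for s, row in nfa.items():
        for c1, mids in row.items():
            for mid in mids:
                for c2, targets in nfa[mid].items():
                    # Union of all states reachable on c1 followed by c2.
                    nfa2[s].setdefault(c1 + c2, set()).update(targets)
    return nfa2

# Hypothetical NFA for the two strings abc and acd (every state is a key).
nfa = {
    0: {"a": {1, 4}},
    1: {"b": {2}},
    2: {"c": {3}},
    3: {},
    4: {"c": {5}},
    5: {"d": {6}},
    6: {},
}
nfa2 = double_stride_nfa(nfa)
print(nfa2[0])
print(nfa2[1], nfa2[4])
```

For alphabet reduction on such an automaton, two pair symbols would be grouped when they lead to the same set of target states from every state, which mirrors the comparison named in the bullet above.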

FPGA implementation

• Alphabet Tx decoder: Quine-McCluskey like minimization scheme
• NFA: one-hot encoding [Sidhu & Prasanna] + logic reduction schemes

[Figure: block diagram. INPUT (k·log|∑| bits) → Alphabet Tx Decoder → |∑’| lines → NFA → MATCH, with INIT and CLK signals. Logic reduction examples on states S1-S3 with transitions on ci, ck, cm, cn, e.g. ∑-{bBcCdD}={aA} and (c1=b OR c1=B) AND NOT (c2=a OR c2=A).]
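One-hot encoding can be modeled in software with one bit per NFA state: a state's bit is set in the next cycle if any predecessor bit is set and the input character matches the connecting edge. The sketch below is a bitmask model of that idea for the hypothetical unanchored pattern ab+c; it illustrates the encoding and is not the FPGA design from the slides.

```python
# Edges as (source state, accepted characters, destination state) for ab+c.
EDGES = [(0, "a", 1), (1, "b", 2), (2, "b", 2), (2, "c", 3)]
START = 1 << 0   # state 0 stays active on every cycle (unanchored match)
ACCEPT = 1 << 3  # bit of the accepting state

def step(active, ch):
    """One clock cycle: compute the next one-hot state vector."""
    nxt = START
    for src, symbols, dst in EDGES:
        if ((active >> src) & 1) and ch in symbols:
            nxt |= 1 << dst
    return nxt

def match_ends(text):
    """Return the positions at which a match of ab+c ends."""
    active, ends = START, []
    for i, ch in enumerate(text):
        active = step(active, ch)
        if active & ACCEPT:
            ends.append(i)
    return ends

print(match_ends("xabbc"))
print(match_ends("abcabbbc"))
```

In hardware each bit corresponds to a flip-flop and each edge to an AND gate fed by the character decoder, so all states are updated in parallel in a single clock cycle.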

FPGA Results – throughput

[Figure: bar chart of throughput (Gbps, axis 0-8) per rule-set (any_99, mail_79, http_406) for three configurations: stride 1, full alp.; stride 1, alp. red.; stride 2, alp. red.]

FPGA Results – logic utilization

[Figure: bar chart of # slices (axis 0-4000) per rule-set (any_99, mail_79, http_406) for three configurations: stride 1, full alp.; stride 1, alp. red.; stride 2, alp. red. Each rule-set is annotated with its number of states (#s) and its stride-1 and stride-2 alphabet sizes (∑1, ∑2); for any_99: #s=2,086, ∑1=78, ∑2=1,969.]

• Utilization:
» 8-46% on XC5VLX50 device (7,400 slices)
» XC5VLX330 device has 51,840 slices

ASIC – projected results

• Regex partitioning into multiple DFAs

Rule-set        |Σ| (stride 1)   |Σ| (stride 2)
any (k-NFA)     78               1969
any1 (k-DFA)    59               850
any2 (k-DFA)    45               579
any3 (k-DFA)    60               627

[Table, continued: for each stride the table also lists #states and the memory footprint with compressed states and with full states. The k-NFA has 2,086 states at stride 1; the DFA state counts lie between about 14K and 102K; the footprints range from 48 KB to 299 MB.]

• Content addressing w/ 64 bit words:
– 98% states compressed w/ stride 1
– 82% states compressed w/ stride 2
• Throughput: SRAM@500 MHz
► 2-4 Gbps for stride 1
► 4-8 Gbps for stride 2
• Alternative representation: decoders in ASIC or instruction memory
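The throughput ranges follow from simple arithmetic under three assumptions that are not stated on the slide: 8-bit input characters, one SRAM access per clock cycle, and between one and two memory accesses per processed stride (the worst case of the k=1 default transition bound). The function name is hypothetical.

```python
def throughput_gbps(clock_mhz, stride, accesses_per_step):
    """Input bits processed per second, in Gbps."""
    bits_per_step = 8 * stride                        # 8-bit characters assumed
    steps_per_second = clock_mhz * 1e6 / accesses_per_step
    return steps_per_second * bits_per_step / 1e9

for stride in (1, 2):
    worst = throughput_gbps(500, stride, accesses_per_step=2)
    best = throughput_gbps(500, stride, accesses_per_step=1)
    print(f"stride {stride}: {worst:.0f}-{best:.0f} Gbps")
```

Doubling the stride doubles the bits consumed per state traversal, which is why the stride-2 range is twice the stride-1 range at the same clock frequency.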

Conclusion

• Algorithm:
» Combination of default transition compression, alphabet reduction and stride multiplying on potentially large DFAs
» Extension of alphabet reduction and stride multiplying to NFAs
• FPGA Implementation:
» Use of one-hot encoding w/ incremental improvement schemes
» Logic minimization scheme for alphabet reduction & decoding
• Additional aspects:
» Multiple flow handling: FPGA vs. memory centric architectures
» Design improvements tailored to specific architectures and data-sets:
– Clustering into smaller NFAs and DFAs to allow smaller alphabets w/ larger strides

Thank you!

• Questions?

http://regex.wustl.edu
