0% found this document useful (0 votes)
14 views10 pages

IP Lookup Module with Completion Buffer

The document describes an IP lookup module that uses a completion buffer to ensure packets exit in order. It discusses how the completion buffer is implemented as a circular buffer and explains how the methods of the completion buffer and IP lookup module must execute concurrently to avoid dead cycles.

Uploaded by

test1hj
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
14 views10 pages

IP Lookup Module with Completion Buffer

The document describes an IP lookup module that uses a completion buffer to ensure packets exit in order. It discusses how the completion buffer is implemented as a circular buffer and explains how the methods of the completion buffer and IP lookup module must execute concurrently to avoid dead cycles.

Uploaded by

test1hj
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd

IP Lookup-2:

The Completion Buffer

Arvind
Computer Science & Artificial Intelligence Lab
Massachusetts Institute of Technology

October 6, 2009 [Link] L10-1

IP-Lookup module without


the completion buffer
outQ
ge
e
enter

etResult

RAM done?

fifo

module mkIPLookup(IPLookup);
rule recirculate… ; rule exit …;
method Action enter (IP ip);
The
[Link](ip[31:16]);
packets
[Link](ip[15:0]);
may come
endmethod out of
method ActionValue#(Msg) getResult(); order
[Link]();
return [Link]();
endmethod
endmodule
October 6, 2009 [Link] L10-2

1
IP Lookup rules
rule recirculate (!isLeaf([Link]()));
IP rip = [Link](); [Link](rip << 8);
[Link]([Link]() + rip[15:8]);
[Link](); [Link]();
endrule
rule exit (isLeaf([Link]()));
[Link]([Link]()); [Link](); [Link]();
endrule

Method enter and exit can be scheduled simultaneously,


assuming [Link] and [Link] can be done simultaneously

October 6, 2009 [Link] L10-3

IP-Lookup module with the


completion buffer
getToken
outQ
getResult

yes cbuf
e
enter

RAM done?

no
fifo

Completion buffer
- ensures that departures take place in order
even if lookups complete out-of-order
- gives
i outt tokens
t k to
t control
t l the
th entry
t into
i t the
th
circular pipeline

The fifo now must also hold the “token” while the memory
access is in progress: Tuple2#(Token,Bit#(16))

remainingIP
October 6, 2009 [Link] L10-4

2
Completion buffer:
Interface
getToken getResults
cbuf

put (result & token)

interface CBuffer#(type t);


method ActionValue#(Token) getToken();
method Action put(Token tok, t d);
method ActionValue#(t) getResult();
endinterface
typedef Bit#(TLog#(n)) TokenN#(numeric type n);
typedef TokenN#(16) Token;

October 6, 2009 [Link] L10-5

Completion buffer:
Implementation
I
I
A circular buffer with two pointers i i
V
and oo, and a counter cnt
I
o V
Elements are of Maybe type cnt I
buf
module mkCBuffer (CBuffer#(t))
provisos (Bits#(t,sz));
RegFile#(Token, Maybe#(t)) buf <- mkRegFileFull();
Reg#(Token) i <- mkReg(0); //input index
Reg#(Token) o <- mkReg(0); //output index
Reg#(Int#(32)) cnt <- mkReg(0); //number of filled slots

Elements must be representable as bits

October 6, 2009 [Link] L10-6

3
Completion buffer: buf I
I

Implementation
i
V
I
o V
// state elements
// buf, i, o, n ... cnt I
method ActionValue#(t) getToken()
if (cnt < maxToken);
cnt <= cnt + 1; i <= (i== maxToken) ? 0: i + 1;
[Link](i, Invalid); Can these
return i; methods execute
endmethod concurrently?
method Action put(Token tok, t data); Does it matter?
return [Link](tok,
p Valid data);
endmethod
method ActionValue#(t) getResult()
if (cnt > 0) &&&
([Link](o) matches tagged (Valid .x));
o <= (o==maxToken) ? 0 : o + 1; cnt <= cnt - 1;
return x;
endmethod
October 6, 2009 [Link] L10-7

IP-Lookup module with the


completion buffer
getToken
gettResult

yes
y cbuf
enter

RAM done?

no
fifo

module mkIPLookup(IPLookup);
rule recirculate… ; rule exit …; for enter and
method Action enter (IP ip); getResult to
Token tok <- [Link](); execute
ram req(ip[31:16]);
[Link](ip[31:16]); simultaneously
simultaneously,
[Link](tuple2(tok,ip[15:0])); [Link]
endmethod and
method ActionValue#(Msg) getResult(); [Link]
let result <- [Link](); must execute
return result; simultaneously
endmethod
endmodule
October 6, 2009 [Link] L10-8

4
IP Lookup rules with
completion buffer
rule recirculate (!isLeaf([Link]()));
match{.tok,.rip} = [Link]();
[Link](tuple2(tok,(rip << 8)));
[Link]([Link]() + rip[15:8]);
[Link](); [Link]();
endrule

rule exit (isLeaf([Link]()));


[Link]([Link]()); [Link](); [Link]();
endrule

For rule exit and method enter to execute simultaneously,


[Link] and [Link] must execute simultaneously
⇒ For no dead cycles [Link] and [Link] and
[Link] must be able to execute simultaneously
October 6, 2009 [Link] L10-9

Completion buffer: buf I


I

Concurrency Issue
i
V
I
o V
// state elements
// buf, i, o, n ... cnt I
method ActionValue#(t) getToken()
if (cnt < maxToken);
cnt <= cnt + 1; i <= (i==maxToken) ? 0 : i + 1;
[Link](i, Invalid);
Can these methods
return i;
execute concurrently?
endmethod
method Action put(Token tok, t data);
[Link](tok,
p Valid data); NO!
endmethod
method ActionValue#(t) getResult()
if (cnt > 0) &&&
([Link](o) matches tagged (Valid .x));
o <= (o==maxToken) ? 0 : o + 1; cnt <= cnt - 1;
return x;
endmethod
October 6, 2009 [Link] L10-10

5
Concurrency Analysis
I
I
A circular buffer with two pointers i i
V
and oo, and a counter cnt
I
o V
Elements are of Maybe type cnt I
buf

buf must allow two simultaneous updates and


one read
It is possible
bl to d
design such
habbuff because
b the
h
updates are always to different addresses
because of the way buf is used
„ Use a vector of registers
No compiler can detect that
October 6, 2009 [Link] L10-11

Adding Concurrent methods


to the completion buffer

Assume: only put tokens given to


the completion buffer

Can we reach a state where two


methods write on the same data?
„ No.

6
Bypass Registers
For concurrency,
concurrency we want to write
a register logically earlier in the
cycle where it’s read
For correctness we must bypass
the written value to the reads

Build a BypassRegister from RWire


„ write < read
September 29, 2009 [Link] L06-13

Configuration Registers
Sometimes the higher-level semantics
say that the bypass is not necessary
„ The bypass condition never arises
In such cases, an ordinary register
behaves like a bypass register
„ But the concurrency analysis prohibits the
bypass orderings
Configuration Registers are ordinary
registers with CF reads and writes
„ Allows the bypass ordering

October 6, 2009 [Link] L10-14

7
Concurrent Completion
buffer: Implementation
I
I
A circular buffer with two pointers i i
V
and oo, and a counter cnt
I
o V
Elements are of Maybe type cnt I
buf
module mkCBuffer (CBuffer#(t))
provisos (Bits#(t,sz));
Vector#(TokenSz, Maybe#(t)) buf <-
replicateM(mkConfigReg(Invalid));
Reg#(Token) i <- mkReg(0); //input index
Reg#(Token) o <- mkReg(0); //output index
Counter#(32) cnt <- mkCounter(0); //number of filled slots

October 6, 2009 [Link] L10-15

A special counter module


We often need to keep count of certain
events
„ Need to read count, decrement and
increment
„ Since decrementing and incrementing don’t
change the count we can remove some
bypassing links
„ Implemented as Counter Library modules
(implemented using Rwires)

October 6, 2009 [Link] L10-16

8
Completion buffer: buf I
I

Concurrency Issue
i
V
I
o V
// state elements
// buf, i, o, n ... cnt I
method ActionValue#(t) getToken()
if ([Link]() < maxToken);
[Link](); i <= (i==maxToken) ? 0 : i + 1;
buf[i] <= Invalid; Can these methods
return i; execute concurrently?
endmethod
method Action put(Token tok, t data);
buf[tok] <= Valid data; Yes!
endmethod
method ActionValue#(t) getResult()
if ([Link]() > 0) &&&
(buf[o] matches tagged Valid .x);
o <= (o==maxToken) ? 0 : o + 1; [Link]();
return x;
endmethod
October 6, 2009 [Link] L10-17

Longest Prefix Match for IP lookup:


3 possible implementation architectures

Rigid pipeline Linear pipeline Circular pipeline

Inefficient memory Efficient memory Efficient memory


usage but simple usage through with most complex
design memory port control
replicator

Which is “best”?
Arvind,
October 6, Nikhil,
2009 Rosenband[Link]
& Dave ICCAD 2004 L10-18

9
Implementations of Static pipelines
Two designers, two results
LPM versions Best Area Best Speed
(gates) (ns)
S i V (Replicated
Static ( li d FSMs)
S ) 8898 3 60
3.60

Static V (Single FSM) 2271 3.56

IP addr
Replicated:
BEST: MUX
IP addr result
MUX / De-MUX
result

Each packet
is processed FSM FSM FSM FSM
by one FSM
Shared
Counter MUX / De-MUX FSM FSM

RAM
RAM
October 6, 2009 [Link] L10-19

Synthesis results
LPM Code Best Area Best Speed Mem. util.
versions size (gates) (ns) (random
(lines) workload)
Static V 220 2271 3.56 63.5%

Static BSV 179 2391 (5% larger) 3.32 (7% faster) 63.5%

Linear V 410 14759 4.7 99.9%

Linear BSV 168 15910 (8% larger) 4.7 (same) 99.9%

Circular V 364 8103 3.62 99.9%

Circular BSV 257 8170 (1% larger) 3.67 (2% slower) 99.9%

S
Synthesis:
h TSMC
S C 0.18
0 8 µm lib
lb
- Bluespec results can match carefully coded Verilog
- Micro-architecture has a dramatic impact on performance
- Architecture differences are much more important than
language differences in determining QoR

V = Verilog;BSV
October 6, 2009 = Bluespec [Link]
System Verilog L10-20

10

You might also like