Advanced VLSI Design Self Timed Pipelines

Synchronous systems
• Logical ordering of events is set by the clock, which provides a time base.
• Physical timing constraint: the next clock edge comes only when all blocks have reached steady state.
• Problem: a CLB has to wait for the clock even though it may finish earlier.
• Clock distribution network

Wave pipeline systems

Implementation of pipeline
• Synchronous pipeline: robust design. Delay = N × worst-case stage delay.
• Wave pipeline: low latency. Here overall latency = sum of stage delays. There are no registers in between, but delays through all paths of the combinational logic must be matched.
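The two latency figures above can be compared with a small calculation. This is only a sketch: the function names and the stage delays are hypothetical values chosen for illustration, not figures from the slides.

```python
def synchronous_latency(stage_delays):
    """Clocked pipeline: every one of the N stages is given the worst-case delay."""
    return len(stage_delays) * max(stage_delays)


def wave_latency(stage_delays):
    """Wave pipeline: no registers, so latency is the sum of the stage delays."""
    return sum(stage_delays)


# Hypothetical stage delays in nanoseconds (illustrative only).
stage_delays_ns = [2.0, 5.0, 3.0]

print(synchronous_latency(stage_delays_ns))  # 15.0
print(wave_latency(stage_delays_ns))         # 10.0
```

The gap between the two numbers grows as the stage delays become more unequal, which is the motivation for letting each stage signal its own completion.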

Self timed pipelines
• In between the two approaches
• Best attributes of both:
• safety and tolerances of the synchronous pipeline,
• low overall latency and increased throughput of the wave pipeline.
• Data token transfer can happen exactly when data computations complete, so logical functions will be correct for any actual gate delays, thereby achieving speed independence.

Asynchronous design: meeting constraints
• Advantage: the next block can start computation as soon as the previous block has finished.
• Problem: when to latch the output? When is the output a correct value?
• Remedy: the system has to meet timing constraints.

Local signals
• Logical ordering and physical timing:
• START, DONE: physical timing
• REQUEST, ACKNOWLEDGE: logical ordering

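Self timed system
• The system generates its own timing signals.
[Figure: self-timed pipeline In → R1 → F1 → R2 → F2 → R3 → F3 → Out. Each register has a handshake (HS) block that exchanges Req/Ack with its neighbours and Start/Done with its function block; the function blocks have delays tpF1, tpF2, tpF3.]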

Self timed system: handshake protocol
• Handshaking: synchronize by mutual agreement
• Advantages:
• No clock routing: timing signals are generated locally, giving less propagation delay and high speed.
• Power saving.
• Robust regarding manufacturing and operating conditions.
• Synchronous circuits are guided by extreme conditions, whereas asynchronous circuits are guided by actual conditions.
• Disadvantage: handshaking circuit design

Encoding aperiodic events
• For events that are not periodic, an explicit transition is required to start every symbol.
• A binary event cannot be encoded on a single binary signal, for at any point in time there are three possibilities:
• continue sending the current symbol,
• start the next symbol with the same value,
• and start the next symbol with the complement value.
• Either a ternary signal or two binary signals are required to encode a binary event.

Dual rail signaling
• A binary event stream A is encoded as a pair of unary event streams A1 and A0.
• A transition on line A1 signals that a new 1 symbol has started on A,
• a transition on line A0 signals that a new 0 symbol has started on A, and
• no transition implies that the current symbol continues.
• Dual-rail signaling should not be confused with differential signaling, as A0 and A1 are not complements.
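The encoding rules above can be sketched in a few lines. The example assumes transition (non-return-to-zero) signaling on both lines and a discrete time step; the function names and the use of `None` for "current symbol continues" are illustrative choices, not part of the slides.

```python
def encode_dual_rail(events):
    """Encode a binary event stream as levels on lines A1 and A0.

    events -- list of 1, 0 or None (None means the current symbol continues).
    Returns the (a1, a0) line levels after each time step.
    """
    a1 = a0 = 0
    levels = []
    for event in events:
        if event == 1:
            a1 ^= 1          # a transition on A1 starts a new 1 symbol
        elif event == 0:
            a0 ^= 1          # a transition on A0 starts a new 0 symbol
        levels.append((a1, a0))
    return levels


def decode_dual_rail(levels):
    """Recover the event stream by comparing each step with the previous one."""
    previous = (0, 0)
    events = []
    for current in levels:
        if current[0] != previous[0]:
            events.append(1)
        elif current[1] != previous[1]:
            events.append(0)
        else:
            events.append(None)   # no transition: symbol continues
        previous = current
    return events


stream = [1, 1, None, 0, 1, None, 0]
levels = encode_dual_rail(stream)
print(levels)
print(decode_dual_rail(levels) == stream)  # True
```

Two consecutive 1 symbols are distinguishable because each one produces its own transition on A1, which a single binary line could not express.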

Types of dual rail signaling
• Dual rail Return-to-Zero (RZ): with return-to-zero signaling, the unary event lines return to zero after every event, and only the positive transitions are significant. Return-to-zero dual-rail signaling is the timing convention employed by dual-rail domino logic.

• Dual rail Nonreturn-to-Zero (NRZ): nonreturn-to-zero signaling requires remembering the old state of both lines to decode the present value.

• Clocked signaling and bundling: with clocked signaling, one line, the clock, encodes a unary event stream that denotes the start of each new symbol, while the other line carries the value of the symbol. The clock can either be NRZ, denoting a new symbol on each transition, or RZ, where only the positive transitions are meaningful.

• The primary advantage of separating the timing information from the symbol values is that it permits bundling of several values with a single event stream. • If N signals are bundled together and associated with a single clock line, then the simultaneous events associated with the N signals can be encoded on N + 1 lines instead of the 2N that would be required if timing information were unbundled.

• Ternary signaling: a binary event stream could be encoded on a single ternary logic signal, using either NRZ or RZ ternary signaling.

2 phase protocol
• SIMPLE AND FAST
• TRANSITION SIGNALLING or EVENT LOGIC
• DATA TRANSFER HAPPENS AT BOTH THE EDGES (RISING, FALLING)
• NO RESET STATE

Implementation of HS protocol: 2 phase (no return to zero)
[Figure: (a) Sender-receiver configuration: SENDER and RECEIVER connected by Data, Req and Ack lines. (b) Timing diagram of Data, Req and Ack over cycle 1 and cycle 2, with numbered events marking the sender's action and the receiver's action.]

2 phase protocol (NRTZ)
• Data change → request → data acceptance → acknowledge → data change → request → data acceptance → acknowledge
• Events proceed in cyclic order. Successive cycles may take different times.
• 2 active cycles:
• Sender's cycle: terminated by the request event (no change in o/p possible after it)
• Receiver's cycle: terminated by the acknowledge event
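The cyclic event order above can be traced with a minimal software model. It is a sketch under simplifying assumptions: sender and receiver are run in strict alternation inside one loop, and the function name and trace format are hypothetical.

```python
def two_phase_transfer(items):
    """Transfer items with a 2-phase (transition-signaling) handshake.

    Every transition on req or ack is an event, whatever its direction,
    so each item costs exactly two transitions.
    """
    req = ack = 0
    received = []
    trace = []
    for item in items:
        data = item                 # sender: data change
        req ^= 1                    # sender: request event (toggle)
        trace.append(("req", req))
        assert req != ack           # a request is pending, data must stay stable
        received.append(data)       # receiver: data acceptance
        ack ^= 1                    # receiver: acknowledge event (toggle)
        trace.append(("ack", ack))
        assert req == ack           # handshake complete, sender may change data
    return received, trace


received, trace = two_phase_transfer(["a", "b", "c"])
print(received)  # ['a', 'b', 'c']
print(trace)     # [('req', 1), ('ack', 1), ('req', 0), ('ack', 0), ('req', 1), ('ack', 1)]
```

Note that the second item is transferred on falling edges of req and ack: there is no reset state between transfers.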

4 phase protocol (RTZ)
[Figure: timing diagram of Data, Req and Ack over cycle 1 and cycle 2, with events numbered 1 to 5 marking the sender's action and the receiver's action.]
• Level sensitive: data transfer at only one (positive) edge

4 phase protocol
• Data change → request → data acceptance → acknowledge → reset → data change → request → data acceptance → acknowledge
• Proceeds in cyclic order
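For comparison with transition signaling, the 4-phase sequence can be traced in the same style. This is a sketch with a hypothetical function name; the reset step is modelled as req and ack both returning to zero before the next data change.

```python
def four_phase_transfer(items):
    """Transfer items with a 4-phase (return-to-zero) handshake.

    Data is transferred only on the rising edge of req; both wires
    return to zero before the next item, so each item costs four transitions.
    """
    received = []
    trace = []
    for item in items:
        data = item                  # sender: data change
        trace.append(("req", 1))     # sender: request (level high)
        received.append(data)        # receiver: data acceptance
        trace.append(("ack", 1))     # receiver: acknowledge
        trace.append(("req", 0))     # reset: sender withdraws the request
        trace.append(("ack", 0))     # reset: receiver withdraws the acknowledge
    return received, trace


received, trace = four_phase_transfer(["a", "b"])
print(received)    # ['a', 'b']
print(len(trace))  # 8
```

Four wire transitions per item, against two for the 2-phase protocol, is the time and power overhead of returning to zero.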

4-phase vs 2-phase
• 4-phase: “return to zero” (RZ), “level signaling”. The return to zero is overhead (time and power).
• 2-phase: “non-return to zero” (NRZ), “transition signaling”. It seems to have lower overhead,
• but its implementation is more complex.

Bundled data / single rail protocol
• Req should be issued when the data output is stable.

Bundled Data Protocol
• Two control wires associated with each n-bit data channel
• (n + 2) wires for each n-bit channel
[Figure: Producer and Consumer connected by Rdy, Ack and a 32-bit Data bundle.]
• Micropipelines use the 2-phase bundled data protocol.

Dual rail protocol
• 1 bit of information is coded using two wires.
• Request/done is merged with the data wires.

Dual-Rail Protocol
• Two wires for each bit of data (a 0-wire and a 1-wire)
• Plus an ack wire for each data channel
• (2n + 1) wires for each n-bit channel
• Redundant encoding realizes a QDI (quasi-delay-insensitive) circuit
[Figure: a 4-bit dual-rail channel: the SENDER drives bits B0 to B3 to the RECEIVER, which returns Ack.]
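The wire counts of the two channel styles can be tabulated directly from the formulas on the slides. The function names below are hypothetical; only the n + 2 and 2n + 1 expressions come from the text.

```python
def bundled_data_wires(n_bits):
    """Bundled data: n data wires plus one request and one acknowledge wire."""
    return n_bits + 2


def dual_rail_wires(n_bits):
    """Dual rail: a 0-wire and a 1-wire per bit plus one acknowledge wire."""
    return 2 * n_bits + 1


for n in (1, 4, 32):
    print(n, bundled_data_wires(n), dual_rail_wires(n))
# 1 3 3
# 4 6 9
# 32 34 65
```

For a single bit the two styles cost the same; the dual-rail overhead approaches a factor of two as the channel gets wider.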

Self timed pipelines implementation

Asynchronous pipeline example
• The simplest asynchronous pipeline, in which the LOGIC blocks are just wires, acts as a first-in first-out (FIFO) buffer.
• Here data is clocked into the align stages by the input request signal and clocked out by the output acknowledge signal.

Possible combinations
• 2-phase single rail (event transition, req. separate)
• 4-phase single rail (level transition, req. separate)
• 2-phase dual rail (event transition, req. merged with data)
• 4-phase dual rail (level transition, req. merged with data)

Circuit implementations

Event Logic: The Muller-C Element
[Figure: (a) schematic symbol with inputs A, B and output F; (b) truth table, reproduced here.]
A  B  F(n+1)
0  0  0
0  1  F(n)
1  0  F(n)
1  1  1
[Figure: implementations of the C-element: (a) logic, built around an S-R flip-flop; (b) majority function; (c) dynamic.]
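The truth table can be captured in a small behavioural model. This is a sketch, not a circuit description: the class name and method are hypothetical, and the majority form F(n+1) = AB + F(n)(A + B) is written out so that it can be checked against the table.

```python
class MullerC:
    """Behavioural model of a two-input Muller C-element."""

    def __init__(self, initial=0):
        self.out = initial

    def update(self, a, b):
        """Output follows the inputs when they agree, otherwise it holds."""
        if a == b:
            self.out = a
        return self.out


def majority_form(a, b, f_prev):
    """Same function written as a majority gate with feedback."""
    return (a & b) | (f_prev & (a | b))


c = MullerC()
print(c.update(0, 1))  # 0  (inputs differ: hold)
print(c.update(1, 1))  # 1  (both high: output rises)
print(c.update(1, 0))  # 1  (inputs differ: hold)
print(c.update(0, 0))  # 0  (both low: output falls)
```

The hold behaviour is what makes the element usable as an event AND: the output event occurs only after an event has arrived on both inputs.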

2 phase

Muller C pipeline

2-phase single rail FIFO
• Transition signaling
• Special “capture-pass” latches alternate between capture and pass.
• First pass data to next, then hold, then capture next data token.
[Figure: three-stage FIFO. A chain of C-elements carries the REQ/ACK handshake and drives the capture (C) and pass (P) inputs of each LATCH: c1/p1, c2/p2, c3/p3.]

Operation details

Rin = Req

Operation
• req → 1
• c1 → 1: latch 1 gets the data latched but cannot pass it on to latch 2.
• c1 → 1, c2 → 1, p1 → 1: latch 2 gets the data, latch 1 in hold.
• c2 → 1, c3 → 1, p2 → 1: latch 3 gets the data, latch 2 in hold, latch 1 ready for new data.

Operation
• req → 0
• c1 → 0: latch 1 gets the data latched but cannot pass it on to latch 2.
• c1 → 0, c2 → 0, p1 → 0: latch 2 gets the data, latch 1 in hold.
• c2 → 0, c3 → 0, p2 → 0: latch 3 gets the data, latch 2 in hold, latch 1 ready for new data.

Operation
• Hence we get a sequence of eval → hold → eval → hold ...
• No reset state in between
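The rippling of c1, c2, c3 described in the operation slides can be reproduced with a small relaxation model of the C-element chain. This is a sketch under stated assumptions: each C-element is assumed to take the previous stage's output and the inverted output of the next stage as inputs (the usual Muller pipeline wiring), gate delays are ignored, and the names `c_element`, `settle` and `ack_out` are hypothetical.

```python
def c_element(a, b, prev):
    """Muller C-element: output follows the inputs when they agree, else holds."""
    return a if a == b else prev


def settle(req, c, ack_out):
    """Re-evaluate every C-element until no output changes.

    req     -- level of the request wire at the FIFO input
    c       -- current C-element outputs [c1, c2, ...]
    ack_out -- level of the acknowledge wire returned by the consumer
    """
    c = list(c)
    changed = True
    while changed:
        changed = False
        for i in range(len(c)):
            left = req if i == 0 else c[i - 1]
            right = ack_out if i == len(c) - 1 else c[i + 1]
            new = c_element(left, 1 - right, c[i])   # right input is inverted
            if new != c[i]:
                c[i], changed = new, True
    return c


state = settle(1, [0, 0, 0], ack_out=0)   # req -> 1
print(state)                              # [1, 1, 1]
state = settle(0, state, ack_out=1)       # consumer has acknowledged, req -> 0
print(state)                              # [0, 0, 0]
```

Each direction of the request carries one data token through all three stages, which matches the eval → hold sequence with no reset state in between.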

Capture-Pass transition-controlled latch
• Transitions on C and P alternate
• Micropipelines are “elegant”: no RZ overhead
• But implementation (latches and other control circuits) is complex

Micropipelines with processing: 2 stage

4 stage pipeline

Event controlled latch design

Alternative implementation: event controlled switch/latch

Delay element design

Mousetrap pipelines
• These are examples of 2 phase single rail pipelines
• No Muller-C element
• Stage disables itself

MOUSETRAP Pipelines
Simple asynchronous implementation style, uses…
– standard logic implementation: Boolean gates, transparent latches
– simple control: 1 gate/pipeline stage
MOUSETRAP uses a “capture protocol”. Latches…
– are normally transparent: before new data arrives
– become opaque: after data arrives (“capture” data)
Control signaling: transition-signaling = 2-phase
– simple protocol: req/ack = only 2 events per handshake (not 4)
– no “return-to-zero”
– each transition (up/down) signals a distinct operation
Our goal: very fast cycle time
– simple inter-stage communication
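The capture protocol can be illustrated with a behavioural model of one stage. The slides only say that control is one gate per stage; the sketch below assumes that gate is an XNOR of the stage's own done signal and the acknowledge from the next stage, as in the published MOUSETRAP design, and that the request is latched along with the data. The class and method names are hypothetical.

```python
class MousetrapStage:
    """Behavioural model of one MOUSETRAP stage (assumed XNOR latch control)."""

    def __init__(self):
        self.data = None
        self.done = 0      # request to the next stage, acknowledge to the previous

    def enable(self, ack_from_next):
        """Latch is transparent while done and the incoming ack agree (XNOR)."""
        return int(self.done == ack_from_next)

    def step(self, data_in, req_in, ack_from_next):
        """Pass data and request through the latch if it is transparent."""
        if self.enable(ack_from_next):
            self.data = data_in
            self.done = req_in
        return self.data, self.done


stage = MousetrapStage()
print(stage.enable(0))           # 1: transparent before new data arrives
print(stage.step("d1", 1, 0))    # ('d1', 1): data captured, done toggles
print(stage.enable(0))           # 0: the stage has disabled itself
print(stage.step("d2", 0, 0))    # ('d1', 1): opaque, new data is held off
print(stage.step("d2", 0, 1))    # ('d2', 0): next stage acknowledged, latch reopens
```

The stage closes its own latch as soon as the request passes through it, which is the sense in which a stage "disables itself".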

Mousetrap pipelines
• These are also examples of 2 phase single rail (bundled data) pipelines

MOUSETRAP: A Basic FIFO
Stages communicate using transition-signaling: 1 transition per data item!
[Figure: Stage N-1, Stage N and Stage N+1. Each stage has a data latch (Data in → Data out) and a latch controller that drives the latch enable En; reqN enters stage N, doneN leaves it as reqN+1, and ackN-1 and ackN run backwards. Annotations mark the 1st and 2nd data items flowing through the pipeline.]

Important points for 2-phase single rail asynchronous design

Drawback of single rail circuits
• A matched delay element needs to be introduced in between latches to synchronize request and data arrival.

2 PHASE DUAL RAIL FIFO

2 phase dual rail FIFO
• The Done signal acts as Req.
• Monotonic dual rail signals need to be generated.

Done/Completion signal generation: no glitches, using redundant signal encoding
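How a redundant encoding yields a completion signal can be sketched as follows. The example assumes a return-to-zero dual-rail code in which each bit is a (t, f) pair, (0, 0) is the empty state and exactly one rail is high for valid data; the function names are hypothetical.

```python
def is_complete(word):
    """Done: every bit of the word has exactly one of its two rails asserted."""
    return all(t != f for t, f in word)


def is_empty(word):
    """All rails low: the word is in its reset (spacer) state."""
    return all(t == 0 and f == 0 for t, f in word)


def decode(word):
    """Convert a complete dual-rail word to ordinary bits."""
    if not is_complete(word):
        raise ValueError("word is not complete yet")
    return [1 if t else 0 for t, f in word]


print(is_complete([(1, 0), (0, 0)]))  # False: second bit still computing
print(is_complete([(1, 0), (0, 1)]))  # True
print(decode([(1, 0), (0, 1)]))       # [1, 0]
```

Because each rail only rises once between empty and complete, the done signal makes a single clean transition rather than glitching with intermediate values.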

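Completion Signal in DCVSL
[Figure: DCVSL gate with two pull-down networks (PDN) driven by In1, In2 and their complements, precharged from VDD under control of Start. Done is derived from the outputs B0 and B1, and event logic generates Start from Req.]

2 phase

2-Phase dual rail FIFO
[Figure: signals Req1, Ack1, start1, start2, start3, Done (D1)/req2, req3 and data, with edge-triggered registers.]

2 registers are required for event logic
[Figure: two edge-triggered registers.]

Latch implementation using C²MOS
[Figure: two latches in series from In, clocked by Start and Start’.]

Event latch implementation using TG [Start is delayed Req]
• Valid data is issued before Req.
[Figure: transmission-gate latch from DATA IN to OUT, controlled by the Req and START signals.]

Signal arrival
[Figure: timing diagram with signals REQ1, done2 and start1; the label "reduce" appears on the diagram.]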

Operation
• Req1 → 1
• start1 → 1: latch1 gets the data, D1/req2 → 1
• start2 → 1: latch2 gets the data, latch1 in hold, D2/req3 → 1
• req3/Ack1 → 1, start3 → 1: latch3 gets the data
• Req1 can go 0

Operation
• Req1 → 0
• start1 → 0: latch1 gets the data, D1/req2 → 0
• start2 → 0: latch2 gets the data, latch1 in hold, D2/req3 → 0
• req3/Ack1 → 0, start3 → 0: latch3 gets the data
• Req1 can go 1

Logic implementation
[Figure: DCVSL gate with two pull-down networks (PDN) driven by In1, In2 and their complements, precharged from VDD under control of Start and Start’; Done is derived from the outputs B0 and B1.]

4 phase RZ

4 phase: requires pre-charge evaluate logic

4-phase bundled data circuits: FIFO

Operation
• Initially all nodes ‘0’
• Req → 1
• en1 → 1, en2 → 1, en3 → 1, ... with some delay in between
• Latch1 data → latch 2
• Latch2 data → latch 3
• Latch3 data → latch 4, ...
• When req → 0
• en1 → 0, en2 → 0, en3 → 0, ...
• Latch1, latch 2, latch 3 all in hold mode
• Operation similar to synchronous pipeline

4-phase bundled data circuits
[Figure: top, a FIFO in which a chain of C-elements carries REQ/ACK and drives the enable EN of each LATCH; bottom, the same pipeline with a FUNC. BLOCK before each LATCH and a matching DELAY in the request path before each C-element.]

4-Phase bundled data FIFO
• Req must be a pulse signal.

4-phase bundled data circuits
• Looks like a sync pipe, with local clocks
• When full, the C-elements are 1010101…, so only half the latches store data
• Similar to master-slave flip-flops
• Speed limited by handshake (2-way comm.)
• Better implementations available
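The alternating pattern in a full pipeline can be reproduced with a relaxation model of the C-element chain. This is a sketch under stated assumptions: each C-element sees the previous stage's output and the inverted output of the next stage, the consumer never acknowledges (its ack stays at 0), gate delays are ignored, and the names `c_element`, `step` and `fill` are hypothetical.

```python
def c_element(a, b, prev):
    """Muller C-element: output follows the inputs when they agree, else holds."""
    return a if a == b else prev


def step(req, c, ack_out=0):
    """Apply a request level and let the C-element chain settle."""
    c = list(c)
    changed = True
    while changed:
        changed = False
        for i in range(len(c)):
            left = req if i == 0 else c[i - 1]
            right = ack_out if i == len(c) - 1 else c[i + 1]
            new = c_element(left, 1 - right, c[i])   # right input is inverted
            if new != c[i]:
                c[i], changed = new, True
    return c


def fill(n_stages, n_handshakes):
    """Push tokens into a FIFO whose consumer is stalled."""
    c = [0] * n_stages
    for _ in range(n_handshakes):
        c = step(1, c)   # Req rises: a data token enters
        c = step(0, c)   # Req returns to zero: an empty token follows
    return c


print(fill(4, 2))  # [0, 1, 0, 1]
print(fill(4, 5))  # [0, 1, 0, 1]  (already full: further requests are not accepted)
```

With four stages only two data tokens fit, because every data token must be separated from the next by a return-to-zero token; a real sender would simply stall on the third request.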

Dual rail circuits: dual Muller pipeline, implementation 1

4-phase dual rail FIFO: implementation 1
[Figure: two chains of C-elements, one carrying d.t and one carrying d.f, sharing the ACK signals.]
• Two Muller pipelines in parallel, using a common Ack signal per stage to synchronize
• Muller pipeline (again) with completion detection
• No REQ: it is embedded in the data

4-phase dual rail FIFO: many bits
[Figure: one chain of C-elements per rail, carrying d[0].t, d[0].f, d[1].t and d[1].f, with shared ACK signals.]

Drawback
• The no. of Muller C elements increases with the no. of data bits.

4 phase dual rail FIFO: 2nd implementation

Operation
• R1 should be disabled when start begins.
• When done1 is issued, req should go low.
• When R2 is enabled, Start1 goes 0 and F1 precharges.

Signal arrival
[Figure: timing diagram with signals REQ1, En.1, start1 and done1; the label "reduce" appears on the diagram.]

Adv.
• No such drawback, as the no. of Muller C elements does not increase with the no. of bits.

Implementation of latch: always pre-charge evaluate logic
• Precharge for req = 0, en = 0 (reset phase)

Implementation 3: 4-phase post charge with reset from successor, using Muller C
[Figure: signals Ack, Done/Req, start and data.]

Latency of this pipeline
• Lf → forward latency
• Delay from new valid data outputs at one stage to new valid data outputs from the following stage
• Lr → reverse latency
• Delay from the acknowledgement of a stage’s output to the acknowledgement of its predecessor’s output
• Because data tokens and reset stages alternate, we take the average of the two delays.

Latency
• td → delay of done block
• tc → delay of Muller C
• tf → delay of function block
[Figure: a stage annotated with tc, tf and td.]

Latency
• Lf = td + tc + tf
• Lr = ½ [tc↓ + tf↓ + td↑ + tc↑ + tf↑ + td↓]
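The two latency expressions can be evaluated for a given set of gate delays. This is a sketch: the function names are hypothetical and the delay values are illustrative numbers, not measurements from the slides. Rising and falling delays are kept separate in the reverse latency, as in the formula.

```python
def forward_latency(td, tc, tf):
    """Lf = td + tc + tf."""
    return td + tc + tf


def reverse_latency(tc_up, tc_down, tf_up, tf_down, td_up, td_down):
    """Lr = 1/2 [tc(down) + tf(down) + td(up) + tc(up) + tf(up) + td(down)]."""
    return 0.5 * (tc_down + tf_down + td_up + tc_up + tf_up + td_down)


# Hypothetical delays in nanoseconds (illustrative only).
print(forward_latency(td=0.5, tc=1.0, tf=2.0))            # 3.5
print(reverse_latency(tc_up=1.0, tc_down=1.0,
                      tf_up=2.0, tf_down=1.0,
                      td_up=0.5, td_down=0.5))            # 3.0
```

The factor of one half reflects the averaging over one data token and one reset phase.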

4 phase dual rail FIFO (ex3): post charge logic
• Post charge with reset from successor: a different way of generating control signals, other than the Muller C element

HS control signal implementation I
• Do not delay evaluation until successor has finished resetting.

Latency of this pipeline
• Lf = td + tc + tf
• Lr = ½ [tc↑ + tf↓ + td↓ + tc↑]

HS control signal implementation II: mix of single rail and dual rail, Req a separate wire
• Do not delay evaluation until predecessor has finished evaluation.

• Lf = tf
• Lr = ½ [tc↑ + tc↓ + tf↓ + td]

Latency

4 phase dual rail FIFO (ex4): self-resetting logic using an XNOR gate, speed advantage
• Post-charge logic with reset from own output
[Figure: Req feeds three precharged logic blocks (L1, L2, L3), each followed by gate completion detection; an inset shows a gate with VDD, inputs A, B, C, internal node int and output out.]

Latency

4 phase protocol

4-Phase bundled data protocol: FIFO

4-Phase dual rail protocol: FIFO