Professional Documents
Culture Documents
Retiming: Lan-Da Van (范倫達), Ph. D. Department of Computer Science National Chiao Tung University Taiwan, R.O.C
Retiming: Lan-Da Van (范倫達), Ph. D. Department of Computer Science National Chiao Tung University Taiwan, R.O.C
Retiming
Lan-Da Van (), Ph. D. Department of Computer Science National Chiao Tung University Taiwan, R.O.C. Fall, 2010
ldvan@cs.nctu.edu.tw http://www.cs.nctu.tw/~ldvan/
Outlines
Introduction
Definitions and Properties Solving Systems of Inequalities Retiming Techniques Conclusion Supplement
Lan-Da Van
VLSI-DSP-4-2
Introduction (1/2)
Retiming is a transformation technique used to change the locations of delay elements in a circuit without affecting the input/output characteristics of the circuit. Critical path is defined to be the path with the longest computation time among all paths that contain zero delay. The lower bound on the clock period of the circuit can be achieved by retiming.
Lan-Da Van
VLSI-DSP-4-3
Introduction (2/2)
Application of Retiming
Reduce the clock period. Reduce the number of registers. Reduce the power consumption and logic synthesis.
(1) (1)
x(n)
W(n) (1)
y(n)
D D
(2) a b (2)
x(n)
D
(1)
y(n)
D
W1(n) (2) a b
2D
2D
33% reduction
D
W2(n)
(2)
3 u.t.
w(n)=ay(n-1)+by(n-2) y(n)=w(n-1)+x(n) =ay(n-2)+by(n-3)+x(n)
Lan-Da Van
2 u.t.
w1(n)=ay(n-1) w2(n)=by(n-2) y(n)=w1(n-1)+w2(n-1)+x(n) =ay(n-2)+by(n-3)+x(n)
VLSI-DSP-4-4
Outlines
Introduction Definitions and Properties Solving Systems of Inequalities
Retiming Techniques
Conclusion Supplement
Lan-Da Van
VLSI-DSP-4-5
Definitions
(1)
G
(1)
(2)
(1)
Gr
(1)
1 D 2 D D
(2)
wr(3
3 2D
2)=w(3
2)+r(2)-r(3)
=0+1-0=1 wr(2
e
1)=w(2
1)+r(1)-r(2)
(2)
=1+0-1=0
Lan-Da Van VLSI-DSP-4-6
Properties of Retiming
4.2.1 The weight of the retimed path p=V0
given by wr(p)=w(p)+r(Vk)-r(V0).
e0
V1 e1
ek-1
Vk is
Retiming does not change the number of delays in a cycle. Retiming does not alter the iteration bound in a DFG.
Adding the constant value j to the retiming value of each node does not change the mapping from G to Gr.
Lan-Da Van
VLSI-DSP-4-7
Property 4.2.1
The weight of the retimed path p=V0 given by wr(p)=w(p)+r(Vk)-r(V0). Proof
e0
V1 e1
ek-1
Vk is
wr ( p ) wr (e)
i 0
k 1
k 1
k 1
k 1
k 1
w( p) r (Vk ) r (V0 )
Lan-Da Van VLSI-DSP-4-8
Outlines
Introduction Definitions and Properties Solving Systems of Inequalities
Retiming Techniques
Conclusion Supplement
Lan-Da Van
VLSI-DSP-4-9
Lan-Da Van
VLSI-DSP-4-10
r4 - r1 4
r4 - r3 -1 r3 - r2 2 Step 1 :
0 0 0
Floyd-Warshall algorithm
5 5 4 0 2 1 -1 0 0 0 -1
2
5
R(6) =
1
4
3
-1
R(1) =
5 4 0 2 0 -1 0 0 0 0
5 4 0 2 1 0 -1 0 0 0 -1
0
0
R(2) =
0 0 0
0 0 0
0 0
0 0
5 4 2 1 0 -1 0 0 -1
5 2 0 0 4 1 -1 0 -1
0
0
R(3) =
0 0 0 0 0 0 0 0
5 2 0 0 5 2 0
4 1 -1 0 -1 4 1 -1 -1
R(4) =
R(5) =
R(6) =
ri,j(m+1)=min{ri,k(m)+rk,j(1)}
Lan-Da Van VLSI-DSP-4-12
Outlines
Introduction
Definitions and Properties Solving Systems of Inequalities
Retiming Techniques
Conclusion Supplement
Lan-Da Van
VLSI-DSP-4-13
Step 1: Construct 2 disjointed sub-graphs G1 and G2 Step 2: Add k delays to each edge from G1 to G2 and remove k delays from each edge from G2 to G1. wr(G1->G2)=w(G1->G2)+r(G2)-r(G1)=w(G1->G2)+k wr(G2->G1)=w(G2->G1)+r(G1)-r(G2)=w(G2->G1)-k => -min{w(G1->G2)} k min{w(G2->G1)}
Lan-Da Van
VLSI-DSP-4-14
Cutset = {2->1, 3->2, 1->4} and feasible solution: 0 k 1 Cutset = {2->1, 3->2, 4->2} and feasible solution: 0 k 1
Lower case
1
D D 3 4 2D 2 1
G1
D 3 4
k=1
2
3 4
3D
G2
1 D 2 D 3 4 2D 2 3 4 1 D
G1 k=1
2D
(1)
1
D
(1) 2
(2)
3 4
2D
G2
Lan-Da Van
(2) VLSI-DSP-4-15
Pipelining
Pipelining is a special case of cutset retiming where there are no edges in the cutset from G2 to G1, i.e., pipelining applies to graphs without loops.
Lan-Da Van
VLSI-DSP-4-16
Replace each delay with N delays to create an N-slow version of the DFG. Perform cutset retiming on the N-slow DFG.
N=2
In (a) , CP=101 2
Tcp=101*1+2*2=105
In (c) , CP = 2 2 Tcp=2*1+2*2=6
Lan-Da Van VLSI-DSP-4-17
Systolic Transformation
z 1
z 1
z 1
ai , 2
S z 1
bi , 2
ai , 2
S z 1
bi , 2
ai ,1
S z 1
bi ,1
ai ,1
bi ,1
z 1
bi ,0
z 1 ai , 0
ai , 0
bi ,0
Source: L. D. Van, "A new 2-D systolic digital filter architecture without global broadcast," IEEE Trans. VLSI Systs., vol. 10, pp. 477-486, Aug. 2002.
Lan-Da Van VLSI-DSP-4-18
Retiming Relationship
Systolic Transformation
Lan-Da Van
VLSI-DSP-4-19
(G) = max{t(p) : w(p)=0} (min. feasible clk period) W(U,V) = min{w(p) : U e V} D(U,V) = max{t(p) : U e V and w(p) = W(U,V)}
Lan-Da Van
VLSI-DSP-4-20
W(U,V) and D(U,V) are used to determine if there is a retiming solution that can achieve a desired clock period. Given a desired clock period c, if :
1. feasibility constraint : r(U)-r(V)w(e) for every edge e :UV 2. critical path constraint : r(U)-r(V)W(U,V)-1 for all vertices U,V in G such that D(U,V)>c.
Lan-Da Van
VLSI-DSP-4-21
1 D
(1) 2
D
(2)
3 4
2D
(2)
G
(1)
(1)
1
7 2 7 -2 -2
(2)
3 4 (2)
15
2 5 12 -2 -2
3 7 14 12 12
4 15 22 20 20
D(U,V) 1 2 3 4
1 2 3 4
1 2 4 4
4 1 3 3
3 4 2 6
3 4 6 2
VLSI-DSP-4-22
Feasibility constraint:
Lan-Da Van
VLSI-DSP-4-23
r(1)=r(2)=r(3)=r(4)=0
(1) 1 D (1) 2
D
(2) 2D 3
(2)
VLSI-DSP-4-24
Lan-Da Van
Constraint Graph
0
0 1 1 0 0 2 0 -1 1 3 1 2 0 0 0 5
1 -1 4
Lan-Da Van
VLSI-DSP-4-25
2D
(2)
VLSI-DSP-4-26
Lan-Da Van
Conclusion
Reduction of clock period, number of registers, and power consumption by retiming for synchronous systems Shortest path algorithms can be used to obtain a retiming solution if one exists. Retiming can also be used as a preprocessing step for folding and for computation of round-off noise as discussed in Chap 6 and 11, respectively.
Lan-Da Van
VLSI-DSP-4-27
Lan-Da Van
VLSI-DSP-4-29
Example
U=2 ( node 2 )
1
-3 2 1 2 1 4
3
2
Lan-Da Van
VLSI-DSP-4-31
Lan-Da Van
VLSI-DSP-4-32
Example
R(1) -3 1 R(3) -3 1 -2
2 2
R(2) 1
R(4) 1
-3 -2
-3
1 2 2 -2 -1 1 2 2 -1 0
-3
2 1 2 3 2
1 4
-2 -1 1 2 2 -1 0
R(5) 0 -3 -2 -1 3 0 1 2 3 0 1 2 1 -2 -1 0