Folding: Lan-Da Van (范倫達), Ph. D. Department of Computer Science National Chiao Tung University Taiwan, R.O.C
Folding
ldvan@cs.nctu.edu.tw
http://www.cs.nctu.tw/~ldvan/
VLSI Digital Signal Processing Systems
Outline
Introduction
Folding Transformation
Register Minimization Techniques
Register Minimization in Folded Architecture
Conclusions
Introduction (1/2)
Folding is a transformation that systematically determines the
control circuits of a DSP architecture in which multiple
algorithm operations are time-multiplexed onto a single
functional unit.
It is used to synthesize DSP architectures that operate with
a single clock or with multiple clocks.
It reduces the number of hardware functional units (FUs)
by a factor of N at the expense of increasing the
computation time by a factor of N.
A folded architecture may use a large number of registers,
so register minimization techniques are presented as well.
Introduction (2/2)
Outline
Introduction
Folding Transformation
Register Minimization Techniques
Register Minimization in Folded Architecture
Conclusions
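Figure: folding an edge U → V that carries w(e) delays, with folding factor N.
Original DFG: iteration U(l) feeds iteration V(l + w(e)).
Folded architecture: H_U executes U(l) at time Nl + u, H_V executes V(l + w(e)) at time N(l + w(e)) + v, and the path from H_U to H_V spans P_U + D_F(U→V) time units.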
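The timing relationship in the folding figure can be turned into a small calculation. The following is a minimal sketch, assuming the folding equation D_F(U→V) = N w(e) - P_U + v - u used later in these slides; the function name and parameter names are illustrative and do not come from the source.

```python
def folding_delay(N, w, P_U, u, v):
    """Delays needed on the folded edge H_U -> H_V (may come out negative)."""
    # Taking iteration l = 0: U starts at time u and its result is ready P_U cycles later.
    result_ready = u + P_U
    # The consuming iteration of V is w iterations later and starts at N*w + v.
    consumed_at = N * w + v
    return consumed_at - result_ready


# Edge 1 -> 2 with N = 4, w(e) = 1, P_U = 1, u = 3, v = 1 (numbers from the slides' example)
print(folding_delay(N=4, w=1, P_U=1, u=3, v=1))  # 1
```

A negative return value means the consumer would need the result before it exists, which is the case that retiming is meant to remove.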
Retiming (1/3)
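What happens if the folding equation gives a negative $D_F$?
A negative $D_F(U \to V)$ would require a negative number of delays on
the folded edge, which cannot be implemented.
Remedy: retime the original DFG (move delay elements)
prior to folding.
Constraint on every edge $e$ from $U$ to $V$ of the retimed DFG:
$D'_F(U \to V) = N w_r(e) - P_U + v - u \ge 0$    (1)
Substituting $w_r(e) = w(e) + r(V) - r(U)$ into (1) gives
$D_F(U \to V) + N (r(V) - r(U)) \ge 0$, that is,
$r(U) - r(V) \le D_F(U \to V) / N$
Because retiming values are integers, the bound is $\lfloor D_F(U \to V) / N \rfloor$.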
Retiming (2/3)
Example:
$D_F(1 \to 2) = N w(e) - P_U + v - u = 0 - 1 + 1 - 3 = -3$
$r(1) - r(2) \le \lfloor D_F(1 \to 2) / N \rfloor = \lfloor -3/4 \rfloor = -1$
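The bound in this example is easy to reproduce in code. This is a small sketch, assuming N = 4 (the value implied by the floor of -3/4 in the example); `retiming_bound` is a hypothetical helper name, not something defined in the slides.

```python
def retiming_bound(N, w, P_U, u, v):
    """Largest allowed value of r(U) - r(V) for the edge U -> V."""
    d_f = N * w - P_U + v - u      # folding equation before retiming
    # Python's // rounds toward negative infinity, so it is a true floor
    # even for negative operands.
    return d_f // N


# Edge 1 -> 2 of the example: w(e) = 0, P_U = 1, u = 3, v = 1
print(retiming_bound(N=4, w=0, P_U=1, u=3, v=1))  # -1
```

Truncating division (as in C) would give 0 here instead of -1, which would leave the folded edge with a negative delay, so the floor matters.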
Retiming (3/3)
r(1)=-1, r(2)=0,
r(3)=-1, r(4)=0
r(5)=-1, r(6)=-1,
r(7)=-2, r(8)=-1
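These retiming values can be checked against the edges that the slides list as invalid before retiming. The sketch below assumes N = 4 and uses only those four edges with their listed folding-equation values (-3, -4, -3, -3); it does not check the remaining edges of the DFG, and the variable names are illustrative.

```python
N = 4
r = {1: -1, 2: 0, 3: -1, 4: 0, 5: -1, 6: -1, 7: -2, 8: -1}

# D_F(U -> V) before retiming for the four edges listed as invalid
d_f = {(1, 2): -3, (6, 4): -4, (8, 4): -3, (7, 3): -3}

retimed = {}
for (U, V), d in d_f.items():
    # Retiming constraint: r(U) - r(V) <= floor(D_F / N)
    if r[U] - r[V] > d // N:
        raise ValueError(f"constraint violated on edge {U}->{V}")
    # Retiming changes the edge weight by r(V) - r(U), i.e. D_F by N times that
    retimed[(U, V)] = d + N * (r[V] - r[U])

print(retimed)  # {(1, 2): 1, (6, 4): 0, (8, 4): 1, (7, 3): 1}
```

All four edges end up with non-negative delays, so the folding becomes valid on these edges.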
Outline
Introduction
Folding Transformation
Register Minimization Techniques
Register Minimization in Folded Architecture
Conclusions
Lifetime Analysis
Lifetime analysis is a procedure used to compute the
minimum number of registers required to implement a
DSP algorithm in hardware.
Linear lifetime analysis
Circular lifetime analysis
In lifetime analysis, the number of live variables at
each time unit is computed, and the maximum
number of live variables at any time unit is
determined.
Forward-backward register allocation technique
Example: 3×3 matrix transpose
Input matrix    Transposed output
a b c           a d g
d e f           b e h
g h i           c f i
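Lifetime analysis can be illustrated on this transposer. The sketch below rests on several assumptions that are not stated on the slide: samples arrive one per cycle in row order (a, b, c, ...), leave in row order of the transposed matrix (a, d, g, ...), the latency is the smallest one that makes every lifetime non-negative, and a variable counts as live from the cycle after it is produced through the cycle in which it is consumed.

```python
rows = ["abc", "def", "ghi"]
n = len(rows)

# Input time: row by row, one sample per cycle
t_input = {s: t for t, s in enumerate("".join(rows))}
# Zero-latency output time: column by column of the input matrix
out_order = "".join(rows[i][j] for j in range(n) for i in range(n))
t_zlout = {s: t for t, s in enumerate(out_order)}

# Smallest latency for which no sample is output before it is input
latency = max(t_input[s] - t_zlout[s] for s in t_input)
t_output = {s: t_zlout[s] + latency for s in t_input}


def max_live(t_in, t_out, period=None):
    """Maximum number of live variables; wraps modulo `period` if given."""
    live = {}
    for s in t_in:
        for t in range(t_in[s] + 1, t_out[s] + 1):
            slot = t if period is None else t % period
            live[slot] = live.get(slot, 0) + 1
    return max(live.values(), default=0)


print(latency)                                    # 4
print(max_live(t_input, t_output))                # 4 (linear)
print(max_live(t_input, t_output, period=n * n))  # 4 (circular, period 9)
```

Under these assumptions the linear and circular analyses agree, and four registers are enough for the transposer.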
Procedures of Forward-Backward
Register Allocation
Steps:
Step 1: Determine the minimum number of registers
using lifetime analysis.
Step 2: Input each variable at the time step corresponding to the
beginning of its lifetime.
Step 3: Allocate each variable in a forward manner
until it is dead or it reaches the last register.
Step 4: Since the allocation is periodic, the allocation of
the current iteration repeats itself in subsequent
iterations. Thus, a register that is occupied at time step l is
marked (hashed out) as occupied at time step l + N, where N is the period.
Step 5: If a variable reaches the last register and is
still alive, allocate it to a register
in a backward manner.
Step 6: Repeat Steps 4 and 5 as required until the
allocation is completed.
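The forward part of the procedure (Steps 2 and 3) can be sketched in a few lines. This is only a partial illustration: it handles a single iteration, does not implement the periodic marking of Step 4 or the backward allocation of Steps 5 and 6, and merely reports which variables would still need backward allocation. The lifetimes are assumed values for a 3×3 transposer with row-order input, column-order output, and a latency of 4; `forward_pass` is a hypothetical name.

```python
def forward_pass(lifetimes, num_regs):
    """lifetimes: {name: (t_input, t_output)}.

    Returns (schedule, leftovers). schedule maps (cycle, register number)
    to the variable held there during forward allocation; leftovers maps a
    variable to the cycles that remain to be covered by backward allocation.
    """
    schedule, leftovers = {}, {}
    for name, (t_in, t_out) in lifetimes.items():
        # In its k-th live cycle the variable sits in register R_k
        for k in range(1, t_out - t_in + 1):
            cycle = t_in + k
            if k <= num_regs:
                if (cycle, k) in schedule:
                    raise ValueError(f"R{k} already occupied at cycle {cycle}")
                schedule[(cycle, k)] = name
            else:
                leftovers.setdefault(name, []).append(cycle)
    return schedule, leftovers


# Assumed lifetimes (t_input, t_output) for the 3x3 transposer
lifetimes = {"a": (0, 4), "b": (1, 7), "c": (2, 10), "d": (3, 5), "e": (4, 8),
             "f": (5, 11), "g": (6, 6), "h": (7, 9), "i": (8, 12)}

schedule, leftovers = forward_pass(lifetimes, num_regs=4)
print(leftovers)  # {'b': [6, 7], 'c': [7, 8, 9, 10], 'f': [10, 11]}
```

Variables b, c, and f outlive the four-register chain, so they are the ones that Steps 5 and 6 would place into free registers in a backward manner.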
Outline
Introduction
Folding Transformation
Register Minimization Techniques
Register Minimization in Folded Architecture
Conclusions
Invalid folding:
DF(1→2) = -3
DF(6→4) = -4
DF(8→4) = -3
DF(7→3) = -3
$D_F(U \to V) = N w(e) - P_U + v - u$
$T_{input} = u + P_U$
$T_{output} = u + P_U + \max_V \{ D_F(U \to V) \}$
DF(1→2) = 4(1) – 1 + 1 – 3 = 1
DF(1→5) = 4(1) – 1 + 0 – 3 = 0
DF(1→6) = 4(1) – 1 + 2 – 3 = 2
DF(1→7) = 4(1) – 1 + 3 – 3 = 3
DF(1→8) = 4(2) – 1 + 1 – 3 = 5
DF(3→1) = 4(0) – 1 + 3 – 2 = 0
DF(4→2) = 4(0) – 1 + 1 – 0 = 0
DF(5→3) = 4(0) – 2 + 2 – 0 = 0
DF(6→4) = 4(1) – 2 + 0 – 2 = 0
DF(7→3) = 4(1) – 2 + 2 – 3 = 1
DF(8→4) = 4(1) – 2 + 0 – 1 = 1
Folding Factor = 4
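The folding equations and the resulting register count for this example can be recomputed mechanically. In the sketch below, the pipeline depths, folding orders, and retimed edge weights are read off the expansions listed on the slide; the circular liveness convention (a variable is live from the cycle after T_input through T_output, taken modulo N) is an assumption, and all variable names are illustrative.

```python
N = 4
P = {1: 1, 3: 1, 4: 1, 5: 2, 6: 2, 7: 2, 8: 2}            # P_U of each source node
order = {1: 3, 2: 1, 3: 2, 4: 0, 5: 0, 6: 2, 7: 3, 8: 1}  # folding order of each node
# (U, V): edge weight w(e) after retiming
edges = {(1, 2): 1, (1, 5): 1, (1, 6): 1, (1, 7): 1, (1, 8): 2,
         (3, 1): 0, (4, 2): 0, (5, 3): 0, (6, 4): 1, (7, 3): 1, (8, 4): 1}

# Folding equation for every edge
d_f = {(U, V): N * w - P[U] + order[V] - order[U] for (U, V), w in edges.items()}

# Lifetime of the variable produced by each source node
lifetimes = {}
for U in P:
    delays = [d for (src, _), d in d_f.items() if src == U]
    t_in = order[U] + P[U]
    lifetimes[U] = (t_in, t_in + max(delays))

# Circular lifetime analysis with period N
live = [0] * N
for t_in, t_out in lifetimes.values():
    for t in range(t_in + 1, t_out + 1):
        live[t % N] += 1

print(min(d_f.values()))  # 0, so no folded edge has a negative delay
print(lifetimes[1])       # (4, 9)
print(max(live))          # 2
```

Under the stated convention, at most two variables are live in any cycle, so two registers suffice for this folded architecture.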
Retiming
Invalid folding:
DF(3→1) = -3
DF(4→1) = -2
$D_F(U \to V) = N w(e) - P_U + v - u$
$T_{input} = u + P_U$
$T_{output} = u + P_U + \max_V \{ D_F(U \to V) \}$
DF(1→2) = 0
DF(2→3) = 5
DF(2→4) = 2
DF(3→1) = 1
DF(4→1) = 0
Folding Factor = 2
Conclusions
Presented folding as a systematic transformation for deriving
time-multiplexed architectures
Explored folding techniques to reduce the number of functional
units
Explored register minimization techniques to reduce the number
of registers
References
K. K. Parhi, VLSI Digital Signal Processing Systems:
Design and Implementation, Wiley, 1999.
S. Y. Huang, Handout of text book, 2004.