
VLSI Digital Signal Processing Systems

Folding

Lan-Da Van (范倫達), Ph. D.


Department of Computer Science
National Chiao Tung University
Taiwan, R.O.C.
Fall, 2010

ldvan@cs.nctu.edu.tw

http://www.cs.nctu.tw/~ldvan/

Outline
Introduction
Folding Transformation
Register Minimization Techniques
Register Minimization in Folded Architecture
Conclusions

Lan-Da Van VLSI-DSP-6-2



Introduction (1/2)
The folding transformation systematically determines the control circuits of a DSP architecture in which multiple algorithm operations are time-multiplexed onto a single functional unit.
It is used to synthesize DSP architectures that operate with a single clock or with multiple clocks.
It reduces the number of hardware functional units (FUs) by a factor of N at the expense of increasing the computation time by a factor of N.
Folding can lead to an architecture that uses a large number of registers, so register minimization techniques are also presented.


Introduction (2/2)


Folding Transformation (1/3)


Folding is a systematic technique for designing control circuits for hardware in which several algorithm operations are time-multiplexed on a single functional unit.
Notation
  - U, V: nodes (operations) of the original DFG
  - HU, HV: nodes (functional units) of the folded DFG
  - W(x): the x-th iteration of node W
  - U → V (edge e): an edge e from node U to node V
  - w(e): number of delays on edge e
Folding factor N
  - The number of operations that share one FU
Folding set
  - An ordered set of operations that are executed by the same FU
  - The position of an operation U in its folding set is the folding order of U
  - Folding sets are typically obtained from a scheduling and allocation algorithm (ref. Appendix B)
  - The folding sets define the underlying folding transformation


Folding Transformation (2/3)


PU: number of pipeline stages of HU. PU = 0 indicates that HU is not pipelined.
DF(U → V) (folding equation): number of cycles for which the result of HU must be stored

$$D_F(U \xrightarrow{e} V) = [N(l + w(e)) + v] - [Nl + P_U + u] = N\,w(e) - P_U + v - u$$

where u and v are the folding orders of U and V.
A folding equation can yield a negative DF before the DFG is retimed for folding.
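
The folding equation can be evaluated directly. The following is a minimal Python sketch; the function name `folding_delay` and its argument names are illustrative assumptions, not part of the slides. It is applied to edge 1 → 2 of the biquad filter before retiming, using the values from the retiming example (N = 4, w(e) = 0, PU = 1, u = 3, v = 1).

```python
def folding_delay(N, w, P_u, u, v):
    """Return DF(U -> V) = N*w(e) - PU + v - u."""
    return N * w - P_u + v - u


# Edge 1 -> 2 of the biquad filter before retiming.
d = folding_delay(N=4, w=0, P_u=1, u=3, v=1)
print(d)  # -3, a negative delay, so the DFG must be retimed before folding
```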


Folding Transformation (3/3)

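[Figure: an edge U(l) → V(l + w(e)) with w(e) delays in the original DFG maps, after N-folding, to a path from HU, executing at cycle Nl + u, through PU pipeline stages and DF delays, to HV, executing at cycle N(l + w(e)) + v.]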


Folding Retimed Biquad Filter (1/2)


Folding factor N = 4
Folding sets S1 = {4, 2, 3, 1} and S2 = {5, 8, 6, 7}, where S1 contains all addition operations and S2 contains all multiplication operations.
Assume that
  - addition and multiplication require 1 and 2 units of time (u.t.), respectively;
  - 1-stage pipelined adders and 2-stage pipelined multipliers are available.
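
The folding order of an operation is its position in its folding set, so an operation with folding order u runs on its functional unit at cycles Nl + u. A short Python sketch of that lookup follows; the helper name `folding_orders` is an illustrative assumption.

```python
def folding_orders(folding_set):
    """Map each operation to its position (folding order) in the folding set."""
    return {op: position for position, op in enumerate(folding_set)}


N = 4
S1 = [4, 2, 3, 1]  # additions, executed on the adder
S2 = [5, 8, 6, 7]  # multiplications, executed on the multiplier

order = {**folding_orders(S1), **folding_orders(S2)}
print(order[1], order[2])  # 3 1

# Cycles at which node 1 executes in iterations l = 0, 1, 2.
print([N * l + order[1] for l in range(3)])  # [3, 7, 11]
```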


Folding Retimed Biquad Filter (2/2)


folding equations


Retiming (1/3)
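What happens if a folding equation DF is negative? A negative DF would require a negative number of delays, which cannot be implemented.
Remedy: retime the original DFG (move its delay elements) prior to folding.
Constraint: every folding equation of the retimed DFG must be nonnegative,

$$D'_F(U \xrightarrow{e} V) = N\,w_r(e) - P_U + v - u \ge 0 \qquad (1)$$

Substituting $w_r(e) = w(e) + r(V) - r(U)$ into (1) gives $D_F(U \xrightarrow{e} V) + N\,(r(V) - r(U)) \ge 0$, that is,

$$r(U) - r(V) \le \frac{D_F(U \xrightarrow{e} V)}{N}$$

Since the retiming values of the nodes are restricted to integers, this can be rewritten as

$$r(U) - r(V) \le \left\lfloor \frac{D_F(U \xrightarrow{e} V)}{N} \right\rfloor$$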


Retiming (2/3)
Example:
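Edge 1 → 2 of the biquad filter, with N = 4:

$$D_F(1 \to 2) = N\,w(e) - P_U + v - u = 4(0) - 1 + 1 - 3 = -3$$

$$r(1) - r(2) \le \left\lfloor \frac{D_F(1 \to 2)}{N} \right\rfloor = \left\lfloor \frac{-3}{4} \right\rfloor = -1$$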


Retiming (3/3)

r(1)=-1, r(2)=0,
r(3)=-1, r(4)=0

r(5)=-1, r(6)=-1,
r(7)=-2, r(8)=-1
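
As a quick check, these retiming values can be tested against the constraint r(U) - r(V) <= floor(DF(U→V)/N) for the four edges whose folding equations are negative before retiming (the DF values are the ones listed as invalid in the biquad filter example). This Python sketch uses only values stated in the slides; the variable names are illustrative. Python's `//` operator floors toward negative infinity, which matches the floor in the constraint.

```python
N = 4

# Retiming values r(node) from the retiming solution.
r = {1: -1, 2: 0, 3: -1, 4: 0, 5: -1, 6: -1, 7: -2, 8: -1}

# DF(U -> V) before retiming for the edges with negative folding equations.
negative_edges = {(1, 2): -3, (6, 4): -4, (8, 4): -3, (7, 3): -3}

retimed = {}
for (u, v), d_f in negative_edges.items():
    assert r[u] - r[v] <= d_f // N               # retiming-for-folding constraint
    retimed[(u, v)] = d_f + N * (r[v] - r[u])    # D'F = DF + N (r(V) - r(U))

print(retimed)  # {(1, 2): 1, (6, 4): 0, (8, 4): 1, (7, 3): 1}
```

All four retimed folding equations are nonnegative, so the retimed DFG can be folded.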


Lifetime Analysis
Lifetime analysis is a procedure used to compute the minimum number of registers required to implement a DSP algorithm in hardware.
  - Linear lifetime analysis
  - Circular lifetime analysis
In lifetime analysis, the number of live variables at each time unit is computed; the maximum of these counts over all time units is the minimum number of registers required.
The forward-backward register allocation technique then assigns the variables to that minimum number of registers.


Linear Lifetime Analysis

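[Figure: linear lifetime chart for the variables {a, b, c}; periodicity is implicit. The numbers of live variables per time unit give max{0, 1, 2, 2, 2, 2, 2, 2} = 2. The chart shows three iterations with N = 6.]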



Matrix Transpose Example (1/3)

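The input matrix and its transpose:

a b c        a d g
d e f   →    b e h
g h i        c f i

[Figure: the samples enter the matrix transposer in the order a, b, c, d, e, f, g, h, i and leave it in the order a, d, g, b, e, h, c, f, i.]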


Matrix Transpose Example (2/3)


Tzlout = zero-latency output time
Tdiff = Tzlout - Tinput
Toutput = Tzlout + max{-Tdiff}
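
The lifetime table for the 3 × 3 transposer can be derived mechanically from these definitions. The Python sketch below assumes that one sample enters per time unit in the order a, b, ..., i starting at time 0, and that the zero-latency output times follow the transposed order a, d, g, b, e, h, c, f, i. The variable names are illustrative.

```python
inputs = "abcdefghi"    # arrival order, one sample per time unit from t = 0
outputs = "adgbehcfi"   # order required at the output of the transposer

t_input = {s: t for t, s in enumerate(inputs)}
t_zlout = {s: t for t, s in enumerate(outputs)}
t_diff = {s: t_zlout[s] - t_input[s] for s in inputs}

# Smallest latency that makes every output causal: max{-Tdiff}.
latency = max(-d for d in t_diff.values())
t_output = {s: t_zlout[s] + latency for s in inputs}

print(latency)  # 4
for s in inputs:
    # sample, Tinput, Tzlout, Tdiff, lifetime Tinput->Toutput
    print(s, t_input[s], t_zlout[s], t_diff[s], f"{t_input[s]}->{t_output[s]}")
```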


Matrix Transpose Example (3/3)


[Figure: linear lifetime chart (left) and circular lifetime chart (right) for the matrix transposer.]

The minimum number of registers is 4.
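
The count of 4 can be reproduced with a small circular lifetime analysis. In this Python sketch the lifetimes are those that follow from the Tinput and Toutput definitions when samples a through i arrive at times 0 through 8; the period N = 9 (one 3 × 3 matrix per iteration) is an assumption. The sketch also assumes the convention that a variable is not live in the cycle in which it is produced but is live in the cycle in which it is consumed. The function name `min_registers` is illustrative.

```python
def min_registers(lifetimes, period):
    """Maximum number of live variables in any cycle of a periodic schedule.

    lifetimes maps a variable to (t_input, t_output); the variable is taken
    to be live in cycles t_input + 1 .. t_output, folded modulo the period.
    """
    live = [0] * period
    for t_in, t_out in lifetimes.values():
        for t in range(t_in + 1, t_out + 1):
            live[t % period] += 1
    return max(live)


transpose = {"a": (0, 4), "b": (1, 7), "c": (2, 10), "d": (3, 5), "e": (4, 8),
             "f": (5, 11), "g": (6, 6), "h": (7, 9), "i": (8, 12)}
print(min_registers(transpose, period=9))  # 4
```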


Procedures of Forward-Backward Register Allocation
Steps:
Step 1: Determine the minimum number of registers using lifetime analysis.
Step 2: Input each variable at the time step corresponding to the beginning of its lifetime.
Step 3: Allocate each variable in a forward manner until it is dead or it reaches the last register.
Step 4: Since the allocation is periodic, the allocation of the current iteration repeats itself in subsequent iterations. Thus, hash out the occupied register positions at intervals of the period N.
Step 5: If a variable reaches the last register and is still alive, allocate it to a register in a backward manner.
Step 6: Repeat Steps 4 and 5 as required until the allocation is complete.
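
The steps above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the definitive procedure: the function name, the data layout, and the tie-break in the backward step (take the lowest-numbered register whose forward path is free) are all assumptions, and the greedy search is not guaranteed to succeed for every lifetime table. The lifetimes are those of the 3 × 3 matrix transposer with period 9, where a variable needs a register in cycles Tinput + 1 through Toutput.

```python
def forward_backward_allocate(lifetimes, period, num_regs):
    """Return {variable: {cycle: register index}} for a periodic schedule.

    lifetimes maps a variable to (t_input, t_output); it needs a register in
    cycles t_input + 1 .. t_output. Registers are numbered from 0.
    """
    busy = set()                      # occupied (cycle mod period, register) slots
    alloc = {name: {} for name in lifetimes}

    def place(name, t, reg):
        slot = (t % period, reg)      # Step 4: a slot repeats every period
        if slot in busy:
            raise ValueError(f"slot conflict for {name} at cycle {t}")
        busy.add(slot)
        alloc[name][t] = reg

    # Steps 2-3: enter at register 0 and shift forward one register per cycle.
    pending = []                      # (cycle, variable) still alive past the last register
    for name, (t_in, t_out) in sorted(lifetimes.items(), key=lambda kv: kv[1][0]):
        t = t_in + 1
        for reg in range(num_regs):
            if t > t_out:
                break
            place(name, t, reg)
            t += 1
        if t <= t_out:
            pending.append((t, name))

    # Steps 5-6: move survivors back to a free register, first come first served.
    pending.sort()
    while pending:
        t, name = pending.pop(0)
        t_out = lifetimes[name][1]
        for reg in range(num_regs):
            run = min(t_out - t + 1, num_regs - reg)   # forward steps possible from reg
            if all(((t + i) % period, reg + i) not in busy for i in range(run)):
                for i in range(run):
                    place(name, t + i, reg + i)
                if t + run <= t_out:                   # still alive: move back again later
                    pending.append((t + run, name))
                    pending.sort()
                break
        else:
            raise ValueError(f"no free register for {name} at cycle {t}")
    return alloc


transpose = {"a": (0, 4), "b": (1, 7), "c": (2, 10), "d": (3, 5), "e": (4, 8),
             "f": (5, 11), "g": (6, 6), "h": (7, 9), "i": (8, 12)}
alloc = forward_backward_allocate(transpose, period=9, num_regs=4)
print(alloc["b"])  # {2: 0, 3: 1, 4: 2, 5: 3, 6: 2, 7: 3}
```

In this run variable b shifts through all four registers in cycles 2 to 5, is moved back to the third register in cycle 6, and leaves from the last register in cycle 7.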

Register Allocation for Matrix Transpose Example


Procedures of Register Minimization in Folded Architectures
Steps:
Step 1: Perform retiming for folding
Step 2: Write the folding equations
Step 3: Use the folding equations to construct a
lifetime table
Step 4: Draw the lifetime chart and determine the
required number of registers
Step 5: Perform forward-backward register
allocation
Step 6: Draw the folded architecture that uses the
minimum number of registers


Folding Architecture Example


Folded Architecture for Matrix Transpose Example


Biquad Filter Example (1/4)


Step 1: Retiming

Folding equations that are negative (invalid) before retiming:
DF(1→2) = -3
DF(6→4) = -4
DF(8→4) = -3
DF(7→3) = -3


Biquad Filter Example (2/4)

Step 2: Folding equations

DF(U→V) = N w(e) - PU + v - u

DF(1→2) = 4(1) - 1 + 1 - 3 = 1
DF(1→5) = 4(1) - 1 + 0 - 3 = 0
DF(1→6) = 4(1) - 1 + 2 - 3 = 2
DF(1→7) = 4(1) - 1 + 3 - 3 = 3
DF(1→8) = 4(2) - 1 + 1 - 3 = 5
DF(3→1) = 4(0) - 1 + 3 - 2 = 0
DF(4→2) = 4(0) - 1 + 1 - 0 = 0
DF(5→3) = 4(0) - 2 + 2 - 0 = 0
DF(6→4) = 4(1) - 2 + 0 - 2 = 0
DF(7→3) = 4(1) - 2 + 2 - 3 = 1
DF(8→4) = 4(1) - 2 + 0 - 1 = 1

Step 3: Construct the lifetime table

Tinput = u + PU
Toutput = u + PU + max over V of DF(U→V)
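
Steps 2 and 3 can be reproduced programmatically. The Python sketch below recomputes the folding equations of the retimed biquad filter and derives the lifetime table from them. The edge delays w(e), pipeline depths, and folding orders are read off the folding equations and folding sets of this example; the data layout and names are illustrative assumptions.

```python
N = 4
order = {4: 0, 2: 1, 3: 2, 1: 3, 5: 0, 8: 1, 6: 2, 7: 3}   # folding orders from S1, S2
stages = {n: 1 for n in (1, 2, 3, 4)}                       # adders: PU = 1
stages.update({n: 2 for n in (5, 6, 7, 8)})                 # multipliers: PU = 2

# Edges (U, V) of the retimed DFG with their delay counts w(e).
edges = {(1, 2): 1, (1, 5): 1, (1, 6): 1, (1, 7): 1, (1, 8): 2, (3, 1): 0,
         (4, 2): 0, (5, 3): 0, (6, 4): 1, (7, 3): 1, (8, 4): 1}

# Step 2: DF(U->V) = N*w(e) - PU + v - u
df = {(u, v): N * w - stages[u] + order[v] - order[u] for (u, v), w in edges.items()}

# Step 3: Tinput = u + PU, Toutput = u + PU + max over V of DF(U->V)
lifetime = {}
for node in sorted({u for u, _ in edges}):
    t_in = order[node] + stages[node]
    t_out = t_in + max(d for (u, _), d in df.items() if u == node)
    lifetime[node] = (t_in, t_out)

for node, (t_in, t_out) in lifetime.items():
    print(f"node {node}: {t_in} -> {t_out}")
```

Node 2 has no outgoing edge in this list, so it does not appear in the table.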


Biquad Filter Example (3/4)

Step 4: Draw the lifetime chart
Step 5: Register allocation

Folding factor = 4

The minimum number of registers is 2.
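
The register count can be checked by counting live variables modulo the folding factor. In this Python sketch the lifetimes are those obtained from Tinput = u + PU and Toutput = u + PU + max DF for the retimed biquad filter. The liveness convention (live in cycles Tinput + 1 through Toutput) and the names are assumptions.

```python
from collections import Counter

N = 4
# node: (Tinput, Toutput) for the retimed biquad filter
lifetime = {1: (4, 9), 3: (3, 3), 4: (1, 1), 5: (2, 2), 6: (4, 4), 7: (5, 6), 8: (3, 4)}

live = Counter()
for t_in, t_out in lifetime.values():
    for t in range(t_in + 1, t_out + 1):
        live[t % N] += 1          # lifetimes wrap around with period N

print(max(live.values()))  # 2
```

Only nodes 1, 7, and 8 produce values that must be stored; the output of node 1 lives for 5 cycles, longer than the period, so it overlaps with its own next iteration.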


Biquad Filter Example (4/4)

Step 6: Folded Architecture


IIR Filter Example (1/4)


Step 1: Retiming

Folding equations that are negative (invalid) before retiming:
DF(3→1) = -3
DF(4→1) = -2

IIR Filter Example (2/4)

Step 2: Folding equations

DF(U→V) = N w(e) - PU + v - u

Step 3: Construct the lifetime table

Tinput = u + PU
Toutput = u + PU + max over V of DF(U→V)

Folding equations for this example:

DF(1→2) = 4(1) – 1 + 1 – 3 = 0
DF(2→3) = 4(1) – 1 + 0 – 3 = 5
DF(2→4) = 4(1) – 1 + 2 – 3 = 2
DF(3→1) = 4(1) – 1 + 3 – 3 = 1
DF(4→1) = 4(2) – 1 + 1 – 3 = 0


IIR Filter Example (3/4)

Step 4: Draw the lifetime chart
Step 5: Register allocation

Folding factor = 2

The minimum number of registers is 3.


IIR Filter Example (4/4)


Step 6: Folded Architecture


Conclusions
Folding was presented as a systematic transformation for designing time-multiplexed architectures.
Folding techniques reduce the number of functional units.
Register minimization techniques reduce the number of registers.


References
K. K. Parhi, VLSI Digital Signal Processing Systems:
Design and Implementation, Wiley, 1999.
S. Y. Huang, Handout of text book, 2004.
