Folding: Lan-Da Van (范倫達), Ph. D. Department of Computer Science National Chiao Tung University Taiwan, R.O.C

VLSI Digital Signal Processing Systems
Folding
Lan-Da Van (范倫達), Ph. D.

Department of Computer Science
National Chiao Tung University
Taiwan, R.O.C.
Fall, 2010
ldvan@cs.nctu.edu.tw
http://www.cs.nctu.tw/~ldvan/
Outline
Introduction
Folding Transformation
Register Minimization Techniques
Register Minimization in Folded Architecture
Conclusions
Lan-Da Van VLSI-DSP-6-2

Introduction (1/2)
Systematically determine the control circuits in DSP architectures by the folding transformation, in which multiple algorithm operations are time-multiplexed onto a single functional unit.
Used to synthesize DSP architectures that operate with a single clock or with multiple clocks.
Used to reduce the number of hardware functional units (FUs) by a factor of N at the expense of increasing the computation time by a factor of N.
Folding can lead to an architecture that uses a large number of registers; the register minimization technique is therefore presented.

Introduction (2/2)


Folding Transformation (1/3)

A systematic technique for designing control circuits for hardware in which several algorithm operations are time-multiplexed on a single functional unit.
Notations
- U, V: nodes (operations) of the original DFG
- HU, HV: nodes (functional units) of the folded DFG
- W(x): x-th iteration of node W
- U →e V: an edge e from node U to node V
- w(e): # of delays on the edge e
- Folding factor N: # of operations that share one FU
Folding set
- An ordered set of operations executed by the same FU
- The position of an operation U in its folding set is the folding order u of U, i.e., the time partition (0 ≤ u ≤ N − 1) in which U is executed
- Folding sets are typically obtained from a scheduling and allocation algorithm (ref. Appendix B)
- The folding sets define the underlying folding transformation


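PU: # of pipeline stages of HU. PU = 0 indicates that HU is not pipelined.
DF(U →e V) (folding equation): # of delays on the folded edge from HU to HV, i.e., # of cycles for which the result of HU must be stored before HV uses it. The l-th iteration of U executes on HU at time Nl + u, its result is available PU cycles later, and the (l + w(e))-th iteration of V executes on HV at time N(l + w(e)) + v, so
$$D_F(U \xrightarrow{e} V) = [N(l + w(e)) + v] - [Nl + P_U + u] = N w(e) - P_U + v - u$$
A negative DF is possible before the DFG is retimed; such a folding cannot be realized, because a folded edge cannot contain a negative number of delays.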
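The folding equation is simply the difference between two time stamps, so it can be checked numerically. The Python sketch below (the function names `folding_delay` and `folding_delay_closed_form` are our own, not from the slides) computes DF from the time at which HU finishes iteration l of U and the time at which HV starts iteration l + w(e) of V, and compares that with the closed form.

```python
def folding_delay(n: int, w_e: int, p_u: int, u: int, v: int, l: int = 0) -> int:
    """D_F(U -e-> V): cycles the result of H_U must wait before H_V uses it."""
    produced = n * l + u + p_u       # iteration l of U finishes on H_U
    consumed = n * (l + w_e) + v     # iteration l + w(e) of V starts on H_V
    return consumed - produced


def folding_delay_closed_form(n: int, w_e: int, p_u: int, u: int, v: int) -> int:
    """Closed form N*w(e) - P_U + v - u (independent of the iteration l)."""
    return n * w_e - p_u + v - u


# Edge 1 -> 2 of the biquad before retiming: N = 4, w(e) = 0, P_U = 1, u = 3, v = 1.
print(folding_delay(4, 0, 1, 3, 1))  # -3: negative, so this folding is not realizable
```

Because the iteration index l cancels, any value of `l` gives the same result, which is why the slides can drop it from the final expression.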

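[Figure: original edge U(l) → V(l + w(e)) with w(e) delays; after folding by N, HU executes at time Nl + u, HV executes at time N(l + w(e)) + v, and the folded edge HU → HV spans PU + DF cycles.]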

Folding Retimed Biquad Filter (1/2)

Folding factor N = 4
Folding sets S1 = {4, 2, 3, 1} and S2 = {5, 8, 6, 7}, where S1 contains all addition operations and S2 contains all multiplication operations.
Assume that
- addition and multiplication require 1 and 2 u.t., respectively;
- 1-stage pipelined adders and 2-stage pipelined multipliers are available (PU = 1 for adders, PU = 2 for multipliers).
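Reading folding orders off the folding sets is mechanical: the position of a node in its set is its folding order u. A minimal Python sketch, where the labels "adder" and "multiplier" and the helper `folding_orders` are hypothetical names for the two FUs and the lookup:

```python
# Hypothetical FU labels for the two folding sets of the biquad example.
FOLDING_SETS = {
    "adder": [4, 2, 3, 1],       # S1: all additions
    "multiplier": [5, 8, 6, 7],  # S2: all multiplications
}
PIPELINE_STAGES = {"adder": 1, "multiplier": 2}  # PU of each FU


def folding_orders(folding_sets):
    """Map each node to (functional unit, folding order u = position in its set)."""
    return {
        node: (fu, position)
        for fu, nodes in folding_sets.items()
        for position, node in enumerate(nodes)
    }


for node, (fu, u) in sorted(folding_orders(FOLDING_SETS).items()):
    print(f"node {node}: H_{fu}, u = {u}, P = {PIPELINE_STAGES[fu]}")
```

For example, node 1 is last in S1, so it runs in partition u = 3 on the adder, while node 8 runs in partition u = 1 on the multiplier.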

Folding Retimed Biquad Filter (2/2)

folding equations

Retiming (1/3)
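What happens if the folding equation DF is negative for some edge? The folded edge would need a negative number of delays, so the folded architecture cannot be realized.
Solution: retime the original DFG (i.e., move its delay elements) prior to folding.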
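Constraint: for every edge e, the folding equation of the retimed DFG must be nonnegative:
$$D'_F(U \xrightarrow{e} V) = N w_r(e) - P_U + v - u \ge 0 \quad (1)$$
- Substituting $w_r(e) = w(e) + r(V) - r(U)$ into (1) gives $D_F(U \xrightarrow{e} V) + N[r(V) - r(U)] \ge 0$, i.e.,
$$r(U) - r(V) \le \frac{D_F(U \xrightarrow{e} V)}{N}$$
- Since the retiming values of the nodes are restricted to be integers, the above inequality can be rewritten as
$$r(U) - r(V) \le \left\lfloor \frac{D_F(U \xrightarrow{e} V)}{N} \right\rfloor$$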
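When the integer constraint is computed in code, the bound needs a true floor rather than truncation toward zero: for DF = −3 and N = 4 the bound is ⌊−3/4⌋ = −1, whereas truncation would give 0 and admit an illegal retiming. A small Python sketch (the helper names are ours):

```python
def retiming_bound(d_f: int, n: int) -> int:
    """Largest allowed r(U) - r(V) for an edge with folding delay d_f: floor(d_f / n)."""
    return d_f // n  # Python's // rounds toward negative infinity (a true floor)


def satisfies_folding_constraint(r_u: int, r_v: int, d_f: int, n: int) -> bool:
    """True if retiming with r(U) = r_u, r(V) = r_v makes the folded edge delay >= 0."""
    return r_u - r_v <= retiming_bound(d_f, n)


print(retiming_bound(-3, 4))  # -1
print(int(-3 / 4))            # 0: truncation toward zero would be wrong here
```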

Retiming (2/3)
Example:
$$D_F(1 \to 2) = N w(e) - P_U + v - u = 0 - 1 + 1 - 3 = -3$$
$$r(1) - r(2) \le \left\lfloor \frac{D_F(1 \to 2)}{N} \right\rfloor = \left\lfloor \frac{-3}{4} \right\rfloor = -1$$

Retiming (3/3)
r(1)=-1, r(2)=0,
r(3)=-1, r(4)=0
r(5)=-1, r(6)=-1,
r(7)=-2, r(8)=-1
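The retiming solution can be checked against every edge of the biquad DFG. In the Python sketch below, the folding orders and pipeline depths come from the folding sets and assumptions of the biquad example; the pre-retiming edge weights `W` are not listed on the slides and were back-computed from the retimed weights in the Step 2 folding equations of the Biquad Filter Example together with the retiming values above, so treat them as an inference.

```python
N = 4
P = {1: 1, 2: 1, 3: 1, 4: 1, 5: 2, 6: 2, 7: 2, 8: 2}      # adders 1-4, multipliers 5-8
ORDER = {4: 0, 2: 1, 3: 2, 1: 3, 5: 0, 8: 1, 6: 2, 7: 3}  # positions in S1 and S2
# Inferred edge weights w(e) before retiming, keyed by (source node, destination node).
W = {(1, 2): 0, (1, 5): 1, (1, 6): 1, (1, 7): 2, (1, 8): 2,
     (3, 1): 0, (4, 2): 0, (5, 3): 0, (6, 4): 0, (7, 3): 0, (8, 4): 0}
R = {1: -1, 2: 0, 3: -1, 4: 0, 5: -1, 6: -1, 7: -2, 8: -1}  # retiming values


def d_f(src: int, dst: int, w: int) -> int:
    """Folding equation N*w - P_src + u_dst - u_src."""
    return N * w - P[src] + ORDER[dst] - ORDER[src]


def check_retiming():
    """Assert r(U) - r(V) <= floor(D_F/N) on each edge; return (w_r(e), D'_F) per edge."""
    result = {}
    for (src, dst), w in W.items():
        assert R[src] - R[dst] <= d_f(src, dst, w) // N, f"violated on {src}->{dst}"
        w_r = w + R[dst] - R[src]          # retimed weight
        result[(src, dst)] = (w_r, d_f(src, dst, w_r))
    return result


for (src, dst), (w_r, df) in check_retiming().items():
    print(f"{src}->{dst}: w_r = {w_r}, D'_F = {df}")
```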


Lifetime Analysis
Lifetime analysis is a procedure used to compute the
minimum number of registers required to implement a
DSP algorithm in hardware.
- Linear lifetime analysis
- Circular lifetime analysis
In lifetime analysis, the number of live variables at each time unit is computed, and the maximum number of live variables at any time unit is determined; this maximum is the minimum number of registers required.
The forward-backward register allocation technique then allocates the variables to this minimum number of registers.
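Lifetime analysis amounts to counting live variables per time unit and taking the maximum. In the Python sketch below we assume a variable with lifetime (Tinput, Toutput) occupies a register at times Tinput + 1 through Toutput, so a zero-length lifetime costs no register; passing a period `n` folds the times modulo n, which gives the circular form. The function names and the lifetimes in `toy` are hypothetical.

```python
from collections import Counter
from typing import Optional


def live_counts(lifetimes: dict, n: Optional[int] = None) -> Counter:
    """Number of live variables per time unit.

    lifetimes maps a variable to (t_in, t_out); it is assumed to occupy a register
    at t_in + 1 .. t_out. If n is given, times are folded modulo n (circular
    lifetime analysis); otherwise the linear chart is returned.
    """
    counts = Counter()
    for t_in, t_out in lifetimes.values():
        for t in range(t_in + 1, t_out + 1):
            counts[t % n if n else t] += 1
    return counts


def min_registers(lifetimes: dict, n: Optional[int] = None) -> int:
    """Maximum number of simultaneously live variables."""
    return max(live_counts(lifetimes, n).values(), default=0)


toy = {"x": (0, 3), "y": (1, 2), "z": (2, 5)}  # hypothetical lifetimes
print(sorted(live_counts(toy).items()))  # linear chart
print(min_registers(toy, n=4))           # circular chart with period 4
```

In the circular case, a lifetime longer than the period is counted more than once in the same slot, which correctly reflects overlapping iterations that each need their own register.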

Linear Lifetime Analysis
[Figure: linear lifetime chart of variables {a, b, c} (periodicity implicit), three iterations with N = 6]
max{0, 1, 2, 2, 2, 2, 2, 2} = 2

Matrix Transpose Example (1/3)
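[Figure: the 3 × 3 matrix with rows (a b c), (d e f), (g h i) is transposed into rows (a d g), (b e h), (c f i); the input stream ihgfedcba enters a Matrix Transpose block, which outputs the stream ifchebgda.]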


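Matrix Transpose Example (2/3)
Tzlout = zero-latency output time
Tdiff = Tzlout – Tinput
Toutput = Tzlout + max{–Tdiff}, where the maximum is taken over all variables (the smallest added latency for which no variable is output before it is input)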


Matrix Transpose Example (3/3)
[Figure: linear lifetime chart and circular lifetime chart]
The minimum number of registers is 4.
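The matrix transpose result can be reproduced in a few lines of Python. This sketch assumes the elements arrive in row order a through i (a first), one per cycle, and leave in column order a, d, g, b, e, h, c, f, i, with period N = 9; it applies the Tdiff/Toutput rule and then counts live variables circularly, assuming a variable occupies a register from Tinput + 1 through Toutput.

```python
from collections import Counter

INPUT_ORDER = "abcdefghi"   # assumed arrival order: row by row, one element per cycle
OUTPUT_ORDER = "adgbehcfi"  # transposed order: column by column
N = len(INPUT_ORDER)        # one 3x3 matrix every 9 cycles

t_input = {x: t for t, x in enumerate(INPUT_ORDER)}
t_zlout = {x: t for t, x in enumerate(OUTPUT_ORDER)}  # zero-latency output time
t_diff = {x: t_zlout[x] - t_input[x] for x in INPUT_ORDER}
latency = max(-d for d in t_diff.values())            # max{-Tdiff}
t_output = {x: t_zlout[x] + latency for x in INPUT_ORDER}

counts = Counter()
for x in INPUT_ORDER:
    for t in range(t_input[x] + 1, t_output[x] + 1):  # live at Tin+1 .. Toutput (assumption)
        counts[t % N] += 1                             # circular lifetime chart

print("latency:", latency)
print("lifetimes:", {x: (t_input[x], t_output[x]) for x in INPUT_ORDER})
print("minimum registers:", max(counts.values()))
```

Element g has Tdiff = −4, the most negative value, which is why every output is delayed by 4 cycles.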

Procedures of Forward-Backward Register Allocation
Steps:
Step 1: Determine the minimum number of registers using lifetime analysis.
Step 2: Input each variable at the time step at which its lifetime begins.
Step 3: Allocate each variable in a forward manner (moving it to the next register every cycle) until it is dead or it reaches the last register.
Step 4: Since the allocation is periodic, the allocation of the current iteration repeats itself in subsequent iterations. Thus, every register position occupied at time t is also marked (hashed) as occupied at times t + N, t + 2N, and so on.
Step 5: If a variable reaches the last register and is still alive, it is allocated to a free register in a backward manner.
Step 6: Repeat Steps 4 and 5 as required until the allocation is completed.
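The steps above can be approximated by a direct simulation. This Python sketch makes three assumptions that are ours, not the slides': instead of hashing positions modulo N (Step 4), it simulates several overlapping iterations explicitly; a variable occupies a register from Tinput + 1 through Toutput; and the backward move of Step 5 picks the lowest-numbered free register. The names `forward_backward` and `MATRIX_LIFETIMES` are hypothetical, and the lifetimes are those of the matrix transpose example under row-order arrival.

```python
def forward_backward(lifetimes, n, num_regs, iterations=4):
    """Simulate forward-backward allocation over several overlapping iterations.

    lifetimes: {name: (t_in, t_out)}; iteration k shifts every lifetime by k * n.
    Returns {(name, k): [(time, register), ...]}, registers numbered 0 .. num_regs - 1.
    """
    instances = {
        (name, k): (t_in + k * n, t_out + k * n)
        for k in range(iterations)
        for name, (t_in, t_out) in lifetimes.items()
        if t_out > t_in  # zero-length lifetimes need no register
    }
    horizon = max(t_out for _, t_out in instances.values())
    trace = {key: [] for key in instances}
    where = {}  # instance -> register it occupied at the previous time step

    for t in range(horizon + 1):
        occupied = {}  # register -> instance at time t
        pending = []   # instances that still need a register at time t
        # Step 3: every live variable moves forward to the next register.
        for key, reg in where.items():
            if t > instances[key][1]:
                continue  # lifetime over: the register is released
            if reg + 1 < num_regs:
                occupied[reg + 1] = key
            else:
                pending.append(key)  # reached the last register but still alive
        # Step 2: a variable enters the first register when its lifetime begins.
        for key, (t_in, _) in instances.items():
            if t == t_in + 1:
                if 0 not in occupied:
                    occupied[0] = key
                else:
                    pending.append(key)
        # Step 5 (assumed policy): move back to the lowest-numbered free register.
        for key in pending:
            free = [r for r in range(num_regs) if r not in occupied]
            if not free:
                raise RuntimeError(f"no free register at t = {t}")
            occupied[free[0]] = key
        where = {key: reg for reg, key in occupied.items()}
        for reg, key in occupied.items():
            trace[key].append((t, reg))
    return trace


MATRIX_LIFETIMES = {"a": (0, 4), "b": (1, 7), "c": (2, 10), "d": (3, 5), "e": (4, 8),
                    "f": (5, 11), "g": (6, 6), "h": (7, 9), "i": (8, 12)}
trace = forward_backward(MATRIX_LIFETIMES, n=9, num_regs=4)
for t, reg in trace[("c", 0)]:
    print(f"t = {t}: c in R{reg + 1}")
```

With 4 registers the simulation never runs out of space, matching the lifetime-analysis minimum; the exact register sequence may differ from the allocation table on the slides because the backward policy is assumed.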
Register Allocation for Matrix Transpose Example


Procedures of Register Minimization in Folded Architectures
Steps:
Step 1: Perform retiming for folding
Step 2: Write the folding equations
Step 3: Use the folding equations to construct a
lifetime table
Step 4: Draw the lifetime chart and determine the
required number of registers
Step 5: Perform forward-backward register
allocation
Step 6: Draw the folded architecture that uses the
minimum number of registers

Folding Architecture Example

Folded Architecture for Matrix Transpose Example

Biquad Filter Example (1/4)

Step 1: Retiming
Without retiming, the folding is invalid because some folding equations are negative:
DF(1→2) = -3
DF(6→4) = -4
DF(8→4) = -3
DF(7→3) = -3

Biquad Filter Example (2/4)
Step 2: Folding Equations
DF(U→V) = Nw(e) – Pu + v – u
DF(1→2) = 4(1) – 1 + 1 – 3 = 1
DF(1→5) = 4(1) – 1 + 0 – 3 = 0
DF(1→6) = 4(1) – 1 + 2 – 3 = 2
DF(1→7) = 4(1) – 1 + 3 – 3 = 3
DF(1→8) = 4(2) – 1 + 1 – 3 = 5
DF(3→1) = 4(0) – 1 + 3 – 2 = 0
DF(4→2) = 4(0) – 1 + 1 – 0 = 0
DF(5→3) = 4(0) – 2 + 2 – 0 = 0
DF(6→4) = 4(1) – 2 + 0 – 2 = 0
DF(7→3) = 4(1) – 2 + 2 – 3 = 1
DF(8→4) = 4(1) – 2 + 0 – 1 = 1
Step 3: Construct the Lifetime Table
Tinput = u + Pu
Toutput = u + Pu + maxv{DF(U→V)}
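Steps 2 and 3 can be cross-checked in Python. The sketch below encodes N = 4, the pipeline depths, the folding orders from S1 and S2, and the retimed edge weights w(e) read off the folding equations (the dictionary names are ours); it recomputes every DF and the (Tinput, Toutput) pair of each node that drives at least one edge.

```python
N = 4
P = {1: 1, 2: 1, 3: 1, 4: 1, 5: 2, 6: 2, 7: 2, 8: 2}  # adders 1-4, multipliers 5-8
U = {4: 0, 2: 1, 3: 2, 1: 3, 5: 0, 8: 1, 6: 2, 7: 3}  # folding orders from S1, S2
# Retimed edge weights w(e), keyed by (source node, destination node).
W_R = {(1, 2): 1, (1, 5): 1, (1, 6): 1, (1, 7): 1, (1, 8): 2,
       (3, 1): 0, (4, 2): 0, (5, 3): 0, (6, 4): 1, (7, 3): 1, (8, 4): 1}

# Step 2: folding equations DF = N*w(e) - P_U + v - u.
d_f = {(s, d): N * w - P[s] + U[d] - U[s] for (s, d), w in W_R.items()}
assert min(d_f.values()) >= 0  # every folded edge has a nonnegative delay count

# Step 3: lifetime table, Tinput = u + P_U and Toutput = Tinput + max DF over fan-out.
lifetime = {}
for node in sorted(P):
    outs = [delay for (s, _), delay in d_f.items() if s == node]
    if outs:  # node 2 has no outgoing edge in this table, so it gets no entry
        t_in = U[node] + P[node]
        lifetime[node] = (t_in, t_in + max(outs))

print(d_f)
print(lifetime)
```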

Biquad Filter Example (3/4)
Step 4: Draw the Lifetime Chart
Step 5: Register Allocation
Folding factor = 4
The minimum number of registers is 2.
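The lifetime chart of Step 4 can be counted directly. This Python sketch uses the (Tinput, Toutput) pairs given by the Step 3 formulas (with DF(6→4) = 4(1) − 2 + 0 − 2 = 0; node 2 has no outgoing edge and is omitted), assumes a variable occupies a register from Tinput + 1 through Toutput, and folds time modulo the folding factor 4.

```python
from collections import Counter

N = 4  # folding factor
# (Tinput, Toutput) per node, from Tinput = u + PU and Toutput = Tinput + max DF.
LIFETIMES = {1: (4, 9), 3: (3, 3), 4: (1, 1), 5: (2, 2), 6: (4, 4), 7: (5, 6), 8: (3, 4)}

counts = Counter()
for t_in, t_out in LIFETIMES.values():
    for t in range(t_in + 1, t_out + 1):  # occupied Tin+1 .. Toutput (assumption)
        counts[t % N] += 1                 # circular lifetime chart, period N

min_registers = max(counts.values())
print(dict(sorted(counts.items())))
print("minimum registers:", min_registers)
```

Only nodes 1, 7, and 8 have nonzero lifetimes; node 1 lives for 5 cycles, longer than N, so two of its iterations overlap in one time partition.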

Biquad Filter Example (4/4)
Step 6: Folded Architecture

IIR Filter Example (1/4)

Step 1: Retiming
Without retiming, the folding is invalid because some folding equations are negative:
DF(3→1) = -3
DF(4→1) = -2
IIR Filter Example (2/4)
Step 2: Folding Equations
DF(U→V) = Nw(e) – Pu + v – u
DF(1→2) = 0
DF(2→3) = 5
DF(2→4) = 2
DF(3→1) = 1
DF(4→1) = 0
Step 3: Construct the Lifetime Table
Tinput = u + Pu
Toutput = u + Pu + maxv{DF(U→V)}

IIR Filter Example (3/4)
Step 4: Draw the Lifetime Chart
Step 5: Register Allocation
Folding factor = 2
The minimum number of registers is 3.


IIR Filter Example (4/4)
Step 6: Folded Architecture

Conclusions
Present a systematic folding transformation for designing time-multiplexed architectures
Explore folding techniques to reduce the number of functional units
Explore register minimization techniques to reduce the number of registers

References
K. K. Parhi, VLSI Digital Signal Processing Systems:
Design and Implementation, Wiley, 1999.
S. Y. Huang, Handout of textbook, 2004.
