
Chapter 5 Unfolding

ECE734 VLSI Arrays for Digital Signal Processing


Definitions

- Unfolding is the process of unfolding a loop so that several iterations are unrolled into the same iteration.
- Also known as (a.k.a.)
  - Loop unrolling (in compilers for parallel programs)
  - Block processing

Applications

- Reducing sampling period to achieve iteration bound (desired throughput rate) T.
- Parallel (block processing) to execute several iterations concurrently.
- Digit-serial or bit-serial processing

(C) 1997-2006 by Yu Hen Hu, ECE734 VLSI Arrays for Digital Signal Processing
An example
Before unfolding:
  For n = 0 to N-1,
    y(n)=a*y(n-9)+x(n)
  end

Unfolding once (J = 2)
  For k = 0 to N/2-1,
    y(2k)=a*y(2k-9)+x(2k)
    y(2k+1)=a*y(2k-8)+x(2k+1)
  end

Unfolding twice (J = 3)
  For k = 0 to N/3-1,
    y(3k)=a*y(3k-9)+x(3k)
    y(3k+1)=a*y(3k-8)+x(3k+1)
    y(3k+2)=a*y(3k-7)+x(3k+2)
  end

Block processing formulation
  J = 3, 9/J = 3 (an integer)
    X(k) = [x(3k) x(3k+1) x(3k+2)]^T
    Y(k) = [y(3k) y(3k+1) y(3k+2)]^T
    Y(k) = a*Y(k-3) + X(k)
  J = 2, $\lceil 9/J \rceil = 5$ (9/J is not an integer)
    X(k) = [x(2k) x(2k+1)]^T
    Y(k) = [y(2k) y(2k+1)]^T
    Y(k) = a*Y(k-5) + X(k) is not an exact vector identity here:
    y(2k) depends on y(2k-9), the second entry of Y(k-5), and
    y(2k+1) depends on y(2k-8), the first entry of Y(k-4).
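
The equivalence of the three loops above can be checked numerically. The following is a minimal sketch, assuming zero initial conditions (y(n) = 0 for n < 0) and an input length divisible by J; the function names `original` and `unfolded` and the test input are illustrative and not part of the slides.

```python
def original(a, x):
    """y(n) = a*y(n-9) + x(n), with y(n) = 0 for n < 0 (assumed)."""
    y = [0.0] * len(x)
    for n in range(len(x)):
        y[n] = a * (y[n - 9] if n >= 9 else 0.0) + x[n]
    return y


def unfolded(a, x, J):
    """Same recurrence, with J consecutive samples computed per loop iteration."""
    assert len(x) % J == 0, "sketch assumes the length is a multiple of J"
    y = [0.0] * len(x)
    for k in range(len(x) // J):
        for i in range(J):              # the J statements of the unfolded body
            n = J * k + i
            past = y[n - 9] if n >= 9 else 0.0
            y[n] = a * past + x[n]
    return y


x = [float(n % 5) for n in range(36)]   # 36 is divisible by both 2 and 3
y_ref = original(0.5, x)
same_J2 = unfolded(0.5, x, 2) == y_ref
same_J3 = unfolded(0.5, x, 3) == y_ref
print(same_J2, same_J3)
```

Unfolding only regroups the same computations, so the outputs agree sample by sample; what changes is how many samples are produced per loop iteration.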

Implementation with J=3
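[Figure: block diagram. The input x(0), x(1), x(2), ... arrives with sampling period Ts and passes through a serial-to-parallel conversion; three parallel branches, each drawn with an adder (+), a multiplier (X) and a delay (D), run with clock period 3Ts; a parallel-to-serial conversion emits y(0), y(1), y(2), ... with sampling period Ts.]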

Unfolding the DFG

Rewrite the algorithm formulation:

  y(2k)=a*y(2k-9)+x(2k)
  y(2k+1)=a*y(2k-8)+x(2k+1)

as

  y(2k)=a*y(2(k-5)+1)+x(2k)
  y(2k+1)=a*y(2(k-4))+x(2k+1)

After J-fold unfolding, the clock period is T = J Ts, where Ts is the data sampling period.

[Figure: DFG before unfolding (T=Ts) and after unfolding (T=J Ts).]
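
The index rewriting above generalizes to any J and delay w. The sketch below is illustrative only; the helper name `rewrite_index` is an assumption, not notation from the slides. It expresses J*k + i - w in the form J*(k - d) + j with 0 <= j < J.

```python
def rewrite_index(J, i, w):
    """Write J*k + i - w as J*(k - d) + j with 0 <= j < J; return (d, j)."""
    q, j = divmod(i - w, J)   # Python's divmod floors, so j is always in [0, J)
    return -q, j


# J = 2, w = 9: reproduces y(2(k-5)+1) and y(2(k-4)).
print(rewrite_index(2, 0, 9))   # (5, 1)
print(rewrite_index(2, 1, 9))   # (4, 0)

# J = 3, w = 9: every statement reaches back exactly 3 iterations.
print([rewrite_index(3, i, 9) for i in range(3)])   # [(3, 0), (3, 1), (3, 2)]
```

When w is a multiple of J, all J statements share the same d, which is why the J = 3 case admits the compact block form Y(k) = a*Y(k-3) + X(k).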

Timing Diagram
[Figure: timing diagram. With T=Ts, the outputs y(0), y(1), ..., y(13) appear one per clock and each is fed back 9T later. With T=2Ts, the even outputs y(0), y(2), ..., y(12) and the odd outputs y(1), y(3), ..., y(13) appear in two parallel rows, with feedback spans of 4T and 5T.]

- The above timing diagram is obtained assuming that the sampling period Ts remains unchanged. Thus, the clock period T is increased J-fold.
- Since 9/2 is not an integer, the output (y(0), y(1)) will be needed by two different future iterations, 4T and 5T later.
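
The 4T and 5T spans can be reproduced with a short calculation. This is a hedged sketch seen from the producer's side; the helper name `consumer_of` is hypothetical. An output in slot i of iteration k is consumed w samples later, that is, in slot (i+w) % J of iteration k + (i+w)//J.

```python
def consumer_of(J, i, w):
    """Return (iterations_later, consuming_slot) for output slot i and delay w."""
    later, slot = divmod(i + w, J)
    return later, slot


for i in range(2):
    later, slot = consumer_of(2, i, 9)
    print(f"output slot {i} is needed {later}T later by slot {slot}")
```

For J = 2 and w = 9 the even output is needed 4 clock periods later by the odd slot, and the odd output 5 clock periods later by the even slot.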

General DFG Unfolding Method
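Define
  $\lfloor x \rfloor$: largest integer $\le x$;
  $\lceil x \rceil$: smallest integer $\ge x$;
  $a \% b = a - b \lfloor a/b \rfloor$, where $a$, $b$ are integers

Step 1. For each node U in the original DFG, draw J nodes $\{U_i;\ 0 \le i \le J-1\}$ in the unfolded DFG.
Step 2. For each edge from U to V with w delays, draw J edges from $U_i$ to $V_{(i+w)\%J}$ with $\lfloor (i+w)/J \rfloor$ delays.

Example (w = 37, J = 4):
  $\lfloor (i+w)/J \rfloor = \lfloor (i+37)/4 \rfloor = 9$ for $i = 0, 1, 2$, and $10$ for $i = 3$.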
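
The two steps can be sketched in a few lines. The edge-list representation `(U, V, w)` and the function name `unfold` are assumptions made for illustration; the slides do not prescribe a data structure.

```python
def unfold(edges, J):
    """Unfold a DFG given as a list of (U, V, w) edges by a factor J.

    Nodes of the unfolded DFG are (name, i) pairs with 0 <= i <= J-1 (Step 1).
    Each original edge yields J edges U_i -> V_{(i+w)%J} carrying
    floor((i+w)/J) delays (Step 2).
    """
    unfolded = []
    for u, v, w in edges:
        for i in range(J):
            unfolded.append(((u, i), (v, (i + w) % J), (i + w) // J))
    return unfolded


# The w = 37, J = 4 example: delays 9, 9, 9 for i = 0, 1, 2 and 10 for i = 3.
example = unfold([("U", "V", 37)], 4)
for src, dst, d in example:
    print(src, "->", dst, "with", d, "delays")
```

The delays on the four unfolded edges sum to 37, the delay count of the original edge.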

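Another DFG Unfolding Example
J=2

  i   w   (i+w)%J   $\lfloor (i+w)/J \rfloor$
  0   0   0         0
  0   2   0         1
  0   3   1         1
  1   0   1         0
  1   2   1         1
  1   3   0         2

[Figure: original DFG with nodes Q, R, S, T and edges carrying 2D and 3D, labelled T=3; unfolded DFG with nodes Q0, R0, S0, T0, Q1, R1, S1, T1.]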
Step 1. Duplicate J copies of each node

Step 2. Add all edges with 0 delay on them.

Step 3. Use the table above to figure out edges with delays.

[Figure: the unfolded DFG with its delayed edges drawn in, carrying D, D, D and 2D; it is labelled T=6.]
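
The table used in this example can be regenerated directly from the two formulas. This is a small sketch; the delay values 0, 2 and 3 are read off the table, and the variable names are illustrative.

```python
J = 2
delays = (0, 2, 3)  # edge delays w that appear in the table

# One row per (i, w): target copy index (i+w)%J and new delay floor((i+w)/J).
rows = [(i, w, (i + w) % J, (i + w) // J) for i in range(J) for w in delays]

print("i  w  (i+w)%J  floor((i+w)/J)")
for i, w, target, d in rows:
    print(f"{i}  {w}  {target:<7}  {d}")
```

Rows with a new delay of 0 correspond to the edges added in Step 2 of the example; the remaining rows give the delayed edges of Step 3.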

Properties of Unfolding
- Unfolding preserves the number of registers (delays) in a DFG.
- A loop with w delays in a DFG that has been unfolded J times leads to g.c.d.(w, J) loops in the unfolded DFG, with each of these loops containing w/(g.c.d.(w,J)) delays and J/(g.c.d.(w,J)) copies of each node that appears in the original loop.
- Unfolding a DFG with iteration bound T results in a J-unfolded DFG with iteration bound JT.
- A path with w (< J) delays in a DFG will lead to J-w paths with no delays, and w paths with 1 delay each in the J-unfolded DFG.
- Any path in the original DFG containing J or more delays leads to J paths with 1 or more delays in each path. Therefore, it cannot create a critical path in the J-unfolded DFG.
- Any clock period that can be achieved by retiming a J-unfolded DFG can be achieved by retiming the original DFG followed by J-unfolding.
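
Several of these properties can be checked numerically for small cases. The sketch below makes a simplifying assumption: the loop is a single node U with a self-loop of w delays, so the unfolded edges are U_i -> U_{(i+w)%J}. The function name `unfolded_self_loop_cycles` is hypothetical.

```python
from math import gcd


def unfolded_self_loop_cycles(w, J):
    """Cycles formed by the J edges U_i -> U_{(i+w)%J} of an unfolded self-loop.

    Returns one (num_nodes, total_delays) pair per cycle.
    """
    seen, cycles = set(), []
    for start in range(J):
        if start in seen:
            continue
        i, nodes, delays = start, 0, 0
        while i not in seen:            # follow edges until the cycle closes
            seen.add(i)
            nodes += 1
            delays += (i + w) // J      # delays on the edge leaving U_i
            i = (i + w) % J
        cycles.append((nodes, delays))
    return cycles


for w, J in [(9, 3), (9, 2), (6, 4)]:
    g = gcd(w, J)
    cycles = unfolded_self_loop_cycles(w, J)
    # g loops, each with J/g copies of the node and w/g delays
    print(w, J, cycles, cycles == [(J // g, w // g)] * g)

# A path with w < J delays: J-w unfolded paths with 0 delays, w with 1 delay.
w, J = 2, 5
path_delays = [(i + w) // J for i in range(J)]
print(path_delays.count(0) == J - w, path_delays.count(1) == w)
```

In each case the delays over all cycles sum to w, consistent with unfolding preserving the number of delays.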
