Unfolding

Definitions

Unfolding: transforming a loop so that several iterations are unrolled into the same iteration.

Also known as (a.k.a.):
- Loop unrolling (in compilers for parallel programs)
- Block processing

Unfolding is used for:
- Reducing the sampling period to achieve the iteration bound (desired throughput rate) T.
- Parallel (block) processing, to execute several iterations concurrently.
- Digit-serial or bit-serial processing.

(C) 1997-2006 by Yu Hen Hu ECE734 VLSI Arrays for Digital Signal Processing 2

An example

Before unfolding:

For n = 0 to N-1,
  y(n)=a*y(n-9)+x(n)
end

Unfolding once (J = 2):

For k = 0 to N/2-1,
  y(2k)=a*y(2k-9)+x(2k)
  y(2k+1)=a*y(2k-8)+x(2k+1)
end

Unfolding twice (J = 3):

For k = 0 to N/3-1,
  y(3k)=a*y(3k-9)+x(3k)
  y(3k+1)=a*y(3k-8)+x(3k+1)
  y(3k+2)=a*y(3k-7)+x(3k+2)
end

Block processing formulation:

J = 3, 9/J = 3 (an integer):
  X(k) = [x(3k) x(3k+1) x(3k+2)]^T
  Y(k) = [y(3k) y(3k+1) y(3k+2)]^T
  Y(k) = a*Y(k-3) + X(k)

J = 2, 9/J = 4.5 (not an integer; ceil(9/J) = 5):
  X(k) = [x(2k) x(2k+1)]^T
  Y(k) = [y(2k) y(2k+1)]^T
  Here y(2k) depends on y(2(k-5)+1) from block k-5, while y(2k+1) depends on y(2(k-4)) from block k-4, so the recurrence cannot be written with a single block delay:
  Y(k) = a*[0 1; 0 0]*Y(k-5) + a*[0 0; 1 0]*Y(k-4) + X(k)
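As a quick sanity check, the Python sketch below runs the original loop and the two unfolded loops from this example and confirms that they produce identical outputs. It assumes a zero initial state (y(n) = 0 for n < 0) and a length N that is a multiple of both 2 and 3; the helper y_at and the function names are illustrative, not from the slides.

```python
def y_at(y, n):
    """Return y(n), treating y(n) = 0 for n < 0 (zero initial state assumed)."""
    return y[n] if n >= 0 else 0.0


def original(x, a):
    # For n = 0 to N-1: y(n) = a*y(n-9) + x(n)
    y = [0.0] * len(x)
    for n in range(len(x)):
        y[n] = a * y_at(y, n - 9) + x[n]
    return y


def unfolded_j2(x, a):
    # Unfolding once (J = 2): two statements per iteration, N/2 iterations.
    y = [0.0] * len(x)
    for k in range(len(x) // 2):
        y[2 * k] = a * y_at(y, 2 * k - 9) + x[2 * k]
        y[2 * k + 1] = a * y_at(y, 2 * k - 8) + x[2 * k + 1]
    return y


def unfolded_j3(x, a):
    # Unfolding twice (J = 3): three statements per iteration, N/3 iterations.
    y = [0.0] * len(x)
    for k in range(len(x) // 3):
        y[3 * k] = a * y_at(y, 3 * k - 9) + x[3 * k]
        y[3 * k + 1] = a * y_at(y, 3 * k - 8) + x[3 * k + 1]
        y[3 * k + 2] = a * y_at(y, 3 * k - 7) + x[3 * k + 2]
    return y


x = [float(n + 1) for n in range(30)]  # N = 30 is divisible by 2 and by 3
assert original(x, 0.5) == unfolded_j2(x, 0.5) == unfolded_j3(x, 0.5)
```

Each unfolded iteration performs exactly the same multiply-add as the original loop, only grouped differently, so the outputs match bit for bit.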


Implementation with J=3

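[Figure: input samples x(0), x(1), x(2), ... arrive one per sampling period Ts and enter a serial-to-parallel converter; three add-multiply (+, X) units, each with delay elements (D) in its feedback path, run at clock period 3Ts; a parallel-to-serial converter returns y(0), y(1), y(2), ... at one sample per Ts.]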
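The J = 3 structure can be mimicked in software: slice the input into blocks X(k) (serial-to-parallel), apply Y(k) = a*Y(k-3) + X(k) once per block, then flatten the blocks (parallel-to-serial). This is a minimal sketch assuming a zero initial state and N divisible by 3; the function name block_filter_j3 is hypothetical.

```python
def block_filter_j3(x, a):
    """Block form of y(n) = a*y(n-9) + x(n) with J = 3.

    X(k) = [x(3k), x(3k+1), x(3k+2)], Y(k) = [y(3k), y(3k+1), y(3k+2)],
    Y(k) = a*Y(k-3) + X(k). Zero initial state is assumed.
    """
    J = 3
    D = 9 // J  # block delay; valid here because 9/J is an integer
    assert len(x) % J == 0, "sketch assumes N is a multiple of J"
    # Serial-to-parallel conversion: group J consecutive samples into X(k).
    X = [x[J * k:J * k + J] for k in range(len(x) // J)]
    Y = []
    for k, Xk in enumerate(X):
        # One clock period T = 3Ts: the three lanes do not depend on each
        # other within a block, so in hardware they can update in parallel.
        prev = Y[k - D] if k >= D else [0.0] * J
        Y.append([a * p + xi for p, xi in zip(prev, Xk)])
    # Parallel-to-serial conversion: emit y(0), y(1), y(2), ... in order.
    return [yi for Yk in Y for yi in Yk]


print(block_filter_j3([1.0] * 12, 0.5))
```

Because 9/J is an integer, each lane only ever reads its own lane three blocks back, which is why three independent add-multiply units suffice.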


Unfolding the DFG

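For J = 2, the unfolded equations

  y(2k)=a*y(2k-9)+x(2k)
  y(2k+1)=a*y(2k-8)+x(2k+1)

can be rewritten so that each delayed output is expressed through the iteration that produced it:

  y(2k)=a*y(2(k-5)+1)+x(2k)     (odd output of iteration k-5)
  y(2k+1)=a*y(2(k-4))+x(2k+1)   (even output of iteration k-4)

The original DFG runs with clock period T = Ts. After J-fold unfolding, the clock period is T = J Ts, where Ts is the data sampling period.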
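The index rewriting on this slide can be mechanized with floor division: writing Jk + i - w = J(k - d) + l with 0 <= l < J identifies how many iterations back (d) and which position (l) the delayed value comes from. A small sketch, using the hypothetical helper name source_of:

```python
def source_of(i, w, J):
    """Position i of iteration k reads y(Jk + i - w).

    Return (d, l) with Jk + i - w = J*(k - d) + l and 0 <= l < J, i.e. the value
    comes from position l of the iteration d steps earlier.
    """
    q, l = divmod(i - w, J)  # Python's divmod floors, so 0 <= l < J even for negative i - w
    return -q, l


for i in range(2):
    d, l = source_of(i, 9, 2)
    print(f"y(2k+{i}) uses y(2(k-{d})+{l})")
# y(2k+0) uses y(2(k-5)+1)
# y(2k+1) uses y(2(k-4)+0)
```

For J = 3 the same helper returns d = 3 with l = i for every position, which is the single block delay in Y(k) = a*Y(k-3) + X(k).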


Timing Diagram

[Figure: with T = Ts, y(0), y(1), ..., y(13) are computed one per clock, and each y(n) is needed 9T later by y(n+9). With T = 2Ts, the even outputs y(0), y(2), ..., y(12) and the odd outputs y(1), y(3), ..., y(13) are computed side by side, two per clock, with dependencies spanning 4T and 5T.]

T = 2Ts is obtained assuming that the sampling period Ts remains unchanged. Thus, the clock period T is increased J-fold.

The two outputs of one iteration, (y(0), y(1)), are needed by two different future iterations, 4T and 5T later, respectively.
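The 4T and 5T lags follow from integer division: with J-unfolding, y(n) is produced in iteration floor(n/J) and consumed by y(n+w) in iteration floor((n+w)/J). The sketch below (function name consumer_lag is hypothetical) counts the difference in clock periods T = J Ts.

```python
def consumer_lag(n, w, J):
    """Clock periods (T = J*Ts) between computing y(n) and computing y(n + w).

    y(n) is produced in iteration n // J and consumed by y(n + w),
    which is computed in iteration (n + w) // J.
    """
    return (n + w) // J - n // J


for J in (1, 2):
    print(J, [consumer_lag(n, 9, J) for n in range(4)])
# 1 [9, 9, 9, 9]
# 2 [4, 5, 4, 5]
```

With J = 1 every dependency is 9T long; with J = 2 the even and odd outputs alternate between 4T and 5T, matching the timing diagram.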


General DFG Unfolding Method

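Define

  $\lfloor x \rfloor$: the largest integer $\le x$;
  $\lceil x \rceil$: the smallest integer $\ge x$;
  $a \% b = a - b \lfloor a/b \rfloor$, where $a, b$ are integers.

Step 1. For each node U in the original DFG, draw J nodes $\{U_i;\ 0 \le i \le J-1\}$ in the unfolded DFG.

Step 2. For each edge from U to V with w delays, draw J edges from $U_i$ to $V_{(i+w)\%J}$ with $\lfloor (i+w)/J \rfloor$ delays, $i = 0, 1, \ldots, J-1$.

Example (w = 37, J = 4): $\lfloor (i+w)/J \rfloor = \lfloor (i+37)/4 \rfloor = 9$ for $i = 0, 1, 2$, and $= 10$ for $i = 3$.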
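Steps 1 and 2 translate directly into a few lines of Python. This is a minimal sketch assuming the DFG is given as a list of node names and a list of (U, V, w) edges; representing the copy U_i as the tuple (U, i) is a choice made here for illustration. Applied to the w = 37, J = 4 example, it reproduces the 9, 9, 9, 10 delay split.

```python
def unfold(nodes, edges, J):
    """J-unfold a DFG.

    nodes: iterable of node names; edges: list of (U, V, w) with w delays.
    Node copy U_i is represented as the tuple (U, i).
    """
    # Step 1: draw J copies U_0, ..., U_{J-1} of each node U.
    new_nodes = [(U, i) for U in nodes for i in range(J)]
    # Step 2: each edge U -> V with w delays becomes J edges
    # U_i -> V_{(i+w)%J} carrying floor((i+w)/J) delays.
    new_edges = [((U, i), (V, (i + w) % J), (i + w) // J)
                 for U, V, w in edges for i in range(J)]
    return new_nodes, new_edges


_, unfolded_edges = unfold(["U", "V"], [("U", "V", 37)], 4)
for src, dst, d in unfolded_edges:
    print(src, "->", dst, d, "delays")
```

The delays on the J new edges always sum to w, which is the register-preservation property of unfolding.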


Another DFG Unfolding Example

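J = 2

  i   w   (i+w)%J   $\lfloor (i+w)/J \rfloor$
  0   0      0          0
  0   2      0          1
  0   3      1          1
  1   0      1          0
  1   2      1          1
  1   3      0          2

[Figure: original DFG with nodes Q, R, S, T and edges carrying 0, 2D and 3D delays (T = 3); unfolded DFG with node copies Q0, Q1, R0, R1, S0, S1, T0, T1.]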

Step 1. Make J copies of each node.


Step 2. Add all edges that carry no delay.


Step 3. Use the table above to add the edges that carry delays. The unfolded DFG has T = 6.
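The table for this example can be generated mechanically, and its rows split into the zero-delay edges of Step 2 and the delayed edges of Step 3. A sketch under that reading, with the hypothetical function name unfolding_table; it does not rebuild the full Q, R, S, T graph, whose edge list appears only in the figure.

```python
def unfolding_table(J, delays):
    """Rows (i, w, (i+w) % J, floor((i+w)/J)) used to place edges in a J-unfolded DFG."""
    return [(i, w, (i + w) % J, (i + w) // J) for i in range(J) for w in delays]


rows = unfolding_table(2, [0, 2, 3])
print("i w (i+w)%J floor((i+w)/J)")
for row in rows:
    print(*row)

zero_delay = [r for r in rows if r[3] == 0]  # edges added in Step 2
delayed = [r for r in rows if r[3] > 0]      # edges added in Step 3
```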


Properties of Unfolding

- Unfolding preserves the number of registers (delays) in a DFG.
- When a DFG is J-unfolded, a loop with w delays leads to g.c.d.(w, J) loops in the unfolded DFG; each of these loops contains w/g.c.d.(w, J) delays and J/g.c.d.(w, J) copies of each node that appears in the original loop.
- Unfolding a DFG with iteration bound T results in a J-unfolded DFG with iteration bound JT.
- A path with w (< J) delays in a DFG leads to J-w paths with no delays and w paths with 1 delay each in the J-unfolded DFG.
- Any path in the original DFG containing J or more delays leads to J paths with 1 or more delays in each path. Therefore, it cannot create a critical path in the J-unfolded DFG.
- Any clock period that can be achieved by retiming a J-unfolded DFG can be achieved by retiming the original DFG followed by J-unfolding.
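The loop and path properties can be checked numerically on the simplest case. This sketch assumes a loop made of a single node U with a self-edge of w delays (a hypothetical simplification; multi-node loops are not covered), unfolds it with the Step 2 rule, and counts the resulting loops and their delays. The helper names unfold_loop and path_delays are illustrative.

```python
from math import gcd


def unfold_loop(w, J):
    """J-unfold a single-node loop U -> U with w delays.

    Returns a list of (node_copies, delays), one entry per loop in the unfolded DFG.
    """
    # Edge U_i -> U_{(i+w)%J} carries floor((i+w)/J) delays.
    nxt = {i: ((i + w) % J, (i + w) // J) for i in range(J)}
    seen, loops = set(), []
    for start in range(J):
        if start in seen:
            continue
        i, copies, delays = start, [], 0
        while i not in seen:  # follow edges until the walk returns to start
            seen.add(i)
            copies.append(i)
            i, d = nxt[i]
            delays += d
        loops.append((copies, delays))
    return loops


def path_delays(w, J):
    """Sorted delays on the J unfolded copies of a path that carries w delays."""
    return sorted((i + w) // J for i in range(J))


w, J = 6, 4
loops = unfold_loop(w, J)
g = gcd(w, J)
assert len(loops) == g                          # g.c.d.(w, J) loops
assert all(d == w // g for _, d in loops)       # w/g.c.d. delays per loop
assert all(len(c) == J // g for c, _ in loops)  # J/g.c.d. copies of U per loop
assert sum(d for _, d in loops) == w            # register count is preserved
assert path_delays(3, 5) == [0, 0, 1, 1, 1]     # J-w zero-delay paths, w one-delay paths
```

With w = 6 and J = 4, for example, the loop splits into g.c.d.(6, 4) = 2 loops, {U0, U2} and {U1, U3}, each carrying 3 delays.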


