You are on page 1of 7

April 9, 2006 Yu Hen Hu

Department of Electrical and Computer Engineering


University of Wisconsin – Madison
ECE 734 VLSI Array Structures for Digital Signal Processing

Homework #3 Solution
This homework consists of questions taken from the notes and open-ended questions. You are
assigned as groups to do the homework. All problems are graded on CC scale (Completion only).

1. (10 points, CC) Consider the block diagram DFG below:

3D

2D

Assume that the addition and multiplication require 1 and 2 units of time respectively.
(a) (3 points, CC) What is the iteration bound T∞?
Answer: T∞ = Max. {4/3, 4/5} = 4/3 units of time.
(b) (7 points, CC) Find the minimum unfolding factor J such that the J-unfolded DFG can be
retimed so that the critical path of this unfolded and retimed DFT is JT∞.
Answer: Problem 5.13 of textbook.
2. (15 points, CC) Consider the DFG shown below where the computation time of each node is
labeled in parentheses:

(10) A E (8)

(2)
3D B
D

D (4)
(6) D C

(a) (3 points, CC) What is the iteration bound T∞?


Answer:
ABD: (10+2+6)/3 = 6; ABCD: (10+2+4+6)/4 = 5.5; AECD: (10+8+4+6)/4 = 7; BCD:
(2+4+6)/2 = 6. Hence T∞ = 7 u.t.

Page 1 of 7
PDF created with pdfFactory Pro trial version www.pdffactory.com
April 9, 2006 Yu Hen Hu

(b) (5 points, CC) Retime this DFG to minimize iteration period.


Answer: The path delays are: ABC = 16; ABD = 18; AEC = 22. First transfer 2
units from incoming edge of node A to outgoing edges. Then transfer 1 unit of delay
from incoming edge of node E to outgoing edge. Transfer 1 unit of delay from
incoming edge of node B to outgoing edges. This leads to the retimed diagram as
follows:

D
(10) A E (8)
D
(2) D
D B
D
D
D (4)
(6) D C

The iteration period = 10 u. t. which is that node computation time of node A. Note
that retiming does not yield an iteration period that is equal to T∞.
(c) (7 points, CC) Determine the minimum unfolding factor J such that the J-unfolded DFG
(unfold from the original DFG) can be retimed so that the critical path of this unfolded
and retimed DFG is JT∞ where T∞ is determined in part (a). Also plot the unfolded and
retimed DFG and identify its critical path whose delay is JT∞
Answer: Problem 5.16 text book.
3. (10 points, CC) Consider the iteration DG given below:

For each pair of scheduling vector s and projection (assignment) vector d, (i) determine if
they are permissible, (ii) if they are permissible, compute the processor space P, and perform
the node mapping, arc mapping, and I/O mapping.
1 0 1
Answer: Note that the dependence matrix is: D =  .
 0 1 1

Page 2 of 7
PDF created with pdfFactory Pro trial version www.pdffactory.com
April 9, 2006 Yu Hen Hu

1  1 
(a) (6 points, CC) s =   ; d =  
0 0
Answer: (i) s D = [1 0 1], and sTd = 1. permissible. (ii) P = [0 1]T.
T

i  i 
Node mapping: PT   = [0 1]   = j .
 j  j
 sT   1 0  1 0 1 1 0 1
Arc mapping:  T  D =   = 
P   0 1   0 1 1  0 1 1
I/O mapping: Three inputs, one along [i, 1]T, one is [1, j]T, and the third one appear in
both of them.
 s T   i 1  1 0   i 1  i 1 
Input:  T   =  = 
 P  1 j  0 1  1 j  1 j 
 sT   i N  1 0   i N  i N
Output:  T   = =
P  N j   0 1   N j   N j 

1 1
(b) (4 points, CC) s =   ; d =  
 −1 0
Answer: (i) s D = [1 −1 0], and sTd = 1. NOT permissible.
T

4. (20 points, CC) Consider the correlation of two 1-dimensional sequences {x(n)} and {y(n)}
with 0 ≤ n ≤ N−1.
N − n −1
r ( n) = ∑
k =0
x ( k ) y ( k + n)

(a) (5 points, CC) Write a C program (or use any high level language) source code that
compute {r(n)} for N =5, and n = 0, 1, 2.
Answer: any HLL will be acceptable.
For n = 0:2,
r(n) = 0;
For k = 0:5-n-1,
r(n) = r(n) + x(k)*y(k+n);
end
end
(b) (5 points) Convert this program into single assignment format.
Answer:
For n = 0:2,
r1(n,0) = 0;
For k = 0:5-n-1,
r1(n,k) = r1(n,k-1) + x(k)*y(k+n);
end
r(n) = r1(n,5-n-1)
end
(c) (5 points) Convert this program so that it has local data communications.
Answer:
x1(0,k)=x(k), x1(n,k)=x(n-1,k), 0≤k≤5, 0≤n≤2
y1(n,0)=y(n), y1(n,k)=y1(n+1,k-1), 0≤k≤5, 0≤n≤2

Page 3 of 7
PDF created with pdfFactory Pro trial version www.pdffactory.com
April 9, 2006 Yu Hen Hu

r1(0,k)=0, r1(n,k)=r1(n,k-1), 0≤k≤5, 0≤n≤2


(d) (5 points, CC) Plot the corresponding iterative DG.
5. (20 points, CC) (Binary number multiplication) Consider two unsigned, n-bit binary numbers
A = An−1An−2 … A1A0, and B = Bn−1Bn−2 … B1B0, where Ai, Bi ∈ {0, 1}, 0 ≤ i ≤ n−1. Denote C
= A × B = C2n−2C2n−3 … C1C0 to be an 2n−1 bit binary number representing the product of A
and B.
(a) (5 points,CC) Derive an mathematical expression of Ck in terms of Ais and Bis.
n −1 n −1
Answer: Note that A = ∑ Ai 2i , and B = ∑ Bi 2i . Hence,
i =0 i =0

n −1 n −1 2 n− 2
A × B = ∑∑ ( Ai ⋅ B j ) 2i + j = ∑ c% 2 k
k
. By comparison of coefficients, we have
i =0 j = 0 k =0

 k
min{ k , n −1}  ∑ Ai Bk −i k ≤ n − 1;
 i =0
c%k = ∑ Ai Bk −i =  n −1

i =∑
i = max{0, k − n +1}
Ai Bk −i n ≤ k ≤ 2n − 2
k − n +1

However, the range of c%k may be greater than 1. Thus, a final evaluation of
2 n− 2 2 n −2
C = ∑ c%k 2k = ∑C 2 k
k
with Ck ∈ {0, 1}
k =0 k =0

will be needed. To incorporate carry propagation, finer granularity of expressions will be


needed. Consider the addition of two binary number p and q, with a carry-in cin. There
will be two outputs:
Sum s = p⊕ q⊕ cin, and carry-out cout = pq + qcin + cinp.
Instead of computing Ck at once, we adopt a partial-product notation.
(b) (5 points) Write a C or Matlab® program as a nested loop construct to compute Ck for 0 ≤
k ≤ 2n−2. Make sure this nested loop has single-assignment format. Use variable indices
as A(I), B(I), C(K), etc.
Answer:
FOR K=0:2N-2,
C1(K,0)=0;
FOR I=MAX(0,K-N+1):MIN(K,N)
C1(K,I)=C(K,I-1)+A(I)*B(K-I);
END
C(K)=C1(K,MAX(K,N));
END
(c) (5 points) Localize the program in part (b) by converting all variables that need to be
broadcast (used by more than one iterations) during the execution into transmittal
variables that will be passed along from an iteration to the next.
Answer:
C1(K,0)=0; 0 ≤ K ≤ 2N-2
A1(K,0)=A(I); 0 ≤ K ≤ 2N-2, 0 ≤ I ≤ N-1
B1(K,0)=B(K); 0 ≤ K ≤ N-1
C1(K,I)=C1(K,I-1)+A1(K,I)*B1(K,I); 0 ≤ K ≤ 2N-2, 0 ≤ I ≤ N-1

Page 4 of 7
PDF created with pdfFactory Pro trial version www.pdffactory.com
April 9, 2006 Yu Hen Hu

A1(K,I)=A1(K,I-1); 0 ≤ K ≤ 2N-2, 0 ≤ I ≤ N-1


B1(K,I)=B1(K-1,I-1); 0 ≤ K ≤ 2N-2, 0 ≤ I ≤ N-1
C(k) = C1(k,N)
(d) (5 points, CC) Plot the logic schematic diagram of the architecture to implement the
operations of the loop body as specified in part (c). Note that A(I), B(I) are both binary
variables.
Answer:
(e) (5 points) For n = 4, plot the DG of the localized program in part (c), and identify the
critical path of this DG. A hardware implementation of this DG is called an array
multiplier.
(f) (5 points) Use cut-set retiming to convert the DG directly into a 2D systolic array that
implements a fully pipelined version of the array multiplier.

6. (10 points) Saturation arithmetic


Saturation arithmetic is a convenient, nonlinear method of handling overflow of fixed-point
computation in DSP applications. Based on the types of two operands and one results, there
are 8 possible combinations. In practice, the three commonly used options are sss, uuu, and
uus where u stands for unsigned, and s stands for signed. In Intel MMX and SSE-2
instruction set, only uuu, and sss are implemented. The former is called unsigned saturation
arithmetic, and the latter is called signed saturation arithmetic. Suppose now there are two
packed operands each consists of four 16-bit words as shown below
Ra: 58 14 12 77

Rb: 22 192 118 36


The content is given in decimal format for convenience. Refer to MMX instruction set,
(a) (5 points, CC) Write a MMX assembly code segment to compute the maximum of each
pair of integers at the corresponding position. Assuming the result is stored in register Rc.
The final content of Rc should be
Rc: 58 192 118 77
You should prove that your program is correct. (Hing: Use unsigned saturation
arithmetic)
Answer: Let us consider on one pair of these integers a and b. If we perform
unsigned subtraction
PSUBUSW Rc, Ra, Rb
(packed subtraction with unsigned saturation arithmetic on each 16-bit word unit),
then the content of the Rc register will be
Rc: 36 0 0 41
a − b a ≥ b;
Note that unsigned − saturation ( a − b ) =  Now adding b to the result, we
0 a < b.
a − b a ≥ b; a a ≥ b;
have unsigned − saturation ( a − b ) + b = b +  = = Max {a, b}
0 a < b. b a < b.
This can be performed with usual packed addition since we are adding two unsigned
numbers, and the result will not have overflow! Thus the program looks like this:

Page 5 of 7
PDF created with pdfFactory Pro trial version www.pdffactory.com
April 9, 2006 Yu Hen Hu

Operation Content of Rc AFTER operation


PSUBUSW Rc, Ra, Rb 36 0 0 41
PADDW Rc, Rc, Rb 58 192 118 77
(b) (5 points, CC) Still given Ra and Rb with their original content shown in part (a) of this
problem. Now, our goal is to perform absolute difference between each pair of the words
in Ra and Rb. Using registers Re, Rf, …to store intermediate results. The final result
should be stored in Rd. The absolute difference between a and b is |a − b|.
Answer:
a − b a ≥ b;
PADDUSW Re, Ra, Rb unsigned − saturation ( a − b ) = 
0 a ≤ b.
0 a ≥ b;
PADDUSW Rf, Rb, Ra unsigned − saturation ( b − a ) = 
b − a a ≤ b.
PADDW Rd, Re, Rf Regular packed addition = |a − b|

7. (15 points, CC)


Consider the RIA implementation of the FIR filter
y1(n,-1)=0, n = 0,1,2, …
h1(0,k)=h(k), k=0,…,N-1
n=0,1,2,…, and k=0,…,N-1
y1(n,k)=y1(n,k-1)+h1(n,k)*x1(n,k)
h1(n,k)=h1(n-1,k)
x1(n,k)=x1(n-1,k-1)
y(n)=y1(n,N-1), n=0,1,2,
And the corresponding DG:

y(4) y(5) y(6)


k
y(3)
h(4)

y(2)
h(3)

h(2) y(1)

h(1) y(0)

h(0)
n

x(0) x(1) x(2) x(3) x(4) x(5) x(6)

Page 6 of 7
PDF created with pdfFactory Pro trial version www.pdffactory.com
April 9, 2006 Yu Hen Hu

(a) (5 points, CC) Consider a uni-modular transformation matrix


 1 −1
0 1 
 
Find the corresponding dependence matrix after this transformation,
 1 −1 1 0 1 1 −1 0 
Answer:   = 
 0 1   0 1 1  0 1 1 

(b) (5 points, CC) Plot the corresponding DG, and

k y(5) y(6) y(7) y(8) y(9) y(10)


y(4)
h(4) D
y(3) D
h(3) D
y(2) D
h(2) D
y(1) D
h(1) D
y(0) D
h(0) D
n

x(0) x(1) x(2) x(3) x(4) x(5) x(6) d =s


(c) (5 points, CC) Write down the corresponding RIA formulation
Answer: The RIA equations are of the following form:
y1(n,-1)=0, n = 0,1,2, …
h1(0,k)=h(k), k=0,…,N-1
n=0,1,2,…, and k=0,…,N-1
y1(n,k)=y1(n+1,k-1)+h1(n,k)*x1(n,k)
h1(n,k)=h1(n-1,k)
x1(n,k)=x1(n,k-1)
y(n)=y1(n,N-1), n=0,1,2,

Page 7 of 7
PDF created with pdfFactory Pro trial version www.pdffactory.com

You might also like