Homework #3 Solution: Department of Electrical and Computer Engineering University of Wisconsin - Madison

April 9, 2006 Yu Hen Hu
Department of Electrical and Computer Engineering

University of Wisconsin – Madison
ECE 734 VLSI Array Structures for Digital Signal Processing
Homework #3 Solution
This homework consists of questions taken from the notes and open-ended questions. You are
assigned as groups to do the homework. All problems are graded on CC scale (Completion only).
1. (10 points, CC) Consider the block diagram DFG below:
3D
2D
Assume that the addition and multiplication require 1 and 2 units of time respectively.
(a) (3 points, CC) What is the iteration bound T∞?
Answer: T∞ = Max. {4/3, 4/5} = 4/3 units of time.
(b) (7 points, CC) Find the minimum unfolding factor J such that the J-unfolded DFG can be
retimed so that the critical path of this unfolded and retimed DFT is JT∞.
Answer: Problem 5.13 of textbook.
2. (15 points, CC) Consider the DFG shown below where the computation time of each node is
labeled in parentheses:
(10) A E (8)
(2)
3D B
D
D (4)
(6) D C
(a) (3 points, CC) What is the iteration bound T∞?

Answer:
ABD: (10+2+6)/3 = 6; ABCD: (10+2+4+6)/4 = 5.5; AECD: (10+8+4+6)/4 = 7; BCD:
(2+4+6)/2 = 6. Hence T∞ = 7 u.t.
Page 1 of 7
PDF created with pdfFactory Pro trial version www.pdffactory.com
(b) (5 points, CC) Retime this DFG to minimize iteration period.

Answer: The path delays are: ABC = 16; ABD = 18; AEC = 22. First transfer 2
units from incoming edge of node A to outgoing edges. Then transfer 1 unit of delay
from incoming edge of node E to outgoing edge. Transfer 1 unit of delay from
incoming edge of node B to outgoing edges. This leads to the retimed diagram as
follows:
D
(10) A E (8)
D
(2) D
D B
D
D
D (4)
(6) D C
The iteration period = 10 u. t. which is that node computation time of node A. Note
that retiming does not yield an iteration period that is equal to T∞.
(c) (7 points, CC) Determine the minimum unfolding factor J such that the J-unfolded DFG
(unfold from the original DFG) can be retimed so that the critical path of this unfolded
and retimed DFG is JT∞ where T∞ is determined in part (a). Also plot the unfolded and
retimed DFG and identify its critical path whose delay is JT∞
Answer: Problem 5.16 text book.
3. (10 points, CC) Consider the iteration DG given below:
For each pair of scheduling vector s and projection (assignment) vector d, (i) determine if
they are permissible, (ii) if they are permissible, compute the processor space P, and perform
the node mapping, arc mapping, and I/O mapping.
1 0 1
Answer: Note that the dependence matrix is: D =  .
 0 1 1
Page 2 of 7
1  1 
(a) (6 points, CC) s =   ; d =  
0 0
Answer: (i) s D = [1 0 1], and sTd = 1. permissible. (ii) P = [0 1]T.
T
i  i 
Node mapping: PT   = [0 1]   = j .
 j  j
 sT   1 0  1 0 1 1 0 1
Arc mapping:  T  D =   = 
P   0 1   0 1 1  0 1 1
I/O mapping: Three inputs, one along [i, 1]T, one is [1, j]T, and the third one appear in
both of them.
 s T   i 1  1 0   i 1  i 1 
Input:  T   =  = 
 P  1 j  0 1  1 j  1 j 
 sT   i N  1 0   i N  i N
Output:  T   = =
P  N j   0 1   N j   N j 
1 1
(b) (4 points, CC) s =   ; d =  
 −1 0
Answer: (i) s D = [1 −1 0], and sTd = 1. NOT permissible.
T
4. (20 points, CC) Consider the correlation of two 1-dimensional sequences {x(n)} and {y(n)}
with 0 ≤ n ≤ N−1.
N − n −1
r ( n) = ∑
k =0
x ( k ) y ( k + n)
(a) (5 points, CC) Write a C program (or use any high level language) source code that
compute {r(n)} for N =5, and n = 0, 1, 2.
Answer: any HLL will be acceptable.
For n = 0:2,
r(n) = 0;
For k = 0:5-n-1,
r(n) = r(n) + x(k)*y(k+n);
end
end
(b) (5 points) Convert this program into single assignment format.
Answer:
For n = 0:2,
r1(n,0) = 0;
For k = 0:5-n-1,
r1(n,k) = r1(n,k-1) + x(k)*y(k+n);
end
r(n) = r1(n,5-n-1)
end
(c) (5 points) Convert this program so that it has local data communications.
Answer:
x1(0,k)=x(k), x1(n,k)=x(n-1,k), 0≤k≤5, 0≤n≤2
y1(n,0)=y(n), y1(n,k)=y1(n+1,k-1), 0≤k≤5, 0≤n≤2
Page 3 of 7
r1(0,k)=0, r1(n,k)=r1(n,k-1), 0≤k≤5, 0≤n≤2

(d) (5 points, CC) Plot the corresponding iterative DG.
5. (20 points, CC) (Binary number multiplication) Consider two unsigned, n-bit binary numbers
A = An−1An−2 … A1A0, and B = Bn−1Bn−2 … B1B0, where Ai, Bi ∈ {0, 1}, 0 ≤ i ≤ n−1. Denote C
= A × B = C2n−2C2n−3 … C1C0 to be an 2n−1 bit binary number representing the product of A
and B.
(a) (5 points,CC) Derive an mathematical expression of Ck in terms of Ais and Bis.
n −1 n −1
Answer: Note that A = ∑ Ai 2i , and B = ∑ Bi 2i . Hence,
i =0 i =0
n −1 n −1 2 n− 2
A × B = ∑∑ ( Ai ⋅ B j ) 2i + j = ∑ c% 2 k
k
. By comparison of coefficients, we have
i =0 j = 0 k =0
 k
min{ k , n −1}  ∑ Ai Bk −i k ≤ n − 1;
 i =0
c%k = ∑ Ai Bk −i =  n −1

i =∑
i = max{0, k − n +1}
Ai Bk −i n ≤ k ≤ 2n − 2
k − n +1
However, the range of c%k may be greater than 1. Thus, a final evaluation of
2 n− 2 2 n −2
C = ∑ c%k 2k = ∑C 2 k
k
with Ck ∈ {0, 1}
k =0 k =0
will be needed. To incorporate carry propagation, finer granularity of expressions will be

needed. Consider the addition of two binary number p and q, with a carry-in cin. There
will be two outputs:
Sum s = p⊕ q⊕ cin, and carry-out cout = pq + qcin + cinp.
Instead of computing Ck at once, we adopt a partial-product notation.
(b) (5 points) Write a C or Matlab® program as a nested loop construct to compute Ck for 0 ≤
k ≤ 2n−2. Make sure this nested loop has single-assignment format. Use variable indices
as A(I), B(I), C(K), etc.
Answer:
FOR K=0:2N-2,
C1(K,0)=0;
FOR I=MAX(0,K-N+1):MIN(K,N)
C1(K,I)=C(K,I-1)+A(I)*B(K-I);
END
C(K)=C1(K,MAX(K,N));
END
(c) (5 points) Localize the program in part (b) by converting all variables that need to be
broadcast (used by more than one iterations) during the execution into transmittal
variables that will be passed along from an iteration to the next.
Answer:
C1(K,0)=0; 0 ≤ K ≤ 2N-2
A1(K,0)=A(I); 0 ≤ K ≤ 2N-2, 0 ≤ I ≤ N-1
B1(K,0)=B(K); 0 ≤ K ≤ N-1
C1(K,I)=C1(K,I-1)+A1(K,I)*B1(K,I); 0 ≤ K ≤ 2N-2, 0 ≤ I ≤ N-1
Page 4 of 7
A1(K,I)=A1(K,I-1); 0 ≤ K ≤ 2N-2, 0 ≤ I ≤ N-1

B1(K,I)=B1(K-1,I-1); 0 ≤ K ≤ 2N-2, 0 ≤ I ≤ N-1
C(k) = C1(k,N)
(d) (5 points, CC) Plot the logic schematic diagram of the architecture to implement the
operations of the loop body as specified in part (c). Note that A(I), B(I) are both binary
variables.
Answer:
(e) (5 points) For n = 4, plot the DG of the localized program in part (c), and identify the
critical path of this DG. A hardware implementation of this DG is called an array
multiplier.
(f) (5 points) Use cut-set retiming to convert the DG directly into a 2D systolic array that
implements a fully pipelined version of the array multiplier.
6. (10 points) Saturation arithmetic

Saturation arithmetic is a convenient, nonlinear method of handling overflow of fixed-point
computation in DSP applications. Based on the types of two operands and one results, there
are 8 possible combinations. In practice, the three commonly used options are sss, uuu, and
uus where u stands for unsigned, and s stands for signed. In Intel MMX and SSE-2
instruction set, only uuu, and sss are implemented. The former is called unsigned saturation
arithmetic, and the latter is called signed saturation arithmetic. Suppose now there are two
packed operands each consists of four 16-bit words as shown below
Ra: 58 14 12 77
Rb: 22 192 118 36

The content is given in decimal format for convenience. Refer to MMX instruction set,
(a) (5 points, CC) Write a MMX assembly code segment to compute the maximum of each
pair of integers at the corresponding position. Assuming the result is stored in register Rc.
The final content of Rc should be
Rc: 58 192 118 77
You should prove that your program is correct. (Hing: Use unsigned saturation
arithmetic)
Answer: Let us consider on one pair of these integers a and b. If we perform
unsigned subtraction
PSUBUSW Rc, Ra, Rb
(packed subtraction with unsigned saturation arithmetic on each 16-bit word unit),
then the content of the Rc register will be
Rc: 36 0 0 41
a − b a ≥ b;
Note that unsigned − saturation ( a − b ) =  Now adding b to the result, we
0 a < b.
a − b a ≥ b; a a ≥ b;
have unsigned − saturation ( a − b ) + b = b +  = = Max {a, b}
0 a < b. b a < b.
This can be performed with usual packed addition since we are adding two unsigned
numbers, and the result will not have overflow! Thus the program looks like this:
Page 5 of 7
Operation Content of Rc AFTER operation

PSUBUSW Rc, Ra, Rb 36 0 0 41
PADDW Rc, Rc, Rb 58 192 118 77
(b) (5 points, CC) Still given Ra and Rb with their original content shown in part (a) of this
problem. Now, our goal is to perform absolute difference between each pair of the words
in Ra and Rb. Using registers Re, Rf, …to store intermediate results. The final result
should be stored in Rd. The absolute difference between a and b is |a − b|.
Answer:
a − b a ≥ b;
PADDUSW Re, Ra, Rb unsigned − saturation ( a − b ) = 
0 a ≤ b.
0 a ≥ b;
PADDUSW Rf, Rb, Ra unsigned − saturation ( b − a ) = 
b − a a ≤ b.
PADDW Rd, Re, Rf Regular packed addition = |a − b|
7. (15 points, CC)

Consider the RIA implementation of the FIR filter
y1(n,-1)=0, n = 0,1,2, …
h1(0,k)=h(k), k=0,…,N-1
n=0,1,2,…, and k=0,…,N-1
y1(n,k)=y1(n,k-1)+h1(n,k)*x1(n,k)
h1(n,k)=h1(n-1,k)
x1(n,k)=x1(n-1,k-1)
y(n)=y1(n,N-1), n=0,1,2,
And the corresponding DG:
y(4) y(5) y(6)

k
y(3)
h(4)
y(2)
h(3)
h(2) y(1)
h(1) y(0)
h(0)
n
x(0) x(1) x(2) x(3) x(4) x(5) x(6)
Page 6 of 7
(a) (5 points, CC) Consider a uni-modular transformation matrix

 1 −1
0 1 
 
Find the corresponding dependence matrix after this transformation,
 1 −1 1 0 1 1 −1 0 
Answer:   = 
 0 1   0 1 1  0 1 1 
(b) (5 points, CC) Plot the corresponding DG, and
k y(5) y(6) y(7) y(8) y(9) y(10)

y(4)
h(4) D
y(3) D
h(3) D
y(2) D
h(2) D
y(1) D
h(1) D
y(0) D
h(0) D
n
x(0) x(1) x(2) x(3) x(4) x(5) x(6) d =s

(c) (5 points, CC) Write down the corresponding RIA formulation
Answer: The RIA equations are of the following form:
y1(n,-1)=0, n = 0,1,2, …
h1(0,k)=h(k), k=0,…,N-1
n=0,1,2,…, and k=0,…,N-1
y1(n,k)=y1(n+1,k-1)+h1(n,k)*x1(n,k)
h1(n,k)=h1(n-1,k)
x1(n,k)=x1(n,k-1)
y(n)=y1(n,N-1), n=0,1,2,
Page 7 of 7

Homework #3 Solution: Department of Electrical and Computer Engineering University of Wisconsin - Madison

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Homework #3 Solution: Department of Electrical and Computer Engineering University of Wisconsin - Madison

Uploaded by

Copyright:

Available Formats

April 9, 2006 Yu Hen Hu

Department of Electrical and Computer Engineering

1. (10 points, CC) Consider the block diagram DFG below:

(a) (3 points, CC) What is the iteration bound T∞?

(b) (5 points, CC) Retime this DFG to minimize iteration period.

r1(0,k)=0, r1(n,k)=r1(n,k-1), 0≤k≤5, 0≤n≤2

will be needed. To incorporate carry propagation, finer granularity of expressions will be

A1(K,I)=A1(K,I-1); 0 ≤ K ≤ 2N-2, 0 ≤ I ≤ N-1

6. (10 points) Saturation arithmetic

Rb: 22 192 118 36

Operation Content of Rc AFTER operation

7. (15 points, CC)

y(4) y(5) y(6)

x(0) x(1) x(2) x(3) x(4) x(5) x(6)

(a) (5 points, CC) Consider a uni-modular transformation matrix

(b) (5 points, CC) Plot the corresponding DG, and

k y(5) y(6) y(7) y(8) y(9) y(10)

x(0) x(1) x(2) x(3) x(4) x(5) x(6) d =s

You might also like