# ADSD Fall 2011

Lecture # 11

Dr. Rehan Hafiz

<rehan.hafiz@seecs.edu.pk>

Course Website for ADSD Fall 2011

http://lms.nust.edu.pk/
Acknowledgement: Material from the following sources has been consulted/used in these slides:

1. [CIL] Advanced Digital Design with the Verilog HDL, M. D. Ciletti
2. [SHO] Digital Design of Signal Processing Systems, Dr. Shoab A. Khan
3. [STV] Advanced FPGA Design, Steve Kilts
4. Ercegovac's book: "Digital Arithmetic", 2004
5. Dr. Shoab A. Khan's CASE Lectures on Advanced Digital System Design

Material from these slides may be used with the following citation: Dr. Rehan Hafiz, Advanced Digital System Design 2010, Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.

Lectures: Tuesday @ 5:30-6:20 pm, Friday @ 6:30-7:20 pm

Contact: By appointment/Email

Office: VISpro Lab above SEECS Library

Lecture Overview

Last Lecture

This Lecture

- Where & Why Multi-Operand Addition
- Multi-Operand Addition (with Focus on Low-Latency Design)
  - Carry Save Adders
  - Wallace Compression Tree
  - Dadda Compression Tree

Figure: adder tree for eight operands a[7] to a[0]. The first stage forms a[7]+a[6], a[5]+a[4], a[3]+a[2] and a[1]+a[0]; the second stage forms a[7]+a[6]+a[5]+a[4] and a[3]+a[2]+a[1]+a[0]; the last stage produces a[7]+a[6]+a[5]+a[4]+a[3]+a[2]+a[1]+a[0].
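The figure is a balanced tree of two-input adders: eight operands are summed in log2(8) = 3 adder levels rather than a chain of 7 dependent additions. A minimal behavioural sketch in Python (not HDL; `adder_tree_sum` is a hypothetical helper and the operand values are arbitrary examples) of the same pairwise grouping:

```python
def adder_tree_sum(operands):
    """Sum operands pairwise, level by level, as in a balanced adder tree.

    Returns (total, levels), where levels counts the two-input adder stages.
    """
    values = list(operands)
    levels = 0
    while len(values) > 1:
        # Each neighbouring pair, e.g. (a[0], a[1]), feeds one two-input adder.
        next_values = [values[i] + values[i + 1] for i in range(0, len(values) - 1, 2)]
        if len(values) % 2:            # an odd operand passes through unchanged
            next_values.append(values[-1])
        values = next_values
        levels += 1
    return values[0], levels


a = [3, 1, 4, 1, 5, 9, 2, 6]           # example values for a[0]..a[7]
print(adder_tree_sum(a))               # (31, 3)
```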

Example: Matrix Multiplication

Each term requires:

- the addition of 4 products,
- where each product requires the addition of 8 partial products produced by the multiplication (assuming 8-bit numbers), i.e. 7 additions per product.

A total of 4 × 7 + 3 = 28 + 3 = 31 additions per term.

So the multiplication above requires 465 additions in total.
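The count can be checked with a short Python sketch. It assumes, as the slide does, 8-bit operands (8 partial products, hence 7 additions per product) and 4 products per output term; `additions_per_term` is a hypothetical helper. Since 465 / 31 = 15, the total corresponds to 15 output terms of this kind.

```python
def additions_per_term(n_products, bits):
    """Additions needed for one output term of a matrix product (adder count only).

    Each bits x bits multiplication sums `bits` partial products: bits - 1 additions.
    Summing n_products products takes n_products - 1 more additions.
    """
    return n_products * (bits - 1) + (n_products - 1)


per_term = additions_per_term(4, 8)
print(per_term)            # 31, i.e. 28 + 3
print(465 // per_term)     # 15 output terms account for the 465 additions
```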

Dot Representation

A useful, simplified notation when the position or alignment of the bits, rather than their values, is what matters.

Each dot represents a digit in a positional number system. Dots in the same column have the same positional weight.

Figure: several 8-bit operands drawn in dot notation (rows of eight dots) and one 15-dot row.
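To make the notation concrete, the sketch below (an illustration, not taken from the slides; `partial_product_dots` is a hypothetical helper) prints the partial products of an n × n multiplication in dot notation. Row j is shifted left by j positions, so every column collects dots of equal weight.

```python
def partial_product_dots(n):
    """Dot-notation rows for the n partial products of an n x n multiplication.

    Row j holds n dots shifted left by j columns; spaces keep the columns aligned, so
    all dots in one column share the same weight 2**column (column 0 is the rightmost).
    """
    width = 2 * n - 1
    rows = [" " * (n - 1 - j) + "." * n + " " * j for j in range(n)]
    assert all(len(row) == width for row in rows)
    return rows


for row in partial_product_dots(4):
    print(row)
# prints:
#    ....
#   ....
#  ....
# ....
```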

So should we use Ripple Carry Adders (RCAs)/Carry Propagate Adders (CPAs) everywhere?

Addition, FIR filtering, matrix multiplication:

- Accumulate (using a CPA at every step)
- OR just delay the computation of the final result

Compression Trees

- (m,n) Counter
- Carry Save
- Dual Carry Save
- Wallace Tree
- Dadda Tree

(m,n) Counters as Compression Trees / Term Reducers

An (m, n) counter takes as input m bits (all of the same power-of-2 weight) and produces an n-bit binary number whose value is the number of inputs equal to 1. In other words, it counts the 1s in its input and outputs the count in binary. Example: the (3, 2) counter.

Of its 3 inputs, 0, 1, 2 or 3 can be equal to 1, and all four of these values can be represented as a 2-bit binary number. A (3, 2) counter is therefore just a full adder: the sum is the LSB of the count and the carry-out is the MSB.
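A behavioural Python sketch of a generic (m, n) counter, where n = ⌈log2(m + 1)⌉ bits are enough to hold any count from 0 to m, together with an exhaustive check that the (3, 2) counter behaves exactly like a full adder. The helper names `counter` and `full_adder` are illustrative.

```python
from itertools import product


def counter(bits):
    """(m, n) counter: count the 1s among m equal-weight input bits.

    Returns the count as n bits, LSB first, with n = m.bit_length() = ceil(log2(m + 1)).
    """
    m = len(bits)
    count = sum(bits)
    return [(count >> k) & 1 for k in range(m.bit_length())]


def full_adder(a, b, c):
    """Full adder, outputs LSB first: [sum, carry]."""
    s = a ^ b ^ c                          # sum = parity of the three inputs
    carry = (a & b) | (a & c) | (b & c)    # carry = majority of the three inputs
    return [s, carry]


# The (3, 2) counter matches the full adder on all 8 input combinations.
assert all(counter(list(v)) == full_adder(*v) for v in product((0, 1), repeat=3))
print(counter([1, 0, 1, 1, 1, 0, 1]))     # (7, 3) counter, five 1s -> [1, 0, 1]
```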


Successively reduces 3 input vectors to 2 output vectors, i.e. a sum vector and a carry vector. Each bit of these two vectors is computed independently of all other bits; there is no carry propagation between adjacent bit positions.

Implements a compression from 3 vectors X, Y and Z to 2 vectors S and C.

Hardware: a parallel set of (3, 2) counters, i.e. a parallel set of full adders.
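A minimal bit-level sketch in Python (a behavioural model, not HDL; the function name and the `width` parameter are illustrative) of a CSA built as a row of independent full adders. No value passes from bit i to bit i + 1 inside the loop; each carry bit is simply written into the next column of the C vector.

```python
def carry_save_add(x, y, z, width):
    """Compress three unsigned width-bit vectors X, Y, Z into a sum vector S and carry vector C.

    One full adder per bit position; positions are independent (no carry ripple).
    C is returned already shifted left by one, since carry bit i has weight 2**(i + 1).
    """
    s = 0
    c = 0
    for i in range(width):
        xi, yi, zi = (x >> i) & 1, (y >> i) & 1, (z >> i) & 1
        s |= (xi ^ yi ^ zi) << i                             # sum bit stays in column i
        c |= ((xi & yi) | (xi & zi) | (yi & zi)) << (i + 1)  # carry bit moves to column i + 1
    return s, c


s, c = carry_save_add(200, 150, 99, width=8)
print(s, c, s + c)   # S + C equals X + Y + Z; a CPA computes this final addition
```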

Adding 3 8-bit numbers

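| Bit position | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
|---|---|---|---|---|---|---|---|---|---|
| Operand-1 |  | 0 | 1 | 1 | 0 | 1 | 1 | 1 | 0 |
| Operand-2 |  | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 0 |
| Operand-3 |  | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| Sum bits |  | 0 | 0 | 1 | 1 | 1 | 1 | 0 | 0 |
| Carry bits | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 0 |  |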

Now add the two resulting vectors (sum bits and carry bits) using a carry-propagate adder (CPA).

1 carry-save adder (CSA) + 1 CPA, instead of 2 CPAs

1 FA delay per CSA
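The worked example can be checked with a word-level Python model of the same CSA, using bitwise operators for the sum (XOR) and carry (majority) functions. The operands 0110 1110, 0101 0110 and 0000 0100 (110, 86 and 4) are those of the example; `csa` is an illustrative name, and Python's `+` stands in for the final CPA.

```python
def csa(x, y, z):
    """Word-level carry-save adder: returns (sum vector, carry vector shifted left by 1)."""
    sum_vec = x ^ y ^ z
    carry_vec = ((x & y) | (x & z) | (y & z)) << 1
    return sum_vec, carry_vec


op1, op2, op3 = 0b01101110, 0b01010110, 0b00000100   # 110, 86, 4
s, c = csa(op1, op2, op3)
print(f"S = {s:09b}, C = {c:09b}")   # S = 000111100, C = 010001100
print(s + c)                         # 200 (one CPA, after a single FA delay in the CSA)
```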

Carry Save Adder in Dot Notation

Example: 8-bit Addition of 4 Numbers

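Figure: column-by-column addition of the four 8-bit numbers, showing the carries into each column and the sum bits. With four operands a carry can exceed 1 (carries of 2 appear), so it no longer fits in a single bit.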
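The problem can be reproduced with a small Python sketch of hand-style, column-by-column addition. The four operands below are hypothetical example values, not necessarily those on the slide, and `column_add` is an illustrative name; the returned list holds the carry into each column and shows carries of 2.

```python
def column_add(operands, width):
    """Add unsigned operands column by column, LSB first, as done by hand.

    Returns (result, carries), where carries[i] is the carry into column i.
    With four operands a column sum can reach 4 + carry-in, so a carry may exceed 1.
    """
    result, carry, col = 0, 0, 0
    carries = []
    while col < width or carry:
        carries.append(carry)
        column_sum = carry + sum((op >> col) & 1 for op in operands)
        result |= (column_sum & 1) << col   # one result bit per column
        carry = column_sum >> 1             # can be 2 or more: not a single bit
        col += 1
    return result, carries


total, carries = column_add([110, 86, 4, 60], width=8)
print(total, carries)   # 260 [0, 0, 1, 2, 2, 2, 2, 2, 1]
```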

What can be done?

Figure: the four operands are reduced with two cascaded CSAs (four rows to three, then three to two), and a single CPA adds the final sum and carry vectors.
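A minimal Python sketch of this scheme (behavioural only; `csa` and `add4` are hypothetical helper names): the two CSA levels cost 2 FA delays, and only one carry-propagate addition remains, instead of three CPAs in sequence.

```python
def csa(x, y, z):
    """Word-level 3:2 carry-save adder: (sum vector, carry vector << 1)."""
    return x ^ y ^ z, ((x & y) | (x & z) | (y & z)) << 1


def add4(a, b, c, d):
    """Add four unsigned numbers with two cascaded CSAs and a single final CPA."""
    s1, c1 = csa(a, b, c)      # CSA level 1: three operands -> sum and carry vectors
    s2, c2 = csa(s1, c1, d)    # CSA level 2: those two vectors + the fourth operand
    return s2 + c2             # the only carry-propagate addition (Python's + models the CPA)


print(add4(110, 86, 4, 60))    # 260
```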


Carry Save Reduction

- Level 0 shows all the partial products (PPs).
- The carry-save reduction scheme takes the first 3 PP rows at Level 0 and reduces them to 2 rows using 3-2 and 2-2 compressors (full and half adders). The remaining PP rows are not touched at this level of reduction.
- The two rows resulting from the previous level are then added together with the PP4 row at the next level.
- At each level, three partial-product rows are added, resulting in two output rows.
- The process continues until the array is reduced to no more than two rows.
- The number of levels for N PP rows is N-2 (starting from Level 0).
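The level count N-2 follows because every level after the first absorbs exactly one new row. A Python sketch (behavioural; the helper names are hypothetical, and an 8 × 8 multiplication with arbitrary operands is assumed to generate the PP rows) that reduces the rows linearly and counts the levels:

```python
def csa(x, y, z):
    """Word-level 3:2 carry-save adder: (sum vector, carry vector << 1)."""
    return x ^ y ^ z, ((x & y) | (x & z) | (y & z)) << 1


def carry_save_reduce(rows):
    """Linear carry-save reduction of at least three rows down to two (S, C)."""
    s, c = csa(rows[0], rows[1], rows[2])   # first level: the first three PP rows
    levels = 1
    for row in rows[3:]:
        s, c = csa(s, c, row)               # each later level adds one more PP row
        levels += 1
    return s, c, levels                     # levels == len(rows) - 2


a, b = 0xB7, 0x5D                                             # arbitrary 8-bit operands
pp_rows = [(a if (b >> j) & 1 else 0) << j for j in range(8)]  # PP row j = a * b_j * 2**j
s, c, levels = carry_save_reduce(pp_rows)
print(levels, s + c == a * b)   # 6 True
```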

Carry Save Architecture
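
Figure: carry-save architecture with four levels (Level 1 to Level 4), each a row of half adders (HA) and full adders (FA) fed by the partial-product rows P0 to P3 (Level 1: HA, five FAs, HA; Levels 2 to 4: HA followed by six FAs). Outputs: partial-sum bits PS3 to PS9, partial-carry bits PC4 to PC10, and the free product bits.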

Adding 6 Terms – Using DUAL CSA

- The partial-product row matrix is divided into two equal-size groups.
- Two CSA operations run simultaneously on the partial-product matrix, one per group.
- The resulting 4 terms are again reduced using the CSA architecture.
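A behavioural Python sketch of the dual-CSA arrangement for 6 terms (the helper names and test values are hypothetical). The two first-level CSAs work in parallel, so the reduction to two vectors takes 3 CSA levels instead of the 6-2 = 4 levels of the linear carry-save scheme.

```python
def csa(x, y, z):
    """Word-level 3:2 carry-save adder: (sum vector, carry vector << 1)."""
    return x ^ y ^ z, ((x & y) | (x & z) | (y & z)) << 1


def dual_csa_6(p):
    """Reduce six terms p[0]..p[5] to two vectors, starting with two parallel CSAs."""
    # Level 1: the six rows form two groups of three, compressed simultaneously.
    s_a, c_a = csa(p[0], p[1], p[2])
    s_b, c_b = csa(p[3], p[4], p[5])
    # Levels 2 and 3: the four resulting terms are reduced with further CSAs.
    s_2, c_2 = csa(s_a, c_a, s_b)   # 4 terms -> 3 (c_b waits one level)
    return csa(s_2, c_2, c_b)       # 3 terms -> 2


terms = [13, 200, 77, 5, 150, 64]
s, c = dual_csa_6(terms)
print(s + c == sum(terms))   # True; a final CPA would compute s + c
```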

Wallace Tree

Can reduce N binary numbers to two numbers in O(log N) levels.

- At Level 0, CSA (carry-save) operations are applied simultaneously to every available group of three terms (partial products) to be added.
- The same technique is repeated on the resulting matrix: its rows are again grouped in threes, and each group is reduced to two rows simultaneously.
- The process continues until only two rows are left.
- The final two rows are added together to obtain the final product.

Figure: Wallace tree reduction, Level 0 to Level 3.
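A Python sketch of the Wallace grouping rule (behavioural; word-level CSAs stand in for the columns of full adders, and the helper names and operands are hypothetical). At each level every complete group of three rows is compressed in parallel, and leftover rows pass straight through.

```python
def csa(x, y, z):
    """Word-level 3:2 carry-save adder: (sum vector, carry vector << 1)."""
    return x ^ y ^ z, ((x & y) | (x & z) | (y & z)) << 1


def wallace_reduce(rows):
    """Reduce rows to at most two, Wallace style. Returns (rows, levels)."""
    rows = list(rows)
    levels = 0
    while len(rows) > 2:
        full = len(rows) - len(rows) % 3      # rows that form complete groups of three
        next_rows = []
        for i in range(0, full, 3):           # all groups are compressed in parallel
            next_rows.extend(csa(rows[i], rows[i + 1], rows[i + 2]))
        next_rows.extend(rows[full:])         # 0, 1 or 2 leftover rows pass through
        rows = next_rows
        levels += 1
    return rows, levels


a, b = 0xB7, 0x5D                                             # arbitrary 8-bit operands
pp_rows = [(a if (b >> j) & 1 else 0) << j for j in range(8)]
final_rows, levels = wallace_reduce(pp_rows)
print(levels, sum(final_rows) == a * b)   # 4 True
```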

Wallace Tree 7 Operands Example

Wallace Tree – Architecture 6 Operands

Compressing 6 operands P0, P1, …P5 to 2 vectors S and C. This can be done using 3 levels of CSAs.

A left arrow on a CSA input means that the vector is shifted left by one bit position, because it is the carry-vector output of a previous CSA.

Wallace Tree – Architecture 8 Operands

No. of Operands vs. No. of Full Adder Levels

| Number of operands (n) | No. of full adder levels |
|---|---|
| 3 | 1 |
| 4 | 2 |
| 5 ≤ n ≤ 6 | 3 |
| 7 ≤ n ≤ 9 | 4 |
| 10 ≤ n ≤ 13 | 5 |
| 14 ≤ n ≤ 19 | 6 |
| 20 ≤ n ≤ 28 | 7 |
| 29 ≤ n ≤ 42 | 8 |
| 43 ≤ n ≤ 63 | 9 |

Assuming Level 0 = 1st level
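The table can be regenerated with a short Python sketch (the helper names are hypothetical). Each level turns every group of three operands into two, so n operands become 2⌊n/3⌋ + (n mod 3). Conversely, starting from 2 operands (no level needed), each extra level handles ⌊3d/2⌋ operands, which gives the upper ends 3, 4, 6, 9, 13, 19, 28, 42, 63.

```python
def wallace_levels(n):
    """Full adder levels needed to reduce n operands to two, grouping them in threes."""
    levels = 0
    while n > 2:
        n = 2 * (n // 3) + n % 3   # each group of 3 becomes 2; leftovers pass through
        levels += 1
    return levels


def max_operands(levels):
    """Largest operand count that the given number of levels reduces to two."""
    d = 2                          # two operands need no levels
    for _ in range(levels):
        d = 3 * d // 2             # one more level handles floor(1.5 * d) operands
    return d


for levels in range(1, 10):
    low = max_operands(levels - 1) + 1
    print(f"{low} <= n <= {max_operands(levels)}: {levels} level(s)")
```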

Dadda Tree

- An area-optimized Wallace tree for partial-product reduction
- Requires the same number of adder levels as the Wallace tree
- Uses fewer computational elements than the Wallace tree, which means less area and lower power dissipation


For each column, reduce only as far as needed so that the number of partial products (PPs) in the next level equals the maximum of the corresponding range in the Wallace tree table (No. of Operands vs. No. of Full Adder Levels).

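These columns need to be compressed, since they hold more than 6 PPs. Carries coming from the previous column must also be taken into account, e.g. from the 7th column into the 8th.

| Level | 1 | 2 | 3 | 4 | Total |
|---|---|---|---|---|---|
| FAs | 3 | 12 | 9 | 11 | 35 |
| HAs | 3 | 2 | 1 | 1 | 7 |

These columns need no reduction, since they already have no more than 6 PPs/operands.

| Level | 1 | 2 | 3 | 4 | Total |
|---|---|---|---|---|---|
| FAs | 12 | 13 | 6 | 8 | 39 |
| HAs | 4 | 3 | 4 | 3 | 14 |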
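Assuming the example is an 8 × 8 unsigned multiplier (column heights rising from 1 to 8 and falling back to 1), the per-level counts in the first table (3, 12, 9, 11 FAs and 3, 2, 1, 1 HAs) can be reproduced with the Python sketch below. It tracks only column heights, not bit values; `dadda_counts` is a hypothetical helper. In each column it uses an HA when the height is exactly one above the target and an FA otherwise, and counts every adder's carry toward the next column's height.

```python
def dadda_counts(n):
    """Per-level (FA, HA) counts of a Dadda reduction for an n x n unsigned multiplier."""
    heights = [min(i + 1, 2 * n - 1 - i) for i in range(2 * n - 1)]  # PP bits per column
    targets = [2]
    while targets[-1] < n:                 # Dadda heights 2, 3, 4, 6, 9, 13, ...
        targets.append(3 * targets[-1] // 2)
    stages = []
    for d in reversed(targets[:-1]):       # only targets below n, largest first
        fas = has = carry_in = 0
        new_heights = []
        for h in heights + [0]:            # extra column receives the last carries
            h += carry_in                  # carries from column i-1 land in this column
            carry_in = 0
            while h > d:
                if h == d + 1:
                    h -= 1                 # HA: 2 bits -> 1 sum here + 1 carry out
                    has += 1
                else:
                    h -= 2                 # FA: 3 bits -> 1 sum here + 1 carry out
                    fas += 1
                carry_in += 1
            new_heights.append(h)
        heights = new_heights
        stages.append((fas, has))
    return stages


print(dadda_counts(8))   # [(3, 3), (12, 2), (9, 1), (11, 1)] -> 35 FAs, 7 HAs
```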


Ron S. Waters and Earl E. Swartzlander, "A Reduced Complexity Wallace Multiplier Reduction," IEEE Transactions on Computers, vol. 59, no. 8, pp. 1134-1137, Apr. 2010.

QUIZ

Quiz – 20 Minutes

You have to design an architecture that can add four 8-bit numbers with the minimum possible latency.

Questions

- Draw the micro-architecture of your design [5 Marks]
- Report the latency of your design in terms of gate delays [3 Marks]
- Report the latency in terms of gate delays if you were to use ripple-carry adders for these four 8-bit numbers [2 Marks]

Assumptions

- Assume a single gate delay for all gates, including the XOR gate
- Assume 2 gate delays for an FA
- You do not need to take care of fan-outs/fan-ins
- For your selected adder, you only need to draw its block diagram


Lecture Ends !

Assignment-02

You have to explain the micro-architecture and write the Verilog code for the addition of eight 8-bit binary numbers for the following cases, and compare their performance (as reported by Xilinx):

- Ripple Carry Adder
- Carry Select Adder
- Wallace Compression Tree

Duration: 2 Weeks

Quiz