
# ADSD Fall 2011

Lecture # 11

Dr. Rehan Hafiz

<rehan.hafiz@seecs.edu.pk>

**Course Website for ADSD Fall 2011**

http://lms.nust.edu.pk/

Acknowledgement: Material from the following sources has been consulted/used in these slides:

1. [CIL] Advanced Digital Design with the Verilog HDL, M. D. Ciletti
2. [SHO] Digital Design of Signal Processing Systems by Dr. Shoab A. Khan
3. [STV] Advanced FPGA Design, Steve Kilts
4. Ercegovac’s book: “Digital Arithmetic”, 2004
5. Dr. Shoab A. Khan’s CASE Lectures on Advanced Digital System Design

Material from these slides can be used with the following citation: Dr. Rehan Hafiz: Advanced Digital System Design 2010. Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.

Lectures: Tuesday @ 5:30-6:20 pm, Friday @ 6:30-7:20 pm

Contact: By appointment/Email

Office: VISpro Lab above SEECS Library

Lecture Overview


Last Lecture

Two Operand Adders

This Lecture

Where & Why Multi Operand Addition

Multi-Operand Addition (with focus on low latency design)

Carry Save Adders

Wallace Compression Tree

Dadda Compression Tree

**Multi-operand Addition**

a[7] a[6] a[5] a[4] a[3] a[2] a[1] a[0]

a[7]+a[6]

a[5]+a[4]

a[3]+a[2]

a[1]+a[0]

a[7]+a[6]+a[5]+a[4]

a[3]+a[2]+a[1]+a[0]

a[7]+a[6]+a[5]+a[4]+a[3]+a[2]+a[1]+a[0]
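The pairwise grouping above can be sketched in a few lines of Python. This is only an illustration of the tree structure, assuming a non-empty operand list; the function name `adder_tree_sum` and the sample values are made up for the example.

```python
def adder_tree_sum(operands):
    """Sum operands by pairwise addition, level by level.

    Returns (total, number_of_adder_levels). Assumes at least one operand.
    """
    level = list(operands)
    levels = 0
    while len(level) > 1:
        # Every pair in this level is added by an independent two-operand adder.
        nxt = [level[i] + level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])  # an odd operand passes through unchanged
        level = nxt
        levels += 1
    return level[0], levels


a = [3, 1, 4, 1, 5, 9, 2, 6]  # illustrative values for a[0]..a[7]
print(adder_tree_sum(a))  # (31, 3)
```

With 8 operands the tree needs 3 levels of two-operand adders, but every level still contains a full carry propagation.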

**Example: Matrix Multiplication**

**Each term requires:**

Addition of 4 products

Where each product requires addition of 8 partial products as a result of multiplication (assuming 8 bit numbers)

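Adding 8 partial products takes 7 additions, so the 4 products need 4 × 7 = 28 additions; summing the 4 products takes 3 more, for a total of (28+3) = 31 additions per term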

So above multiplication requires 465 additions in total
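The count can be checked with a small sketch. The helper name `additions_per_term` is hypothetical. The matrix dimensions are not reproduced in this text, so the number of output terms printed at the end is only inferred from the quoted total of 465.

```python
def additions_per_term(products_per_term, partial_products_per_product):
    """Two-operand additions needed for one output term."""
    # Summing k values takes k - 1 two-operand additions.
    within_products = products_per_term * (partial_products_per_product - 1)
    across_products = products_per_term - 1
    return within_products + across_products


per_term = additions_per_term(4, 8)
print(per_term)         # 31
print(465 // per_term)  # 15, the number of terms implied by the total of 465
```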

Dot Representation

Useful, simplified notation.

Useful when the positioning or alignment of the bits, rather than their values, is important.

Each dot represents a digit in a positional number system. Dots in the same column have the same positional weight.
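As a rough illustration of the notation, the sketch below prints a dot diagram and counts the dots per column. It assumes rows that are each shifted left by one position relative to the previous row, as happens with the partial products of a multiplication. The function names are made up for the example.

```python
def dot_diagram(num_rows, width):
    """Dot rows for num_rows operands of width bits; row i is shifted left by i."""
    rows = []
    for i in range(num_rows):
        # Pad on the left, then the dots, then i blank low-order positions.
        rows.append(" " * (num_rows - 1 - i) + "." * width + " " * i)
    return rows


def column_heights(rows):
    """Number of dots in each column, most significant column first."""
    return [sum(r[c] == "." for r in rows) for c in range(len(rows[0]))]


rows = dot_diagram(4, 4)
for r in rows:
    print(r)
print(column_heights(rows))  # [1, 2, 3, 4, 3, 2, 1]
```

The column heights are what a compression scheme works on: all dots in one column carry the same weight and can be fed to the same counter.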

[Figure: dot diagram of operand rows with 8 dots per row, followed by a 15-dot row]

**So should we use Ripple Carry Adders (RCAs)/Carry Propagate Adders (CPAs) everywhere?**

**Addition, FIR Filtering, Matrix Multiplication**

Accumulate using cascaded CP-Adders

OR: just delay the computation of the final result

Compression Trees

Counter (m,n)

Carry Save

Dual Carry Save

Wallace Tree

Dadda Tree

(m,n) Counter as Compression Trees / Term Reducers

An (m, n) counter takes as input m bits (all of the same power-of-2 weight) and produces an n-bit binary number whose value is the number of inputs that are equal to 1. In other words, it counts the number of 1s in the input and outputs the binary count value.

Example: (3, 2) counter. Of the 3 inputs, 0, 1, 2 or 3 can be equal to 1. All four of these values can be represented as a 2-bit binary number. A (3, 2) counter is nothing but a full adder, where the sum is the LSB of the count output and the carry-out is the MSB of the count output.
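The equivalence between a (3, 2) counter and a full adder can be checked exhaustively. The sketch below is a behavioural model only. The names `counter` and `full_adder` are chosen for this example.

```python
def counter(bits, n):
    """(m, n) counter: m equal-weight input bits -> n-bit count, MSB first."""
    ones = sum(bits)
    assert ones < 2 ** n, "n output bits cannot represent this count"
    return [(ones >> k) & 1 for k in reversed(range(n))]


def full_adder(a, b, cin):
    """Gate-level full adder: returns (carry_out, sum)."""
    s = a ^ b ^ cin
    cout = (a & b) | (a & cin) | (b & cin)
    return cout, s


# Exhaustive check: the (3, 2) counter output is (carry_out, sum) of a full adder.
for a in (0, 1):
    for b in (0, 1):
        for c in (0, 1):
            assert counter([a, b, c], 2) == list(full_adder(a, b, c))

print(counter([1, 0, 1], 2))  # [1, 0]
```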

**Carry Save Adder**

**Successively reduce 3 input vectors to 2 output vectors,** i.e. a sum vector and a carry vector. Each bit of these two vectors is computed independently of all other bits, and there is no carry propagation between adjacent bit positions.

**Implements a compression from 3 vectors X, Y and Z to 2 vectors S and C.**

H.W: Parallel set of (3, 2) counters, i.e. a parallel set of full adders.
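A bit-level sketch of this structure is given below: one full adder per bit position, with no connection between positions. The helper names and the operand values are illustrative assumptions. Bit lists are stored LSB first.

```python
def to_bits(value, width):
    """Integer -> list of bits, LSB first."""
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits):
    """List of bits (LSB first) -> integer."""
    return sum(b << i for i, b in enumerate(bits))


def carry_save_add(x, y, z):
    """Parallel set of full adders; returns (sum_vector, carry_vector)."""
    sum_vec = [a ^ b ^ c for a, b, c in zip(x, y, z)]
    carry_vec = [(a & b) | (a & c) | (b & c) for a, b, c in zip(x, y, z)]
    return sum_vec, carry_vec


X, Y, Z = 110, 86, 4  # illustrative 8-bit operands
s, c = carry_save_add(to_bits(X, 8), to_bits(Y, 8), to_bits(Z, 8))
print(from_bits(s), from_bits(c))  # 60 70
# Each carry bit has twice the weight of the sum bit in its column.
print(from_bits(s) + 2 * from_bits(c))  # 200
```

Because each output bit depends only on the three input bits of its own column, the delay of the whole stage is that of a single full adder, regardless of the word length.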

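**Adding 3 8-bit numbers**

| Bit position | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
|---|---|---|---|---|---|---|---|---|---|
| Operand-1 | | 0 | 1 | 1 | 0 | 1 | 1 | 1 | 0 |
| Operand-2 | | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 0 |
| Operand-3 | | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| Sum bits | | 0 | 0 | 1 | 1 | 1 | 1 | 0 | 0 |
| Carry bits | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | |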

Now add the two resulting vectors (sum bits and carry bits) using a Carry Propagate Adder (CPA).

**1 Carry Save Adder (CSA) + 1 CPA** instead of 2 CPAs

1 FA Delay per CSA

**Carry Save Adder in Dot Notation**

**Example: 8-bit addition of 4 numbers**

[Figure: conventional column-by-column addition of the four numbers, showing the row of carries and the Sum row]

What can be done?

[Figure: the same four numbers reduced by two successive CSA stages, followed by a single CPA]
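A word-level sketch of the CSA, CSA, CPA arrangement for four operands follows. The function names, the operand values and the 10-bit result width (enough for four 8-bit numbers) are assumptions made for this example. The ripple-carry adder is a behavioural model, not a gate-accurate one.

```python
def csa(x, y, z):
    """Word-level carry save adder: returns (sum_vector, carry_vector).

    The carry vector is already shifted left by one, so x + y + z == s + c.
    """
    s = x ^ y ^ z
    c = ((x & y) | (x & z) | (y & z)) << 1
    return s, c


def ripple_carry_add(x, y, width):
    """Carry propagate adder: the carry ripples through every bit position."""
    result, carry = 0, 0
    for i in range(width):
        a, b = (x >> i) & 1, (y >> i) & 1
        result |= (a ^ b ^ carry) << i
        carry = (a & b) | (a & carry) | (b & carry)
    return result | (carry << width)


def add4(a, b, c, d, width=10):
    s1, c1 = csa(a, b, c)    # CSA stage 1: 3 vectors -> 2
    s2, c2 = csa(s1, c1, d)  # CSA stage 2: 2 vectors + 4th operand -> 2
    return ripple_carry_add(s2, c2, width)  # the only carry-propagating step


print(add4(110, 86, 4, 60))  # 260
```

Only the last step propagates carries across the word; each CSA stage adds one full-adder delay.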


**Carry Save Reduction**

Level 0 shows all the PPs.

The carry save reduction scheme takes the first 3 PP layers at Level 0 and reduces them to 2 PP layers using 3-2 and 2-2 compressors.

The rest of the partial product rows are not touched at this level of reduction.

The two rows resulting from the previous reduction level are then added together with the PP4 row in the next level.

At each level, three partial product rows are added, resulting in two output rows.

The process continues until the array is reduced to no more than two rows.

The number of levels for N PP layers is N-2 (starting from Level 0).
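The scheme above can be sketched at word level as follows. The rows are modelled as plain integers rather than shifted partial products, which is a simplifying assumption. The names `csa` and `carry_save_reduce` are chosen for this example.

```python
def csa(x, y, z):
    """Carry save adder on integers; the carry is pre-shifted, so x+y+z == s+c."""
    s = x ^ y ^ z
    c = ((x & y) | (x & z) | (y & z)) << 1
    return s, c


def carry_save_reduce(rows):
    """Reduce rows to two, consuming one new row per level.

    Returns (remaining_rows, number_of_levels).
    """
    rows = list(rows)
    if len(rows) < 3:
        return rows, 0
    s, c = csa(rows[0], rows[1], rows[2])  # first level uses the first 3 rows
    levels = 1
    for row in rows[3:]:
        s, c = csa(s, c, row)  # each later level brings in one more row
        levels += 1
    return [s, c], levels


rows = [13, 7, 200, 45, 99, 3]  # illustrative values, N = 6
out, levels = carry_save_reduce(rows)
print(levels, sum(out) == sum(rows))  # 4 True
```

The level count grows linearly with the number of rows, which is the motivation for the tree-style schemes in the rest of the lecture.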

**Carry Save Architecture**

[Figure: carry save array, one row of adders per level]

Level 1: HA FA FA FA FA FA HA, P0

Level 2: HA FA FA FA FA FA FA, P1

Level 3: HA FA FA FA FA FA FA, P2

Level 4: HA FA FA FA FA FA FA, P3

PC10, PS9 PC9, PS8 PC8, PS7 PC7, PS6 PC6, PS5 PC5, PS4 PC4, PS3

Free product bits

**Adding 6 Terms – Using DUAL CSA**

The partial product row matrix is divided into two equal-size groups.

Two simultaneous operations are performed on the partial product matrix.

The resulting 4 terms are again reduced using the CSA architecture.
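A minimal sketch of the dual arrangement for six terms is shown below, again with rows modelled as plain integers. The function name `dual_csa_reduce6` and the sample values are assumptions made for the example.

```python
def csa(x, y, z):
    """Carry save adder on integers; the carry is pre-shifted, so x+y+z == s+c."""
    s = x ^ y ^ z
    c = ((x & y) | (x & z) | (y & z)) << 1
    return s, c


def dual_csa_reduce6(rows):
    """Reduce 6 rows to 2 with two parallel CSAs followed by a CSA chain.

    Returns (sum_vector, carry_vector, number_of_levels).
    """
    assert len(rows) == 6
    # Level 1: both groups of three are reduced at the same time.
    s_a, c_a = csa(*rows[:3])
    s_b, c_b = csa(*rows[3:])
    # Levels 2 and 3: the resulting 4 terms go through a carry save chain.
    s, c = csa(s_a, c_a, s_b)
    s, c = csa(s, c, c_b)
    return s, c, 3


rows = [13, 7, 200, 45, 99, 3]  # illustrative values
s, c, levels = dual_csa_reduce6(rows)
print(levels, s + c == sum(rows))  # 3 True
```

For six terms this takes 3 levels, compared with the N-2 = 4 levels of the single carry save chain.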

Wallace Tree

Can reduce N binary numbers to two numbers in O(log N) levels.

**A simultaneous CSA (Carry Save) operation is applied on all possible groups of three terms (partial products) to be added.**

The same technique is repeated on the resulting matrix: the rows are again grouped into threes, and **each group is reduced to two rows simultaneously.**

This process continues until only two rows are left.

The final rows are added together for the final product.

[Figure: dot diagrams of the reduction at Level 0, Level 1, Level 2 and Level 3]

**Wallace Tree 7 Operands Example**

**Wallace Tree – Architecture 6 Operands**

**Compressing 6 operands P0, P1, … P5 to 2 vectors S and C. This can be done using 3 levels of CSAs.**

The left arrow on some CSA inputs means that the vector is shifted left by one bit position, to account for the fact that it is the carry vector output of a prior CSA.
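The grouping-by-three reduction can be sketched at word level as below. The left shift inside `csa` plays the role of the left arrow on the carry vectors. Operands are modelled as plain integers, and the function names and sample values are assumptions for the example.

```python
def csa(x, y, z):
    """Carry save adder; the carry vector is shifted left by one bit position."""
    s = x ^ y ^ z
    c = ((x & y) | (x & z) | (y & z)) << 1
    return s, c


def wallace_reduce(rows):
    """Reduce rows to two by applying CSAs to every full group of three per level.

    Returns (remaining_rows, number_of_levels).
    """
    rows = list(rows)
    levels = 0
    while len(rows) > 2:
        full = len(rows) - len(rows) % 3
        nxt = []
        for i in range(0, full, 3):
            nxt.extend(csa(rows[i], rows[i + 1], rows[i + 2]))
        nxt.extend(rows[full:])  # leftover rows pass through to the next level
        rows = nxt
        levels += 1
    return rows, levels


p = [13, 7, 200, 45, 99, 3]  # illustrative values for P0..P5
out, levels = wallace_reduce(p)
print(levels, sum(out) == sum(p))  # 3 True
```

The two remaining vectors S and C still have to be added by one carry propagate adder to obtain the final result.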

**Wallace Tree – Architecture 8 Operands**

**No. of Operands Vs. No. of Full Adder Levels**

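| Number of Operands | No. of full adder Levels |
|---|---|
| 3 | 1 |
| 4 | 2 |
| 5 ≤ n ≤ 6 | 3 |
| 7 ≤ n ≤ 9 | 4 |
| 10 ≤ n ≤ 13 | 5 |
| 14 ≤ n ≤ 19 | 6 |
| 20 ≤ n ≤ 28 | 7 |
| 29 ≤ n ≤ 42 | 8 |
| 43 ≤ n ≤ 63 | 9 |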

Assuming Level 0 = 1st level
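The operand ranges can be reproduced by counting rows only: each level turns every full group of three rows into two rows and passes any leftover rows through. The sketch below assumes exactly that rule. The function names are made up for the example.

```python
def wallace_levels(n):
    """Number of CSA levels needed to reduce n rows to 2."""
    levels = 0
    while n > 2:
        n = 2 * (n // 3) + n % 3  # each group of 3 becomes 2; leftovers pass through
        levels += 1
    return levels


def level_ranges(max_operands):
    """Map each level count to the (smallest, largest) operand count needing it."""
    ranges = {}
    for n in range(3, max_operands + 1):
        k = wallace_levels(n)
        lo, _ = ranges.get(k, (n, n))
        ranges[k] = (lo, n)
    return ranges


for k, (lo, hi) in level_ranges(63).items():
    print(k, lo, hi)
```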

Dadda Trees

An area-optimized Wallace tree for partial products!

Requires the same number of adder levels.

It uses fewer computational elements than a Wallace tree.

This corresponds to less power dissipation and less area.

| Number of Operands | No. of full adder Levels |
|---|---|
| 3 | 1 |
| 4 | 2 |
| 5 ≤ n ≤ 6 | 3 |
| 7 ≤ n ≤ 9 | 4 |
| 10 ≤ n ≤ 13 | 5 |
| 14 ≤ n ≤ 19 | 6 |
| 20 ≤ n ≤ 28 | 7 |
| 29 ≤ n ≤ 42 | 8 |
| 43 ≤ n ≤ 63 | 9 |

For each column, reduce only to the extent that the number of PPs in the next level equals the maximum of the corresponding range of Number of Operands in the Wallace tree table.

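Need to compress these columns since they have > 6 PPs. Also need to take care of carries coming from the previous column, e.g. from the 7th to the 8th.

No need to reduce these columns, since they already have 6 PPs/operands.

| Level | 1 | 2 | 3 | 4 | Total |
|---|---|---|---|---|---|
| Dadda FAs | 3 | 12 | 9 | 11 | 35 |
| Dadda HAs | 3 | 2 | 1 | 1 | 7 |
| Wallace FAs | 12 | 13 | 6 | 8 | 39 |
| Wallace HAs | 4 | 3 | 4 | 3 | 14 |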
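The column rule can be tried out on dot counts alone. The sketch below assumes the partial-product matrix of an 8×8 multiplier (column heights 1, 2, ..., 8, ..., 2, 1) and the usual convention that a full adder removes two dots from a column and a half adder removes one. The function names are chosen for the example.

```python
def dadda_targets(max_height):
    """Target heights 2, 3, 4, 6, 9, ... below max_height, largest first."""
    targets, d = [], 2
    while d < max_height:
        targets.append(d)
        d = d * 3 // 2
    return targets[::-1]


def dadda_reduce(heights):
    """heights[i] is the number of dots in column i (LSB first).

    Returns (final_heights, [(full_adders, half_adders) for each level]).
    """
    heights = list(heights)
    per_level = []
    for target in dadda_targets(max(heights)):
        new_heights, carry_in, full, half = [], 0, 0, 0
        for h in heights:
            # Reduce only as much as needed to reach the target height,
            # counting the carries arriving from the previous column.
            excess = max(0, h + carry_in - target)
            fa, ha = divmod(excess, 2)  # an FA removes 2 dots, an HA removes 1
            new_heights.append(h - 2 * fa - ha + carry_in)
            carry_in = fa + ha  # each adder sends one carry to the next column
            full, half = full + fa, half + ha
        if carry_in:
            new_heights.append(carry_in)
        heights = new_heights
        per_level.append((full, half))
    return heights, per_level


n = 8
pp_heights = [min(i + 1, 2 * n - 1 - i) for i in range(2 * n - 1)]
final, per_level = dadda_reduce(pp_heights)
print(per_level)   # [(3, 3), (12, 2), (9, 1), (11, 1)]
print(max(final))  # 2
```

Under these assumptions the per-level counts add up to 35 full adders and 7 half adders, which agrees with the Dadda totals quoted above.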

Further Reading

Ron S. Waters, Earl E. Swartzlander, "A Reduced Complexity Wallace Multiplier Reduction," IEEE Transactions on Computers, vol. 59, no. 8, pp. 1134-1137, Apr. 2010.

QUIZ

Quiz – 20 Minutes

You have to design an architecture that is able to perform the addition of 4 8-bit numbers with the minimum possible latency.

Questions

Draw the micro-architecture of your design [5 Marks]

Report the latency of your design in terms of gate delays [3 Marks]

Report the latency in terms of gate delays if you were using Ripple Carry Adders for these 4 8-bit numbers [2 Marks]

Assumptions

Assume a single gate delay for all gates, including the XOR gate

Assume 2 gate delays for an FA

You do not need to take care of fan-outs/fan-ins

For your selection of adder, you just need to draw its block diagram

Lecture Ends !

Assignment-02

You have to explain the micro-architecture and write the Verilog code for the addition of 8 8-bit binary numbers for the following cases, and compare their performance (as reported by Xilinx):

Ripple Carry Adder

Carry Select Adder

Wallace Compression Tree

Duration: 2 Weeks

Quiz
