1 views

Uploaded by Rajesh Dvg R

save

You are on page 1of 19

**ECE666/Koren Part.6c.1
**

Israel Koren

Spring 2008

UNIVERSITY OF MASSACHUSETTS

Dept. of Electrical & Computer Engineering

Digital Computer Arithmetic

ECE 666

Part 6c

High-Speed Multiplication - III

Copyright 2008 Koren

ECE666/Koren Part.6c.2

Array Multipliers

+The two basic operations - generation and

summation of partial products - can be merged,

avoiding overhead and speeding up multiplication

+Iterative array multipliers (or array multipliers)

consist of identical cells, each forming a new

partial product and adding it to previously

accumulated partial product

-Gain in speed obtained at expense of extra

hardware

-Can be implemented so as to support a high

rate of pipelining

Copyright 2008 Koren

ECE666/Koren Part.6c.3

Illustration - 5 x 5 Multiplication

+Straightforward implementation -

-Add first 2 partial products

(a4x0, a3x0,…,a0 x0 and a4x1, a3x1,…,a0x1)

in row 1 after proper alignment

-The results of row 1 are then added to

a4x2, a3x2,…,a0x2 in row 2, and so on

Copyright 2008 Koren

ECE666/Koren Part.6c.4

5 x 5 Array Multiplier for

Unsigned Numbers

+Basic cell - FA accepting

one bit of new partial

product aixj

+ one bit of previously

accumulated partial

product

+ carry-in bit

+No horizontal carry propagation in first 4 rows - carry-save type

addition - accumulated partial product consists of intermediate

sum and carry bits

+Last row is a ripple-carry adder - can be replaced by a fast

2-operand adder (e.g., carry-look-ahead adder)

Copyright 2008 Koren

ECE666/Koren Part.6c.5

Array Multiplier for Two’s

Complement Numbers

+Product bits like a4x0 and

a0x4 have negative weight

+Should be subtracted

instead of added

Copyright 2008 Koren

ECE666/Koren Part.6c.6

Type I and II Cells

+Type I cells: 3 positive inputs - ordinary FAs

+Type II cells: 1 negative and 2 positive inputs

+Sum of 3 inputs of type II cell can vary from -1 to 2

- c output has weight +2

- s output has weight -1

+Arithmetic operation of type II cell -

+s and c outputs given by

II

Copyright 2008 Koren

ECE666/Koren Part.6c.7

Type I’ and II’ Cells

+Type II' cells: 2 negative inputs and 1 positive

+Sum of inputs varies from -2 to 1

- c output has weight -2

- s output has weight +1

+Type I' cell: all negative inputs -

has negatively weighted c and s outputs

+Counts number of -1's at its inputs - represents

this number through c and s outputs

+Same logic operation as type I cell - same gate

implementation

+Similarly - types II and II' have the same gate

implementation

II’

Copyright 2008 Koren

ECE666/Koren Part.6c.8

Booth’s Algorithm Array Multiplier

+For two's complement operands

+n rows of basic cells - each row capable of adding

or subtracting a properly aligned multiplicand to

previously accumulated partial product

- Cells in row i perform an add, subtract or transfer-only

operation, depending on xi and reference bit

+4-bit operands

+Controlled add/

subtract/shift

(CASS)

Copyright 2008 Koren

ECE666/Koren Part.6c.9

Controlled add/subtract/shift - CASS

+H and D: control signals indicating type of operation

+H=0: no arithmetic operation done

+H=1: arithmetic operation performed - new Pout

- Type of arithmetic operation indicated by D signal

- D=0: multiplicand bit, a, added to Pin with cin as incoming

carry - generating Pout and cout as outgoing carry

- D=1: multiplicand bit, a, subtracted from Pin with incoming

borrow and outgoing borrow

+Pout=Pin©(a H)©(cin H)

cout=(Pin©D)(a+cin) + a cin

+Alternative: combination of

multiplexer (0, +a and -a) and FA

+H and D generated by CTRL -

based on xi and reference

bit x{i-1}

Copyright 2008 Koren

ECE666/Koren Part.6c.10

Booth’s Algorithm Array Multiplier - details

+First row - most significant bit of multiplier

+Resulting partial product need be shifted left before

adding/subtracting next multiple of multiplicand

+A new cell with input Pin=0 is added

+Number of bits in

partial product

increases by one

each row - expand

multiplicand before

adding/subtracting it

+Accomplished by

replicating sign bit

of multiplicand

Copyright 2008 Koren

ECE666/Koren Part.6c.11

Properties and Delay

+Cannot take advantage of strings of 0's or 1's -

cannot eliminate or skip rows

+Only advantage: ability to multiply negative numbers

in two's complement with no need for correction

+Operation in row i need not be delayed until all upper

(i-1) rows have completed their operation

+P0, generated after one CASS delay (plus delay of

CTRL), P1 generated after two CASS delays, and

P{2n-2}, generated after (2n-1) CASS delays

+Similarly can implement higher-radix multiplication

requiring less rows

+Building block: multiplexer-adder circuit that selects

correct multiple of multiplicand A and adds it to

previously accumulated partial product

Copyright 2008 Koren

ECE666/Koren Part.6c.12

Pipelining

+Important characteristic of array multipliers - allow

pipelining

+Execution of separate multiplications overlaps

+The long delay of carry-propagating addition must be

minimized

+Achieved by replacing CPA with several additional

rows - allow carry propagation of only one position

between consecutive rows

+To support pipelining, all cells must include latches -

each row handles a separate multiplier-multiplicand

pair

+Registers needed to propagate multiplier bits to their

destination, and propagate completed product bits

Copyright 2008 Koren

ECE666/Koren Part.6c.13

Pipelined Array

Multiplier

Copyright 2008 Koren

ECE666/Koren Part.6c.14

Optimality of Multiplier Implementations

+Bounds on performance of algorithms for

multiplication

+Theoretical bounds for multiplication similar to those

for addition

+Adopting the idealized model using (f,r) gates:

+Execution time of a multiply circuit for two operands

with n bits satisfies

+Tmult > log f 2n (

+If residue number system is employed:

+Tmult > log f 2m (

+m - number of digits needed to represent largest

modulus in residue number system

Copyright 2008 Koren

ECE666/Koren Part.6c.15

Optimal Implementations

+Need to compare performance (execution time)

and implementation costs (e.g., regularity of

design, total area, etc.)

+Objective function like A T can be used

+A - area and T - execution time

+A more general objective function: A T

- o can be either smaller or larger than 1

o

Copyright 2008 Koren

ECE666/Koren Part.6c.16

Basic Array Multiplier

+Very regular structure - can be implemented as a

rectangular-shaped array - no waste of chip area

+n least significant bits of final product are produced

on right side of rectangle; n most significant bits are

outputs of bottom row of rectangle

+Highly regular and simple layout but has two

drawbacks:

- Requires a very large area, proportional to n², since it

contains about n² FAs and AND gates

- Long execution time T of about 2 n AFA (AFA - delay of FA)

+More precisely, T consists of (n-1)AFA for first (n-1)

rows and (n-1)AFA for CPA (ripple-carry adder)

+AT is proportional to n³

Copyright 2008 Koren

ECE666/Koren Part.6c.17

Pipelined & Booth Array Multipliers

+Required area increases even further (CPA replaced)

+Latency of a single multiply operation increases

+However, pipeline period (¬ pipeline rate) shorter

+Booth based array multiplier offers no advantage

- A - order of n² and T - linear in n

+Radix-4 Booth can potentially be better - only n/2

rows - could reduce T and A by factor of two

+However, actual delay & area higher - recoding logic

and, more importantly, partial product selectors, add

complexity & interconnections - longer delay per row

+Also, since relative shift between adjacent rows is two

bits, must allow carry to propagate horizontally

- Can be achieved locally or in last row - then carry propagation

through 2n-1 bits (instead of n-1)

+Exact reduction depends on design and technology

Copyright 2008 Koren

ECE666/Koren Part.6c.18

Radix-8 Booth & CSA Tree

+Similar problems with radix-8 Booth's array

multiplier

- In addition, 3A should be precalculated

- Reduction in delay and area may be less than expected 1/3

- Still, may be cost-effective in certain technologies and

design styles

+Partial products can be accumulated using a cascade

or a tree structure with shorter execution time

+But CSA tree structures have irregular interconnects

- no area-efficient layout with a rectangular shape

+Moreover - overall width 2n usually required -

multiplier area of order 2n log k

+AT may increase as 2n log² k

Copyright 2008 Koren

ECE666/Koren Part.6c.19

Delay of Balanced Delay Tree

+Balanced delay tree - more regular structure

- Increments in number of operands - 3,3,5,7,9...

- Sum of series - order of k=j (j - number of elements

in series, k - number of operands)

+Number of levels - determines overall delay -

linear in j= \ k

+Compare to log k - number of levels in complete

binary tree

+Proof: exercise

+Above expressions - theoretical, limited practical

significance

+Detailed analysis of alternative designs is

necessary for specific technology

2

2 Copyright 2008 Koren . avoiding overhead and speeding up multiplication Iterative array multipliers (or array multipliers) consist of identical cells. each forming a new partial product and adding it to previously accumulated partial product Gain in speed obtained at expense of extra hardware Can be implemented so as to support a high rate of pipelining ECE666/Koren Part.Array Multipliers The two basic operations .generation and summation of partial products .can be merged.6c.

….6c. a3x0.a0 x0 and a4x1.…. a3x2.3 Copyright 2008 Koren . and so on ECE666/Koren Part.Illustration . a3x1.a0x2 in row 2.5 x 5 Multiplication Straightforward implementation Add first 2 partial products (a4x0.a0x1) in row 1 after proper alignment The results of row 1 are then added to a4x2.….

6c.g.FA accepting one bit of new partial product aixj + one bit of previously accumulated partial product + carry-in bit No horizontal carry propagation in first 4 rows .5 x 5 Array Multiplier for Unsigned Numbers Basic cell . carry-look-ahead adder) ECE666/Koren Part.can be replaced by a fast 2-operand adder (e.carry-save type addition .4 Copyright 2008 Koren ..accumulated partial product consists of intermediate sum and carry bits Last row is a ripple-carry adder .

6c.Array Multiplier for Two’s Complement Numbers Product bits like a4x0 and a0x4 have negative weight Should be subtracted instead of added ECE666/Koren Part.5 Copyright 2008 Koren .

6c.Type I and II Cells Type I cells: 3 positive inputs .ordinary FAs Type II cells: 1 negative and 2 positive inputs Sum of 3 inputs of type II cell can vary from -1 to 2 c output has weight +2 s output has weight -1 Arithmetic operation of type II cell - II s and c outputs given by ECE666/Koren Part.6 Copyright 2008 Koren .

represents this number through c and s outputs Same logic operation as type I cell .types II and II' have the same gate implementation ECE666/Koren Part.7 Copyright 2008 Koren .6c.same gate implementation Similarly .Type I’ and II’ Cells Type II' cells: 2 negative inputs and 1 positive Sum of inputs varies from -2 to 1 c output has weight -2 s output has weight +1 II’ Type I' cell: all negative inputs - has negatively weighted c and s outputs Counts number of -1's at its inputs .

each row capable of adding or subtracting a properly aligned multiplicand to previously accumulated partial product Cells in row i perform an add. For two's complement operands n rows of basic cells . depending on xi and reference bit Booth’s Algorithm Array Multiplier 4-bit operands Controlled add/ subtract/shift (CASS) ECE666/Koren Part.8 Copyright 2008 Koren .6c. subtract or transfer-only operation.

a.generating Pout and cout as outgoing carry D=1: multiplicand bit.9 Copyright 2008 Koren . subtracted from Pin with incoming borrow and outgoing borrow Pout=Pin(a H)(cin H) cout=(PinD)(a+cin) + a cin Alternative: combination of multiplexer (0. added to Pin with cin as incoming carry .6c.CASS H and D: control signals indicating type of operation H=0: no arithmetic operation done H=1: arithmetic operation performed . a. +a and -a) and FA H and D generated by CTRL based on xi and reference bit x{i-1} ECE666/Koren Part.new Pout Type of arithmetic operation indicated by D signal D=0: multiplicand bit.Controlled add/subtract/shift .

6c.Booth’s Algorithm Array Multiplier .most significant bit of multiplier Resulting partial product need be shifted left before adding/subtracting next multiple of multiplicand A new cell with input Pin=0 is added Number of bits in partial product increases by one each row .details First row .expand multiplicand before adding/subtracting it Accomplished by replicating sign bit of multiplicand ECE666/Koren Part.10 Copyright 2008 Koren .

P1 generated after two CASS delays. generated after one CASS delay (plus delay of CTRL).Properties and Delay Cannot take advantage of strings of 0's or 1's cannot eliminate or skip rows Only advantage: ability to multiply negative numbers in two's complement with no need for correction Operation in row i need not be delayed until all upper (i-1) rows have completed their operation P0. and P{2n-2}.6c.11 Copyright 2008 Koren . generated after (2n-1) CASS delays Similarly can implement higher-radix multiplication requiring less rows Building block: multiplexer-adder circuit that selects correct multiple of multiplicand A and adds it to previously accumulated partial product ECE666/Koren Part.

allow pipelining Execution of separate multiplications overlaps The long delay of carry-propagating addition must be minimized Achieved by replacing CPA with several additional rows .allow carry propagation of only one position between consecutive rows To support pipelining. all cells must include latches each row handles a separate multiplier-multiplicand pair Registers needed to propagate multiplier bits to their destination.6c.Pipelining Important characteristic of array multipliers .12 Copyright 2008 Koren . and propagate completed product bits ECE666/Koren Part.

6c.Pipelined Array Multiplier ECE666/Koren Part.13 Copyright 2008 Koren .

6c.r) gates: Execution time of a multiply circuit for two operands with n bits satisfies Tmult log f 2n If residue number system is employed: Tmult log f 2m m .number of digits needed to represent largest modulus in residue number system ECE666/Koren Part.14 Copyright 2008 Koren .Optimality of Multiplier Implementations Bounds on performance of algorithms for multiplication Theoretical bounds for multiplication similar to those for addition Adopting the idealized model using (f.

regularity of design. etc.6c.g.Optimal Implementations Need to compare performance (execution time) and implementation costs (e.15 Copyright 2008 Koren .) Objective function like A T can be used A .execution time A more general objective function: A T can be either smaller or larger than 1 ECE666/Koren Part. total area..area and T .

6c.can be implemented as a rectangular-shaped array . T consists of (n-1)FA for first (n-1) rows and (n-1)FA for CPA (ripple-carry adder) AT is proportional to n³ ECE666/Koren Part. n most significant bits are outputs of bottom row of rectangle Highly regular and simple layout but has two drawbacks: Requires a very large area.delay of FA) Copyright 2008 Koren . since it More precisely. proportional to n².16 contains about n² FAs and AND gates Long execution time T of about 2 n FA (FA .Basic Array Multiplier Very regular structure .no waste of chip area n least significant bits of final product are produced on right side of rectangle.

Required area increases even further (CPA replaced) Latency of a single multiply operation increases However. must allow carry to propagate horizontally Can be achieved locally or in last row . since relative shift between adjacent rows is two bits.17 Copyright 2008 Koren through 2n-1 bits (instead of n-1) . actual delay & area higher .longer delay per row Also.6c.order of n² and T .only n/2 rows .linear in n Radix-4 Booth can potentially be better .then carry propagation Exact reduction depends on design and technology ECE666/Koren Part. pipeline period ( pipeline rate) shorter Booth based array multiplier offers no advantage Pipelined & Booth Array Multipliers A .recoding logic and.could reduce T and A by factor of two However. add complexity & interconnections . more importantly. partial product selectors.

18 Copyright 2008 Koren .Radix-8 Booth & CSA Tree Similar problems with radix-8 Booth's array multiplier In addition. may be cost-effective in certain technologies and design styles Partial products can be accumulated using a cascade or a tree structure with shorter execution time But CSA tree structures have irregular interconnects .no area-efficient layout with a rectangular shape Moreover .6c.overall width 2n usually required multiplier area of order 2n log k AT may increase as 2n log² k ECE666/Koren Part. 3A should be precalculated Reduction in delay and area may be less than expected 1/3 Still.

limited practical significance Detailed analysis of alternative designs is necessary for specific technology ECE666/Koren Part.order of k=j 2 (j . Sum of series .7.theoretical.9.6c. k ..number of operands) Increments in number of operands .3.19 Copyright 2008 Koren .Delay of Balanced Delay Tree Balanced delay tree .5.determines overall delay - linear in j= k Compare to log k .3..number of levels in complete binary tree Proof: exercise Above expressions .more regular structure in series.number of elements Number of levels .

- A New Architecture for 2’s Complement Gray Encoded Array MultiplierUploaded byPravin Wilfred
- c166sv1 MacUploaded bySairakshitHarathas
- Chapter 3Uploaded bytesfu
- Array MultiplierUploaded byVarsha Lalapura
- multiplicationUploaded byapi-296427690
- LN18 ArithmeticUploaded by謝君揚
- 10. Article AZOJETE Vol 11 114-119 DibalUploaded byOyeniyi Samuel Kehinde
- 00660163Uploaded byMathew George
- Verilog OperatorsUploaded bySyed Ashmad
- prUploaded byPArk100
- Com Bi National MultiplierUploaded byDon Khan
- unit lesson plansUploaded byapi-270818785
- mathlessonplanUploaded byapi-259580000
- Atmel AVR201 Using the AVR Hardware MultiplierUploaded byAlex Ordaz
- eng_ca11.pdfUploaded bySulochana Jangra
- Computer ArithmeticUploaded bySailaja Vangimalla
- lesson3-numberlineUploaded byapi-282169441
- unit 1 1 acc study guideUploaded byapi-270891801
- vlsiUploaded byChidambaram Kct
- 456 DocumentationUploaded bySu Rya Reddy
- weekly20overview49janegoodallUploaded byapi-84160735
- Booth EncoderUploaded byThahsin Thahir
- Tips for Math CalculationUploaded byMayur Sharma
- IRJET-Power Efficient Carry Select Adder using D-LatchUploaded byIRJET Journal
- 19371_chapter10-computer arithmatic.pptUploaded byGeorgian Chail
- clarification of expectations regarding basic num facts and strategiesUploaded byapi-265143818
- Presentation1Uploaded byAnonymous rcmeSfj5
- EC304Uploaded byapi-3853441
- lesson plan10Uploaded byapi-310794656
- Hardware Slides 13Uploaded byTaqi Shah