John Morris

Chung-Ang University The University of Auckland

‘Iolanthe’ at 13 knots on Cockburn Sound, Western Australia

Multipliers

‘Long’ multiplication

[Figure: long multiplication laid out as rows of partial products, labelled multiplicand, multiplier, partial products and product.]

In binary, the partial products are trivial: if multiplier bit = 1, copy the multiplicand, else 0.
Use an ‘and’ gate!

Multipliers

‘Long’ multiplication

[Figure: the partial product bits are formed from the multiplicand bits a3 a2 a1 a0 and the multiplier bits b0 b1 b2 b3. The first row of partial products is a3 a2 a1 a0, each ANDed with b0.]
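The AND-gate view of the partial products can be sketched in a few lines of Python. This is only an illustrative software model, not a hardware description: the function names `partial_products` and `long_multiply` and the default width `n = 4` are assumptions made for this example.

```python
def partial_products(a: int, b: int, n: int = 4) -> list:
    """Return the n shifted partial products of two unsigned n-bit operands."""
    rows = []
    for j in range(n):
        b_j = (b >> j) & 1  # multiplier bit j
        # One AND gate per multiplicand bit: the row is a copy of a if b_j is 1, else 0.
        row = sum((((a >> i) & 1) & b_j) << i for i in range(n))
        rows.append(row << j)  # row j carries weight 2**j
    return rows


def long_multiply(a: int, b: int, n: int = 4) -> int:
    """Add the partial products (hardware would use rows of full adders)."""
    return sum(partial_products(a, b, n))


print(partial_products(0b1101, 0b1011))  # rows for 13 x 11
print(long_multiply(0b1101, 0b1011))
```

Each row is either zero or a shifted copy of the multiplicand, which is why a single AND gate per bit is enough.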

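Multipliers: Simple binary multiplier
We can add the partial products with FA blocks
[Figure: array multiplier with inputs a3 a2 a1 a0 and multiplier bits b0, b1, b2; each row of four FA blocks adds the next partial product, and the product bits p0, p1, ... emerge at the edge of the array.]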

Parallel Array Adder: VHDL
We can build this adder in VHDL with two GENERATE loops

    SIGNAL pa, pb, cout : ARRAY( 0 TO n-1 ) OF ARRAY( 0 TO n-1 ) OF std_logic;

    rows: FOR j IN 0 TO n-1 GENERATE      -- For each row
      cols: FOR k IN 0 TO n-1 GENERATE    -- Generate a row
        pjk : full_adder PORT MAP( … );
      END GENERATE;
    END GENERATE;

This part is straight-forward!
… but you need to fill in the PORT MAP using internal signals!

Multipliers: Adding partial products
We can add the partial products with FA blocks
Optimization 1: Replace this row of FAs
Time? What’s the worst case propagation delay?
[Figure: the FA array for a3 a2 a1 a0 and b0, b1, b2, with one row of FAs marked for replacement; product bits p0, p1, ...]

Multipliers: Using carry save adders
We can add the partial products with FA blocks
Try to use a more efficient adder in each row?
A simpler scheme uses a ‘carry save’ adder, which pushes the carry outs down to the next row!
Note that an extra adder (here a carry select adder) is needed below the last row to add the last partial products and the carries from the row above!
[Figure: rows of FA blocks for b0, b1, b2 with carries passed down to the next row, followed by a carry select adder producing the product bits.]
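A carry save row can be modelled in Python as a bitwise 3-to-2 compression. The sketch below is an assumption-laden software model: `carry_save_add` and `csa_multiply` are names invented for this example, and the final `s + c` stands in for the extra carry-propagating adder below the last row.

```python
def carry_save_add(x: int, y: int, z: int):
    """Add three numbers with independent full adders: no carry ripples along the row."""
    s = x ^ y ^ z                               # sum bit of each full adder
    c = ((x & y) | (x & z) | (y & z)) << 1      # carry outs, pushed one position left
    return s, c


def csa_multiply(a: int, b: int, n: int = 4) -> int:
    """Accumulate the partial products in carry save form, then do one final add."""
    s, c = 0, 0
    for j in range(n):
        pp = (a << j) if (b >> j) & 1 else 0    # partial product for multiplier bit j
        s, c = carry_save_add(s, c, pp)         # carries are saved for the next row
    return s + c                                # the one carry-propagating addition


print(csa_multiply(13, 11))
```

The invariant is that `s + c` always equals the sum of the partial products added so far, so carry propagation is deferred to the single addition at the end.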

Multipliers: Wallace Tree
Chris Wallace discovered a way to build fast multipliers by reducing the number of carry propagations, and thus the delay
All the partial product bits can be generated directly from the operand bits
A full adder adds 3 input bits to produce a 2 bit result
Use it to add the bits in columns
  Combine pp bits vertically, 3 at a time!
Produce pairs of ‘first level’ sums
  First level results: pairs of bits from FA cells
Combine bits in these sums vertically again
[Figure: dot diagram of the partial product bits and the first level results.]

Multipliers: Wallace Tree
Summing the partial products
So combine them vertically!
[Figure: dot diagram of the partial product bits above the first level results.]
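The column-wise combination can be imitated in Python by keeping a list of bits per column and repeatedly feeding groups of three into full adders. This is a simplified sketch, not necessarily the exact arrangement Wallace proposed: it uses full adders only (no half adders), and the name `wallace_multiply` and the returned level count are assumptions for illustration.

```python
def wallace_multiply(a: int, b: int, n: int = 4):
    """Multiply two unsigned n-bit numbers by column compression.

    Returns (product, levels), where levels counts the full adder stages used.
    """
    # columns[w] holds the bits of weight 2**w that still have to be added.
    columns = [[] for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            columns[i + j].append(((a >> i) & 1) & ((b >> j) & 1))

    levels = 0
    while any(len(col) > 2 for col in columns):
        nxt = [[] for _ in range(len(columns) + 1)]  # room for a carry out of the top column
        for w, col in enumerate(columns):
            k = 0
            while len(col) - k >= 3:                 # take the bits 3 at a time
                x, y, z = col[k:k + 3]
                nxt[w].append(x ^ y ^ z)                         # sum stays in the column
                nxt[w + 1].append((x & y) | (x & z) | (y & z))   # carry moves one column left
                k += 3
            nxt[w].extend(col[k:])                   # leftover bits pass straight through
        columns = nxt
        levels += 1

    # At most two bits remain per column: one ordinary carry-propagating add finishes the job.
    product = sum(bit << w for w, col in enumerate(columns) for bit in col)
    return product, levels


print(wallace_multiply(13, 11))
```

Every full adder replaces three bits by two without changing the total, so the loop must terminate, and carries only propagate in the single final addition.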

Signed digit arithmetic: Avoiding the carries!
Terminology
First, we need to distinguish carefully between digits of a number and bits used in representing the number
In the standard binary representations, one bit is used to represent each binary digit (0 or 1) of a number
However, we can use other representation schemes …
If we use more than one bit to represent each digit of an operand, then we have a redundant system
  We’re using more bits than the minimum $\log_2 n$ needed to represent a number of magnitude, n
These redundant number systems generally have the ability to avoid carry propagation
This may be exploited in the addition of sequences of numbers
  Carries are transferred to the following addition
  Concept similar to that used in carry-save multiplier where carries are transferred to the following partial product addition

Booth Recoding
A binary number can be re-coded according to Booth’s scheme to reduce the number of partial products in a multiplier
Original idea
  Early computers: shift much faster than add
  Observe that when there is a 0 in the multiplier, you can skip the addition and just shift the multiplicand
  In a synchronous computer, this doesn’t help: in the worst case, you still have to perform an add for each digit of the multiplier (all or most of them are 1’s)
  but in an asynchronous computer, the ability to skip some additions reduces the average completion time
Booth observed that when there is a long sequence of 1s, eg digits j through (down to) k are 1s, then
  $2^j + 2^{j-1} + \ldots + 2^{k+1} + 2^k = 2^{j+1} - 2^k$

Booth Recoding
Booth recoding
Booth observed that when there is a long sequence of 1s, eg digits j through (down to) k are 1s, then
  $2^j + 2^{j-1} + \ldots + 2^{k+1} + 2^k = 2^{j+1} - 2^k$
Thus the sequence of additions can be replaced by
  An addition of the multiplicand shifted by j+1 positions and
  A subtraction of the multiplicand shifted by k positions
This is equivalent to recoding the multiplier from a representation using {0,1} to one using {-1,0,1}, corresponding to subtract, skip, add
The recoding can be done in O(1) time by inspecting neighbouring digits

Booth Recoding
Booth’s scheme: Radix-2 Booth recoding

  xj   xj-1   yj   Note
  0    0       0   No 1’s: skip
  0    1       1   End of a string of 1’s: add
  1    0      -1   Start of a string of 1’s: subtract
  1    1       0   Middle of a string of 1’s: skip

For each position, j, inspect xj and xj-1 to determine the bits (2 needed!) of yj
Example
  x:   1  0  0  1  1  1  0  1  1  0  1  0  1  1  1  0  (0)
  y:  -1  0  1  0  0 -1  1  0 -1  1 -1  1  0  0 -1  0
In practice, this scheme is no use in a synchronous machine.
Worst case: sequence of alternating 0 1
  More additions than necessary!
but if we use a higher radix Booth recoding ...
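The radix-2 table reduces to yj = xj-1 - xj, which is easy to check in Python against the 16-digit example (including its trailing 0 and the implied x-1 = 0). The function names `booth_radix2` and `digits_value` are assumptions made for this sketch; digits are listed most significant first, as on the slide.

```python
def booth_radix2(x_bits: list) -> list:
    """Recode a two's complement bit list (MSB first) into digits in {-1, 0, 1}."""
    padded = x_bits + [0]  # append the implied x_{-1} = 0 on the right
    # y_j = x_{j-1} - x_j: -1 at the start of a string of 1's, +1 just past its end.
    return [padded[i + 1] - padded[i] for i in range(len(x_bits))]


def digits_value(digits: list) -> int:
    """Value of a radix-2 digit list given MSB first."""
    value = 0
    for d in digits:
        value = 2 * value + d
    return value


x = [1, 0, 0, 1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 0]
y = booth_radix2(x)
print(y)
print(digits_value(y))  # the value of x read as a 16-bit two's complement number
```

The recoded digits represent the same value as x interpreted in two's complement, which is why the scheme handles signed multipliers without extra correction steps.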

Higher Radix Multiplication
Radix-2 multiplier
  Use 1 bit of the multiplier at a time
  Form partial product with and gates
Radix-4 multiplier
  Use 2 bits of the multiplier at a time
  If A is the multiplicand:

  Multiplier bits   Operation
  00                none
  01                +A
  10                +2A (shift A)
  11                +3A (precompute A+2A?)

Radix-4 Booth recoding …

Radix-4 Booth Recoding
Recode multiplier into a signed digit form
  Use 3 bits of the original multiplier at a time
  Recoded multiplier has half the number of digits, but each digit is in [-2,2]
  Operands to the adders are now formed by shifts alone

  x2j+1   x2j   x2j-1   yj   Operation   Note
  0       0     0        0   none        No 1’s
  0       0     1        1   +A          End of 1’s string
  0       1     0        1   +A          Isolated 1
  0       1     1        2   +2A         End of 1’s string
  1       0     0       -2   -2A         Beginning of 1’s
  1       0     1       -1   -A          End one string, start new one
  1       1     0       -1   -A          Start of 1’s string
  1       1     1        0   none        Middle of 1’s

Recode: constant time
Partial products: shift, and, select
n/2 partial products generated
Potentially 2× speed!
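The eight rows of the radix-4 table collapse to yj = -2·x2j+1 + x2j + x2j-1. A minimal Python sketch, assuming an n-bit two's complement multiplier with n even (the names `booth_radix4` and `booth4_multiply` are made up for this example; digits are returned least significant first):

```python
def booth_radix4(x: int, n: int) -> list:
    """Recode the n-bit two's complement pattern x (n even) into n/2 digits in [-2, 2]."""
    digits = []
    prev = 0                          # x_{-1} = 0
    for j in range(0, n, 2):
        b0 = (x >> j) & 1             # x_2j
        b1 = (x >> (j + 1)) & 1       # x_2j+1
        digits.append(-2 * b1 + b0 + prev)
        prev = b1                     # becomes x_2j-1 for the next digit
    return digits


def booth4_multiply(a: int, x: int, n: int) -> int:
    """Sum the n/2 partial products; each one is 0, +-A or +-2A, shifted by 2j bits."""
    return sum(d * a * 4 ** j for j, d in enumerate(booth_radix4(x, n)))


print(booth_radix4(0b01101110, 8))
print(booth4_multiply(7, 0b01101110, 8))  # 7 x 110
```

Only n/2 partial products are formed, and each needs nothing more than a shift and an optional negation of A.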

No carries at all? Residue Number Systems

Residue Arithmetic: Residue Number Systems
A verse by the Chinese scholar, Sun Tsu, over 1500 years ago posed this problem
  What number has remainders 2, 3 and 2 when divided by the numbers 7, 5 and 3, respectively?
This is probably the first documented use of number representations using multiple residues
In a residue number system, a number, x, is represented by the list of its residues (remainders) with respect to k relatively prime moduli, $m_{k-1}, m_{k-2}, \ldots, m_0$
Thus x is represented by $(x_{k-1}, x_{k-2}, \ldots, x_0)$ where $x_i = x \bmod m_i$
So the puzzle may be re-written
  What is the decimal representation of (2,3,2) in RNS(7,5,3)?
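For moduli this small the puzzle can be answered by simply searching the whole range. The sketch below does exactly that; `to_rns` and `solve_residues` are hypothetical helper names, and exhaustive search is used only for illustration, not as a practical conversion method.

```python
from math import prod


def to_rns(x: int, moduli: tuple) -> tuple:
    """Residues of x with respect to each modulus, in the order given."""
    return tuple(x % m for m in moduli)


def solve_residues(residues: tuple, moduli: tuple) -> int:
    """Find the value in [0, M) with the given residues by exhaustive search."""
    M = prod(moduli)  # with relatively prime moduli, each residue list occurs once per M values
    for x in range(M):
        if to_rns(x, moduli) == tuple(residues):
            return x
    raise ValueError("no value with these residues (are the moduli relatively prime?)")


print(solve_residues((2, 3, 2), (7, 5, 3)))
```

The search returns 23, and adding any multiple of 7 × 5 × 3 = 105 gives another number with the same remainders.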

Residue Number Systems
The dynamic range of a RNS, $M = m_{k-1} \times m_{k-2} \times \ldots \times m_0$
For example, in the system RNS(8,7,5,3), M = 8 × 7 × 5 × 3 = 840
Thus we have

  RNS(8,7,5,3)   Decimal
  (0,0,0,0)      0 or 840 or -840 or …
  (1,1,1,1)      1 or 841 or -839 or …
  (2,2,2,2)      2 or 842 or …
  (0,1,3,2)      8 or 848 or …

Any RNS can be viewed as a weighted representation
In RNS(8,7,5,3), the weights are: 105 120 336 280
Thus (1,2,4,0) represents
  $\langle 105 \times 1 + 120 \times 2 + 336 \times 4 + 280 \times 0 \rangle_{840} = \langle 1689 \rangle_{840} = 9$

Residue Number Systems: Operations
Complement
  To find -x, complement each of the digits with respect to the modulus for that digit
  21 = (5,0,1,0) so -21 = (8-5, 0, 5-1, 0) = (3,0,4,0)
Addition or subtraction is performed on each digit
    (5,5,0,2)RNS = 5
  + (7,6,4,2)RNS = -1
  = ($\langle 5+7 \rangle_8 = 4$, $\langle 5+6 \rangle_7 = 4$, $\langle 0+4 \rangle_5 = 4$, $\langle 2+2 \rangle_3 = 1$)RNS
  = (4,4,4,1)RNS = 4
Multiplication is also achieved by operations on each digit
    (5,5,0,2)RNS = 5
  × (7,6,4,2)RNS = -1
  = ($\langle 5 \times 7 \rangle_8 = 3$, $\langle 5 \times 6 \rangle_7 = 2$, $\langle 0 \times 4 \rangle_5 = 0$, $\langle 2 \times 2 \rangle_3 = 1$)RNS
  = (3,2,0,1)RNS = -5
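The digit-wise operations are short enough to try directly in Python. This is a sketch for RNS(8,7,5,3) only; the constant `MODULI` and the function names are assumptions for the example.

```python
MODULI = (8, 7, 5, 3)  # RNS(8,7,5,3), dynamic range 840


def to_rns(x: int) -> tuple:
    """Residues of x; Python's % already maps negative x into [0, m)."""
    return tuple(x % m for m in MODULI)


def rns_neg(x: tuple) -> tuple:
    """Complement each digit with respect to its own modulus."""
    return tuple((m - d) % m for d, m in zip(x, MODULI))


def rns_add(x: tuple, y: tuple) -> tuple:
    """Digit-wise modular addition: no carries between digits."""
    return tuple((a + b) % m for a, b, m in zip(x, y, MODULI))


def rns_mul(x: tuple, y: tuple) -> tuple:
    """Digit-wise modular multiplication: again, every digit is independent."""
    return tuple((a * b) % m for a, b, m in zip(x, y, MODULI))


print(rns_neg(to_rns(21)))               # representation of -21
print(rns_add(to_rns(5), to_rns(-1)))    # representation of 4
print(rns_mul(to_rns(5), to_rns(-1)))    # representation of -5
```

Because each digit position works on its own small modulus, the four digit operations could run in parallel in hardware.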

Residue Arithmetic: Advantages
Parallel independent operations on small numbers of digits
  Significant speed ups
  Especially for multiplication!
    4 bit x 4 bit multiplier (moduli up to 15) much simpler than 16 bit x 16 bit one
Carries are strictly confined to small numbers of bits
  Each modulus is only a small number of bits
Can be implemented in Look Up Tables (LUTs)
  6 bit residues (moduli up to 64)
  64 x 64 x 6 bits required (<4Kbytes)

Residue Arithmetic: Choosing the moduli
Largest modulus determines the overall speed
  Try to make it as small as possible
Simple strategy
  Choose sequence of prime numbers until the dynamic range, M, becomes large enough
  eg Application requires a range of at least $10^5$, ie $M \ge 10^5$
  For RNS(13,11,7,5,3,2), M = 30,030
  Range is too low, so add one more modulus: RNS(17,13,11,7,5,3,2), M = 510,510
  Now
  • each modulus requires a separate circuit and
  • our range is now ~5 times as large as needed,
  so remove 5: RNS(17,13,11,7,3,2), M = 102,102
  Six residues, requiring 5 + 4 + 4 + 3 + 2 + 1 = 19 bits
  The largest modulus (17 requiring 5 bits) determines the speed, so …

Residue Arithmetic: Choosing the moduli
Application requires a range of at least $10^5$, ie $M \ge 10^5$
… RNS(17,13,11,7,3,2), M = 102,102
  Six residues, requiring 5 + 4 + 4 + 3 + 2 + 1 = 19 bits
  The largest modulus (17 requiring 5 bits) determines the speed, so combine some of the smaller moduli
  (Remember the requirement is that they be relatively prime!)
Try to produce the largest modulus using only 5 bits
  Pair 2 and 13, 3 and 7
  RNS(26,21,17,11), M = 102,102
  Four residues, requiring 5 + 5 + 5 + 4 = 19 bits
  (no improvement in total bit count, but 2 fewer ALUs!)
Better …?

Residue Arithmetic: Choosing the moduli
Application requires a range of at least $10^5$, ie $M \ge 10^5$
… RNS(26,21,17,11), M = 102,102
  Four residues, requiring 5 + 5 + 5 + 4 = 19 bits
  (no improvement in total bit count, but 2 fewer ALUs!)
Include powers of smaller primes before primes, starting with RNS(3,2), M = 6
  Note that $2^2$ is smaller than the next prime, 5, so move to RNS($2^2$,3), M = 12
  (trying to minimize the size of the largest modulus)
  After including 5 and 7, note that $2^3$ and $3^2$ are smaller than 11: RNS($3^2$,$2^3$,7,5), M = 2,520
  Add 11: RNS(11,$3^2$,$2^3$,7,5), M = 27,720
  Add 13: RNS(13,11,$3^2$,$2^3$,7,5), M = 360,360

Residue Arithmetic: Choosing the moduli
Application requires a range of at least $10^5$, ie $M \ge 10^5$
… Add 13: RNS(13,11,$3^2$,$2^3$,7,5), M = 360,360
M is now 3× larger than needed, so replace 9 with 3, then combine 5 and 3
  RNS(15,13,11,$2^3$,7), M = 120,120
  5 moduli, 4 + 4 + 4 + 3 + 3 = 18 bits, largest modulus has 4 bits
You can actually do somewhat better than this!
Reference: B. Parhami, Computer Arithmetic: Algorithms and Hardware Designs, Oxford University Press, 2000
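The candidate moduli sets can be compared with a small Python helper that reports the dynamic range, the total residue width and the width of the largest modulus. The function `describe` is a hypothetical name, and the bit count assumes each residue in [0, m-1] is stored in plain binary.

```python
from itertools import combinations
from math import gcd, prod


def describe(moduli: tuple) -> tuple:
    """Return (dynamic range M, total residue bits, bits of the widest residue)."""
    # The moduli must be pairwise relatively prime for the RNS to be valid.
    if any(gcd(a, b) != 1 for a, b in combinations(moduli, 2)):
        raise ValueError("moduli are not pairwise relatively prime")
    # A residue modulo m lies in [0, m-1], so it needs (m-1).bit_length() bits.
    bits = [(m - 1).bit_length() for m in moduli]
    return prod(moduli), sum(bits), max(bits)


for candidate in [(17, 13, 11, 7, 3, 2), (26, 21, 17, 11), (15, 13, 11, 8, 7)]:
    print(candidate, describe(candidate))
```

All three sets cover the required range of 100,000; the last one does it with 18 bits in total and no residue wider than 4 bits.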

Residue Numbers: Conversion
Inputs and outputs will invariably be in standard binary or decimal representations, so conversion to and from them is required
Conversion from binary | decimal to RNS
  Problem: Given a number, y, find its residues wrt moduli, $m_i$
  Divisions would be too time-consuming!
  Use this equality:
    $\langle (y_{k-1} y_{k-2} \ldots y_1 y_0)_2 \rangle_{m_i} = \langle \langle 2^{k-1} y_{k-1} \rangle_{m_i} + \ldots + \langle 2 y_1 \rangle_{m_i} + \langle y_0 \rangle_{m_i} \rangle_{m_i}$
  So we only need to precompute the residues $\langle 2^j \rangle_{m_i}$ for each of the moduli, $m_i$, used by the RNS

Residue Numbers: Conversion
For RNS(8,7,5,3):
• $\langle y \rangle_8$ is trivially calculated (3 LSB bits)
• For 7, 5 and 3, we need the powers of 2 modulo 7, 5 and 3

  j     2^j    2^j mod 7   2^j mod 5   2^j mod 3
  0       1        1           1           1
  1       2        2           2           2
  2       4        4           4           1
  3       8        1           3           2
  4      16        2           1           1
  5      32        4           2           2
  6      64        1           4           1
  7     128        2           3           2
  8     256        4           1           1
  9     512        1           2           2

Residue Numbers: Conversion
Find $164_{10} = 1010\,0100_2 = 2^7 + 2^5 + 2^2$ in RNS(8,7,5,3):
• $\langle 164 \rangle_8$ is $100_2 = 4_{10}$
• $\langle 164 \rangle_7 = \langle 2 + 4 + 4 \rangle_7 = \langle 10 \rangle_7 = 3$, using the precomputed residues of $2^7$, $2^5$ and $2^2$ modulo 7 (2, 4 and 4)
Note that the additions are done in a modular adder!
Worst case: k additions for each residue for a k-bit number
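The table-driven conversion can be sketched in Python as follows. The helper names `power_residues`, `residue` and `binary_to_rns` are assumptions for this example; the modular reduction after every addition plays the role of the modular adder.

```python
def power_residues(m: int, k: int) -> list:
    """Precompute 2**j mod m for j = 0 .. k-1 (one table per modulus)."""
    table = [1 % m]
    for _ in range(k - 1):
        table.append((table[-1] * 2) % m)
    return table


def residue(y: int, m: int, k: int) -> int:
    """Residue of the k-bit number y, using table look-ups and modular adds only."""
    table = power_residues(m, k)
    acc = 0
    for j in range(k):
        if (y >> j) & 1:                  # bit j of y is set: add the residue of 2**j
            acc = (acc + table[j]) % m    # modular adder keeps acc below m
    return acc


def binary_to_rns(y: int, moduli: tuple, k: int) -> tuple:
    """Convert a k-bit binary number to its RNS representation."""
    return tuple(residue(y, m, k) for m in moduli)


print(power_residues(7, 10))
print(binary_to_rns(164, (8, 7, 5, 3), 8))
```

No division by the moduli is needed at conversion time: at most k table look-ups and modular additions are performed for each residue.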

Residue Numbers: Conversion
Conversion from RNS to binary
Digits of an RNS representation can be shown to have position weightings, eg for RNS(8,7,5,3) the weightings are 105 120 336 280
The weightings may be calculated using the Chinese Remainder Theorem
  $x = (x_{k-1} x_{k-2} \ldots x_1 x_0)_{RNS} = \left\langle \sum_i M_i \langle \alpha_i x_i \rangle_{m_i} \right\rangle_M$
  where $M_i = M / m_i$ and $\alpha_i = \langle M_i^{-1} \rangle_{m_i}$ is the multiplicative inverse of $M_i$ wrt $m_i$
This means that
  $(x_3, x_2, x_1, x_0)_{RNS} = \langle x_3 \times 105 + x_2 \times 120 + x_1 \times 336 + x_0 \times 280 \rangle_{840}$

Residue Numbers: Conversion
Conversion from RNS to binary
Digits of an RNS representation can be shown to have position weightings, eg for RNS(8,7,5,3) the weightings are 105 120 336 280
Calculate position weights with CRT …
This means that
  $(x_3, x_2, x_1, x_0)_{RNS} = \langle x_3 \times 105 + x_2 \times 120 + x_1 \times 336 + x_0 \times 280 \rangle_{840}$
This is most efficiently done through a LUT
  Note that the table for RNS(8,7,5,3) requires only 8 + 7 + 5 + 3 = 23 entries
  In general, this requires only $\sum_{i=0}^{k-1} m_i$ words: a reasonable number!
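The position weights and the weighted-sum conversion can be reproduced in Python. This is a sketch under stated assumptions: `crt_weights` and `from_rns` are names chosen for the example, the weights are computed directly rather than read from a LUT, and the three-argument `pow` with exponent -1 (modular inverse) requires Python 3.8 or later.

```python
from math import prod


def crt_weights(moduli: tuple) -> list:
    """Position weight M_i * alpha_i for each modulus, where alpha_i = inverse of M_i mod m_i."""
    M = prod(moduli)
    weights = []
    for m in moduli:
        M_i = M // m
        alpha = pow(M_i, -1, m)   # multiplicative inverse of M_i with respect to m
        weights.append(M_i * alpha)
    return weights


def from_rns(residues: tuple, moduli: tuple) -> int:
    """Weighted sum of the digits, reduced modulo the dynamic range M."""
    M = prod(moduli)
    return sum(w * x for w, x in zip(crt_weights(moduli), residues)) % M


print(crt_weights((8, 7, 5, 3)))
print(from_rns((1, 2, 4, 0), (8, 7, 5, 3)))
print(from_rns((2, 3, 2), (7, 5, 3)))  # Sun Tsu's puzzle
```

In hardware the products of weight and digit would come from the look-up table described above, leaving only a multi-operand addition modulo M.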

Residue Arithmetic: Disadvantages
Range is limited
Division is hard!
Comparison <, >, sign (<0?) are hard
Still suitable for some DSP applications
  Only use +, x
  Range is limited
    Result range is known
  Examples: digital filters, Fourier transforms
