# VIF COLLEGE OF ENGINEERING & TECHNOLOGY (Himayat Nagar, Gandipet X Road)

(Affiliated to Jawaharlal Nehru Technological University)

logo

P.G.DEPARTMENT OF ECE

Seminar Report On

-------------------------------------

Submitted By

Name:
Roll No:
Branch:

**VIF COLLEGE OF ENGINEERING & TECHNOLOGY**

(---------)

Logo

P.G.DEPARTMENT OF ECE

Certificate

This is to certify that Mrs. -------------------------------- bearing H.T.No. -------- has satisfactorily completed the course of Seminar entitled -------------- prescribed by JNTU for the I SEMESTER of I M.Tech (Branch) during the academic year 2010-2011.

Faculty In-charge

Head of the department

Acknowledgement

Abstract

This paper provides a detailed study of a configurable multiplier optimized for low-power and high-speed operation, which can be configured either for a single 16-bit multiplication or for single 8-bit or twin parallel 8-bit multiplications, thereby reducing power consumption and increasing the speed of operation. The output product can be truncated to further decrease power consumption and increase speed by sacrificing a bit of output precision; the multiplier thus provides a flexible arithmetic capacity and a tradeoff between output precision and power consumption. The approach also dynamically detects the input range of the multipliers and disables the switching operation of the non-effective ranges, so the ineffective circuitry can be efficiently deactivated. The proposed multiplier maintains an acceptable output quality with enough accuracy when truncation is performed. Thus the proposed multiplier outperforms the conventional multiplier in terms of power and speed efficiencies.

List of Tables

List of Figures

Table of Contents

Acknowledgement
Abstract
List of Tables
List of Figures
Chapter 1: Introduction
Chapter 2: Multipliers
Chapter 3: Booth's Algorithm
Chapter 4: Timing and Area Analysis
Chapter 5: Conclusion
Appendix A: Test File Verilog Code
References

Chapter 1
Introduction

Portable multimedia and digital signal processing (DSP) systems, which typically require flexible processing ability, low power consumption, and a short design cycle, have become increasingly popular over the past few years. Many multimedia and DSP applications are highly multiplication-intensive, so that the performance and power consumption of these systems are dominated by multipliers. Therefore, a high-speed, energy-efficient multiplier is greatly desirable for many multimedia applications. The computation of a multiplier manipulates two input data to generate many partial products for subsequent addition operations, which in CMOS circuit design requires many switching activities. Switching activity within the functional unit accounts for the majority of the power consumption and also increases delay. Thus, minimizing the switching activities can effectively reduce power dissipation and increase the speed of operation without impacting the circuit's operational performance.

According to Patterson and Hennessy [1], the first multiplication algorithm developed for early computing requirements follows the steps that we use to multiply two numbers by hand. This algorithm, shown as a flow chart in Figure 1.2, is the basis for the pen-and-paper algorithm: each digit of the multiplier is multiplied with the multiplicand and the individual results are added up [1]. Since binary multiplication involves only 1s and 0s, the multiplication of each digit by the multiplicand translates to shifting and adding of the multiplicand. When this algorithm was translated for computer use, it required five hardware components, as seen in Figure 1.1: one register for each number (multiplicand, multiplier, and product), an ALU, and a control. As seen in Figure 1.1, the control tests the multiplier's least significant bit (LSB). If the LSB is 1, it sends a signal to the ALU to add the multiplicand to the current calculated product. The multiplier is then shifted to the right to fetch the next bit, and the multiplicand is shifted to the left to prepare for the next multiplication iteration [1].

Figure 1.1 - First Multiplication Hardware Implementation

Figure 1.2 - First Multiplication Algorithm Flowchart for 32-bit Numbers

Many tried to improve the traditional pen-and-paper algorithm by reducing the number of additions performed. In 1951, Andrew Donald Booth developed an algorithm known as Booth's algorithm, based on the idea that computers are faster at shifting bits than adding them [1]. There were many such discoveries through the years to improve the efficiency and performance of multiplication algorithms.

Several techniques [1]-[3] are available to improve the speed and power efficiency of multipliers. Approaches termed guarded evaluation, signal gating, clock gating, partially guarded computation, truncation, etc., reduce the power consumption and increase the speed of multipliers by eliminating spurious computations according to the dynamic range of the input operands. The work in [4] separated the arithmetic units into the most and least significant parts and turned off the most significant part when it did not affect the computation results, to save power. Techniques that dynamically adjust two voltage supplies based on the range of the incoming operands and disable ineffective ranges with a zero-detection circuitry were presented in [5] to decrease the power consumption of multipliers. In [6], a dynamic-range detector to detect the effective range of the two operands was developed; the operand with the smaller dynamic range is processed to generate the Booth encoding so that partial products have a greater opportunity to be zero, thereby reducing power consumption maximally.

The output product, in many multimedia and DSP systems, is frequently truncated due to the fixed register size and bus width inside the hardware. With this characteristic, significant power saving can be achieved by directly omitting the adder cells for computing the least significant bits of the output product, but large truncation errors are introduced. Various error compensation approaches and circuits have been developed which add estimated compensation carries to the carry inputs of the retained adder cells to reduce the truncation error. In the constant scheme [7], constant error compensation values are pre-computed and added to reduce the truncation error. On the contrary, data-dependent error compensation approaches [8]-[10] were developed to achieve better accuracy than that of the constant scheme; there, data-dependent error compensation values are added to reduce the truncation error of array and Booth multipliers (BMs).

Our main concerns are speed, power efficiency, and structural flexibility. Here, an attempt is made to combine configuration, partially guarded computation, and the truncation technique to design a high-speed and power-efficient configurable BM (CBM). Most common multimedia and DSP applications are based on 8-16-b operands. The proposed multiplier not only performs single 16-b, single 8-b, or twin parallel 8-b multiplication operations but also offers a flexible tradeoff between output accuracy and power consumption to achieve more power savings. The experimental results demonstrate that the proposed multiplier can provide various configurable characteristics for multimedia and DSP systems and achieve more power savings with slight area overhead.

Chapter 2

Multipliers

A binary multiplier is an electronic circuit used in digital electronics, such as a computer, to multiply two binary numbers. It is built using binary adders. A variety of computer arithmetic techniques can be used to implement a digital multiplier. Most techniques involve computing a set of partial products and then summing the partial products together. This process is similar to the method taught to primary schoolchildren for conducting long multiplication on base-10 integers, but has been modified here for application to a base-2 (binary) numeral system.

History

Until the late 1970s, most minicomputers did not have a multiply instruction, and so programmers used a "multiply routine" which repeatedly shifts and accumulates partial results, often written using loop unwinding. Mainframe computers had multiply instructions, but they did the same sorts of shifts and adds as a "multiply routine". Early microprocessors also had no multiply instruction. The Motorola 6809, introduced in 1978, was one of the earliest microprocessors with a dedicated hardware multiply instruction; it did the same sorts of shifts and adds as a "multiply routine", but implemented in the microcode of the MUL instruction. As more transistors per chip became available due to larger-scale integration, it became possible to put enough adders on a single chip to sum all the partial products at once, rather than reuse a single adder to handle each partial product one at a time. Because some common digital signal processing algorithms spend most of their time multiplying, digital signal processor designers sacrifice a lot of chip area in order to make the multiply as fast as possible; a single-cycle multiply-accumulate unit often used up most of the chip area of early DSPs.

Multiplication basics

The method taught in school for multiplying decimal numbers is based on calculating partial products, shifting them to the left, and then adding them together. The most difficult part is to obtain the partial products, as that involves multiplying a long number by one digit (from 0 to 9):

      123
    x 456
    =====
      738    (this is 123 x 6)
     615     (this is 123 x 5, shifted one position to the left)
  + 492      (this is 123 x 4, shifted two positions to the left)
    =====
    56088

A binary computer does exactly the same, but with binary numbers. In binary encoding each long number is multiplied by one digit (either 0 or 1), and that is much easier than in decimal, as the product by 0 or 1 is just 0 or the same number. Therefore, the multiplication of two binary numbers comes down to calculating partial products (which are 0 or the first number), shifting them left, and then adding them together (a binary addition, of course):

       1011    (this is 11 in binary)
     x 1110    (this is 14 in binary)
     ======
       0000    (this is 1011 x 0)
      1011     (this is 1011 x 1, shifted one position to the left)
     1011      (this is 1011 x 1, shifted two positions to the left)
  + 1011       (this is 1011 x 1, shifted three positions to the left)
  =========
   10011010    (this is 154 in binary)

This is much simpler than in the decimal system, as there is no table of multiplication to remember: just shifts and adds.

This method is mathematically correct and has the advantage that a small CPU may perform the multiplication by using the shift and add features of its arithmetic logic unit rather than a specialized circuit. The method is slow, however, as it involves many intermediate additions, and these additions take a lot of time. Faster multipliers may be engineered in order to do fewer additions; a modern processor can multiply two 64-bit numbers with 16 additions (rather than 64), and can do several steps in parallel.

The second problem is that the basic school method handles the sign with a separate rule ("+ with + yields +", "+ with - yields -", etc.). Modern computers embed the sign of the number in the number itself, usually in the two's complement representation. That forces the multiplication process to be adapted to handle two's complement numbers, and that complicates the process a bit more. Similarly, processors that use ones' complement, sign-and-magnitude, IEEE-754 or other binary representations require specific adjustments to the multiplication process.

A more advanced approach: an unsigned example

For example, suppose we want to multiply two unsigned eight-bit integers together: a[7:0] and b[7:0]. We can produce eight partial products by performing eight one-bit multiplications, one for each bit in multiplicand a:

p0[7:0] = a[0] x b[7:0] = {8{a[0]}} & b[7:0]
p1[7:0] = a[1] x b[7:0] = {8{a[1]}} & b[7:0]
p2[7:0] = a[2] x b[7:0] = {8{a[2]}} & b[7:0]
p3[7:0] = a[3] x b[7:0] = {8{a[3]}} & b[7:0]
p4[7:0] = a[4] x b[7:0] = {8{a[4]}} & b[7:0]
p5[7:0] = a[5] x b[7:0] = {8{a[5]}} & b[7:0]
p6[7:0] = a[6] x b[7:0] = {8{a[6]}} & b[7:0]
p7[7:0] = a[7] x b[7:0] = {8{a[7]}} & b[7:0]

where {8{a[0]}} means repeating a[0] (the 0th bit of a) 8 times (Verilog notation). To produce our product, we then need to add up all eight of our partial products, as shown here:

                                                  p0[7] p0[6] p0[5] p0[4] p0[3] p0[2] p0[1] p0[0]
                                          + p1[7] p1[6] p1[5] p1[4] p1[3] p1[2] p1[1] p1[0]     0
                                  + p2[7] p2[6] p2[5] p2[4] p2[3] p2[2] p2[1] p2[0]     0     0
                          + p3[7] p3[6] p3[5] p3[4] p3[3] p3[2] p3[1] p3[0]     0     0     0
                  + p4[7] p4[6] p4[5] p4[4] p4[3] p4[2] p4[1] p4[0]     0     0     0     0
          + p5[7] p5[6] p5[5] p5[4] p5[3] p5[2] p5[1] p5[0]     0     0     0     0     0
  + p6[7] p6[6] p6[5] p6[4] p6[3] p6[2] p6[1] p6[0]     0     0     0     0     0     0
+ p7[7] p7[6] p7[5] p7[4] p7[3] p7[2] p7[1] p7[0]     0     0     0     0     0     0     0
-------------------------------------------------------------------------------------------------
 P[15] P[14] P[13] P[12] P[11] P[10]  P[9]  P[8]  P[7]  P[6]  P[5]  P[4]  P[3]  P[2]  P[1]  P[0]

In other words, P[15:0] is produced by summing p0, p1 << 1, p2 << 2, and so forth, to produce our final unsigned 16-bit product.
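The row-by-row construction above is easy to sanity-check in software. The following Python sketch (an illustrative model we add here; the report's own code is Verilog) forms each partial product with a bitwise AND, aligns it by its bit position, and sums:

```python
def unsigned_mul_8x8(a, b):
    """Model of the 8x8 unsigned array multiplier: each partial product
    p_i = {8{a[i]}} & b[7:0], shifted left by i, then all eight summed."""
    assert 0 <= a <= 0xFF and 0 <= b <= 0xFF
    product = 0
    for i in range(8):
        replicated = 0xFF if (a >> i) & 1 else 0x00  # {8{a[i]}}
        p_i = replicated & b                          # one-bit multiply
        product += p_i << i                           # align row i, accumulate
    return product & 0xFFFF                           # 16-bit result P[15:0]

print(unsigned_mul_8x8(11, 14))  # the 1011 x 1110 example: 154
```

Running it on the 1011 x 1110 example above reproduces 10011010 (154), and it agrees with ordinary integer multiplication over the full 8-bit range.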

More advanced approach: signed integers

If b had been a signed integer instead of an unsigned integer, then the partial products would need to have been sign-extended up to the width of the product before summing. If a had been a signed integer, then partial product p7 would need to be subtracted from the final sum, rather than added to it.

The above array multiplier can be modified to support two's complement notation signed numbers by inverting several of the product terms and inserting a one to the left of the first partial product term:

                                                       1 ~p0[7]  p0[6]  p0[5]  p0[4]  p0[3]  p0[2]  p0[1]  p0[0]
                                           ~p1[7]  p1[6]  p1[5]  p1[4]  p1[3]  p1[2]  p1[1]  p1[0]      0
                                    ~p2[7]  p2[6]  p2[5]  p2[4]  p2[3]  p2[2]  p2[1]  p2[0]      0      0
                             ~p3[7]  p3[6]  p3[5]  p3[4]  p3[3]  p3[2]  p3[1]  p3[0]      0      0      0
                      ~p4[7]  p4[6]  p4[5]  p4[4]  p4[3]  p4[2]  p4[1]  p4[0]      0      0      0      0
               ~p5[7]  p5[6]  p5[5]  p5[4]  p5[3]  p5[2]  p5[1]  p5[0]      0      0      0      0      0
        ~p6[7]  p6[6]  p6[5]  p6[4]  p6[3]  p6[2]  p6[1]  p6[0]      0      0      0      0      0      0
 1  p7[7] ~p7[6] ~p7[5] ~p7[4] ~p7[3] ~p7[2] ~p7[1] ~p7[0]      0      0      0      0      0      0      0
 -----------------------------------------------------------------------------------------------------------
 P[15]  P[14]  P[13]  P[12]  P[11]  P[10]   P[9]   P[8]   P[7]   P[6]   P[5]   P[4]   P[3]   P[2]   P[1]   P[0]

where ~p represents the complement (opposite value) of p.

There are a lot of simplifications in the bit array above that are not shown and are not obvious. The sequences of one complemented bit followed by noncomplemented bits are implementing a two's complement trick to avoid sign extension. The sequence of p7 (a noncomplemented bit followed by all complemented bits) appears because we are subtracting this term, so all of its bits were negated to start out with (and a 1 was added in the least significant position). For both types of sequences, the last bit is flipped and an implicit -1 should be added directly below the MSB. When the +1 from the two's complement negation for p7 in bit position 0 (LSB) and all the -1's in bit columns 7 through 14 (where each of the MSBs are located) are added together, they can be simplified to the single 1 that "magically" is floating out to the left. For an explanation and proof of why flipping the MSB saves us the sign extension, see a computer arithmetic book.
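The inversion-plus-constants scheme can likewise be checked numerically. This Python sketch (our illustrative model, not the report's circuit) inverts the MSB of rows 0-6, inverts bits 6..0 of row 7, adds the two inserted 1s, and recovers the exact two's complement product:

```python
def signed_mul_8x8(a, b):
    """Model of the modified 8x8 array multiplier for two's complement
    operands, using the MSB-inversion trick instead of sign extension."""
    assert -128 <= a < 128 and -128 <= b < 128
    au, bu = a & 0xFF, b & 0xFF                    # raw 8-bit bit patterns
    total = 0
    for i in range(8):
        p = (0xFF if (au >> i) & 1 else 0x00) & bu  # pi[7:0]
        p ^= 0x80 if i < 7 else 0x7F                # invert pi[7], or p7[6:0]
        total += p << i                             # align row i
    total += (1 << 8) + (1 << 15)                   # the two inserted 1s
    p16 = total & 0xFFFF                            # keep 16 bits
    return p16 - 0x10000 if p16 & 0x8000 else p16   # interpret as signed

print(signed_mul_8x8(-128, -128))  # 16384
```

No partial product is ever wider than 8 bits, yet the sign bits propagate correctly, which is exactly the saving the trick buys in hardware.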

Chapter 3
Booth's Algorithm

Booth's multiplication algorithm is a multiplication algorithm that multiplies two signed binary numbers in two's complement notation. The algorithm was invented by Andrew Donald Booth in 1950 while doing research on crystallography at Birkbeck College in Bloomsbury, London. Booth used desk calculators that were faster at shifting than adding, and created the algorithm to increase their speed. Booth's algorithm is of interest in the study of computer architecture.

The algorithm

Booth's algorithm examines adjacent pairs of bits of the N-bit multiplier Y in signed two's complement representation, including an implicit bit below the least significant bit, y-1 = 0. For each bit yi, for i running from 0 to N-1, the bits yi and yi-1 are considered. Where these two bits are equal, the product accumulator P remains unchanged. Where yi = 0 and yi-1 = 1, the multiplicand times 2^i is added to P; and where yi = 1 and yi-1 = 0, the multiplicand times 2^i is subtracted from P. The final value of P is the signed product.

The representations of the multiplicand and product are not specified; typically, these are both also in two's complement representation, like the multiplier, but any number system that supports addition and subtraction will work as well. As stated here, the order of the steps is not determined. Typically, it proceeds from LSB to MSB, starting at i = 0; the multiplication by 2^i is then typically replaced by incremental shifting of the P accumulator to the right between steps; low bits can be shifted out, and subsequent additions and subtractions can then be done just on the highest N bits of P.[1] There are many variations and optimizations on these details.

The algorithm is often described as converting strings of 1's in the multiplier to a high-order +1 and a low-order -1 at the ends of the string. When a string runs through the MSB, there is no high-order +1, and the net effect is interpretation as a negative of the appropriate value.

A typical implementation

Booth's algorithm can be implemented by repeatedly adding (with ordinary unsigned binary addition) one of two predetermined values A and S to a product P, then performing a rightward arithmetic shift on P. Let m and r be the multiplicand and multiplier, respectively; and let x and y represent the number of bits in m and r.

1. Determine the values of A and S, and the initial value of P. All of these numbers should have a length equal to (x + y + 1).
   A: Fill the most significant (leftmost) bits with the value of m. Fill the remaining (y + 1) bits with zeros.
   S: Fill the most significant bits with the value of (-m) in two's complement notation. Fill the remaining (y + 1) bits with zeros.
   P: Fill the most significant x bits with zeros. To the right of this, append the value of r. Fill the least significant (rightmost) bit with a zero.
2. Determine the two least significant (rightmost) bits of P.
   If they are 01, find the value of P + A. Ignore any overflow.
   If they are 10, find the value of P + S. Ignore any overflow.
   If they are 00 or 11, do nothing; use P directly in the next step.
3. Arithmetically shift the value obtained in the 2nd step a single place to the right. Let P now equal this new value.
4. Repeat steps 2 and 3 until they have been done y times.
5. Drop the least significant (rightmost) bit from P. This is the product of m and r.

Example

Find 3 x (-4), with m = 3 and r = -4, and x = 4 and y = 4:

m = 0011, -m = 1101, r = 1100
A = 0011 0000 0
S = 1101 0000 0
P = 0000 1100 0

Perform the loop four times:

1. P = 0000 1100 0. The last two bits are 00. Arithmetic right shift: P = 0000 0110 0.
2. P = 0000 0110 0. The last two bits are 00. Arithmetic right shift: P = 0000 0011 0.
3. P = 0000 0011 0. The last two bits are 10. P = P + S = 1101 0011 0. Arithmetic right shift: P = 1110 1001 1.
4. P = 1110 1001 1. The last two bits are 11. Arithmetic right shift: P = 1111 0100 1.

Drop the least significant bit: the product is 1111 0100, which is -12.

The above-mentioned technique is inadequate when the multiplicand is the largest negative number that can be represented (e.g., if the multiplicand has 4 bits then this value is -8). One possible correction to this problem is to add one more bit to the left of A, S and P. Below, we demonstrate the improved technique by multiplying -8 by 2 using 4 bits for the multiplicand and the multiplier:

A = 1 1000 0000 0
S = 0 1000 0000 0
P = 0 0000 0010 0

Perform the loop four times:

1. P = 0 0000 0010 0. The last two bits are 00. Right shift: P = 0 0000 0001 0.
2. P = 0 0000 0001 0. The last two bits are 10. P = P + S = 0 1000 0001 0. Right shift: P = 0 0100 0000 1.
3. P = 0 0100 0000 1. The last two bits are 01. P = P + A = 1 1100 0000 1. Right shift: P = 1 1110 0000 0.
4. P = 1 1110 0000 0. The last two bits are 00. Right shift: P = 1 1111 0000 0.

The product is 11110000 (after discarding the first and the last bit), which is -16.
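The stepwise A/S/P procedure, including the extra guard bit, can be modeled in a few lines of Python. This sketch is illustrative only (the function name and bit-packing choices are ours, not from the report):

```python
def booth_multiply(m, r, bits=4):
    """Booth's algorithm with one extra bit to the left of A, S and P,
    so the largest negative multiplicand is handled correctly."""
    n = bits + 1                       # multiplicand field incl. guard bit
    total = n + bits + 1               # width of A, S and P
    mask = (1 << total) - 1
    A = (m & ((1 << n) - 1)) << (bits + 1)
    S = (-m & ((1 << n) - 1)) << (bits + 1)
    P = (r & ((1 << bits) - 1)) << 1   # r, with a 0 appended on the right
    for _ in range(bits):              # repeat y times
        last_two = P & 0b11
        if last_two == 0b01:
            P = (P + A) & mask         # add A, ignore overflow
        elif last_two == 0b10:
            P = (P + S) & mask         # add S, ignore overflow
        if P & (1 << (total - 1)):     # arithmetic right shift of P
            P = (P >> 1) | (1 << (total - 1))
        else:
            P >>= 1
    P = (P >> 1) & ((1 << (2 * bits)) - 1)   # drop LSB and the guard bit
    return P - (1 << (2 * bits)) if P & (1 << (2 * bits - 1)) else P

print(booth_multiply(3, -4))   # -12
print(booth_multiply(-8, 2))   # -16
```

Both worked examples above come out as expected, and the loop agrees with ordinary multiplication over the whole 4-bit signed range.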
Booth Recoding

Booth multiplication is a technique that allows for smaller, faster multiplication circuits, by recoding the numbers that are multiplied. It is the standard technique used in chip design, and provides significant improvements over the "long multiplication" technique.

Shift and Add

A standard approach that might be taken by a novice to perform multiplication is to "shift and add", or normal "long multiplication". That is, for each column in the multiplier, shift the multiplicand the appropriate number of columns and multiply it by the value of the digit in that column of the multiplier, to obtain a partial product. The partial products are then added to obtain the final result:

        001011
      x 010011
      --------
        001011
       001011
      000000
     000000
    001011
  ------------
    0011010001

With this system, the number of partial products is exactly the number of columns in the multiplier.

Reducing the Number of Partial Products

It is possible to reduce the number of partial products by half, by using the technique of radix-4 Booth recoding. The basic idea is that, instead of shifting and adding for every column of the multiplier term and multiplying by 1 or 0, we only take every second column, and multiply by +/-1, +/-2, or 0, to obtain the same results. So, to multiply by 7, we can multiply the partial product aligned against the least significant bit by -1, and multiply the partial product aligned with the third column by 2:

Partial Product 0 = Multiplicand * -1, shifted left 0 bits (x -1)
Partial Product 1 = Multiplicand *  2, shifted left 2 bits (x  8)

This is the same result as the equivalent shift and add method:

Partial Product 0 = Multiplicand * 1, shifted left 0 bits (x 1)
Partial Product 1 = Multiplicand * 1, shifted left 1 bits (x 2)
Partial Product 2 = Multiplicand * 1, shifted left 2 bits (x 4)
Partial Product 3 = Multiplicand * 0, shifted left 3 bits (x 0)

The advantage of this method is the halving of the number of partial products. This is important in circuit design, as it relates to the propagation delay in the running of the circuit, and to the complexity and power consumption of its implementation. It is also important to note that there is comparatively little complexity penalty in multiplying by 0, 1 or 2: all that is needed is a multiplexer or equivalent, which has a delay time that is independent of the size of the inputs. Negating two's complement numbers has the added complication of needing to add a "1" to the LSB, but this can be overcome by adding a single correction term with the necessary "1"s in the correct positions.

Radix-4 Booth Recoding

To Booth recode the multiplier term, we consider the bits in blocks of three, such that each block overlaps the previous block by one bit. Grouping starts from the LSB, and the least significant block uses only two bits of the multiplier and assumes a zero for the third bit (since there is no previous block to overlap):

Figure 1 : Grouping of bits from the multiplier term, for use in Booth recoding.
The overlap is necessary so that we know what happened in the last block, as the MSB of the block acts like a sign bit. We then consult Table 1 to decide what the encoding will be.

Block   Partial Product
000      0
001      1 * Multiplicand
010      1 * Multiplicand
011      2 * Multiplicand
100     -2 * Multiplicand
101     -1 * Multiplicand
110     -1 * Multiplicand
111      0

Table 1 : Booth recoding strategy for each of the possible block values.

Since we use the LSB of each block to know what the sign bit was in the previous block, and there are never any negative products before the least significant block, the LSB of the first block is always assumed to be 0. Hence, we would recode our example of 7 (binary 0111) as:

0 1 1 1
block 0 : 1 1 0    Encoding : * (-1)
block 1 : 0 1 1    Encoding : * (2)

In the case where there are not enough bits to obtain a MSB of the last block, we sign extend the multiplier by one bit:

0 0 1 1 1
block 0 : 1 1 0    Encoding : * (-1)
block 1 : 0 1 1    Encoding : * (2)
block 2 : 0 0 0    Encoding : * (0)

The previous example can then be rewritten as below, using the Booth encoding of the multiplier, with the negative term sign extended, an error correction added for the negation, and the carried high bit discarded.

One possible implementation is in the form of a Booth recoder entity, such as the one in Figure 2, with its outputs being used to form the partial product:

Figure 2 : Booth Recoder and its associated inputs and outputs.

In Figure 2:

• The zero signal indicates whether the multiplicand is zeroed before being used as a partial product.
• The shift signal is used as the control to a 2:1 multiplexer, to select whether or not the partial product bits are shifted left one position.
• Finally, the neg signal indicates whether or not to invert all of the bits to create a negative product (which must be corrected by adding "1" at some later stage).

The described operations for Booth recoding and partial product generation can be expressed in terms of logical operations if desired but, for synthesis, it was found to be better to implement the truth tables in terms of VHDL case and if/then/else statements.

Sign Extension Tricks

Once the Booth recoded partial products have been generated, they need to be shifted and added together in the following fashion:

                  [Partial Product 1]
            [Partial Product 2] 0 0
      [Partial Product 3] 0 0 0 0
[Partial Product 4] 0 0 0 0 0 0

The problem with implementing this in hardware is that the first partial product needs to be sign extended by 6 bits, the second by four bits, and so on. This is easily achievable in hardware, but requires additional logic gates than if those bits could be permanently kept constant. Fortunately, there is a technique that achieves this:

• Invert the most significant bit (MSB) of each partial product
• Add an additional '1' to the MSB of the first partial product
• Add an additional '1' in front of each partial product

This technique allows any sign bits to be correctly propagated, without the need to sign extend all of the bits.

[worked example: the partial products summed with the additional "1"s and the error correction for the negation]
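To close the chapter, the recoding strategy of Table 1 can be exercised in software. This Python sketch (illustrative only; the report's own code is Verilog/VHDL) recodes a multiplier into radix-4 digits and rebuilds the product from the halved set of partial products:

```python
def booth_recode(multiplier, bits=6):
    """Radix-4 Booth recoding: scan overlapping 3-bit blocks (LSB first,
    with an implicit 0 below the LSB) and map each block to a digit in
    {-2, -1, 0, 1, 2} per the recoding table (Table 1)."""
    table = {0b000: 0, 0b001: 1, 0b010: 1, 0b011: 2,
             0b100: -2, 0b101: -1, 0b110: -1, 0b111: 0}
    m = multiplier & ((1 << bits) - 1)   # two's complement bit pattern
    padded = m << 1                      # implicit 0 below the LSB
    digits = []
    for i in range(0, bits, 2):          # one digit per pair of columns
        block = (padded >> i) & 0b111
        digits.append(table[block])
    return digits                        # digits[k] carries weight 4**k

def apply_recoding(multiplicand, digits):
    """Sum the partial products: digit * multiplicand, shifted left two
    columns per digit position."""
    return sum(d * multiplicand << (2 * k) for k, d in enumerate(digits))

print(booth_recode(7, 4))                          # [-1, 2]: 7 = -1 + 2*4
print(apply_recoding(11, booth_recode(19, 6)))     # the 001011 x 010011 example: 209
```

Recoding 7 yields the (-1, +2) digits used in the running example, and applying the recoded digits of 010011 (19) to 001011 (11) reproduces the long-multiplication result 0011010001 (209) with only three partial products instead of six.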