
Multiplier using Booth Algorithm

Project Report
Submitted in partial fulfillment of the requirements for
the award of ECE-5382

MASTERS
in
Electronics and Computer Engineering

By

Santhosh Kumar Vempati (R11344923)
Yaswanth Popuri (R11358263)

Under the Guidance of


Dr. Tooraj Nikoubin

DEPARTMENT OF ELECTRONICS AND COMPUTER ENGINEERING


TEXAS TECH UNIVERSITY
FALL 2014

ACKNOWLEDGEMENT
To discover, analyse and present something new is to venture on an untrodden path towards an
unexplored destination; it is an arduous adventure unless one finds a true torchbearer to show
the way. We would never have succeeded in completing our task without the cooperation,
encouragement and help provided to us by various people. Words are often too few to express
one's deep regard. We take this opportunity to express our profound sense of gratitude and
respect to all those who helped us through the duration of this project. We acknowledge with
gratitude and humility our indebtedness to Dr. Tooraj Nikoubin, ECE, Texas Tech University,
under whose guidance we had the privilege to complete this project. We wish to express our deep
gratitude to him for providing individual guidance and support throughout the work.

Santhosh Kumar Vempati

R11344923

Yashwanth Popuri

R11358263


ABSTRACT
This report describes the work we carried out during Fall 2014 at Texas Tech University. The
purpose of the project is to build a multiplier based on the Booth algorithm in Verilog, with
the basic modules also implemented in Cadence.
The Booth algorithm is used here for the simulation and development of a digital multiplier. It
is a powerful algorithm for signed-number multiplication that treats positive and negative
numbers uniformly. It performs multiplication with a small number of additions and shift
operations, using fewer additions and subtractions than more straightforward algorithms. This
work evaluates the design in terms of delay, power and their product, estimated by hand using
logical effort and through a custom design written in Verilog with the Xilinx ISE 14.2 tool.


INDEX 1
Acknowledgement
Abstract
Index I
Index II: List of Figures
Index III: List of Tables
1. Introduction
1.1 Algorithm
1.2 Implementation
1.3 Flow Chart
1.4 Example
2.0 Multiplication of two 4 bit signed binary numbers
2.0.1 Verilog code for 4 bit binary number
2.0.2 Test Bench
2.0.3 Results
2.0.4 Synthesis Report
2.0.5 Schematic
2.0.6 Power calculation
2.1 Multiplication of two 8 bit signed binary numbers
2.1.1 Verilog code for 8 bit
2.1.2 Test Bench
2.1.3 Test Results
2.1.4 HDL Synthesis Report
2.1.5 Schematic
2.1.6 Time delay
2.1.7 Total Power
2.2 Verilog Code for 16 bit multiplier
2.2.1 Test Bench
2.2.2 Test Results
2.2.3 Total Power
2.2.4 Time delay
2.2.5 HDL Synthesis Report
2.3 Total no of modules used
2.4 Power and Delay Comparison
2.5 Future Work
3.0 Modules implemented in Cadence
4.0 Conclusion
5.0 References

INDEX-II
LIST OF FIGURES
Figure No    Title
Figure 1     4-bit Output
Figure 2     4-bit Schematic
Figure 3     Power Calculation 4 bit
Figure 4     8-bit Output
Figure 5     8-bit Schematic
Figure 6     Total Time delay
Figure 7     Total Power
Figure 8     16-bit Output
Figure 9     Total Power 16bit
Figure 10    Time Delay 16bit
Figure 11    Nand Schematic
Figure 12    Nand delay
Figure 13    Nand Power
Figure 14    Nor Schematic
Figure 15    NOR delay
Figure 16    XOR schematic
Figure 17    XOR output
Figure 18    Half adder schematic
Figure 19    Half adder output
Figure 20    Full Adder schematic
Figure 21    Full Adder output
Figure 22    Multiplexer schematic
Figure 23    Multiplexer TB
Figure 24    Multiplexer Output
Figure 25    Decoder Schematic
Figure 26    Decoder TB
Figure 27    Decoder Output
Figure 28    D Flip-flop schematic
Figure 29    D Flip-flop TB
Figure 30    D Flip-flop output
Figure 31    Adder Subtractor schematic
Figure 32    Adder Subtractor TB
Figure 33    Adder Subtractor output

INDEX-III
LIST OF TABLES
Table No     Title
Table 2.1    Total number of modules
Table 2.2    Power and Delay Comparison

1. Introduction:
Booth's multiplication algorithm multiplies two signed binary numbers in two's complement
notation. The algorithm was invented by Andrew Donald Booth in 1950 while doing research on
crystallography at Birkbeck College in Bloomsbury, London. Booth used desk calculators that were
faster at shifting than adding and created the algorithm to increase their speed. Booth's
algorithm is of interest in the study of computer architecture.
Multiplication is more complicated than addition because it is implemented by shifting as well as
adding. It consists of generating partial products and accumulating them. Because of the partial
products involved in most multiplication algorithms, more time and more circuit area are required
to compute, store and sum the partial products to obtain the result.
A Booth multiplier is a hardware multiplier that multiplies two signed (two's complement) binary
integers. Booth recoding, which encodes a binary number one bit pair at a time to the
signed-digit set S = {-2, -1, 0, 1, 2}, is often used to encode one of the multiplier inputs to
reduce the number of partial products that need to be added.
Signed multiplication requires care. With unsigned multiplication there is no need to take the
sign of the number into consideration. In signed multiplication the same process cannot be
applied, because a signed number is held in two's complement form and would yield an incorrect
result if multiplied in the same fashion as an unsigned number. That is where Booth's algorithm
comes in: it preserves the sign of the result.
Booth multiplication allows for smaller, faster multiplication circuits by recoding the numbers
that are multiplied. This approach uses fewer additions and subtractions than more
straightforward algorithms.

1.1 Algorithm:
Booth's algorithm examines adjacent pairs of bits of the N-bit multiplier Y in signed two's
complement representation, including an implicit bit below the least significant bit, y_{-1} = 0.
For each bit y_i, for i running from 0 to N-1, the bits y_i and y_{i-1} are considered. Where
these two bits are equal, the product accumulator P is left unchanged. Where y_i = 0 and
y_{i-1} = 1, the multiplicand times 2^i is added to P; and where y_i = 1 and y_{i-1} = 0, the
multiplicand times 2^i is subtracted from P. The final value of P is the signed product.
The representations of the multiplicand and product are not specified; typically, these are both
also in two's complement representation, like the multiplier, but any number system that supports
addition and subtraction will work as well. As stated here, the order of the steps is not
determined. Typically, it proceeds from LSB to MSB, starting at i = 0; the multiplication by 2^i
is then typically replaced by incremental shifting of the P accumulator to the right between
steps; low bits can be shifted out, and subsequent additions and subtractions can then be done
just on the highest N bits of P.[1] There are many variations and optimizations on these details.
The algorithm is often described as converting strings of 1's in the multiplier to a high-order
+1 and a low-order -1 at the ends of the string. When a string runs through the MSB, there is no
high-order +1, and the net effect is interpretation as a negative of the appropriate value.
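
The bit-pair rule above can be checked with a short simulation-only sketch. It is not part of
the report's design: the module name booth_pairs_demo and the names N, ybits, p and errors are
assumptions made for illustration. For every pair of 4-bit operands it adds or subtracts the
multiplicand times 2^i according to the pair (y_i, y_{i-1}) and compares the accumulated value
with the ordinary product.

```verilog
// Simulation-only sketch of the bit-pair rule of Section 1.1.
// All names (booth_pairs_demo, N, ybits, p, errors) are illustrative assumptions.
module booth_pairs_demo;
  parameter N = 4;              // operand width in bits
  integer m, y, i, p, errors;
  reg [N:0] ybits;              // multiplier bits with the implicit y_{-1} = 0 appended

  initial begin
    errors = 0;
    // Try every pair of N-bit two's complement operands.
    for (m = -(1 << (N-1)); m < (1 << (N-1)); m = m + 1)
      for (y = -(1 << (N-1)); y < (1 << (N-1)); y = y + 1) begin
        ybits = y << 1;         // ybits[i+1] is y_i, ybits[i] is y_{i-1}
        p = 0;
        for (i = 0; i < N; i = i + 1)
          case ({ybits[i+1], ybits[i]})
            2'b01:   p = p + m * (1 << i);  // y_i = 0, y_{i-1} = 1: add M * 2^i
            2'b10:   p = p - m * (1 << i);  // y_i = 1, y_{i-1} = 0: subtract M * 2^i
            default: ;                      // equal bits: leave P unchanged
          endcase
        if (p !== m * y) errors = errors + 1;
      end
    $display("mismatches: %0d", errors);
  end
endmodule
```

Because the digits (y_{i-1} - y_i) weighted by 2^i sum to the two's complement value of Y, the
mismatch count is expected to be zero for every operand pair.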

1.2 Implementation:
Booth's algorithm can be implemented by repeatedly adding (with ordinary unsigned binary addition)
one of two predetermined values A and S to a product P, then performing a rightward arithmetic
shift on P. Let m and r be the multiplicand and multiplier, respectively; and let x and y
represent the number of bits in m and r.
1. Determine the values of A and S, and the initial value of P. All of these numbers should have a
   length equal to (x + y + 1).
   A: Fill the most significant (leftmost) bits with the value of m. Fill the remaining (y + 1)
      bits with zeros.
   S: Fill the most significant bits with the value of (-m) in two's complement notation. Fill the
      remaining (y + 1) bits with zeros.
   P: Fill the most significant x bits with zeros. To the right of this, append the value of r.
      Fill the least significant (rightmost) bit with a zero.
2. Determine the two least significant (rightmost) bits of P.
   If they are 01, find the value of P + A. Ignore any overflow.
   If they are 10, find the value of P + S. Ignore any overflow.
   If they are 00, do nothing. Use P directly in the next step.
   If they are 11, do nothing. Use P directly in the next step.
3. Arithmetically shift the value obtained in the 2nd step by a single place to the right. Let P
   now equal this new value.
4. Repeat steps 2 and 3 until they have been done y times.
5. Drop the least significant (rightmost) bit from P. This is the product of m and r.
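
The procedure above can be sketched as a small combinational module. This is a minimal
illustration under stated assumptions, not the code used later in this report: the names
booth_asp, XW, YW, neg_m and the testbench are hypothetical. The registers A, S and P have the
length x + y + 1 described in the steps, and the testbench skips m = -8 because -m cannot be
represented in 4 bits.

```verilog
// Sketch of the A/S/P procedure of Section 1.2 as a combinational module.
// Names (booth_asp, XW, YW, neg_m) are illustrative assumptions.
module booth_asp #(parameter XW = 4, parameter YW = 4) (
  input  signed [XW-1:0]        m,       // multiplicand (x = XW bits)
  input  signed [YW-1:0]        r,       // multiplier   (y = YW bits)
  output reg signed [XW+YW-1:0] product
);
  localparam W = XW + YW + 1;            // common length of A, S and P
  reg [XW-1:0] neg_m;
  reg [W-1:0]  A, S, P;
  integer k;

  always @* begin
    neg_m = -m;                           // two's complement of m, truncated to XW bits
    A = {m,     {(YW+1){1'b0}}};          // step 1: m in the top bits, zeros below
    S = {neg_m, {(YW+1){1'b0}}};          //         -m in the top bits, zeros below
    P = {{XW{1'b0}}, r, 1'b0};            //         zeros, then r, then a single 0
    for (k = 0; k < YW; k = k + 1) begin  // step 4: repeat y times
      case (P[1:0])                       // step 2: inspect the two rightmost bits
        2'b01:   P = P + A;
        2'b10:   P = P + S;
        default: ;                        // 00 or 11: use P unchanged
      endcase
      P = {P[W-1], P[W-1:1]};             // step 3: arithmetic shift right by one
    end
    product = P[W-1:1];                   // step 5: drop the rightmost bit
  end
endmodule

// Self-checking testbench; m = -8 is skipped because -m does not fit in 4 bits.
module booth_asp_tb;
  reg  signed [3:0] m, r;
  wire signed [7:0] product;
  integer i, j, errors;

  booth_asp #(.XW(4), .YW(4)) dut (.m(m), .r(r), .product(product));

  initial begin
    errors = 0;
    for (i = -7; i <= 7; i = i + 1)
      for (j = -8; j <= 7; j = j + 1) begin
        m = i; r = j; #1;
        if (product !== m * r) errors = errors + 1;
      end
    $display("mismatches: %0d", errors);
    $finish;
  end
endmodule
```

Handling the most negative multiplicand needs one extra bit in A, S and P, which is the case the
example in Section 1.4 works through.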

1.3 Flow Chart:

1.4 Example:
We demonstrate the technique by multiplying -8 by 2 using 4 bits for the multiplicand and the
multiplier. Because -8 is the most negative 4-bit value, its negation +8 does not fit in 4 bits,
so A, S and P are each extended by one extra most significant bit (written first below):
A = 1 1000 0000 0
S = 0 1000 0000 0
P = 0 0000 0010 0
Perform the loop four times:
P = 0 0000 0010 0. The last two bits are 00.
P = 0 0000 0001 0. Right shift.
P = 0 0000 0001 0. The last two bits are 10.
P = 0 1000 0001 0. P = P + S.
P = 0 0100 0000 1. Right shift.
P = 0 0100 0000 1. The last two bits are 01.
P = 1 1100 0000 1. P = P + A.
P = 1 1110 0000 0. Right shift.
P = 1 1110 0000 0. The last two bits are 00.
P = 1 1111 0000 0. Right shift.
The product is 11110000 (after discarding the first and the last bit), which is -16.
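
The trace above can be replayed in simulation. The following sketch is illustrative only (the
module name booth_example_trace and its signal names are assumptions); it loads the 10-bit
values of A, S and P from the example, applies the add-and-shift loop four times and prints P
after each shift.

```verilog
// Replays the -8 x 2 example with the 10-bit A, S and P values given in the text.
// The module and signal names are illustrative assumptions.
module booth_example_trace;
  reg [9:0] A, S, P;            // extra bit + 4 bits + 4 bits + trailing bit
  reg signed [7:0] product;
  integer k;

  initial begin
    A = 10'b1_1000_0000_0;      // -8 in the top five bits
    S = 10'b0_1000_0000_0;      // +8 in the top five bits
    P = 10'b0_0000_0010_0;      // multiplier 2 followed by a single 0
    for (k = 0; k < 4; k = k + 1) begin
      case (P[1:0])
        2'b01:   P = P + A;
        2'b10:   P = P + S;
        default: ;              // 00 or 11: nothing to add
      endcase
      P = {P[9], P[9:1]};       // arithmetic shift right by one
      $display("after step %0d: P = %b", k + 1, P);
    end
    product = P[8:1];           // discard the first and the last bit
    $display("product = %b (%0d)", product, product);  // product = 11110000 (-16)
  end
endmodule
```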

2.0 Multiplication of two 4 bit signed binary numbers:


Having discussed the flowchart and an example of the Booth algorithm, we can now implement it for
wider operands. A 4-bit signed binary number covers the decimal range -8 to 7. Whenever an input
greater than 7 is given, the program interprets the bit pattern as a two's complement value, that
is, as a negative number. The product of two 4-bit signed binary numbers is an 8-bit result.
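
As a quick check of how a 4-bit pattern is read as a signed value, the following simulation-only
sketch (module name signed_range_demo is an assumption) prints all sixteen patterns with their
unsigned and two's complement interpretations.

```verilog
// Prints every 4-bit pattern with its unsigned and two's complement reading.
// The module name is an illustrative assumption.
module signed_range_demo;
  reg signed [3:0] x;           // 4-bit two's complement value
  integer v;

  initial begin
    for (v = 0; v < 16; v = v + 1) begin
      x = v;                    // keep only the low 4 bits of v
      $display("pattern %b (unsigned %0d) reads as signed %0d", x, v, x);
    end
  end
endmodule
```

For example, the pattern 1100 (unsigned 12) is reported as signed -4, and 1000 (unsigned 8) as
signed -8.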
2.0.1 Verilog code for 4 bit Binary number:
module Multi4bit(X,Y,Z);
input signed [3:0] X,Y;
output signed [7:0] Z;
reg signed [7:0] Z;
reg [1:0] temp_check;
integer i;
reg checkBit;
reg [3:0] Y1;
always @ (X,Y)
begin
  Z=8'd0;
  checkBit=1'd0;
  //Number of shifts is equal to number of bits of operation
  for (i=0 ; i<4 ; i=i+1)
  begin
    temp_check= {X[i],checkBit};
    //Y1 is the two's complement of Y (Y = -8 cannot be negated in 4 bits)
    Y1= -Y;
    case(temp_check)
      2'd2 : begin
        //If temp_check is 10, subtract Y from the upper half of Z, i.e., add Y1
        Z[7:4]= Z[7:4]+Y1;
      end
      2'd1 : begin
        //If temp_check is 01, add Y to the upper half of Z
        Z[7:4]= Z[7:4]+Y;
      end
      default : begin //If temp_check is 00 or 11, do nothing
      end
    endcase
    //After add or sub or default case, right shift the Z by 1
    Z = Z>>1;
    //Restore the sign bit.
    Z[7]= Z[6];
    //New check bit is equal to current X bit
    checkBit=X[i];
  end
end
endmodule

2.0.2 Test Bench


module tb_Multi4bit;
// Inputs
reg [3:0] X;
reg [3:0] Y;
// Outputs
wire [7:0] Z;
// Instantiate the Unit Under Test (UUT)
Multi4bit uut (
  .X(X),
  .Y(Y),
  .Z(Z)
);
initial begin
  // Initialize Inputs
  X= 4'd2;
  Y= 4'd3;
  $monitor ("X=%d, NegX=%d, Y=%d , Z=%d, NegZ=%d",X,-X, Y, Z, -Z );
  #50; // Add stimulus here
end
endmodule

2.0.3 Results:

Fig 1. 4-bit output

2.0.4 Synthesis Report:
Release 14.2 - xst P.28xd (nt)
Copyright (c) 1995-2012 Xilinx, Inc. All rights reserved.
--> Parameter TMPDIR set to xst/projnav.tmp
Total REAL time to Xst completion: 0.00 secs
Total CPU time to Xst completion: 0.12 secs
--> Parameter xsthdpdir set to xst
Total REAL time to Xst completion: 0.00 secs
Total CPU time to Xst completion: 0.12 secs
--> Reading design: Multi4bit.prj
TABLE OF CONTENTS
1) Synthesis Options Summary
2) HDL Compilation
3) Design Hierarchy Analysis
4) HDL Analysis
5) HDL Synthesis
5.1) HDL Synthesis Report
6) Advanced HDL Synthesis
6.1) Advanced HDL Synthesis Report
7) Low Level Synthesis
8) Partition Report
9) Final Report
9.1) Device utilization summary
9.2) Partition Resource Summary
9.3) TIMING REPORT

=========================================================================
*                      Synthesis Options Summary                        *
=========================================================================
---- Source Parameters
Input File Name                    : "Multi4bit.prj"
Input Format                       : mixed
Ignore Synthesis Constraint File   : NO

---- Target Parameters
Output File Name                   : "Multi4bit"
Output Format                      : NGC
Target Device                      : xc3s100e-4-vq100

---- Source Options
Top Module Name                    : Multi4bit
Automatic FSM Extraction           : YES
FSM Encoding Algorithm             : Auto
Safe Implementation                : No
FSM Style                          : LUT
RAM Extraction                     : Yes
RAM Style                          : Auto
ROM Extraction                     : Yes
Mux Style                          : Auto
Decoder Extraction                 : YES
Priority Encoder Extraction        : Yes
Shift Register Extraction          : YES
Logical Shifter Extraction         : YES
XOR Collapsing                     : YES
ROM Style                          : Auto
Mux Extraction                     : Yes
Resource Sharing                   : YES
Asynchronous To Synchronous        : NO
Multiplier Style                   : Auto
Automatic Register Balancing       : No

---- Target Options
Add IO Buffers                     : YES
Global Maximum Fanout              : 500
Add Generic Clock Buffer(BUFG)     : 24
Register Duplication               : YES
Slice Packing                      : YES
Optimize Instantiated Primitives   : NO
Use Clock Enable                   : Yes
Use Synchronous Set                : Yes
Use Synchronous Reset              : Yes
Pack IO Registers into IOBs        : Auto
Equivalent register Removal        : YES

---- General Options
Optimization Goal                  : Speed
Optimization Effort                : 1
Keep Hierarchy                     : No
Netlist Hierarchy                  : As_Optimized
RTL Output                         : Yes
Global Optimization                : AllClockNets
Read Cores                         : YES
Write Timing Constraints           : NO
Cross Clock Analysis               : NO
Hierarchy Separator                : /
Bus Delimiter                      : <>
Case Specifier                     : Maintain
Slice Utilization Ratio            : 100
BRAM Utilization Ratio             : 100
Verilog 2001                       : YES
Auto BRAM Packing                  : NO
Slice Utilization Ratio Delta      : 5

=========================================================================
*                          HDL Compilation                              *
=========================================================================
Compiling verilog file "bit_4.v" in library work
Module <Multi4bit> compiled
No errors in compilation
Analysis of file <"Multi4bit.prj"> succeeded.

=========================================================================
*                     Design Hierarchy Analysis                         *
=========================================================================
Analyzing hierarchy for module <Multi4bit> in library <work>.

=========================================================================
*                            HDL Analysis                               *
=========================================================================
Analyzing top module <Multi4bit>.
Module <Multi4bit> is correct for synthesis.

=========================================================================
*                           HDL Synthesis                               *
=========================================================================
Performing bidirectional port resolution...

Synthesizing Unit <Multi4bit>.
    Related source file is "bit_4.v".
WARNING:Xst:646 - Signal <temp_check> is assigned but never used. This unconnected signal will
be trimmed during the optimization process.
WARNING:Xst:646 - Signal <checkBit> is assigned but never used. This unconnected signal will be
trimmed during the optimization process.
WARNING:Xst:646 - Signal <Y1> is assigned but never used. This unconnected signal will be
trimmed during the optimization process.
    Found 5-bit adder for signal <$add0000> created at line 41.
    Found 5-bit adder for signal <$add0001> created at line 45.
    Found 5-bit adder for signal <$add0002> created at line 41.
    Found 5-bit adder for signal <$add0003> created at line 45.
    Found 5-bit adder for signal <$add0004> created at line 41.
    Found 5-bit adder for signal <$add0005> created at line 45.
    Found 1-bit 4-to-1 multiplexer for signal <Z$mux0000> created at line 38.
    Found 1-bit 4-to-1 multiplexer for signal <Z$mux0001> created at line 38.
    Found 1-bit 4-to-1 multiplexer for signal <Z$mux0002> created at line 38.
    Found 1-bit 4-to-1 multiplexer for signal <Z$mux0003> created at line 38.
    Found 1-bit 4-to-1 multiplexer for signal <Z$mux0004> created at line 38.
    Found 1-bit 4-to-1 multiplexer for signal <Z$mux0005> created at line 38.
    Found 1-bit 4-to-1 multiplexer for signal <Z$mux0006> created at line 38.
    Found 1-bit 4-to-1 multiplexer for signal <Z$mux0007> created at line 38.
    Found 1-bit 4-to-1 multiplexer for signal <Z$mux0008> created at line 38.
    Found 1-bit 4-to-1 multiplexer for signal <Z$mux0009> created at line 38.
    Found 1-bit 4-to-1 multiplexer for signal <Z$mux0010> created at line 38.
    Found 1-bit 4-to-1 multiplexer for signal <Z$mux0011> created at line 38.
    Found 1-bit 4-to-1 multiplexer for signal <Z$mux0012> created at line 38.
    Found 1-bit 4-to-1 multiplexer for signal <Z$mux0013> created at line 38.
    Found 1-bit 4-to-1 multiplexer for signal <Z$mux0014> created at line 38.
    Found 1-bit 4-to-1 multiplexer for signal <Z$mux0015> created at line 38.
    Found 1-bit 4-to-1 multiplexer for signal <Z$mux0016> created at line 38.
    Found 1-bit 4-to-1 multiplexer for signal <Z$mux0017> created at line 38.
    Found 1-bit 4-to-1 multiplexer for signal <Z$mux0018> created at line 38.
    Found 1-bit 4-to-1 multiplexer for signal <Z$mux0019> created at line 38.
    Summary:
        inferred 7 Adder/Subtractor(s).
        inferred 20 Multiplexer(s).
Unit <Multi4bit> synthesized.

=========================================================================
HDL Synthesis Report

Macro Statistics
# Adders/Subtractors               : 7
 5-bit adder                       : 6
 8-bit adder                       : 1
# Multiplexers                     : 20
 1-bit 4-to-1 multiplexer          : 20

=========================================================================
*                       Advanced HDL Synthesis                          *
=========================================================================
Advanced HDL Synthesis Report

Macro Statistics
# Adders/Subtractors               : 7
 5-bit adder                       : 7
# Multiplexers                     : 19
 1-bit 4-to-1 multiplexer          : 19

Optimizing unit <Multi4bit> ...

Mapping all equations...
Building and optimizing final netlist ...
Found area constraint ratio of 100 (+ 5) on block Multi4bit, actual ratio is 5.
=========================================================================
*                            Final Report                               *
=========================================================================
Final Results
RTL Top Level Output File Name     : Multi4bit.ngr
Top Level Output File Name         : Multi4bit
Output Format                      : NGC
Optimization Goal                  : Speed
Keep Hierarchy                     : No

Design Statistics
# IOs                              : 16

Cell Usage :
# BELS                             : 120
#      GND                         : 1
#      LUT2                        : 8
#      LUT3                        : 24
#      LUT4                        : 42
#      MULT_AND                    : 4
#      MUXCY                       : 12
#      MUXF5                       : 14
#      XORCY                       : 15
# IO Buffers                       : 16
#      IBUF                        : 8
#      OBUF                        : 8
=========================================================================

Device utilization summary:
---------------------------

Selected Device : 3s100evq100-4

 Number of Slices:                 42  out of    960     4%
 Number of 4 input LUTs:           74  out of   1920     3%
 Number of IOs:                    16
 Number of bonded IOBs:            16  out of     66    24%

---------------------------
Partition Resource Summary:
---------------------------

  No Partitions were found in this design.

---------------------------

=========================================================================
TIMING REPORT

NOTE: THESE TIMING NUMBERS ARE ONLY A SYNTHESIS ESTIMATE.
      FOR ACCURATE TIMING INFORMATION PLEASE REFER TO THE TRACE REPORT
      GENERATED AFTER PLACE-and-ROUTE.

Clock Information:
------------------
No clock signals found in this design

Asynchronous Control Signals Information:
----------------------------------------
No asynchronous control signals found in this design

Timing Summary:
---------------
Speed Grade: -4

   Minimum period: No path found
   Minimum input arrival time before clock: No path found
   Maximum output required time after clock: No path found
   Maximum combinational path delay: 22.571ns

Timing Detail:
--------------
All values displayed in nanoseconds (ns)

=========================================================================
Timing constraint: Default path analysis
  Total number of paths / destination ports: 25181 / 8
-------------------------------------------------------------------------
Delay:               22.571ns (Levels of Logic = 19)
  Source:            Y<1> (PAD)
  Destination:       Z<7> (PAD)

  Data Path: Y<1> to Z<7>
                                Gate     Net
    Cell:in->out      fanout   Delay   Delay  Logical Name (Net Name)
    ----------------------------------------  ------------
     IBUF:I->O            16   1.218   1.209  Y_1_IBUF (Y_1_IBUF)
     LUT2:I0->O            3   0.704   0.531  Madd__old_Y1_2_Madd_xor<1>11 (Z_mux0003_mand)
     MULT_AND:I1->LO       0   0.741   0.000  Z_mux0003_mand (Z_mux0003_mand1)
     MUXCY:DI->O           1   0.888   0.000  Madd__add0000_cy<0> (Madd__add0000_cy<0>)
     XORCY:CI->O           1   0.804   0.424  Madd__add0000_xor<1> (_add0000<1>)
     LUT4:I3->O            3   0.704   0.535  Mmux_Z_mux000221 (X<0>_mmx_out12)
     LUT4:I3->O            2   0.704   0.622  Mmux_Z_mux000841 (Z_mux0008)
     LUT2:I0->O            1   0.704   0.000  Madd__add0002_lut<0> (Madd__add0002_lut<0>)
     MUXCY:S->O            1   0.464   0.000  Madd__add0002_cy<0> (Madd__add0002_cy<0>)
     MUXCY:CI->O           1   0.059   0.000  Madd__add0002_cy<1> (Madd__add0002_cy<1>)
     XORCY:CI->O           1   0.804   0.420  Madd__add0002_xor<2> (_add0002<2>)
     MUXF5:S->O            4   0.739   0.622  Mmux_Z_mux00055_f5 (X<1>_mmx_out4)
     LUT3:I2->O            1   0.704   0.499  Mmux_Z_mux00101221 (Z_mux0012)
     LUT4:I1->O            3   0.704   0.566  Madd__add0005_cy<1>11 (Madd__add0005_cy<1>)
     LUT3:I2->O            1   0.704   0.455  Madd__add0005_cy<2>11 (Madd__add0005_cy<2>)
     LUT4:I2->O            1   0.704   0.595  Mmux_Z_mux00101219 (Mmux_Z_mux00101219)
     LUT3:I0->O            1   0.704   0.000  Mmux_Z_mux00101258_G (N45)
     MUXF5:I1->O           2   0.321   0.447  Mmux_Z_mux00101258 (Z_6_OBUF)
     OBUF:I->O                 3.272          Z_7_OBUF (Z<7>)
    ----------------------------------------
    Total                     22.571ns (15.646ns logic, 6.925ns route)
                                       (69.3% logic, 30.7% route)

=========================================================================
Total REAL time to Xst completion: 3.00 secs
Total CPU time to Xst completion: 3.70 secs
Total memory usage is 200684 kilobytes
Number of errors   :    0 (   0 filtered)
Number of warnings :    0 (   0 filtered)
Number of infos    :    0 (   0 filtered)

2.0.5 Schematic:

Fig 2. Schematic - 4 bit

2.0.6 Power calculation:

Fig 3. Power Calculation - 4 bit


2.1 Multiplication of two 8 bit signed binary numbers


As with the 4-bit case, the algorithm extends directly to wider operands. An 8-bit signed binary
number covers the decimal range -128 to 127. Whenever an input greater than 127 is given, the
program interprets the bit pattern as a two's complement value, that is, as a negative number.
The product of two 8-bit signed binary numbers is a 16-bit result.
2.1.1 Verilog code for 8 bit:
module Multi8bit(X,Y,Z);
input signed [7:0] X,Y;
output signed [15:0] Z;
reg signed [15:0] Z;
reg [1:0] temp_check;
integer i;
reg checkBit;
reg [7:0] Y1;
always @ (X,Y)
begin
  Z=16'd0;
  checkBit=1'd0;
  //Number of shifts is equal to number of bits of operation
  for (i=0 ; i<8 ; i=i+1)
  begin
    temp_check= {X[i],checkBit};
    Y1= -Y;
    case(temp_check)
      2'd2 : begin
        //If temp_check is 10, subtract Y from Z, i.e., add Z and Y1
        Z[15:8]= Z[15:8]+Y1;
      end
      2'd1 : begin
        //If temp_check is 01, add Y to Z
        Z[15:8]= Z[15:8]+Y;
      end
      default : begin //If temp_check is 00 or 11, do nothing
      end
    endcase
    //After add or sub or default case, right shift the Z by 1
    Z = Z>>1;
    //Restore the sign bit.
    Z[15]= Z[14];
    //New check bit is equal to current X bit
    checkBit=X[i];
  end
end
endmodule
2.1.2 Test Bench
module tb_Multi8bit;
// Inputs
reg [7:0] X;
reg [7:0] Y;
// Outputs
wire [15:0] Z;
// Instantiate the Unit Under Test (UUT)
Multi8bit uut (
  .X(X),
  .Y(Y),
  .Z(Z)
);
initial begin
  // Initialize Inputs
  X= 8'd2;
  Y= -8'd12;
  $monitor ("X=%d, NegX=%d, Y=%d , Z=%d, NegZ=%d",X,-X, Y, Z, -Z );
  #50;
  // Add stimulus here
end
endmodule
2.1.3 Test Results:
Output:

Fig 4: Output 8-bit


2.1.4 HDL Synthesis Report
Macro Statistics
# Adders/Subtractors               : 15
 8-bit adder                       : 15
# Multiplexers                     : 64
 1-bit 4-to-1 multiplexer          : 64


2.1.5 Schematic:

Fig 5: Schematic -8 bit


2.1.6 Time delay:

Fig 6 : Total Time delay


The total time delay for the 8 bit multiplier is 36.640 ns.


2.1.7 Total Power:

Fig 7. Total Power


The total power consumed is 34 mW.
2.2 Verilog Code for 16 bit multiplier:
module BoothAlgthm16bit_VCode(X,Y,Z);
input signed [15:0] X,Y;
output signed [31:0] Z;
reg signed [31:0] Z;
reg [1:0] temp_check;
integer i;
reg checkBit;
reg [15:0] Y1;
always @ (X,Y)
begin
  Z=32'd0;
  checkBit=1'd0;
  //Number of shifts is equal to number of bits of operation
  for (i=0 ; i<16 ; i=i+1)
  begin
    temp_check= {X[i],checkBit};
    Y1= -Y;
    case(temp_check)
      2'd2 : begin
        //If temp_check is 10, subtract Y from Z, i.e., add Z and Y1
        Z[31:16]= Z[31:16]+Y1;
      end
      2'd1 : begin
        //If temp_check is 01, add Y to Z
        Z[31:16]= Z[31:16]+Y;
      end
      default : begin
        //If temp_check is 00 or 11, do nothing
      end
    endcase
    //After add or sub or default case, right shift the Z by 1
    Z = Z>>1;
    //Restore the sign bit.
    Z[31]= Z[30];
    //New check bit is equal to current X bit
    checkBit=X[i];
  end
end
endmodule

2.2.1 Test Bench


module tb_Multi;
reg [15:0] X;
reg [15:0] Y;
wire [31:0] Z;
// Instantiate the Unit Under Test (UUT)
BoothAlgthm16bit_VCode uut (
  .X(X),
  .Y(Y),
  .Z(Z)
);
initial begin
  // Initialize Inputs
  X= 16'd2555;
  Y= -16'd2;
  $monitor ("X=%d, NegX=%d, Y=%d , Z=%d, NegZ=%d",X,-X, Y, Z, -Z );
  #50;
end
endmodule
2.2.2 Test Results:

Fig 8. 16 bit Output


2.2.3 Total Power:

Fig 9. Total Power

The total power consumed is 34 mW.
2.2.4 Time delay

Fig 10. Total Time Delay

The total time delay for the circuit is 61.251 ns.
2.2.5 HDL Synthesis Report
Macro Statistics
# Adders/Subtractors               : 31
 16-bit adder                      : 31
# Multiplexers                     : 256
 1-bit 4-to-1 multiplexer          : 256


2.3 Total no of modules used:

Module      4 bit              8 bit          16 bit
Adders      7 (8bit Adders)    15 (Adders)    31 (16bit Adders)
Mux 4:1     19                 64             256
Table 2.1 Total number of modules

2.4 Power and Delay Comparison:

Parameter     4 bit     8 bit     16 bit
Power (mW)    13.859    34        34
Delay (ns)    27        36.64     61.259
Table 2.2 Power and Delay Comparison
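
The abstract also refers to the product of power and delay. As a worked illustration, the
power-delay product (PDP) follows directly from the entries of Table 2.2; this assumes that all
delay entries, including the 4-bit value of 27, are in nanoseconds, so that mW times ns gives pJ.

```latex
\begin{align*}
\mathrm{PDP} &= P \cdot t_d \\
\mathrm{PDP}_{4\,\text{bit}}  &= 13.859\,\mathrm{mW} \times 27\,\mathrm{ns}     \approx 374.2\,\mathrm{pJ} \\
\mathrm{PDP}_{8\,\text{bit}}  &= 34\,\mathrm{mW}     \times 36.64\,\mathrm{ns}  \approx 1245.8\,\mathrm{pJ} \\
\mathrm{PDP}_{16\,\text{bit}} &= 34\,\mathrm{mW}     \times 61.259\,\mathrm{ns} \approx 2082.8\,\mathrm{pJ}
\end{align*}
```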
2.5 Future Work:
The standard Booth algorithm speeds up the multiplication process. To improve the speed further,
the modified Booth algorithm has been introduced: with Radix-4 Booth recoding it is possible to
reduce the number of partial products by half. The basic idea is that, instead of shifting and
adding for every column of the multiplier term and multiplying by 1 or 0, we take only every
second column and multiply by +/-1, +/-2 or 0 to obtain the same result. The Radix-4 Booth
encoder selects the multiple of the multiplicand based on the multiplier bits, examining 3 bits
at a time with a one-bit overlap between adjacent groups.
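
The Radix-4 recoding described above can be sketched behaviourally as follows. This is an
illustration under stated assumptions rather than a design from this project: the names
booth_radix4, rext, mext and pp are hypothetical, N is assumed to be even, and the partial
products are simply summed in a loop instead of through a reduction tree.

```verilog
// Behavioural sketch of Radix-4 Booth recoding (N must be even).
// Names (booth_radix4, rext, mext, pp) are illustrative assumptions.
module booth_radix4 #(parameter N = 8) (
  input  signed [N-1:0]       m, r,     // multiplicand and multiplier
  output reg signed [2*N-1:0] product
);
  reg [N:0]              rext;           // multiplier with the implicit 0 below the LSB
  reg signed [2*N-1:0]   mext, pp;
  integer k;

  always @* begin
    rext    = {r, 1'b0};
    mext    = m;                         // sign-extend the multiplicand to 2N bits
    product = 0;
    for (k = 0; k < N/2; k = k + 1) begin
      // Overlapping 3-bit group: r[2k+1], r[2k], r[2k-1]
      case ({rext[2*k+2], rext[2*k+1], rext[2*k]})
        3'b001, 3'b010: pp = mext;            // digit +1
        3'b011:         pp = mext <<< 1;      // digit +2
        3'b100:         pp = -(mext <<< 1);   // digit -2
        3'b101, 3'b110: pp = -mext;           // digit -1
        default:        pp = 0;               // 000 or 111: digit 0
      endcase
      product = product + (pp <<< (2*k));     // weight the digit by 4^k
    end
  end
endmodule

// Self-checking testbench over all 8-bit operand pairs.
module booth_radix4_tb;
  reg  signed [7:0]  m, r;
  wire signed [15:0] product;
  integer i, j, errors;

  booth_radix4 #(.N(8)) dut (.m(m), .r(r), .product(product));

  initial begin
    errors = 0;
    for (i = -128; i <= 127; i = i + 1)
      for (j = -128; j <= 127; j = j + 1) begin
        m = i; r = j; #1;
        if (product !== m * r) errors = errors + 1;
      end
    $display("mismatches: %0d", errors);
    $finish;
  end
endmodule
```

With N = 8 the loop forms four partial products instead of the eight add-or-skip steps of the
radix-2 loop used in Section 2.1.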
A limitation of the Verilog flow is the calculation of power, so in future it would be better to
implement the same design in Cadence in order to obtain accurate power and delay figures.


3.0 Modules implemented in Cadence:


The following basic modules are implemented using Cadence:
1. Inverter
2. Nor
3. Nand
4. Xor
5. Half Adder
6. Full Adder
7. Decoder
8. Multiplexer
9. D Flip-flop
10. Shifter
11. Adder-Subtractor
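
The Cadence schematics themselves are shown only as figures. As a rough gate-level sketch of how
several of the listed modules are commonly composed (an assumption for illustration, not a
transcription of the schematics), a full adder can be built from two half adders and an OR
gate, and an adder-subtractor from a chain of full adders with XOR gates on one operand. All
module and port names below are hypothetical.

```verilog
// Gate-level sketch: half adder -> full adder -> N-bit adder-subtractor.
// A common textbook construction; names are illustrative assumptions.
module half_adder (input a, input b, output sum, output carry);
  xor (sum, a, b);
  and (carry, a, b);
endmodule

module full_adder (input a, input b, input cin, output sum, output cout);
  wire s1, c1, c2;
  half_adder ha1 (.a(a),  .b(b),   .sum(s1),  .carry(c1));
  half_adder ha2 (.a(s1), .b(cin), .sum(sum), .carry(c2));
  or (cout, c1, c2);
endmodule

module adder_subtractor #(parameter N = 4) (
  input  [N-1:0] a, b,
  input          sub,            // 0: a + b, 1: a - b
  output [N-1:0] result,
  output         cout
);
  wire [N:0] c;
  assign c[0] = sub;             // carry-in of 1 completes the two's complement of b
  genvar g;
  generate
    for (g = 0; g < N; g = g + 1) begin : stage
      wire bx;
      xor (bx, b[g], sub);       // invert b when subtracting
      full_adder fa (.a(a[g]), .b(bx), .cin(c[g]), .sum(result[g]), .cout(c[g+1]));
    end
  endgenerate
  assign cout = c[N];
endmodule

// Self-checking testbench over all 4-bit inputs and both modes.
module adder_subtractor_tb;
  reg  [3:0] a, b;
  reg        sub;
  wire [3:0] result;
  wire       cout;
  reg  [3:0] expected;
  integer i, errors;

  adder_subtractor #(.N(4)) dut (.a(a), .b(b), .sub(sub), .result(result), .cout(cout));

  initial begin
    errors = 0;
    for (i = 0; i < 512; i = i + 1) begin
      {sub, a, b} = i;
      #1;
      expected = sub ? (a - b) : (a + b);
      if (result !== expected) errors = errors + 1;
    end
    $display("mismatches: %0d", errors);
    $finish;
  end
endmodule
```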

Nand Schematic:

Fig 11. Nand schematic

Nand Delay:

Fig 12. Nand delay

Delay: 78.4239 ps

Nand Power:

Fig 13. Nand power

Power: 3.16 uW

Nor Schematic:

Fig 14. Nor schematic

Nor Delay:

Fig 15. NOR delay

Delay: 78.423 ps

Nor Power:

Fig 16. Nor power

Power: 6.802 uW

Xor Schematic:

Fig 17. XOR schematic

Xor Output:

Fig 18. XOR output

Half Adder Schematic:

Fig 19. Half adder schematic

Half Adder Output:

Fig 20. Half adder output

Full Adder Schematic:

Fig 21. Full Adder schematic

Full Adder Output:

Fig 22. Full Adder output

Multiplexer Schematic:

Fig 23. Multiplexer schematic

Multiplexer TB:

Fig 24. Multiplexer TB

Multiplexer Output:

Fig 25. Multiplexer output

Decoder Schematic:

Fig 26. Decoder schematic

Decoder TB:

Fig 27. Decoder TB

Decoder Output:

Fig 28. Decoder output

D Flip-flop Schematic:

Fig 29. D Flip-flop schematic

D Flip-flop TB:

Fig 30. D Flip-flop TB

D Flip-flop Output:

Fig 31. D Flip-flop output

Adder Subtractor Schematic:

Fig 31. Adder Subtractor schematic

Adder Subtractor TB:

Fig 32. Adder Subtractor TB

Adder Subtractor Output:

Fig 33. Adder Subtractor output

4.0 Conclusion
The multiplication technique reviewed in this report reduces the maximum height of the partial
product array, which simplifies the partial product reduction tree in terms of delay and
regularity of the layout. Among the different techniques for designing a MAC unit, pipelined
Booth multiplication gives good performance in terms of speed, while SPST and block-enabling
techniques are better for low power consumption and area.
5.0 References
1. International Journal of Emerging Trends & Technology in Computer Science (IJETTCS)
2. Digital Design by M. Morris Mano, Michael D. Ciletti, 4th edition.
3. Introduction to VLSI by Douglas A. Pucknell
4. Fabrizio Lamberti, Nikolaos Andrikos, Elisardo Antelo, Paolo Montuschi, "Speeding-up
   Booth Encoded Multipliers by Reducing the Size of Partial Product Array," INTERNAL
   REPORT DAUIN/DELEN-POLITECNICO DI TORINO AND UNIVERSITY DE
   SANTIAGO DE COMPOSTELA, 2009
5. Sandeep Shrivastava, Jaikaran Singh and Mukesh Tiwari, Implementation of Radix-2
   Booth Multiplier and Comparison with Radix-4 Encoder Booth Multiplier, International
   Journal on Emerging Technologies 2(1): 14-16 (2011), ISSN: 0975-8364
6. Dr. Ravi Shankar Mishra, Prof. Puran Gour, Braj Bihari Soni, Design and Implements of
   Booth and Robertson's multipliers algorithm on FPGA. International Journal of Engineering
   Research and Applications (IJERA), ISSN: 2248-9622.