
A High-Speed Effective Signed Multiplier Using Ladner-Fischer Adder for Hardware Boosters

Pedada Ravi Raj1 and G. V. Subba Reddy2


1M. Tech Scholar, Department of ECE, GRIET, Hyderabad, India

ravirajp1997@gmail.com
2Department of ECE, GRIET, Hyderabad, India

gvsreddy2005@gmail.com

Abstract. Multiplication is an arithmetic operation used in many different applications, including digital
signal processing and communication systems. In this study, a signed multiplier is designed using the Booth
algorithm and the Ladner-Fischer architecture. The multiplier is based on Radix-4 Booth encoding, which halves
the number of generated partial products, increasing the multiplier’s speed and reducing its circuit area. The
design benefits from partial product generation and addition occurring concurrently. To add up the generated
partial products, this multiplier employs the Ladner-Fischer parallel prefix adder, whereas the previous design
accumulates them with a ripple carry adder, which increases the latency. As a result, employing this adder
provides better performance in terms of delay. The proposed design is written in Verilog HDL and synthesised
with version 14.7 of the Xilinx ISE design suite.

Keywords: Radix-4 Algorithm, Multiplier, Ladner-Fischer, Look Up Tables (LUTs).

1 Introduction
Multipliers are essential components of many digital systems, including microprocessors, digital filters, and
digital signal processors. They are also used in discrete Fourier transform implementations, range
measurement and correlation. Multiplication is essentially an add-and-shift technique, that is, a sequence
of repeated additions. In other words, the multiplicand is added to itself several times. Implemented this
way, multiplication requires a large amount of hardware and operates slowly. Speed is the most crucial
aspect to consider in many real-time applications. Because the DSP blocks are not distributed uniformly
over the Field Programmable Gate Array (FPGA), the critical path latency may suffer when many of them
must be combined for large multiplication operations [1, 2]. In general, multiplication is performed by
first producing partial products and then adding them. The multiplier's speed is determined by how quickly
the partial products are produced and added. To speed up multiplication, the number of partial products
should be decreased and an efficient adder should be used for the addition. High-speed specialised
multipliers are employed in linear and vector computing [3]. High-speed, pipelined Booth multipliers are
used in digital signal processing (DSP) applications such as multimedia and communication networks. The
fast Fourier transform (FFT) and other high-speed DSP applications require additions and multiplications.
Technology is currently advancing at a rapid pace. The circuits being designed contain billions of elements
that are small, fast, and consume little power. As a result, the design of any circuit must consider area,
speed, and power. To fulfil market demands, a device must be built with a small footprint and low latency
[4]. LUTs are key resources in Field Programmable Gate Arrays (FPGAs) and implement Boolean functions.
In Xilinx FPGAs, a LUT has six inputs and can implement any Boolean function of up to six inputs. A LUT
may be configured as a single 6-input LUT with a single output, or as two 5-input LUTs with distinct outputs
but shared inputs. Every LUT output can be stored in a flip-flop if desired. There are dedicated
interconnect routes within each Configurable Logic Block (CLB) for linking LUTs without having to leave and
re-enter the CLB, which substantially lowers the utilisation of global routing resources. Whenever the FPGA
architecture changes, the synthesis methods must adapt to provide the optimum mapping onto the available
resources [5]. The most essential function in computer arithmetic is binary addition. VLSI integer adders
are essential components in digital signal processors and general-purpose microprocessors because they are
used in ALUs, floating-point arithmetic data paths and address generation units [6].
A parallel prefix adder (PPA) is presently regarded as an efficient combinational circuit for adding two
multi-bit values. At the circuit level, the speed and complexity of a PPA are the critical criteria. Such
adders are commonly found in the arithmetic-logic units of current processors, such as digital signal
processors and microprocessors [7]. When designing a VLSI circuit, several parameters must be optimised.
Often, these parameters cannot be optimised concurrently, and one must be improved at the expense of one or
more of the others. It has become challenging to design an integrated circuit that is efficient in terms of
area, power, and speed at the same time. Power dissipation is now considered an important factor in current
multipliers. The goal of a good multiplier design is a compact, high-speed, low-power circuit.
1.1 Concept of Radix-4 Algorithm and Sign Extension
A common multiplication technique is the Modified Booth algorithm. It is a redundant signed-digit radix-4 encoding
technique. Its major feature is that, compared with a radix-2 representation, it halves the number of partial products
in the multiplication, which increases speed. It is also known as the bit-pair algorithm. The key idea is that, rather
than examining every bit of the multiplier and multiplying by one or zero, only every second bit position is selected
and the multiplicand is multiplied by +1, -1, +2, -2 or 0.
Table 1. Radix-4 Algorithm

i+1    i    i-1    Booth encoding
 0     0     0          0
 0     0     1         +1
 0     1     0         +1
 0     1     1         +2
 1     0     0         -2
 1     0     1         -1
 1     1     0         -1
 1     1     1          0
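
The recoding in Table 1 can be sketched in a few lines of Python. This is only an illustrative model, not the Verilog used in this work; the function names `booth_radix4_digits` and `digits_to_value` are hypothetical, and the sketch assumes an even operand width with an implicit bit of 0 to the right of the least significant bit.

```python
def booth_radix4_digits(multiplier, width):
    """Recode a signed `width`-bit multiplier into radix-4 Booth digits.

    Returns digits from {-2, -1, 0, +1, +2}, least significant digit first.
    Assumption of this sketch: `width` is even.
    """
    if width <= 0 or width % 2:
        raise ValueError("width must be a positive even number")
    bits = multiplier & ((1 << width) - 1)  # two's-complement bit pattern
    padded = bits << 1                      # append the implicit bit b(-1) = 0
    digits = []
    for k in range(width // 2):
        triplet = (padded >> (2 * k)) & 0b111  # bits (i+1, i, i-1) with i = 2k
        b_hi = (triplet >> 2) & 1
        b_mid = (triplet >> 1) & 1
        b_lo = triplet & 1
        digits.append(-2 * b_hi + b_mid + b_lo)  # reproduces Table 1 row by row
    return digits


def digits_to_value(digits):
    """Rebuild the signed value from radix-4 digits (digit k has weight 4**k)."""
    return sum(d * 4 ** k for k, d in enumerate(digits))
```

An 8-bit multiplier yields four digits, which is why an 8x8 multiplier needs only four partial product rows instead of eight.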

In a multiplier based on Booth encoding (BE), the sign of the multiplicand (its MSB) together with the
associated Booth encoding value determines the correct sign of each partial product. Bewick’s sign
extension (SE) approach is used here to provide the proper sign for a partial product.
Table 2. Sign Extension

Booth Encoding    MSB of Multiplicand    Sign Extension
      0                   0                    0
      0                   1                    0
      1                   0                    0
      1                   1                    1
      2                   0                    0
      2                   1                    1
     -2                   0                    1
     -2                   1                    0
     -1                   0                    1
     -1                   1                    0
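
Table 2 follows a simple rule, sketched below in Python for illustration. The function name `sign_extension_bit` is hypothetical, and the rule is read directly off the table rows: a zero digit gives 0, a positive digit passes the multiplicand MSB through, and a negative digit inverts it.

```python
def sign_extension_bit(booth_digit, multiplicand_msb):
    """Return the sign extension bit listed in Table 2.

    booth_digit      -- Booth encoding value, one of -2, -1, 0, 1, 2
    multiplicand_msb -- sign bit of the multiplicand, 0 or 1
    """
    if booth_digit not in (-2, -1, 0, 1, 2):
        raise ValueError("booth_digit must be in {-2, -1, 0, 1, 2}")
    if multiplicand_msb not in (0, 1):
        raise ValueError("multiplicand_msb must be 0 or 1")
    if booth_digit == 0:
        return 0                     # zero partial product: no sign to extend
    if booth_digit > 0:
        return multiplicand_msb      # +1 or +2 keeps the multiplicand's sign
    return 1 - multiplicand_msb      # -1 or -2 inverts the multiplicand's sign
```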
1.2 Concept of Ladner-Fischer Adder

Fig. 1. Ladner-Fischer Adder [Website URL: https://renaysha.me/ladner-fischer-adder-63/]

The Ladner-Fischer parallel prefix adder is used to perform the addition operation. It has a tree structure
and is intended for high-performance addition. The adder is made up of black and grey cells. Every black
cell contains two AND gates and one OR gate. Every grey cell contains one AND gate and one OR gate, since
it produces only a generate signal.
It consists of three stages: pre-processing, carry generation and post-processing.
Pre-Processing stage: In this stage, the generate and propagate signals are computed from every pair of
input bits. The propagate signal is the XOR of the two input bits, whereas the generate signal is their AND.
Carry generation stage: In this stage, the carry for every bit position is produced. Group propagate and
group generate signals are formed and passed on to the subsequent levels of the tree, and the last cell in
each bit position delivers the carry for that position. Because the carries are computed in parallel, all
sum bits can be produced concurrently.
Post-Processing stage: In the last stage of the Ladner-Fischer adder, the carry into each bit position is
XORed with the propagate signal of that position, and the result is delivered as the sum bit.
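
The three stages can be modelled bit by bit in Python. This is a behavioural sketch under stated assumptions, not the synthesised Verilog design: it uses the minimum-depth prefix graph of the Ladner-Fischer family (the structure often drawn as a Sklansky tree), it computes both group signals in every cell rather than distinguishing black and grey cells, and the name `ladner_fischer_add` and the way the carry-in is folded into bit 0 are choices made here for illustration.

```python
def ladner_fischer_add(a, b, width, carry_in=0):
    """Add two unsigned `width`-bit integers; return (sum, carry_out)."""
    if width <= 0:
        raise ValueError("width must be positive")
    mask = (1 << width) - 1
    a &= mask
    b &= mask
    cin = carry_in & 1

    # Pre-processing: per-bit propagate (XOR) and generate (AND).
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(width)]
    g = [((a >> i) & (b >> i)) & 1 for i in range(width)]

    # Group signals, refined level by level. Folding the carry-in into
    # bit 0 is an assumption of this sketch.
    grp_g = g[:]
    grp_p = p[:]
    grp_g[0] = g[0] | (p[0] & cin)

    # Carry generation: log2(width) levels (rounded up). At each level,
    # every position in the upper half of a block combines with the top
    # position of the lower half of that block.
    span = 1
    while span < width:
        for i in range(width):
            if i & span:
                j = (i & ~(span - 1)) - 1
                grp_g[i] = grp_g[i] | (grp_p[i] & grp_g[j])
                grp_p[i] = grp_p[i] & grp_p[j]
        span <<= 1

    # Post-processing: sum bit = propagate XOR carry into that position.
    carries = [cin] + grp_g[:-1]
    total = 0
    for i in range(width):
        total |= (p[i] ^ carries[i]) << i
    return total, grp_g[width - 1]
```

For example, `ladner_fischer_add(7, 1, 4)` returns `(8, 0)`. In hardware, the cells whose group propagate output is never consumed are the ones drawn as grey cells.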

2 Literature Survey

Semeen Rehman, Salim Ullah, Muhammad Shafique, Akash Kumar have proposed high-performance
exact and approximate multipliers for FPGA-based hardware boosters. Multiplication is a common arithmetic
operation in a variety of applications, including machine learning and image/video processing. FPGA
vendors provide high-performance multipliers as DSP blocks. These multipliers can cause additional routing
delays, can be inefficient for multiplications of smaller bit widths, are limited in quantity and have
fixed placements on FPGAs. As a result, FPGA vendors now also provide soft IP cores that are designed for
multiplication. Furthermore, for complicated applications with competing FPGA resource requirements,
manually optimising the allocation of the necessary FPGA resources to improve performance may be possible.
D. Kalaiyarasi, M. Saraswathi have proposed an improved high-speed Radix-4 Booth multiplier
for signed and unsigned numbers. A generic add-and-shift operation may be used to perform multiplication,
in which each multiplier bit generates a multiple of the multiplicand that must be added to the partial
product. Multipliers are also used in the discrete Fourier transform, correlation, and range measurement.
Given that multiplication is a very slow process, the performance of a digital system is typically judged
by the number of multipliers it uses. Multiplication by the add-and-shift algorithm is simply a sequence of
repeated additions. To put it another way, the multiplicand is added to itself several times. Reducing
the number of partial products reduces the number of additions, which enhances performance.
Jie Han, Honglan Jiang, Fabrizio Lombardi, Fei Qiao have proposed Radix-8 Booth multipliers with
low power and high performance. Multipliers are more complicated than subtractors and adders, and
their speed generally sets the operating speed of the DSP system. Alongside other factors such as system
latency, hardware complexity, and power consumption, high accuracy is often seen as a stringent criterion.
The Booth multiplier is frequently used for high-performance signed multiplication because its encoding
reduces the number of partial products. The Radix-4 Booth multiplier is highly efficient owing to the
simplicity of generating its partial products, whereas the Radix-8 Booth multiplier is slow because
generating the odd multiples of the multiplicand is difficult.
Haroon Waris, Weiqiang Liu, Chenghua Wang have developed hybrid low radix encoding-based
approximate Booth multipliers. In recent years, the design of energy-efficient embedded systems has
become increasingly important, because a wide variety of applications require custom hardware
with lower power consumption. At the same time, the amount of data that these hardware units
must handle has expanded dramatically, making it increasingly difficult to achieve both criteria. To resolve
this difficulty, approximate computing has emerged as a viable option. In general, approximate circuits
refer to approximate designs of arithmetic circuits such as multipliers and adders. A Booth multiplier with
a truncation technique provides a large hardware gain but has a large error; consequently, solutions with
error compensation modules have also been presented. Radix-4 Booth encoding is commonly used to build
power-efficient and small-area signed multipliers because it simplifies the generation of partial products.
Pakkiraiah Chakali, Madhu Kumar Patnala have designed a carry select adder based on a high-speed
Ladner-Fischer adder. A variety of addition algorithms are available, ranging from simple Ripple Carry
Adders to the more complicated Carry Lookahead Adder (CLA). Multiplication, addition, and accumulation are
the three basic operations in any digital signal processing system. Addition is a necessary operation in
every digital, DSP, or control system. As a result, the performance of the adders determines how quickly
and accurately a digital system operates, and enhancing adder performance is a key research topic in VLSI
system design. Many adder architectures have been developed and proposed during the last decade to
accelerate binary addition.
L. P. Deepthi Bollepalli, Chris D. Martinez, David H. K. Hoe have proposed a parallel prefix adder with
fault tolerance for FPGA and VLSI design. Circuits used in nanoscale technologies require a fault-tolerant
system in particular because the small device dimensions make the circuit vulnerable to outside interference,
such as cosmic radiation. Future technologies will therefore prioritise a circuit's ability to identify and
address problems. Optimising the adder design is a current research subject since it commonly determines
the critical path across many digital circuit systems, such as digital signal processors and processor data
pipelines.

3 Methodology

Both the existing and the proposed designs follow the same methodology, except that in the partial product
accumulation stage the proposed design uses a Ladner-Fischer adder rather than a ripple carry adder (RCA).
The RCA is slower because each full adder must wait for the preceding full adder to generate its output
carry before it can produce its own result. Because the Ladner-Fischer PPA computes the carries in
parallel, the proposed design performs better in terms of latency.
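
The latency argument can be made concrete by counting how many cells a carry passes through in the worst case. The Python sketch below is a rough logic-level count only, assuming one full adder per bit for the RCA and a minimum-depth prefix tree for the Ladner-Fischer adder. The function names are hypothetical, and the numbers are not FPGA delays, which also depend on routing and on dedicated carry chains.

```python
def rca_carry_stages(width):
    """Worst case: the carry ripples through every full adder in turn."""
    if width <= 0:
        raise ValueError("width must be positive")
    return width


def prefix_tree_levels(width):
    """Levels of a minimum-depth prefix tree: log2(width), rounded up."""
    if width <= 0:
        raise ValueError("width must be positive")
    return (width - 1).bit_length()


for n in (8, 16, 32):
    print(n, rca_carry_stages(n), prefix_tree_levels(n))
```

For 8, 16 and 32 bits, the ripple carry count grows linearly (8, 16, 32 stages), while the prefix tree needs only 3, 4 and 5 levels.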
3.1 Generation of Accurate Signed Partial Products

Fig. 2. LUT configuration: (a) Type-A (b) Type-B (c) Type-C

The Type-A LUT configuration is used to carry out Booth encoding, as shown in Fig. 2(a). Its five inputs are
the multiplicand bits a_n and a_(n-1) and the multiplier bits b_(m+1), b_m, and b_(m-1). The LUT's internal
implementation includes three multiplexers. Depending on the BE value, the first MUX selects whether a_(n-1)
or a_n is passed on for partial product generation. The second MUX, which is controlled by the ‘c’ signal,
inverts the output of the first MUX. Finally, depending on the value of the z signal, the third multiplexer
can force the partial product to zero. The result is passed as the carry propagate signal "p_out" to the
associated carry chain. The carry generate signal "g_out" is produced from input a_n.
As shown in Fig. 2(b) and (c), Bewick's sign extension approach is applied in each row of partial
products. The five inputs are the multiplicand's a_n (the MSB of the multiplicand), p_in, and the
multiplier bits b_(m+1), b_m, and b_(m-1). For the first partial product row, the p_in input is fixed at
"1", while it remains at "0" for the subsequent rows. The LUT computes the complemented sign extension
signal (SE with an overbar), XORs it with p_in, and sends the result as the carry propagate signal to the
appropriate carry chain. The carry generate signal "g_out" is provided directly by the p_in signal. LUT
Type-C is used to pass the correct sign information from one partial product row to the next.
Fig. 3 (a) depicts the first partial product row of an 8x8 multiplier built from LUTs of Type A, B, and C.
The required input carry is computed by the rightmost Type-A LUT in every partial product row. This carry
input is needed to represent the partial product in 2's complement format. Four partial product rows are
created for an 8x8 multiplier. The last partial product row does not require a Type-C LUT.
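
At a behavioural level, the partial product rows can be sketched in Python as follows. This models only the arithmetic (each Booth digit times the multiplicand, shifted by two bits per row); it does not model the Type-A, Type-B and Type-C LUT mapping or the carry chains described above, and the function names are hypothetical.

```python
def booth_partial_products(multiplicand, multiplier, width):
    """Return the signed radix-4 Booth partial product rows, already shifted.

    Both operands are signed `width`-bit values; `width` is assumed even.
    """
    if width <= 0 or width % 2:
        raise ValueError("width must be a positive even number")
    padded = (multiplier & ((1 << width) - 1)) << 1  # implicit b(-1) = 0
    rows = []
    for k in range(width // 2):
        triplet = (padded >> (2 * k)) & 0b111
        digit = -2 * ((triplet >> 2) & 1) + ((triplet >> 1) & 1) + (triplet & 1)
        rows.append((digit * multiplicand) << (2 * k))  # row k has weight 4**k
    return rows


def booth_multiply(multiplicand, multiplier, width):
    """Accumulate the partial product rows into the final signed product."""
    return sum(booth_partial_products(multiplicand, multiplier, width))
```

For `width=8` the first function returns four rows, matching the four partial product rows of the 8x8 multiplier.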

Fig. 3. First row partial product for an 8x8 multiplier: (a) First version multiplier (b) Optimized version

3.2 Optimization of critical path delay


In an NxM multiplier, the carry chain in every partial product row is N+4 bits long. It can be shortened to
N+1 bits to reduce the multiplier's critical path delay. Fig. 3 (b) depicts the multiplier design with
reduced critical path delay. The pp(x, 0) and pp(x, 1) partial product terms in every partial product row
require 1 and 2 bits of the multiplicand, respectively. A single 6-input LUT "A1" can implement these two
partial product terms. Similarly, in each partial product row, pp(x, 2) can be computed independently using
another 6-input LUT "A2". The proper input carry for each partial product row can be calculated using a
separate 6-input LUT called "CG".
Fig. 4. LUT’s configuration types: (a) LUT A1 (b) LUT A2/CG

Fig. 4 illustrates the internal arrangements of the A1, A2, and CG LUTs. The output signals pp(x, 2) and
cg_out are the only differences between LUT A2 and LUT CG: the pp(x, 2) signal is produced only by LUT A2,
whereas the cg_out signal is produced only by LUT CG. For an NxM multiplier, the number of LUTs needed to
produce the partial products is
$$(N + 3) \times \left\lceil \frac{M}{2} \right\rceil$$
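
As a quick check of this expression, the Python sketch below evaluates it for the three operand sizes considered in this work. It assumes the bracket in the formula denotes rounding M/2 up to the next integer, which makes no difference for even M; the function name `partial_product_lut_count` is hypothetical.

```python
def partial_product_lut_count(n, m):
    """LUTs needed for partial product generation in an NxM multiplier.

    Evaluates (N + 3) * ceil(M / 2); reading the bracket as a ceiling
    is an assumption of this sketch.
    """
    if n <= 0 or m <= 0:
        raise ValueError("operand widths must be positive")
    return (n + 3) * ((m + 1) // 2)


for size in (8, 16, 32):
    print(size, partial_product_lut_count(size, size))
```

For 8x8, 16x16 and 32x32 multipliers this gives 44, 152 and 560 LUTs for partial product generation.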

4 Existing Design

Fig. 5. Ripple Carry Adder-based existing signed multiplier

Fig. 5 shows the signed multiplier that uses a Ripple Carry Adder (RCA) in the accumulation stage of the
generated partial products. The multiplexer outputs, obtained using radix-4 Booth encoding, enter the
partial product generation stage, and the generated partial products are then accumulated by the ripple
carry adder, which produces the final product. These stages are implemented using 6-input LUTs of different
types and the associated carry chains. This design therefore computes all partial products in parallel and
adds the resulting partial products using the ripple carry adder.

4.1 Proposed Design

Fig. 6. Proposed signed multiplier using Ladner-Fischer Adder

Fig. 6 shows the signed multiplier using the Ladner-Fischer adder. As in the existing design, the
multiplexer outputs obtained using the Radix-4 Booth technique enter the partial product generation stage,
where all the partial products are computed in parallel. The generated partial products are then
accumulated using a parallel prefix adder, the Ladner-Fischer adder, which gives better performance in
terms of delay. In the existing design, by contrast, the RCA is slower because each full adder must wait
for the previous full adder to generate its output carry.
Generated Partial Products Accumulation
Ternary adders, binary adders and 4:2 compressors can be used to reduce the generated partial products
in order to calculate the final result. A 4:2 compressor reduces four partial product rows to two output
rows. Compared with binary adders, ternary adders have longer critical path delays. In the existing work,
binary adders and 4:2 compressors are employed to reduce the generated partial products. To further
improve the performance in terms of delay, the binary adders are replaced by the Ladner-Fischer adder in
the proposed design.
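
The row reduction performed by a 4:2 compressor can be illustrated with a word-level Python sketch. It builds the compressor from two cascaded 3:2 carry-save stages, which is one common construction and not necessarily the LUT-level compressor used in the designs discussed here. The function names are hypothetical, and the rows are treated as non-negative integers.

```python
def carry_save_3_2(x, y, z):
    """Reduce three rows to a sum row and a carry row (x + y + z == s + c)."""
    s = x ^ y ^ z                               # bitwise sum without carries
    c = ((x & y) | (x & z) | (y & z)) << 1      # majority bits, shifted left
    return s, c


def compress_4_2(r0, r1, r2, r3):
    """Reduce four rows to two rows whose sum equals r0 + r1 + r2 + r3."""
    s1, c1 = carry_save_3_2(r0, r1, r2)
    return carry_save_3_2(s1, c1, r3)
```

The two output rows still have to be added by a carry-propagating adder, which is the step where the Ladner-Fischer adder replaces the binary adder.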

4 Results and Discussion

In this section, the simulation results of the existing and proposed signed multiplier structures are
discussed.

4.1 Simulation results of existing design

Fig. 7. Simulation results of existing 8-bit signed multiplier using RCA

Fig. 7 shows the simulation results of the existing 8-bit signed multiplier. It multiplies two 8-bit
inputs and produces a 16-bit result.
Fig. 8. Simulation results of existing 16-bit signed multiplier using RCA

Fig. 8 shows the simulation results of the existing 16-bit signed multiplier. It multiplies two 16-bit
inputs and produces a 32-bit result.

Fig. 9. Simulation results of existing 32-bit signed multiplier using RCA

Fig. 9 shows the simulation results of the existing 32-bit signed multiplier. It multiplies two 32-bit
inputs and produces a 64-bit result.
4.2 Simulation results of proposed design

Fig. 10. Simulation results of proposed 8-bit signed multiplier using Ladner-Fischer Adder

Fig. 10 shows the simulation results of the proposed 8-bit signed multiplier, implemented using the
Ladner-Fischer adder. It multiplies two 8-bit inputs and produces a 16-bit result.

Fig. 11. Simulation results of proposed 16-bit signed multiplier using Ladner-Fischer Adder

Fig. 11 shows the simulation results of the proposed 16-bit signed multiplier, implemented using the
Ladner-Fischer adder. It multiplies two 16-bit inputs and produces a 32-bit result.
Fig. 12. Simulation results of proposed 32-bit signed multiplier using Ladner-Fischer Adder

Fig. 12 shows the simulation results of the proposed 32-bit signed multiplier, implemented using the
Ladner-Fischer adder. It multiplies two 32-bit inputs and produces a 64-bit result.
4.3 Comparison table of existing and proposed multipliers
Table 3. Comparison Table

Parameters          Existing signed multiplier              Proposed signed multiplier
                    8-bit       16-bit      32-bit          8-bit       16-bit      32-bit
Delay               2.384 ns    8.003 ns    6.170 ns        1.622 ns    7.952 ns    4.486 ns
Power               82 mW       27 mW       14 mW           100 mW      11 mW       14 mW
Frequency           419.5 MHz   124.9 MHz   162.07 MHz      616.67 MHz  125.75 MHz  222.936 MHz
Number of slices    156         576         2316            173         585         2000
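
The delay and frequency rows are related: each reported frequency is approximately the reciprocal of the corresponding delay. The Python sketch below uses only the delay values from the comparison table to recompute the frequencies and the relative delay reduction. The helper names are hypothetical, and the derived percentages are simple arithmetic on the reported values rather than additional measurements.

```python
# Delay values in nanoseconds, copied from the comparison table.
DELAY_NS = {
    "existing": {8: 2.384, 16: 8.003, 32: 6.170},
    "proposed": {8: 1.622, 16: 7.952, 32: 4.486},
}


def max_frequency_mhz(delay_ns):
    """Maximum frequency in MHz for a critical path delay in ns."""
    return 1000.0 / delay_ns


def delay_reduction_percent(existing_ns, proposed_ns):
    """Relative delay reduction of the proposed design, in percent."""
    return 100.0 * (existing_ns - proposed_ns) / existing_ns


for bits in (8, 16, 32):
    old = DELAY_NS["existing"][bits]
    new = DELAY_NS["proposed"][bits]
    print(bits, round(max_frequency_mhz(new), 2),
          round(delay_reduction_percent(old, new), 2))
```

The reduction is positive for all three operand sizes, and it is smallest for the 16-bit multiplier, where the two delays are nearly equal.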

5 Conclusion

In this work, 8-, 16- and 32-bit signed multiplier designs are implemented using version 14.7 of the
Xilinx ISE Design Suite. The existing signed multiplier computes the partial products in parallel and adds
them using a ripple carry adder in the partial product accumulation stage, which is a drawback in terms of
delay because each full adder must wait for the output carry of the previous full adder. The proposed work
shows better performance in terms of delay because the ripple carry adder is replaced by a parallel prefix
adder, the Ladner-Fischer adder, in the accumulation stage. From the results, it is observed that the delay
is reduced when compared with the existing design.

References

1. S. Ullah, et al, “Area-optimized low-latency approximate multipliers for FPGA-based hardware
accelerators,” in DAC (IEEE, San Francisco, CA, USA, 2018).
2. I. Kuon, J. Rose, “Measuring the gap between FPGAs and ASICs,” in IEEE TCADICS (IEEE, 2007).
3. Ravindra P Rajput, M. N Shanmukha Swamy, “High speed Modified Booth Encoder multiplier for
signed and unsigned numbers,” in International Conference on Computer Modelling and Simulation
(IEEE, Cambridge, UK, 2012).
4. Yamini devi Ykuntam, Katta Pavani, Krishna Saladi, “Design and analysis of High-speed Wallace tree
multiplier using parallel prefix adders for VLSI circuit designs,” in ICCCNT (IEEE, Kharagpur, India,
2020).
5. G.C. Cardarilli, S. Pontarelli, M. Re, A. Salsano, “On the use of Signed Digit Arithmetic for the new 6-
inputs LUT based FPGAs,” in ICECS (IEEE, Saint Julian’s, Malta, 2008).
6. Shilpa K. C and Shwetha M, Geetha B. C, Lohitha D. M, Navya and Pramod N. V, “Performance
Analysis of Parallel Prefix Adder for data path VLSI design,” in ICICCT (IEEE, Coimbatore, India,
2018).
7. Aung Myo San, Alexey N. Yakunin, “Reducing the Hardware Complexity of a Parallel Prefix Adder,”
in IEEE Conference of Russian Young Researchers in Electrical and Electronic Engineering (IEEE,
Moscow and St. Petersburg, Russia, 2018).
8. David H. K. Hoe, Chris Martinez and Sri Jyothsna Vundavalli, “Design and Characterization of Parallel
Prefix Adders using FPGAs,” in SSST (IEEE, Auburn, AL, USA, 2011).
9. Chris D. Martinez, L. P. Deepthi Bollepalli, and David H. K. Hoe, “A Fault Tolerant Parallel Prefix
Adder for VLSI and FPGA Design,” in SSST (IEEE, Jacksonville, FL, USA, 2012).
10. V. Gupta, et al, “Low-Power Digital Signal Processing Using Approximate Adders,” in IEEE
Transactions on CAD of Integrated Circuits and Systems, vol. 32, pp. 124-137, 2013.
11. Jamal, K., & Srihari, P., “Analysis of test sequence generators for built-in-self-test implementation,” in
IEEE 2015 International Conference on Advanced Computing and Communication Systems, pp. 1-4,
2015.
12. Honglan Jiang, Jie Han, Fei Qiao, Fabrizio Lombardi, “Approximate Radix-8 Booth Multipliers for Low
Power and High-Performance operation,” in IEEE Transactions on Computers, vol. 65, pp. 2638-2644,
2015.
13. Jamal, K., & Srihari, P., “Low power TPC using BSLFSR,” in International Journal of Engineering and
Technology (IJET), 8(2), 759-e, 2016.
14. Jamal, K., Srihari, P., & Kanakasri, G., “Test Vector Generation using Genetic Algorithm for Fault
Tolerant Systems,” in International Journal of Control Theory and Applications (IJCTA), 9(12), pp.
5591-5598, 2016.
15. A. Kakacak, et al, “Fast multiplier generator for FPGAs with LUT based partial product generation and
column/row compression,” in Integr. VLSI J., vol. 57, pp. 147-157, 2017.
16. D. Kalaiyarasi, M. Saraswathi, “Design of an Efficient High Speed Radix-4 Booth Multiplier for both
Signed and Unsigned Numbers,” in AEEICB (IEEE, Chennai, India, 2018).
17. Jamal, K., Srihari, P., Chari, K. M., & Sabitha, B., “Low power test pattern generation using test-per-
scan technique for BIST implementation,” in ARPN Journal of Engineering and Applied Sciences, vol.
13(8), 2018.
18. Haroon Waris, Chenghua Wang, and Weiqiang Liu, “Hybrid Low Radix Encoding based Approximate
Booth Multipliers,” in IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 67, pp. 3367-
3371, 2020.
19. Salim Ullah, Hendrik Schmidl, Siva Satyendra Sahoo, Semeen Rehman and Akash Kumar, “Area-
optimized Accurate and Approximate Softcore Signed Multiplier Architectures,” in IEEE Transactions
on Computers, vol. 70, pp. 384-392, 2020.
20. Salim Ullah, Semeen Rehman, Muhammad Shafique, Akash Kumar, “High-Performance Accurate and
Approximate Multipliers for FPGA-based Hardware Accelerators,” in IEEE TCAD, vol. 41, pp. 211-
224, 2021.
