
IET Circuits, Devices & Systems

Research Article

Low-power and area efficient binary coded decimal adder design using a look up table-based field programmable gate array

ISSN 1751-858X
Received on 14th February 2015
Accepted on 3rd November 2015
doi: 10.1049/iet-cds.2015.0213
www.ietdl.org
Zarrin Tasnim Sworna 1, Mubin UlHaque 1, Nazma Tara 1, Hafiz Md. Hasan Babu 1 ✉, Ashis Kumar Biswas 2
1 Department of Computer Science and Engineering, University of Dhaka, Dhaka, Bangladesh
2 Department of Computer Science and Engineering, University of Texas at Arlington, Arlington, TX, USA
✉ E-mail: hafizbabu@hotmail.com

Abstract: The binary coded decimal (BCD) system is suitable for digital communication and can be implemented with field programmable gate array (FPGA) technology, in which the look up table (LUT) is one of the major components. In this study, the authors propose a low power and area efficient LUT-based BCD adder, constructed in three steps. First, a new technique is introduced for BCD addition to obtain the correct BCD digit. Second, a new controller circuit for the LUT is presented, which selects and sends the Read/Write voltage to a memory cell to perform the Read or Write operation. Finally, a compact BCD adder is designed using the proposed LUT. The proposed 2-input LUT outperforms the best existing one, with a 65.8% improvement in area and reductions in power consumption of 44.1% for the Read operation and 43.5% for the Write operation. The proposed FPGA-based BCD adder improves markedly on the best-known existing LUT-based BCD adder, with 65.6% less area and 48.3% less power consumption.

1 Introduction

Binary coded decimal (BCD) representation provides accurate precision and avoids infinite error representation; conversion to a character form can be done in linear [O(n)] time, and addition and subtraction do not require rounding [1]. Therefore, a faster circuit for BCD addition is of interest. Hence, a look up table (LUT)-based new BCD addition algorithm is proposed, which requires fewer field programmable gate array (FPGA) components and less area, power and delay. The given algorithm is based on the proposed pre-processing mechanism that improves the existing methods used in [2–4].
The advancement in FPGA technology has emerged as a new horizon of technology progress due to long-time availability, rapid prototyping capability, reliability and hardware parallelism. The cost of making incremental changes to FPGA designs is negligible when compared to the large expense of respinning an application-specific integrated circuit [5]. An FPGA has three main elements: LUT, flip flops and the routing matrix. First, the basic 2-input LUT is targeted as it ultimately serves the betterment of 3-input, 4-input and further larger input LUTs. Then, a 6-input LUT architecture is also shown as the BCD adder is designed using 6-input LUT.
Three main contributions are addressed in this paper:

(i) A BCD addition algorithm is proposed with the optimum time complexity, which outperforms the existing BCD addition algorithms.
(ii) A novel compact controller circuit of the proposed LUT has been introduced with the minimum area and power.
(iii) A new architecture for the LUT-based BCD adder is presented with the improvement of area, power and delay with respect to existing ones.

The organisation of this paper is as follows: in the next section, basic definitions and properties of BCD adder and LUT are given. In Section 3, the earlier approaches and their limitations are described. In Section 4, BCD addition algorithm and a new LUT architecture are proposed. Then, the construction of BCD adder circuit is illustrated. In Section 5, the simulation and performance analysis of the proposed circuits are elucidated. Finally, the paper is concluded in Section 6.

2 Basic definitions

In this section, basic definitions and ideas related to BCD addition method and LUT are presented with illustrative figures and examples. Besides, we formally define the comparison parameters such as area, power and delay along with the memory unit of LUT (memristor).

Definition 1: BCD adder uses BCD numbers, where each decimal digit is represented with a 4 bit binary code (with weights 8, 4, 2 and 1). Since a 4 bit binary code has 16 different binary combinations, the addition of two BCD digits may produce an incorrect result that exceeds the largest BCD digit $(9)_{10} = (1001)_{BCD}$. In such cases, the result must be corrected by adding $(6)_{10} = (0110)_{BCD}$ to guarantee that the result is a BCD digit. The resultant decimal carry output generated by the correction process is added to the next higher digit of the BCD addends.

Example 1: Addition of $(7)_{10} = (0111)_{BCD}$ to $(5)_{10} = (0101)_{BCD}$ results in the non-BCD digit $(12)_{10} = (1100)_{2}$. The result is corrected by adding $(0110)_{BCD}$ to $(1100)_{2}$ which becomes $(2)_{10} = (0010)_{BCD}$. The output $(2)_{10} = (0010)_{BCD}$ is the correct decimal sum of $(7)_{10}$ and $(5)_{10}$ with a carry 1.

Definition 2: An LUT consists of a block of memory cells that are indexed by the inputs. The output of the LUT is the value stored in the indexed location of the selected memory cell. Since the memory cells in the LUT can be set to anything, an n-input LUT can implement any logic function of n inputs.

Example 2: To implement an AND gate in a 2-input LUT, the contents of the memory cells are the output of AND gate and

IET Circuits Devices Syst., pp. 1–10
© The Institution of Engineering and Technology 2016
memory locations of 2-input LUT are the input combination of AND gate.

Definition 3: Two main components constituting the power used by a complementary metal–oxide semiconductor (CMOS) integrated circuit (IC) are: static power and dynamic power. Static power (Pstatic) consists of the power used when the transistor is not in the process of switching and is determined by the formula

$$P_{static} = I_{static} \times V_{dd} \tag{1}$$

where Vdd is the supply voltage and Istatic is the total current flowing through the device [6].

Dynamic power (Pdynamic) is the sum of transient power consumption (Ptransient) and capacitive load power (Pcap) consumption. Ptransient represents the amount of power consumed when the device changes logic states, i.e. ‘0’–‘1’ bit or vice versa. Capacitive load power consumption represents the power used to charge the load capacitance

$$P_{dynamic} = P_{cap} + P_{transient} \tag{2.1}$$

$$P_{transient} = (C_L + C) \times V_{dd}^{2} \times f \times N^{3} \tag{2.2}$$

where CL is the load capacitance, C is the internal capacitance of the IC, f is the frequency of operation and N is the number of bits that are switching [7].

Example 3: Suppose, we consider a half adder circuit which consists of an AND gate and an Ex-OR gate that are constituted of six and eight transistors, respectively. Using Microwind DSCH [8], threshold voltage for this circuit is found to be 0.5 V and the current passing through the transistors is 0.1 mA. Hence, the power consumed by a single transistor is (0.5 × 0.1) mW = 0.05 mW. Therefore, a half adder requires (14 × 0.05) mW = 0.7 mW of static power consumption. On the other hand, dynamic power consumption is data dependent. A 2-input Ex-OR gate dissipates 2.7–4.2 μW of dynamic power with Vdd ranging from 0.8 to 3.6 V, frequency ( f ) of 1 MHz and load capacitance (CL) of 5 pF [9]. A 2-input AND gate dissipates 17 μW of dynamic power with frequency ( f ) of 1 MHz, load capacitance (CL) of 50 pF and Vdd ranging from 3.0 to 3.6 V [10]. Therefore, total dynamic power for a half adder is (4.2 + 17) μW = 21.2 μW. We have considered static power consumption of the circuit since dynamic power consumption is much less than the static power consumption on average.

Definition 4: Delay represents the critical delay of the circuit, which considers the following two assumptions: First, each gate performs computation in unit time which means that every gate will take same amount of time for internal logic operations. Second, all the inputs are to be known to the circuit before the computation begins.

Example 4: The delay of a half adder circuit is calculated using DSCH [8], CMOS 45 nm Open Cell Library [11] and interconnect delay [12]. A half adder, which is composed of an AND gate and an Ex-OR gate, requires 0.160 ns of gate delay. Therefore, the critical path delay of a half adder is 0.160 ns.

Definition 5: The Area of a logic circuit means the total area accumulated by the individual circuit elements. CMOS 45 nm Open Cell Library [11] can be used to calculate the area of each individual element. Suppose, a circuit consists of n gates and the areas of those n gates are A1, A2, …, An. Then by using the above definition, area (A) of that circuit is given as follows

$$A = \sum_{i=1}^{n} A_i \tag{3}$$

Example 5: Using CMOS 45 nm Open Cell Library [11], the area of a half adder is 2.36 μm² as the area of an AND gate is 1.06 μm² and that of an Ex-OR gate is 1.30 μm². Therefore, the total area becomes (1.06 + 1.30) μm² = 2.36 μm².

Definition 6: Memristor is a contraction for memory resistor. It was first proposed by Leon Chua (1971) [13] and first produced by Hewlett Packard (HP) Labs (2008) [14]. A physical memristor consists of a two-terminal device whose resistance material is titanium dioxide (TiO2). When the voltage is turned off, the resistance remains as it did just before it was turned off, which makes it a non-linear and non-volatile memory device. The cross-section of a memristor cell and the symbol are shown in Fig. 1.

Fig. 1 Memristor
a Cross-section
b Symbol

The memristor was originally defined in terms of a non-linear functional relationship between magnetic flux linkage Φm(t) and the amount of electric charge that has flowed, q(t)

$$f\big(\Phi_m(t), q(t)\big) = 0 \tag{4}$$

Each memristor is characterised by its memristance function describing the charge-dependent rate of change of flux with charge. Substituting the flux by the time integral of the voltage, and charge by the time integral of current, the more convenient form is

$$M\big(q(t)\big) = \frac{d\Phi_m/dt}{dq/dt} = \frac{V(t)}{I(t)} \tag{5}$$

Definition 7: Memristance normalised state parameter (NSP) is a property of an electronic component. If charge flows in one direction through a circuit, the resistance increases and if charge flows in the opposite direction, the resistance decreases.

3 Design analysis of existing techniques

In this section, different types of the latest existing BCD adders [2–4] and the recent related works of construction of 2-input LUT [15–18] are presented.
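As a baseline for the comparisons in this section, the conventional correction of Definition 1 (add 6 when the binary sum exceeds 9) can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `bcd_digit_add_conventional` and its argument names are assumptions made here and do not come from the cited designs.

```python
def bcd_digit_add_conventional(a: int, b: int, carry_in: int = 0) -> tuple[int, int]:
    """Add two BCD digits with the add-6 correction of Definition 1.

    Returns (carry_out, sum_digit). Hypothetical helper for illustration.
    """
    if not (0 <= a <= 9 and 0 <= b <= 9 and carry_in in (0, 1)):
        raise ValueError("operands must be BCD digits and carry_in must be 0 or 1")
    raw = a + b + carry_in      # plain binary addition, may exceed 9
    if raw > 9:
        raw += 0b0110           # correction step: add (0110) = 6
    # bit 4 is the decimal carry, the low four bits are the BCD digit
    return raw >> 4, raw & 0b1111


# Example 1 of the text: 7 + 5 gives digit 2 with carry 1
carry, digit = bcd_digit_add_conventional(7, 5)
```

The correction is applied after the binary sum is known, which is the post-processing step that the designs discussed below try to shorten or avoid.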

3.1 Existing BCD adders using LUTs

BCD adder with different architectures of carry chain is introduced in [4] to minimise the delay which is implemented on Virtex-4 FPGA device, whereas Vázquez and co-workers [3] used the 6-input LUTs to add the three most significant bits (MSBs) of the addends and made the correction of the output by the addition of 3 to the sum of the three MSBs of the addends when the sum is ≥4. A post-correction is required by replacing the three MSBs (111) with (100) for correct output. Martín et al. [4] have proposed two versions of adder using Virtex 5/6 input/dual output LUTs and achieved improvement over conventional binary adders and decimal adders in terms of delay and area. However, in the carry-chain structure studied in paper [4], the propagate (P) and generate (G) functions are more complex and therefore more time and area consuming. de Dinechin and Vázquez [3] had developed a new decimal carry propagate addition algorithm which is more suitable for FPGAs. Although the algorithm makes a full use of the fast carry chain and slice logic of a state-of-the-art FPGA device, the post-correction mechanism requires additional circuits which consume more area. Gao et al. [2] proposed a method for BCD addition where the addition of the first bit of the two operands is implemented using a full adder; and the addition of the three MSBs is performed using 6-input LUTs, where the correction is also ensured by adding 3 if the sum is ≥5. However, if carry from the full adder is 1 and the sum of the three MSBs of the operands is equal to 4, then extra circuitry is required outside of the LUT, where two bits are set to zero, and this creates extra circuit overhead for the correction which can be omitted as shown in our proposed method. In [2], the critical path of the adder has been minimised by bypassing two multiplexers from incoming carry to outgoing carry in the 1-digit BCD adder. Their resulting implementation yields improvements in terms of delay reduction and number of LUTs used over [3].
Almurib Haider et al. [15] proposed a new READ and WRITE operation in a memristor-based LUT. They proposed a memory block for an LUT of an FPGA using memristors as storage elements and N-type MOS transistors for column selection. Different from previous methods [16–18], their proposed WRITE operation does not require extra circuits to monitor the incoming data and then propagate it to the corresponding line for the WRITE operation and thus requires three power lines [+Vdd, −Vdd and ground (GND)], whereas the methods [16–18] require four power lines. It has attained a significant improvement over the existing static random-access memory (SRAM) and contemporary memristor technologies [16–18] in terms of energy dissipation and delay reduction, but has suffered a huge drawback in terms of area which consumed almost 50% of the total area as compared with control logic block of FPGA due to a complex controller circuit.

4 Proposed design of BCD adder using LUTs

In this section, a BCD addition algorithm and an LUT architecture are proposed. Then, a new BCD adder circuit is constructed. Essential figures and lemmas are presented to clarify the proposed ideas.

4.1 Proposed BCD addition method

The main problem in BCD addition is the need for correction, if the result exceeds the permitted BCD range (decimal number 9). The correction actually adds the binary number $(0110)_{2}$ to the result. This correction logic incurs high delay and an extra level of circuitry. Therefore, our contribution in this work is to design a new area efficient and high-speed BCD adder that can be employed in different decimal applications.
Let A and B be the two addends of a 1-digit BCD adder, where binary representations of A and B are A3A2A1A0 and B3B2B1B0, respectively. The adder’s output will be a 5 bit binary number CoutS3S2S1S0, where Cout represents the position of tens digit and S3S2S1S0 symbolises unit digit of BCD sum. Instead of post-processing, our new approach deals with pre-processing. The least significant bit (LSB) of both addends is added first which produces the LSB of sum output, i.e. S0 and an intermediate carry, C1. Then, the carry C1 is added with B3B2B1 to produce b3b2b1. Finally, A3A2A1 and b3b2b1 are fed to the FPGA’s LUT as inputs. In Table 1, the truth table is designed with the three MSBs of each input of the LUT, A and B. The addition of carry C1 with B3B2B1, which produces b3b2b1, is remarked as pre-processing. A numeric 3 [$(011)_{2}$] is added, if the sum of A and b is ≥5. Mathematically

$$C_{out} S_3 S_2 S_1 S_0 = (A + b + 3) \quad \text{if } A + b \ge 5, \text{ where } b = B_3 B_2 B_1 + C_1 \tag{6}$$

Table 1 Truth table of 3 bit addition with pre-processing and addition of 3

| A3 A2 A1 | B3 B2 B1 | C1 | Cout | S3 S2 S1 | Remark |
|---|---|---|---|---|---|
| 0 0 0 | 0 0 0 | 0 | 0 | 0 0 0 | pre-processing |
| 0 0 0 | 0 0 1 | 0 | 0 | 0 0 1 | pre-processing |
| 0 0 0 | 0 1 0 | 0 | 0 | 0 1 0 | pre-processing |
| 0 0 0 | 0 1 1 | 0 | 0 | 0 1 1 | pre-processing |
| 0 0 0 | 1 0 0 | 0 | 0 | 1 0 0 | pre-processing |
| ⋮ | ⋮ | | | ⋮ | |
| 0 0 0 | 0 0 0 | 1 | 0 | 0 0 1 | pre-processing |
| 0 0 0 | 0 0 1 | 1 | 0 | 0 1 0 | pre-processing |
| ⋮ | ⋮ | | | ⋮ | |
| 1 0 0 | 0 0 0 | 1 | 1 | 0 0 0 | pre-processing + add 3 |
| 1 0 0 | 0 0 1 | 1 | 1 | 0 0 1 | pre-processing + add 3 |
| 1 0 0 | 0 1 0 | 1 | 1 | 0 1 0 | pre-processing + add 3 |
| 1 0 0 | 0 1 1 | 1 | 1 | 0 1 1 | pre-processing + add 3 |
| 1 0 0 | 1 0 0 | 1 | 1 | 1 0 0 | pre-processing + add 3 |

Fig. 2 Demonstration of the proposed BCD addition algorithm exhibited in Example 6.

Example 6: Suppose, the decimal values of the two variables A and B are 9 and 5, respectively. For the BCD addition of these operands, the BCD representation of them is taken. First, we add the LSB of A and B with the carry (Cin) from the previous BCD digit addition. If it is the first digit addition of BCD addends, then the value of Cin is zero. The obtained sum (S0) is the first bit of the output and the carry is added to the three MSBs of B providing a value of b which is 011.

Afterwards, we add the three MSBs of A and b. As the result is 111 which is >5, 3 is added with the sum to obtain the correct result. The resultant sum consisting of 4 bits represents the first digit of the BCD value which is 4 in this case and the carry is the next resultant digit

which is 1. The demonstration is provided in Fig. 2. The algorithm of the BCD addition method with pre-processing technique is given in Algorithm 4.1 (Fig. 3). Next, we propose an LUT to present an LUT-based BCD adder.

Fig. 3 Algorithm of proposed BCD addition

4.2 Proposed architecture of an LUT

An LUT consists of two basic parts: (i) controller circuit and (ii) memory unit. Memristor is considered as memory unit due to its non-volatility. Besides being non-volatile, compared with other memories such as SRAM [15], dynamic random access memory (DRAM), ferroelectric random access memory (FeRAM), magneto resistive random access memory (MRAM) [19], nano random access memory (NRAM) [20], conductive bridging random access memory (CBRAM) [21] and phase change random access memory (PCRAM) [22], the memristor provides more area and power efficiency [23]. In this paper, the controller part of the circuit is proposed in a compact way with an optimal number of gates, which includes the selection of a memristor cell along with the Read/Write voltage passing to the cell. The change of the internal memristance with the applied voltages for the corresponding Read/Write/Reprogram operation follows paper [15].
The Write voltage is considered as Data and the Read voltage is considered as Read Pulse. As only one memristor will be selected at a time, only one Write voltage (Data) is considered. It is never

Fig. 4 Architecture of proposed LUTs


a Proposed 2-Input LUT
b Proposed 6-Input LUT

Fig. 5 Algorithm for the construction of proposed 2-input LUT

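Table 2 Read and Write scheme using the proposed approach

Column groups: Method (Operation, Ren, Wen, Rpulse); Input (A, B); Data path (Data); Memory (M00, M01, M10, M11); Output (O1, O2, Out).

| Operation | Ren | Wen | Rpulse | A | B | Data | M00 | M01 | M10 | M11 | O1 | O2 | Out |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Write | ↓ | ↑ | ↓ | 0 | 0 | √ | data | — | — | — | — | — | no output |
| Write | ↓ | ↑ | ↓ | 0 | 1 | √ | — | data | — | — | — | — | no output |
| Write | ↓ | ↑ | ↓ | 1 | 0 | √ | — | — | data | — | — | — | no output |
| Write | ↓ | ↑ | ↓ | 1 | 1 | √ | — | — | — | data | — | — | no output |
| Read | ↑ | ↓ | ↑ | 0 | 0 | √ | data | — | — | — | data | — | data |
| Read | ↑ | ↓ | ↑ | 0 | 1 | √ | — | data | — | — | data | — | data |
| Read | ↑ | ↓ | ↑ | 1 | 0 | √ | — | — | data | — | — | data | data |
| Read | ↑ | ↓ | ↑ | 1 | 1 | √ | — | — | — | data | — | data | data |

Note: ‘—’, not selected; ‘↑’, high; ‘↓’, low; ‘√’, data present on data path.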
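The Read/Write scheme of Table 2 can be mimicked with a small behavioural model. The Python sketch below is an illustration only, not the authors' circuit: the class name `TwoInputLUT`, the method `access` and the decision to treat "both enables high" or "both enables low" as "no operation" are assumptions made here, and memristor states are abstracted to the stored bits 0 and 1.

```python
from typing import Optional


class TwoInputLUT:
    """Behavioural stand-in for a 2-input LUT with four 1-bit cells M00..M11."""

    def __init__(self) -> None:
        # cells[(A, B)] holds the bit stored in memory cell M_AB
        self.cells = {(a, b): 0 for a in (0, 1) for b in (0, 1)}

    def access(self, a: int, b: int, ren: int, wen: int,
               data: Optional[int] = None) -> Optional[int]:
        """Perform one operation; return Out for a Read, None otherwise."""
        if a not in (0, 1) or b not in (0, 1):
            raise ValueError("A and B must be 0 or 1")
        # Exactly one of Read Enable / Write Enable may be high (Ex-OR gating).
        # Treating every other combination as "no operation" is an assumption.
        if (ren ^ wen) != 1:
            return None
        if wen:
            if data not in (0, 1):
                raise ValueError("Write needs data equal to 0 or 1")
            self.cells[(a, b)] = data   # only the addressed cell changes
            return None                 # Table 2: no output during Write
        value = self.cells[(a, b)]
        o1 = value if a == 0 else None  # M00 and M01 share output line O1
        o2 = value if a == 1 else None  # M10 and M11 share output line O2
        return o2 if a == 1 else o1     # output MUX selected by input A


# Program the LUT as an AND gate, then read the truth table back
lut = TwoInputLUT()
for a in (0, 1):
    for b in (0, 1):
        lut.access(a, b, ren=0, wen=1, data=a & b)
and_outputs = [lut.access(a, b, ren=1, wen=0) for a in (0, 1) for b in (0, 1)]
```

Reading back after programming reproduces the AND function, which is the use case named in Example 2.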

possible to run Read and Write operations simultaneously. Therefore, the Ex-OR and corresponding AND gates are used to select only one operation at a time to avoid this ambiguity. Once one operation is selected then either the Read or the Write voltage is passed from the AND or transmission gate, respectively, to the OR gate. The output of the OR gate is connected to the left of each memristor to propagate the operational voltage either to Write 1/0 to store in the memory or to Read the corresponding memory unit when the memristor is selected.
The selection of the memory unit is performed depending on the two inputs of the LUT such as A and B, as they refer to the corresponding memory addresses of the memory cells (memristor). Considering the addresses of the memristors to be 00, 01, 10 and 11, the addresses can be represented as $\bar{A}\bar{B}$, $\bar{A}B$, $A\bar{B}$ and $AB$, respectively. A transistor is activated through input B, then input A is sent from that transistor to next transistor to activate the next one. Two transistors T9 and T10 are connected to the output lines O1 and O2, respectively. The output from M00 and M01 passes through O1 and output from M10

Fig. 6 Algorithm for construction of proposed BCD adder circuit

and M11 passes through O2. Although two memristors are connected to a single output line, only one memristor is activated at a time, so only one value passes through the line. As the output line is connected to the drain of the transistors, the passed voltage does not affect the unselected memristor, since an unactivated transistor never passes voltage from drain to source. The transistors are activated by R only when a Read operation is being performed; the output voltage of the selected memristor is then passed through the corresponding transistor to the output line. During Write operation the transistors T9 and T10 will not be activated, so no value will be passed to the output multiplexer (MUX). The proposed LUT is shown in Fig. 4a and the algorithm for the construction is given in Algorithm 4.2 (Fig. 5). Lemma 1 is also given to support the generalisation of a 2-input LUT circuit.
Reset used in existing LUT [15] is omitted as it is never possible to select all of the memory cells to reset them altogether. Since Reset is nothing but the Write 0 operation, there is no difference between these two operations when performed on a single memory cell, and omitting Reset removes hardware complexity. Besides, instead of using the conventional Wl (word line) and Bl (bit line), the direct use of the LUT input with inverter has reduced the controller circuitry overhead.
Similarly, we can design a 6-input LUT using a 2-input LUT which is shown in Fig. 4b. A 6-input LUT has $2^{6} = 64$ memory cells with a common controller circuit and two outputs Omux1 and Omux2. To make it area efficient, a three-dimensional layer structure has been used. A total of 64 memory cells have been arranged in four layers, where each layer has 16 memory cells. For a particular layer, a column is selected by inputs A and B using a selection circuit consisting of four transistors each being activated by input B (when B is 1) and sending input A as column selection. Besides, a row selection voltage is sent using the two inputs C and D in the same way. As there are four layers, for selecting a particular layer, a layer selection voltage is generated using inputs E and F. With every memristor, there are three transistors, where the first one is activated through the layer selection voltage and sends the column selection voltage to the next transistor. The second transistor, being activated, sends the row selection voltage to the next transistor, and the third transistor, being activated, finally activates the corresponding memristor. In [24], a 6-input LUT requires 100 memristors: 64 memristors are required for memory cell unit and 36 memristors are required for reference cell. On the other hand, our proposed 6-input LUT requires a total of 64 memristors and no additional memristors are required for reference cell.

4.2.1 Working mechanism of the proposed 2-input LUT: In this section, two types of operations Read and Write are described.
Write operation: For Write operation, Write Enable pulse voltage is high which is passed to the transmission gate to send the Write voltage through the gate. The two inputs A and B select the particular memory cell Mij, where i = {0,1} and j = {0,1}. The initial memristance of Mij is first considered ROFF and RON for Write 1 and Write 0 operation, respectively. Then, a pulse of +Vdd/−Vdd is applied through Vin until the memristance changes the state. Thus, a logic 1/0 is successfully written to Mij. Suppose, for writing 1 to memristor M11, B = 1 activates transistor T1 and sends A = 1 from T1 to transistor T2 which selects M11, and the Write voltage (+Vdd) is applied to the memristor until the memristance changes accordingly to 1, which consequently writes 1 to M11. Since we

Fig. 7 Proposed BCD adder


a Block diagram of 1-digit BCD adder
b 1-Digit BCD adder circuit
c Block diagram of proposed n-digit BCD adder

have omitted reset operation described in [15], to re-programme a memristor, if we want to perform Write 0 operation on a memristor already having performed Write 1 operation, we have to perform Write 0 operation on that memristor. Moreover, to perform Write 1 operation on a memristor already having performed Write 0 operation, we have to perform Write 1 operation on that memristor.
Read operation: For Read operation, the Read Enable voltage and Read Pulse (+Vdd) both are high and will be propagated to the particular memory cell M by using inputs A, B and transistors accordingly. To perform the Read 0/Read 1 operation, a positive pulse of +Vdd (Read pulse) is applied to the memristor and the Read value is found at the output terminal O1 (the memristors of address M00 and M01) or O2 (the memristors of address M11 and M10). Suppose, for reading 1 from memristor M11, B = 1 activates transistor T1 and sends A = 1 from T1 to transistor T2 which selects M11. As the Read Pulse is being applied to the memristor, the value of the memristor is propagated through transistor T2 to the transistor T10, which is activated by the R value, and so the output voltage is transmitted from the memristor through the transistor T10 to output line O2 and finally to the MUX gate. The input A being 1, selects the value of O2 to the output terminal. For Read 0 operation, assuming the NSP (memristance) of the memristor is zero, it will slightly change the NSP of the memristor towards the value of RON. To restore the NSP to its original value, a RESTORE pulse of –Vdd is applied.
The total procedure is summarised in Table 2.

Fig. 8 Comparison of the time complexity of existing and proposed BCD addition techniques

Lemma 1: An n-input LUT requires at least $2^{n-2}$ LUTs with 2-input, where n ≥ 2.

Proof: We prove the above statement by mathematical induction.
Basis: The basis case holds for n = 2 as $2^{2-2} = 1$.
Hypothesis: Assume that the statement holds for n = k. Therefore, a k-input LUT consists of $2^{k-2}$ LUTs with two inputs.
Induction: Now, we will consider n = k + 1. Therefore, a (k + 1)-input LUT requires $2^{k+1-2} = 2^{k-1}$ LUTs of two inputs.
Now, we reduce the number of inputs by one to produce n = k. Then, a k-input LUT requires $2^{k-1-1} = 2^{k-2}$ LUTs with two inputs which holds the hypothesis.
Therefore, the statement holds for n = k + 1.
Therefore, for n ≥ 2, an n-input LUT consists of $2^{n-2}$ LUTs with two inputs. □
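The count in Lemma 1 can be checked numerically: an n-input LUT has 2^n memory cells and a 2-input LUT has four, so 2^n / 4 = 2^(n−2) blocks are needed. The Python sketch below is illustrative; the function names are hypothetical. The helper `scaled_area` rests on the observation that the areas listed in Table 6 follow the same factor, which is a reading of the tabulated numbers and not a statement about how the authors computed them.

```python
def two_input_luts_needed(n: int) -> int:
    """Number of 2-input LUT blocks in an n-input LUT according to Lemma 1."""
    if n < 2:
        raise ValueError("Lemma 1 is stated for n >= 2")
    return 2 ** (n - 2)


def scaled_area(area_2_input_um2: float, n: int) -> float:
    """Area of an n-input LUT, assuming it scales with the block count."""
    return two_input_luts_needed(n) * area_2_input_um2


# A 6-input LUT: 16 blocks of four cells each, i.e. 64 memory cells
blocks = two_input_luts_needed(6)
cells = 4 * blocks
```

With the 2-input areas of Table 6 (37.68 μm² existing, 12.89 μm² proposed), `scaled_area` reproduces the 4-, 6- and 8-input entries of that table.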

Fig. 9 Simulation results of the proposed LUT and the BCD adder
a Simulation result of controller circuitry of 2-input LUT
b Simulation result of BCD adder with intermediate carry C1 = 0
c Simulation result of BCD adder with intermediate carry C1 = 1

Table 3 Comparison of number of gates, time complexity, area-delay product and power-delay product among conventional binary addition then conversion method and proposed BCD addition technique

| Methods | 6-input LUTs | 2-to-1 MUX | 2-input AND | 2-input OR | 2-input EX-OR | Time complexity | Area × delay product, μm² ns | Power × delay product, mW ns |
|---|---|---|---|---|---|---|---|---|
| existing-approach-1 [26] | 9 | 8 | — | — | 8 | O(10n) | 12,593,099.36 | 4,461,838.35 |
| existing-approach-2 [26] | 10 | 7 | — | — | 8 | O(9n) | 13,968,196.33 | 4,950,104.6 |
| proposed | 4 | — | 4 | 2 | 4 | O(5n) | 2,803,339.88 | 992,733.88 |

‘—’ Represents that the design does not require the corresponding component.

Table 4 Comparison of number of gates, power and delay among proposed pre-processing and existing post-processing techniques for BCD addition method

| Technique | Methods | 6-input LUT | 2-to-1 MUX | 2-input AND | 2-input OR | 2-input EX-OR | NOT | Power, mW | Delay, ns |
|---|---|---|---|---|---|---|---|---|---|
| post-processing | existing [2] | 4 | 3 | 4 | 1 | 5 | 2 | 38.105 | 454.32 |
| post-processing | existing [3] | 5 | 4 | — | — | 4 | — | 38.402 | 1,293.24 |
| post-processing | existing [4] | 8 | 7 | — | — | 8 | — | 54.055 | 1,294.12 |
| pre-processing | proposed | 4 | — | 4 | 2 | 4 | — | 26.71 | 337.08 |

‘—’ Represents that the design does not require the corresponding component.

Example 7: For n = 5, a 5-input LUT requires $2^{5-2} = 8$ LUTs with two inputs.

4.3 Proposed BCD adder circuit using LUTs

An LUT-based BCD adder is designed using the proposed BCD addition algorithm and LUT. An algorithm for the construction of the proposed BCD adder circuit is presented in Algorithm 4.3 (Fig. 6). According to the algorithm, the block diagram and the circuit are depicted in Fig. 7. For the addition of the least significant bit, a full adder is used; for pre-processing, two half adders and an OR gate are used. We use the 6-input LUT to add the three MSBs of the operands with the correction by adding 3 if condition (6) is satisfied. Therefore, the addition circuit improves on the existing circuits [2–4] by removing extra circuitry overhead. Using the proposed 1-digit BCD adder circuit, we can easily create an n-digit BCD adder circuit, where the Cout of a 1-digit adder circuit is sent to the next digit of the BCD adder circuit as a Cin. So, the generalised n-digit BCD adder computes sequentially using the previous carry, which is shown in Fig. 7.
The time complexity of the proposed addition method is mathematically proven in Lemma 2.

Lemma 2: An n-digit BCD adder requires at least O(5n) of time complexity, where n is the number of data bits.

Proof: We will prove the above statement by method of contradiction.
Suppose, an n-digit BCD adder does not require at least O(5n) of time complexity.
The critical path delay of our proposed n-digit BCD adder requires n full adders, 2n half adders, n OR gates and 4n LUTs with six inputs. Except the arrangement of 6-input LUTs, our proposed design has a serial architecture which has a latency of O(4n). Hence, the time complexity of the proposed BCD adder is O(5n).
This contradicts the supposition. Hence, the supposition is false and Lemma 2 is true. □

A comparison of time complexity of BCD addition techniques between the best-known existing [2] and the proposed methods is shown in Fig. 8.
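The pre-processing addition of Section 4.1 and the digit-serial structure described in this section can be sketched behaviourally in Python. The sketch below is a reading of the text (full adder for the LSB, two half adders plus an OR gate for pre-processing, a 64-entry table standing in for the 6-input LUTs), not a transcription of Algorithms 4.1 or 4.3; all function and variable names are hypothetical.

```python
from itertools import product


def half_adder(x: int, y: int) -> tuple[int, int]:
    return x ^ y, x & y                       # (sum, carry)


def full_adder(x: int, y: int, c: int) -> tuple[int, int]:
    s1, c1 = half_adder(x, y)
    s2, c2 = half_adder(s1, c)
    return s2, c1 | c2                        # (sum, carry)


def build_msb_lut() -> dict[tuple[int, int], int]:
    """Table indexed by (A3A2A1, b3b2b1) that returns the 4 bits Cout S3 S2 S1."""
    table = {}
    for a, b in product(range(8), repeat=2):
        t = a + b
        if t >= 5:                            # condition of equation (6)
            t += 3
        table[(a, b)] = t & 0b1111            # entries for non-BCD inputs are don't-cares
    return table


MSB_LUT = build_msb_lut()


def bcd_digit_add(a: int, b: int, cin: int = 0) -> tuple[int, int]:
    """1-digit BCD addition with pre-processing; returns (Cout, digit)."""
    if not (0 <= a <= 9 and 0 <= b <= 9 and cin in (0, 1)):
        raise ValueError("operands must be BCD digits and cin must be 0 or 1")
    s0, c1 = full_adder(a & 1, b & 1, cin)    # LSB sum S0 and intermediate carry C1
    bh = b >> 1                               # B3 B2 B1
    b1, k1 = half_adder(bh & 1, c1)           # pre-processing: b = B3B2B1 + C1
    b2, k2 = half_adder((bh >> 1) & 1, k1)
    b3 = ((bh >> 2) & 1) | k2                 # OR suffices for valid BCD operands
    b_pre = (b3 << 2) | (b2 << 1) | b1
    out = MSB_LUT[(a >> 1, b_pre)]            # Cout S3 S2 S1 from the LUT
    return out >> 3, ((out & 0b111) << 1) | s0


def bcd_add(a_digits: list[int], b_digits: list[int]) -> tuple[list[int], int]:
    """n-digit addition, least significant digit first; returns (digits, Cout)."""
    if len(a_digits) != len(b_digits):
        raise ValueError("operands must have the same number of digits")
    carry, result = 0, []
    for a, b in zip(a_digits, b_digits):
        carry, digit = bcd_digit_add(a, b, carry)  # Cout feeds the next digit as Cin
        result.append(digit)
    return result, carry


# Example 6 of the text: 9 + 5 gives digit 4 with carry 1
carry, digit = bcd_digit_add(9, 5)
```

No correction stage follows the table lookup: the add-3 step is folded into the table contents, which is the point of the pre-processing approach.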

Table 5 Comparison of hardware complexity between existing [15] and proposed controller circuits of 2-input LUT

| Parameter | Existing [15]: total gates | Existing [15]: total transistors | Proposed: total gates | Proposed: total transistors |
|---|---|---|---|---|
| inverter | 6 | 6 | 3 | 3 |
| OP-AMP | 2 | 12 | 1 | 6 |
| 2-to-1 MUX | 3 | 60 | 1 | 20 |
| transmission gate | 2 | 8 | 1 | 4 |
| 2-input Ex-OR | 1 | 8 | 1 | 8 |
| 2-input AND | 1 | 6 | 3 | 18 |
| 3-input AND | 4 | 32 | — | — |
| 2-input OR | 4 | 24 | 1 | 6 |
| 4-input AND | 1 | 10 | — | — |
| transistor | 2 | 2 | 10 | 10 |
| total | 26 | 168 | 21 | 75 |

‘—’ Represents that the design does not require the corresponding component.

Table 6 Comparison between existing [15] and proposed LUTs with different sizes for WRITE operation in terms of area

| LUT size | Area, μm²: existing [15] | Area, μm²: proposed |
|---|---|---|
| 2-input | 37.68 | 12.89 |
| 4-input | 150.72 | 51.56 |
| 6-input | 602.88 | 206.24 |
| 8-input | 2411.52 | 824.96 |

5 Simulation results and performance analysis

The 2-input LUT is simulated using Microwind DSCH [8] and LTSPICE IV [25]. The simulation using DSCH is exhibited in Fig. 9a, where we can see that, up to ∼100 ns, both A and B are low and thus M00 is selected but no operational voltage is applied to M00 since both Ren and Wen are low. From ∼100 ns, A is high and B is low and thus M10 is selected. Similarly, from ∼200 to ∼300 ns, both A and B are high and M11 is selected. Wen is high during ∼400 to ∼500 ns and data is written to M01 (B is high and

Fig. 10 Comparative analysis of existing [15–18] and proposed LUTs
a Power consumed in Write operation
b Power consumed in Read operation
c Delay in Write operation
d Delay in Read operation

A is low). Read Pulse is high from ∼700 ns to the rest of the timing for BCD output. Table 3 ensures the improvement of the proposed
diagram and Ren is also high during that time period, so read BCD addition technique over conventional binary addition and
operation is performed on selected memory cell M01 and read conversion technique. In BCD addition algorithm new
value is propagated to the MUX which is shown as out. In pre-processing technique is introduced which is advantageous over
Figs. 9b and c, the simulation results of the proposed BCD adder existing post-processing technique as elucidated in Table 4.
are demonstrated. Table 4 shows the improvement of number of gates, power and
In spite of using BCD adder first performing binary addition and delay only due to adopting pre-processing technique instead of
then converting the binary output to BCD format is another approach post-processing technique.
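For readers unfamiliar with the conventional alternative mentioned above, the following is a minimal Python sketch of the textbook scheme: add two BCD digits in binary, then correct the result by adding 6 whenever it exceeds 9. It is an illustration only and does not reproduce the authors' pre-processing technique; the function names `bcd_digit_add` and `bcd_add` and the least-significant-digit-first list layout are assumptions made for this example.

```python
def bcd_digit_add(a: int, b: int, carry_in: int = 0) -> tuple[int, int]:
    """Add two BCD digits conventionally: binary add first, then correct.

    Returns (sum_digit, carry_out). Hypothetical helper, not from the paper.
    """
    if not (0 <= a <= 9 and 0 <= b <= 9 and carry_in in (0, 1)):
        raise ValueError("inputs must be BCD digits (0-9) and a 0/1 carry")
    binary_sum = a + b + carry_in            # plain binary addition, range 0..19
    if binary_sum > 9:                       # result is not a valid BCD digit
        corrected = (binary_sum + 6) & 0xF   # add 6 and keep the low 4 bits
        return corrected, 1
    return binary_sum, 0


def bcd_add(x_digits: list[int], y_digits: list[int]) -> list[int]:
    """Ripple the digit adder over two equal-length digit lists.

    Digits are stored least significant first (an assumption of this sketch).
    """
    if len(x_digits) != len(y_digits):
        raise ValueError("operands must have the same number of digits")
    result, carry = [], 0
    for a, b in zip(x_digits, y_digits):
        digit, carry = bcd_digit_add(a, b, carry)
        result.append(digit)
    if carry:
        result.append(carry)                 # final carry becomes a new digit
    return result


# 58 + 67 = 125, written least significant digit first
print(bcd_add([8, 5], [7, 6]))  # [5, 2, 1]
```

The correction step is the cost that the conversion-based approach pays on every digit; Tables 3 and 4 quantify how the proposed technique compares with it.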

Table 7 Comparison among different 1 to n-digit existing [2–4] and proposed BCD adders

Methods        1-digit BCD                           2-digit BCD
               Area, μm²   Power, mW   Delay, ns     Area, μm²   Power, mW   Delay, ns
existing [2]   2434.3      862.04      56.61         4876.6      1724.08     113.22
existing [3]   3033.44     1074.9      56.78         6066.88     2149.8      113.56
existing [4]   4833.92     1714.08     85.78         9667.84     3428.16     171.56
proposed       836.52      445.52      50.85         1673.04     891.04      101.7

Methods        3-digit BCD                           n-digit BCD
               Area, μm²   Power, mW   Delay, ns     Area, μm²   Power, mW   Delay, ns
existing [2]   7302.9      2586.12     169.83        2434.3n     862.04n     56.61n
existing [3]   9100.32     3224.7      170.34        3033.44n    1074.9n     56.78n
existing [4]   14,501.8    5142.24     257.34        4833.92n    1714.08n    85.78n
proposed       2509.56     1336.56     152.55        836.52n     445.52n     50.85n
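The n-digit columns of Table 7 express each cost as the 1-digit figure multiplied by n. The short Python sketch below applies that linear model to the per-digit values taken from the table and derives the relative improvement of the proposed adder over existing [2]. The names `PER_DIGIT`, `n_digit_cost` and `improvement_percent` are hypothetical and introduced only for this illustration.

```python
# Per-digit (area in um^2, power in mW, delay in ns), copied from Table 7.
PER_DIGIT = {
    "existing [2]": (2434.3, 862.04, 56.61),
    "existing [3]": (3033.44, 1074.9, 56.78),
    "existing [4]": (4833.92, 1714.08, 85.78),
    "proposed": (836.52, 445.52, 50.85),
}


def n_digit_cost(method: str, n: int) -> tuple[float, float, float]:
    """Scale the 1-digit figures linearly, as in the n-digit columns."""
    area, power, delay = PER_DIGIT[method]
    return area * n, power * n, delay * n


def improvement_percent(baseline: float, proposed: float) -> float:
    """Relative reduction of `proposed` with respect to `baseline`, in percent."""
    return 100.0 * (baseline - proposed) / baseline


base = PER_DIGIT["existing [2]"]
ours = PER_DIGIT["proposed"]
print(round(improvement_percent(base[0], ours[0]), 1))  # area: 65.6
print(round(improvement_percent(base[1], ours[1]), 1))  # power: 48.3
```

Because every cost scales with the same factor n, these percentages are independent of the number of digits under this model.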

The comparison of the 2-input LUT with the best-known existing LUT [15] in terms of numbers of gates and transistors is shown in Table 5. The area comparison between the existing and the proposed LUT circuits is elucidated in Table 6. The power and delay of the Write and Read operations are compared with those of the existing ones [15–18] in Fig. 10. In Table 7, the performance of the scalable BCD adder, which combines the proposed BCD addition algorithm with the proposed LUT, is compared with that of the existing ones [2–4].

6 Conclusions

Its outstanding features and advancement have made today’s world the era of FPGA. The LUT, being the most important and complex element of an FPGA, is the main concern for its improvement. The proposed 2-input LUT achieves a prominent enhancement of 50.9% on average in terms of area and power compared with the best existing one [15]. Besides, BCD addition, being the most basic arithmetic operation, is the main focus as an application of the LUT-based FPGA. The comparative analysis shows that the proposed BCD adder is constructed with the optimum ‘area’ (proposed circuit has 836.52n μm² as compared with 2434.3n μm² [2]), ‘power’ (proposed circuit has 445.52n mW as compared with 862.04n mW [2]) and ‘delay’ (proposed circuit has 50.85n ns as compared with 56.61n ns [2]), where n is the number of BCD digits. These improvements in FPGA-based BCD addition will consequently influence the advancement of all other arithmetic operations as well as the computation and manipulation of decimal digits, as it is more convenient to convert from decimal to BCD than to binary. Moreover, BCD is useful for exact decimal calculations, which are often a requirement for financial applications, accountancy etc. Its ‘fixed pitch’ format also makes multiplying or dividing by powers of 10 easier and makes it easy to find the nth digit of a particular number, so such arithmetic operations can easily be chunked into multiple threads, for example for parallel processing [1].

7 References

1 Morteza, D., Jaberipur, G.: ‘Low area/power decimal addition with carry-select correction and carry-select sum-digits’, Integr. VLSI J., 2014, 47, (4), pp. 443–451
2 Gao, S., Al-Khalili, D., Chabini, N.: ‘An improved BCD adder using 6-LUT FPGAs’. IEEE Tenth Int. New Circuits and Systems Conf. (NEWCAS), 2012
3 de Dinechin, F., Vázquez, A.: ‘Multi-operand decimal adder trees for FPGAs’, 2010
4 Vasquez, M., Sutter, G., Bioul, G., Deschamps, J.-P.: ‘Decimal adders/subtractors in FPGA: efficient 6-input LUT implementations’. Int. Conf. on Reconfigurable Computing and FPGAs, 2009, ReConFig’09, 2009
5 Saeid, G., Jaberipur, G., Asl, R.H.: ‘Efficient ASIC and FPGA implementation of binary-coded decimal digit multipliers’, Circuits Syst. Signal Process., 2014, 33, (12), pp. 3883–3899
6 Sah, C.-T.: ‘Fundamentals of solid-state electronics’ (World Scientific, 1991)
7 http://large.stanford.edu/courses/2010/ph240/iyer2
8 ‘The simulation software Microwind’, can be found at http://microwind.net/download/Setup-Packages/MicrowindLite35.exe
9 Product data sheet 74AUP1G86-Q100, low-power 2-input EXCLUSIVE-OR gate, Rev. 1, 20 October 2014. Available at http://www.nxp.com/documents/data_sheet/74AUP1G86_Q100.pdf
10 Product data sheet 74AHC_AHCT1G08-Q100, low-power 2-input AND gate, Rev. 1, 13 July 2012. Available at http://www.nxp.com/documents/data_sheet/74AHC_AHCT1G08-Q100.pdf
11 ‘CMOS 45 nm Open Cell Library’. Available at http://www.si2.org/openeda.si2.org/projects/nangatelib
12 Saraswat, K.: ‘Interconnect scaling’. Available at http://www.stanford.edu/class/ee311/NOTES/InterconnectScalingSlides.pdf
13 Chua, L.O.: ‘Memristor-the missing circuit element’, IEEE Trans. Circuit Theory, 1971, ct-18, (5), pp. 507–519
14 Strukov, D.B., Snider, G.S., Stewart, D.R., et al.: ‘The missing memristor found’, Nature, 2008, 453, (7191), pp. 80–83, Bibcode:2008Natur.453…80S, doi: 10.1038/nature06932
15 Almurib Haider, A.F., Nandha Kumar, T., Lombardi, F.: ‘A memristor-based LUT for FPGAs’. 2014 Ninth IEEE Int. Conf. on Nano/Micro Engineered and Molecular Systems (NEMS), 2014
16 Ho, Y., Huang, G.M., Li, P.: ‘Dynamical properties and design analysis for nonvolatile memristor memories’, Circuits and Systems I: Regular Papers, IEEE Transactions on, 2011, 58, (4), pp. 724–736
17 Haron, N.Z., Hamdioui, S.: ‘On defect oriented testing for hybrid CMOS/memristor memory’, In Test Symposium (ATS), 2011 20th Asian, pp. 353–358, IEEE, 2011
18 Dong, X.X., Jouppi, N.P., Xie, Y.: ‘Design implications of memristor-based RRAM cross-point structures’, Proc. Des. Autom. Test Eur., 2011, pp. 1–6
19 Yuan X.: ‘Modeling, architecture, and applications for emerging memory technologies’, IEEE Comput. Des. Test., 2011, 28, (1), pp. 44–51
20 Sohrab, K., Rosendale, G., Manning, M., et al.: ‘A 3D stackable carbon nanotube-based nonvolatile memory (NRAM)’. 2010 IEEE Proc. of the European Solid-State Device Research Conf. (ESSDERC), 2010
21 Thomas, M., Salinga, M., Kund, M., Kever, T.: ‘Nonvolatile memory concepts based on resistive switching in inorganic materials’, Adv. Eng. Mater., 2009, 11, (4), pp. 235–240
22 More John, T., Gilton, T.L.: ‘PCRAM cell manufacturing’. US Patent No. 6,348,365, 2002
23 do Sul, G.: ‘Non-volatile memory: emerging technologies and their impacts on memory systems’. Available at http://www3.pucrs.br/pucrs/files/uni/poa/facin/pos/relatoriostec/tr060.pdf
24 Chen, Y.-C., Zhang, W., Li, H.: ‘A look up table design with 3D bipolar RRAMs’, ASP-DAC 2012, pp. 73–78
25 ‘The simulation software Ltspice IV’. Available at http://ltspice-iv.en.lo4d.com
26 Bioul, G., Vazquez, M., Deschamps, J.P., et al.: ‘Decimal addition in FPGA’. Fifth Southern Conf. on Programmable Logic, SPL 2009, 1–3 April 2009, pp. 101–108, doi: 10.1109/SPL.2009.4914894
