You are on page 1of 2

Low-power high-performance CMOS 5-2 Cin1 Cin2

compressor with 58 transistors X5


X4

D. Balobas✉ and N. Konofaos X3


Sum
X2 FA FA FA
Compressors are the fundamental components in the partial product X1
reduction stage of CMOS multipliers. A new design is presented for
the CMOS 5-2 compressor with 58 transistors, which is the lowest Cout1 Cout2 Carry
reported device count for such a circuit. Simulation results show that
the proposed 5-2 compressor has significantly improved power–delay Fig. 1 Conventional implementation of 5-2 compressor with full adders
performance compared to previously proposed approaches.

Introduction: Multiplication is a very common and important operation Proposed circuit: A block diagram of the proposed 5-2 compressor is
in general purpose microprocessors and digital signal processors. Its shown in Fig. 2a. It consists of six XOR and three MUX blocks,
speed is often a determining factor of how fast the processors can whose generalised schematics are shown in Figs. 2b and c, respectively.
operate [1]. A fast multiplication process usually uses an array multiplier This compressor is characterised by (4)–(7), similar to the conventional
and consists of three stages: partial product generation, partial product one. The functionality of the full adder was decomposed into two XOR
reduction and final carry-propagating addition [2]. In large multipliers, gates and a 2-1 MUX and these basic cells were rearranged to achieve a
the second stage usually contributes the most in delay, power and smaller critical path. Consequently, this structure requires four gate
area, therefore improvement in this field is deemed essential. delays to produce the Sum and Carry outputs.
The partial product reduction can be efficiently implemented with
compressor trees, which commonly use 4-2 and 5-2 compressors as X1 X2 X3 X4 Cin1 X5 Cin2
X
building blocks. Improving the compressor cell directly affects the
overall performance of the multiplier. A 5-2 compressor was initially XOR XOR XOR Y
used in [2] for the design of a high performance 16 × 16-bit multiplier,
X
however specific transistor-level implementation was not provided. XY+XY
Several follow-up publications proposed new designs for the 5-2 com- MUX XOR
Y
pressor with different structures and performance trade-offs [3–8]. X
Cout1
In this Letter, we propose a new low-power and high-performance
CMOS 5-2 compressor with a low transistor count. A comparative b
MUX XOR
study is also conducted to evaluate the new circuit against prior works. S
Cout2 VDD

A
5-2 Compressor functionality: The 5-2 compressor receives seven XOR MUX S SA+SB
inputs of equal weight, including five primary inputs X1–5 and two
B
inputs Cin1-2 that receive their values from the neighboring compressor
of 1 binary bit order lower. It generates an output Sum of the same Sum Carry
S
weight as the inputs and three outputs Cout1-2, Carry weighted 1 a c
binary bit order higher. Its functionality is described by the following
binary arithmetic equation: Fig. 2 Architecture of the proposed 5-2 compressor
a Block diagram
X1 + X2 + X3 + X4 + X5 + Cin1 + Cin2 = b Schematic of 2-input XOR cell
(1) c Schematic of 2-1 MUX cell
Sum + 2 × (Cout1 + Cout2 + Carry)

Fig. 1 shows a conventional implementation of the 5-2 compressor using The complete transistor-level schematic of the proposed 5-2 compres-
a cascade of three full adders. In its general form, a full adder receives sor is shown in Fig. 3. It uses full-swing transmission gate logic and
three inputs (a, b, c) and generates two outputs (SFA, CFA), which can be static inverters so that no threshold drops occur in intermediate nodes
expressed by the following Boolean equations: and output nodes are fully restored. These traits make it suitable for
voltage/process scaling and cascading. Furthermore, unlike previous
SFA = a ⊕ b ⊕ c (2) designs, this one completely avoids the use of complex static CMOS
gates and dual rail XOR-XNOR logic, which require more transistors
for their implementation. As a result, it comprises a total of 58 transis-
CFA = (a ⊕ b) · c + (a ⊕ b)′ · a (3) tors, which is the lowest reported for a 5-2 compressor, to the best of the
authors’ knowledge. Fig. 4 illustrates a layout implementation of the
Consecutively, by applying (2) and (3) to the 5-2 compressor of proposed circuit using a 32 nm CMOS n-well process. It utilises three
Fig. 1, we obtain the following set of Boolean equations: metal levels and yields a total area of 5.6 μm2.

Sum = X1 ⊕ X2 ⊕ X3 ⊕ X4 ⊕ X5 ⊕ Cin1 ⊕ Cin2 (4) Simulation setup: The proposed 5-2 compressor was implemented on
transistor level along with several designs from the relevant literature
Cout1 = (X1 ⊕ X2 ) · X3 + (X1 ⊕ X2 )′ · X1 (5) [3–8], for the sake of comparison. In case that an examined work pro-
posed several new designs, we chose the one that was deemed more effi-
cient by its authors. A 32 nm predictive technology model was used for
Cout2 = (X1 ⊕ X2 ⊕ X3 ⊕ X4 ) · Cin1 the simulations, incorporating high-k/metal gate and stress effect [9].
(6) Post-layout simulations were also performed for our circuit, including
+ (X1 ⊕ X2 ⊕ X3 ⊕ X4 )′ · X4
the parasitic capacitances extracted from the layout.
The same sizing methodology was used for all circuits to ensure fair
and unbiased comparison. All transistors have the minimum length (Ln,
Carry = (X1 ⊕ X2 ⊕ X3 ⊕ X4 ⊕ X5 ⊕ Cin1 ) · Cin2 Lp) of 30 nm. Regarding transistor width, transmission gates use equally
(7)
+ (X1 ⊕ X2 ⊕ X3 ⊕ X4 ⊕ X5 ⊕ Cin1 )′ · X5 sized transistors (Wn = Wp = 60 nm), whereas in every other case, k
stacked nMOS (pMOS) have Wn = k × 60 nm (Wp = k × 120 nm). In
This conventional implementation is not very efficient, since it has a case of [3, 6], we added some inverters (along with proper modifi-
critical path of six gate delays for the Sum and Carry outputs. Several cations) to avoid outputs driven by pass transistors. Therefore, their tran-
alternative architectures have been proposed, using fewer logic stages sistor count is slightly higher than originally reported.
and in some cases producing output equations that differ from (4)–(7). The evaluation procedure was realised with BSIM4-level SPICE
Nonetheless, all designs must abide by (1) for valid operation. simulations, which were used to calculate the maximum delay and

ELECTRONICS LETTERS 8th March 2018 Vol. 54 No. 5 pp. 278–280


average power dissipation of examined circuits under typical conditions According to Table 1, the proposed 5-2 compressor has the lowest
(VDD = 1 V, T = 70°C). All inputs were buffered by a pair of inverters transistor count, achieving 23.6–38.3% improvement compared to
and all outputs were loaded with 0.72 fF of capacitance. Delay corre- [3–8]. Furthermore, it has the lowest power dissipation and PDP, achiev-
sponds to the worst-case transition that a circuit can perform (50% of ing 35.6–64.9% and 48.1–78.8% improvement, respectively. Regarding
input switching to 50% of corresponding output switching), which delay, it is 1.8% slower than [7] and 4.8–46.3% faster than [3–6, 8]. The
was found after iterative simulations in each case. Regarding power, post-layout results for our circuit add an overhead of 32.8% in power
the circuits were fed randomly generated input bit sequences with and 38.5% in delay, compared to pre-layout simulation. Nonetheless,
1 GHz frequency and 50% switching probability, for a period of power and PDP are still lower than schematic level implementations
100 ns. Power consumed by the input buffers was not taken into of [3–8] and delay is lower than [3, 5, 8].
account.
Conclusion: We presented a new design for the CMOS 5-2 compressor
X1 X1 X3 X3 X4 X4 X5 X5 which has the lowest reported device count of 58 transistors. The pro-
X5 Z2
posed circuit uses full-swing logic with fully restored outputs and is
Z3
thus suitable for process scaling and cascading. Furthermore, according
Cin2 Z4 Z5 to the simulation results, it achieves the best overall power–delay per-
Z3
X5 Z2 Z3 Carry formance compared to previously proposed designs. It is concluded
Z3 Z5
Cin2 Z4 X5 that the new 5-2 compressor can be utilised as an efficient component
X5 Z2 Z3
of low-power and high-performance CMOS multipliers.
X4 X3 Z3
Acknowledgments: This work was supported by the Research Projects
Cin1 Z1 Z5
for Excellence IKY/SIEMENS.
Z2
X4 X3 Z3
Z2 Z4 Sum
Cin1 Z1 Z5 © The Institution of Engineering and Technology 2018
X4 X3 Z3 Submitted: 31 August 2017 E-first: 31 January 2018
X1 Z1 Z2 doi: 10.1049/el.2017.3339
X2 X3 Z4 One or more of the Figures in this Letter are available in colour online.
X1 Z1 Z1 Cout1 Z2 Cout2 D. Balobas and N. Konofaos (Department of Informatics, Aristotle
Z1
X2 X1 X4 University of Thessaloniki, 54124 Thessaloniki, Greece)
X1 Z1 Z2 ✉ E-mail: dmpalomp@csd.auth.gr

Fig. 3 Transistor level implementation of the proposed 5-2 compressor References


1 Hsiao, S.F., Jiang, M.R., and Yeh, J.S.: ‘Design of high-speed low-power
X1 X2 X3 X4 Cin1 X5 Cin2 VDD GND 3-2 counter and 4-2 compressor for fast multipliers’, Electron. Lett.,
1998, 34, (4), pp. 341–343, doi: 10.1049/el:19980306
2 Kwon, O., Nowka, K., and Swartzlander, E.E.: ‘A 16-bit × 16-bit MAC
design using fast 5:2 compressors’. IEEE Int. Conf. Application-Specific
Systems, Architectures, and Processors, Boston, MA, USA, July 2000,
pp. 235–243, doi: 10.1109/ASAP.2000.862394
3 Prasad, K., and Parhi, K.K.: ‘Low-power 4-2 and 5-2 compressors’. 35th
Asilomar Conf. Signals, Systems and Computers, Pacific Grove, CA,
USA, November 2001, vol. 1, pp. 129–133, doi: 10.1109/
ACSSC.2001.986892
4 Chang, C.H., Gu, J., and Zhang, M.: ‘Ultra low-voltage low-power
CMOS 4-2 and 5-2 compressors for fast arithmetic circuits’, Trans.
Circuits Syst. I, Regul.Pap., 2004, 51, (10), pp. 1985–1997, doi:
Sum
Cout1
10.1109/TCSI.2004.835683
5 Veeramachaneni, S., Krishna, K.M., Avinash, L., et al.: ‘Novel architec-
Cout2 Carry
tures for high-speed and low-power 3-2, 4-2 and 5-2 compressors’. 20th
Int. Conf. VLSI Design, Bangalore, India, January 2007, pp. 324–329,
Fig. 4 Layout of the proposed 5-2 compressor (3.015 μm × 1.860 μm) doi: 10.1109/VLSID.2007.116
6 Tohidi, M., Mousazadeh, M., Akbari, S., et al.: ‘CMOS implementation
of a new high speed, glitch-free 5-2 compressor for fast arithmetic oper-
Comparative results: Table 1 summarises the results of our comparative
ations’. 20th Int. Conf. Mixed Design of Integrated Circuits and Systems,
study, namely transistor count, power, delay and power-delay-product Gdynia, Poland, June 2013, pp. 204–208
(PDP). Power and delay are calculated according to the preceding 7 Najafi, A., Timarchi, S., and Najafi, A.: ‘High-speed energy-efficient 5:2
section and given in nanowatts (nW) and picoseconds (ps), respectively. compressor’. 37th Int. Convention on Information and Communication
The PDP results from multiplying corresponding delay and power and is Technology, Electronics and Microelectronics, Opatija, Croatia, May
given in attojoules (aJ). 2014, pp. 80–84, doi: 10.1109/MIPRO.2014.6859537
8 Najafi, A., Najafi, A., and Mirzakuchaki, S.: ‘Low-power and high-
Table 1: Comparison of 5-2 compressors performance 5:2 compressors’. 22nd Iranian Conf. Electrical
Engineering, Tehran, Iran, May 2014, pp. 33–37, doi: 10.1109/
5-2 compressor Transistors Power (nW) Delay (ps) PDP (aJ) IranianCEE.2014.6999498
Proposeda 58 2172 335 728 9 University of California, Berkeley: ‘Predictive technology model
Proposedb 58 2885 464 1339 (PTM)’, available at http://ptm.asu.edu/, accessed August 2017
[3], 2001 76 5510 624 3438
[4], 2004 76 3987 352 1403
[5], 2007 90 6184 515 3185
[6], 2013 94 3375 450 1519
[7], 2014 82 5045 329 1660
[8], 2014 78 4519 518 2341
a
Pre-layout.
b
Post-layout.

ELECTRONICS LETTERS 8th March 2018 Vol. 54 No. 5 pp. 278–280

You might also like