
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—II: EXPRESS BRIEFS, VOL. 55, NO. 12, DECEMBER 2008, p. 1239

Fast Low-Cost Implementation of Single-Clock-Cycle Binary Comparator
Stefania Perri, Member, IEEE, and Pasquale Corsonello, Member, IEEE

Abstract—This paper presents a new efficient architecture for the design of fast low-cost single-clock-cycle binary comparators. The proposed 64-bit circuit requires only 1051 transistors and, when implemented by using the ST 90-nm 1-V CMOS technology, it exhibits a running frequency higher than 4 GHz with an average power dissipation of only 4 mW. Comparison with the fastest comparator known in the literature demonstrates that, at a parity of technology used, the novel architecture is 12% faster and requires 69% fewer transistors.

Index Terms—CMOS dynamic circuits, comparator, digital arithmetic, VLSI circuits.

Manuscript received April 01, 2008; revised June 20, 2008. Current version published December 12, 2008. This paper was recommended by Associate Editor M. Anis.
The authors are with the Department of Electronics, Computer Science and Systems, University of Calabria, Arcavacata di Rende, 87036 Italy (e-mail: p.corsonello@unical.it).
Digital Object Identifier 10.1109/TCSII.2008.2008063
1549-7747/$25.00 © 2008 IEEE

I. INTRODUCTION

In the last few years, the design of high-speed, low-power, and area-efficient binary comparators has received a great deal of attention, since, as is well known, comparison is a fundamental operation in almost all digital processors. Examples of efficient architectures of binary comparators are demonstrated in [1]–[4]. Among hardware implementations existing in the literature, the latter are very representative solutions. In the following, these circuits are named full comparators to indicate that, A and B being two n-bit numbers, they are able to separately recognize the three possible conditions in which A > B, A < B, and A = B.

In order to reach very high throughput, the approach proposed in [1] uses two-phase clocking dynamic logic with all-N-transistor (ANT) blocks. Such a 64-bit comparator requires 1890 transistors and produces the correct result within 3.5 clock cycles.

Single-cycle two-phase clocking architectures are presented in [2] and [3]. To realize this kind of comparator, a priority encoding algorithm is used in [2], whereas a parallel MSB checking method is exploited in [3]. The latter is 22% faster than [2], but it requires 88% more transistors.

In order to further increase achievable speed, a modification of the MSB checking algorithm used in [3] was recently proposed in [4], where a MUX-based structure specifically designed for high fan-in comparators is presented. The circuit demonstrated in [4] exhibits the highest computational speed, but it also requires 3386 transistors to be realized.

As always happens in the design of digital circuits, achieving high speed is not the only concern. In fact, ensuring low power consumption and reduced silicon area occupancy is also an important design goal. The novel single-cycle comparator proposed in this paper was designed bearing this issue in mind. A 64-bit comparator designed as described here requires only 1051 transistors, thus reducing the transistor count by about 44%, 36%, 66% and 69% with respect to the circuits described in [1]–[4], respectively. Moreover, the proposed circuit is 12% faster than that in [4], which is the fastest single-cycle comparator known in the literature.

A very important feature of the novel comparator is that its architecture can be easily implemented by using both full-custom and standard-cell-based design approaches. In order to demonstrate the efficiency of the new circuit, several implementations were carried out with the STMicroelectronics (ST) 90-nm 1-V CMOS process and the Austria Micro Systems (AMS) 0.35-μm 3.3-V CMOS technology.

The paper is organized as follows. In Section II, a brief background is given to describe the design principles of existing comparators; the circuit here proposed is then introduced in Section III; implementation results and comparisons with existing comparators are presented in Section IV; finally, conclusions are drawn.

II. DESIGN PRINCIPLES OF EXISTING BINARY COMPARATORS

Let A and B be two 64-bit numbers. As is well known, the comparison between A and B can be performed by preliminarily comparing their corresponding bits. In order to enhance throughput, the approach proposed in [1] adopts the strategy of comparing two bits of each input simultaneously through thirty-two 2-bit comparators. The results obtained in this way are then input to a further sixteen 2-bit comparators, and so on until the final result is produced. The comparator presented in [1] is realized using six levels of 2-bit comparators and exploiting the half-cycle stealing of the ANT logic. As a consequence, the comparison between A and B is computed within 3.5 clock cycles.

In order to provide the comparison result within only one clock cycle, the comparator proposed in [2] divides the logic functions required for comparing A and B into two stages. In the first one, eight 8-bit comparators process eight groups of 8 bits coming from the inputs, such that the th 8-bit comparator, with , receives the subwords and as inputs and produces three output signals , and to recognize the cases in which ,


and . Logic equations implemented in this approach are given in

(1)

where . The signals , and are then sent to the 8-bit comparator used in the second stage that also implements (1), considering , and in place of , and , respectively.

The approach proposed in [3] determines the location of the most significant bit where the inputs are different through a parallel MSB checking method instead of the priority encoder. This technique also leads to a two-stage architecture, in which 8-bit comparators are used as the basic components to implement logic equations given in

(2)

where . Equation (2) leads to a full comparator assuring that and if , whereas when , and . Finally, both and are equal to 0 if . The used logic makes the occurrence of the case in which both and are equal to 1 impossible.

An alternative high-performance priority encoding algorithm was recently proposed in [4] that also presents a two-stage 64-bit architecture:

(3)

The generic 8-bit comparator used in the first stage performs logic equations given in (2) and computes a further output signal that, as shown in (3), indicates if the inputs are different. The signals obtained in this way are input to the priority encoder that operates as described by

(4)

The final step of operation is performed by the second stage, which uses an 8-to-1 multiplexer as the basic component to implement

(5)

Note that and if , whereas, when , and . Finally, both and are equal to 0 if . Also, in this case, the condition in which cannot occur.

III. NOVEL COMPARATOR

The comparison between the two n-bit numbers A and B can be performed through an addition operation. In fact, when A ≥ B, the addition between A and the 2's complement of B generates a carry-out signal equal to 1, whereas it produces a zero carry-out if A < B. Unfortunately, as is well known, low-cost addition architectures, such as ripple-carry, can significantly limit the operating speed. On the other hand, high-speed adders, such as the carry-look-ahead, can drastically increase the hardware complexity. For these reasons, the design of efficient comparators known in the literature does not usually employ addition logic.

The main consideration on the basis of the approach here proposed is that the comparison between A and B actually requires a special addition operation that does not produce the sum bits, since only the carry-out signal is necessary to give the result.

In the following, it is demonstrated how this approach makes the carry-look-ahead addition (CLA) logic useful to design high-speed comparators with reduced hardware complexity.

A. Basic Design Principle

In the well-known CLA logic, the basic terms to perform an addition operation are the propagate and the generate signals, typically computed by ORing and ANDing corresponding bits of the input data. These signals are then elaborated to quickly compute all intermediate carries and sum bits.

As stated above, a comparison operation requires the addition between A and the 2's complement of B. The latter can be computed by summing 1 to the 1's complement of B, which in the following is named . Since just the carry-out signal is required to give the result, propagate and generate terms are not needed for each bit position and, as an example, they can be computed considering two bits of each input simultaneously, as shown in

(6)


where . The signals and can then be grouped four by four using the CLA logic again, as shown in

(7)

where . The comparison result is finally obtained applying

(8)

It can be seen that can recognize the cases in which and . In fact, it is equal to the carry-out signal obtained by summing and . However, it is not enough to provide the functionality of a full comparator. Therefore, the signal is also computed to establish if . Thus, when , the output signals and are both equal to 1, whereas they are both equal to 0 if . Finally, when , is 0 and is 1. The used logic makes the occurrence of the case in which impossible.¹

¹ This condition differs from existing comparators [2]–[4], in which the case Out = Out = 1 cannot occur. However, this does not affect the overall architecture at system level.

The novel comparator exploits logic equations completely different and much simpler than those used in [1]–[4]. Therefore, it is expected that it will exhibit significantly reduced hardware complexity. This expectation comes from two basic considerations: 1) the 32 instantiations of the 2-bit CLA circuits required to implement (6) are certainly not more complex than the 64 2-input EXOR (or EXNOR) gates used within conventional comparators to compare corresponding bits of the operands A and B, and 2) the eight 4-bit CLA circuits required to implement (7) are certainly much less complex than the eight 8-bit comparators used in the conventional implementations. Simplifications introduced by the novel comparator can be better appreciated by analyzing the computational steps required to perform a comparison. In the example reported in Table I, and . For each computational step, elaborated input data and produced output data are shown. It can be seen that, in comparison with the approaches [1]–[4], the proposed comparator requires fewer computational steps and each step has fewer input and output signals, thus providing a reduced hardware complexity.

TABLE I
COMPUTATIONAL STEPS OF COMPARED 64-bit COMPARATORS

B. Circuit Implementation

The top-level architecture of the 64-bit comparator here proposed is illustrated in Fig. 1. It uses three levels of operations.


Fig. 1. Novel 64-bit comparator.

Fig. 2. Transistor-level implementations of the (a) 4-bit CLA and (b) 8-bit CLA.

The first and second levels perform their evaluation phase when the clock signal is high, whereas they perform the precharge phase when the clock is low. The third level of the comparator operates in the opposite manner, thus making the circuit able to compare two 64-bit inputs within only one clock cycle.

The first stage consists of 32 CLA blocks that generate the signals and defined in (6). These signals are then input to the eight 4-bit CLA modules of the second stage, in which (7) is implemented. Finally, the 8-bit comparator of the third level completes the comparison operation as given in (8).
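
As a purely behavioral illustration of the three-level organization described above (not the transistor-level circuit), the hierarchical carry-only reduction can be sketched in Python. Several points are assumptions made for the sketch rather than details taken from (6)–(8): the bit-level propagate is computed with XOR so that the AND of all group propagates also serves as an equality flag, the addition is performed between A and the 1's complement of B with no carry-in (so the carry-out is 1 exactly when A > B), and the names `group_pg`, `compare_cla`, `gt`, and `eq` are hypothetical.

```python
N_BITS = 64  # operand width assumed by this sketch


def group_pg(p, g):
    """Merge per-position (propagate, generate) bits, LSB first, into one group pair.

    The group generates a carry if some position generates one and every more
    significant position in the group propagates it; the group propagates only
    if all of its positions propagate.
    """
    group_p, group_g = 1, 0
    for p_i, g_i in zip(p, g):  # walk from LSB to MSB
        group_g = g_i | (p_i & group_g)
        group_p &= p_i
    return group_p, group_g


def compare_cla(a, b, n=N_BITS):
    """Return (gt, eq) for unsigned n-bit a and b using carry-only CLA grouping."""
    assert n % 8 == 0 and 0 <= a < (1 << n) and 0 <= b < (1 << n)
    nb = ~b & ((1 << n) - 1)  # 1's complement of b
    a_bits = [(a >> i) & 1 for i in range(n)]
    nb_bits = [(nb >> i) & 1 for i in range(n)]

    # Bit-level terms (assumption: XOR propagate, AND generate).
    p = [x ^ y for x, y in zip(a_bits, nb_bits)]
    g = [x & y for x, y in zip(a_bits, nb_bits)]

    # Level 1: n/2 two-bit blocks (32 blocks for 64-bit operands).
    lvl1 = [group_pg(p[i:i + 2], g[i:i + 2]) for i in range(0, n, 2)]

    # Level 2: each block merges four level-1 pairs (eight blocks for 64 bits).
    lvl2 = [
        group_pg([q[0] for q in lvl1[i:i + 4]], [q[1] for q in lvl1[i:i + 4]])
        for i in range(0, len(lvl1), 4)
    ]

    # Level 3: a single block merges the level-2 pairs into the final result.
    eq, gt = group_pg([q[0] for q in lvl2], [q[1] for q in lvl2])
    return gt, eq


if __name__ == "__main__":
    import random

    # Cross-check the sketch against Python's built-in integer comparison.
    for _ in range(1000):
        x, y = random.getrandbits(N_BITS), random.getrandbits(N_BITS)
        assert compare_cla(x, y) == (int(x > y), int(x == y))
    print("all random checks passed")
```

No sum bits are ever formed in this sketch: only group propagate and generate terms flow from one level to the next, which is the property the proposed architecture relies on to keep the hardware small.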
Transistor-level implementations of the basic components used in the novel comparator are depicted in Fig. 2. It can be seen that both 4-bit and 8-bit CLA blocks exploit Manchester-Carry-Chain-based architectures. The latter were chosen to achieve sufficiently high speed with a good time-area-power tradeoff [5]–[8]. It can also be noted that the output signals produced by the generic 4-bit CLA block are latched. Latches are used to make the comparator able to operate fast within one clock cycle. Remaining modules implemented to compute the signals , , , , and use conventional domino circuits. Therefore, their schematics are not reported here.

In an AMS 0.35-μm implementation, nMOS transistors forming the carry chains depicted in Fig. 2 were progressively sized using a tapering factor equal to 1.5 and a minimum transistor width equal to 1 μm. Moreover, all of the precharging transistors are 1.6 μm wide. For all other gates, the minimum sizing approach has been used, thus making the nMOS transistors in a series μm wide. Taking into account the effective loads on the signals and , the two nMOS transistors of each latch, used to correctly interface the second and third stages, are made 6 μm wide, whereas a channel width equal to 4.8 μm is set for the pMOS transistor. All level restorers are minimum sized.

In order to avoid the usage of large transistors, the eight signals are ANDed through two 4-input AND gates. Moreover, to limit the effects due to the charge sharing occurring in high fan-in dynamic gates, intermediate nodes of the 4-input AND gates are also precharged. Simulations demonstrated that in this case precharging only the central node is enough to remove charge sharing. Finally, for the computation of the output signal defined in (8), a static CMOS 3-input AND gate is used. The latter was chosen to elaborate simultaneously low-precharged and high-precharged signals, without requiring any additional circuitry.

IV. COMPARISON RESULTS

Full-custom implementations of the new 64-bit comparator were carried out using the AMS 0.35-μm 3.3-V CMOS technology and the commercial ST 90-nm 1-V CMOS process. Post-layout simulations have been performed considering for both implementations typical processes and operation conditions (i.e., 25 °C and nominal supply voltage). Obtained results, summarized in Table II, show that the 0.35-μm implementation of the new architecture reaches a maximum running frequency higher than 720 MHz by dissipating 38 μW/MHz. When the ST 90-nm technology is used, the new comparator achieves a 4.3-GHz running frequency with an average power consumption of only 1 μW/MHz. It can be easily verified that the worst case delay of the proposed comparator occurs when and (or vice versa), which is the case previously analyzed in Table I.

The AMS 0.35-μm (ST 90 nm) process has a baseline fan-out-of-4 (FO4) minimum-sized inverter delay of 145 ps (26.8 ps), a silicon area of 36 μm² (3.3 μm²), and a power dissipation of 0.17 μW/MHz (0.0027 μW/MHz). In Table II, delays normalized to FO4 are also reported for the 0.35-μm and


90-nm implementations. The 0.35-μm technology was chosen for purposes of comparison with the comparators described in [1]–[4]. In Table II, the two implementations available for the comparator presented in [2] are named "[2]_C1" and "[2]_C2." Analogously, the implementations [3]_C1 and [3]_C2 are referenced for the comparator described in [3]. Obtained results demonstrate that the novel circuit improves speed performance with respect to the fastest comparator described in [4] by 12% and reduces the transistor count by 69%. Even though average power consumption of the comparator [4] is not available, it can be reasonably expected that such a large reduction of transistor count achieved with the novel circuit also leads to a significant reduction of power consumption.

TABLE II
RESULTS FOR DYNAMIC FULL-CUSTOM IMPLEMENTATIONS

The proposed architecture is found advantageous in terms of both computational speed and transistor count also over the comparators presented in [1]–[3]. When compared to [2], the new circuit is 37% faster and requires 36% fewer transistors. Furthermore, it reduces the computational delay and the transistor count by 20% and 66% with respect to [3].

Equations (6)–(8) discussed in Section III can also be implemented in hardware by using static CMOS standard cells instead of the dynamic full-custom cells described above. In order to do this, a Verilog netlist describing the top-level architecture shown in Fig. 1 was synthesized by the RTL-Compiler available within the Cadence suite tools [9]. The layout was then carried out with SOC-Encounter. Obtained characteristics are summarized in Table III, which also gives a comparison with the synthesizable 64-bit comparator presented in [10]² and with a comparator inferred from the Verilog description language and contained in the Cadence tool library. The static CMOS 90-nm implementation of the novel comparator operates at a 2.7-GHz running frequency dissipating only 4.4 mW.

² For purposes of comparison, only the portion required for comparing 2's complement numbers has been synthesized and laid out.

TABLE III
RESULTS FOR STATIC CMOS STANDARD-CELL-BASED IMPLEMENTATIONS

From Table III, it can be seen that, at a parity of computational delay, the 90-nm implementation of the novel comparator requires 11% and 28% less silicon area than [10] and the Cadence reference circuit, respectively. Moreover, it dissipates 10% and 30% less power. Similar results are also achieved with the 0.35-μm implementation. In this case, 21% and 10% reductions of the computational time are also achieved with respect to [10] and the Cadence reference circuit.

V. CONCLUSION

Binary comparison is a fundamental operation for most digital systems. Therefore, the design of comparators offering good area-time-power tradeoff is strongly desired.

This paper demonstrates that binary comparators can benefit from the carry-look-ahead logic to achieve high speed with reasonable area occupancy and power consumption. Results presented here demonstrate that the proposed approach can be efficiently applied to both dynamic full-custom and static CMOS standard-cell-based implementations.

REFERENCES

[1] C. C. Wang, C. F. Wu, and K. C. Tsai, "1 GHz 64-bit high-speed comparator using ANT dynamic logic with two-phase clocking," IEE Proc. Comput. Digit. Tech., vol. 145, no. 6, pp. 433–436, Nov. 1998.
[2] C. H. Huang and J. S. Wang, "High-performance and power efficient CMOS comparators," IEEE J. Solid-State Circuits, vol. 38, no. 2, pp. 254–262, Feb. 2003.
[3] H. M. Lam and C. Y. Tsui, "High-performance single clock cycle CMOS comparator," Electron. Lett., vol. 42, no. 2, pp. 75–77, Jan. 2006.
[4] H. M. Lam and C. Y. Tsui, "A MUX-based high-performance single-cycle CMOS comparator," IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 54, no. 7, pp. 591–595, Jul. 2007.
[5] V. Kantabutra, "A recursive carry-lookahead/carry-select hybrid adder," IEEE Trans. Comput., vol. 42, no. 12, pp. 1495–1499, Dec. 1992.
[6] J. H. Lou and J. B. Kuo, "A 1.5-V bootstrapped pass-transistor-based Manchester carry chain circuit suitable for implementing low-voltage carry look-ahead adders," IEEE Trans. Circuits Syst. I, Fundam. Theory Appl., vol. 45, no. 11, pp. 1191–1194, Nov. 1998.
[7] S. Perri, P. Corsonello, F. Pezzimenti, and V. Kantabutra, "Fast and energy-efficient Manchester carry-bypass adders," IEE Proc. Circuits, Devices Syst., vol. 151, no. 6, pp. 497–502, Dec. 2004.
[8] P. Corsonello, S. Perri, and M. Margala, "Efficient addition circuits for modular design of processor-in-memory," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 52, no. 8, pp. 1557–1567, Aug. 2005.
[9] Cadence Documentation [Online]. Available: www.cadence.com
[10] J. E. Stine and M. J. Schulte, "A combined two's complement and floating-point comparator," in Proc. IEEE Int. Symp. Circuits Syst. ISCAS 2005, Kobe, Japan.
