You are on page 1of 7

Adv. Radio Sci., 9, 289295, 2011 www.adv-radio-sci.net/9/289/2011/ doi:10.5194/ars-9-289-2011 Author(s) 2011. CC Attribution 3.0 License.

Advances in Radio Science

A modied prex operator well suited for area-efcient brick-based adder implementations
I. Rust and T. G. Noll Chair of Electrical Engineering and Computer Systems, RWTH Aachen University, 52062 Aachen, Germany

Abstract. The implementation of integrated circuits becomes more and more difcult in the Ultra-Deep-Submicron regime due to sub-wavelength lithography issues. An approach called Brick-Based Design was recently proposed to eliminate the disadvantages of staying with the classical approach to layout design. Prex adders are a core component in a wide variety of applications due to their high speed and regular topology. In this paper, a modied prex operator for prex adders is proposed which is well suited for brick-style layout implementation and, in addition, offers an increase in efciency. The proposed operator makes it possible to use a mirror gate for the generation of both generate and propagate signals, which exhibits a forbidden input signal combination. This forbidden state causes an increase in power dissipation due to transient short circuit currents. The effect of the forbidden state was quantied as part of a comparison against the classical prex operator, based on 64-bit Sklansky adders implemented in a 40-nm CMOS technology. The effects of the forbidden state were found to be well acceptable. The implementation of the adder based on the proposed prex operator reduces the area by 29% while increasing the power by 13% compared to one based on the classical operator.

Introduction

The continuous reduction of feature sizes in nanometer scale technologies makes high-yield manufacturing at these technology nodes a growing challenge. This is because classical minimum-space design rules can not guarantee the effective usage of Resolution Enhancement Techniques (RETs) that are necessary to tackle the subwavelength lithography problems (Jhaveri et al., 2007). As a Correspondence to: I. Rust (rust@eecs.rwth-aachen.de)

result, the yield is compromised and the circuit performance is reduced due to variability issues (Tong et al., 2006). A straight-forward way to solve this problem is to use Design-For-Manufacturing (DFM or Recommended) rules, which allow to keep the classical approach to layout design, but usually cause a high area penalty. As an alternative approach, it is possible to restrict allowed patterns for layout generation to a small subset highly optimized for lithography (Restricted Patterning) and, on the other hand, keep or even relax the corresponding layout design rules (Restricted Design Rules with SRAM design rules as a well-known example (Jhaveri et al., 2007; Liebmann et al., 2009)). Using this design style, area savings of 1525% compared to standard cells using DFM rules are expected (Goering, 2007). Figure 1 shows a layout comparison between minimumspacing (MS) rules and DFM rules for an operator cell used in carry prex adders, the area saving is as high as 35%. Note: Layout pictures do not comply to real design rules to protect proprietary information. In 2005, Kheterpal et al. (Kheterpal et al., 2005) proposed the Brick-Based Design approach, where a small set of logic primitives is mapped on cells called bricks, which are implemented using the Restricted Patterning approach. To ensure high yield manufacturability, the brick layout must comply to the following rules (Jhaveri et al., 2007, 2009; Tong et al., 2006; Liebmann et al., 2009; Kheterpal et al., 2005; Taylor and Pileggi, 2007): unidirectional use of poly-, diffusion and metal layers, no jogs in poly layer, xed pitch on poly/metal, high periodicity of layout structures. Fast prex adders based on the operator proposed by Brent&Kung (Brent and Kung, 1982), layout shown in Fig. 1, are frequently used in a wide variety of applications due to their

Published by Copernicus Publications on behalf of the URSI Landesausschuss in der Bundesrepublik Deutschland e.V.

al comn power e effect parison Sklangy. The acceptoposed sing the al oper-

nals for every weight i based on the corresponding input bits are expected (Richard Goering). ai and bi . In the prex tree these signals are used to comFig. 1 shows a layout comparison between minimumpute carry bits ci in every weight i. These carrys are fed to spacing (MS) rules and DFM rules for an operator cell used the post-processing block which uses them together with the in carry prex adders, the area saving is as high as 35%. 290 I. Rust and T. G. Noll: A modied prex operator well suited for area-efcient brick-based adder i implementations propagate signals to generate the sum bits s .
DFM rules minimum spacing rules

ai preprocessing prefix tree

stage 1

H.Sche Arm an p- and ing the produc As a re output withou tances delay a The we

ci+1 =
A: 35%

with

er scale e techpostprocessing s i+1


Fig. 2. General building blocks of prex adders. adders Fig. 1. B&K prex operator: MS vs. DFM rules. rules

g i = ai

les can nhancehe sub2007). perforg et al.,

describ depend the inc eralized

Gi:k =

high speed and high regularity compared to carry-lookahead adders. All prex adders consist of a pre-processing block, prex tree and post-processing block (see Fig. 2). Different variants differ only in the topology of their prex tree (Patil et al., 2007). The pre-processing block computes generate (g i ), transmit (t i ) and propagate (pi ) signals for every weight i based on the corresponding input bits a i and bi . In the prex tree these signals are used to compute carry bits ci in every weight i . These carrys are fed to the post-processing block which uses them together with the propagate signals to generate the sum bits s i . In Kheterpal et al. (2005), a Kogge-Stone adder was realized with a design-specic brick set and compared to an implementation based on a commercial 90-nm standard cell library. Results show that by using 8 different bricks, delay is only increased by 6% and area only increased by 15% compared to the standard-cell-based implementation without layout pattern restrictions despite the much more limited layout pattern set allowed in a brick based approach. In this paper, a modied prex operator, inherently, well suited for implementation in a brick-style fashion is proposed. Based on the properties of the brick-realization of the operator, a complete brick set for the exible realization of arbitrary prex adders is developed and implemented. The paper is organized as follows: in Sect. 2 the mirror gate used for the implementation of the proposed operator is described and the motivation for its usage for brick-based adder implementations is presented. Section 3 presents the derivation of the modied prex operator. A complete brick set for the implementation of arbitrary adders based on the proposed operator is developed in Sect. 4. In Sect. 5 a quantitative comparison between the proposed operator and the one from Brent&Kung is presented, based on 64-bit Sklansky adder implementations. The effect of the forbidden state Adv. Radio Sci., 9, 289295, 2011

of operator at the presence of variability is evalIn the (V. proposed Kheterpal et al., 2005), a Kogge-Stone adder was reuated. 6 concludes this work. alized Section with a design-specic brick set and compared to an implementation based on a commercial 90-nm standard cell library. Results show that by using 8 different bricks, de2 Gate for brick implementation lay is only increased by 6% and area only increased by 15% The implementation of carry-generating gates in carry-lookahead adders using Branch-Based Logic (BBL) is a wellknown and frequently used approach (Neve et al., 2002; Masgonty et al., 1991). In BBL gates, the topologies for p- and n-channel networks are created separately by forming the gates pull-up and pull-down equations in a sum-of-product fashion (Masgonty et al., 1991). As a result, every connection from supply to output and from output to ground forms a simple, straight transistor stack without internal branches, so intermediate parasitic capacitances are reduced to a minimum. This leads to a smaller delay and possibly to a smaller power dissipation. The well known formula ci +1 = g i + t i ci with g i = a i bi , t i = a i + bi (2) (1)

where group c j , resp equatio

ci+1 =

describes the computation of the carry in bit position i + 1 depending on the local generate and transmit condition and the incoming carry in bit position i . Equation (1) can be generalized to the form Gi :k = Gi :j + T i :j Gj 1:k , (3)

where Gi :j and T i :j denote the group carry generate and group carry propagate signals for weight i down to weight j , respectively (Zimmermann, 1997). The inverted form of Eq. (1) ci +1 = g i + t i ci (4) www.adv-radio-sci.net/9/289/2011/

1111 1111 0000 29% A comparison using a 90-nm technology shows that theGnd pair 0 11 1 00 0000 00 11 0 00 11 00 11 ferent versions of complex aoi and oai gates. Fig. 3 c) shows Gnd Gnd Gnd 1 0 1 00 11 00 11 0 1 00 11 00 11 wig and G. Hellner, 2002), Fig. 3). The fact represented in 0 1 00 11 00 11 0 1 00 11 00 11 delay of the mirror gate is approximately 20% smaller comdelay-enhanced versions which trade an additional transistor I. Rust, T.G. Noll: A Modied Prex Operator Well Suited Brick-Based Adder Implementations equation 5 motivates the usage of transmit conditions ti (A. for Area-Efcient j1:k i:j i:j i:j j1:k i:j pared to aTcascade of 6-transistor aoi and gates, dependfor the absence of additional branches in the nor p-channel G G T oai G G Weinberger and J.L. Smith, 1958) instead of the more coming on the fanout (post layout simulation/slow conditions). topologies, which is inherent in BBL mirror gate version. i the prex Rust and T. G. Noll: A modied well suited for area-efcient brick-based implementations 291 mon is, propagate conditions p (R. Brent and Kung, 1982) The rst three properties can be adder proved in analogy to , so d H. Kung, I. 1982) for any Boolean variables g ,operator t, g H. and Fig. 4. Mask layout diagrams i i i i where (g = 1,p = a b = 0) is possible, preventing the proves are omitted. The last property can be proved by ened these I. as Rust, T.G. A Modied Prex Operator Well Suited for Area-Efcient Brick-Based Adder Implementations usageNoll: of a mirror gate. 00 11 contradiction as follows j1:k i:j i:j i:j 00= poly 11 G G G G 00 11 the ) = (gMoreover, 00 11 t) ( g, t +tg ,t t ). dual expression (7)

earlier, it is always possible to use the11 mirror 0 1 0 aoi and0oai gates. Fig. 3 c) shows As mentioned 0000 1111 0000 1111 29% 0 00 11 00 11 0 00 11 00 ferent 0 versions0of complex Gnd 1 Gnd Gnd 1 Gnd 0 1 00 00 11 0 1 00 11 00 11 0 00 1 0 0 0 1 0 0 0 1 k+1 gate k+1 k11 in the rst stage of the prex graph directly after the 0 1 00 11 00 11 0 1 00 11 00 11 g + t t = 0 , delay-enhanced versions which trade an additional transistor Fig. 3. Gate schematic comparison: (a) mirror gate, (b) 6-transistor Fig. 3. Gate schematic comparison: 4 Derivation of brick set 0 00 1 0the1absence 1 1 of additional 0 1 0 1 in the n- or p-channel j1:k i:j i:j i:j j1:k i:j preprocessing Fig. 2), because the forbidden inputGstate k kG k (seeG branches complex gates, (c) 7-transistor complex gates. T G a)for mirror gate b) 6-transistor complex gates gk = a b present = 1,tT =the ak signal + bk = 0 qed. 0 10 1 1 1 0 0 g ,t , respectively is not in pairs g ,t and i i i i 0 0 isgates 1 0 BBL mirror gate version. which inherent in the c)topologies, 7-transistor complex Now a brick set for the realization of arbitrary adders 0 11 1 0 0 0 1 0 1 0 1 (eq. and the inputs of all prex operators prex in this stage are Fig. 4. layout diagrams. Fig.5) 4.Mask Mask layout diagrams is derived based on the presented modied prex operator. connected only to those signal pairs (wired to the operators 0 1 1 can 0 1 0 1 using gate (aoi). Be1 be implemented 1 1 1 an and-or-invert 1 4 It Derivation of brick set i:i the operator is evident that the implementation of according to the identities g i = G and ti = T i:i ). cell(s) is j1:k i:j i:j i:j 0 1 cause 1a 1 conditions 1 ofshows the fact that the Gcomparison G Gimplementation G Fig. 14 of the layout of The -operator introduced by mirror Brent and Kung (R. Brent supply nodes adjacent gates, hence, reducing crucial for the adderbetween efciency, which strongly suggests the 1 the 0 0 i:j 1 0j1:k i:j i:j ble 1. 1 Application of -operator to output of preprocessing j1:k gate andt ithe six-transistor variant of the aoi gate. i Ta T a brick set for the realization of arbitrary prex adders G0 G Now g i =mirror a i bi = 1G and = + bi = (5) the area per gate signicantly ( 29% in 40-nm). The resultlayout of the remaining cells to the layout of an op1 1 0 1 i:j 1 i:k 1 i:k i:k aoi adaption G G j1:k G j1:k is derived based on the presented modied prex operator. T ing cell rows are not only very dense, but also highly regular G G 3 Modied Prex Operator 1 1 can never 1 1 simultaneously 1 i:j 1 at the outputs erator cell which is implemented as efciently as possible. happen of the adders i:j G G that the implementation of the operator cell(s) is varii:j i:j It is evident without any poly jog. CMOS In contrast, the seven-transistor Tdelay T modern deep-sub-micron technologies for which a preprocessing block (neglecting effects for now),In the w marked in gray, because a carry would be generated at The main disadvantage of the described mirror gate is the crucial for the adder efciency, which strongly suggests the ants of aoi and oai can also share supply contacts over cell 1. Application of -operator to output of preprocessing n-channel topology of the aoi gate can also be used in the pbrick-based design style is advantageous, thestate. ratio If ofthe wiring ight i, but none would be transmitted from weight i 1. existence of a so-called forbidden input forbidlayout adaption of the remaining cells to the layout of an opboundaries, but not without poly jogs. j1:k i:j :i i:i1 Ti:j mirror gate which channel part, symmetric also is capacitance to gate capacitance is ever increasing (VeenG 1 andaT T king a closer look at Giyielding , it is obvious that i:j den combination applied, a short ows from i:j A input comparison using is a 90-nm technology shows that the erator cell which is implemented as efciently as current possible. G et al., 2002, Fig. G3). The fact a BBL-style gate (Neve reprei 1 i:j j1:k j1:k drick, 2000). To accommodate this fact, the bit slice width is irrelevant for the process of carry generation due supply to ground. Taking a closer look at the mirror gate T G G pair delay of the mirror gate is approximately 20% smaller i:k i:k i:k In modern deep-sub-micron CMOS technologies for which a arked in gray, a carry would be generated sented in Eq. (5) motivates usage of transmit G because G the G at conditions oai be kept in atto aaminimum. Therefore, and due to thefor shown Fig. 3 a), it of becomes clear aoi that this happens the fact that the carry isi:j already generated because of j1:k should i:j j1:k i:j i (Weinberger compared cascade 6-transistor and oai gates, deG T G T G brick-based design style t and Smith, 1958) instead of the more comt i , but none would be transmitted from weight i 1 . i:j i:j is advantageous, the ratio of wiring i1 = 0 ,G = 1 in the version corresponding to an aoi gate T shareable supply contacts of the mirror gate depicted in Fig. = 1 (R.mon Brent and H. Kung, 1982). i pending on the fanout (post layout simulation/slow condii:j i:j that 1982) i:j propagate conditions p (Brent and G Kung, where to for gate capacitance is ever increasing (Veen- to an g a closer look at Gi:i1 and T i:i1 , it is j = 1,Gi:j = 0 G obvious G capacitance T i: the version and 4,of the operator cell contains twoin mirror gatescorresponding on top of each exploit this fact, a imodied prex operator, denoted the as i = 1,p i bi = tions). (g = a 0 ) is possible, preventing usage drick, 2000). accommodate this fact, the bit slice width is irrelevant a) for the process of carry generation oaiTo gate. b) c) due other as shown in Fig. 5. To show the effect of supply contact , is introduced andgate. dened as follows a mirror should be kept at a minimum. and due to the the fact that the carry is already generated because of As mentioned it Therefore, is always possible to use mirror sharing, two operator earlier, cells are cascaded horizontally, which Moreover, the dual expression gate in the rst stage of the prex graph directly after the shareable supply contacts of the mirror gate depicted in Fig. = 1 (R. Brent and H. Kung, 1982). ) = (gFig. ). comparison: 3 Modied prex operator t) ( g, t + t 3. g Gate ,g +schematic tt (8) is indicated by the dashed boxes in the the operator examples. preprocessing (see Fig. 2), because forbidden input state a) mirror gate b) 6-transistor complex gates 4, the operator cell contains two mirror gates on top of each ploit this fact, operator, denoted as 1 a modied ci + = g i (t i +complex ciprex ) (6) Implemented in a state-of-the-art 40-nm technology, the rec) 7-transistor gates g , respectively is not present the signal pairs gi ,t i and i ,ti as shown in Fig. 5. in To show the effect of supply contact The main disadvantage of the described mirror gate is the s introduced and dened as follows mparing the truth table of the modied operator with other sulting cell pitch of the proposed version is only 0.48m at a (eq. 5) and the inputs of all prex operators in this are can be implemented using the mirror gate by interchanging existence ofcells a so-called forbidden input state. Ifstage the forRust and T. G. Noll: A modied prex operator suited for area-efcient brick-based adder implementations sharing, two operator are cascaded horizontally, which truth I. table of the -operator in Tab. 1, the only dif- well cell height of 1.33m, using minimum-sized transistors. connected only to those signal pairs (wired to the operators i i i i ) = (g + bidden input combination isthe applied, a short current ows i ( g, t twith g ,g t + t ):.i1 ginand (8) of the , tt with the absence is input indicated by the dashed boxes in operator examples. i:i ence is theg value of T theutilizing marked row, which Fig. 4 shows a comparison of the layout implementation of according to to theground. identities g i = Ga and ti = T i:iat ). i i from supply Taking closer look = 0,t = and 1). Normally it forbidden would be necessary to gate. imstate (g Implemented in a state-of-the-art 40-nm technology, thethe re-mirror one for the modied operator. So the state the mirror gate the six-transistor variant of the aoi The -operator introduced by Brent and Kung (R. Brent 1. Application of -operator to output of preprocessing. 3a, it becomes clear that this happens gate shown in Fig. aring the truth table of the modied operator with plement Eq. (6) using an or-and-invert gate (oai). i:i1 i:i1 sulting cell pitch of the proposed version is only 0.48m at a for = 1 ,T = 0) is replaced by the allowed state i : j i : j I. Rust and T. G. 3 Noll: Athe modied prex operator suited for area-efcient brick-based adder implementations T = 0,G = 1 in the version corresponding to an aoi gate uth table of in corresponding Tab. 1, the only dif- well Figure shows transistor schematics, i:i1 i 1 -operator i:the 1.33m, using minimum-sized transistors. = 1, T = 1)i:which regarding carry cell height of i :j = 1,Gi :j = 0 in the version corresponding to an i 1 1 is equivalent i i i 1 i i : i 1 i : i 1 and for T depicting two alternative implementations for an oai and an e is g the value T t in the marked row, t fact gofthat G pair Ti:i 1 iwhich 1 neration. The the signal (G , T :imirror ) can gate with oai gate. aoi gate, respectively. Figure 3a shows the for the modied operator. So the forbidden state Application of -operator to output of preprocessing. w the (0 , 0), (0, 1) (1 ,0 1) (sharing this 0 take 0 0 0 and 1 only i:i0 1 values As mentioned earlier, it is always possible to use the mirror signal names corresponding to the use as a replacement for = 1,T = 0) is replaced by the allowed state perty with ( g ,t ) ) means that input and output pairs of 0 0 0 1 0 0 i i gate in the rst stage of the prex graph directly after the 3b and Fig. 3c an aoi or an oai gate, respectively. Figure 1 i:i1 = 1) which is equivalent regarding carry = 1, T i 1 i 1 i : i 1 i : i 1 0 t i do 0depict 1 1 0 0 b. g 1i both not contain the forbidden state. This is a prepreprocessing (see Fig. 2), because the forbidden input state different versions of complex aoi and oai gates. Figg that the t signal G pair (G Ti:i1 , T i:i1 ) can ation. 0The 1 fact 0shows gate 0 at any 0 position 0 inside uisite to use the mirror a prex ure 3c delay-enhanced versions which trade an addiis not present in the signal pairs g i ,t i and g i ,t i , respectively nly values (00 , 0)1 , (0 , 1) (1,0 1) (sharing this 0 take 0 the 0 0 0 and 0 1 0 1 tional transistor for the absence of additional branches in the e. (Eq. 5) and the inputs of all prex operators in this stage are ty0with ,ti ))p-channel means input output pairs in ofthe BBL mir0 1 that 0 1 and 0 i 0 0 (g1 1 1topologies, 1 inherent nor which is connected only to those signal pairs (wired to the operators e modied operator, , to be used as a replacement for 0 1 0 not 1gate 1 0 0 1 state. 0 This both do contain the forbidden 1ror 0 version. 0 is a preaccording to the identities g i = Gi :i and t i = T i :i ). exhibits the following properties 1 the 0 0 0 0 ite0to 1 use mirror gate at any position inside a prex 4 shows a comparison of the layout implementaFigure 1 0 1 1 1 -operator A: 29% by Brent and Kung (Brent The introduced 0 11 1 1 gate 0 and 1 1 i 0 of variant of the aoi , 1tion 1 the six-transistor and Kung, 1982) is, for any Boolean variables g , t , g and t ci+1 =G i 1the mirror 0 1operator, 1 The 1 to be 1 of 1 gate. nodes the gate arefor located directly Fig. :: comparison of layouts (MS (MS rules) rules). odied supply , used as a mirror replacement Fig.5. 5. vs. vs. comparison dened as 1 must 1 following 0 associative 0 1in a symmetrical 0 at the cell properties boundary fashion, which is not the hibits be 1 1 the0case for 1 the aoi gate. 1 ) = (g +A: ). ( g, 29% This 1 makes it possible to share the (g,t) t t g,t t (7) i+1 i 1 1 1 1 1 1 i the 5 shows the fact = G must be idempotent 4Fig. Derivation ofcomparison brick set of implemented using mirthat carry is already generated because of Fig. 5. vs. : of the layouts (MS rules) rules). aoi gate for Gi Fig. 5. gates vs. : comparison comparison (MS 1 = 1 (?). www.adv-radio-sci.net/9/289/2011/ ror and using seven-transistor Adv. Radio Sci., 9, 289295, 2011

i:j i:j i:k j1:k i:j i:j i:j j1:k i:j 00 11 i:j j1:k i:k Gproperties :k T T, g iT 00 11 three can beG proved in analogy to , . Kung, 1982) is, for any variables g, t and Gj1:k The rstG G Boolean G G T G so G = 1, T =0 00= metal 11 i +1 i i i 00 11 i:k i:k i:k i:k (t (6) = g on i:j+ c ) Gc G pairs j1:k G j1:k these ned as -operator proves The property can be proved by 00 aoi :j are i:j j Prex k Operator Tthe signal ng the G i3 omitted. G generated in the G Modied G + T G 1: = last 1 11 i:j i:j G Gthe inputs contradiction as follows i:j i:j -processing block creates the forbidden state at = diffusion can be implemented using the T mirror gate by interchanging T i:j + T i:j T j 1:k = 0 ) = (g + t ( g ,following t g ,t t). t with (7) of the input G The main disadvantage of the described mirror gate is the aoi mirror gate he operator (Tab. 1). utilizing the absence g i and g i i with ti ,cells i : k i : k = =0 = contact j 1 1: j 1:k G ,k T i : i 1 i : i 1 existence of a so-called forbidden input state. If the forbid G = 1 , T = 0 state (( gG ,t =1 1) . Normally it appears would be at necessary to imhe forbidden state ,T = 0) the i= j1:k i:j i:j i=0 = supply G signal pairs generated T i:j 1 1:k 11 0000000 1111111 the -operator on the i:j in the T i:j + jcombination 0 00 11 00 0 1 00 11 00 11 den ows from i:j G T G = 1 is applied, a short current 0000000 1111111 plement equation 6G using an or-and-invert gate (oai). 0 1 00 11 00 11 0 1 00 11 00 11 G j k+ 2 : input 0000000 1111111 0 1 00 11 00 11 0 1 00 11 00 11 i:j j1:k j1:k 0000 1111 0000 1111 ocessingiblock creates the forbidden state at the inputs 0000000 1111111 supply to ground. Taking a closer look at the mirror gate T G G 0 1 00 11 00 11 0 1 00 11 00 11 i shows 1 1 corresponding i:i1 i1 i:k i:j k+1 1:k k 1111 Fig. g i 3 the schematics, dei:k i:j k j 0000 0000 1111 0000000 1111111 +1 g ti:k ti G T i:transistor 0 1 00 11 00 11 0 1 00 11 00 11 G + T T = 0, it becomes clear that this G G G 0000 1111 0000 1111 0000000 1111111 g + t g = 1 0 1 00 11 00 11 0 1 00 11 00 11 shown in Fig. 3 a), happens for following operator cells (Tab. 1). 0000 1111 0000 1111 i:j i:j j1:k i:j j1:k 0000000 1111111 oai picting two G alternative implementations for an oai aoi 0 00 11 00 11 0 00 11 00 11 G G and an Vdd 1 Vdd Vdd 0000 0000 1111 1: k i:j k1 j k 1111 i:1: jk 0 1 00 11 00 11 0 1 00 11 00 11 j k 0 0 0 0 iT 0 T +1 +1 i :i1 :i10 0000 1111 0000 1111 G = 1 , T = 0 i:k = 0 ,G = 1 in the version corresponding to an aoi gate T 0 1 00 11 00 11 0 1 00 11 00 11 forbidden state (Grespectively. = 1,T Fig. 3= appears at the gate, a) 0) shows the mirror gatei:jwith signal g +t 1 t = 0 , 0000 1111 0000 1111 G 0 00 11 00 11 0 1 00 11 00 11 i:j i:j 0000 0000 1111 i:11 j = 11111 i:j = 0 in the version corresponding i:k 0 0 1 0 the G 0 a replacement 0 1 00 00 11 0 1 00 11 00 11 G G T ,G to an and for 0000 1111 0000 1111 names0corresponding to use as for an aoi k: k 1 k 11 k k k 0 00 00 11 0 1 00 11 00 11 G j k+ 2 0000 1111 0000 1111 g = a b = 1 ,t = a + b = 0 qed. 0 1 00 11 00 11 0 1 00 11 00 11 0000 1111 0000 1111 1 gate, 0 Fig. 0 oai gate. 0 1 00 11 00 11 0 1 00 11 00 11 i 0 i 0 i 1oai i1 1 i:i1 i :i 1 Area: or an respectively. 3 b) and Fig. 3 c) depict dif0000 1111 0000 1111 (g a) (c) 0 00 00 11 0 1 00 11 00 g t t G (b) T 0000 1111 0000 g k+1 + tk+1 1 g k11 = 1, 0 1 00 11 00 11 0 1111 1 00 11 11 00 11

i must iassociative be G =1 , T = 0 must never occur in combination a nand-gate producing T i . prex Layouts are Now a brick set with for the realization of arbitrary adders exploit this fact, a modied prex operator, denoted as is derived on the presented modied prex operator. s introduced and dened as follows 5 showsbased the must be idempotent 4Fig. Derivation ofcomparison brick set of implemented using miract that the carry is already generated because of i

= 1, T =0 sharing, two operator cells are cascaded horizontally, which G ) = ).of two operator contains mirror gates on Operator top of each (g,t ( gthe ,t ( g4 + t cell g ,g + t t (8) s is indicated byArea-Efcient the boxes inin the examples. his is) a4, preDerivation brick set ( g,t ) ( g , t ) = ( g + t g ,g + t t ) . (8) is indicated by dashed the dashed boxes theoperator operator examples. 4 I. Rust, T.G. Noll: A Modied Prex Well Suited for Brick-Based Adder Implementations i:j j j 1:k other i:shown G + T G = 1 as in Fig. 5. To show the effect of supply contact Implemented inAdder a in state-of-the-art 40-nm thererede Noll: a prex Implemented a state-of-the-art 40-nmtechnology, technology, the G. A Modied Prex Operator Well Suited for Area-Efcient Brick-Based Implementations i:j iComparing :j 1: Comparing truth table of modied operator withwhich the truth table of the modied operator with sharing, the joperator ak brick setthe for the realization of arbitrary prex adders two are cascaded horizontally, sulting cell pitch of the proposed version is ataa G +T Now T = 0 cells sulting cell pitch of the proposed version isonly only 0.48m 0.48m at 292 I. Rust and T. G. Noll: A 1, modied prex operator well The suited for three area-efcient brick-based adderimplementations implementations properties can beadder proved in analogy to , so and H. Kung, 1982) is, for any Boolean g ,t , well g and I. Rust and T. G.dashed Noll: A modied prex operator suited for rst area-efcient brick-based truth table of the -operator invariables Tab. 1, the only difthe truth table of the in Tab. the only dif) is derived based on the presented modied prex operator. j 1: k the 1: k-operator indicated by the boxes in the operator examples. cell height of 1.33m, using minimum-sizedtransistors. transistors. cell height of 1.33m, using minimum-sized cement for is j G = 1 , T = 0 i : i 1 The rst three properties can be proved in analogy to , so is, for any Boolean variables g , t , g and i : i 1 t, dened asisin these ference isof of T in the marked row, which ference is the value T value in the marked row, It evident that the implementation ofwhich the operator is proves are omitted. The last property can be proved by Implemented athe state-of-the-art 40-nm technology, the re- cell(s) these proves are omitted. The last property be proved by is one for the modied operator. So the forbidden state k + 2 : contradiction as can follows h is one for Table the modied operator. So the forbidden state at to output A: 29% suggests 1. Application of topreprocessing. output of preprocessing. crucial for efciency, which strongly the sulting cell pitch of the version is only 0.48m a ocessing i :i 1-operator i:the iproposed 1 adder ble 1. Application of of )-operator ( G = 1 ,T = 0) is replaced by the allowed state ( g,t ) ( g , t ) = ( g + t g ,t t . (7) contradiction as follows k +1 k +1 k i : i 1 i : i 1 r well suited for area-efcient brick-based adder implementations (G =+ 1t ,T i = replaced by the allowed state g g 11.33m, =0) 1,is layout adaption remaining cells to the layout cell height of using transistors. : i 1 of minimum-sized i:k = 1, T i:k = 0 (G = 1 , vs. Tii:i =1 1) the which is equivalent regarding carry of an op- G g ,t it ) . (7) i: iFig. :i 1k i1)vs. i i 1 of i :regarding i 1 ik :i carry 1 5. : comparison layouts (MS rules). Fig. 5. : comparison (MS rules) h k1 +1 k +1 (G = 1 , T = which is equivalent i : i : k g t g t G T :i1 i:i 1 as possible. which is as iefciently g Using t = 0, i erator i 1 cell 1 on i :implemented i 1signal :pair i 1 ( G =, T 1, T = 0 The fact that the G can the -operator the signal generated in) the i:j + T i:j G j 1:k = 1 g i+ t tgeneration. g t i G Ti:ipairs i 1 i:i1 G e generation. The fact that the signal pair ( G , T ) can In modern deep-sub-micron CMOS technologies for k k k k k k enerated at now only take the values (0 , 0) , (0 , 1) and (1 , 1) (sharing this i:j i:j inputs j 1:kwhich a 0 0 0 = 0 0 state 0 or on the signal pairs generated inthe the pre-processing block creates forbidden at the g = a b = 1 ,t = a + b 0 qed. G +T G =1 i:j i:j j 1:k = 0 e now only take the values (0 ,i 0) ,0 (0 , 1) and (1 ,0 1) (sharing this 0 0 00with 0 0 property (0 g ,t ) ) inputs means that input and 0 output pairs of of wiring G + T T brick-based design style is advantageous, the ratio 0 1 eight iof 1 . forbidden i creates the state at the the following operator cells (Tab. 1). i: j i:j j using 1:k Fig. 5 shows the comparison of implemented mir 4 of brick G T of 0 G ecause of y property ( g )0 ) means that and output pairs j 1:k = 1, T j 1:k = 0 0 not 1 1 set 0 0+ Tab. 1Derivation both do contain the forbidden state. This is T a pre- = 0with 0 0 1 to 0 capacitance 0 i ,t i i :i 1input i:i 1 capacitance gate is appears ever increasing (Veenbvious that i ator cells (Tab. 1). The forbidden state (G =01,T = 0) at the ror gates and using the seven-transistor aoi gate for G j 1: k j 1: k 0 1 0 0 0 requisite to use mirror gate at any position inside a prex n 0 0 1 1 0 0 Tab. 1 both do not contain the forbidden state. This is a prei:i1 :drick, i 1 this G fact,= 1, T =width 0 k+2 : 2000). To accommodate the bit slice eration due 4 Derivation of brick set (G = 1,T i = 0) appears at the j in combination a nand-gate producing T i . prex Layouts are 0 brick 1 0with 1 0 1arbitrary tree. Now set for the realization of adders s 0 1 0a 0 0 01 enoted as requisite to use the any position inside prex i mirror i be gate i1 at ia 1minimum. i:i i:a i 1 should kept at Therefore, and due to the because of j k 2: gmodied g t , G T+ 0t 1 1 on 1 to 1 as 1 The operator, be used a replacement for g k+1 + tk+1 g k = 1, is derived based the presented modied prex operator. 0 1 set 1 ifor 0 1 0 1 f tree. i1 a brick i : i 1 i : i 1 Now the realization of arbitrary prex adders shareable of 1 the mirror gate depicted in Fig. k+1 t G 10 T the 1supply 0 contacts 0 0 exhibits following properties 0 that 0 0 1 g k+1 00 + tfor g k = 1, cell(s) k+1 k 0based 1 1 1 1 Itthe is evident the implementation of the operator + tk+1 The modied operator, 1 , to0cell be used as prex a replacement s derived on the presented modied operator. 4, operator contains two mirror gates on top of each g 1 1 1 1 denoted as t = 0, A: 29% (8) 0 0 0 0 k +1 k +1 k 0 0 0 1 0 0 1 that 1 is 01 0 the 1 0 g cell(s) + tis t =0 , k k k k k k i+1 iproperties x crucial for adder efciency, which strongly suggests evident exhibits the following 1 1 15. 1 1 t0 is the implementation of the operator other as shown in Fig. To show the effect of supply contact c = G i g = a b = 1 ,t = a + b = 0 qed. 1 0 0 0 0 0 11 1 01 0 k k 1 the1adder 1 the 0qed. A: 29%(MS k cells k 5. Fig. vs. :comparison comparison of of layouts layouts (MSrules) rules). Fig. 5. vs. r1 with the the layout adaption of remaining of an crucial for efciency, which strongly suggests g = a the to b the = 1layout ,tk = which a + bk = sharing, two operator cells are cascaded horizontally, 1 0 0 0 1 0 0 0 0 i+1 i 1 1 1 1 1 1 remaining must bewhich associative c dif= G operator i the r only cell is implemented as efciently as possible. ayout adaption of cells to the0 layout in of an op(8) is boxes operator examples. 0 0 0 0 indicated 1 0 0by the 1dashed 1the Fig. :: comparison of layouts (MS rules) rules). Fig.5. 5. 4 vs. vs. comparison (MS Derivation of brick set w, erator cell which is implemented as efciently as possible. In modern deep-sub-micron CMOS technologies for Implemented in a state-of-the-art 40-nm technology, the rei :k i: k 0 which 1 0 1 Fig. 5 shows the comparison of implemented using mir 0 1 1 1 1 1 must be idempotent G = 1 , T = 0 must bethe associative -operator CMOS A: 29% which Using on the signal pairs generated in brick the set 4 Derivation of erator with den state n modern deep-sub-micron technologies for a which a brick-based design style is advantageous, the ratio sulting cell pitch of the proposed version is only 0.48m at a ror gates and using the seven-transistor aoi gate for Gi 1 1 pre-processing 1 1 0 creates 1 1 block 0 the forbidden 1 0 at the inputs i :j j 1:k a i :j for state G + T G = 1 x operator well suited for area-efcient brick-based adder implementations Now brick set the realization of arbitrary prex i i i e0 only dif brick-based design style advantageous, the ratio of1wiring combination with a nand-gate T . Layouts are adders G = T = 0 must occur wed state of wiring capacitance to gate capacitance is transistors. ever increasing cell height of 1.33m, using 05. 1 0 1 11,is 0 1never 1 minimum-sized Fig. 5in shows the comparison producing implemented using operator. mirFig. vs. :: comparison of layouts (MS rules). i :j i :j j 1:prex k of adders Fig. 5. vs. comparison (MS rules) must be idempotent of the following operator cells (Table 1). 4 Derivation of brick set the fact that the carry is already generated because of Now a brick set for the realization of arbitrary is derived based on the presented modied prex G + T T = 0 ow, carry which to ( i capacitance gate capacitance is ever (Veending 1 1 1 i :1 i 1this ? ). To fact, the bit slice width should be 1 1 accommodate 1i :i 1 :i0 1 = 11 ror gates and the seven-transistor aoi gate for G = 1increasing ,T =0 )1 appears at the jmodied 1: k using j 1:k (?). The forbidden state (G is derived based on the presented prex operator. It is evident that the implementation of the operator cell(s) is i : i 1 idden state G = 1 , T = 0 i 1 row ikept i drick, 2000). To this Therefore, fact, the bit slice 1 ) 1accommodate 1 can at a and due to the shareable sup- set with cessing. marked inminimum. gray, because a carry would bewidth generated at in combination a nand-gate producing Tstrongly . prex Layouts are the G =1 ,T = 0 must never occur Now a brick for the realization of arbitrary adders It is evident that the implementation of the operator cell(s) is To exploit this fact, a modied prex operator, denoted as crucial for the adder efciency, which suggests owed state Table 1. Application of of would -operator to output of preprocessing j k +2 : hould be kept at ia minimum. Therefore, and due to the aring this ply contacts the mirror depicted in Fig. 4,1. the operator weight , but none be transmitted from weight i mir Fig. 5 shows the comparison of gate implemented using 4 Derivation of brick set is derived based on theof presented modied crucial for the adder efciency, which strongly suggests thecellsprex , is introduced and dened as follows layout adaption the remaining to theoperator. layout of an op -operator to output of preprocessing rding carry i : i 1 i : i 1 i hareable supply contacts of the mirror gate depicted in Fig. Taking a contains closer look at G and T on ,top it is obvious that g k +1 + t k +1 g k = 1, t ipairs ofgates cell two mirror gates of each other as shown ror and using the seven-transistor aoi gate for G remaining :i1 It is evident that the implementation of the operator cell(s) layout adaption of the cells to the layout of an op erator cell i :i 1 iscontains T ) igeneration k+ 1 which k +1 kis implemented as efciently as possible. 4, the operator cell mirror gates on top of each T (g irrelevant for the the process of carry due combination ,t) (Now g, t ) + tFig. g,g with + tTo two t ). (8) adders This iscan a= in a nand-gate producing T . Layouts are in ?? . the show effect of supply contact sharing, two g + t t = 0 , a brick set for realization of arbitrary prex erator cell which is implemented efciently as possible. is foras the adder efciency, whichtechnologies strongly suggests In modern deep-sub-micron CMOS for which a row marked in 5. gray, because a carry would be generated at crucial sharing this to the fact that the carry is already generated because other as shown in Fig. To show the effect of supply contact k k k n inside athe operator cells are cascaded horizontally, whichof is indicated gk = a b =the 1,tstyle = a kis +advantageous, bk = 0 qed. is derived based on the presented modied prex operator. i : i 1 In modern deep-sub-micron CMOS technologies for which a mparing truth table of the modied operator with the the layout adaption of remaining cells to the layout of because a carry would be generated at brick-based design the ratio ofan wiring weight i , but none would be transmitted from weight i 1 . put pairstwo ofGoperator = 1 (Brent andcascaded Kung, 1982). haring, cells are horizontally, which by the dashed boxes the operator examples. Implemented i:i 1in i : i 1 It is evident that the implementation of the operator cell(s) brick-based design style is advantageous, the ratio of wiring th table of the -operator in Table ?? , the only difoperator cell which is implemented as efciently as possible. would be transmitted from weight i 1 . capacitance to gate capacitance is ever increasing (VeenTo exploit this fact, prex operator, denoted as a the closer look at Ga modied andoperator T , it is obvious that his is aTaking pre- by s indicated dashed boxes in the examples. placement in state-of-the-art 40-nm technology, theto resulting cellmodern pitch 2000). i:i i:a i 1 adder i :i obvious 1 efciency, i1 :the i 1 value is crucial for the which strongly suggests capacitance gate capacitance is deep-sub-micron ever To increasing (Veen , is introduced and dened as follows at G and T , it is that ence is of T in the marked row, which drick, accommodate this fact, the bit slice In CMOS technologies forwidth T is irrelevant for the process of carry generation due ide a prex mplemented inof a state-of-the-art 40-nm technology, the re4 Derivation of brick set the proposed version is only 0.48 m at a cell height of the layout adaption ofcarry the remaining cells to the layout an drick, 2000). To of accommodate this fact, the bit slice width or the process of carry generation due should be kept at a minimum. Therefore, and due to the one for modied operator. So the forbidden state to the fact that the is already generated because of which a brick-based design style is advantageous, the ratio ulting cell pitch of proposed 0.48m at a (8) ) = ).is onlytransistors. (g,t) g, the t (g + t minimum-sized g,g version + tadder t 1.33 m, using i :i 1 operator 1( i :already i 1i :icell suited for area-efcient brick-based implementations which is implemented as efciently as possible. should be kept at a minimum. Therefore, and due to the carry is generated because of shareable supply contacts of the mirror gate depicted in = 1 ,T = 0 ) is replaced by the allowed state G for = 1 (R. Brent H. Kung, 1982). of wiring to gate capacitance is ever increasing cement Now capacitance a brick set for the realization of arbitrary prex adders Fig. cell height of 1.33m, usingand minimum-sized transistors. i : i 1 i : i 1 Figure ?? shows the comparison of implemented us shareable supply contacts of the mirror gate depicted in Fig. In modern deep-sub-micron CMOS technologies for and H. Kung, 1982). Comparing the truth ofprex the regarding modied operator with 4, the operator cell gates on top be of each To this a modied operator, denoted as = 1,exploit T = 1)fact, which is table equivalent carry is derived based on thecontains presented modied operator. (?). To accommodate this fact, thetwo bit mirror slice prex width should i: i 1 i 1) ing operator, mirror ( the seven-transistor aoi gate and pair A: 29% ratio 4, the cell contains two mirror gates on top ofshow each the truth tablethe ofgates the -operator in Table 1, the only difusing i :operator a modied prex denoted as which a introduced brick-based design style is advantageous, the other as shown in Fig. 5. To the effect of supply contact , is and dened as follows It is evident that the implementation of the operator cell(s) neration. The fact that signal G , T can kept at a minimum. Therefore, and due to shareable supithe i . Layi :i with 1 in the ference is value of T marked row, which for G in combination a nand-gate producing T other as shown in Fig. 5. To show the effect of supply contact d dened as follows sharing, two operator cells are cascaded horizontally, which is crucial for the adder efciency, which strongly suggests of wiring capacitance to gate capacitance is ever increasing w only take the values (0, 0), (0, 1) and (1, 1) (sharing this ply contacts of the mirror gate depicted in Fig. 4, the operator Fig. 5. vs. :: comparison of layouts (MS rules). Fig. 5. vs. comparison (MS rules) is one for the modied operator. So the forbidden state outs are shown for minimum spacing (MS) rules. Advansharing, two operator cells are cascaded horizontally, which ( g,t ) ( g , t ) = ( g + t g ,g + t t ) . (8) the layout adaption of the remaining cells to the layout of an is indicated by the dashed boxes in the operator examples. (?with ). To (g accommodate this the bitoutput slice width be contains two mirror gates on top of each other as shown operty ,t1i )) means that fact, input and pairs should of cell i :i =1 ,T i :i 1 = 0area ) is (8) replaced by the are allowed state )at g ,g ?? + t both t . (G operator which isstate-of-the-art implemented as efciently as possible.the retages regarding and regularity evident. The area is indicated by the dashed boxes in the operator examples. Implemented in a 40-nm kept a minimum. Therefore, and due to the shareable supble do not contain the forbidden state. This is a in Fig. ?? . Tocell show the effect of supply contacttechnology, sharing, two i :i 1 = i :i 1 = (G 1 , T 1 ) which is equivalent regarding carry Comparing the truth table of the modied operator with In modern deep-sub-micron CMOS technologies for at a Implemented in a state-of-the-art 40-nm technology, the re-version saving of 29% clearly over-compensates the area increase sulting cell pitch of the proposed is is only 0.48m ply contacts of the mirror gate depicted in Fig. 4, the operator i : i 1 i : i 1 erequisite to use the mirror gate at any position inside a operator cells are cascaded horizontally, which indicated to , so generation. The fact that the signal pair , T only ) can Fig. 5 shows comparison of ( G1, implemented using mir- a brick-based 4 the I. Rust and T. G. Noll: Aset modied prex operator well suited for area-efcient brick-based adder implementations h table of truth the of modied operator with 4 Derivation of brick which design style is a advantageous, the ratio ecause table of the the -operator in Tab. the difsulting cell pitch of the proposed version is only 0.48m at 15% using the Brick-Based Design methodology which cell height of 1.33m, using minimum-sized transistors. cell contains two mirror gates on top of each other as shown i ex tree. by the dashed boxes in the operator examples. Implemented now only take the values ( 0 , 0 ) , ( 0 , 1 ) and ( 1 , 1 ) (sharing this T. G. Noll: A modied prex operator well suited for area-efcient brick-based adder implementations i: i 1 proved by ror gates and using the seven-transistor aoi gate for G of wiring capacitance to gate capacitance is ever increasing e -operator in Tab. 1, the only difference is the value of T?. Compared in the marked row, which cell height of 1.33m, using minimum-sized transistors. was reported to the classical implementai ,t iin in Fig. ??operator, . Tocombination show effect of supply sharing, i two The modied the , be used as a contact replacement with (g )to ) with means input and output pairs of i:i 1 property in a state-of-the-art 40-nm the resulting pitch in a that nand-gate producing T state . prex Layouts are (Veendrick, 2000). To technology, accommodate this fact, thecell bit slice of T infor the marked row, which Now a brick set for the realization of arbitrary adders is one the modied operator. So the forbidden denoted as tion based on DFM rules (Fig. of 1), the This area saving is as high Table 1. Application of -operator to output preprocessing. Table 1following both do not contain the forbidden state. is a preoperator cells are cascaded horizontally, which is indicated exhibits the properties of the proposed version is only 0.48 m at a cell height of i : i 1 i : i 1 width should be kept at a minimum. Therefore, and due to ed operator. So the forbidden state is derived based onthe thearea presented modied prex operator. (G requisite = 1 ,T = mirror 0) is replaced by the allowed state -operator to output ofuse preprocessing. as 54%. Typically of prex adders is dominated by to the gate at any position inside a prex by the dashed boxes in the operator examples. Implemented the shareable supply contacts of the mirror gate depicted in 1.33 m, using minimum-sized transistors. i:i i:the i1 allowed state 1 i1 is = is replaced by (G 1It ,T =1 1) that which is equivalent regarding evident the implementation of the carry operator cell(s) 0) ci + = G = itheir tree. implementation. that a Fig. bricki iprex i graph i 1 i 1 iThis 1 means in a state-of-the-art 40-nm the resulting 4, the operator contains two mirror gates on top usof (8) g t g ttechnology, Gi :pair Ti:ii: 1 i:cell i1 pitch Figure ?? shows the cell comparison of implemented = 1) which is equivalent regarding carry generation. The fact that the signal ( G , T ) can is crucial for the adder efciency, which strongly suggests The modied operator, , to be used as a replacement 1 i 1 i : i 1 i : i 1 oriented implementation based on can be realized with each other as shown in Fig. 5. To show the effect of suptof G Tversion the proposed only 0.48 m at a cell height ing of mirror gates and using the seven-transistor aoi gate i:i1 i:is i1 the now must be associative that signal pair ( ,A: Tfollowing can only take the values (00 ,) 0) , (0 ,based 1) and (1,0 1) (sharing this 29% on r with the the layout adaption of the remaining the layout of an sharing, two operator cells are cascaded horizon0 0G 0 0 for exhibits the properties ply contact a smaller area than one cells in to combination with i 1.33 m, using minimum-sized transistors. for G in combination with a nand-gate producing T i . Laylues (0 , 0) , (0 , 1) and (1 , 1) (sharing this 0 0 0 property with ( g ,t ) ) means that input and output pairs of 0 i icell 0 approach 1 is implemented 0 0as efciently operator which asthe possible. i tally, which is indicated by the dashed boxes in the operator only difmust be idempotent i +1 the classical to layout generation despite much c = G i ?? do shows the comparison of state. implemented usig. 5. vs. :: both comparison of (MS rules). outs are shown for minimum (MS) 40-nm rules. technolAdvanFig. 5. vs. comparison rules) ) means that input and output pairs of 1 Figure 0 0 1 layouts 1(MS 0 0 This is Tab. 1 not contain the forbidden a prew, which examples. in aspacing state-of-the-art In0 modern deep-sub-micron CMOS technologies for Implemented more limited layout features available in the brick approach. i i ing mirror gates and using the seven-transistor aoi gate tages regarding area and regularity are evident. The area 1 1, 0 G state = T = 0 use must occur ontain the forbidden state. This is a pre0 0 0 0 position 0 inside requisite to the mirror gate at any a prex ogy, the resulting cell pitch of the proposed version is only 1 must be associative dden which a never brick-based design style is advantageous, the ratio i in combination To build a complete prex adder, the following additional for G with a nand-gate producing T i . Lay0 0 0 1 0 1 0 1 saving of 29% clearly over-compensates the area increase irror gate at any position inside a prex 0.48 m at a cell height of 1.33 m, using minimum-sized tree. wed state of wiring capacitance to analogy gate capacitance ever increasing e rst three properties can be proved in to1rules. , sois cells are required: shown comparison 1 must be idempotent Fig. 5outs shows the of , implemented using mirare for minimum spacing Advan1 1 0 1 1to 1 (MS) Derivation of brick set transistors. of should 15% using The modied operator, be used as a replacement for ding carry ( ? ). To accommodate this fact, the bit slice width be the Brick-Based Design methodology which i ese proves are omitted. The last regularity property can be proved by or gates and using the seven-transistor aoi gate for G 1 1 1 1 0 0 1 0 tages regarding area and are evident. The area Figure 5 shows the comparison of implemented using i i or, 1 ), be used as a for occur and due to the shareable i :i was reported to exhibits the following properties = replacement G 1, T = 0 must never can kept at a minimum. Therefore, sup- in ?. Compared to the classical implementai preprocessing: ntradiction as follows 0 0 n combination with a nand-gate producing T . Layouts are 1 1 0 1 1 1 using A: 29% gate high mirror gates andrules aoi for saving of 29% clearly over-compensates the area increase ow a properties brick set for the realization of arbitrary prex adders wing tion based on DFM (Fig. the 1),seven-transistor the area saving is as haring this The ply contacts of the mirror gate depicted into Fig. 4, the operator iA: i . Layouts i +1 i i : k i : k 1 1 1 1 1 1 1 1 29% rst three properties can be proved in analogy , so G in combination with a nand-gate producing T of = using c = G i 15% the Brick-Based Design methodology derived based on the presented modied prex operator. which G ,T = 0 as 54%. Typically the area of prex adders is dominated by t pairs of 1 cell contains two mirror gates on top other shown Fig. 5. :: comparison of layouts (MS rules). Fig. 5. vs. vs. comparison (MS rules) 1 1 1 Compared these proves are omitted. The last property can of be each proved by as are shown for minimum spacing (MS) rules. Advantages was reported in ? . to the classical implementai : j i : j j 1 : k It is evident that the implementation of the operator cell(s) www.adv-radio-sci.net/9/1/2011/ their prex graph implementation. This means that a brick is+ in Fig. ?? G T G = 1 This a . To show the effect of supply contact sharing, two Fig. 5. vs. : comparison of layouts (MS rules). Fig. 5. vs. : comparison (MS rules) contradiction as follows regarding area and regularity are evident. The area saving the must be associative tion based on DFM rules (Fig. 1), the area saving is as high crucial for adder which strongly suggests oriented implementation based on the can be realized with i :j a i :j operator j 1:k efciency, n inside of 29% clearly over-compensates area increase of 15% cells are cascaded horizontally, which is indicated Gas + T Typically T = 0 area of prex adders is dominated by ociative 54%. he layout adaption of thethe remaining cells to the layout of an a smaller area than the one basedset on combination using with mir5 shows of in implemented the in the generated operator examples. Implemented j 1: k fact j 1the :be kdashed 1 ,by must idempotent 4Fig. Derivation ofcomparison brick to the that carry is already because of their graph G = T = 0 boxes prex implementation. This means that a brickperator cell which is implemented as efciently as possible. the classical approach to layout generation despite the much Fig. 5 shows the comparison of implemented using miri :ialready 1 = 1 ( gates and using the seven-transistor aoi gate for Gi mpotent placement 4 Derivation of brickcell setror in a state-of-the-art 40-nm the resulting pitch rry G is generated because of2011technology, ? ). Adv. Radio Sci., 9, 289295, www.adv-radio-sci.net/9/289/2011/ i the brick approach. oriented based on can be realized with i implementation i 2: Ink + modern deep-sub-micron CMOS technologies for more limited layout features available ror gates the aoi for Gin G = 1 ,the T fact, = 0 must never occur inseven-transistor combination agate nand-gate producing T i . prex Layouts are of proposed version is only 0.48 and m at a using cell of Now a brick set with for the realization of arbitrary adders To exploit this a modied prex operator, denoted as height i a smaller area than one based on in combination with k + 1 k + 1 k hich a brick-based design style is advantageous, the ratio To build a complete prex adder, the following additional must never occur in combination with a nand-gate producing T . Layouts are g + t g = 1 , Now a brick set for the realization of arbitrary prex adders 1.33 m, using minimum-sized is derived based on the presented modied prex operator. a modied operator, denoted asfollows transistors. the , classical is prex introduced and dened as approach to layout generation despite theon much fdened wiring capacitance to gate capacitance isisever increasing cells are required: k +1 as k +1 k derived based the presented modied prex operator. follows Figure ?? shows the comparison of implemented usIt is evident that the implementation of the operator cell(s) g +t t = 0,

or to output of preprocessing. , dened as t these proves are omitted. The last replaced by the allowed state contradiction as follows ch is equivalent regarding carry 1 i :i 1 i :i 1 ( g,t ) ( g , t ) = ( g + t g ,t t ) . (7) G T i : i 1 i : i 1 signal pair I. (G ,T can A modied prex operator well suited for area-efcient brick-based adder implementations i:k = 1, T i:k = Rust and T. G. ) Noll: 293 G 0 I. Rust, T.G. Noll: A Modied Prex Operator Well Suited for Area-Efcient Brick-Based Adder Implementations 5 0), (0, 1) 1) (sharing this 0 and (1,0 i:j i:j j 1:k Using the -operator on the signal pairs generated in the G +T G =1 that input and 0 output pairs of Design 0 using the Brick-Based methodology which creates was re-the forbidden state at the inputs pre-processing block i:j i:j j 1:k shown for minimum spacing (MS) rules. Advantages regard G +T T =0 0 ported 0 in e forbidden state. This is a preKheterpal et al. (2005). to the classical of evident. theCompared following operator cells ing area and regularity are The area saving of 29%(Tab. 1). j 1: k j 1: k 0 position 0 e at any inside a prex i: i1 the area savimplementation based on DFM (Fig. 1),of G = 1, T =0 The forbidden state (G = 1,T i:i1 = 0) appears at the clearly over-compensates therules area increase 15% using the 0 ing isBrick-Based 1as high as 54%. Typically the area of prex adders is Design methodology which was reported in (V. j k+2 : 1 as 1 dominated by their prex graph implementation. This to be used a replacement for i classical means 1 1 i:i1 Kheterpal et al., 2005). Compared to g the implementi g i ti1 Gi:i T Buffer g k+1 + tk+1 g k = 1, 1 that atation 0 brick-oriented implementation based on can based on DFM rules (Fig. 1), the area saving isbe asrehigh operties 0 0 0 29% 0 1 alized 1with 0 in A: 0 combinaa smaller area basedadders on as 54%. Typically thethan areaone of prex is dominated by g k+1 + tk+1 gn/g tk = 0, 1

Now a brick set for the realization is derived based on the presented m is derived based on the presented modied prex operator. It is evident that the implementatio as follows and two inverters, i It is i that the implementation of the operator cell(s) evident crucial for the adder efciency, w i i Application of nand, -operator t , requiring nor to output of preprocessing generation gTable , g , t1.and ). (8) of i from either g i and t i or g i and t i , generation of p is crucial for the adder efciency, which strongly suggests layout adaption of the remaining c and two inverters, requiring nor or nand, respectively, he modied operator with the erator cell which is implemented a the layout adaption i of the remaining cells to the layout of an generation of pi from either g i and t or g i and ti , NAND In modern deep-sub-micron CMO r in Table ?? , the only difoperator cell which is implemented as efciently as possible. row marked in gray, because a carry would be generated at requiring nor or nand, respectively, prex graph: 1 in the marked row, which brick-based design style is advanta weight i , but none would be transmitted from weight i 1 . In modern deep-sub-micron CMOS technologies for i:i1 i:i1 prex graph: buffer(s), capacitance to gate capacitance Taking closer lookdesign at G style and , it is obvious that tor. So the forbidden state which aa brick-based is T advantageous, the ratio i:i1 drick, 2000). To XOR accommodate th T is irrelevant for the process of carry generation due a single mirror-gate operator cell to use prior to placed by the allowed state of wiring capacitance to gate capacitance is ever increasing buffer(s), should be kept at a minimum. T post-processing, to the fact that the carry is already generated because of is equivalent regarding carry (?). Tooperator accommodate this fact, a single mirror-gate cell to use prior to the bit slice width should be i:i1 i : i 1 i : i 1 shareable supply contacts of the m G at a= 1 (R. Brent and H. Kung, 1982). gnal pair (G post-processing: , T post-processing, ) can kept minimum. Therefore, and due to the shareable supsingle Operator 4, the operator cell contains two m Tocontacts exploit of this fact, a modied prexin operator, denoted as (0, 1) and (1, 1) (sharing this ply the mirror gate depicted Fig. 4, the operator post-processing: other as shown in Fig. 5. To show t xor or xnor gate. , is introduced and dened as follows hat input and output pairs of cell contains two mirror gates on top of each other as shown sharing, two operator cells are cas he forbidden This is in Fig. ??. To show the effect of supply contact sharing, two xor or a xnor gate. All state. additional bricks that were implemented are shown in ) = (g + ). ( g,t ) ( g , t t g ,g + t t (8) is indicated by the dashed boxes gate at any position inside a operator cells are cascaded horizontally, which is indicated 6. Fig. 6.6. Additional Fig. Additionalbricks. bricks All additional bricks that were implemented are shown in Fig. Implemented in a state-of-the-art by the dashed boxes the operator Implemented In consequence of the very small cell the pitch in in combinaComparing truth table of the examples. modied operator with Fig. 6. sulting cell pitch of the proposed v tion a strict horizontal wellandI. substrate orientation, to be used as with a replacement a state-of-the-art 40-nm technology, resulting pitch 4inthe Rust and T. G. Noll: A modied prex operator suited for area-efcient brick-bas In consequence of the very small cell pitch in combinatruth table of the -operator in the Tab. 1, the cell only dif- wellcell height of 1.33m, using minim and a well contacts are not needed in every single roperties substratei :i1 0.48 of the proposed version is only m at a cell height of is determined by the width of the operator cell. Apart from tion with strict horizontal welland substrate orientation, ference is the value of T in the marked row, which cell. substrateTo use this as well an advantage andnot at the same time keepA cell 64-bit Sklansky adderare was implemented for each operand contacts are needed in every single the width, all bricks identical in both adders except 1.33 m, using minimum-sized transistors. is one for the modied operator. Sovariant the forbidden state ator using a physically oriented design ow and ing the brick set small, dedicated substrateand well columns Table 1. Application of -operator to output of preprocessing. cell. To use this as an advantage and at the same time keepfor the operators. All bricks feature minimum-sized transisi : i 1 i : i 1 Figure = ??1shows the comparison of the implemented us- In both cases the bit slice width ( G ,T = 0) is replaced by allowed state minimum-space design rules. are used. ing the brick set small, dedicated substratecolumns tors (see Fig. 6). To drive the high-fanout wires, multiple i:i1 1 well mirror i:iand ing gates and which using the seven-transistor aoi gate (G = 1, T = 1) is equivalent regarding carry is determined by the width of the operator cell. Apart from are used. i i i 1 i 1 i :i 1 i :i 1 instances of buffers were used. i i. g t g t G T i:iminimum-sized 1 all ibricks :i 1 Lay the cell width, are identical in both adders except for G in combination with a nand-gate producing T generation. The fact that the signal pair ( G , T ) can Figure 7 shows the corresponding layouts, thetransisversion 5 Comparison of operators for and operators. All bricks minimum-sized outs shown forvalues minimum (MS) rules. Advannoware only (00 , 0)spacing , (0 , 1) (1 ,0 1) (sharing thisfeature 0 take 0 the 0 0 the based on the proposed operator exhibits an area 29% smaller tors (see Fig. 6). The To drive the high-fanout wires, multiple 5 comparison of operators tages regarding area regularity evident. occur property )and )the means input output pairs of 0with0 (g 0 1 thatthan 0 are 0 i ,tof i theand one on thearea classical Brent&Kung The decision to build the quantative comparison proinstances ofbased minimum-sized buffers were used. approach. If saving of 29% clearly over-compensates the area increase 0 0 1 1 0 0 Tab. 1 both do not contain the forbidden state. This is a prethe inuence of the larger bit slice pitch the convenposed prex operator with the B&K type on the SklanFig. 7 shows the corresponding layouts, inside the version based The decision e proved in analogy to , to sobuild the quantative comparison of the pro0 1 0 0 0 0 of 15% using the Brick-Based Design methodology which requisite to use the mirror gate at any position inside a prex tional adder on the area of the bricks used in both versions sky topology is based on its advantages in the ultra deepposed prex operator with the B&K type on the Sklanon the proposed operator exhibits an area 29% smaller than ast property can be proved by 0 1 0 ultra deep1 to is 0 1 the eliminated, BBL-based has still an area advantree. was reported in ?.modied Compared the classical implementathe one based on the classicalversion Brent&Kung approach. If the sky topology based on its advantages in the submicron domainis (Patil et al., 2007) were the op0 on 1operator, 1 1 1 of 1the larger 23%. submicron domain (Patil etbrick-style al., 2007) were the rules modied opinuence of slice pitch inside the conventional erator is benecial regarding a implementation. It The modied (Fig. , to1), betage used as a replacement for tion based DFM the area saving is asbit high 1 1 0 0 1 0 eratorthe is benecial regarding a brick-style implementation. It adder on the area of the bricks in both versions is elimcombines minimum possible logical depth with the minWhile featuring almost identical delays (slow exhibits the following as 54%. Typically the area properties of prex adders is dominated by usedslow-corner C, inated, the BBL-based version has still an area advantage of combines minimum possible logical depth with min1 1 0 ofthe 1 1 10.9 V) of 1121 A: 2 imum amount the of wiring (paying the price in terms max), the model, 0 ps ( ) and 1128 ps ( their prex graph implementation. This means that a brickimum amount of wiring the price of max23%. i+1 i in terms 1 1 1 1 1 1 imum fanout), which tackles(paying the aforementioned increasing power dissipation of the BBL-based adder was about 12.5% c implementation = G i oriented based on can be realized with While featuring identical slow-corner delays (slow fanout), which tackles the aforementioned increasing Fig. :: comparison Fig.5. 5. vs. vs. comparison ratio imum of wiring capacitance to gate input capacitance. In adhigher (222.8 W almost vs. 198.0 W, simulated at 500 MHz un- of layouts a smaller area than one based on in combination with ratio of wiring capacitance to gate input capacitance. In admodel, 0 C,0.9V) of 1121ps ( ) of and 1128ps (), the dition, the increased delay per gate caused by the increased der typical conditions). In terms ATE efciency, the must be associative the increasedvoltage delay per gate caused by theto increased power dissipation of the BBL-based adder was about one. 12.5%If the classical approach layout despite the much ratio dition, between threshold magnitudes and supply volt- generation BBL version is 26% better than the conventional higher (222.8 Wapproach. vs. 198.0 W, simulated at 500MHz unratio between threshold voltage magnitudes and supply voltmore limited layout features in the brick age makes a small logicalto depth the generated aforementioned net logic area 4 of the bricks is set of Fig. 5peripheral shows the comparison desirable. desirable. must be idempotent Derivation of brick the fact that the carry is available already because age makes a small logical depth der typical conditions). Inof terms of ATE efciency, the BBL A 64-bit Sklansky adder was implemented for each optaken into account for the conventional adder, an advantage 1 = build a ror gates and using the seven Gi :iTo 1 (? ).complete prex adder, the following additional irequired: i design erator variant using a physically oriented ow never and occur 2). However, the of 15% in ATE efciency remains (Table cells are G = 1 , T = 0 must in combination a nand-gate p Now a brick set with for the realization To exploit this fact, a modied prex operator, denoted as minimum-space design rules. In both cases the bit slice width adder based on is 11% less efcient with respect to the is derived based on the presented m , is introduced and dened as follows bk = 0 qed. preprocessing: It is evident that the implementa ) = (g + t g,g ). (g,t) (g, t +t t (8) www.adv-radio-sci.net/9/289/2011/ Adv. Radio Sci., 9, 289295, 2011 efciency is crucial for the adder Comparing the truth table ofwww.adv-radio-sci.net/9/1/2011/ the modied operator with the the layout adaption of the remaini truth table of the -operator in Table ??, the only difoperator cell which is implemented

0 generation 0 that 0despite 1 0 0 1 the tion with classical approach to layout their prex graph implementation. This means a brick0 0 1 1 0 Fig. 5. vs. comparison of with layouts (MS (MS rules) rules). 0 Fig. 5. on vs. comparison oriented implementation based available ::can bein realized the much more limited layout features the brick 0 1 0 0 0 0 a smaller area than one based on in combination with approach. the classical approach to layout generation despite the much 0 1 0 1 0 1 To build a complete prex adder, the following additional more limited layout features available in the brick approach. 0 the 1comparison 1 set of 1 implemented 1 1 using mircells are because required: of 5 shows 4Fig. Derivation of brick lready generated To build a complete prex adder, and the additional 1 following 0 0 1 0 gate for Gi ror gates 1 using the seven-transistor aoi preprocessing: cells are required: NOR i 1 the 1 producing in combination a0 nand-gate T .1prex Layouts are er Now a brick1set with for realization of 1 arbitrary adders ed occur prex operator, denoted as 1 1 1 1 1 1 i i i i generation preprocessing: of g , g , t and t , requiring nand, nor

g k = ak bk = 1,tk = ak + bk 4 Derivation of brick set

tn/t

H. Kung, 1982). It is evident that thethe implementation of the the operator is with Comparing truth table of modiedcell(s) operator sulting cell pitch the proposed versi shareable supply contacts of theof mirror gate depicted Gi:i1 = 1 (R. Brent and H. Kung, 1982). 4, the operator cell contains two mirror gates on top each odied prex operator, denoted as 4 I. Rust and T. G. Noll: A modied prex operator well suited for area-efcient brick-based a crucialthe for truth the adder which strongly suggests table efciency, of the -operator in Tab. 1, thethe only dif-of put of preprocessing cell height of 1.33m, using minimum 4, the operator cell contains two mirror gates on top i : i 1 this fact, a modied prex operator, denoted as other as shown in Fig. 5. To show the effect of supply contact ned as follows To exploit layout ference adaptionis ofthe the value remaining the layout of an op- of of Tcells to in the marked row, which to model the effects V variability, called DELVTO thshown 3 other as in Fig. 5.V To show (BSI). the effect of supply , is introduced and dened as follows sharing, two operator cells are cascaded horizontally, which al. (2006) show that the ratio of the distribution erator cell which is implemented as efciently as possible. th is one for the modied operator. So the forbidden state This parameter was used to model aggressive variability sce-for Table 1. Application of -operator to output of preprocessing. ). + t t (8) sharing, two operator cells cascaded :deep-sub-micron i1is indicated :i1 by theis dashed boxes the operator examples. nominal devices at nodes below 90nm can beare up to 30%. horizontally In modern technologies for which a (GiArea = 1,T i =CMOS 0) replaced by in the allowed state would be generated at Sklansky narios based on the layout-derived netlists of both adders. Fig. area comparison. Fig.7. 7. 64-bit 64-bit comparison ) adder: (g,tSklansky ) ( g , t = ( g + t g ,g + t t ) . (8) is indicated by the dashed boxes inadder the operator ex As a rst step, all transistors in the netlists of both i : i 1 i : i 1 Implemented in a state-of-the-art 40-nm technology, the re brick-based style is advantageous, the ratio of wiring The research results published (G design = 1, T = 1) which is equivalent regarding carry in (Bernstein, K. et al., 2006) ed from weight i 1. i i i 1 i 1 i :i 1 i :iwhich 1 3 ble with versions could affect a transient occurrence of the g t g t G T Implemented in a state-of-the-art 40-nm i : i 1 i : i 1 1 of the modied operator sulting cell pitch of the proposed version is only 0.48m at a the show that ratio the Vth distribution for nominal de-FStechnology capacitance tosuited gate capacitance is signal ever increasing (Veengeneration. The fact that the pair (G ,T ) of can , it isA obvious that implementations . Noll: modied prex operator well for area-efcient brick-based adder the table of the modied operator with -operator in Tab. Comparing 1, the only dif- truth were biased regarding V a worst-case way by acceleratth in sulting cell pitch ofto the proposed version is only 0.48 cell height of 1.33m, using minimum-sized transistors. vices nodes below 90nm be up 30%. drick, 2000). To accommodate this fact, the bitat slice width now only take values (0 , 0) , (0 , 1) and (1 ,0 1) (sharing this carry generation 0 0 the 0 modied 0 0 energy-delay-product (EDP). This is due to increased power 4 due I. Rust and T. G. Noll: A prex operator well suited forcan area-efcient brick-based adder the truth table of the -operator in Tab. 1, the only difing transitions which could possibly lead to the FS the impleme T i:i1 in the marked row, which cell height of in 1.33m, using transist version is 26% better than the conventional one. Ifmeans the Therefore, aforeAs aand rst step, all transistors the netlists of minimum-sized both (e.g. adder should be kept at a and due to the property with ( g ,t ) ) that input output pairs of generated because of dissipation that is discussed below. 0 0 0 1 0 0 i : j i : j i: iminimum. 1 i i rise transition of G affect or the fall transition of T of ) and at the ference is area the value of T bricks in the marked row, which operator. So the forbidden state mentioned net logic of the peripheral is taken into versions which could a transient occurrence the FS perator to output of preprocessing. shareable supply of mirror gate depicted Fig. 03 contacts 0 not 1 the 1 state 0 same 0 in Tab. 1 both do the forbidden state. This is adown pre- transitions that are needed to exit 2). It was already shown in Sect. that the contain forbidden time slowing is one for the modied operator. So the forbidden account for conventional adder, an advantage of 15% in were state biased regarding Vth in a worst-case way by acceleratis replaced by the allowed state i :k Table ithe :k Application 1. of -operator to output of preprocessing. 0 1 0 0 0 0 4, the operator cell contains two mirror gates on top of each (G = 1 ,T = 0 ) can not appear when ideal switching conrequisite to use the mirror gate at any position inside a prex operator, denoted as i:i1 i:i1 2). However, the adder based it. transitions All Vth variations had apossibly magnitude of of the respecATE efciency remains ing which could lead to30% the FS (e.g. the (G =1 ,T (Tab. = 0) is replaced by the allowed state which is equivalent regarding carry i 1 i :i 1 i :i 1 0 Fig. 1to 0 1 far 0rise 1 contact ditions are assumed. In reality, switching conditions are i:j As a result, die power dissipation i: j other as shown in 5. To show the effect of supply tree. ws tive default values. of the t G T i : i 1 i :i 1 i( :i 1 i : i 1 on is 11% less efcient with respect the energy-delaytransition of G or the fall transition of T ) and at the G ,T = 1, T =1 1) which is equivalent regarding carry the signal pair (G can i ) between iThe i 1 cells :i 1cascaded i :ibe 1horizontally, from ideal, so delays corresponding G and T sig0 t i 1operator, 1i 1 1 1 sharing, two operator are which BBL adder was further increased by 6%, while the conveng t g G T modied , to used as a replacement for i : i 1 i : i 1 product (EDP). This is due to increased power dissipation same time slowing down transitions that are needed to exit The that the pair (G ,T ) can (00 , 0), (0, 1) and (1generation. ,occur, 1) (sharing thisfact 0 0 nals temporarily, resulting in the forbidden state 1 signal 0 boxes 0 1it. 0 (8) tional version only suffered an increase 0.5%. The current is indicated by 1the the dashed in the operator examples. that can is discussed below. All V had a magnitude of by 30% of the respec values exhibits following properties th variations now only take the (0 , 0) , (0 , 1) and (1 , 1) (sharing this eans and output pairs0ofmirror 0 0 cell. 0 0 0 1 that input 0 0 at the input of a receiving Compared to a regular 1 1 the forbidden 0 1 1tive 1 the ), of A: 29% respecpeaks were increased by 12% die ( power ) anddissipation 7% ( It was already shown in section 3athat state technology, default values. result, the Implemented in state-of-the-art 40-nm re- As a property with ) ) means input and output pairs of 0 (g 0 1 0power 0 1 forbidden 0 0 n the is a can prei:k 0 i ,ti i+1 i that CMOS gate under the inuence of glitching, the dissi 1 1 1 1 1 1 died operator with tively. Surprisingly, the power dissipations of both adders (Gi:kstate. = 1 ,TThis = 0) not appear when ideal switching BBL adder was further increased by 6%, while the conven c = G i sulting cell pitch of 1 the proposed version is only 0.48m at a 0 0 for 1 forbidden 0 0 0 1, 0 0 Tab. 1assumed. both do not contain the forbidden state. This is a prepation of the mirror gate in reality, the state can be much r gate at any position inside a prex died prex operator well suited area-efcient brick-based adder implementations Fig. vs. :: comparison of layouts (MS (M Fig.5. 5. vs. comparison were much higher when applying the Vth variconditions are In switching conditions are tional version only suffered an increase bybest-case 0.5%. The current Tab. the only difcell height of 1.33m, using minimum-sized transistors. 0 1 0 0 0 0 1 0 1 higher due to the stronger short-circuit current. requisite to use the mirror gate at any position inside a prex ation. In this case, the power by(12% for BBL far from ideal, so delays between corresponding G and T sigpeaks were increased by 12% ( increased ) and 7% ), respec must be associative marked row, which 0 fraction 1 0the power 1 0 caused 1 1 to be used 1 nals 1 To determine the increase by can occur, temporarily, resulting in the forbidden state tively. Surprisingly, the power adder. dissipations of both adders in and 18% for the conventional The greater increase , as atree. replacement for of o the forbidden state ut of preprocessing. 0 1 1 1 1 1 0 1 0 at the input of a receiving mirror cell. Compared to a regular were much higher when applying the best-case V vari-set the denser wiring of the BBL version, an additional adder the conventional version is due to the decreased fraction of of im Fig. 5 shows the comparison The modied operator, , to be used as a replacement for th g properties must be idempotent 4 Derivation of brick to the fact that the carry is already generated because of by the allowed state 1the 1 0 of 0 0 1 1 1 CMOS gate under inuence glitching, power ation. In this case, the power increased by 12% for BBLthe -version dissiA: 29% caused based on with the bit slice pitch from the the1 power dissipation by the FS, which is observable at seven-tran i : i 1 ror gates and using exhibits the following properties G = 1 (?). alent regarding carry i 1 1 i i state pation of 1 theand mirror the forbidden can much 18% forpeaks the conventional adder. The greater increase 1 gate 1 in 0 1 1 be 1occurand 1 Ti:ii: A: 29% prod of was realized characterized. The power dissipation of this the current ( : 15%, : 9%). This observation in combination with a nand-gate G = 1 , T = 0 must never 1 i : i 1 Now a brick set for the realization Toshort-circuit exploit this fact, modied operator, denoted as is due r (G ,T ) can i+1 5. vs. ::acomparison of layouts (MS rules). higher due to c the stronger current. in the (MS conventional to the decreased Fig. 5.1 vs. comparison rules) 1 1i 1Fig. 1 1 prex version is still 10% higher than the one based on the classical shows that, at least version for the considered topology andfraction transistor = G i derived based on the presented , introduced andcaused dened nd (1,0 1) (sharing this To determine the fraction ofis the power by as thefollows of powerthe dissipation caused by the FS, which observable Fig. 5. vs. ::is comparison of is layouts (MS rules). mod Fig. 5. vs.can comparison (MS rules) approach, so it becomes apparent that increase the transient presence widths, effect of the FS be clearly dominated by other tive It is ephasized evident thepower implementation t and 0 output pairs of ofstate denser wiring the BBL version, additional adder based at the current peaks ( : -15%, :-9%). This that observation of the forbidden is responsible for most increase. sources of power dissipation. This is by the must be associative )an ). (g,t) (g, t = (g + t of g,g the + t t (8) on with the bit slice pitch from the -version was realshows that, at least for the considered topology and transistor is crucial for the adder efciency, w 0 n state. This is a preresults of ve simulations based on normally-distributed ran5 shows implemented using mirThe peak current in the network is anthe indicator for tent 4Fig. Derivation ofcomparison brick set of is already generated because of supply i ized and characterized. The power dissipation of this version widths, the effect of the FS can be clearly dominated by other 0 Comparing the truth the modied the osition inside prex layout of the remaining c domoperator V covering transistors, the Fig. 5 shows the comparison of power implemented us ror gates and of using the seven-transistor aoi gate for all G th variations theaimpact offact state because oftable the aforemen the forbidden must be idempotent 4with Derivation of brick adaption setwere to the that the carry is already generated because of is still strong 10% higher than the one based on the classical apsources ofthe power dissipation. This is ephasized by the power i gates 1 increase compared to classical CMOS was between 13.1% i : i 1 truth table of the -operator in Table ?? , only difoperator cell which is implemented as ror and using the seven-transistor aoi gate tioned short circuit current it may cause. The results in combination with a nand-gate producing T . Layouts are never occur Now a brick set for theof realization of arbitrary prex adders G so it = 1 (?). apparent died1prex operator, denoted as proach, becomes that the transient presence results of ve simulations based on normally-distributed rani :on i 1 in i ishow and 13.2% (max. current: 18% to 22%). summarized in Table 2 that the adder based denoted as a replacement for ference is the value of T the marked row, which G = 1 , T = 0 must never occur in combination with a nand-gate producing T i . pre Lay In modern deep-sub-micron CM Now a brick set for the realization of arbitrary is derived based on the presented modied prex operator. To exploit this fact, a modied prex operator, as ned as0follows the forbidden state is responsible for most of the increase. dom V variations covering all transistors, were the power th To determine which maximum supply currents can be exexhibits a 21% higher maximum VDD current than the one is one for the modied operator. So the forbidden state which a brick-based design style isop a is derived based on the presented modied prex , is introduced andnetwork dened The peak current in the supply isas an follows indicator for the increase compared to classical CMOS was between is(4.34 evident that the implementation of cell(s) pected tothe theoperator FS, special stimuli which favor its 13.1% transient It A: 29% the due compared to occurring in the adder based i :i 1 on i :i 1 mA ). 1 +t t (8) (G = 1 ,T = 0 ) is replaced by allowed state of wiring capacitance to gate capacita impact of the forbidden state because of the aforementioned and 13.2% (max. current: 18% to 22%). It is evident that the implementation of the operato issimulation crucial for the adder efciency, which strongly suggests occurrence were simulated. The rst stimulus (i) is based on 1 3.60 mA) in a 500-cycle at typical condi) = The :i 1= (g,t) (g, t (g + t1 g,g 1 + ). (8) irandom ti :it ( G = , T 1 ) which is equivalent regarding carry strong short circuit current it may cause. results summaTo determine which maximum supply currents can be ex( ? ). To accommodate this the bit is crucial for the adder efciency, which strongly of the modied operator with the the layout adaption of the remaining cells to the layout of an the fact that, i along paths between weights, the difference infact, Fig. :: comparison (MS rules) rules). Fig.5. 5. vs. vs. comparison of layouts (MS tions. i: i 1 to :i 1 ) special rizedComparing in Tab. 2 show that the adder based on exhibits a pected due the FS, stimuli which favor its transient generation. The fact that the signal pair ( G , T can kept at a minimum. Therefore, and du the truth table of the modied operator with the layout adaption of the remaining cells to the layo path the delay between Tas and G can accumulate, possibly leadrator in Table ??, the only difoperator cell which is implemented as efciently possible. A strong of supply current amplitudes can be a 21% higher increase maximum VDD current than the one ( occurring occurrence were simulated. The rst stimulus of (i) the is based on gate depicted i :i 1 in the now only take the values 0 , 0 ) , ( 0 , 1 ) and ( 1 , 1 ) (sharing this ply contacts mirror ing to FS withoperator increasedcell duration. On other hand, switchtruth criterion table of the -operator in to Table ?? , in the only difwhich isthe implemented as efciently as row, In deep-sub-micron CMOS technologies for knock-out for the usage of modern , therefore the maxinmarked the adder basedwhich on (4.34mA compared 3.60mA) the fact that, along paths between weights, the difference in i :i(g 1 ing activity is limited because propagate conditions need to technolo property with ,t ) ) means that input and output pairs of cell contains two mirror gates on top o i i Fig. 5 shows the comparison of implemented using mirference is the value of T in the marked row, which imum possible increase must be determined. In the conIn modern deep-sub-micron CMOS perator. So the of forbidden state 4 Derivation brick set a conditions. brick-based design style is advantageous, ratio nerated because a 500-cycle random simulation which at of typical path delay between T andthe G can accumulate, possibly leadisecond be fulllled. The stimulus (ii) tries to establish the FS Table ?? both do not contain the forbidden state. This is a in Fig. ?? . To show the effect of supp ror gates and using the seven-transistor aoi gate for G sidered nanometer-scale technologies the variability of the is one for the modied operator. So can the be forbidden state which a brick-based design style is advantageous, s replaced A bystrong the allowed state increase of supply current amplitudes to FS with increased duration. On the other hand, switchof wiring capacitance toa gateing capacitance is ever increasing at the input of all operator bricks in the rst row of the adders i i : i 1 i : i 1 prerequisite to use the mirror gate at any position inside a threshold voltage V can no longer be neglected and can operator cells are cascaded horizonta in combination with a nand-gate producing T the . Layouts are th knock-out criterion for usage of replaced , therefore the maxing activity is limited because propagate conditions need to (G = 1,T = set 0 ) ). isTo by the allowed state Now a the brick for the realization of arbitrary prex adders of wiring capacitance to gate capacitance is ever in hich is equivalent regarding carry (? accommodate this fact, slice width should be maximizing operator, denoted as prex bit graph (Fig. 2) at once, thereby the gates in i :i1 1 i :i 1 have aG large impact on delay. This makes it necessary to i : i i : i 1 prex tree. imum possible increase must be determined. In the conbe fulllled. The second stimulus (ii) tries to establish the by the dashed boxes in the operator ( = 1 , T = 1 ) which is equivalent regarding carry on the presented modied prex operator. (?).shareable To accommodate this fact,without the bit V slice width ex s , T is derived ) can based kept at a minimum. Therefore, and due to the sup- were se signal pair (G FS simultaneously. The simulations made th take a closer look at the The consequences for the forbidden state i : i 1 i : i 1 sidered nanometer-scale technologies the variability of the FS at the input of all operator bricks in the rst row of the modied operator, , to be used as a replacement in a state-of-the-art 40-nm technology, generation. The fact that the signal pair (mirror G , T the ) can kept at a minimum. Therefore, and due to the sharea It is evident that the implementation of operator cell(s) , 0), (0, 1) and ( 1 , 1 ) (sharing this ply contacts of the gate depicted in Fig. 4, the operator modications as well as under the inuence of worst-case Vth (FS). BSIM version 4.5 transistor models feature a paramthreshold Vth can no longer be neglected and can adders prex graph (Fig. 2) at once, thereby maximizing (8) voltage for for exhibits the following of the proposed version is Fig. only4, 0.48 now only take the values (0contains , 0) , (0efciency, , 1two ) and (1properties ,which 1) gates (sharing thisof ply contacts of the mirror gate depicted in the is crucial the adder strongly suggests ns that input and output pairs of settings. Again, the conventional adder was simulated using cell mirror on top each other as shown eter to model the effects of V variability, called DELVTO have a large impact on delay. the gates in FS simultaneously. The simulations were made th This makes it necessary to 1.33 m, using minimum-sized transis i + 1 i the same setup (Table 3). as well property with (g ))was means that input and output pairs of cell two mirror on top of of each other a d operator witha the the layout of the remaining cells to the layout of ancontains for ain the forbidden state.look This is a i ,t iadaption in . model To show the effect of supply contact sharing, two (BSI, 2005). This parameter used aggressive c =Fig. G ?? ito take closer at the consequences the forbidden state without Vth modications as under gates the inuence Figure shows the comparison of ?? both do not contain forbidden state. This isas a possible. in Fig.is ?? . To show the ?? effect of supply contact shar e ??gate , theatonly difoperator isthe implemented as efciently ror any Table position inside a cell variability scenarios based on which the layout-derived netlists of operator cells are cascaded horizontally, which indicated (FS). BSIM version 4.5 transistor models feature a parameter worst-case Vth settings. Again, the conventional adder was must be associative ing mirror gates and using the se prerequisite to use the mirror gate at any position inside a both adders. The research results published in Bernstein et operator cells are cascaded horizontally, which is i marked row, which In modernbydeep-sub-micron technologies for Implemented the dashed boxes CMOS in the operator examples. for Gi in the combination with a nand-ga prex tree. by the dashed boxes operator examples. Impl the , forbidden to be usedstate as a replacement which a brick-based design style is advantageous, the ratio in a state-of-the-art 40-nm technology, resulting cell pitch must be idempotent outs are shown for minimum spacing The modied operator, , to be used as a replacement in a state-of-the-art 40-nm technology, the resulting c the allowed state ng properties of capacitance gate capacitance is ever increasing of the proposed version is only 0.48 m at a cell height of Adv. Radio Sci., 9,wiring 289295, 2011 www.adv-radio-sci.net/9/289/2011/ i = 1, T i = 0 must never occur tages regarding area and regularity a G for exhibits the following properties ofbe the proposed version is only 0.48 m at a cell h ent regarding carry (?). To accommodate fact, the bit slice width should 1.33 m, this using minimum-sized transistors. i :i 1 , T i :i 1 ) can i +1kept saving of 29% clearly over-compens 1.33 m, minimum-sized transistors. (G at a minimum. Therefore, and due to the shareable supi Figure ?? shows comparison of The properties canthe be proved in analogy toimplemented , so using us c =G i rst three of 15% using the Brick-Based Desig d ( 1 , 1 ) (sharing this ve Figure ?? shows the comparison of impleme ply contacts of the mirror gate depicted in Fig. 4, the operator

1 +T 0 T 1 =0 0 layout adaption of the remaining cells 0G 1 1 1 1 4 Derivation of brick set j 1: k j 1 1:k 0 1 1 1 1 1 0 erator cell which is implemented as ef 1 0 G = 1, T a brick =0 = 0) appears at the Now set for the realization of arbitrary prex adders 1 row 1 marked 0 0 1 0 1 1 1 In for modern deep-sub-micron CMOS tec in gray, because carry would be generated at Now a brick set the realization of arbitrary prex j k + 2 : is derived based on the a presented modied prex operator. 1 1 0 1 1 1 1 1 1 294 I. Rust and T. G. Noll: A modied prex operator well suited for area-efcient brick-based adder implementations brick-based design modied style is advantageo 1 weight i , is but none would transmitted from weight i 1 . based k +1 Ak +1 T i:i1 6 I. Rust, T.G. Noll: Modied Prex Operator Well Suited for Area-Efcient Brick-Based Adder Implementations is derived the presented prex ope It thebe implementation of the operator cell(s) ison evident gk = 1, that 1 g Taking 1 +t 1 1 look 1Gi:i1 and 1 T i:i1 , it is obvious that capacitance to gate capacitance is e a closer at It is evident that the implementation of the operator c crucial for the adder efciency, which strongly suggests the 0 to output of preprocessing k+1 +1 k -operator 1 g T i:i+ tkis t = 0, for the process of drick, 2000). To accommodate this fa BBL Conv. irrelevant carry generation due BBL Table 2. 64-bit adder comparison (40-nm CMOS). crucial for the adder efciency, which strongly sugg adaption of k the cells to the layout of an op0 Table 1. Application of layout -operator tokoutput ofremaining preprocessing should at cells a minimum. Ther Area /m2 of 388.2 be kept 546.2(500.9) to is 0 already because gk =the ak fact bk =that 1cell ,tk the = a carry +b = qed. generated layout adaption of the remaining to the layout o 0 erator which is implemented as efciently as possible. i:i1 Delay (w.c.) /ps 1121 1128 shareable supplyConv. contacts of the as mirro G = (R. Brent and H. Kung, 1982). BBL 0 erator cell which is implemented as efciently pos In 1 modern deep-sub-micron CMOS technologies for which a ause a carry would be generated at max. Supply Current/mA 4.34 operator 3.60 4, the cell contains two mirro To exploit this a fact, a modied prex operator, denoted as 2 the W 1 Inratio modern deep-sub-micron CMOS rowweight marked in gray, because carry would be generated at brick-based design style is advantageous, of wiring d be transmitted from i 1 . Area/m 388.2 546.2(500.9) Energy per Cycle / 0.446 0.396 technologies for M Hz 4 Derivation of brick set other as shown in Fig. 5. To the show theo e , is introduced and as weight follows 1 T i:i1 , it Delay 1121 brick-based design style is advantageous, ratio ATE-Efciency (normalized) 1.26(1.15) 11128 weight i, butthat none would be transmitted iis ever 1.(w.c.)/ps capacitance to dened gate from capacitance increasing (VeenGi:i1 and is obvious aoi/oai two operator cascad i:i1 i:i1 Max. supply current/mA 4.34 3.60 0 of carry capacitance tosharing, gate capacitance is cells ever are increasing Taking a closer look at G and T , it is obvious that drick, 2000). To accommodate this fact, the bit slice width he process generation due Now a (brick for realization arbitrary prex adders ). 1 g,t) set ( g, t ) =the (g + tg ,g + t of t (8) Table 2. 64-bit adder comparison (40-nm CMOS) by the is indicated dashed boxes t Energy per cycle/W MHz 0.446 0.396 i:i1 1 drick, 2000). To accommodate this fact, the bit in slic T is irrelevant for the process of generation due should be kept atcarry a minimum. Therefore, and due to the y is already generated because of based is derived on the presented modied prex operator. ATE-Efciency (normalized) 1.26(1.15) 1 Implemented in a state-of-the-art 40-n 1 bein kept to the fact that the carry is already because of gateshould shareable supply generated contacts of the mirror depicted Fig.at a minimum. Therefore, and du

I. Rust and T. G. Noll: A modied prex operator well suited for area-efcient brick-based adder implementations
Table 3. Adder comparison: worst-case stimuli.
BBLnc Imax,(i) /mA Ecyc,(i) /W MHz1 Imax,(ii) /mA Ecyc,(ii) /W MHz1 3.27 0.394 6.94 0.541 Conv.nc 3.33 0.386 6.55 0.541 BBLwc 5.17 0.502 9.14 0.547 Conv.wc 3.08 0.399 7.12 0.528

295

Manufacturability and Design-for-Yield is increasingly important. References


Bernstein, K., Frank, D. J., Gattiker, A. E., Haensch, W., Ji, B. L., Nassif, S. R., Nowak, E. J., Pearson, D. J., and Rohrer, N. J.: High-performance CMOS variability in the 65-nm regime and beyond, IBM J. Res. Dev., 50, 433449, 2006. Brent, R. and Kung, H.: A Regular Layout for Parallel Adders, IEEE Transactions on Computers, C-31, 260264, 1982. BSIM 4.5.0: Manual, available at: http://www-device.eecs. berkeley.edu/bsim3/bsim4.html, last accessed: 30 March 2011, 2005. Goering, R.: PDF CEO calls for restricted layouts, available at: http://web.archive.org/web/20071201153818/www. scdsource.com/article.php%?id=32, last accessed: 30 March 2011, 2007. Jhaveri, T., Rovner, V., Pileggi, L., Strojwas, A. J., Motiani, D., Kheterpal, V., Tong, K. Y., Hersan, T., and Pandini, D.: Maximization of layout printability/manufacturability by extreme layout regularity, Journal of Micro/Nanolithography, MEMS and MOEMS, 6, 031011, doi:10.1117/1.2781583, 2007. Jhaveri, T., Rovner, V., Pileggi, L., Strojwas, A. J., Motiani, D., Kheterpal, V., Tong, K. Y., Hersan, T., and Pandini, D.: OPC simplication and mask cost reduction using regular design fabrics, SPIE, 7274, 727417, doi:10.1117/12.814406, 2009. Kheterpal, V., Rovner, V., Hersan, T., Motiani, D., Takegawa, Y., Strojwas, A., and Pileggi, L.: Design Methodology for IC Manufacturability Based on Regular Logic-Bricks, Proc. DAC, 42, 353358, 2005. Liebmann, L., Pileggi, L., Hibbeler, J., Rovner, V., Jhaveri, T., and Northrop, G.: Simplify to survive: prescriptive layouts ensure protable scaling to 32 nm and beyond, in: SPIE Conference Series, vol. 7275, 2009. Masgonty, J., Arm, C., and Piguet, C.: Technology- and Power Supply-Independent Cell Library, IEEE Custom Integrated Circuits Conference, pp. 25.5.125.5.4, 1991. Neve, A., Flandre, D., Schettler, H., Ludwig, T., and Hellner, G.: Design of a Branch-Based 64-bit Carry-Select Adder in 0.18 um Partially Depleted SOI CMOS, Proc. ISLPED, pp. 108111, 2002. Patil, D., Azizi, O., Horowitz, M., Ho, R., and Ananthraman, R.: Robust Energy-Efcient Adder Topologies, Computer Arithmetic, IEEE Symposium on, 0, 1628, doi:http://doi. ieeecomputersociety.org/10.1109/ARITH.2007.31, 2007. Taylor, B. and Pileggi, L.: Exact Combinatorial Optimization Methods for Physical Design of Regular Logic Bricks, Proc. DAC, 2007. Tong, K., Rovner, V., Pileggi, L., and Kheterpal, V.: Design Methodology of Regular Logic Bricks for Robust Integrated Circuits, Proc. ICCD, 2006. Weinberger, A. and Smith, J. L.: A Logic for High-Speed Addition, National Bureau of Standards Circular 591, pp. 312, 1958. Veendrick, H.: Deep-Submicron CMOS ICs, Kluwer Academic Publishers, 2000. Zimmermann, R.: Binary Adder Architectures for Cell-Based VLSI and Their Synthesis, PhD thesis, Swiss Federal Inst. of Technology, Zurich, 1997.

While the supply current peaks are considerably increased in the BBL-based adder when passing over to the worst-case Vth distribution, the magnitudes of energy per cycle as well as peak currents are perfectly acceptable. Stimulus (i) shows a notable relative increase in energy per cycle and peak current, but both parameters have low amplitudes. Stimulus (ii), on the other hand, causes a high energy per cycle of the BBL-based adder in absolute terms, but without any important increase when applying the worst-case Vth distribution and without much difference to the classical implementation. Considering the variability-limiting nature of the brick-style layout approach, it can be expected that, in reality, the increase in power dissipation caused by the forbiddden state will be much lower compared to the presented values, because the results are based on extremely pessimistic assumptions regarding variability. In addition, because the brickstyle approach could make the usage of restricted layout rules possible, the area savings regarding the operator cell could exceed the 54% reduction based on minimum space rules in comparison to DFM rules.

Conclusions

To improve the realization of arbitrary prex-based carrypropagate adders in a brick-based design style, a modied version of the prex operator introduced by Brent&Kung was presented. This operator is usable for any prex tree, using a 3-input mirror gate as a replacement for the classical aoi and oai gates. It was shown that this gate is very well suited for realization as a brick. Based on the resulting operator layout, a complete brick set for prex adder implementations was created and used to implement a 64-bit Sklansky adder. To carry out an in-depth comparison of the proposed operator with the classical one from Brent&Kung, a version of this adder based on the classical approach was also realized. Both adders were comprehensively characterized and compared, showing that the ATE efciency of the adder based on the proposed operator is 15% higher. We have also shown that the forbidden state inherent to the modied prex operator does not signicantly increase power dissipation or current peaks. In conclusion, the presented approach to prex adders offers a highly efcient alternative implementation for nanometer-scale technologies where Design-forwww.adv-radio-sci.net/9/289/2011/

Adv. Radio Sci., 9, 289295, 2011

You might also like