MIT Reversible Computing Project Memo #M3

Voltage Scaling and Limits to Energy Efficiency for CMOS-based SCRL
WORKING DRAFT Revision: 1.27
Michael P. Frank MIT AI Lab, Rm. 747 545 Technology Sq. Cambridge, MA 02139

http://www.ai.mit.edu/~mpf

Started Thu., Dec. 19, 1996. Revision Date: 1997/01/10 22:33:42 GMT. Formatted April 25, 1997.
A current version is available at http://www.ai.mit.edu/~mpf/rc/memos/M03 scrllimits.html

Abstract
This document explains in detail a simple analysis of the maximal energy efficiency of the SCRL [2, 1] adiabatic circuit technique when implemented using ordinary MOS devices, and of how its efficiency scales with varying threshold voltages and temperatures. The analysis is somewhat crude, and needs further development, but as a preliminary result, we find that the minimum energy per operation in SCRL circuits decreases as threshold and supply voltages increase, in contrast to standard CMOS where the opposite relation holds.

1 Brief Overview of SCRL
This document is not intended to introduce the reader to SCRL. For an introduction, the reader should refer to references [2, 1]. However, for illustrative purposes, figure 1 shows an example of an SCRL gate, in this case a NAND. It can be seen that the gate consists of a normal CMOS NAND structure, with the output fed through a transmission gate. In addition to its logical inputs, the gate requires four variable supply rails $\phi_H$, $\phi_L$, $P_H$, $P_L$.

Figure 1: A typical SCRL gate: NAND.

The reader should keep in mind that the overall structure of a complete SCRL circuit consists of a number of such gates organized into a series of paired forward and reverse stages. Each stage may contain a number of SCRL gates in parallel, driving a bus of wires running between the stages. This large-scale structure is illustrated in figure 2. Adjacent SCRL stages are driven by supply rails whose timing is phase-shifted relative to each other, to cascade data through the pipeline while ensuring that two stages do not both try to drive the bus running between them at the same time.

Figure 2: SCRL pipeline.

There exist a number of different timing disciplines for SCRL, having different numbers of distinct phases. For reference, figure 3 shows an example of a timing diagram for an SCRL inverter in a 3-phase pipeline, but the reader should consult [2, 1] for a detailed description.

Figure 3: Timing diagram for an SCRL inverter.

2 SCRL Switching Energy

2.1 Model

Switching energy is dissipated in an SCRL circuit whenever the voltages on some gate's power supply rails $\phi_H$, $\phi_L$ change. Energy is dissipated within the transistors of the gate's pullup/pulldown networks, and also in the transistors of the transmission gate attached to the gate's output. However, to simplify the analysis, we will lump together all the turned-on transistors within which dissipation occurs during a transition, and treat them as if they were a single transistor, as in figure 4.

We can consider a number of different cases for switching.

A gate's output node voltage may be switched either through the gate's pulldown network of NFETs or through its pullup network of PFETs. And the switching activity may either be to clear the output or to set the output. When an output node is cleared, its voltage goes from a valid level (0 or $V_{dd}$) to the neutral value $V_{dd}/2$; when it is set, its value goes from $V_{dd}/2$ to 0 or $V_{dd}$. However, all these cases are symmetrically similar to each other with regards to how their energy dissipation scales with speed, threshold voltage, and temperature. Therefore, rather than analyzing them all separately, we will just consider one case: where the voltage $V_L$ on the load capacitance $C_L$ on the output node is charged up from 0V to $V_{dd}/2$, through a turned-on NFET which represents the gate's pull-down network and N pass transistor.

Figure 4: Circuit model for SCRL analysis.

We assume that the PFETs and NFETs in the SCRL circuit have been sized so that their gain factors are equal, $k_n = k_p = k$ (matching the rise/fall delay times), and we assume that the PFET and NFET threshold voltages are also equal, $V_{T0n} = V_{T0p} = V_{T0}$, so that the analysis of the dissipation through the pulldown network comes out the same for the pullup network.

In our analysis, we will ignore any dissipation that occurs during switching in transistors along paths that do not actually connect all the way through to the gate's output. For example, referring back to the NAND gate in figure 1, we can see that if in0 is low and in1 is high, then the transistor attached to in1 will be turned on, even though it does not connect through to the output, and so there will be some small dissipation through it. As another example, we ignore energy dissipation that occurs when switching with the transmission gate turned off. Later we will see that ignoring these dissipations is a simplification that is fairly well justified, because these dissipations involve driving relatively small capacitances. In adiabatic charging, there is a quadratic dependence of dissipation on the capacitance being driven. So the total dissipation we are ignoring should not be large, compared to the dissipation that we are including.

2.2 Analysis

To determine the energy dissipation of our model circuit (fig. 4), we would like to know the voltage on the load at each moment during the transition, $V_L(t)$, because this would tell us the instantaneous drain-to-source voltage $V_{DS}(t)$ across the transistor, which we could plug into the device's current-voltage relation to give us the instantaneous current $I(t)$, and thence the instantaneous power, which we could integrate over time to find the total energy dissipation of the transition $E_{tr}$:

\[ E_{tr} = \int_{t=0}^{\infty} P(t)\,dt \tag{1} \]
\[ \phantom{E_{tr}} = \int_{t=0}^{\infty} I(t)\,V_{DS}(t)\,dt \tag{2} \]

Unfortunately, $V_L(t)$ itself is determined by integrating the current $I(t)$ flowing into the load capacitance $C_L$, so that determining closed-form formulas for $I(t)$ and $V_{DS}(t)$ requires solving a tricky differential equation, which we will not attempt here. Instead, we will approximate the energy dissipation by treating the limiting case where the supply rise time $t_r$ is very large compared to the characteristic $RC$ time constant of the circuit, where $R$ is the effective resistance of the turned-on transistor. Cases where the rise time is about as small as $RC$ will not be adequately addressed by the analysis below.

To understand this limiting case, refer to the diagrams in figure 5. Diagram (a) shows qualitatively what would happen if the supply rail were to rise very quickly compared to $RC$. Essentially the output voltage would rise at an exponentially-decaying rate and asymptotically approach the supply voltage, just as happens in a regular CMOS inverter whose input switches very quickly. The energy dissipation $E_{fast}$ for this fast-switching case is well known to be

\[ E_{fast} = \tfrac{1}{2} C_L (\Delta V)^2, \tag{3} \]

which in our case is (with $\Delta V = V_{dd}/2$)

\[ E_{fast} = \tfrac{1}{8} C_L V_{dd}^2. \tag{4} \]

Figure 5: Power supply and output voltage curves for fast (a) and slow (b) charging.

On the other hand, figure 5(b) shows what happens in the case which we will now analyze, where the supply rail rises very slowly. We note that if the input rises slowly, $V_{DS}$ is always small compared to $V_{dd}/2$. The output voltage $V_L$ will initially rise slowly, but as the voltage drop $V_{DS}$ across the transistor increases, the current $I(t)$ through the transistor will also rise, until an equilibrium is reached at which point $V_L$ is rising at the same rate as the input voltage, but lagging behind it by a small amount $V_{DS} = IR$. Then, when the input voltage stops rising, the output voltage will finish the approach to $V_{dd}/2$ in asymptotic fashion, with an $RC$ time constant.

During the transition, $d\phi/dt$ is constant, and so the current $I = C_L\,dV_L/dt$ through the transistor will be approximately constant as well. $I$ will be the quotient of the total charge $Q = C_L V_{dd}/2$ that is transferred to the load capacitance, divided by the supply rail rise time $t_r$, since that is the time during which almost all of this charge is transferred.

\[ I(t) \approx I = Q/t_r = \frac{C_L V_{dd}}{2 t_r} \tag{5} \]
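The two limiting cases above can be illustrated numerically. The following is only a rough sketch: it replaces the turned-on transistor by a fixed linear resistor `r_on` (an assumption made for illustration, not the device model used in this memo), uses normalized component values, and integrates the charging equation with a simple forward-Euler loop. All function and variable names are hypothetical.

```python
def ramp_dissipation(c_load, r_on, delta_v, t_rise, steps=200_000):
    """Energy lost in a linear resistor while a voltage ramp charges c_load.

    The supply ramps linearly from 0 to delta_v over t_rise, then holds.
    Forward-Euler integration of C dV_L/dt = (V_supply - V_L) / R.
    """
    tau = r_on * c_load
    t_end = t_rise + 20.0 * tau          # let the output finish settling
    dt = t_end / steps
    v_load = 0.0
    energy = 0.0
    for step in range(steps):
        t = step * dt
        v_supply = delta_v * min(t / t_rise, 1.0)
        v_drop = v_supply - v_load       # plays the role of V_DS
        current = v_drop / r_on
        energy += current * v_drop * dt  # P(t) dt, as in eqs. 1 and 2
        v_load += current * dt / c_load
    return energy


# Normalized, purely illustrative values (assumptions).
C_LOAD, R_ON, DELTA_V = 1.0, 1.0, 1.0
TAU = R_ON * C_LOAD
T_SLOW = 100.0 * TAU

e_fast = ramp_dissipation(C_LOAD, R_ON, DELTA_V, t_rise=0.01 * TAU)
e_slow = ramp_dissipation(C_LOAD, R_ON, DELTA_V, t_rise=T_SLOW)

# Slow-ramp estimate: constant current I = C dV / t_r (eq. 5) flowing
# through R for a time t_r dissipates I^2 R t_r.
i_const = C_LOAD * DELTA_V / T_SLOW
e_slow_estimate = i_const**2 * R_ON * T_SLOW

print("fast ramp / (C dV^2 / 2):", e_fast / (0.5 * C_LOAD * DELTA_V**2))
print("slow ramp / (I^2 R t_r): ", e_slow / e_slow_estimate)
```

Under these assumptions the fast ramp should reproduce eq. 3 closely, while the slow ramp dissipates far less and tracks the constant-current estimate.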

Now, armed with this constant current $I$, we can use the standard MOSFET triode-regime current-voltage formula to derive a closed form expression for $V_{DS}$.

\[ I = k\left[(V_{GS} - V_T)V_{DS} - \frac{V_{DS}^2}{2}\right] \tag{6} \]

$V_{GS}$ is the gate-to-source voltage, and $V_T$ the threshold voltage. The reason we use the triode-regime rather than the saturation-regime formula is that turned-on transistors in SCRL are never in saturation (see footnote 1). In the following, let's write $V_{GS} - V_T$ as $V_{dr}$ (drive voltage) for conciseness.

\[ I = k\left(V_{dr}V_{DS} - \frac{V_{DS}^2}{2}\right) \tag{7} \]

Footnote 1: This formula may not be appropriate for turned-on transistors if $V_{dd}$ is about as small as the thermal voltage $k_B T/q$, since then even turned-on transistors may only be in moderate or weak inversion, and the current may scale exponentially with $V_{GS}$ rather than according to the triode formula. This is one area where the current analysis needs refinement.

We can easily solve this equation for $V_{DS}$, using the quadratic formula, and the fact that $V_{DS} = 0$ when $I = 0$.

\[ \frac{I}{k} = V_{dr}V_{DS} - \frac{V_{DS}^2}{2} \tag{8} \]
\[ \tfrac{1}{2}V_{DS}^2 - V_{dr}V_{DS} + \frac{I}{k} = 0 \tag{9} \]
\[ V_{DS} = \frac{V_{dr} - \sqrt{(-V_{dr})^2 - 4\cdot\tfrac{1}{2}\cdot\tfrac{I}{k}}}{2\cdot\tfrac{1}{2}} \tag{10} \]
\[ \phantom{V_{DS}} = V_{dr} - \sqrt{V_{dr}^2 - \frac{2I}{k}} \tag{11} \]

Now, let us make a further simplification of eq. 11, as follows. We observe that our earlier approximation, that $I(t)$ was constant, assumed that $t_r$ is large, and therefore that $I$ is small (from eq. 5). $V_{DS}$ will pass through 0 at $I = 0$. We observe that $V_{DS}$ will be approximately linear in $I$ for these small values of $I$, and the slope is given by $dV_{DS}/dI$:

\[ \frac{dV_{DS}}{dI} = \frac{d}{dI}\left[V_{dr} - \sqrt{V_{dr}^2 - \frac{2I}{k}}\right] \tag{12} \]
\[ \phantom{\frac{dV_{DS}}{dI}} = -\frac{1}{2}\left(V_{dr}^2 - \frac{2I}{k}\right)^{-1/2}\left(-\frac{2}{k}\right) \tag{13} \]
\[ \phantom{\frac{dV_{DS}}{dI}} = \frac{1}{k\sqrt{V_{dr}^2 - \frac{2I}{k}}} \tag{14} \]
\[ \phantom{\frac{dV_{DS}}{dI}} \approx \frac{1}{k\sqrt{V_{dr}^2}} \quad \text{(for small } I\text{)} \tag{15} \]
\[ \phantom{\frac{dV_{DS}}{dI}} = \frac{1}{kV_{dr}}. \tag{16} \]

Given this slope, and with $I \ll kV_{dr}^2$, we can therefore simplify eq. 11 to the very concise form

\[ V_{DS} \approx \frac{I}{kV_{dr}}. \tag{17} \]

Now, the drive voltage $V_{dr}$ is itself actually time-dependent, because it is defined in terms of the gate-to-source voltage $V_{GS}$, and although the gate voltage is constant, the transistor source voltage changes linearly over time $t_r$, following $\phi(t)$, from 0 to $V_{dd}/2$. Moreover, $V_T(t)$ as well will vary along with $\phi(t)$, due to the changing body effect as the source voltage changes. Everything except $k$ (the transistor's gain factor) is here implicitly a function of $t$.

\[ V_{dr}(t) \equiv V_{GS}(t) - V_T(t) \tag{18} \]
\[ \phantom{V_{dr}(t)} = (V_G - V_S(t)) - V_T(t) \tag{19} \]
\[ \phantom{V_{dr}(t)} = V_{dd} - \frac{V_{dd}}{2}\frac{t}{t_r} - V_T(t) \tag{20} \]
\[ \phantom{V_{dr}(t)} = (V_{dd} - V_T(t)) - \frac{V_{dd}\,t}{2t_r} \tag{21} \]
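As a quick check on how good the small-current approximation is, the exact root of the triode equation (eq. 11) can be compared against the linearized form (eq. 17). This is only a sketch; the gain factor and drive voltage below are made-up illustrative values, not parameters of any particular process.

```python
import math


def vds_exact(current, k, v_dr):
    """Eq. 11: the root of the triode equation (eq. 7) with V_DS = 0 at I = 0."""
    return v_dr - math.sqrt(v_dr**2 - 2.0 * current / k)


def vds_linear(current, k, v_dr):
    """Eq. 17: small-current approximation V_DS ~ I / (k V_dr)."""
    return current / (k * v_dr)


K_GAIN = 1.0e-4                   # A/V^2, hypothetical gain factor
V_DR = 1.75                       # V, hypothetical drive voltage
I_MAX = 0.5 * K_GAIN * V_DR**2    # current at which the square root in eq. 11 reaches zero

for fraction in (0.001, 0.01, 0.1, 0.5):
    current = fraction * I_MAX
    exact = vds_exact(current, K_GAIN, V_DR)
    approx = vds_linear(current, K_GAIN, V_DR)
    rel_err = (exact - approx) / exact
    print(f"I/I_max = {fraction:5.3f}  exact = {exact:.5f} V  "
          f"linear = {approx:.5f} V  relative error = {rel_err:.2%}")
```

The relative error grows with the current, which is consistent with the restriction of the analysis to slow transitions.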

Using the correct formulas for $V_{GS}$ and $V_T$, the energy integral in equation 2 would still be a bit too complicated to conveniently evaluate, although if we really cared to do it, we could. But instead, let's just make the rough approximation that $V_{dr}(t)$ is constant, and is equal to

\[ V_{dr} \approx \tfrac{3}{4}V_{dd} - b_{avg}V_{T0}, \tag{22} \]

taking the average of the initial ($V_{dd}$) and final ($V_{dd}/2$) values of $V_{GS}(t)$, with an average body-effect factor $b_{avg} = V_T/V_{T0}$ for a typical body-effected $V_T$. For example, when the supply voltage is at $V_{dd}/2$, $V_T$ might be perhaps (as a roughly estimated typical value) 50% above the minimum value $V_{T0}$ that it has when $\phi = 0\,$V.

Anyway, with our approximate constant expressions for $I$ (eq. 5) and $V_{dr}$ (eq. 22), we can consider $V_{DS}$ as given by eq. 17 to be roughly constant, which allows us finally to approximate the transition energy integral (eq. 2) and derive a fairly simple expression for $E_{tr}$ in the slow-transition limiting case. We set the upper bound on the integral to be time $t_r$ rather than $\infty$, in observance of the fact that in the slow-transition limit, most of the energy dissipation occurs by time $t_r$.

\[ E_{tr} = \int_{t=0}^{t_r} I(t)\,V_{DS}(t)\,dt \tag{23} \]
\[ \phantom{E_{tr}} \approx I\,V_{DS}\,t_r \tag{24} \]
\[ \phantom{E_{tr}} = I\,\frac{I}{kV_{dr}}\,t_r \tag{25} \]
\[ \phantom{E_{tr}} = \frac{I^2 t_r}{kV_{dr}} \tag{26} \]
\[ \phantom{E_{tr}} = \left(\frac{C_L V_{dd}}{2t_r}\right)^2 \frac{t_r}{kV_{dr}} \tag{27} \]
\[ \phantom{E_{tr}} = \frac{C_L^2 V_{dd}^2}{4 t_r k V_{dr}} \tag{28} \]

Now, we would like to take another simplifying step, by assuming that our maximum power supply voltage $V_{dd}$ is being scaled proportionately to $V_{T0}$,

\[ V_{dd} = n_{dd}V_{T0} \tag{29} \]

where $n_{dd}$ indicates the scaling factor used for determining $V_{dd}/V_{T0}$. A reasonable value for $n_{dd}$ for SCRL might be 4. SCRL will not work properly if $V_{dd}$ is too close to the threshold voltage $V_{T0}$.

Now, given eqs. 29 and 22, we can substitute $V_{dd}$ and $V_{dr}$ in eq. 28 to re-express it in terms of a single voltage parameter $V_{T0}$, the zero-bias threshold voltage:

\[ E_{tr} = \frac{C_L^2 (n_{dd}V_{T0})^2}{4 t_r k \left(\tfrac{3}{4}n_{dd}V_{T0} - b_{avg}V_{T0}\right)} \tag{30} \]
\[ \phantom{E_{tr}} = \frac{C_L^2 n_{dd}^2 V_{T0}^2}{4 t_r k \left(\tfrac{3}{4}n_{dd} - b_{avg}\right)V_{T0}} \tag{31} \]
\[ \phantom{E_{tr}} = \frac{n_{dd}^2}{3n_{dd} - 4b_{avg}}\,\frac{C_L^2 V_{T0}}{t_r k}. \tag{32} \]

The reason for expressing the body-effected threshold $V_T$ as a multiple of $V_{T0}$ is that it will later allow us to derive a very simple expression for the switching energy. And let us finally just make this a bit more concise by renaming the factor containing $n_{dd}$ as just

\[ c_{dd} \equiv \frac{n_{dd}^2}{3n_{dd} - 4b_{avg}}. \tag{33} \]

To illustrate what a typical value of $c_{dd}$ might be, if $n_{dd} = 4$ and $b_{avg} = 1.25$ (i.e., with an average body-effected threshold 25% above $V_{T0}$), then $c_{dd} = 16/7 \approx 2.29$. Anyway, we can now write the transition energy formula (32) as just

\[ E_{tr} = c_{dd}\,\frac{C_L^2 V_{T0}}{t_r k}. \tag{34} \]
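The chain from eq. 28 to eq. 34 can be checked mechanically. The sketch below evaluates the transition energy twice, once from eq. 28 with the constant drive voltage of eq. 22 and once from the compact form of eq. 34, and confirms that they agree. The component values are hypothetical placeholders chosen only to exercise the formulas.

```python
import math


def c_dd(n_dd, b_avg):
    """Eq. 33: the dimensionless factor collecting n_dd and b_avg."""
    return n_dd**2 / (3.0 * n_dd - 4.0 * b_avg)


def transition_energy(c_load, v_t0, t_rise, k, n_dd=4.0, b_avg=1.25):
    """Eq. 34: slow-transition switching energy of one SCRL transition."""
    return c_dd(n_dd, b_avg) * c_load**2 * v_t0 / (t_rise * k)


def transition_energy_eq28(c_load, v_t0, t_rise, k, n_dd=4.0, b_avg=1.25):
    """Eq. 28, with V_dd from eq. 29 and the constant V_dr from eq. 22."""
    v_dd = n_dd * v_t0
    v_dr = 0.75 * v_dd - b_avg * v_t0
    return c_load**2 * v_dd**2 / (4.0 * t_rise * k * v_dr)


# Hypothetical values: 10 fF load, 0.5 V threshold, 100 ns ramp, k = 1e-4 A/V^2.
C_LOAD, V_T0, T_RISE, K_GAIN = 10e-15, 0.5, 100e-9, 1.0e-4

print("c_dd          =", c_dd(4.0, 1.25))
print("E_tr (eq. 34) =", transition_energy(C_LOAD, V_T0, T_RISE, K_GAIN), "J")
print("E_tr (eq. 28) =", transition_energy_eq28(C_LOAD, V_T0, T_RISE, K_GAIN), "J")
```

Because eq. 34 is quadratic in `c_load`, doubling the load in this sketch quadruples the returned energy.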

There are a couple of very interesting things to note about equation 34. The first thing is that the transition energy in eq. 34 scales in proportion to the square of the load capacitance, in contrast to traditional CMOS where the $CV^2$ dissipation scales only linearly with capacitance. The reason is that higher capacitance means higher currents through our transistors, and thus a larger voltage drop across them, in addition to greater charge to move across that drop. So in designing SCRL circuits we must be even more careful to get load capacitances small than we are in regular CMOS. Unless most of the capacitance is in the interconnects, minimum-sized transistors are favored. If most capacitance is in transistor gates and PN junctions, then increasing transistor widths increases energy dissipation roughly linearly (not quadratically, because $k$ is scaled too). The flip side of this coin is that SCRL benefits greatly from improved process technologies that allow smaller, less capacitive transistors.

The other very interesting point is that given a constant $n_{dd}$ ratio between supply and threshold voltages, and everything else but $V_{T0}$ also constant, the switching energy of SCRL circuits decreases only linearly with decreasing threshold voltage, in contrast to the quadratic drop of traditional CMOS due to its $CV^2$ switching energy. Intuitively, the reason is that as voltages go down in SCRL, the effective on-resistance of our transistors increases, so the voltage drop across the transistors during transitions is increased, causing higher dissipation. In standard CMOS, the voltage drop across the transistors during switching is already as high as possible, and so making them more resistive doesn't affect the dissipation at all.

Equation 34 is interesting and useful on its own, because it allows us to predict the switching energy of SCRL circuits constructed in particular process technologies, and helps guide us in designing these circuits. But now, let's go a little further, and use eq. 34 as part of a more sophisticated analysis of SCRL energy dissipation that includes the effects of leakage.

3 Trading off Switching Energy and Leakage Energy

One often-cited characteristic of the switching energy of adiabatic circuits, based on equations like eq. 34, when compared to equations like eq. 3 that govern the dissipation in fast SCRL transitions or ordinary CMOS transitions, is that it decreases linearly with increasing transition time $t_r$, leading to the conclusion that the energy per operation of SCRL circuits can be made arbitrarily small by just making the transition time larger. However, given current device technologies, this statement is somewhat misleading, because MOS transistors have a significant leakage power dissipation that is always present, and thus contributes a term to total energy per operation that increases linearly with increasing time per operation. This means that there is some speed at which the energy per operation of an SCRL circuit is minimized; at faster speeds, the switching energy dominates, and at lower speeds, the leakage energy dominates. In this section we derive a formula for the optimal rise time for minimizing total energy per operation.

3.1 Adjusting Speed

Let us consider what happens to a signal wire in an SCRL circuit during a complete cycle, from the time it first holds one valid value to the time it first holds the next. During this time there will be two complete transitions on the wire: one from the old value to $V_{dd}/2$, the other from $V_{dd}/2$ to the new value.

The total time for the complete cycle depends on the number of phases in the particular SCRL clocking discipline in question. A complete cycle of the 2-phase SCRL described by Younis [1] is the length of 18 transitions. 3-phase and 4-phase SCRL take 24 transitions, etc. These numbers are probably not minimal. Anyway, let $n_t$ be the number of transitions per cycle; the total cycle time is then $T = n_t t_r$.

Now we can write down an expression for the total energy dissipation associated with this signal wire per complete cycle, including terms for both the transition energy and the leakage energy, where the leakage energy is expressed in terms of $P_{leak}$, the average leakage power associated with the signal wire:

\[ E_{tot} = 2E_{tr} + P_{leak}T \tag{35} \]
\[ \phantom{E_{tot}} = \frac{2c_{dd}C_L^2 V_{T0}}{t_r k} + P_{leak}n_t t_r \tag{36} \]

where the multiplication by 2 comes from the above-mentioned fact that an SCRL wire undergoes two transitions per cycle.

First, let us collapse everything except $t_r$ into coefficients $a$ and $b$:

\[ a \equiv 2c_{dd}C_L^2 V_{T0}/k \tag{37} \]
\[ b \equiv P_{leak}n_t \tag{38} \]
\[ E_{tot} = \frac{a}{t_r} + b\,t_r. \tag{39} \]

Figure 6 shows how the total energy in eq. 39 scales with $t_r$. We can see that at very high values of $t_r$, $E_{tot}$ is high because of the high leakage energy, and at very low values of $t_r$, $E_{tot}$ is high because of the high switching energy. In between, there is a point where the total energy is minimized, which turns out to be where the switching energy equals the leakage energy. We can find a formula for the $t_r$ at this point; it's just where the derivative of eq. 39 equals zero.

Figure 6: How total energy/operation scales with $t_r$ in SCRL.

We want to find the $t_r$ that minimizes $E_{tot}$.

\[ \frac{d}{dt_r}\left(\frac{a}{t_r} + b\,t_r\right) = 0 \tag{40} \]
\[ -\frac{a}{t_r^2} + b = 0 \tag{41} \]
\[ t_r = \sqrt{\frac{a}{b}} = \sqrt{\frac{2c_{dd}C_L^2 V_{T0}}{k P_{leak} n_t}} \tag{42} \]
\[ t_r = C_L\sqrt{\frac{2c_{dd}V_{T0}}{n_t k P_{leak}}} \tag{43} \]

At this minimum-energy setting for $t_r$, the total energy dissipation is:

\[ E_{tot} = \frac{a}{t_r} + b\,t_r \tag{44} \]
\[ E_{min} = \frac{a}{\sqrt{a/b}} + b\sqrt{a/b} \tag{45} \]
\[ \phantom{E_{min}} = \sqrt{\frac{a^2}{a/b}} + \sqrt{b^2\,a/b} \tag{46} \]
\[ \phantom{E_{min}} = \sqrt{ab} + \sqrt{ab} \tag{47} \]
\[ \phantom{E_{min}} = 2\sqrt{ab} \tag{48} \]
\[ \phantom{E_{min}} = 2\sqrt{\frac{2c_{dd}C_L^2 V_{T0} P_{leak} n_t}{k}} \tag{49} \]
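The optimization in eqs. 37 to 48 is easy to reproduce numerically. The following sketch computes the coefficients $a$ and $b$, the optimal rise time, and the minimum energy, and evaluates eq. 39 on either side of the optimum. The load, threshold, gain factor, and leakage power used here are hypothetical numbers picked only to make the script run; they are not measurements.

```python
import math


def energy_coefficients(c_load, v_t0, k, p_leak, n_t, c_dd):
    """Eqs. 37 and 38: switching coefficient a and leakage coefficient b."""
    a = 2.0 * c_dd * c_load**2 * v_t0 / k
    b = p_leak * n_t
    return a, b


def total_energy(t_rise, a, b):
    """Eq. 39: switching term a/t_r plus leakage term b*t_r."""
    return a / t_rise + b * t_rise


def optimal_rise_time(a, b):
    """Eq. 42: the rise time at which d(E_tot)/d(t_r) = 0."""
    return math.sqrt(a / b)


def minimum_energy(a, b):
    """Eq. 48: total energy at the optimal rise time."""
    return 2.0 * math.sqrt(a * b)


# Hypothetical values: 10 fF load, 0.5 V threshold, k = 1e-4 A/V^2,
# 1 pW average leakage per wire, 24 transitions per cycle, c_dd = 16/7.
a, b = energy_coefficients(c_load=10e-15, v_t0=0.5, k=1.0e-4,
                           p_leak=1.0e-12, n_t=24, c_dd=16.0 / 7.0)
t_opt = optimal_rise_time(a, b)
e_min = minimum_energy(a, b)

print("optimal rise time:", t_opt, "s")
print("minimum energy:   ", e_min, "J")
for factor in (0.25, 0.5, 1.0, 2.0, 4.0):
    print(f"t_r = {factor:4.2f} * t_opt -> E_tot / E_min =",
          total_energy(factor * t_opt, a, b) / e_min)
```

At the optimum the two terms of eq. 39 are equal, which is the statement in the text that switching energy equals leakage energy there.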

\[ E_{min} = \left(2\sqrt{2c_{dd}n_t}\right) C_L \sqrt{\frac{V_{T0}P_{leak}}{k}} \tag{50} \]

Looking at eq. 50, if we want the energy per operation of an SCRL circuit to be as low as possible, we will want to first minimize the wiring capacitance and other parasitic capacitances we need to drive. Then we'd want to maximize the gain factor $k$ of our transistors. However, if we try to increase $k$ by making the transistors wider, this also increases the capacitance. So narrower transistors are favored.

3.2 Adjusting Threshold

Ideally we'd like to get a handle on minimum energy by adjusting the threshold voltage, so as to minimize the quantity $V_{T0}P_{leak}$ in eq. 50. But choosing the optimal $V_{T0}$ is actually a bit tricky, since $P_{leak}$ itself depends on $V_{T0}$, in a way which we will now analyze.

We consider a single transistor across which there is a voltage drop of $V_{DS} = V_{dd}$, which we will later see suffices to model the leakage through all the transistors attached to a given SCRL signal wire. In such a transistor, the leakage power $P_{leak}$ is given by

\[ P_{leak} = I_{leak}V_{dd} \tag{51} \]
\[ \phantom{P_{leak}} = I_{leak}n_{dd}V_{T0} \tag{52} \]

and $I_{leak}$ for transistors that are supposed to be "off" ($V_{GS} \le V_T$) is given by a standard formula

\[ I_{leak} = I_0\,e^{(V_{GS} - V_T)/((1+\alpha)k_B T/q)} \tag{53} \]

where $I_0$ denotes the leakage current when the transistor is just barely on the edge of being off (i.e., when $V_{GS} = V_T$), $k_B$ is Boltzmann's constant, $T$ is the absolute temperature, $q$ is the magnitude of the electron charge, and $\alpha$ is a technology-dependent constant fudge factor, which is ideally 0 but in practice is perhaps closer to 1. $\alpha$ is needed because real devices are found empirically to have a greater dependence of leakage on temperature than is predicted by the theoretical ideal.

Now, the leakage in SCRL circuits is not really continuous, but fluctuates during the SCRL cycle as different rails split and merge. In static versions of SCRL such as Younis's 3-phase clocking scheme, we can identify two types of leakage: (1) leakage through the middle of a logic gate across a voltage drop of $V_{dd}$ when the gate's supply rails are split, and (2) leakage through a turned-off pass transistor across a voltage drop of $V_{dd}/2$. All these leakages occur through off devices that have a $V_{GS}$ of zero. During some transitions, there are also leakages across voltage drops smaller than $V_{dd}/2$. Some of these happen when $V_{GS} < 0$, and the others contribute small amounts to the total leakage power. However, off devices with $V_{GS} < 0$ have exponentially less leakage, and so we ignore them.

One may carry out a careful analysis of leakage based on the timing diagram of Younis's 3-phase clocking cycle. We will not relate the analysis in detail here. However, one finds that for each signal wire, there is leakage inside one of the logic gates that drive that wire during $22/24$ of each cycle, and leakage through a pass transistor for about $19/24$ of the cycle (this latter figure is adjusted to take into account the smaller voltage drops that occur during transitions). Further, the $I_0$ for the leakage inside logic gates may be different than the $I_0$ for the leakage through the pass transistors, depending on how the devices are sized relative to each other, and also remembering that if a logic gate is not a simple inverter but rather contains several parallel paths, there may be leakage through all of the paths. However, all of these factors can be incorporated into our definition of the effective $I_0$ for the SCRL signal wire, as follows. Let $I_{0G}$ be the effective $I_0$ in the pullup/pulldown networks of our logic gates (taking into account the widths of devices and number of parallel paths). Let $I_{0P}$ be the $I_0$ through our pass transistors (taking into account their widths). Then we just define the effective $I_0$ for the single-transistor equivalent model of the SCRL signal wire's average leakage as

\[ I_0 = I_{0G}\,\frac{22}{24} + \frac{1}{2}\,I_{0P}\,\frac{19}{24} \tag{54} \]

where the $\tfrac{1}{2}$ compensates for the fact that the leakage through the pass transistors involves a voltage drop of $V_{dd}/2$ rather than $V_{dd}$. This substitution is valid because the other factor in eq. 53 (the exponential) doesn't depend on the magnitude of the $V_{DS}$ voltage drop or on which kind of leakage we are looking at.

We further note that almost all of the leakage takes place when $V_{GS} = 0$ and $V_{SB} = 0$, so that at these times $V_T = V_{T0}$, and since $V_{GS} = 0$ for all the significant leakage, we can substitute $V_{T0}$ for $V_T$ in eq. 53. Also, for conciseness let's define convenient notations for the thermal voltage $k_B T/q$ with and without the $(1 + \alpha)$ fudge factor.

\[ \phi_t \equiv k_B T/q \tag{55} \]
\[ \phi'_t \equiv (1 + \alpha)\,\phi_t \tag{56} \]

Now we can re-express the leakage current as just

\[ I_{leak} \approx I_0\,e^{-V_{T0}/\phi'_t}. \tag{57} \]

Although the above method for estimating $I_{leak}$ was developed for the particular case of static 3-phase SCRL, it is fairly clear that the same approach could be carried out similarly for other SCRL clocking schemes as well, with appropriate modifications to eq. 54. Remember, however, that in 2-phase SCRL, nodes are not always being actively driven, and so in such designs high leakages can harm functionality as well as dissipating power; therefore the analysis later in this section will probably not be appropriate for dynamic 2-phase clocking.

Now that we've gotten $I_{leak}$ expressed in terms of $V_{T0}$, let's merge eqs. 52 & 57 back into our expression for $E_{min}$ (eq. 50):

\[ E_{min} = \left(2\sqrt{2c_{dd}n_t}\right) C_L \sqrt{\frac{V_{T0}P_{leak}}{k}} \tag{58} \]
\[ \phantom{E_{min}} = \left(2\sqrt{2c_{dd}n_t}\right) C_L \sqrt{\frac{V_{T0}(n_{dd}V_{T0})\,I_0\,e^{-V_{T0}/\phi'_t}}{k}} \tag{59} \]
\[ \phantom{E_{min}} = \left(2\sqrt{2c_{dd}n_t n_{dd}}\right) C_L V_{T0} \sqrt{\frac{I_0}{k}}\;e^{-\frac{1}{2}V_{T0}/\phi'_t} \tag{60} \]

To make this formula easier to work with, we'll express the factor involving the SCRL power and timing parameters $n_{dd}$ and $n_t$ as just $s$.

\[ s \equiv 2\sqrt{2c_{dd}n_t n_{dd}} \tag{61} \]
\[ \phantom{s} = 2\sqrt{\frac{2n_t n_{dd}^3}{3n_{dd} - 4b_{avg}}} \tag{62} \]

Further, we note that since $I_0$ and $k$ both scale roughly proportionally to transistor width, the voltage factor $\sqrt{I_0/k}$ is basically independent of transistor width. It scales up with increasing length however (because $k$ scales down proportionally, but $I_0$ does not scale down as much), indicating that SCRL favors designing with minimum-length devices and small gate fan-ins. (Larger fan-ins yield larger effective length.) $\sqrt{I_0/k}$ can be thought of as a width-independent voltage $v_c$ that is characteristic of the particular device technology being used. It can be interpreted as the drive voltage required to turn on a standard-length transistor strongly enough to conduct current at some fixed multiple of the transistor's zero-drive leakage current $I_0$ (see footnote 2).

\[ v_c \equiv \sqrt{I_0/k} \tag{63} \]

Footnote 2: Perhaps $v_c$ is related to the drive voltage needed for strong inversion. I need to look into this sometime.

Given the above definitions, we can re-express the minimum energy as

\[ E_{min} = s\,C_L\,v_c\,V_{T0}\,e^{-\frac{1}{2}V_{T0}/\phi'_t}. \tag{64} \]

Figure 7: How minimum energy/operation scales with $V_{T0}$ in SCRL.

Figure 7 shows qualitatively how $E_{min}$ scales as $V_{T0}$ is changed. Appendix A shows that the maximum point on the curve occurs when $V_{T0} = 2\phi'_t$, i.e., twice the adjusted thermal voltage. Perhaps surprisingly, above a certain point, the minimum energy/op of SCRL actually decreases exponentially as the threshold voltage is increased! This contrasts with the situation in standard CMOS, where higher thresholds mean quadratically larger switching energy. The difference in SCRL is that higher thresholds mean exponentially smaller leakage power, which allows us to run at exponentially slower speeds and still not have leakage dominate the total energy, which thus allows exponentially less energy to be dissipated during our quasistatic charging at high thresholds.

The curve in fig. 7 also suggests that at very low thresholds, the energy/op can be made arbitrarily small as well. However, this part of the curve is probably not accurate.

3.3 Range of Validity

In any case, let's now make an effort to determine some expressions for the range of validity of the above analysis. At thresholds near or below the thermal voltage, a $V_{dd}$ that is only a small fixed multiple of the threshold voltage will probably not be high enough to produce strong inversion, and the square-law equation (6) will probably not accurately represent the source-drain current of our transistors, which will invalidate the assumptions upon which the above analysis was based. Moreover, at low thresholds, the high leakage power will call for a very short rise time from eq. 43; if the rise time is too short, it will not be large compared to the effective $RC$ of our transistors, which will invalidate the assumptions upon which the analysis of section 2.2 was based.

Therefore, assuming we are operating within a regime where the above analysis can be validly applied, the burning question now is, "Is SCRL's minimum energy/operation (as given by eq. 64) actually better than that achievable via voltage scaling in standard CMOS?" That is the question we will address in the next section.

4 Comparison vs. CMOS

Having successfully determined the minimum energy dissipation of SCRL circuits, and how this relationship varies with temperature, we would now like to perform a similar analysis for low-threshold static-CMOS circuits, so that we can compare the results, and determine which circuit technique is actually better for achieving minimal energy dissipation. For reference, figure 8 shows an ordinary static CMOS inverter, which, for our purposes of determining minimum energy, will serve as our model for general CMOS logic gates.
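Before moving on to the CMOS model, the threshold dependence of the SCRL minimum energy can be explored numerically. The sketch below assumes that eq. 64 has the form $E_{min} = s\,C_L\,v_c\,V_{T0}\,e^{-V_{T0}/(2\phi'_t)}$, sets the prefactors $s$, $C_L$ and $v_c$ to 1 (only the shape of the curve matters here), and takes room temperature with a fudge factor of 1 as illustrative assumptions. It then scans a grid of thresholds for the peak, which should land near $2\phi'_t$.

```python
import math

K_B = 1.380649e-23       # Boltzmann's constant, J/K
Q_E = 1.602176634e-19    # magnitude of the electron charge, C


def adjusted_thermal_voltage(temp_kelvin, alpha):
    """Thermal voltage k_B T / q scaled by the (1 + alpha) fudge factor."""
    return (1.0 + alpha) * K_B * temp_kelvin / Q_E


def scrl_min_energy(v_t0, phi_t_adj, s=1.0, c_load=1.0, v_c=1.0):
    """Assumed form of eq. 64; prefactors default to 1 (shape only)."""
    return s * c_load * v_c * v_t0 * math.exp(-0.5 * v_t0 / phi_t_adj)


PHI_T_ADJ = adjusted_thermal_voltage(300.0, alpha=1.0)   # illustrative choice

thresholds = [0.0005 * i for i in range(1, 2001)]        # 0.5 mV to 1 V
energies = [scrl_min_energy(v, PHI_T_ADJ) for v in thresholds]
v_peak = thresholds[energies.index(max(energies))]

print("adjusted thermal voltage:", PHI_T_ADJ, "V")
print("threshold at peak E_min: ", v_peak, "V")
print("E_min(0.8 V) / E_min(0.4 V):",
      scrl_min_energy(0.8, PHI_T_ADJ) / scrl_min_energy(0.4, PHI_T_ADJ))
```

Past the peak, the exponential factor dominates and the energy falls rapidly with increasing threshold, as described for the high-threshold part of figure 7.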

As with our SCRL analysis, we will assume that the pullup and pulldown networks behave equivalently to single PMOS and NMOS transistors, with gain factors $k_n = k_p = k$ which are assumed to be made equal via appropriate sizing (matching the rise and fall times).

Figure 8: Ordinary CMOS inverter.

Figure 9 is a familiar illustration of the dynamic behavior of the CMOS inverter. The input voltage $V_{in}$ rises (the falling case is similar) from 0V to $V_{dd}$ in a time $t_r$. When $V_{in}$ exceeds the threshold voltage $V_{Tn}$ of the NFET, the NFET starts conducting significantly and pulling the output voltage $V_{out}$ low; meanwhile the conductance of the PFET is decreasing, and it turns almost completely off when the input voltage rises above the PFET threshold. As a result, the output voltage $V_{out}$ falls almost all the way to 0V, in a time $t_f$. The falling edge is delayed from the rising edge by an amount $t_d$.

Figure 9: CMOS inverter dynamic behavior.

To simplify the analysis further, we approximate the rise/fall curves with straight lines, as shown in figure 10. Since we presume that the inverter is driven by a similar gate, we suppose that $t_f = t_r$, and that the falling edge has the same shape as the rising edge, and that the gain factors of the N and P devices are the same. These allow us to begin writing down some simple dynamic equations for the system.

Figure 10: Simplified CMOS dynamic model.

Taking $t = 0$ to be the time at the start of the rising edge, the input voltage during the rising edge is given by

\[ V_{in}(t) = \frac{t}{t_r}V_{dd}. \tag{65} \]

Given this, we can derive an expression for the delay $t_d$. Since $t_d$ is the delay between the rising edge starting to rise and the falling edge starting to fall, it should be about equal to the time for the rising edge to exceed the $V_{Tn}$ threshold, because as stated earlier, that is when the NFET starts conducting significantly. Therefore, we can approximate the delay as the time for the input voltage to exceed the NFET threshold $V_{Tn}$. Since the source-to-bulk voltage for these devices is zero, there is no body effect (see footnote 3), unlike in our SCRL analysis, and $V_{Tn} = V_{T0}$, our normal threshold voltage. We will find that $V_{in}(t) = V_{T0}$ at time

\[ t_d = \frac{V_{T0}}{V_{dd}}\,t_r \tag{66} \]
\[ \phantom{t_d} = t_r/n_{dd} \tag{67} \]

where $n_{dd}$ is some standard ratio of $V_{dd}/V_{T0}$ such as 4, just like we had in section 2 (eq. 29). The reason we are interested in $t_d$ for our energy analysis is that it will determine how long we must wait before cycling new inputs into a pipeline of CMOS gates, which will determine the amount of energy that is dissipated each cycle by leakage.

Footnote 3: Actually, this is only true for the inverter; in more complex gates, $V_{SB}$ of some transistors might not be 0 until some interior nodes have been charged to the right level; however if we imagine that interior node capacitance is small we might say this charging happens quickly enough to be ignored.

Now, the transition time $t_r$ of the output is just the time required for the (assumed constant) net output current $I$ to charge up the load capacitance $C_L$ from 0 to $V_{dd}$, i.e., just

\[ t_r = \frac{C_L V_{dd}}{I} \tag{68} \]
\[ \phantom{t_r} = \frac{C_L n_{dd} V_{T0}}{I}. \tag{69} \]

Estimating the net output current is tricky. It is not constant or linear during the transition since the NFET will be turning on, and the PFET off, according to a square law current equation such as eq. 6. And $V_{DS}$ itself is not constant either. At this point we might throw up our hands and give up, but let's keep in mind that really all we care about in this document is to get a picture of the overall scaling laws, with respect to threshold/supply voltage, and so we do not care so much about small constant factor errors in our formulas. So let's just come up with a simple formula for an average $I$ that has the right overall order of magnitude and the right scaling properties. Without further justification, we announce that we will approximate the average net current as just half of the maximum saturation current through the NFET.

\[ I \approx I_{sat}/2. \tag{70} \]

The standard formula for the saturation current (ignoring short-channel effects) is

\[ I_{sat} = \frac{k}{2}(V_{GS} - V_T)^2. \tag{71} \]

As noted earlier, $V_T = V_{T0}$. The maximum value of $V_{GS}$ is $V_{dd} = n_{dd}V_{T0}$. So the maximum saturation current is

\[ I_{sat} = \frac{k}{2}\left[(n_{dd} - 1)V_{T0}\right]^2 \tag{72} \]
\[ \phantom{I_{sat}} = \frac{k}{2}(n_{dd} - 1)^2 V_{T0}^2. \tag{73} \]

So now we have an equation for the average $I$ that shows how it scales with $k$ and $V_{T0}$.

\[ I = \frac{(n_{dd} - 1)^2}{4}\,k V_{T0}^2. \tag{74} \]

This is very crude but suitable for our purposes. Some rough hand-estimates show this formula to be approximately right for the linear model in fig. 10 where $n_{dd} = 4$, but we are not too confident in the accuracy of this formula if $n_{dd}$ is very different from 4. It is left as an exercise for the reader to come up with a more accurate formula for $I$, but it is expected that the overall scaling behavior implied here (except perhaps with regards to $n_{dd}$) will be found to be essentially correct.
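The crude CMOS current and delay estimates above amount to a few lines of arithmetic. The sketch below encodes eqs. 65, 67, 71 and 74 and checks that the average current is half the maximum saturation current and that the input ramp reaches $V_{T0}$ at $t_d$. The gain factor, threshold, and rise time are hypothetical values used only for illustration.

```python
import math


def input_voltage(t, t_rise, v_dd):
    """Eq. 65: straight-line model of the rising input edge."""
    return (t / t_rise) * v_dd


def input_crossing_delay(t_rise, n_dd):
    """Eq. 67: time at which the rising input reaches V_T0."""
    return t_rise / n_dd


def saturation_current(k, v_gs, v_t):
    """Eq. 71: square-law saturation current (no short-channel effects)."""
    return 0.5 * k * (v_gs - v_t)**2


def average_current(k, v_t0, n_dd):
    """Eq. 74: average net output current, taken as half of the maximum I_sat."""
    return (n_dd - 1.0)**2 / 4.0 * k * v_t0**2


# Hypothetical values: k = 1e-4 A/V^2, V_T0 = 0.5 V, n_dd = 4, t_r = 1 ns.
K_GAIN, V_T0, N_DD, T_RISE = 1.0e-4, 0.5, 4.0, 1.0e-9
V_DD = N_DD * V_T0

i_avg = average_current(K_GAIN, V_T0, N_DD)
t_d = input_crossing_delay(T_RISE, N_DD)

print("maximum I_sat:", saturation_current(K_GAIN, V_DD, V_T0), "A")
print("average I:    ", i_avg, "A")
print("delay t_d:    ", t_d, "s")
print("V_in(t_d):    ", input_voltage(t_d, T_RISE, V_DD), "V")
```

Only the scaling with $k$, $V_{T0}$ and $n_{dd}$ is meaningful here; as the text notes, constant factors in these formulas are rough.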

Now, let's go ahead and plug eq. 74 back into eq. 69.

$$ t_r = \frac{C_L\, n_{dd} V_{T0}}{\frac{(n_{dd}-1)^2}{4}\, k V_{T0}^2} \qquad (75) $$
$$ = \frac{4 n_{dd}}{(n_{dd}-1)^2}\,\frac{C_L}{k V_{T0}}. \qquad (76) $$

This says, fairly intuitively, that the rise time scales up proportionally to load capacitance, and scales down with the transistor gain factor. Interestingly, with fixed $n_{dd}$, as $V_{T0}$ goes down, the rise time scales up, making the circuit slower. Contrast this with the situation in SCRL, where the switching rise time for minimum energy actually goes down as the threshold decreases. One might at first think that this implies that SCRL will be faster than CMOS at sufficiently low thresholds. But actually, we will see later that at the lowest feasible thresholds, minimum-energy SCRL is still slower than CMOS, and at high thresholds, minimum-energy SCRL is very much slower than CMOS.

Anyway, let's plug eq. 76 back into eq. 67 to get our new formula for the delay,

$$ t_d = \frac{4}{(n_{dd}-1)^2}\,\frac{C_L}{k V_{T0}}. \qquad (77) $$

Now it's time to begin analyzing the total energy dissipation for a CMOS circuit, as a sum of the switching energy, short-circuit energy, and leakage energy.

$$ E_{tot} = E_{sw} + E_{ss} + E_{leak}. \qquad (78) $$

We saw the general equation for switching energy back in section 2.2, eq. 3. In normal CMOS, the voltage change during switching is $\Delta V = V_{dd}$. We multiply this by an activity factor $\alpha_{sw}$ giving the probability of switching during a given operation to yield the expected switching energy per operation $E_{sw}$.

$$ E_{sw} = \alpha_{sw}\,\tfrac{1}{2}\, C_L V_{dd}^2 \qquad (79) $$
$$ = \alpha_{sw}\,\frac{n_{dd}^2}{2}\, C_L V_{T0}^2. \qquad (80) $$

Now, let's look at the short-circuit energy $E_{ss}$. Short-circuit energy is dissipated by the current that flows through the PFET and the NFET during the period of switching when both devices are turned on, which will happen if we assume that $n_{dd} > 2$. Given our linear model in fig. 10, the length of this period is $(n_{dd} - 2)/n_{dd}$ of the total transition time $t_r$. The current during this transition we will crudely estimate as being the same $I$ (half the saturation current) from eq. 74 that we used in expressing the net output current. (Hey, it's probably within a factor of two; we'll worry about making better approximations in a later version of this document.) The voltage drop across which this current falls is $V_{dd} = n_{dd} V_{T0}$. Also, short-circuit dissipation only occurs if the input actually changes, so we will multiply all this by the activity factor $\alpha_{sw}$ to get the average short-circuit energy:

$$ E_{ss} = \alpha_{sw}\,\frac{n_{dd}-2}{n_{dd}}\; t_r\, I\, n_{dd} V_{T0}. \qquad (81) $$

We can substitute our expressions for $t_r$ (eq. 76) and $I$ (eq. 74) to get

$$ E_{ss} = \alpha_{sw}\,\frac{n_{dd}-2}{n_{dd}} \cdot \frac{4 n_{dd}}{(n_{dd}-1)^2}\,\frac{C_L}{k V_{T0}} \cdot \frac{(n_{dd}-1)^2}{4}\, k V_{T0}^2 \cdot n_{dd} V_{T0} \qquad (82) $$

which simplifies to

$$ E_{ss} = \alpha_{sw}\,\frac{n_{dd}-2}{n_{dd}}\, C_L V_{T0}^2, \qquad (83) $$

showing that the short-circuit energy scales with $CV^2$ just like the switching energy from eq. 80. Thus the sum of switching and short-circuit energy can be conveniently expressed
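The scaling claims around eqs. 76, 77, and 80 (rise time inversely proportional to threshold at fixed $n_{dd}$, switching energy proportional to $C_L V_{T0}^2$) can be checked with a short sketch. The function names and the numerical values are hypothetical illustrations, not values from the memo.

```python
def transition_time(c_load, k, vt0, ndd=4.0):
    """Closed-form rise time of eq. 76."""
    return 4.0 * ndd / (ndd - 1.0) ** 2 * c_load / (k * vt0)


def delay(c_load, k, vt0, ndd=4.0):
    """Closed-form delay of eq. 77, equal to t_r / n_dd."""
    return 4.0 / (ndd - 1.0) ** 2 * c_load / (k * vt0)


def switching_energy(c_load, vt0, ndd=4.0, alpha_sw=0.5):
    """Expected switching energy per operation, eq. 80."""
    return alpha_sw * ndd ** 2 / 2.0 * c_load * vt0 ** 2


# Hypothetical values: k in A/V^2, load in F, thresholds in V.
K_GAIN = 100e-6
C_LOAD = 100e-15

tr_high = transition_time(C_LOAD, K_GAIN, vt0=0.50)
tr_low = transition_time(C_LOAD, K_GAIN, vt0=0.25)
esw_high = switching_energy(C_LOAD, vt0=0.50)
esw_low = switching_energy(C_LOAD, vt0=0.25)

# Halving V_T0 (with n_dd fixed) doubles t_r and quarters E_sw.
print(f"t_r ratio  = {tr_low / tr_high:.3f}")
print(f"E_sw ratio = {esw_low / esw_high:.3f}")
```

The two printed ratios make the tradeoff in standard CMOS concrete: lowering the threshold at a fixed supply ratio saves switching energy but slows the gate.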

as a constant times $CV^2$,

$$ E_{sw} + E_{ss} = \alpha_{sw}\left(\frac{n_{dd}^2}{2} + \frac{n_{dd}-2}{n_{dd}}\right) C_L V_{T0}^2. \qquad (84) $$

Now let's analyze the leakage energy $E_{leak}$, which is the third and final component of the total CMOS energy $E_{tot}$ (eq. 78). As with SCRL, the leakage energy is going to depend on the rate at which we will be clocking the circuit, which we have not yet specified. In SCRL we saw that the tradeoff between leakage and switching energy led to an expression for the optimal cycle length (eq. 43). However, in CMOS, we just saw that $E_{sw} + E_{ss}$ does not depend on the cycle time. The leakage energy, however, will be greater the longer the cycle is, so the total energy will be minimized when the leakage energy is minimized, or in other words when the cycle time is made as short as possible.

Cycle time is partly an architectural issue; it can be decreased by using shorter pipeline stages or by carefully matching delays of parallel circuit paths. However, to ensure correct functionality, the cycle time, being the time from the start of one input transition to the next, will probably not be able to be much smaller than, say, the transition time plus the delay. If we take this as our cycle time, we are saying that our gates will wait until the output reaches a valid level before beginning the next input transition. It is probably being kind to CMOS to make the cycle time this short, so this will make our evaluation of SCRL's benefit a little conservative.

Anyway, let's write the cycle time as described and expand the expression using eqs. 76 and 77:

$$ t_{cyc} \equiv t_r + t_d \qquad (85) $$
$$ = \frac{4(n_{dd}+1)}{(n_{dd}-1)^2}\,\frac{C_L}{k V_{T0}}. \qquad (86) $$

Now, we can write the leakage energy as the leakage current times the supply voltage times the cycle time. I am not going to go through the detailed justification of the leakage current factor below, because it is about the same as we used for SCRL earlier, eq. 54. However, the reader should be aware that the $I_0$ used here will not in general be the same as the effective $I_0$ for the equivalent SCRL circuit which we saw in sec. 3.2.

$$ \phi_t' \equiv (1+\alpha)\,\phi_t \qquad (87) $$
$$ E_{leak} = I_0\, e^{-V_{T0}/\phi_t'}\,(n_{dd} V_{T0})\,\frac{4(n_{dd}+1)}{(n_{dd}-1)^2}\,\frac{C_L}{k V_{T0}} \qquad (88) $$
$$ = \frac{4 n_{dd}(n_{dd}+1)}{(n_{dd}-1)^2}\,\frac{I_0}{k}\, C_L\, e^{-V_{T0}/\phi_t'}. \qquad (89) $$

Now, we can finally write a complete formula for $E_{tot}$.

$$ c_0 \equiv \alpha_{sw}\left(\frac{n_{dd}^2}{2} + \frac{n_{dd}-2}{n_{dd}}\right) \qquad (90) $$
$$ c_1 \equiv \frac{4 n_{dd}(n_{dd}+1)}{(n_{dd}-1)^2}\,\frac{I_0}{k} \qquad (91) $$
$$ E_{tot} = c_0 C_L V_{T0}^2 + c_1 C_L\, e^{-V_{T0}/\phi_t'} \qquad (92) $$
$$ = C_L\left(c_0 V_{T0}^2 + c_1 e^{-V_{T0}/\phi_t'}\right) \qquad (93) $$

Now, to find the point of minimal $E_{tot}$, let's set the derivative of this formula with respect to $V_{T0}$ equal to zero. (We can see that this point is a minimum rather than a maximum by inspecting the graph of the formula.)

$$ \frac{dE_{tot}}{dV_{T0}} = 0 \qquad (94) $$
$$ \frac{d}{dV_{T0}}\left(c_0 V_{T0}^2 + c_1 e^{-V_{T0}/\phi_t'}\right) = 0 \qquad (95) $$
$$ 2 c_0 V_{T0} - \frac{c_1}{\phi_t'}\, e^{-V_{T0}/\phi_t'} = 0 \qquad (96) $$
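The minimization set up in eqs. 90 through 96 can be sketched numerically. The code below is a minimal illustration, assuming hypothetical parameter values (activity factor 0.5, supply ratio 4, an $I_0/k$ of $(70\,\mathrm{mV})^2$, and an effective thermal voltage of 50 mV); the function names are mine, and bisection is simply one convenient way to find the root of eq. 96.

```python
import math


def c0(alpha_sw, ndd):
    """Coefficient of the C*V^2 term, eq. 90."""
    return alpha_sw * (ndd ** 2 / 2.0 + (ndd - 2.0) / ndd)


def c1(i0_over_k, ndd):
    """Coefficient of the leakage term, eq. 91 (i0_over_k in V^2)."""
    return 4.0 * ndd * (ndd + 1.0) / (ndd - 1.0) ** 2 * i0_over_k


def energy_per_farad(vt0, a, b, phi_eff):
    """E_tot / C_L from eq. 93, with a = c0 and b = c1."""
    return a * vt0 ** 2 + b * math.exp(-vt0 / phi_eff)


def optimal_vt0(a, b, phi_eff, lo=1e-6, hi=5.0, iterations=200):
    """Root of eq. 96 by bisection; the slope increases monotonically with vt0."""
    def slope(v):
        return 2.0 * a * v - (b / phi_eff) * math.exp(-v / phi_eff)

    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if slope(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


# Hypothetical parameter values, for illustration only.
A = c0(alpha_sw=0.5, ndd=4.0)
B = c1(i0_over_k=0.070 ** 2, ndd=4.0)
PHI_EFF = 0.050  # effective thermal voltage phi_t', in V

v_opt = optimal_vt0(A, B, PHI_EFF)
r_opt = v_opt / PHI_EFF
print(f"optimal V_T0 = {v_opt * 1e3:.1f} mV, r = {r_opt:.3f}")
```

Evaluating `energy_per_farad` slightly above and below `v_opt` confirms that the stationary point is a minimum, which is the graphical check the text refers to.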

$$ 2 c_0 V_{T0} = \frac{c_1}{\phi_t'}\, e^{-V_{T0}/\phi_t'} \qquad (97) $$
$$ r \equiv V_{T0}/\phi_t' \qquad (98) $$
$$ 2 c_0\, r\, \phi_t' = \frac{c_1}{\phi_t'}\, e^{-r} \qquad (99) $$
$$ \frac{c_1}{2 c_0 \phi_t^{\prime 2}} = r e^{r} \qquad (100) $$
$$ c_2 \equiv \frac{c_1}{2 c_0 \phi_t^{\prime 2}} \qquad (101) $$
$$ c_2 = r e^{r} \qquad (102) $$

We'd like to solve this for $r$, to get a formula for how the optimal ratio of threshold voltage to thermal voltage scales with temperature, but unfortunately, I know of no way to solve eq. 102 analytically. Instead, we must resort to numerical methods to solve for $r$ for particular values of $c_2$.

To find typical values, let's plug in some numbers. With $n_{dd} = 4$,

$$ c_2 = \frac{c_1}{2 c_0 \phi_t^{\prime 2}} \qquad (103) $$
$$ = \frac{\frac{4 n_{dd}(n_{dd}+1)}{(n_{dd}-1)^2}\,\frac{I_0}{k}}{2\,\alpha_{sw}\left(\frac{n_{dd}^2}{2} + \frac{n_{dd}-2}{n_{dd}}\right)\phi_t^{\prime 2}} \qquad (104) $$
$$ = \frac{80}{153}\,\frac{I_0}{k}\,\frac{1}{\alpha_{sw}\,\phi_t^{\prime 2}}. \qquad (105) $$

Let's assume random input bits, so that the probability of switching $\alpha_{sw}$ is 0.5. Additionally, let's assume $\sqrt{I_0/k} = 70\,\mathrm{mV}$, since that was the value I calculated earlier for an inverter in the HP14 process. Let's also plug in a typical value of 1 for $\alpha$, the fudge factor in the leakage current. So I get

$$ c_2 \approx (35.8\,\mathrm{mV}/\phi_t)^2. \qquad (106) $$

At room temperature $c_2$ is about 1.9, and numerical solution of eq. 102 gives $r = 0.83$, or $V_{T0} \approx 21.5\,\mathrm{mV}$. Thus, at room temperature, the optimal threshold voltage for standard CMOS, subject to all the above crude approximations, assumptions, and parameter choices, is slightly below the thermal voltage.

Using the parameter choices above, table 1 shows how $r$ scales among a range of temperatures. We can see that $r$ increases somewhat as the temperature goes down, especially as absolute zero is approached, but for most of the reasonable temperature range, it stays fairly close to 1. So it is fair to say that the optimal threshold voltage for CMOS is close to the thermal voltage for a wide range of temperatures. Interestingly, with the above choices of parameters, the optimal threshold voltage stays very close to 20 mV, peaking at about 21.7 mV at about 250 K, until the temperature goes below about 100 K.

T (K)    r = V_T0/phi_t'    V_T0 (mV)
450      0.51               19.8
400      0.59               20.3
350      0.70               21.1
300      0.83               21.5
250      1.01               21.7
200      1.24               21.4
150      1.58               20.4
100      2.10               18.1
50       3.01               13.0
10       5.71               4.9
1        9.77               0.8

Table 1: How the optimal threshold voltage in CMOS scales with temperature. These numbers were derived through numerical solution of eq. 102.

Now, given these values of $r$ which yield minimal total energy dissipation, we finally have a basis for comparing CMOS's minimum energy dissipation at a given temperature to SCRL's. But first, let's plug in our chosen values of the parameters $\alpha_{sw} = 0.5$ and $n_{dd} = 4$ to make eq. 93 more explicit:

$$ \frac{E_{tot}}{C_L} = \frac{17}{4}\, r^2 \phi_t^{\prime 2} + \frac{80}{9}\,\frac{I_0}{k}\, e^{-r} \qquad (107) $$
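The numerical solution of eq. 102 mentioned above can be sketched with a few lines of Newton iteration. The equation $c_2 = r e^{r}$ is the defining relation of the Lambert W function, so $r = W(c_2)$; the code below avoids any special-function library and solves it directly. The function names are mine, and the constant for $k_B/q$ and the use of eq. 106 for $c_2$ are the only inputs; treat this as an illustrative sketch rather than the method actually used for table 1.

```python
import math

K_B_OVER_Q_MV = 0.0861733  # Boltzmann constant over electron charge, in mV per kelvin


def solve_r(c2, tol=1e-12, max_iter=100):
    """Solve c2 = r * exp(r) for r >= 0 by Newton's method (eq. 102)."""
    r = math.log(1.0 + c2)  # starting guess that lies at or above the root
    for _ in range(max_iter):
        residual = r * math.exp(r) - c2
        step = residual / (math.exp(r) * (1.0 + r))  # derivative of r*e^r is e^r*(1+r)
        r -= step
        if abs(step) < tol:
            break
    return r


def c2_at(temp_kelvin, v_scale_mv=35.8):
    """c2 from eq. 106, with the thermal voltage phi_t = k_B * T / q in mV."""
    phi_t_mv = K_B_OVER_Q_MV * temp_kelvin
    return (v_scale_mv / phi_t_mv) ** 2


for temp in (300.0, 100.0, 10.0):
    print(f"T = {temp:5.0f} K  ->  r = {solve_r(c2_at(temp)):.2f}")
```

Because $r e^{r}$ is increasing and convex for $r \ge 0$, starting at or above the root makes the iteration approach it monotonically, so no safeguarding is needed for the temperature range considered here.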

[... now do similar for SCRL]

5 Conclusion

Duh.

References

[1] S. G. Younis and T. F. Knight, Jr. Asymptotically zero energy split-level charge recovery logic. In International Workshop on Low Power Design, pages 177-182, 1994.

[2] S. G. Younis. Asymptotically Zero Energy Computing Using Split-Level Charge Recovery Logic. PhD thesis, MIT Artificial Intelligence Laboratory, 1994.

Appendix A: Worst-case VT0 for SCRL

In this section we derive and analyze the implications of a formula for the $V_{T0}$ that leads to the maximum energy dissipation for SCRL circuits whose speed is adjusted for minimum energy at the given threshold. I.e., we are finding the maximum point of eq. 64, illustrated in fig. 7.

From eq. 64 we can easily derive the value of $V_{T0}$ that maximizes SCRL's minimum energy, by setting the derivative of $E_{min}$ with respect to $V_{T0}$ equal to zero.

$$ 0 = \frac{dE_{min}}{dV_{T0}} \qquad (108) $$
$$ = \frac{d}{dV_{T0}}\left[ s\, C_L v_c V_{T0}\, e^{-\frac{1}{2} V_{T0}/\phi_t'} \right] \qquad (109) $$
$$ = s\, C_L v_c\, \frac{d}{dV_{T0}}\left[ V_{T0}\, e^{-\frac{1}{2} V_{T0}/\phi_t'} \right] \qquad (110) $$
$$ 0 = \frac{d}{dV_{T0}}\left[ V_{T0}\, e^{-\frac{1}{2} V_{T0}/\phi_t'} \right] \qquad (111) $$
$$ = V_{T0}\,\frac{d\, e^{-\frac{1}{2} V_{T0}/\phi_t'}}{dV_{T0}} + e^{-\frac{1}{2} V_{T0}/\phi_t'}\,\frac{dV_{T0}}{dV_{T0}} \qquad (112) $$
$$ = V_{T0}\,\frac{d\, e^{-\frac{1}{2} V_{T0}/\phi_t'}}{dV_{T0}} + e^{-\frac{1}{2} V_{T0}/\phi_t'} \qquad (113) $$
$$ -e^{-\frac{1}{2} V_{T0}/\phi_t'} = V_{T0}\,\frac{d\, e^{-\frac{1}{2} V_{T0}/\phi_t'}}{dV_{T0}} \qquad (114) $$
$$ = V_{T0}\, e^{-\frac{1}{2} V_{T0}/\phi_t'}\,\frac{d\left(-\frac{1}{2} V_{T0}/\phi_t'\right)}{dV_{T0}} \qquad (115) $$
$$ = V_{T0}\, e^{-\frac{1}{2} V_{T0}/\phi_t'}\,\left(-1/2\phi_t'\right) \qquad (116) $$
$$ V_{T0} = \frac{-e^{-\frac{1}{2} V_{T0}/\phi_t'}}{e^{-\frac{1}{2} V_{T0}/\phi_t'}\,\left(-1/2\phi_t'\right)} \qquad (117) $$
$$ V_{T0} = 2\phi_t' \qquad (118) $$
$$ V_{T0} = 2(1+\alpha)\, k_B T/q \qquad (119) $$

A fascinating result! The minimal energy dissipation of SCRL circuits (with respect to speed) is maximized (with respect to threshold voltage) when the device threshold is equal to exactly twice the thermal voltage $k_B T/q$, when corrected by the technology-dependent fudge factor $(1 + \alpha)$. Note that this result does not depend on what kind of SCRL cycle we are using, or on the load capacitance $C_L$, or on how large the transistors are!

Just for fun, let's now see what we get when we plug eq. 119 back into eq. 64, and call the result $E_{mm}$ since it is the "maximum minimum" energy, being minimized with respect to cycle time and maximized with respect to threshold voltage.

$$ E_{min} = s\, C_L v_c V_{T0}\, e^{-\frac{1}{2} V_{T0}/\phi_t'} \qquad (120) $$
$$ E_{mm} = s\, C_L v_c\, (2\phi_t')\, e^{-1} \qquad (121) $$
$$ s' \equiv 2s/e \qquad (122) $$
$$ E_{mm} = s'\, C_L v_c\, \phi_t' \qquad (123) $$

Note that this worst-case energy is proportional to temperature.

As an example, at room temperature, $\phi_t$ is 26 mV, I estimate the $v_c$ for a simple inverter in the HP14 process to be around 70 mV (see footnote 4), and $s'$ for 2-phase SCRL with $n_{dd} = 4$ and a reasonable body-effect fudge factor to be about 10. Therefore, the maximum possible energy dissipation of SCRL circuits, for an HP14-like process, per complete SCRL cycle of operation (from one input to the next), comes out to be 1.8 fJ/pF, i.e., two femto-Joules per pico-Farad of load capacitance.

Let's go back now a little bit and figure out what the speed is when operating at the worst-case threshold, when the speed is adjusted for minimum energy. From earlier (see eqs. 43, 52, and 119), we had

$$ t_r = \sqrt{\frac{2 c_{dd}\, C_L^2\, V_{T0}}{n_t\, k\, P_{leak}}} \qquad (124) $$
$$ P_{leak} = I_0\, e^{-V_{T0}/((1+\alpha) k_B T/q)}\, n_{dd} V_{T0} \qquad (125) $$
$$ V_{T0} = 2(1+\alpha)\, k_B T/q. \qquad (126) $$

If we expand the first occurrence of $V_{T0}$ in eq. 125, we get

$$ P_{leak} = I_0\, e^{-2}\, n_{dd} V_{T0}, \qquad (127) $$

and if we then plug this value back into eq. 124, we get

$$ t_r = \sqrt{\frac{2 c_{dd}\, C_L^2\, e^2}{k\, I_0\, n_t\, n_{dd}}} \qquad (128) $$
$$ = e\,\sqrt{\frac{2 c_{dd}}{n_t n_{dd}}}\;\frac{C_L}{\sqrt{k I_0}} \qquad (129) $$

which comes out, according to my calculations, as 18.6 ns per picoFarad of load capacitance, for a line that is driven by a minimum-sized NFET in a process like HP14 but with a worst-case threshold voltage. For example, the 170 fF load capacitance per signal I estimated for my Billiard Ball chip comes out to 3 ns per edge, which comes out to 57 ns/cycle or 17 MHz if I used 2-phase clocking (which I didn't, actually; it would be a little worse for 3-phase). This seems pretty reasonable, considering the power comes out to only 0.12 µW per cell of my chip, or only 0.5 mW for a whole (8 mm)², 4000-cell chip, even at the $V_{T0}$ yielding worst-case energy. If the chip were implementing a better-designed architecture that performed 1 MIPS/MHz, rather than the relatively inefficient billiard-ball model, this would be a MIPS/Watt ratio of about 40,000, which is 100 times better than the DEC StrongARM. This seems pretty good.

[When I did the above example calculations, my treatment of leakage currents was less sophisticated than it is now. Also 2-phase clocking is not really appropriate. Need to redo the calculations.]

It is interesting to note that the above speed result is independent of temperature, if the threshold voltage is adjusted as described previously to maximize energy at the given temperature. However, even if this is the case, low-temperature operation is still favored, as can be seen from eq. 64, which gives the minimum energy achievable at a given $V_{T0}$ and operating temperature.

Another interesting thing about eq. 129 is that since $k$ and $I_0$ both scale proportionally to transistor width, the worst-energy speed will also, at least until the transistors start to dominate the load capacitance, at which point additional width doesn't help the speed any further, and starts hurting the energy. The upshot is that we want the transistors to contribute about as much capacitance as the other, unavoidable parts of the load capacitance, such as parasitics between wires. This is pretty intuitive.

Finally, let's derive the power when running at a worst-case threshold voltage but running at the best-case speed.

$$ P = \frac{E_{mm}}{n_t\, t_r} \qquad (130) $$
$$ = \frac{s'\, C_L\, \phi_t' \sqrt{I_0/k}}{n_t\, e\, \sqrt{2 c_{dd}/(n_t n_{dd})}\; C_L/\sqrt{k I_0}} \qquad (131) $$
$$ = \frac{s'\, \phi_t' \sqrt{I_0/k}\,\sqrt{k I_0}}{n_t\, e\, \sqrt{2 c_{dd}/(n_t n_{dd})}} \qquad (132) $$
$$ = \frac{(2s(1+\alpha)/e)\, \phi_t\, I_0}{n_t\, e\, \sqrt{2 c_{dd}/(n_t n_{dd})}} \qquad (133) $$
$$ = \frac{2 \cdot 2\sqrt{2 c_{dd}\, n_t n_{dd}}\;(1+\alpha)\, \phi_t\, I_0}{n_t\, e^2\, \sqrt{2 c_{dd}/(n_t n_{dd})}} \qquad (134) $$
$$ = \frac{4\, n_t n_{dd}\,(1+\alpha)}{n_t\, e^2}\, \phi_t\, I_0 \qquad (135) $$
$$ P = \frac{4 n_{dd}(1+\alpha)}{e^2}\, \phi_t\, I_0 \qquad (136) $$

Interestingly, the power per SCRL signal wire in this case is always just a small constant times the thermal voltage times the effective barely-off leakage current associated with the SCRL wire. What's the deep meaning of this? I don't know.

We should make a couple of qualifying remarks about the relevance of the results presented above. The first is that the entire analysis hinged on the assumption that $t_r$ was large compared to the RC time constant of our gates. However, if the fundamental $I_0$ of our devices (which depends on the technology being used) happens to be very large compared to the gain factor $k$, then $t_r$ for the "worst-case threshold" given by this analysis will be very short, perhaps even so short that it is comparable to RC, in which case the whole analysis will be incorrect, and the transition time predicted may not even be sufficient for correct functionality. My rough hand-calculations indicate that in the HP14 technology, however, this is not the case, and the times are slow enough that the analysis is fairly accurate. However, when applying these results to other technologies, one should be careful to check for this possible problem.

Another very important qualification is that the worst-case threshold voltage stated in eq. 119 may not actually be reliably achievable in a given process technology, due to uncontrollable fluctuations in dopant concentrations, and allowable thresholds must be restricted to be above a certain level. Therefore, functionality may be compromised if we attempt to use the very small threshold that is worst-case for a low operating temperature.

Footnote 4: I'm not sure I estimated $I_0$ correctly from the HP14 manual. Need to recheck this!
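Two of the Appendix A claims lend themselves to a quick numerical sanity check: that the $V_{T0}$-dependent factor of eq. 64 peaks at $V_{T0} = 2\phi_t'$ (eq. 118), and that the example chip figures are mutually consistent. The sketch below is only illustrative; the function name is mine, the value $\alpha = 1$ is an assumed fudge factor, and the chip numbers are simply the ones quoted in the example (18.6 ns/pF, 170 fF, 57 ns/cycle, 0.12 µW per cell, 4000 cells, 1 MIPS/MHz).

```python
import math


def scrl_min_energy_shape(vt0, phi_eff):
    """V_T0-dependent factor of eq. 64: V_T0 * exp(-V_T0 / (2 * phi_eff))."""
    return vt0 * math.exp(-vt0 / (2.0 * phi_eff))


# Assumed effective thermal voltage: (1 + alpha) * 26 mV with alpha = 1.
PHI_EFF = 0.052  # in V

# Brute-force search for the worst-case threshold on a 0.1 mV grid up to 500 mV.
grid = [i * 1e-4 for i in range(1, 5001)]
worst_vt0 = max(grid, key=lambda v: scrl_min_energy_shape(v, PHI_EFF))
print(f"worst-case V_T0 = {worst_vt0 * 1e3:.1f} mV, 2*phi_t' = {2e3 * PHI_EFF:.1f} mV")

# Arithmetic behind the example chip figures quoted in the text.
edge_ns = 18.6 * 0.170               # ns per edge for a 170 fF load
clock_mhz = 1e3 / 57.0               # 57 ns per cycle
chip_mw = 0.12e-3 * 4000             # 0.12 uW per cell times 4000 cells, in mW
mips_per_watt = clock_mhz / (chip_mw * 1e-3)  # assuming 1 MIPS per MHz
print(f"{edge_ns:.1f} ns/edge, {clock_mhz:.1f} MHz, {chip_mw:.2f} mW, "
      f"{mips_per_watt:.0f} MIPS/W")
```

The grid search agrees with the closed-form result of eq. 118 to within the grid spacing, and the arithmetic reproduces the rounded figures in the example (about 3 ns per edge, 17 MHz, 0.5 mW, and a MIPS/Watt ratio in the high tens of thousands).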
