Professional Documents
Culture Documents
(6)
Fig. 2. Delay of typical library gates over a wide voltage range, normalized to (7)
inverter delay.
where
that, from an energy efficiency point of view, the minimum oper- activity factor;
ating voltage for functionally correct operation does not provide number of stages;
the best results. switching energy of a single inverter;
total leakage power of the entire inverter chain;
III. MINIMUM ENERGY ANALYSIS delay of the entire inverter chain;
total switched capacitance of a single inverter;
A. Minimum Energy Point Modeling leakage current of a single inverter;
We first illustrate the energy dependence on supply voltage delay of a single inverter.
using a simple inverter chain consisting of 50 inverters and then Note that we assume that short-circuit power is negligible and
extend the analysis for more general circuits. A single transition can be ignored. This assumption is known to hold for well-
is used as a stimulus and energy is measured over the time period designed circuits in normal (super-threshold) operation [14].
necessary to propagate the transition through the chain. (Note Using the method in [15], we measured the short-circuit cur-
that simultaneous switching noise is also an important issue in rent for an inverter chain over a wide range of and have
DVS energy efficiency, and we will not address it in this paper found that the short-circuit energy percentage is less than 9% at
for simplicity.) The energy– relation is plotted in Fig. 3. The and even lower as is reduced further, which is smaller
dynamic energy component reduces quadratically while than that at superthreshold, as shown in Fig. 4. Thus, we ig-
the leakage energy increases with voltage scaling. The nore the short-circuit component in energy modeling. Although
reason for the increase in leakage energy in the subthreshold op- rise and fall time increases almost exponentially with the re-
erating regime is that as the voltage is scaled below the threshold duced in subthreshold operation, the average short-circuit
voltage, the on-current (and hence the circuit delay) decreases current also scales down almost exponentially with . There-
exponentially with voltage scaling, while the off-current is re- fore, short-circuit energy does not increase in subthreshold and,
duced less strongly. Hence, the leakage energy will rise in fact, diminishes because of the leakage energy increase.
and supersede the dynamic energy at about 180 mV. This First, we focus on finding an accurate estimate of .
effect creates a minimum energy point (referred to as ) in Let denote the ideal inverter delay with a step input
the inverter circuit that lies at 200 mV, as shown in Fig. 3. and denote the actual inverter delay with an input
In the previous example, we are implicitly assuming that there rising time of . We can compute based on a simple
is always one input transition per clock cycle. However, the charge-based expression
switching activity varies in different circuits and therefore we
include the input activity factor , which is the average number (8)
of times the node makes a power consuming transition in one
clock period. We now derive an analytical expression for the
energy of an inverter chain as a function of the supply voltage. where is the average on-current of an inverter. can be
Suppose we have an -stage inverter chain with activity factor estimated with alpha-power law [5] in superthreshold and takes
of . The standard expression for subthreshold current is given the form of (5) (with and substituted by ) in the sub-
by2 threshold regime. Furthermore, for normal operating voltages,
the step delay can be extended to the actual delay as follows
[21]:
(5)
(12)
(13)
However, we need to test if this linear model is valid for sub-
threshold operation. To justify the linear modeling of Note that here is subthreshold “on” current because we are
with at such a wide supply voltage range, we plot the focusing on the subthreshold region where occurs. By sub-
calculated as a function of , based on SPICE simulation in stituting (5) into (13) ( is obtained by substituting and
Fig. 5. in (5) with 0 and , respectively), we now arrive at our
From Fig. 5, it is clear that the coefficient increases as the final expression for the total energy as a function of supply
supply voltage is reduced to the subthreshold regime. Other fac- voltage for subthreshold operation
tors affecting the accuracy are that (5) does not perfectly model
in subthreshold operation3 and that voltage swing degrades (14)
3We find that, over the entire subthreshold region, (0 < V < V ); I
deviates from the simple exponential (5) by up to 20% if we treat mobility as Based on this simple expression of total energy, we can
constant. find the optimal minimum energy voltage by setting
ZHAI et al.: LIMIT OF DVS AND INSOMNIAC DVS 1243
. Letting and , we
obtain
(15)
(16)
Fig. 6. Energy–V for an inverter chain (n = 20).
(17)
Fig. 8. Minimal energy V with inverter effective stage number n . Fig. 10. Minimal energy V with NAND2 effective stage number n .
(18)
Fig. 9. NAND2 chain energy–V (SPICE). where the average leakage current ratio and on-current ratio are
from SPICE simulation.
Using this , we obtain an accurate match between the
with a high degree of accuracy for a wide range of effective in- modeled and SPICE simulation, as shown in Fig. 10. Other
verter chain lengths . complex gates can be modeled in a similar way by computing
We now consider the energy optimal voltage for more com- an appropriate value.
plex gates, such as NAND and NOR, as well as larger circuit This approach can be extended to larger circuit blocks as well.
blocks. Fig. 9 shows results of SPICE simulations for a NAND2. In Fig. 11, we show the total energy as a function of supply
Similar to the inverter chain case in Section III-A, a single transi- voltage obtained using SPICE for a 16 16 multiplier with ac-
tion is used as a stimulus and energy is measured over the signal tivity factor , where is the average switching activity
propagation through the chain. As can be seen, the minimum over all of the nodes in the circuits. The multiplier was obtained
voltage shifts right compared to the inverter chain, indi- by synthesizing the ISCA85 benchmark circuit c6288 [22] with
cating that the energy optimal voltage occurs at a higher voltage. an industrial 0.18- m process using a wireload model for par-
This is caused by the fact that, for a chain of NAND2 gates, the asitics. We estimate the total power consumption for large cir-
number of leaking PMOS transistors is doubled in every other cuit blocks such as this by extending the expression in (14) as
gate and NMOS transistors are twice the size to achieve com- follows:
parable performance. The capacitance increase does not affect
because the delay and switching energy are proportional (19)
to the loading . This is evident in (13) (if we write out a (20)
ZHAI et al.: LIMIT OF DVS AND INSOMNIAC DVS 1245
The dynamic performance management policy is based on exactly sufficient to just complete every task, we can compute
Vertigo [9] and ARM’s Intelligent Energy Manager.4 The distri- the optimal energy consumption with different values.
bution of the four available performance levels (with a highest The total energy is based on the proposed energy model of
frequency of 600 MHz) among the executed tasks is shown Section III-A for subthreshold voltage operation, combined with
in Fig. 12 for each application. This workload distribution is a simple fitted model for energy and delay at super-threshold op-
recorded from a real DVS processor running applications. As erating voltages. Note that we do not consider the sleep–wakeup
the bar graph shows, the processor spends significant time in energy overhead now since a more detailed energy model for
sleep mode, meaning that the processor completes many tasks this will be presented in Section IV. We show the
well ahead of schedule. Most importantly, we observed that, tradeoff for the first five applications in Fig. 14. As can be seen,
during the execution of all tasks, a run-then-idle pattern was the energy efficiency improves as the is reduced and levels
seen 50% of the time. This implies that many tasks could run off for most applications below 10%, which corresponds to a
at a frequency less than the minimum (50%) available on the of 30.7% (553 mV for a of 1.8 V). We also
processor if it was able to do so. Therefore, we convert these analyze the tradeoff for the idle-mode trace, in
traces to ideal continuous distributed performance levels and ob- which the processor is mostly in sleep mode and wakes up only
tain the histogram shown in Fig. 13. The histograms show that to do regular “housekeeping” chores for the operating system.
emacs, fs, and netscape spend most of their execution time in Note that this state can be quite common on a processor. The
the low-performance end. results are show that the energy continues to reduce down to a
By extending the lower limit of voltage scaling, the amount of performance level of 0.02%, corresponding to a of
idle time can be reduced, leading to more energy-efficient op- 13% (234 mV for a of 1.8 V). In such low-activity situa-
eration. Based on the previous analysis, energy efficiency can tions, the practical value approaches the theoretical
increase until it reaches the energy optimal voltage . In ad- levels of Section III-A. The energy savings of a more voltage
dition, by eliminating the need to enter a sleep state, any en- scalable processor over the traditional one are summarized in
ergy overhead due to switching to and from sleep mode is also Table II, demonstrating that substantial energy savings can be
avoided, further increasing the energy efficiency. obtained by extending the voltage range appropriately.
We therefore study the total energy consumption of the pro-
cessor as a function of the lower limit of the performance that the IV. ENERGY EFFICIENCY EVALUATION OF DIFFERENT
processor provides, denoted by . Assuming that we have an LOW-POWER APPROACHES
ideal performance scheduler that is able to set the performance
A. Detailed Energy Modeling
4[Online]. Available: http://www.arm.com/products/CPUs/cpu-arch-IEM. In this section, we compare the energy efficiency of different
html low power design approach. In the analysis we include the over-
ZHAI et al.: LIMIT OF DVS AND INSOMNIAC DVS 1247
(25)
where is the overhead energy when gating the
head that each specific low power technique incurs, as well as power supply, is the virtual supply rail capacitance,
the efficiency of the dc–dc voltage converter. First, we define is the total internal node capacitance of the circuit,
five different systems: and is the gate capacitance of the sleep transistors.
arises from three sources: the power rail charge loss,
• , a basic system with clock gating but without the circuit internal node charge loss, and sleep transistor gate
power-gating or DVS; charge needed to conduct power gating. Depending on whether
• , a system that employs power-gating during idle a header or footer transistor is used in , the processor
mode (clock gating is implied) but no DVS; power-gating will lose the voltage level on virtual or virtual ground,
has been an effective approach in leakage power reduction respectively, shortly after it enters sleep mode [28]. Without
[25]; loss of generality, we use the average between virtual ground
• , a partial DVS system with power-gating capability and virtual capacitance in our calculations. To consider the
where the minimum scalable voltage is , set to be internal node capacitances, we assume that half of the internal
; nodes are at each logic state .
• , a system similar to but without power- According to recent research [28], [29], the virtual power rail
gating; can be restored in several cycles if the circuit is carefully de-
• , an Insomniac system with aggressive voltage signed. Thus, the wakeup process can be treated as instanta-
scaling capability, down to the energy optimal voltage; neous compared to the thousands/millions of cycles of useful
note that in this system power grating is not necessary, operation.
since processor always runs in “just in time” completion In order to model and , we must know how a
mode. DVS system implements the voltage scaling process. There are
1248 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 13, NO. 11, NOVEMBER 2005
(30)
TABLE III where the is defined as the converting loss coefficient for
PHYSICAL PARAMETERS FROM AN ALPHA PROCESSOR IN A 0.18-m a specific regulator. Since the is small for nominal con-
TECHNOLOGY (WITHOUT CACHES CONSIDERED)
verting region, we can get the nominal converting efficiency as
(35)
Then, a simple regulator efficiency model is developed as
(36)
(32)
(33) (37)
In the following section, we include the regulator loss factor into
where is the converter conduction loss, directly depen- our computation by means of (37).
dent on load current and is the energy loss during the
transistor switch-on and switch-off transitions, dependent on the C. Energy Efficiency Evaluation Results
switching frequency , and is the fixed loss, depen-
dent on neither load current nor switching frequency. With energy models derived for the various low-power
With a more efficient regulator [30]–[32], such as a vari- schemes, we can now evaluate their energy efficiency under
able-frequency regulator, can be made to scale with load different applications. We studied the same applications as
current . Given the fact that regulator efficiency is nearly in Section III, i.e., emacs, konqueror, netscape, fs, and mpeg.
constant over a wide loading range [30], we can assume that the To make a fair comparison, we convert these traces to match
conduction loss and the switching loss scale in the Alpha processor that we are using in the analysis. By
the same way as . That is, applying four different low-power schemes to these workloads,
we computed the energy savings relative to based on
(34) the models in Section IV-A. The results are shown in Fig. 18.
5In the context of DVS, we usually need a buck dc–dc converter to realize
As the bar graph shows, if the voltage cannot scale as low
down-converting, therefore all the modeling in this section is based on an off- as the applications request , it is helpful to utilize
chip buck dc–dc converter. power-gating to save leakage energy. However, the
1250 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 13, NO. 11, NOVEMBER 2005
[5] T. Sakurai and A. Newton, “Alpha-power law MOSFET model and its
applications to CMOS inverter delay and other formulas,” IEEE J. Solid-
State Circuits, vol. 25, no. 2, pp. 584–594, Apr. 1990.
[6] B. Zhai, S. Hanson, D. Blaauw, and D. Sylvester, “Analysis and mit-
igation of variability in subthreshold design,” in Proc. Int. Symp. Low
Power Electronics and Design (ISLPED), 2005, pp. 20–25.
[7] M. Miyazaki, J. Kao, and A. Chandrakasan, “A 175 mV multiply-ac-
cumulate unit using an adaptive supply voltage and body bias (ASB)
architecture,” in Proc. Int. Solid-State Circuits Conf. (ISSCC), 2002, pp.
58–59.
[8] A. Wang and A. Chandrakasan, “A 180 mV FFT processor using sub-
threshold circuits techniques,” in Proc. Int. Solid-State Circuits Conf.
(ISSCC), 2004, pp. 292–294.
[9] K. Flautner and T. Mudge, “Vertigo: Automatic performance-setting for
linux,” in Proc. 5th Symp. Operating Systems Design Implementation,
Dec. 2002, pp. 105–116.
[10] J. D. Meindl and J. A. Davis, “The fundamental limit on binary switching
energy for terascale integration (TSI),” IEEE J. Solid-State Circuits, vol.
35, no. 10, pp. 1515–1516, Oct. 2000.
[11] R. M. Swanson and J. D. Meindl, “Ion-implanted complementary MOS
transistors in low-voltage circuits,” IEEE J. Solid-State Circuits, vol. 7,
Fig. 23. Energy savings with S for general workload. no. 4, pp. 146–153, Apr. 1972.
[12] F. Miller, “Algorithm and architecture of a 1 V low power hearing in-
strument DSP,” in Proc. Int. Symp. Low Power Electronics and Design
wide range of activity, and gives significant savings over (ISLPED), Aug. 1999, pp. 7–11.
[13] H. Soeleman, K. Roy, and B. Paul, “Robust ultra-low power
or . Only when activity is very low does the energy sav- sub-threshold DTMOS logic,” in Proc. Int. Symp. Low Power Elec-
ings become saturated, which occurs since the leakage domi- tronics and Design (ISLPED), 2000, pp. 25–30.
nates the overall system energy consumption. The system has [14] J. Rabaey, Digital Integrated Circuits: A Design Perspective. Upper
Saddle River, NJ: Prentice-Hall, 1996.
scaled below at this point, as described in Section III. [15] N. H. E. Weste and K. Eshraghian, Principles of CMOS VLSI Design, a
Systems Perspective, 2nd ed. Reading, MA: Addison-Wesley, 2000.
[16] H. Soeleman and K. Roy, “Ultra-low power digital subthreshold logic
V. CONCLUSION circuits,” in Proc. Int. Symp. Low Power Electronics and Design
(ISLPED), 1999, pp. 94–96.
In this paper, we developed analytical models for the most [17] S. C. Terry, J. M. Rochelle, D. M. Binkley, B. J. Blalock, and D. Foty,
energy efficient supply voltage for CMOS circuits. A “Comparison of a BSIM3v3 and EKV MOSFET model for a 0.5 m
CMOS process and implications for analog circuit design,” IEEE Trans.
number of interesting conclusions are drawn: 1) energy shows Nucl. Sci., vol. 50, no. 4, pp. 915–920, Aug. 2003.
a clear minimum in the subthreshold region since the time over [18] H. Soeleman and K. Roy, “Digital CMOS logic operation in the sub-
which a circuit is leaking (delay) grows exponentially in this re- threshold region,” in Proc. Great Lakes Symp. VLSI, Mar. 2000, pp.
107–112.
gion while leakage current itself does not drop as rapidly with [19] A. Forestier and M. R. Stan, “Limits to voltage scaling from the low
reduced ; 2) does not depend on if is smaller power perspective,” in Proc. 13th Symp. Integrated Circuits and Systems
than ; 3) the circuit logic depth and switching factor impact Design, Sept. 2000, pp. 365–370.
[20] A. Wang, A. P. Chandrakasan, and S. V. Kosonocky, “Optimal supply
since they relate to the relative contributions of leakage and threshold scaling for subthreshold CMOS circuits,” in Proc. IEEE
energy and active energy; and 4) the only technology param- Symp. VLSI, Apr. 2002, pp. 5–9.
eters relevant to are subthreshold swing and the depen- [21] D. Hodges and H. Jackson, Analysis and Design of Digital Integrated
Circuits. New York: McGraw-Hill, 1988.
dency of delay on input transition time. The proposed analytical [22] F. Brglez and H. Fujiwara, “A neural netlist of 10 combinational circuits
models are shown to match very well with SPICE simulations. and a target translator in fortran,” in Proc. IEEE ISCAS, Jun. 1985, pp.
We then compare the energy savings of different low-power 663–698.
[23] T. D. Burd and R. W. Brodersen, “Design issues for dynamic voltage
schemes, namely, pure MTCMOS, partial-DVS, partial-DVS scaling,” in Proc. Int. Symp. Low Power Electronics and Design
with MTCMOS, and Insomniac. The comparison for five ap- (ISLPED), 2000, pp. 9–14.
plications traces recorded on two commercial processors shows [24] R. Graybill and R. Melhem, Eds., Power Aware Computing. New
York: Kluwer Academic/Plenum, 2002.
that Insomniac provides the best efficiency. For instance, it can [25] W. Liao, J. M. Basile, and L. He, “Leakage power modeling and re-
provide 27% energy savings for emacs over the traditional DVS- duction with data retention,” in Proc. IEEE/ACM ICCAD, 2002, pp.
with-MTCMOS design. A comparison for arbitrary workloads 714–719.
[26] T. Burd and R. Brodersen, Eds., Energy Efficient Microprocessor De-
shows that for the majority of different application activity ratios sign. Norwell, MA: Kluwer.
Insomniac continues to yield the largest energy improvements. [27] K. Nowka et al., “A 32-bit PowerPC system-on-a-chip with support for
dynamic voltage scaling and dynamic frequency scaling,” IEEE J. Solid-
State Circuits, vol. 37, no. 11, pp. 1441–1447, Nov. 2002.
REFERENCES [28] J. Tschanz et al., “Dynamic sleep transistor and body bias for active
leakage power control of microprocessors,” IEEE J. Solid-State Circuits,
[1] Transmeta Crusoe. [Online]. Available: http://www.transmeta.com/ vol. 38, no. 11, pp. 1838–1845, Nov. 2003.
[2] Intel XScale. [Online]. Available: http://www.intel.com/design/intelxs- [29] S. Kim, S. Kosonocky, and D. Knebel, “Understanding and minimizing
cale/ ground bounce during mode transition of power gating structures,” in
[3] IBM PowerPC. [Online]. Available: http://www.chips.ibm.com/prod- Proc. Int. Symp. Low Power Electronics and Design (ISLPED), Aug.
ucts/powerpc/ 2003, pp. 22–25.
[4] K. Flautner, S. Reinhardt, and T. Mudge, “Automatic performance set- [30] R. Erickson and D. Maksimovic, “High efficiency DC-DC converters for
ting for dynamic voltage scaling,” in Proc. 7th Annu. Int. Conf. Mobile battery-operated systems with energy management,” Worldwide Wire-
Computing and Networking (MobiCom’01), May 2001, pp. 260–271. less Communications, Annu. Rev. Telecommunications, 1995.
1252 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 13, NO. 11, NOVEMBER 2005
[31] A. Dancy and A. Chandrakasan, “Techniques for aggressive supply Dennis Sylvester (S’95–M’00–SM’04) received the
voltage scaling and efficient regulation,” in Proc. IEEE Custom Inte- B.S. degree (summa cum laude) from the University
grated Circuits Conf. (CICC), 1997, pp. 579–586. of Michigan, Ann Arbor, in 1995, and the M.S. and
[32] J. Kim and M. A. Horowitz, “An efficient digital sliding controller for Ph.D. degrees from the University of California,
adaptive power-supply regulation,” IEEE J. Solid-State Circuits, vol. 37, Berkeley (UC-Berkeley), in 1997 and 1999, respec-
no. 5, pp. 639–647, May 2002. tively, all in electrical engineering.
[33] A. Chandrakasan, W. J. Bowhill, and F. Fox, Eds., Design of High-Per- He is now an Associate Professor of electrical en-
formance Microprocessor Circuits. Piscataway, NJ: IEEE Press, 2001. gineering with the University of Michigan. He previ-
[34] R. Hegde and N. Shanbhag, “Toward achieving energy efficiency in pres- ously held research staff positions with the Advanced
ence of deep submicron noise,” IEEE Trans. Very Large Scale Integr. Technology Group of Synopsys, Mountain View, CA,
(VLSI) Syst., vol. 8, no. 4, pp. 379–391, Aug. 2000. and with Hewlett-Packard Laboratories, Palo Alto,
CA. He has published numerous papers along with one book and several book
chapters in his field of research, which includes low-power circuit design and
design automation techniques, design-for-manufacturability, and on-chip inter-
connect modeling. He also serves as a consultant and technical advisory board
Bo Zhai (S’02) received the B.S. degree in micro- membor for several electronic design automation firms in these areas.
electronics from Peking University, Bejing, China, in Dr. Sylvester is a member of the Association for Computing Machinery
2002 and the M.S. degree in electrical engineering (ACM), the American Society of Engineering Education, and Eta Kappa Nu.
from the University of Michigan, Ann Arbor, in 2004, He was the recipient of a National Science Foundation CAREER Award, the
where he is currently working toward the Ph.D. de- 2000 Beatrice Winner Award at ISSCC, the 2004 IBM Faculty Award, and
gree in electrical engineering. several Best Paper Awards and nominations. He was the recipient of the ACM
He is a Research Assistant with the Advanced SIGDA Outstanding New Faculty Award, the 1938E Award from the College
Computer Architecture Laboratory, University of Engineering for teaching and mentoring, and the Henry Russel Award,
of Michigan, working with Prof. D. Blaauw. His which is the highest award given to faculty at the University of Michigan.
research focuses on low-power VLSI design. His dissertation research was recognized with the 2000 David J. Sakrison
Memorial Prize as the most outstanding research in the Electrical Engineering
and Computer Science Department of UC-Berkeley. He has served on the tech-
nical program committees of numerous design automation and circuit design
conferences and was General Chair of the 2003 ACM/IEEE System-Level
Interconnect Prediction (SLIP) Workshop and the 2005 ACM/IEEE Workshop
David Blaauw (M’93) received the B.S. degree in on Timing Issues in the Synthesis and Specification of Digital Systems (TAU).
physics and computer science from Duke University, He is currently an Associate Editor for the IEEE TRANSACTIONS ON VERY
Durham, NC, in 1986, the M.S. degree in computer LARGE SCALE INTEGRATION (VLSI) SYSTEMS. He also helped define the circuit
science from the University of Illinois, Urbana, in and physical design roadmap of the International Technology Roadmap for
1988, and the Ph.D. degree in computer science from Semiconductors (ITRS) U.S. Design Technology Working Group from 2001
the University of Illinois, Urbana, in 1991. to 2003.
He was with IBM Corporation as a Development
Staff Member until August 1993. From 1993 until
August 2001, he was with Motorola, Inc., Austin, TX,
where he was the Manager of the High Performance Krisztian Flautner (M’00) received the Ph.D. de-
Design Technology Group. Since August 2001, he gree in computer science and engineering from the
has been a member of the faculty at the University of Michigan as an Associate University of Michigan, Ann Arbor.
Professor. His work has focused on VLSI design and computer-aided design He is the Director of Advanced Research at
with particular emphasis on circuit design and optimization for high-perfor- ARM Ltd., Cambridge, U.K., and the architect of
mance and low-power designs. ARM’s Intelligent Energy Management technology.
Dr. Blaauw was the Technical Program Chair and General Chair for the Inter- His research interests include high-performance,
national Symposium on Low Power Electronic and Design in 1999 and 2000, low-power processing platforms to support advanced
respectively, and was the Technical Program Co-Chair and member of the Ex- software environments.
ecutive Committee the ACM/IEEE Design Automation Conference in 2000 and Dr. Flautner is a member of the Association of
2001. Computing Machinery.