You are on page 1of 22

Power

6.371 – Fall 2002

10/25/02

L15 – Power

1

1000 100 Power (Watts) 10 1
8086 8080

Power Trends
Pentium® 4 proc

1000W CPU?

Pentium® proc 386

0.1 1970

1980

1990

2000

2010

2020

[ Source: Intel ]

CMOS originally used for very low-power circuitry such as wristwatches Now some CPUs have power dissipation >100W
6.371 – Fall 2002 10/25/02 L15 – Power 2

Itanium Temperature Plot Cache 70°C Execution core Temp (oC) 120oC Integer & FP ALUs [ Source: Intel ] 6.371 – Fall 2002 10/25/02 L15 – Power 3 .

371 – Fall 2002 10/25/02 L15 – Power 4 .~2MW – 25% of running cost is in electricity supply for supplying power and running air-conditioning to remove heat Environmental concerns – ~2003. 100W each => – 100 GW = 40 Hoover Dams 100 GW 6.Power Concerns Power dissipation is limiting factor in many systems – – – – battery weight and life for portable devices packaging and cooling costs for tethered systems case temperature for laptop/wearable computers fan noise not acceptable in some settings Internet data center.000 servers. 1 billion PCs. ~8.

PMOS & NMOS both on during transition Subthreshold leakage. increasing due to very thin gate oxides 10/25/02 the dominant source of power dissipation today kept to <10% of capacitor charging current by making edges fast approaching 10-30% active power in <180nm technologies Gate leakage from electrons tunneling across gate oxide 6. transistors don’t turn off completely Diode leakage from parasitic source and drain diodes usually negligible was negligible.Power Dissipation in CMOS Short-Circuit Current Diode Leakage Current Subthreshold Leakage Current Gate Leakage Current Capacitor Charging Current CL Primary Components: Capacitor charging. energy is ½CV2 per transition Short-circuit current.371 – Fall 2002 L15 – Power 5 .

energy CLVDD2 removed from power supply After transition. the other ½ CLVDD2 was dissipated as heat in pullup resistance The ½ CLVDD2 energy stored in capacitor is dissipated in the pulldown resistance on next 1->0 transition 6.371 – Fall 2002 10/25/02 L15 – Power 6 .Energy to Charge Capacitor VDD Isupply Vout CL E0 → 1 = ∫ P(t) dt = VDD ∫ Isupply(t) dt 0 0 T T VDD = VDD ∫ CL dVout = CL VDD2 0 During 0->1 transition. ½ CLVDD2 stored in capacitor.

371 – Fall 2002 10/25/02 L15 – Power 7 .Power Formula Power = activity * frequency * (½ CVDD2 + VDDISC) + VDDISubthreshold + VDDIDiode + VDDIGate Activity is average number of transitions per clock cycle (clock has two) 6.

should plot E-D curve for each solution 6. ED2) often misleading for real designs – usually want minimum energy for given delay or minimum delay for given power budget – can’t scale all techniques across range of interest To fully compare alternatives.Energy versus Delay Energy A B C D Delay Popular to try to compress this 2D information into single number – Energy*Delay product – Energy*Delay2 – gives more weight to speed. mostly insensitive to supply voltage Constant Energy-Delay Product Many techniques can exchange energy for delay Single number (ED.371 – Fall 2002 10/25/02 L15 – Power 8 .

371 – Fall 2002 10/25/02 L15 – Power 9 .Switching Power Power ∝ activity * ½ CV2 * frequency Reduce Reduce Reduce Reduce activity switched capacitance C supply voltage V frequency 6.

Reducing Activity Clock Gating – – – – Global Clock Enable Latch (transparent on clock low) Gated Local Clock Q don’t clock flip-flop if not needed also gate precharge/evaluate in domino blocks avoids transitioning downstream logic Pentium-4 has hundreds of gated clock domains D – enable adds to control logic complexity Bus Encodings – choose encodings that minimize transitions on average (e.g. Gray code for address bus) – compression schemes (move fewer bits) Remove Glitches – balance logic paths to avoid glitches during settling – use monotonic logic (domino) 6.371 – Fall 2002 10/25/02 L15 – Power 10 ..

dynamic) Segmented structures A Bus B C Shared bus driven by A or B when sending values to C A B C Insert switch to isolate bus segment when B sending to C L15 – Power 11 6.371 – Fall 2002 10/25/02 .Reducing Switched Capacitance Reduce switched capacitance C Careful transistor sizing Tighter layout Different logic styles (logic. pass transistor.

V )α DD th α = 1− 2 CV DD Delay rises sharply as supply voltage approaches threshold voltages [Horowitz] 6.Reducing Supply Voltage Quadratic savings in energy per transition (½CVDD2) Circuit speed is reduced Must lower clock frequency to maintain correctness T = d k(V .371 – Fall 2002 10/25/02 L15 – Power 12 .

but runs longer) – Get some saving in battery life from reduction in rate of discharge 6.371 – Fall 2002 10/25/02 L15 – Power 13 . just reduces rate at which it is consumed (power lower.Reducing Frequency Doesn’t save energy.

can trade surplus performance for lower energy by reducing supply voltage until “just enough” performance Dynamic Voltage Scaling 6.Voltage Scaling for Reduced Energy Reducing supply voltage by 0.5 improves energy per transition by 0.25 Performance is reduced – need to use slower clock Can regain performance with parallel architecture Alternatively.371 – Fall 2002 10/25/02 L15 – Power 14 .

Parallel Architectures Reduce Energy at Constant Throughput 8-bit adder/comparator 40MHz at 5V.3x) Power = 0.9V.2 Pref Chandrakasan et. area = 690 kµ2 (1.7x) Power = 0. “Low-Power CMOS Digital Design”.36 Pref Two parallel interleaved adder/compare units One pipelined adder/compare unit 40MHz at 2.0V. area = 1.9V.371 – Fall 2002 10/25/02 L15 – Power 15 .800 kµ2 (3. April 1992 Pipelined and parallel 6. IEEE JSSC 27(4). area = 530 kµ2 Base power Pref 20MHz at 2. area = 1.39 Pref 20MHz at 2.4x) Power = 0.961 kµ2 (3. al.

) 6.“Just Enough” Performance Frequency Run fast then stop Run slower and just meet deadline t=0 Time t=deadline Save energy by reducing frequency and voltage to minimum necessary (usually done in O.S.371 – Fall 2002 10/25/02 L15 – Power 16 .

7 L15 – Power 17 .4 80.50 1.6 59.4 57.60 1.0 41.10 10/25/02 100.9 28.Voltage Scaling on Transmeta Crusoe TM5400 Frequency (MHz) Relative Performance (%) Voltage (V) Relative Energy (%) Relative Power (%) 700 600 500 400 300 200 6.65 1.0 57.40 1.371 – Fall 2002 100.7 71.1 42.0 94.6 72.0 100.0 85.4 24.0 82.6 12.4 44.6 1.25 1.

Micro 2000] 6.371 – Fall 2002 10/25/02 L15 – Power 18 . want to reduce threshold voltage as fast as supply voltage But subthreshold leakage is an exponential function of threshold voltage and temperature Isubthreshold = k e -q VT a kB T [ Butts.Leakage Power Under ideal scaling.

25m 0.1m 0.Rise in Leakage Power 250 Power (Watts) 200 150 100 50 0 Active Power Active Leakage power 120% 100% 80% 60% 40% 20% 0% 0.371 – Fall 2002 10/25/02 L15 – Power 19 .13m 0.07m Technology Technology [ Intel] 6.18m 0.

so use more highly stacked gates off critical path leakage drops with increasing channel length. so use smallest devices off critical path leakage drops greatly with stacked devices (acts as drain voltage divider).process engineers can provide two thresholds (at extra cost) use high VT off critical path 6. so slightly increase length off critical path dual VT .371 – Fall 2002 10/25/02 L15 – Power 20 .Design-Time Leakage Reduction Use slow. low-leakage transistors off critical path leakage proportional to device width.

run-time leakage reduction – switch off critical path transistors when not needed 6.Critical Path Leakage Critical paths dominate leakage after applying designtime leakage reduction techniques Example: PowerPC 750 5% of transistor width is low Vt. but these account for >50% of total leakage Possible approach.371 – Fall 2002 10/25/02 L15 – Power 21 .

371 – Fall 2002 10/25/02 L15 – Power 22 .Run-Time Leakage Reduction Gate Vbody > Vdd Body Biasing Drain Source Body Vt increase by reverse-biased body effect Large transition time and wakeup latency due to well cap and resistance Sleep transistor between supply and virtual supply lines Increased delay due to sleep transistor Power Gating Vdd Sleep signal Virtual Vdd Logic cells Sleep Vector Input vector which minimizes leakage Increased delay due to mux and active energy due to spurious toggles after applying sleep vector 0 0 6.