Closing the Power Gap between ASIC and Custom
David Chinnery, Kurt Keutzer

Outline
• Motivation for focusing on reducing ASIC power
• The power gap between ASIC and custom
• Where does the power go?
• What can we do about it?
• Conclusions on automating low power techniques


Why power? 
• Battery life is limited by power (e.g. laptop, mobile phone)
• Costs for packaging and cooling increase rapidly with power dissipation (e.g. plastic vs. ceramic package, heatsink, fan)
• Higher temperatures degrade performance and reliability
  - Circuits are slower, with more leakage, at higher temperature
  - Less reliable due to increased rate of electromigration
• Increasing integration increases power demand in portable applications (e.g. mp3 player/PDA/mobile phone combined)
• Performance is now limited by power, even for high end microprocessors

Power of high performance chips has increased
As device dimensions (W, L, Tox) are scaled down by a factor k, for high performance:
• If supply Vdd and threshold voltage Vth are fixed, then power/unit area ∝ k^3
• If Vdd and Vth are scaled down linearly, then power/unit area ∝ k^0.7
Further voltage scaling may be limited

[Figure: power/unit area (W/cm²) versus scaling factor k (1/um), log scale, for microprocessors and digital signal processors; data from ISSCC chips 1982-2002 [Kuroda OYO BUTURI 2004]]
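To make the two scaling regimes concrete, the sketch below evaluates the k^3 and k^0.7 trends from the slide for a few scaling factors. The function name and the sample values of k are illustrative assumptions; only the two exponents come from the slide.

```python
def power_density_ratio(k: float, exponent: float) -> float:
    """Relative power per unit area after scaling dimensions down by factor k."""
    return k ** exponent


FIXED_VOLTAGE_EXP = 3.0    # slide: Vdd and Vth held fixed
SCALED_VOLTAGE_EXP = 0.7   # slide: Vdd and Vth scaled down linearly

for k in (1.0, 2.0, 4.0):  # hypothetical scaling factors
    fixed = power_density_ratio(k, FIXED_VOLTAGE_EXP)
    scaled = power_density_ratio(k, SCALED_VOLTAGE_EXP)
    print(f"k={k:.0f}: fixed voltages x{fixed:.1f}, scaled voltages x{scaled:.2f}")
```

Even with voltage scaling the power density still grows with k, which is why the remaining slides look beyond voltage scaling.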

Impact of voltage scaling on power
Major components of power: Ptotal = Pdynamic + Pleakage
• Dynamic power due to switching of capacitances
  - Reducing Vdd gives quadratic reduction in Pdynamic
• But transistor drive current depends on Vdd [Chen in Trans. on Electron Devices 1997]
  - Must reduce Vth to maintain drive current
• But reducing Vth increases subthreshold leakage current, which is the major contributor to Pleakage
• Must look for other ways to reduce power
[Figure: CMOS inverter driving Cload, annotated with Vdd, Vth,n and Vth,p, showing the dynamic power and subthreshold leakage paths]
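The trade-off on this slide can be sketched numerically. The code below assumes the usual first-order models, which the slide does not spell out: dynamic power proportional to activity × C × Vdd² × f, and subthreshold leakage proportional to exp(-Vth / (n × kT/q)). The slope factor n = 1.5, the thermal voltage of 25.9 mV, and all sample voltages are illustrative assumptions.

```python
import math


def dynamic_power(activity: float, capacitance_f: float, vdd: float, freq_hz: float) -> float:
    """First-order switching power in watts: activity * C * Vdd^2 * f."""
    return activity * capacitance_f * vdd ** 2 * freq_hz


def relative_subthreshold_leakage(vth: float, n: float = 1.5,
                                  thermal_voltage: float = 0.0259) -> float:
    """Leakage current up to a constant factor: exp(-Vth / (n * kT/q)). Assumed model."""
    return math.exp(-vth / (n * thermal_voltage))


# Halving Vdd (hypothetical 1.2 V -> 0.6 V) with everything else unchanged.
base = dynamic_power(0.1, 1e-9, 1.2, 500e6)
halved = dynamic_power(0.1, 1e-9, 0.6, 500e6)
dynamic_ratio = halved / base  # quadratic dependence: one quarter of the power

# Lowering Vth by 100 mV (hypothetical 0.3 V -> 0.2 V) to recover drive current.
leakage_ratio = relative_subthreshold_leakage(0.2) / relative_subthreshold_leakage(0.3)

print(f"dynamic power ratio after halving Vdd: {dynamic_ratio:.2f}")
print(f"leakage increase for 100 mV lower Vth: x{leakage_ratio:.1f}")
```

Under these assumed parameters, a 100 mV reduction in Vth costs roughly an order of magnitude in leakage, which illustrates why Vth cannot keep following Vdd down.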

Automate low power techniques
• Custom designers can try to optimize the design at all levels
• Electronic design automation (EDA) tools for ASICs
  - Most of the design optimization is high level
  - Fast time-to-market and lower design cost
  - Increasingly important to reduce design cost for larger chips
• What is the power gap between (automated) ASIC design and custom design?
  - We need to characterize the contributing factors
  - Can we close the power gap?
  - Identify custom techniques that can be used in an EDA flow

What is our metric for power?
• Power
  - Fixed performance constraint (clock frequency or throughput, e.g. 30 frames/s for MPEG2)
  - Reduce the power and meet the performance constraint
• Energy efficiency
  - No performance constraint
  - Throughput/unit power (1/(P×T×CPI)), e.g. MIPS/mW
  - Cycles per instruction (CPI) accounts for impact of architectural choices (e.g. stalled pipeline stages)
  - Energy/operation is the inverse of throughput/unit power
  - Maximize throughput/unit power or minimize energy/operation
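As a worked instance of the throughput/unit power metric 1/(P×T×CPI), the sketch below computes MIPS/mW and its inverse, energy per instruction. The function names and the sample processor (100 mW, 200 MHz, CPI of 1.25) are hypothetical; with P in mW and the clock period T in microseconds, the result comes out directly in MIPS/mW.

```python
def mips_per_mw(power_mw: float, clock_mhz: float, cpi: float) -> float:
    """Throughput per unit power, 1 / (P * T * CPI), in MIPS/mW."""
    clock_period_us = 1.0 / clock_mhz          # T in microseconds
    return 1.0 / (power_mw * clock_period_us * cpi)


def energy_per_instruction_nj(power_mw: float, clock_mhz: float, cpi: float) -> float:
    """Energy per operation, P * T * CPI, in nanojoules (mW * us = nJ)."""
    return power_mw * (1.0 / clock_mhz) * cpi


# Hypothetical processor: 100 mW at 200 MHz with CPI = 1.25.
efficiency = mips_per_mw(100.0, 200.0, 1.25)
energy = energy_per_instruction_nj(100.0, 200.0, 1.25)
print(f"{efficiency:.2f} MIPS/mW, {energy:.3f} nJ/instruction")
```

A higher CPI (for example from stalled pipeline stages) lowers MIPS/mW even when power and clock frequency are unchanged, which is why the metric includes it.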

What is the power gap? ARM cores
• ×2 to ×3 gap between custom and hard macro ARMs
• ×1.3 to ×1.4 gap between ARM7TDMI-S and ARM7TDMI
• ×3 to ×4 overall from synthesizable to custom ARMs
[Figure: Comparison of Custom and Hard Macro ARM Implementations. Dhrystone 2.1 MIPS/mW versus process technology (0.60 um to 0.13 um); labeled custom designs: XScale, StrongARM, Burd]

What is the power gap? DCT/IDCT blocks
• ×4 to ×7 gap between ASIC and custom discrete cosine transform (DCT) and inverse discrete cosine transform (IDCT) blocks, after scaling linearly for technology [Fanucci ICECS 2002]
  - We assumed power reduces linearly with technology
• To get 30 frame/s MPEG2 with a general purpose processor would require two ARM9 cores and would consume 15× power [Fanucci ICECS 2002]
  - Application-specific hardware substantially reduces power

Breakdown of power by functionality
Typical breakdown of on-chip power consumption for an embedded microprocessor
• Clock 20% to 40%
• Memory 20% to 40%
• Control + datapath 40% to 60%
• Input/output to off-chip ~5%
• Most of power is in datapath, control, clock tree and memory
  - Techniques focus on reducing this power
  - Several companies provide custom memory for ASIC processes, so we won't discuss memory here

Summary of factors' effect on active power
Automated designs are higher power than custom because of

                                                 ASIC design quality
Factor                                           typical    excellent
Microarchitecture (pipelining, parallelism)      ×2.6       ×1.3
Clock gating and power gating                    ×1.6       ×1.0
Logic design                                     ×1.2       ×1.0
High speed logic styles (DCVSL, PTL, domino)     ×1.3       ×1.3
Technology mapping                               ×1.4       ×1.0
Cell sizing and wire sizing                      ×1.6       ×1.1
Voltage scaling, multi-Vth, multi-Vdd            ×4.0       ×1.0
Floorplanning and placement                      ×1.5       ×1.1
Process variation and process technology         ×2.6       ×1.2

Microarchitecture: leverage for voltage scaling and sizing
Increase throughput/cycle to allow Vdd reduction
• Pipelining inserts registers, increasing throughput
• Limited by
  - Reduction in instructions/cycle (1/CPI) due to branch misprediction, waiting to read or write memory, etc.
  - Power and delay for registers, data forwarding logic, and branch prediction
[Figure: pipeline of instruction fetch, instruction decode, ALU, memory access, write back, before and after inserting registers]
• Parallelism increases throughput in exchange for increased area
• Limited by
  - Routing, multiplexing, control overheads

Microarchitecture: pipelining model, leverage for voltage scaling and sizing
• Pipeline power model [Hartstein 2003]:
  - n stages, L=1.1 latch growth vs. n, F=0.05 for register power
• Minimum stage delay:
  - ASIC tpipelining overhead of 10 FO4 (register delay) + 10 FO4 (imbalance)
  - Custom tpipelining overhead of 2.6 FO4 total, same tcombinational of 175 FO4
• CPI penalty 0.025/stage for custom, and 0.05/stage for ASICs
• Add fits for dynamic and leakage power with voltage scaling and sizing
• At 40 FO4 delay constraint (500MHz for Leff=0.1um), ASIC is ×2.6 worse
  - 1/(energy/operation): custom 0.050 vs. ASIC 0.019 => ×2.6
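The overhead figures on this slide already imply how many pipeline stages each style needs. The sketch below assumes the simplest stage-delay model, stage delay = tcombinational/n + tpipelining overhead, which is an illustrative simplification and not the full power model cited on the slide. Only the FO4 numbers come from the slide.

```python
import math


def min_stages(t_comb_fo4: float, t_overhead_fo4: float, t_cycle_fo4: float) -> int:
    """Fewest stages n such that t_comb / n + t_overhead <= t_cycle (assumed model)."""
    budget = t_cycle_fo4 - t_overhead_fo4   # FO4 left per stage for useful logic
    if budget <= 0:
        raise ValueError("cycle time is not longer than the pipelining overhead")
    return math.ceil(t_comb_fo4 / budget)


T_COMB = 175.0           # FO4, same combinational delay for ASIC and custom
ASIC_OVERHEAD = 20.0     # FO4: 10 register delay + 10 imbalance
CUSTOM_OVERHEAD = 2.6    # FO4 total
T_CYCLE = 40.0           # FO4 delay constraint from the slide

asic_stages = min_stages(T_COMB, ASIC_OVERHEAD, T_CYCLE)      # 9
custom_stages = min_stages(T_COMB, CUSTOM_OVERHEAD, T_CYCLE)  # 5
print(f"ASIC needs {asic_stages} stages, custom needs {custom_stages}")
```

Under this model the ASIC needs nearly twice as many stages to meet the same 40 FO4 constraint, so it pays more register power and a larger per-stage CPI penalty.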

Microarchitecture: leverage for voltage scaling and sizing
Custom IDCT pipelining to reduce Vdd [Xanthopoulos JSSC 99]
• With pipeline: Vdd=1.32V, 20% power overhead (×2)
• Without pipeline: Vdd=2.2V to meet throughput
Parallel datapaths [Bhavnagarwala IEEE Trans. VLSI 00]
• ×2 to ×4 reduction in power by reducing Vdd by increasing throughput with parallel datapaths
Microarchitecture speed gap is ×1.8 (typical) to ×1.3 (excellent)
• At a tight delay constraint, this corresponds to about ×2.6 to ×1.3 worse power due to higher Vdd, lower Vth, and wider gates to compensate
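A quick check of the IDCT numbers, assuming dynamic power scales with Vdd² at fixed throughput (the quadratic dependence stated earlier) and treating the 20% pipelining overhead as a multiplier on the pipelined design's power. This is a back-of-envelope sketch, not the cited paper's analysis; the function name is hypothetical.

```python
def pipelining_power_reduction(vdd_without: float, vdd_with: float,
                               overhead_fraction: float) -> float:
    """Power reduction factor from lowering Vdd, net of pipelining overhead."""
    voltage_gain = (vdd_without / vdd_with) ** 2   # assumes P proportional to Vdd^2
    return voltage_gain / (1.0 + overhead_fraction)


reduction = pipelining_power_reduction(vdd_without=2.2, vdd_with=1.32,
                                       overhead_fraction=0.20)
print(f"estimated power reduction: x{reduction:.2f}")
```

The estimate lands a little above ×2, in the same range as the ×2 to ×4 reported for parallel datapaths.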

Clock gating ×1.6 to ×1.0
• Clock signal has high activity, 2. Logic is lower activity ~0.1.
• Turn off clocks to inactive modules
• Some DCT/IDCT registers are active < 3% of time, clock gating and avoiding computation reduces power by ×10 [August SOC 01]
• Typical savings are up to ×1.6 power reduction
• Power minimization tools automatically insert gated clocks
• Designer can make microarchitectural/algorithm decisions
  - E.g. reduce precision for DCT/IDCT coefficients
  - Precomputation control signals reduces power by ×1.4 to ×3.3 [Hsu ISLPED 02]
• ASICs can do this
[Figure: add and shift units sharing a clock, before and after inserting clock gating controlled by select_add and select_shift]
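A rough way to see where savings of this size can come from: if the clock accounts for a fraction of total power and gating removes clock power while a module is idle, the total drops accordingly. The model below is an illustrative assumption (it ignores the gating logic's own power and any saved logic switching), and the inputs are sample values taken from the ranges quoted in the slides.

```python
def clock_gating_reduction(clock_fraction: float, active_fraction: float) -> float:
    """Total power reduction factor when gated clocks only toggle while active.

    clock_fraction: share of total power spent in the clock (before gating).
    active_fraction: share of time the gated registers are actually in use.
    """
    remaining = 1.0 - clock_fraction * (1.0 - active_fraction)
    return 1.0 / remaining


# Clock at 40% of total power, registers active 3% of the time (sample values).
best_case = clock_gating_reduction(clock_fraction=0.40, active_fraction=0.03)
# Registers that are always active gain nothing from gating.
no_gain = clock_gating_reduction(clock_fraction=0.40, active_fraction=1.0)
print(f"mostly idle: x{best_case:.2f}, always active: x{no_gain:.2f}")
```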

Power gating reduces leakage in standby
• Turn off leakage path in inactive modules
  - May need to preserve the state registers
• Can reduce standby leakage by 3 orders of magnitude [Mutoh JSSC 95]
• Other approaches
  - reverse biasing the substrate
  - setting input vectors to low leakage states, gives ×1.4 leakage reduction [Lee DAC 03]
• Just now getting ASIC methodology support
  - Need large sleep transistors to turn off power
  - Sleep transistors reduce available supply voltage
[Figure: add and shift units with clock, select_add and select_shift signals]

High speed logic styles: leverage for voltage scaling and sizing
Low power designs use mostly static CMOS logic
• Static CMOS logic is low leakage, robust
• PMOS pullup series transistors are slow
Faster custom logic styles speed up critical paths
• Custom can use slack from higher speed (×1.4) to reduce power by lowering Vdd
  - ASIC power ×1.3 worse than custom at a tight delay constraint due to logic style
32-bit adder [Tiwari DAC'98]: domino has 22% higher power and 25% lower delay than static
[Figure: static CMOS (slow, larger capacitance PMOS transistors in series), DCVSL, PTL, and domino gate schematics]

Technology mapping ×1.4 to ×1.0
• Technology mapping tools don't target low power
• We found that targeting minimum area for multipliers can result in ×1.3 power; delay is a poor choice
• Technology mapping techniques to reduce active power: ASICs can do as well as custom (×1.0), if tools improve
[Figure: two mappings of equivalent logic annotated with switching activities (1/2 at the inputs; 3/8 and 7/32 at internal nodes); one mapping has lower activity]
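The activity values in the figure (1/2, 3/8, 7/32) match a standard signal-probability model, sketched below. The sketch assumes independent inputs, takes a node's switching activity as 2·p·(1-p) where p is the probability that the node is 1, and uses AND gates as the example; the gate types in the original figure are not recoverable, so that choice is an assumption.

```python
from fractions import Fraction
from functools import reduce


def and_output_probability(input_probs):
    """Probability that an AND gate outputs 1, assuming independent inputs."""
    return reduce(lambda acc, p: acc * p, input_probs, Fraction(1))


def switching_activity(prob_one):
    """Probability of a transition between two independent samples: 2 * p * (1 - p)."""
    return 2 * prob_one * (1 - prob_one)


half = Fraction(1, 2)
input_activity = switching_activity(half)                             # 1/2
and2_activity = switching_activity(and_output_probability([half] * 2))  # 3/8
and3_activity = switching_activity(and_output_probability([half] * 3))  # 7/32
print(input_activity, and2_activity, and3_activity)
```

A mapper that hides the high-activity nodes inside cells, or places them on small capacitances, reduces active power for equivalent logic.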

Cell sizing and wire sizing ×1.6 to ×1.1
• ×1.35 power reduction on Xtensa processor at 325MHz by (mostly sizing) power minimization with Design Compiler and 0.13um library [internship at Tensilica]
• Can do better than Design Compiler (DC) with cell sizing via linear program (LP) (global optimization vs. greedy pin-hole optimization), about ×1.1 to ×1.2 power reduction [Chinnery, Keutzer will be at ISLPED 05]

ISCAS'85    # logic              Minimum
netlist     levels    # cells    delay (ns)
c17            4         10      0.094
c432          24        259      0.733
c499          25        644      0.701
c880          23        484      0.700
c1355         27        764      0.778
c1908         33        635      0.999
c2670         23       1164      0.649
c3540         36       1283      1.054
c5315         34       1956      0.946
c6288        113       3544      3.305
c7552         31       2779      0.847

[Table columns: power (mW) for DC and LP at delay constraints of 1.1×Tmin and 1.2×Tmin, per netlist]
Average savings vs. Design Compiler: 10% at 1.1×Tmin, 16% at 1.2×Tmin
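The average savings versus Design Compiler (10% and 16%) and the "×1.1 to ×1.2 power reduction" quoted in the bullet are the same result in two notations. The conversion is sketched below; the helper name is hypothetical.

```python
def savings_to_reduction_factor(savings_fraction: float) -> float:
    """Convert a fractional power saving into a 'times lower power' factor."""
    if not 0.0 <= savings_fraction < 1.0:
        raise ValueError("savings must be in [0, 1)")
    return 1.0 / (1.0 - savings_fraction)


for savings in (0.10, 0.16):   # average LP savings vs. Design Compiler from the slide
    factor = savings_to_reduction_factor(savings)
    print(f"{savings:.0%} lower power = x{factor:.2f} reduction")
```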

Cell sizing and wire sizing ×1.6 to ×1.1
• Cell libraries lack fine-grained sizes and skewed P:N drives
  - [Hurat SNUG 01] Generate new cells: ×1.2 power reduction and ×1.15 faster for bus controller, ×1.4 MHz/mW
• Simultaneous buffer and wire sizing reduced clock tree power by ×2.7 [Gong ISLPED 96]
  - ×1.1 to ×1.2 reduction in total power
  - Not available for ASIC interconnect yet
• Up to ×1.6 gap due to cell sizing and wire sizing, can reduce to ×1.1 using a library with finely-grained sizes, a good sizing tool, and design-specific cells
[Figure: optimizing transistor sizes in a gate between Vdd and GND]

Dynamic supply and substrate biasing ×4.0 to ×1.0
• Change Vdd based on processor load
  - ×10 more energy efficient at low performance [Burd ISSCC 00]
  - Adaptive voltage scaling with the ARM11 gives ×1.7 power reduction for voice, SMS, web applications [National Semiconductor, ARM 02]
• Reduce Vdd and bias substrate to lower Vth
  - ×1.7 reduction in power, same speed [Hamada CICC 98]
  - Increase Vth in standby to reduce leakage
• These are complicated to automate for ASICs
  - Dynamic voltage requires accurate knowledge of path delays
[Figure: energy (mW/MIPS) versus MIPS [Burd ISSCC 2000]]

Multiple supply and threshold voltages ×4.0 to ×1.0
Basic idea: high speed where critical, low power elsewhere
• Dual Vdd reduces power by ×1.7 after substrate biasing/lower Vdd [Usami JSSC 98]
  - ×2 reduction in clock tree power by using low Vdd
• Separate voltage islands [Lackey ICCAD 02]: different speeds and Vdd
  - Turn off Vdd to modules not in use, reduces leakage by ×500
  - ×1.25 to ×3 average power reduction, depending on activities
• Dual Vth can give ×3 to ×6 reduction in leakage [Sirichotiyakul DAC 99]
ASICs are limited to Vdd and Vth offered by library and foundry
• Can't change Vth to design-specific optimal point
• Standard cell libraries characterized at only two or three Vdd
• Dual Vdd requires level converters and dual Vdd layout

Floorplanning and placement ×1.5 to ×1.1
Poor floorplanning and cell placement, inaccurate wire loads
• ×1.5 worse power than custom
We compared partitioning a design into 50K vs. 200K gate modules from 0.25um to 0.13um
• 42% longer wires for 200K partitions
• Interconnect is 20% to 40% of total power [Sylvester ICCAD 98]
• ×1.1 to ×1.2 increase in total power due to wiring, and gates will be upsized to drive the longer wires
[Figure: automatic place and route versus block partitioned layout [Hauck Micro. Report 01]]
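The ×1.1 to ×1.2 figure follows from the two numbers above it if interconnect power is assumed to grow in proportion to wire length, as the sketch below shows. That proportionality, and the omission of gate upsizing, are simplifying assumptions; the function name is hypothetical.

```python
def total_power_increase(interconnect_fraction: float, wire_length_increase: float) -> float:
    """Total power factor when only the interconnect share grows with wire length."""
    return 1.0 + interconnect_fraction * wire_length_increase


WIRE_LENGTH_INCREASE = 0.42   # 42% longer wires for 200K gate partitions

low = total_power_increase(0.20, WIRE_LENGTH_INCREASE)    # interconnect is 20% of power
high = total_power_increase(0.40, WIRE_LENGTH_INCREASE)   # interconnect is 40% of power
print(f"total power increase: x{low:.2f} to x{high:.2f}")
```

Upsizing gates to drive the longer wires would add to this, so the estimate is a lower bound under the stated assumptions.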

Floorplanning and placement ×1.5 to ×1.1
• Bit slices can reduce wire length by 70% or more vs. automated place-and-route
  - up to ×1.4 energy reduction as faster and lower wiring capacitance [Chang SM Thesis MIT 98]
  - ×1.5 energy reduction from bit slicing and some logic optimization [Stok, Puri, Bhattacharya, Cohn]
• Manual place-and-route achieves 10% shorter wires and ×1.1 faster, about ×1.1 energy reduction [Chang SM Thesis MIT 98]
• ASICs still ×1.1 higher power than custom due to layout
[Figure: automatic place-and-route, tiled bit-slices, and custom layouts]

Process variation impact on power ×2.6 to ×1.2
• ASICs are designed to work at the worst case delay and worst case power corners for the process; typical delay and power are less
  - Simulated power was ×1.7 actual power for custom DCT/IDCT
• Up to a factor of ×1.75 between worst and best (average power of 80 chip samples in 0.3um)
[Figure: power spread across chip samples, annotated ×1.75 and ×1.5 [Takahashi JSSC 98]]

Process variation impact on power ×2.6 to ×1.2
• Binning would leave gap of ×1.4 between low and high bins
• We found a gap of ×1.2 between low speed (high power) and high speed (low power, after derating for Vdd and frequency) bins of 0.18 and 0.13um Intel and AMD PC chips
  - ASICs don't speed bin (they scan test, no speed test)
[Figure: ×1.4 between low power bin and higher power bin]

Process technology ×2.6 to ×1.2
• Low power libraries are more expensive
  - 5% to 10% transistor width shrinks to reduce capacitances
  - Copper is 40% lower resistivity than aluminum
  - Low-k dielectric reduces wire capacitances; we estimate about a ×1.1 reduction in total power with a low-k dielectric
  - Silicon-on-insulator is ×1.1 to ×1.3 faster, ×1.4 power reduction [Narendra Symp. VLSI 2001]
• We compared cell libraries in UMC 0.13um vs. IBM 0.13um process
  - IBM cells about ×1.05 faster, ×1.6 higher active power, UMC had ×17 leakage
Overall impact of process variation and technology
• ×2.6 ASIC power relative to custom for worst case conditions and a cheap process
• ×1.2 in a low power process, typical conditions, no speed binning

Low power design conclusions
• Typical ASIC is ×3 to ×7 less energy efficient than custom
  - We assumed ASIC and custom designs can use the same microarchitectural and logic design techniques. These are the biggest levers for reducing power.
  - Can get ×10 or more going from general purpose hardware to application-specific hardware.
  - E.g. Fast Fourier transform implementations as discussed in Andrew Chang's paper.
• The largest factor for the power gap is voltage scaling, responsible for up to ×4
• Process and microarchitecture can be large factors, about ×2.6 each

Low power design conclusions
By incorporating custom techniques can get within
• ×3 at a high performance target
  - Can't use custom logic styles
  - ASIC speed penalty drags down efficiency, as higher Vdd, lower Vth, and upsized gates are needed to meet performance target
• ×1.5 at a lower performance target (~2× slower)
  - Make full use of scaling down Vdd and Vth

Low power ASIC design example
0.13um DSP example [Stok, Puri, Bhattacharya, Cohn]
• 240,000 gates implementing Hilbert transform, FIR filter, and fast Fourier transform, with 42KB register array
• Technology mapping, logic design (carry save adders), bitslicing, physical synthesis gave ×1.86 increase in efficiency
• A fine grained standard cell library gave another ×1.16
• Voltage scaling gave another factor of ×1.46
• ×3.1 increase in MHz/mW overall
The third speaker, Ruchir Puri, will discuss some of their recent low power work at IBM.
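The overall figure in this example is the product of the three individual gains, as the short check below shows. The dictionary keys are abbreviated labels chosen here; the three factors are the ones quoted on the slide.

```python
import math

# Efficiency gains (MHz/mW) quoted for the 0.13um DSP example.
gains = {
    "mapping, logic design, bitslicing, physical synthesis": 1.86,
    "fine grained standard cell library": 1.16,
    "voltage scaling": 1.46,
}

overall = math.prod(gains.values())
print(f"compound gain: x{overall:.2f}")
```

The product comes to about ×3.15, in line with the roughly ×3.1 overall increase reported.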

Extra slides

Impact of voltage scaling on power
Ptotal = Pdynamic + Pshort circuit + Pstatic
• Short circuit power when switching is 10% or less of Ptotal
• Dynamic power due to switching of capacitances
  - Reducing Vdd gives quadratic reduction in Pdynamic
• But transistor drive current depends on Vdd [Chen in Trans. on Electron Devices 1997]
  - Must reduce Vth to maintain drive current
• But reducing Vth increases subthreshold leakage current, which is the major contributor to Pstatic
(Clock frequency f, gate switching activity E, capacitance C, transistor length L, transistor gate oxide thickness Tox, temperature T, constants F, X, Io, and m.)
[Figure: CMOS inverter driving Cload, annotated with Vdd, Vth,n and Vth,p, showing the dynamic power, short circuit current, and subthreshold leakage paths]

ITRS leakage power trends
• Can't scale down Vth much further due to large subthreshold leakage currents
• Gate tunneling leakage through thin gate oxide Tox is also becoming a significant cause of leakage
• Further Vdd voltage scaling will be limited
• Must also look to other low power techniques
[Figure: power/die area (W/cm²), log scale, versus technology (0.13 um to 0.022 um), showing total power and leakage for high speed (fast, low Vth) and low power (slow, high Vth) processes, with leakage increasing. From International Technology Roadmap for Semiconductors data for 2001-2016 (assuming activity of 0.1, ignoring interconnect).]

Summary of factors affecting (active) power
Automated designs are higher power than custom because of

                                                 ASIC design quality
Factor                                           typical    excellent
Microarchitecture (pipelining, parallelism)      ×2.6       ×1.3
Memory                                           ×1.4       ×1.0
Clock gating and power gating                    ×1.6       ×1.0
Logic design                                     ×1.2       ×1.0
High speed logic styles (DCVSL, PTL, domino)     ×1.3       ×1.3
Technology mapping                               ×1.4       ×1.0
Cell sizing and wire sizing                      ×1.6       ×1.1
Voltage scaling, multi-Vth, multi-Vdd            ×4.0       ×1.0
Floorplanning and placement                      ×1.5       ×1.1
Process variation and process technology         ×2.6       ×1.2

Memory: reduce cache misses ×1.4 to ×1.0
• Larger caches consume more power, but reduced cache misses
  - Pipeline stalls, waits many cycles for read/write to off-chip memory
• Caches with higher associativity (e.g. 8-way vs. direct mapped) consume more power, also affects likelihood of a cache miss
• [Duarte ASIC/SOC 2001]
  - Sub-banking: only precharge the needed section of the cache bank, ×1.32 energy savings
  - Software optimizations to reduce cache misses gave on average a ×1.6 reduction in power
• 90% of the StrongARM area was caches, increasing the transistor length in the caches by 12% reduced leakage by ×20 [Montanaro JSSC 96]
• ASICs can do this, custom memory is available for ASICs
[Figure: processor, on-chip cache, write buffer, and slower off-chip memory]

Logic design ×1.2 to ×1.0
Logic design refers to the topology and logic structure to implement functional units
• Logic switching activity of a carry select adder was ×1.8 worse than a 32-bit carry lookahead [Callaway VLSI Signal Proc. 92]
• 0.13um 64-bit radix-2 compound domino adder was slower and about ×1.3 energy compared to radix-4 [Zlatanovici ESSC 03]
• We implemented an algorithm to reduce switching activity in multipliers, reduced energy by ×1.1 for 64-bit [Ito ICCD 03]
• Given similar design constraints, ASIC designers can choose the same logic design as custom, ×1.0
[Figure: carry save adder followed by a ripple carry adder, computing the 4-bit sum x+y+z from inputs x0..x3, y0..y3, z0..z3]
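The carry save adder structure in the figure can be sketched at the bit level. The code below is a behavioral illustration on Python integers, not a gate-level or power model: the carry save stage compresses three operands into a sum word and a carry word with no carry propagation, and a single carry-propagate addition (the ripple carry adder in the figure) finishes the job. Function names are chosen here for illustration.

```python
def carry_save_add(x: int, y: int, z: int) -> tuple[int, int]:
    """Compress three operands into (sum word, carry word) without propagating carries."""
    partial_sum = x ^ y ^ z                          # per-bit sum of a full adder
    carry = ((x & y) | (x & z) | (y & z)) << 1       # per-bit majority, weighted one bit higher
    return partial_sum, carry


def add_three(x: int, y: int, z: int) -> int:
    """x + y + z using one carry save stage and one carry-propagate addition."""
    partial_sum, carry = carry_save_add(x, y, z)
    return partial_sum + carry                       # stands in for the ripple carry adder


# Exhaustive check for 4-bit operands, matching the width shown in the figure.
all_match = all(
    add_three(x, y, z) == x + y + z
    for x in range(16) for y in range(16) for z in range(16)
)
print("4-bit carry save addition correct:", all_match)
```

Only the final stage has a long carry chain, which is one reason the structure appears among the logic design choices in the DSP example.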