ICCAD ‘03

Leakage Issues in IC Design: Part 3

Anirudh Devgan devgan@us.ibm.com IBM Research Austin

Outline
Part 1: Siva Narendra (Intel Corporation)
– Device physics, process technology, leakage fundamentals

Part 2: David Blaauw (University of Michigan)
– MTCMOS, Dual-Vt, estimation/optimization techniques for dual-Vt

Part 3: Anirudh Devgan (IBM Corporation)
– Process & Environmental variations, ABB, Vdd control

Part 4: Farid Najm (University of Toronto)
– State dependence, sleep states, memory/cache circuits and architectures

Leakage…
Leakage power is becoming a significant portion of total power.
– Currently managed to 15-20% in current IBM designs.
– Expected to increase dramatically in future designs & technologies.
– In some cases (11S), predicted to be more than 50% of total power.

Leakage is the problem
– Emerging as the critical challenge in VLSI design.

[Figure: International Technology Roadmap projections (NTRS '97, ITRS '99, ITRS '01); log scale from 1.E+00 to 1.E+04 versus year, 1990 to 2015]

Leakage depends superlinearly on process variations, temperature, and power supply voltage.
[Figure: gate and sub-threshold leakage components]

Variability: Leakage and Timing
Source: Intel, DAC 2003

Leakage & Thermal Variations
Temperature varies within the chip
– Power4 server chip: 2 CPUs on a chip
– The CPUs can be much hotter than the caches
[Figure: chip floorplan (FXU, FPU, IDU, IFU, BXU, LSU, ISU, L2, L3 Directory/Control) next to the chip thermal profile]

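Power Supply Variations
[Figure: supply model with package inductance L, grid resistance Rg, decap Cd with resistance Rd, and load; load current waveform IDD ramping with slope µ until time tp; VDD versus time against Vcrit]
$V_{noise} \approx \mu t_p R_g + \mu L - \mu R_g^2 C_d \left(1 - e^{-t_p/\tau}\right)$
Term labels on the slide: ~Same, DC, Package, Decap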

Power Delivery
[Figure: C4 “balls”; connection to package; connection to circuits]

Supply Voltage & Temperature
Leakage dependence on Vdd
– Variation in both gate and subthreshold leakage.
– Need to combine power supply analysis with leakage.
Leakage significantly affected by temperature.
– Within-die temperature variations need to be modeled.
[Figure: gate, subthreshold, and summed leakage versus supply voltage; mean, max, and min leakage versus temperature (deg C), 55 to 115]

Power integrity & Package
– Package design becoming increasingly complicated
– Large number of power/ground layers in current packages
– Significant power supply variations caused by the package
[Figure: package cross-section: solder balls, capacitors, chip, chip carrier]

Power Supply Variations – Package Effects
Power IR drop by the package
– Is usually non-uniform across the various C4s
– Can be difficult to predict and analyze
[Figure: VDD and GND paths through C4s, planes, and balls]

Is Variability Real?
S. Nassif, DAC 2003

Variability Time Scales
[Figure: time scales of variability sources: signal coupling (rising/falling), SOI history, VDD/package noise, temperature, process line; ranges shown: 10^-8, 10^-7 – 10^-5, 10^-4 – 10^-2, 10^5 – 10^7]
S. Nassif, DAC 2003

Variability Distribution
Physical:
– Die-to-die variation
  – Imposed upon the design (constant regardless of design).
  – Well modeled via worst-case files.
– Within-die variation
  – Co-generated between design and process (depends on details of the design).
  – Example: nested vs. isolated poly-silicon ∆L.
Environmental:
– Only makes sense within-die.

Variability vs. Uncertainty
Variability: known quantitative relationship to a source (readily modeled and simulated).
– Designer has option to null out impact.
– Example: power grid noise.
Uncertainty: sources unknown, or model too difficult/costly to generate or simulate.
– Usually treated by some type of worst-case analysis.
– Example: ∆L within-die variation.
Lack of modeling resources often transforms variability to uncertainty.
– Example: nearest neighbor noise coupling.

Within-Die Variations?
Fundamentally different from die-to-die.
– Lot, Wafer, and Die distributions.
Environmental components:
– VDD & temperature variations are natural byproducts of power distribution analysis.
– Power distribution analysis unavoidable.
Physical components:
– VT and ∆L variations across the die.
– Layout (design) dependent.

Spatial Variability
Facility, Line, Lot, Wafer & Die components.
– Semi-periodic across wafer.
– Wafer level and above are not of interest to a designer.
– Designer does not get to choose where on a wafer his design goes!
[Figure: parameter versus wafer position, spanning Die 1 and Die 2]
S. Nassif, DAC 2003

Sources of Within-Die Variations
Poly line-width (LEFF) variation comes from:
– Mask, exposure, and etch variations
– Essentially identical to modeling required for OPC.
[Figure: layout (designer view), mask & OPC bias, lithography bias, resist, etch bias, poly silicon]
S. Nassif, DAC 2003

LEFF variations via OPC
S. Nassif, DAC 2003

Sources of Within-Die Variations
Vertical variations are caused by the chemical-mechanical planarization (CMP) process.
[Figure: a region that polishes faster, so that C1 ≠ C2]
S. Nassif, DAC 2003

Leakage Modeling and Analysis
– Operating Environment (Temperature, VDD, …)
– Input pattern vector (state)
– Topology
– Process Variations (∆L, VT, TOX, …)
[Figure: sub-threshold and gate leakage]
All effects need to be accurately modeled

Leakage Requirements
Need to include environmental and process variations
– Channel length
– Supply voltage
– Temperature
– Standby leakage
– Active & burn-in leakage
S. Narendra ICCAD 200

Process Dependence
Leakage varies exponentially with process variations.
– 10X variation due to process vs. 10% variation due to different patterns.
– Reason: sub-threshold current varies exponentially with process variations.
[Figure: IDDQ for a typical chip: min, mean, and max leakage (mA, 0 to 0.25) versus process variation parameter (NRN, 0 to 5); ~10X spread across process, ~10% spread across patterns]

Leakage & Process Variations
Process parameters (Leff, Vth) have great influence
Generate leakage performance statistics
– Practical bounds for a given confidence level can be derived
Leff distribution: global variations + within-chip (deterministic) + within-chip (random)
[Figure: cumulative distribution for total leakage, from 0.01 to 0.99, with bounds L-3σ = L0.01 and L3σ = L0.99]

Process Variations
Leakage is exponentially dependent on process variations
$I = I_o e^{-\frac{L - L_{mean}}{\lambda}}$
Normal distribution of L leads to a LogNormal distribution of leakage.
S. Narendra ICCAD 20

Process Variations: ACLV
[Figure: within-chip ACLV: number of transistors versus ACLV sigma (-4 to 4); chip-to-chip variation: number of chips versus global sigma of chip mean Leff (-4 to 4)]

Leakage modeling
Prior techniques
Lower bound: Assumes all devices in the die are nominal L
$I_{leak-l} = \frac{w_p}{k_p} I_p^o + \frac{w_n}{k_n} I_n^o$
Upper bound: Assumes all devices in the die are minimum L
$I_{leak-u} = \frac{w_p}{k_p} I_{off-p}^{3\sigma} + \frac{w_n}{k_n} I_{off-n}^{3\sigma}$
S. Narendra ICCAD 200

Process Variations: ACLV
If the length distribution on the chip is normal, the probability of a given L is
$p(l) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(l - L_{mean})^2}{2\sigma^2}}$
Leakage with ACLV can be computed as
$I_{leak} = E\left(I_o e^{-\frac{l - L_{mean}}{\lambda}}\right) = \frac{I_o}{\sigma\sqrt{2\pi}} \int_{L_{min}}^{L_{max}} e^{-\frac{(l - L_{mean})^2}{2\sigma^2}} e^{-\frac{l - L_{mean}}{\lambda}} \, dl$
which simplifies to
$I_{leak} = I_o e^{\frac{\sigma^2}{2\lambda^2}}$
ACLV has exponential effect on total chip leakage
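The closed form for leakage under ACLV can be checked numerically. The sketch below integrates the exponential leakage model against a normal length distribution with a plain trapezoidal rule and compares the result with I_o·exp(σ²/2λ²). The values chosen for `I0`, `SIGMA`, and `LAM` are hypothetical and only need to share the same length units; the integration span of ±10σ stands in for the Lmin and Lmax limits.

```python
import math

def leakage_mean_closed_form(i0, sigma, lam):
    """Closed-form chip-average leakage: I0 * exp(sigma^2 / (2 * lambda^2))."""
    return i0 * math.exp(sigma ** 2 / (2.0 * lam ** 2))

def leakage_mean_numeric(i0, sigma, lam, l_mean=0.0, span=10.0, steps=20000):
    """Trapezoidal integral of I0*exp(-(l-Lmean)/lambda) against a normal PDF."""
    lo = l_mean - span * sigma
    hi = l_mean + span * sigma
    dl = (hi - lo) / steps
    norm = sigma * math.sqrt(2.0 * math.pi)
    total = 0.0
    for n in range(steps + 1):
        l = lo + n * dl
        pdf = math.exp(-(l - l_mean) ** 2 / (2.0 * sigma ** 2)) / norm
        term = pdf * i0 * math.exp(-(l - l_mean) / lam)
        # End points carry half weight in the trapezoidal rule.
        total += term * (0.5 if n in (0, steps) else 1.0)
    return total * dl

# Hypothetical values, not taken from the slides.
I0, SIGMA, LAM = 1.0, 5.0, 10.0
closed = leakage_mean_closed_form(I0, SIGMA, LAM)
numeric = leakage_mean_numeric(I0, SIGMA, LAM)
print(closed, numeric)
```

Because the average is taken over an exponential of a normal variable, devices shorter than the mean add more leakage than longer devices remove, which is why the average exceeds I_o for any nonzero σ.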

Applications…
Leakage estimation
$I_{leak-w} = \frac{I_p^o w_p}{k_p} e^{\frac{\sigma_p^2}{2\lambda_p^2}} + \frac{I_n^o w_n}{k_n} e^{\frac{\sigma_n^2}{2\lambda_n^2}}$
– Depends on parameters that can be estimated
A macroscopic standard deviation (σ) representing parameter variation in a chip
$\sigma = \lambda \sqrt{2 \ln\left(\frac{k \, I_{leak}}{w \, I_o}\right)}$
S. Narendra ICCAD 2002
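As a rough illustration of the two applications on this slide, the sketch below evaluates the variation-aware estimate for PMOS and NMOS contributions and inverts the single-device-type form to recover a macroscopic σ from a leakage value. All numeric inputs are hypothetical, and the function names are illustrative rather than taken from the cited work.

```python
import math

def leak_weighted(i0_p, w_p, k_p, sigma_p, lam_p, i0_n, w_n, k_n, sigma_n, lam_n):
    """Variation-aware estimate: each device type scaled by exp(sigma^2 / (2 lambda^2))."""
    pmos = (i0_p * w_p / k_p) * math.exp(sigma_p ** 2 / (2.0 * lam_p ** 2))
    nmos = (i0_n * w_n / k_n) * math.exp(sigma_n ** 2 / (2.0 * lam_n ** 2))
    return pmos + nmos

def sigma_from_leak(i_leak, i0, w, k, lam):
    """Invert I_leak = (I0*w/k) * exp(sigma^2 / (2 lambda^2)) for one device type."""
    ratio = k * i_leak / (w * i0)
    if ratio < 1.0:
        raise ValueError("leakage below the nominal-L value; no real sigma exists")
    return lam * math.sqrt(2.0 * math.log(ratio))

# Hypothetical inputs: per-width nominal currents, total widths, stack factors.
nominal = leak_weighted(1e-9, 1e6, 2.0, 0.0, 10.0, 2e-9, 1e6, 2.0, 0.0, 10.0)
with_var = leak_weighted(1e-9, 1e6, 2.0, 5.0, 10.0, 2e-9, 1e6, 2.0, 5.0, 10.0)

# Round trip for a single device type (the NMOS width is set to zero).
only_p = leak_weighted(1e-9, 1e6, 2.0, 5.0, 10.0, 0.0, 0.0, 1.0, 0.0, 1.0)
sigma_back = sigma_from_leak(only_p, 1e-9, 1e6, 2.0, 10.0)
print(nominal, with_var, sigma_back)
```

With σ set to zero the estimate reduces to the lower-bound expression that assumes nominal L everywhere, so the gap between `nominal` and `with_var` is the contribution of within-chip variation in this toy setup.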

Measurement results
0.18 um 32-bit microprocessors (n=960)
Ratio of measured to calculated leakage:
Model      µ      σ
Ileak-u    0.65   0.27
Ileak-w    1.04   0.3
Ileak-l    6.5    3.8
50% of the samples within ±20% of the measured leakage, compared to 11% and 0.2% of the samples using other techniques
[Figure: histogram of number of samples (0 to 500) versus ratio of measured to calculated leakage (0.1 to 100, log scale) for Ileak-u, Ileak-w, and Ileak-l]
S. Narendra ICCAD 2002

Characterizing the variation
Variations in channel length occur at both the intra-die (within-die) level and inter-die (die-to-die) level
$L_{total,i} = L_{nominal} + \Delta L_{inter} + \Delta L_{intra,i}$
[Figure: ∆Linter and ∆Lintra]
R. Rao, et al. ISLPED 2003

Empirical Model
Using the following mathematical model
$I = q_1 e^{q_2 L_d + q_3 L_d^2} = h(L_d) \;\Rightarrow\; L_d = \frac{1}{2 q_3}\left(-q_2 + \sqrt{q_2^2 - 4 q_3 \ln\left(\frac{q_1}{I}\right)}\right) = g(I)$
Eliminate Vth as an intermediate variable and use general form of SPICE model to perform empirical fitting on channel length
Properties of this equation:
– Preserves exponential dependency of I on L
– Is easily invertible (simple quadratic equation formula)
– Yields closed-form expressions for the PDF of I
– Accurately fits over a wide range of values of L for NMOS/PMOS and transistor stacks
R. Rao, et al. ISLPED 2003
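A minimal sketch of how the empirical model, its inverse, and the resulting PDF of I fit together, assuming a normally distributed channel length. The fitting constants `Q1`, `Q2`, `Q3` are hypothetical. One point worth flagging: the quadratic has two roots, and which one is physical depends on the signs of the fitted constants, so this sketch exposes the sign of the square root as an argument (defaulting to the + root as written on the slide) instead of hard-coding it.

```python
import math

def h(length, q1, q2, q3):
    """Empirical leakage model I = q1 * exp(q2*L + q3*L^2)."""
    return q1 * math.exp(q2 * length + q3 * length * length)

def g(current, q1, q2, q3, sign=1.0):
    """Inverse of h via the quadratic formula; `sign` selects the root."""
    disc = q2 * q2 - 4.0 * q3 * math.log(q1 / current)
    return (-q2 + sign * math.sqrt(disc)) / (2.0 * q3)

def pdf_current(current, q1, q2, q3, mu, sigma, sign=1.0):
    """PDF of I by change of variables: f_I(i) = f_L(g(i)) / |dI/dL|."""
    length = g(current, q1, q2, q3, sign)
    pdf_l = math.exp(-(length - mu) ** 2 / (2.0 * sigma ** 2)) / (sigma * math.sqrt(2.0 * math.pi))
    d_i_d_l = current * (q2 + 2.0 * q3 * length)
    return pdf_l / abs(d_i_d_l)

# Hypothetical constants with L in micrometres; leakage falls as L grows,
# which for these signs corresponds to the minus root.
Q1, Q2, Q3 = 1e-6, -40.0, 50.0
L_NOM = 0.18
i_nom = h(L_NOM, Q1, Q2, Q3)
l_back = g(i_nom, Q1, Q2, Q3, sign=-1.0)
density = pdf_current(i_nom, Q1, Q2, Q3, mu=L_NOM, sigma=0.018, sign=-1.0)
print(i_nom, l_back, density)
```

The closed-form PDF is what makes the model convenient: no Monte Carlo sampling is needed to obtain the leakage distribution for a given length distribution.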

Goodness of fit
NMOS in 0.18µm: Vgs=0, Vds=Vdd and ±10% variation in Ld
[Figure: comparison of simulation data with analytical expression (simulation, analytical, BSIM3 fit)]
[Figure: comparison of experimental PDF with PDF obtained analytically (simulation, analytical)]
R. Rao, et al. ISLPED 2003

Accounting for inter/intra-die variations
$L_{total,i} = L_{nominal} + \Delta L_{inter} + \Delta L_{intra,i}$
$\sigma_{L,total}^2 = \sigma_{L,inter}^2 + \sigma_{L,intra}^2$
We accommodate variation across both levels in our empirical model
– Assume σL,total=15%. First determine the parameters for σL,inter=0% and σL,intra=15% of nominal
– Enumerate all inter-die variation points (Ex: 1%, 2% etc.)
– For each point, shift the mean by σL,inter and obtain the distribution of currents around that average point using σL,intra
– Perform weighted summation across the range of σL,inter
[Figure: weighted summation]
R. Rao, et al. ISLPED
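The weighted-summation step can be sketched as follows. To keep the example short it tracks only the mean leakage rather than the full PDF, and it swaps in the simple exponential model I = I_o·exp(-(L - Lmean)/λ) in place of the fitted empirical model; both are assumptions of this sketch, as are all numeric values. Each inter-die shift is weighted by a normal density, and the per-die mean uses the intra-die σ.

```python
import math

def normal_pdf(x, sigma):
    return math.exp(-x * x / (2.0 * sigma * sigma)) / (sigma * math.sqrt(2.0 * math.pi))

def mean_leak_one_die(i0, lam, shift, sigma_intra):
    """Mean leakage of a die whose mean length is offset by `shift`."""
    return i0 * math.exp(-shift / lam) * math.exp(sigma_intra ** 2 / (2.0 * lam ** 2))

def mean_leak_all_dies(i0, lam, sigma_inter, sigma_intra, points=401, span=6.0):
    """Weighted summation over enumerated inter-die shift points."""
    if sigma_inter == 0.0:
        return mean_leak_one_die(i0, lam, 0.0, sigma_intra)
    step = 2.0 * span * sigma_inter / (points - 1)
    shifts = [-span * sigma_inter + n * step for n in range(points)]
    weights = [normal_pdf(s, sigma_inter) * step for s in shifts]
    weighted = sum(w * mean_leak_one_die(i0, lam, s, sigma_intra)
                   for w, s in zip(weights, shifts))
    return weighted / sum(weights)

# Hypothetical split of a 15% total sigma (fractions of nominal L):
# 0.09^2 + 0.12^2 = 0.15^2.
I0, LAM = 1.0, 0.1
SIGMA_INTER, SIGMA_INTRA, SIGMA_TOTAL = 0.09, 0.12, 0.15
mixed = mean_leak_all_dies(I0, LAM, SIGMA_INTER, SIGMA_INTRA)
single_level = I0 * math.exp(SIGMA_TOTAL ** 2 / (2.0 * LAM ** 2))
print(mixed, single_level)
```

For this exponential model the two-level summation reproduces the single-level closed form with the total σ, which is a useful sanity check before moving to a fitted model where no such shortcut exists.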

Leakage PDF for different intra/inter-die variation

R. Rao, et al. ISLPED 2003

Leakage Modeling and Analysis
Operating Environment (Temperature, VDD, …) Input pattern vector (state) Sub-Threshold

Gate

Topology

Process Variations (∆L, VT, TOX, …)

All effects need to be accurately modeled

Power with on-chip environmental variations
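[Flow: start from leakage power (P0) and dynamic power (PD0); compute thermal profile and voltage drop contour; update the leakage model and the dynamic power model; converged? Y: stop, N: repeat]

Leakage model: Pleak = P0 f(dVdd, dT)

Dynamic power model: ~ dVdd²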
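The loop in this flow is a fixed-point iteration, and a lumped single-node version is enough to show its shape. The sketch below is not the full-chip analysis: it assumes one thermal resistance `r_th` and one supply resistance `r_grid` for the whole chip, takes the leakage scale factor f as a caller-supplied function, and uses hypothetical numbers throughout.

```python
def solve_power(p0, pd0, leak_scale, vdd0, t0, t_amb, r_th, r_grid,
                tol=1e-9, max_iter=100):
    """Iterate power -> (dT, dVdd) -> power until the power stops changing.

    p0, pd0: leakage and dynamic power at the reference point (vdd0, t0).
    leak_scale(d_v, d_t): the factor f(dVdd, dT) in Pleak = P0 * f.
    """
    p_leak, p_dyn = p0, pd0
    for iteration in range(1, max_iter + 1):
        total = p_leak + p_dyn
        d_t = (t_amb + r_th * total) - t0        # lumped thermal profile
        d_v = -r_grid * total / vdd0             # lumped voltage drop
        new_leak = p0 * leak_scale(d_v, d_t)
        new_dyn = pd0 * ((vdd0 + d_v) / vdd0) ** 2
        change = abs(new_leak - p_leak) + abs(new_dyn - p_dyn)
        p_leak, p_dyn = new_leak, new_dyn
        if change < tol:
            return p_leak, p_dyn, d_v, d_t, iteration
    raise RuntimeError("power iteration did not converge")

def linear_scale(d_v, d_t):
    """Hypothetical first-order leakage sensitivity to voltage and temperature."""
    return 1.0 + 3.0 * d_v + 0.02 * d_t

p_leak, p_dyn, d_v, d_t, iterations = solve_power(
    p0=10.0, pd0=40.0, leak_scale=linear_scale,
    vdd0=1.2, t0=85.0, t_amb=45.0, r_th=0.7, r_grid=0.001)
print(p_leak, p_dyn, d_v, d_t, iterations)
```

In a real flow the two lumped expressions would be replaced by the thermal and power grid solvers, but the convergence test and the feedback between power, temperature, and supply voltage keep the same structure.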

Thermal Profile Computation
Full chip thermal model
– The thermal system modeled in this paper is assumed to be static (i.e. not time-varying)
– A dynamic, time-varying model can also be used if desired
[Figure: chip stack: package, C4, metal + ILD, Si, SiO2, silicon substrate, heat sinks]

Thermal Equations
Steady-state heat conduction equation:
$k \nabla^2 T(x, y, z) + p(x, y, z) = 0$
where p is the power density of the heat sources, k is the thermal conductivity
Boundary conditions
– Isothermal (Dirichlet): T = fi(x, y, z)
– Insulated (Neumann): ∂T/∂ni = 0
– Convective (Robin): ki ∂T/∂ni = hi(T – Ta)

Thermal Model
[Figure: thermal network around node Tx,y,z with neighbors Tx-1,y,z, Tx+1,y,z, Tx,y-1,z, Tx,y+1,z, Tx,y,z-1, Tx,y,z+1]
$k\left(\frac{T_{x+1,y,z} - 2T_{x,y,z} + T_{x-1,y,z}}{dx^2} + \frac{T_{x,y+1,z} - 2T_{x,y,z} + T_{x,y-1,z}}{dy^2} + \frac{T_{x,y,z+1} - 2T_{x,y,z} + T_{x,y,z-1}}{dz^2}\right) = -\frac{p}{dx\,dy\,dz}$
Model the circuit as a linear circuit
$R_i = \frac{dx}{k\,dy\,dz} \qquad R_b = \frac{1}{h\,dy\,dz}$

Thermal Model
Analogy between thermal and electrical circuits:
Thermal                       Electrical
T : Temperature (K)       ==  V
Rh: Thermal res (K/W)     ==  R
Q : Heat flow (W)         ==  I
Thermal system can be modeled as linear circuit GV=I
However, thermal modeling of a typical chip leads to millions of nodes
– Need to use specialized linear solvers
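To make the circuit analogy concrete, here is a one-dimensional sketch: a column of cells joined by conduction resistances Ri = dx/(k·dy·dz), with a convective resistance Rb = 1/(h·dy·dz) to ambient at the far end and a heat source at the near end. It assembles GV=I as a tridiagonal system and solves it with the Thomas algorithm. The geometry, the heat transfer coefficient, and the power value are hypothetical; the conductivity is only a rough figure for silicon.

```python
def solve_tridiagonal(lower, diag, upper, rhs):
    """Thomas algorithm for a tridiagonal linear system."""
    n = len(diag)
    c = [0.0] * n
    d = [0.0] * n
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        m = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / m
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / m
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x

def column_temperatures(n, dx, dy, dz, k, h, power_w, t_amb):
    """Node temperatures for a 1D chain: heat enters node 0, leaves via node n-1."""
    g_i = (k * dy * dz) / dx          # conduction conductance, 1/Ri
    g_b = h * dy * dz                 # convection conductance, 1/Rb
    lower = [0.0] * n
    diag = [0.0] * n
    upper = [0.0] * n
    rhs = [0.0] * n
    for i in range(n):
        if i > 0:
            diag[i] += g_i
            lower[i] = -g_i
        if i < n - 1:
            diag[i] += g_i
            upper[i] = -g_i
    diag[n - 1] += g_b                # boundary node tied to ambient
    rhs[n - 1] += g_b * t_amb
    rhs[0] += power_w                 # heat flow plays the role of a current source
    return solve_tridiagonal(lower, diag, upper, rhs)

# Hypothetical 100 um cells, k ~ 150 W/(m K), h = 1e4 W/(m^2 K), 1 mW source.
N, DX, K, H_COEF, P, T_AMB = 5, 1e-4, 150.0, 1e4, 1e-3, 45.0
temps = column_temperatures(N, DX, DX, DX, K, H_COEF, P, T_AMB)
r_series = (N - 1) * DX / (K * DX * DX) + 1.0 / (H_COEF * DX * DX)
print(temps[0], T_AMB + P * r_series)
```

In one dimension the answer is just the source power times the series resistance, which gives a direct check on the assembled matrix. A full-chip model has the same structure in three dimensions, which is where the node counts reach the millions.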

Circuit Analysis for Thermal/VDD circuits
General equation: Gx + Cx’ = B(t)
– G, C: conductance matrix, capacitance matrix.
– x, x’: node voltages & KVL currents, time derivative.
– B(t): time dependent current sources.
Analysis using SPICE-like simulators is not practical.
– Dimension of x ~ 10^5 to 10^7.
– Standard circuit simulators cannot make use of any special properties of the analysis.

Analysis Acceleration
Make use of the linearity.
– BE discretization: (G+C/h)x(t+h) = B(t)+(C/h)x(t).
– System solution: x(t+h) = (G+C/h)^−1 (B(t)+(C/h)x(t)).
– The matrix (G+C/h) is independent of time ∴ only needs to be inverted once.
Avoid data explosion.
– One layout “polygon” translates to many resistors.
– Retain geometrical description and leverage to reduce translation overhead.
Specialized linear solvers.
– Algebraic multi-grid (AMG) solvers
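The "invert once" point can be sketched with a small dense example: factor (G+C/h) a single time, then reuse the factors for every time step. The code below uses a plain LU factorization without pivoting, which is an assumption that suits the diagonally dominant matrices of RC networks but is not a general-purpose solver. The two-node network and its values are hypothetical.

```python
def lu_factor(a):
    """In-place style LU factorization (no pivoting); returns a combined L\\U matrix."""
    n = len(a)
    lu = [row[:] for row in a]
    for k in range(n):
        for i in range(k + 1, n):
            lu[i][k] /= lu[k][k]
            for j in range(k + 1, n):
                lu[i][j] -= lu[i][k] * lu[k][j]
    return lu

def lu_solve(lu, b):
    """Solve using the stored factors: forward then backward substitution."""
    n = len(lu)
    y = b[:]
    for i in range(n):
        for j in range(i):
            y[i] -= lu[i][j] * y[j]
    x = y[:]
    for i in range(n - 1, -1, -1):
        for j in range(i + 1, n):
            x[i] -= lu[i][j] * x[j]
        x[i] /= lu[i][i]
    return x

def transient(g, c, source, x0, h, steps):
    """Backward Euler: (G + C/h) x(t+h) = B(t) + (C/h) x(t), factored once."""
    n = len(g)
    a = [[g[i][j] + c[i][j] / h for j in range(n)] for i in range(n)]
    lu = lu_factor(a)                 # the only factorization in the run
    x = x0[:]
    for step in range(steps):
        b = source(step * h)
        rhs = [b[i] + sum(c[i][j] * x[j] for j in range(n)) / h for i in range(n)]
        x = lu_solve(lu, rhs)
    return x

# Hypothetical two-node RC network driven by a constant source at node 0.
G = [[2.0, -1.0], [-1.0, 2.0]]
C = [[1.0, 0.0], [0.0, 1.0]]
x_final = transient(G, C, lambda t: [1.0, 0.0], [0.0, 0.0], h=0.1, steps=200)
print(x_final)
```

After enough steps the transient settles to the DC solution of Gx = B, here (2/3, 1/3). Each step costs only two triangular solves, which is the saving that linearity and a fixed time step make available.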

Multi-Grid Methods
Basic idea:
– Reduce original grid ωh to a coarser grid ω2h.
– Map problem from original to coarser grid.
– Solve problem at coarse grid (iterative solver).
– Map solution back to fine grid and refine.
Systems:
– Fine: Ah xh = bh
– Coarse: A2h x2h = b2h
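The four steps of the basic idea can be sketched as a two-grid cycle for a 1D Poisson problem. This is a geometric two-grid example chosen for brevity, not the algebraic multi-grid (AMG) solver the slides refer to. It uses weighted Jacobi both as the smoother and as the iterative coarse solver, full weighting to map the residual down, and linear interpolation to map the correction back. The test problem and sweep counts are assumptions of the sketch.

```python
import math

def apply_a(u, h):
    """Apply the 1D operator -d2/dx2 with zero Dirichlet boundaries."""
    n = len(u)
    out = []
    for i in range(n):
        left = u[i - 1] if i > 0 else 0.0
        right = u[i + 1] if i < n - 1 else 0.0
        out.append((2.0 * u[i] - left - right) / (h * h))
    return out

def residual(u, f, h):
    return [fi - ai for fi, ai in zip(f, apply_a(u, h))]

def jacobi(u, f, h, sweeps, omega=2.0 / 3.0):
    """Weighted Jacobi sweeps (the matrix diagonal is 2/h^2)."""
    for _ in range(sweeps):
        r = residual(u, f, h)
        u = [ui + omega * (h * h / 2.0) * ri for ui, ri in zip(u, r)]
    return u

def restrict(r):
    """Full weighting from the fine grid to the coarse grid."""
    nc = (len(r) - 1) // 2
    return [0.25 * r[2 * i] + 0.5 * r[2 * i + 1] + 0.25 * r[2 * i + 2]
            for i in range(nc)]

def prolong(e, n_fine):
    """Linear interpolation from the coarse grid to the fine grid."""
    fine = [0.0] * n_fine
    for i, v in enumerate(e):
        fine[2 * i] += 0.5 * v
        fine[2 * i + 1] += v
        fine[2 * i + 2] += 0.5 * v
    return fine

def two_grid_cycle(u, f, h, coarse_sweeps=1000):
    u = jacobi(u, f, h, 3)                                     # smooth on fine grid
    rc = restrict(residual(u, f, h))                           # map problem to coarse grid
    ec = jacobi([0.0] * len(rc), rc, 2.0 * h, coarse_sweeps)   # iterative coarse solve
    u = [ui + ei for ui, ei in zip(u, prolong(ec, len(u)))]    # map correction back
    return jacobi(u, f, h, 3)                                  # refine

# Test problem: -u'' = pi^2 sin(pi x) on (0, 1), exact solution sin(pi x).
N = 31
H = 1.0 / (N + 1)
XS = [(i + 1) * H for i in range(N)]
F = [math.pi ** 2 * math.sin(math.pi * x) for x in XS]
u = [0.0] * N
r0 = max(abs(v) for v in residual(u, F, H))
for _ in range(10):
    u = two_grid_cycle(u, F, H)
r_final = max(abs(v) for v in residual(u, F, H))
max_error = max(abs(ui - math.sin(math.pi * x)) for ui, x in zip(u, XS))
print(r_final / r0, max_error)
```

The smoother removes the oscillatory part of the error cheaply, and the coarse grid handles the smooth part that Jacobi alone would need many sweeps to reduce. Applying the same idea recursively to the coarse solve gives a multi-level method.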

Leakage Variation
Chip   Leakage, no variations (W)   Vdd (V)       T (°C)         Change in leakage with variations (W)
1      9.60                         1.016-1.196   80.8 – 110.3   -1.850
2      1.12                         1.159-1.2     75.5 – 89.1    -0.136

Thermal & Power Grid Solver
Matrix Size   Analysis     Runtime (sec)   Mem (GB)
170k          Thermal      82.17           0.45
270k          Thermal      88.13           0.61
630k          Power Grid   139.39          0.46
1.74M         Power Grid   293.58          1.3
2.73M         Power Grid   438.10          2.1

Leakage Modeling
Leakage varies superlinearly with temperature and power supply voltage
– However, on-chip variations of temperature and voltage are limited
– Leakage as a function of temperature and voltage is modeled as second order
$I(\Delta V, \Delta T) = I(0,0)\left[1 + a_1 \Delta T + a_2 \Delta T^2 + b_1 \Delta V + b_2 \Delta V^2 + c_1 \Delta T \Delta V\right]$
Gate and subthreshold leakage are modeled separately
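A short sketch of how the second-order model might be evaluated with separate coefficient sets for the subthreshold and gate components. The coefficient values below are hypothetical placeholders, chosen only so that subthreshold leakage is the more temperature-sensitive of the two; real coefficients would come from fitting circuit simulation data.

```python
def leakage_scale(d_v, d_t, coeff):
    """Second-order scale factor 1 + a1*dT + a2*dT^2 + b1*dV + b2*dV^2 + c1*dT*dV."""
    a1, a2, b1, b2, c1 = coeff
    return (1.0 + a1 * d_t + a2 * d_t ** 2
            + b1 * d_v + b2 * d_v ** 2 + c1 * d_t * d_v)

# Hypothetical coefficients (a1, a2, b1, b2, c1), one set per component.
SUBTHRESHOLD = (0.03, 4e-4, 2.5, 3.0, 0.05)
GATE = (0.002, 0.0, 4.0, 6.0, 0.0)

def total_leakage(i_sub0, i_gate0, d_v, d_t):
    """Sum of the two components, each scaled by its own polynomial."""
    return (i_sub0 * leakage_scale(d_v, d_t, SUBTHRESHOLD)
            + i_gate0 * leakage_scale(d_v, d_t, GATE))

nominal = total_leakage(1.0, 0.5, 0.0, 0.0)
hot = total_leakage(1.0, 0.5, 0.0, 10.0)
drooped = total_leakage(1.0, 0.5, -0.05, 0.0)
print(nominal, hot, drooped)
```

Keeping the two components separate matters because they respond differently: a local hot spot mostly raises the subthreshold term, while a supply droop lowers both.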

Leakage Modeling
[Figure]

Results: Thermal profile
[Figure: thermal map of 9mm x 9mm ASIC chip]

Results: VDD Profile
[Figure: VDD profile]

Results: Leakage w/ VDD & Temp Variations
Leakage considering environmental variations
– Accurate leakage model of actual VDD and temperature profile
– For this example, leakage is lower by 10%

Comparison to Fixed Drop Analysis
Need to accurately capture the on-chip locality of power supply and temperature and their influence on leakage
Reduce the optimism of the “fixed-drop” method
Chip 1       Vdd (V)       T (°C)         Leakage (W)   Change in Leakage (W)
OCV          1.016-1.196   80.8 – 110.3   7.75          -1.85
Fixed Drop   1.08          85             5.31          -4.29

Variable Threshold CMOS (VTCMOS)
– Body effect to change device Vt
– Standby leakage reduction with maximum reverse bias
– Triple well structure
[Figure: inverter with body terminals VBBP and VBBN between VDD and VSS; cross-section showing N-well, P-well, N-isolation, and P-sub with p+ and n+ contacts]
Body Effect:
$V_t = V_{t0} + \gamma\left(\sqrt{2\phi_B - V_{BB}} - \sqrt{2\phi_B}\right)$
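A small sketch of the body-effect relation for an NMOS device, where a negative VBB is reverse bias and a positive VBB is forward bias. The device parameters (`VT0`, `GAMMA`, `PHI_B`) are hypothetical values picked for illustration.

```python
import math

def threshold_voltage(vt0, gamma, phi_b, v_bb):
    """Vt = Vt0 + gamma * (sqrt(2*phi_B - VBB) - sqrt(2*phi_B)), NMOS sign convention."""
    arg = 2.0 * phi_b - v_bb
    if arg < 0.0:
        raise ValueError("forward bias exceeds 2*phi_B; the relation no longer applies")
    return vt0 + gamma * (math.sqrt(arg) - math.sqrt(2.0 * phi_b))

# Hypothetical device: Vt0 = 0.30 V, gamma = 0.4 V^0.5, phi_B = 0.35 V.
VT0, GAMMA, PHI_B = 0.30, 0.4, 0.35
vt_zero = threshold_voltage(VT0, GAMMA, PHI_B, 0.0)
vt_reverse = threshold_voltage(VT0, GAMMA, PHI_B, -1.0)   # standby: deep reverse bias
vt_forward = threshold_voltage(VT0, GAMMA, PHI_B, 0.3)    # active: slight forward bias
print(vt_zero, vt_reverse, vt_forward)
```

Because Vt follows a square root of the bias, each additional volt of reverse bias buys a smaller Vt shift than the previous one, which limits how far standby leakage can be pushed down this way.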

VTCMOS
Variable Threshold CMOS (from T. Kuroda, ISSCC, 1996)
In active mode:
– Zero or slightly forward body bias for high speed
In standby mode:
– Deep reverse body bias for low leakage
Triple well technology required
Kaushik Roy, ECE, Purdue University

VTCMOS Example
VTCMOS principle applied to 4-mm DCT core processor
– SSB increases Vt (more reverse bias)
– SCI decreases Vt (Standby -> Slee
– Leakage reduction 4 orders of magnitude: 0.1mA active -> 10nA sleep (2.8v ∆VB
– Dynamically tunes Vt (by matching leakage current monitor) to minimize Vt variation
T. Kuroda, et al. “A 0.9V, 150Mhz, 10mW, 4mm2, 2-DCT Co Processor with Variable Vt Scheme,” JSSC Nov. 1996
J Kao ICCAD 2002

VTCMOS Pros/Cons
PROS:
– Significant standby leakage reduction
– Memory elements retain state
– No transistor sizing/partitioning required
– Dynamically tunable Vt during runtime
CONS:
– Requires expensive triple well process
– Body factor decreases with scaling
J Kao ICCAD 2002

DVS vs. DVTS
[Figure: TSMC 250 nm (Vdd=2.5V, Vth=0.45V); BPTM 70nm (Vdd=0.9V, Vth=0.15V)]
Kaushik Roy, ECE, Purdue University

Speed Adaptive Vt CMOS
Dynamically tune Vt so that critical path speed matches clock period
– Reduces chip-to-chip parameter variations
– Forward bias: speeds up slow chips
– Reverse bias: operate only as fast as necessary (reduces excess active leakage)
– Standby leakage with maximum reverse bias
Also known as Adaptive Body Biasing (ABB)
M. Miyazaki, et al. “A 1.2-GIPS/W uProc Using Speed Adaptive Vt CMOS with Forward Bias,” JSSC Feb 2002.
J Kao ICCAD 2002

Adaptive Supply & Body Bias (ASB)
Dynamically tune both VDD & Vt as operating conditions change
– Trade-off between dynamic power (VDD knob) and leakage power (Vt knob)
– Minimize total ACTIVE power consumption (accept higher active leakage current in exchange for lower dynamic power)
[Figure: power vs. VDD (implicit Vt) for fixed frequency: Pleakage, Pdynamic, and Ptotal (Watts) versus VDD (Volts); minimum point where slope(leak) = -slope(dyn)]
M. Miyazaki, et al. “A 175mV Multiply-Accumulate Unit using an Adaptive Supply Voltage and Body Bias (ASB) Architecture,” ISSCC February 2002.
J Kao ICCAD 2002
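The minimum-power condition on this slide can be illustrated with toy models. The sketch below assumes an alpha-power delay law to hold frequency fixed (which makes Vt an implicit function of VDD), a CV²f dynamic power term, and a leakage current that falls one decade per `SWING` volts of Vt. None of these models or constants come from the cited work; they are placeholders that happen to produce an interior minimum.

```python
def vt_at_fixed_frequency(vdd, c, alpha):
    """Vt that keeps frequency constant when f is proportional to (VDD - Vt)^alpha / VDD."""
    return vdd - (c * vdd) ** (1.0 / alpha)

def p_dynamic(vdd, cf):
    """Dynamic power, with cf standing for (switched capacitance * frequency)."""
    return cf * vdd * vdd

def p_leakage(vdd, c, alpha, i0, swing):
    """Leakage power: VDD times a current that drops 10x per `swing` volts of Vt."""
    vt = vt_at_fixed_frequency(vdd, c, alpha)
    return vdd * i0 * 10.0 ** (-vt / swing)

# Hypothetical constants for the toy models.
C, ALPHA, CF, I0, SWING = 0.5, 1.5, 1.0, 1.0, 0.1

def p_total(vdd):
    return p_dynamic(vdd, CF) + p_leakage(vdd, C, ALPHA, I0, SWING)

# Simple VDD sweep from 0.2 V to 1.2 V in 1 mV steps.
grid = [0.2 + 0.001 * n for n in range(1001)]
v_opt = min(grid, key=p_total)

# Central-difference slopes of each component at the minimum.
DV = 1e-5
slope_dyn = (p_dynamic(v_opt + DV, CF) - p_dynamic(v_opt - DV, CF)) / (2.0 * DV)
slope_leak = (p_leakage(v_opt + DV, C, ALPHA, I0, SWING)
              - p_leakage(v_opt - DV, C, ALPHA, I0, SWING)) / (2.0 * DV)
print(v_opt, slope_dyn, slope_leak)
```

At the minimum the two slopes cancel: lowering VDD further would save less dynamic power than the extra leakage it costs, because holding frequency fixed forces Vt down along with VDD.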

Optimal VDD/VT Selection
Optimal VDD & Vt target changes with operating conditions
– e.g. Varying workload
– Reduce leakage at expense of increased dynamic
– Reduce dynamic at expense of increased leakage
Low frequencies: high Vt more optimal
High frequencies: low VDD more optimal
[Figure: Vt-VDD constant performance locus, Vt (Volts) versus VDD (Volts), for 50, 100, 150, 200, and 250 Mhz]
[Figure: power (Watts) versus VDD (Volts) for the same frequencies, with the minimum power point marked (Vt implicit)]
J Kao ICCAD 2002

VDD/VT Optimization vs. DVS
Dynamic voltage scaling ignores VT influence
DVS is sub-optimal over the frequency range
[Figure: power (Watts) versus frequency (Mhz, 0 to 250) for DVS (Vt=0.35), DVS (Vt=0.14), and optimal VDD/Vt scaling, shown at two scales]
J Kao ICCAD 2002

ASB Architecture
Decouple VDD/Vt tuning loops
– ABB (Auto Body Biasing) generator chooses Vt based on VDD/Freq/etc.
– Simple VDD sweep to search minimum active power point
Architecture ensures minimum power for any operating condition
[Figure: block diagram: Vt controller, ABB generator (VBBN, VBBP), VDD controller, variable DC/DC, DSP core, power monitor]
M. Miyazaki, J. Kao, A. Chandrakasan, “A 175mV Multiply-Accumulate Unit using an Adaptive Supply Voltage and Body Bias (ASB) Architecture,” ISSCC February 2002.
J Kao ICCAD 200

Active Well vs VDD Scaling
[Figure: leakage power (for 65nm); total power (for 65nm)]
Reverse body bias (Active Well) is a more effective leakage minimization technique than VDD scaling
Total power savings depends on ratio of leakage to total power
– Vdd scaling reduces dynamic power much more than Active Well
B. Chatterjee, et al ISLPED 2003

Background: Active Well vs VDD Scaling
VDD scaling
– Less effective than Active Well to reduce leakage
– Causes higher degradation in performance
– More effective in reducing dynamic power
Active Well
– More effective in reducing leakage
– Does not work with SOI
– Effectiveness reducing in newer technologies but still more effective than VDD scaling till 65nm
B. Chatterjee, et al ISLPED 200

Total Power and VDD Scaling
[Figure: total power (for 130nm); total power (for 65nm)]
VDD scaling is very effective in current technologies to reduce dynamic power
VDD scaling is more attractive if total power is dominated by dynamic power.
B. Chatterjee, et al ISLPED 200

Vth hopping scheme (Variable Vth)
K. Nose, JSSC ‘02
Kaushik Roy, ECE, Purdue University

Vth hopping scheme
K. Nose, JSSC ‘02
Kaushik Roy, ECE, Purdue University

The DVTS Scheme
[Figure: versus time: frequency (up to Fmax), PMOS body bias (2.7 V and 0.9 V levels), NMOS body bias (0 and -1.8 V levels), and Vth (0.45 V and 0.15 V levels)]
Kaushik Roy, ECE, Purdue University

Implementation – overview
error[n] = Fclock[n] – Fvco[n]
[Figure: schematic of the DVTS system: CLK, counters, feedback algorithm, charge pumps driving PMOS and NMOS body bias, system, ÷N, VCO]
Kaushik Roy, ECE, Purdue University

Forward Body-Biasing (50nm)
• Previous techniques: use circuit/arch. to lower leakage
• This technique: use dev/ckt/arch opt. to lower leakage
• Main idea: high Vt device + forward body-biasing
[Figure: drain current (A/um) versus gate voltage (V, 0 to 1.2) for nominal Vt + ZBB (Vt=270mV), super high Vt + ZBB (Vt=350mV), and super high Vt + FBB; annotations 3% and 17%]
Kaushik Roy, ECE, Purdue University

2x32 Forward Body-Biased Subarray
0.4V power supply
[Figure: subarray schematic with SUBSL, wordlines WL0 to WL31, devices M1, M2, M3, MA, MP, MN, 32 columns, and VPWELL]
Kaushik Roy, ECE, Purdue University

Body Transition Delay Hiding
• Subarray turned on ahead of time using SUBSL
• Extra time for body-bias transition to complete
[Figure: waveforms for A[N-1:0], SUBSL, VPNWELL, and WL across standby, active, and standby]
Kaushik Roy, ECE, Purdue University

Body Transition Energy Reduction
32KB L1 inst. cache, SimpleScalar, 500M cycles
– 93% of the accesses hit the same subarray in the next access: locality of reference
– Transition energy wasted only in 7% of accesses
[Figure: percentage of consecutive accesses to same subarray for SPEC2000 benchmarks (gzip, mcf, perlbmk, vortex, bzip2, vpr, gcc) and their average; bar values 95%, 92%, 91%, 96%, 94%, 95%, 90%, 93%]
Kaushik Roy, ECE, Purdue University

Bitline Performance Under Iso-Leakage
– SBSRAM delay penalty: 3 transistor stack
– FBSRAM delay penalty: + diffusion capacitance
– FBSRAM is 7.3% faster than SBSRAM under iso-leakage
[Figure: differential voltage (V, 0 to 0.12) versus time (ps, 0 to 200) showing VBL, VBLB, and Vdiff for conventional, FBSRAM, and SBSRAM; sense amp activation marked at t=138ps, t=152ps, and t=164ps]
Kaushik Roy, ECE, Purdue University

Summary
Leakage is critically dependent on process and environmental variations
Leakage control through two promising techniques
– Power supply control
– Threshold voltage control
Leakage will be the key design variable in next generation ICs