Physics of Power Dissipation in CMOS FET Devices
2. Physics of Power Dissipation in CMOS FET Devices
For an ideal MIS diode, the energy difference between the metal work function $\phi_m$ and the semiconductor work function $\phi_s$ is zero: $\phi_{ms} \equiv \phi_m - (\chi + E_g/2q + \psi_B) = 0$ (2.1)
where $\chi$ is the semiconductor electron affinity (from conduction band to vacuum level), $E_g$ the band gap (from valence band to conduction band), $\phi_B$ the potential barrier between the metal and the insulator, and $\psi_B$ the potential difference between the Fermi level $E_F$ and the intrinsic Fermi level $E_i$.
The Fermi-Dirac Function
$f_{FD}(E) = 1 / (1 + \exp((E - E_F)/kT))$
The Fermi-Dirac distribution function gives the probability that a certain energy state will be occupied by an electron. As in a gas, the electrons in a solid are in constant motion and are consequently changing their energy and momentum.
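The distribution can be evaluated numerically as a quick check. The following is a minimal sketch; the function name, the choice of electron-volts as the energy unit, and the default temperature of 300 K are assumptions made for illustration.

```python
import math

K_B_EV = 8.617333262e-5  # Boltzmann constant in eV/K

def fermi_dirac(energy_ev, fermi_ev, temp_k=300.0):
    """Occupation probability of a state at energy_ev (energies in eV)."""
    x = (energy_ev - fermi_ev) / (K_B_EV * temp_k)
    # Guard against overflow in exp() for states far above E_F.
    if x > 700.0:
        return 0.0
    return 1.0 / (1.0 + math.exp(x))

# A state exactly at the Fermi level is occupied with probability 1/2.
p_at_fermi = fermi_dirac(0.5, 0.5)
# States above E_F are less likely to be occupied than states below it.
p_above = fermi_dirac(0.6, 0.5)
p_below = fermi_dirac(0.4, 0.5)
```

The occupation probabilities at equal distances above and below $E_F$ sum to one, which follows directly from the form of the function.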
CMOS Gate Power equations
$P = C_L V_{DD}^2 f_{0\to1} + t_{sc} V_{DD} I_{peak} f_{0\to1} + V_{DD} I_{leakage}$
- Dynamic term: $C_L V_{DD}^2 f_{0\to1}$
- Short-circuit term: $t_{sc} V_{DD} I_{peak} f_{0\to1}$
- Leakage term: $V_{DD} I_{leakage}$
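The three terms can be evaluated separately to see which one dominates for a given gate. This is a minimal sketch; the function name and all numeric values in the example are hypothetical and chosen only for illustration.

```python
def cmos_gate_power(c_load, v_dd, f_01, t_sc, i_peak, i_leak):
    """Return the dynamic, short-circuit, and leakage power terms in watts.

    c_load: load capacitance (F), v_dd: supply voltage (V),
    f_01: rate of 0->1 output transitions (1/s),
    t_sc: short-circuit conduction time per transition (s),
    i_peak: peak short-circuit current (A), i_leak: leakage current (A).
    """
    dynamic = c_load * v_dd ** 2 * f_01
    short_circuit = t_sc * v_dd * i_peak * f_01
    leakage = v_dd * i_leak
    return {
        "dynamic": dynamic,
        "short_circuit": short_circuit,
        "leakage": leakage,
        "total": dynamic + short_circuit + leakage,
    }

# Hypothetical gate: 50 fF load, 3.3 V supply, 25 MHz transition rate.
terms = cmos_gate_power(50e-15, 3.3, 25e6, 0.2e-9, 100e-6, 1e-9)
```

Because the dynamic term scales with the square of the supply voltage, lowering $V_{DD}$ reduces it faster than the other two terms.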
Maxwell-Boltzmann statistics relate the equilibrium hole concentration to the intrinsic Fermi level: $p_0 = n_i \exp((E_i - E_F)/kT)$ (2.2)
P substrate (The Fermi level $E_F$ in the semiconductor is now $-qV$ below the Fermi level in the metal gate.)
If the applied voltage is increased sufficiently, the bands bend far enough that level Ei at the surface crosses over to the other side of level EF. This is brought about by the tendency of carriers to occupy states with the lowest total energy. In the present condition of inversion the level Ei bends to be closer to level Ec and electrons outnumber holes at the surface.
$E_i$ at the surface is now below $E_F$ by an amount of energy equal to $2\psi_B$, where $\psi_B$ is the potential difference between the Fermi level $E_F$ and the intrinsic Fermi level $E_i$ in the bulk.
The value of V necessary to reach the onset of strong inversion is called the threshold voltage.
Surface Space Charge Region and the Threshold Voltage
Poisson equation: $\nabla \cdot D = \rho(x, y, z)$ (2.3)
where $D$, the electric displacement vector, is equal to $\varepsilon_s E$ under low-frequency or static conditions; $\varepsilon_s$ is the permittivity of Si; $E$ the electric field vector; and $\rho(x, y, z)$ the total electric charge density.
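$V_T = (2d/\varepsilon_i) \, [q \varepsilon_s N_A \psi_B (1 - e^{-2\beta\psi_B})]^{0.5} + 2\psi_B$
The total voltage needed to offset the effect of nonzero work function difference and the presence of the charges is referred to as the flat-band voltage $V_{FB}$: $V_{FB} = \phi_{ms} - Q_T d/\varepsilon_i$
$V_T = V_{FB} + (2d/\varepsilon_i) \, [q \varepsilon_s N_A \psi_B (1 - e^{-2\beta\psi_B})]^{0.5} + 2\psi_B$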
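To get a feel for the magnitudes involved, the threshold voltage expression can be evaluated for a p-type silicon substrate with an SiO2 insulator. This is a sketch under stated assumptions: $\beta$ is taken to be $q/kT$, $\psi_B$ is computed from the standard relation $\psi_B = (kT/q) \ln(N_A/n_i)$, and the material constants and example values (doping, insulator thickness, $n_i$) are illustrative choices rather than values from the text.

```python
import math

Q = 1.602176634e-19      # elementary charge (C)
K_B = 1.380649e-23       # Boltzmann constant (J/K)
EPS0 = 8.8541878128e-12  # vacuum permittivity (F/m)

def threshold_voltage(n_a_cm3, d_m, v_fb=0.0, temp_k=300.0,
                      n_i_cm3=1.45e10, eps_s_rel=11.7, eps_i_rel=3.9):
    """V_T for a p-type substrate; n_a in cm^-3, insulator thickness d in m."""
    v_thermal = K_B * temp_k / Q                      # kT/q = 1/beta (assumed)
    psi_b = v_thermal * math.log(n_a_cm3 / n_i_cm3)   # assumed bulk potential
    n_a = n_a_cm3 * 1e6                               # convert to m^-3
    eps_s = eps_s_rel * EPS0
    eps_i = eps_i_rel * EPS0
    charge_term = math.sqrt(
        Q * eps_s * n_a * psi_b * (1.0 - math.exp(-2.0 * psi_b / v_thermal))
    )
    return v_fb + (2.0 * d_m / eps_i) * charge_term + 2.0 * psi_b

# Hypothetical device: N_A = 1e17 cm^-3, 10 nm insulator, V_FB = 0.
v_t = threshold_voltage(1e17, 10e-9)
```

Raising the doping or thickening the insulator both increase the computed $V_T$, as the square-root term suggests.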
Effects Influencing Threshold Voltage
VT decreases when L (length) is decreased, varies with Z (width), and decreases when the drain-source voltage VDS is increased.
Drain-induced barrier lowering (DIBL) is the basis for a number of more complex models of the threshold voltage shift. It refers to the decrease in threshold voltage due to the depletion region charges in the potential barrier between the source and the channel at the semiconductor surface.
A recent model adopts a quasi-two-dimensional approach to solving the two-dimensional Poisson equation: dEx/dx at each point (x, y) can be replaced with the average of its value at (0, y) and at (W, y).
Short channel effect
The minimum value of the surface potential increases with decreasing channel length and increasing VDS.
Subsurface Drain-Induced Barrier Lowering (Punchthrough)
The punchthrough voltage $V_{PT}$ is defined as the value of $V_{DS}$ at which $I_{D,st}$ reaches some specific magnitude with $V_{GS} = 0$. The parameter $V_{PT}$ can be roughly approximated as the value of $V_{DS}$ for which the sum of the widths of the source and the drain depletion regions becomes equal to L.
If the field in the oxide, Eox, is large enough, the voltage drop across the depletion layer suffices to enable tunneling in the drain via a near-surface trap. The minority carriers emitted to the incipient inversion layer are laterally removed to the substrate, completing a path for a gate-induced drain leakage (GIDL) current. In CMOS circuits this leakage current contributes to standby power.
2.3 Power Dissipation in CMOS
The first ICs ever fabricated used a PMOS process. This is due to the simplicity of fabricating a p-channel enhancement-mode MOS field-effect transistor (PMOST) with threshold voltage $V_{Tp} < 0$. The higher mobility of electrons then motivated the move to the NMOS process, and the power dissipation problem in turn motivated the move to CMOS.
This advantage of CMOS over NMOS has proven to be important enough that the shortcomings of CMOS are overlooked: the CMOS process is more complex than the NMOS process, CMOS requires guard rings to get around the latch-up problem, and CMOS circuits require more transistors than the equivalent NMOS circuits.
The threshold voltages place a limit on the minimum supply voltage that can be used without incurring unreasonable delay penalties. If the threshold voltage is too low, the static component of the power due to subthreshold currents becomes significant.
2.3.1 Short-Circuit Dissipation
The short-circuit dissipation of the gate varies with the output load and the input signal slope. As the output load is increased up to a critical value, the short-circuit dissipation decreases roughly linearly, both in absolute terms and as a fraction of the total dissipation; beyond that value it increases again rapidly.
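For simplicity, a symmetrical inverter (i.e., $\beta_n = \beta_p = \beta$ and $V_{Tn} = -V_{Tp} = V_T$) and a symmetrical input signal (rise time = fall time = $\tau$) are considered.
$I = (\beta/2)(V_{in} - V_T)^2$ for $0 \le I \le I_{max}$
$I_{mean} = (1/T) \int_0^T I(t)\,dt = 2 \cdot (2/T) \int_{t_1}^{t_2} (\beta/2)(V_{in}(t) - V_T)^2\,dt$
Assuming the rising and falling portions of the input voltage waveform to be linear ramps, $V_{in}(t) = (V_{DD}/\tau)\,t$, the current starts to flow at $t_1 = (V_T/V_{DD})\,\tau$ and, by symmetry, peaks at $t_2 = \tau/2$:
$I_{mean} = 2 \cdot (2/T) \int_{(V_T/V_{DD})\tau}^{\tau/2} (\beta/2)((V_{DD}/\tau)\,t - V_T)^2\,dt$
Let $\xi = (V_{DD}/\tau)\,t - V_T$, so that $dt = (\tau/V_{DD})\,d\xi$:
$I_{mean} = (2\beta\tau/(T V_{DD})) \int_0^{V_{DD}/2 - V_T} \xi^2\,d\xi$
$I_{mean} = (1/12)(\beta/V_{DD})(V_{DD} - 2V_T)^3\,\tau/T$
The short-circuit power dissipation of an unloaded inverter is $P_{SC} = (\beta/12)(V_{DD} - 2V_T)^3\,\tau/T$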
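The closed form obtained by carrying out the integral, $P_{SC} = (\beta/12)(V_{DD} - 2V_T)^3\,\tau/T$, can be cross-checked by integrating the saturation current over a linear input ramp numerically. This is a minimal sketch; the function names, the midpoint-rule integration, and the example parameter values are assumptions made for illustration.

```python
def short_circuit_power_closed_form(beta, v_dd, v_t, tau, period):
    """P_SC = beta/12 * (V_DD - 2 V_T)^3 * tau / T for an unloaded inverter."""
    return beta / 12.0 * (v_dd - 2.0 * v_t) ** 3 * tau / period

def short_circuit_power_numeric(beta, v_dd, v_t, tau, period, steps=20000):
    """Integrate I = beta/2 (V_in - V_T)^2 from t1 to t2 with the midpoint rule."""
    t1 = v_t / v_dd * tau   # input ramp reaches V_T
    t2 = tau / 2.0          # input ramp reaches V_DD / 2 (current peak)
    dt = (t2 - t1) / steps
    integral = 0.0
    for k in range(steps):
        t = t1 + (k + 0.5) * dt
        v_in = v_dd * t / tau
        integral += 0.5 * beta * (v_in - v_t) ** 2 * dt
    # Four identical segments per period: two halves of each of two edges.
    i_mean = 4.0 / period * integral
    return v_dd * i_mean

# Hypothetical inverter: beta = 100 uA/V^2, 3.3 V supply, V_T = 0.7 V,
# 1 ns input edges, 20 ns period.
args = (100e-6, 3.3, 0.7, 1e-9, 20e-9)
p_closed = short_circuit_power_closed_form(*args)
p_numeric = short_circuit_power_numeric(*args)
```

The two results agree to within the integration error, and both vanish when $V_{DD} = 2V_T$, where the two transistors never conduct simultaneously.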
If the inverter is lightly loaded, so that the output rise and fall times are considerably shorter than the input rise and fall times, the short-circuit dissipation increases to become comparable to the dynamic dissipation. To minimize dissipation, an inverter should be designed so that the input rise and fall times are about equal to the output rise and fall times.
2.3.2 Dynamic Dissipation
Assuming that the input $V_{in}$ is a square wave with period T and that the rise and fall times of the input are much less than the repetition period, the dynamic dissipation is given by $P_D = C_L V_{DD}^2 / T$
When the output is charged to $V = V_{DD}$, the energy drawn from the supply is $E_{0\to1} = C_L V_{DD}^2$. Since the energy stored in a capacitor with capacitance $C_L$ and voltage $V_{DD}$ across its plates is $C_L V_{DD}^2/2$, the rest of the energy, another $C_L V_{DD}^2/2$, is converted into heat.
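The equal split between stored and dissipated energy can be checked numerically. The sketch below models the pull-up device as a fixed resistor charging $C_L$, which is an assumption made only to keep the example simple; the function name and component values are hypothetical.

```python
import math

def charging_energies(c_load, v_dd, r_on, steps=200000, n_tau=20.0):
    """Integrate supply and resistor energy while C_L charges through r_on."""
    tau = r_on * c_load
    dt = n_tau * tau / steps
    e_supply = 0.0
    e_resistor = 0.0
    for k in range(steps):
        t = (k + 0.5) * dt
        current = v_dd / r_on * math.exp(-t / tau)  # RC charging current
        e_supply += v_dd * current * dt             # energy drawn from V_DD
        e_resistor += current ** 2 * r_on * dt      # energy turned into heat
    e_capacitor = e_supply - e_resistor             # what remains on C_L
    return e_supply, e_resistor, e_capacitor

# Hypothetical values: 50 fF load, 3.3 V supply, two different resistances.
e_sup_a, e_res_a, e_cap_a = charging_energies(50e-15, 3.3, 1e3)
e_sup_b, e_res_b, e_cap_b = charging_energies(50e-15, 3.3, 10e3)
```

The dissipated energy comes out the same for both resistance values, which illustrates that the heat generated per transition depends on $C_L$ and $V_{DD}$ but not on the resistance of the charging path.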
Networks of pass transistors
2.3.3 The Load Capacitance
The overall load capacitance is modeled as the parallel combination of four capacitances: the gate capacitance $C_g$, the overlap capacitance $C_{ov}$, the diffusion capacitance $C_{diff}$, and the interconnect capacitance $C_{int}$.
The Overlap Capacitance
$C_{gd1} = C_{gd2} = 2 C_{ox} x_d W$
$C_{gd3} = C_{gd4} = C_{gs3} = C_{gs4} = C_{ox} x_d W$
The total overlap capacitance is simply the sum of all the above:
$C_{ov} = C_{gd1} + C_{gd2} + C_{gd3} + C_{gd4} + C_{gs3} + C_{gs4}$
Diffusion Capacitance
Two components: the bottom-wall area capacitance and the sidewall capacitance
2.4.1 Principles of Low-Power Design
- Using the lowest possible supply voltage
- Using the smallest geometry, highest frequency devices but operating them at the lowest possible frequency
- Using parallelism and pipelining to lower the required frequency of operation
- Power management by disconnecting the power source when the system is idle
- Designing systems to have the lowest requirements on subsystem performance for the given user-level functionality
2.4.3 Fundamental Limits
The limit from thermodynamic principles results from the need to have, at any node with an equivalent resistor R to ground, the signal power $P_s$ exceed the available noise power $P_{avail}$. The quantum theoretic limit on low power comes from the Heisenberg uncertainty principle. In order to be able to measure the effect of a switching transition of duration $\Delta t$, it must involve an energy greater than $h/\Delta t$: $P \ge h/(\Delta t)^2$, where h is Planck's constant.
Finally, the fundamental limit based on electromagnetic theory is that the velocity of propagation of a high-speed pulse on an interconnect is always less than the speed of light in free space, $c_0$: $L/\tau \le c_0$, where L is the length of the interconnect and $\tau$ is the interconnect transit time.
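The quantum and electromagnetic limits are simple enough to evaluate directly. This is a minimal sketch; the function names and the example switching time and interconnect length are hypothetical.

```python
H_PLANCK = 6.62607015e-34  # Planck's constant (J*s)
C0 = 299792458.0           # speed of light in free space (m/s)

def quantum_power_limit(delta_t):
    """Lower bound on switching power, P >= h / (delta_t)^2, in watts."""
    return H_PLANCK / delta_t ** 2

def min_transit_time(length_m):
    """Lower bound on interconnect transit time, tau >= L / c0, in seconds."""
    return length_m / C0

# Hypothetical cases: a 1 ps switching transition and a 1 cm interconnect.
p_min = quantum_power_limit(1e-12)
tau_min = min_transit_time(0.01)
```

Both bounds sit far below the values seen in practical circuits, which is why the material, device, circuit, and system limits are the ones that matter in design.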
2.4.4 Material Limits
The attributes of a semiconductor material that determine the properties of a device built with the material are
- Carrier mobility $\mu$
- Carrier saturation velocity $\upsilon_s$
- Self-ionizing electric field strength $E_c$
- Thermal conductivity K
Consider an SOI structure formed by surrounding the above generic device with a hemispherical shell of SiO2 of radius $r_i$; this results in a two-order-of-magnitude reduction in thermal conductivity.
The response time of the global interconnect circuit is $\tau = (2.3 R_{tr} + R_{int}) C_{int}$, where $R_{tr}$ is the output resistance of the driving transistor and $R_{int}$ and $C_{int}$ are the total resistance and capacitance, respectively, of the global interconnect.
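As a quick numeric illustration of the response-time expression, the sketch below evaluates it for one set of component values. The function name and the values are hypothetical.

```python
def interconnect_response_time(r_tr, r_int, c_int):
    """tau = (2.3 * R_tr + R_int) * C_int, in seconds."""
    return (2.3 * r_tr + r_int) * c_int

# Hypothetical values: 1 kOhm driver, 500 Ohm wire resistance, 1 pF wire capacitance.
tau = interconnect_response_time(1e3, 500.0, 1e-12)
```

With these values the driver term $2.3 R_{tr}$ dominates; for long, thin wires $R_{int}$ grows and the wire itself sets the response time.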
2.4.7 System Limits
- The architecture of the chip
- The power-delay product of the CMOS technology used to implement the chip
- The heat removal capacity of the chip package
- The clock frequency
- Its physical size
Transition-sensitive energy models
- Single energy tables
  - Bit-independent modules, e.g., flip-flops
- Multiple energy tables
  - Large bit-dependent modules, e.g., 32-b adders
  - Large multi-element modules, e.g., register files
- Transition-sensitive energy equations
- System-level interconnect capacitance values
Analytical energy models
- Cache and main memory
Transition-sensitive energy model
Must first design and lay out a functional unit and then simulate it to capture switch capacitances
- Bit independent: bus lines, pipeline registers
  One bit switching does not affect other bit slices' operations
- Bit dependent: ALU, decoders
Once constructed, the models can be reused in simulations of other architectures built with the same technology
Switch Capacitance Table
Previous Input Vector    Current Input Vector    Switch Capacitance
0…00                     0…00                    $cap_{0 \to 0}$
0…00                     0…01                    $cap_{0 \to 1}$
…                        …                       …
1…11                     1…11                    $cap_{2^n-1 \to 2^n-1}$
- Results in a large uncompressed table (e.g., 16-bit adder → $2^{32}$ rows)
- Excessive simulation (e.g., $2^{32}$!)
- Clustering algorithm. Reference: Huzefa Mehta, et al., "Module Energy Characterization using Clustering", DAC'96
- For a 16-bit adder, to keep 12% average error → 1000 simulation points, 97 rows
2:1 Multiplexer Table
Uncompressed (64 rows)
Previous    Current    Switch capacitance
000         000        0.00
000         001        0.00
000         010        0.00
000         011        0.00
000         100        0.04
000         101        0.05
000         110        0.04
000         111        0.05
001         000        0.00
001         001        0.00
…           …          …

Compressed (32 rows)
Previous    Current    Switch capacitance
000         0xx        0.00
000         100        0.04
000         101        0.05
000         110        0.04
000         111        0.05
001         0xx        0.00
…           …          …

Reduced (11 rows)
Previous    Current    Switch capacitance
000         0xx        0.00
000         1xx        0.045
001         0xx        0.00
…           …          …
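A reduced table with don't-care bits can be searched with a simple pattern match. The sketch below is illustrative only: it contains just the three reduced rows listed for the 2:1 multiplexer (the remaining rows are elided in the table and are not filled in here), and the function names are hypothetical.

```python
# (previous-vector pattern, current-vector pattern, switch capacitance)
# Only the rows shown in the reduced table; 'x' is a don't-care bit.
REDUCED_ROWS = [
    ("000", "0xx", 0.00),
    ("000", "1xx", 0.045),
    ("001", "0xx", 0.00),
]

def matches(pattern, bits):
    """True if every pattern position is 'x' or equals the corresponding bit."""
    return len(pattern) == len(bits) and all(
        p in ("x", b) for p, b in zip(pattern, bits)
    )

def lookup(prev_bits, cur_bits, rows=REDUCED_ROWS):
    """Return the switch capacitance for a transition, or None if no row matches."""
    for prev_pattern, cur_pattern, cap in rows:
        if matches(prev_pattern, prev_bits) and matches(cur_pattern, cur_bits):
            return cap
    return None

def uncompressed_rows(n_inputs):
    """Each of the 2^n previous vectors pairs with each of the 2^n current ones."""
    return 2 ** (2 * n_inputs)

cap = lookup("000", "101")        # matched by the row 000 / 1xx
row_count = uncompressed_rows(3)  # three inputs for a 2:1 multiplexer
```

The reduced entry 0.045 is the average of the 0.04 and 0.05 entries it replaces, which shows where the approximation error of the reduced table comes from.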
Memory System Energy Model
Parameterizable analytical energy models for the on-chip memories that capture
- Energy dissipated by bitlines: precharge, read and write cycles
- Energy dissipated by wordlines: when a particular row is being read and written
- Energy dissipated by storage cells on access
- Energy dissipated by address decoders
- Energy dissipated by peripheral circuits: cache control logic, comparators, etc.
Off-chip main memory energy is based on per-access cost
Cache energy model example
- Energy = Ebus + Ecell + Epad
- Ecell = F * (Wl_length) * (Bl_length + 4.8) * (Nhit + 2 * Nmiss)
- Wl_length = m * (T + 8L + St)
- Bl_length = C / (m * L)
- Nhit = number of hits; Nmiss = number of misses
- C = cache size; L = cache line size in bytes
- m = set associativity; T = tag size in bits
- St = # of status bits per line
- F = 1.44e-14 (technology-based cell access cost of SRAM)
- Em = 4.95e-9 (technology-based access cost of DRAM)
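The cell-energy term of this model is easy to evaluate for a given cache configuration. The following is a minimal sketch; the function name, the cache parameters in the example (8 KB, 32-byte lines, 2-way, 20-bit tags, 2 status bits), and the hit and miss counts are hypothetical, and the result carries whatever unit the constant F implies.

```python
F_SRAM = 1.44e-14  # technology-based cell access cost of SRAM (from the model)

def cache_cell_energy(cache_size, line_size, assoc, tag_bits, status_bits,
                      n_hit, n_miss, f_cell=F_SRAM):
    """Ecell = F * Wl_length * (Bl_length + 4.8) * (Nhit + 2 * Nmiss)."""
    wl_length = assoc * (tag_bits + 8 * line_size + status_bits)
    bl_length = cache_size / (assoc * line_size)
    return f_cell * wl_length * (bl_length + 4.8) * (n_hit + 2 * n_miss)

# Hypothetical configuration and access counts.
e_cell = cache_cell_energy(
    cache_size=8192, line_size=32, assoc=2, tag_bits=20, status_bits=2,
    n_hit=9000, n_miss=1000,
)
```

In this model a miss costs twice as much cell energy as a hit, so improving the hit rate lowers Ecell even when the total number of accesses is unchanged.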
Architectural Level Analysis Considerations
Very computationally efficient
- Requires predefined analytical and transition-sensitive energy characterization models
- Requires design only to RTL (with some idea as to the kind of functional units planned)
- Coarse grain: use of gated clocks implicit
Reasonably accurate (within 5% - 15% of SPICE)
Simulation based, so it can be used to support architectural, compiler, OS, and application-level experimentation
WattWatcher (Sente), DesignPower and PowerCompiler (Synopsys), prototype academic tools (Wattch, Princeton; SimplePower, PSU)