2. Physics of Power Dissipation in CMOS FET Devices


For an ideal MIS diode, the energy difference $\phi_{ms}$ between the metal work function $\phi_m$ and the semiconductor work function $\phi_s$ is zero:
$\phi_{ms} \equiv \phi_m - \left(\chi + \frac{E_g}{2q} + \psi_B\right) = 0$ (2.1)
where $\chi$ is the semiconductor electron affinity (from conduction band to vacuum level), $E_g$ the band gap (from valence band to conduction band), $\phi_B$ the potential barrier between the metal and the insulator, and $\psi_B$ the potential difference between the Fermi level $E_F$ and the intrinsic Fermi level $E_i$.

The Fermi-Dirac Function


The Fermi-Dirac distribution function gives the probability that a certain energy state will be occupied by an electron:
$f_{FD}(E) = \frac{1}{1 + \exp\left((E - E_F)/kT\right)}$
As in a gas, the electrons in a solid are in constant motion and consequently keep changing their energy and momentum.
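A minimal sketch of the Fermi-Dirac occupation probability, assuming energies expressed in eV and a default temperature of 300 K. The function name and the overflow guard are illustrative choices, not part of the original slides.

```python
import math

K_B_EV = 8.617333262e-5  # Boltzmann constant in eV/K

def fermi_dirac(energy_ev, fermi_ev, temp_k=300.0):
    """Probability that a state at energy_ev (eV) is occupied by an electron."""
    x = (energy_ev - fermi_ev) / (K_B_EV * temp_k)
    # Guard against overflow in exp() for states far above the Fermi level.
    if x > 700.0:
        return 0.0
    return 1.0 / (1.0 + math.exp(x))

# A state exactly at the Fermi level is half occupied.
print(fermi_dirac(0.5, 0.5))  # prints 0.5
# States 0.1 eV above / below E_F at room temperature.
print(fermi_dirac(0.6, 0.5), fermi_dirac(0.4, 0.5))
```

The function is symmetric about the Fermi level: the occupation at E_F + d and the vacancy at E_F - d are equal.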

P-type

CMOS Gate Power equations


$P = C_L V_{DD}^2 f_{0\to1} + t_{sc} V_{DD} I_{peak} f_{0\to1} + V_{DD} I_{leakage}$
Dynamic term: $C_L V_{DD}^2 f_{0\to1}$
Short-circuit term: $t_{sc} V_{DD} I_{peak} f_{0\to1}$
Leakage term: $V_{DD} I_{leakage}$
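The three terms of the gate power equation can be evaluated separately, as in the sketch below. It assumes SI units and treats f_01 as the rate of 0-to-1 output transitions; the operating point is hypothetical and chosen only for illustration.

```python
def cmos_gate_power(c_load, v_dd, f_01, t_sc, i_peak, i_leak):
    """Return (dynamic, short_circuit, leakage) power in watts."""
    dynamic = c_load * v_dd ** 2 * f_01          # C_L * VDD^2 * f_01
    short_circuit = t_sc * v_dd * i_peak * f_01  # t_sc * VDD * I_peak * f_01
    leakage = v_dd * i_leak                      # VDD * I_leakage
    return dynamic, short_circuit, leakage

# Hypothetical operating point (not taken from the slides).
dyn, sc, leak = cmos_gate_power(
    c_load=50e-15, v_dd=1.2, f_01=100e6,
    t_sc=20e-12, i_peak=100e-6, i_leak=10e-9,
)
total = dyn + sc + leak
print(f"dynamic={dyn:.3e} W  short-circuit={sc:.3e} W  "
      f"leakage={leak:.3e} W  total={total:.3e} W")
```

Keeping the terms separate makes it easy to see which one dominates for a given supply voltage and switching rate.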

Maxwell-Boltzmann statistics relate the equilibrium hole concentration to the intrinsic Fermi level:
$p_0 = n_i \exp\left((E_i - E_F)/kT\right)$ (2.2)

P substrate. (The Fermi level $E_F$ in the semiconductor is now $qV$ below the Fermi level in the metal gate.)

If the applied voltage is increased sufficiently, the bands bend far enough that level Ei at the surface crosses over to the other side of level EF. This is brought about by the tendency of carriers to occupy states with the lowest total energy. In the present condition of inversion the level Ei bends to be closer to level Ec and electrons outnumber holes at the surface.

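At the onset of strong inversion the bands have bent by a total of $2\psi_B$, so $E_i$ at the surface now lies below $E_F$ by as much as it lies above $E_F$ in the bulk (an energy $q\psi_B$), where $\psi_B$ is the potential difference between the Fermi level $E_F$ and the intrinsic Fermi level $E_i$ in the bulk.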

The value of V necessary to reach the onset of strong inversion is called the threshold voltage.


Surface Space Charge Region and the Threshold Voltage


Poisson equation:
$\nabla \cdot \mathbf{D} = \rho(x, y, z)$ (2.3)
where $\mathbf{D}$, the electric displacement vector, is equal to $\epsilon_s \mathbf{E}$ under low-frequency or static conditions; $\epsilon_s$ is the permittivity of Si; $\mathbf{E}$ the electric field vector; and $\rho(x, y, z)$ the total electric charge density.

Threshold voltage
$V_T = \frac{2d}{\epsilon_i}\sqrt{q \epsilon_s N_A \psi_B \left(1 - e^{-2\beta\psi_B}\right)} + 2\psi_B$
The total voltage needed to offset the effect of nonzero work function difference and the presence of the charges is referred to as the flat-band voltage $V_{FB}$:
$V_{FB} = \phi_{ms} - \frac{Q_T d}{\epsilon_i}$

Threshold voltage
$V_T = \frac{2d}{\epsilon_i}\sqrt{q \epsilon_s N_A \psi_B \left(1 - e^{-2\beta\psi_B}\right)} + 2\psi_B + V_{FB}$
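A rough numerical sketch of the threshold voltage expression, under several assumptions that are not stated on the slides: d is the insulator thickness, the insulator and semiconductor permittivities are those of SiO2 and Si, beta = q/kT, and psi_B is obtained from eq. (2.2) by setting p0 = N_A, i.e. psi_B = (kT/q) ln(N_A/n_i). All device values below are hypothetical.

```python
import math

Q = 1.602176634e-19      # elementary charge, C
K_B = 1.380649e-23       # Boltzmann constant, J/K
EPS0 = 8.8541878128e-12  # vacuum permittivity, F/m

def bulk_potential(n_a, n_i, temp_k=300.0):
    """psi_B in volts, assuming p0 = N_A in eq. (2.2). Densities in m^-3."""
    return K_B * temp_k / Q * math.log(n_a / n_i)

def threshold_voltage(n_a, n_i, d, eps_i, eps_s, v_fb=0.0, temp_k=300.0):
    """V_T in volts from the slide's expression, with beta = q/kT (assumed)."""
    psi_b = bulk_potential(n_a, n_i, temp_k)
    beta = Q / (K_B * temp_k)
    charge_term = math.sqrt(
        Q * eps_s * n_a * psi_b * (1.0 - math.exp(-2.0 * beta * psi_b))
    )
    return (2.0 * d / eps_i) * charge_term + 2.0 * psi_b + v_fb

# Hypothetical device: N_A = 1e17 cm^-3, n_i = 1e10 cm^-3, 5 nm SiO2 on Si.
vt = threshold_voltage(
    n_a=1e23, n_i=1e16, d=5e-9,
    eps_i=3.9 * EPS0, eps_s=11.7 * EPS0, v_fb=0.0,
)
print(f"V_T = {vt:.3f} V")
```

With a negative flat-band voltage passed as v_fb, the result shifts down by exactly that amount, which mirrors the additive V_FB term in the expression.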


2.2.3.1 Effects Influencing Threshold Voltage


VT decreases when L (length) is decreased, varies with Z (width), and decreases when the drain-source voltage VDS is increased.


Drain-induced barrier lowering (DIBL) is the basis for a number of more complex models of the threshold voltage shift. It refers to the decrease in threshold voltage due to the depletion region charges in the potential barrier between the source and the channel at the semiconductor surface.

A recent model adopts a quasi-two-dimensional approach to solving the two-dimensional Poisson equation: $\partial E_x/\partial x$ at each point $(x, y)$ is replaced with the average of its values at $(0, y)$ and at $(W, y)$.

Short channel effect


The minimum value of the surface potential increases with decreasing channel length and increasing VDS.


2.2.3.2 Subsurface Drain-Induced Barrier Lowering (Punchthrough)


The punchthrough voltage $V_{PT}$ is defined as the value of $V_{DS}$ at which $I_{D,st}$ reaches some specific magnitude with $V_{GS} = 0$. The parameter $V_{PT}$ can be roughly approximated as the value of $V_{DS}$ for which the sum of the widths of the source and the drain depletion regions becomes equal to $L$.

If the field in the oxide, Eox, is large enough, the voltage drop across the depletion layer suffices to enable tunneling in the drain via a near-surface trap. The minority carriers emitted to the incipient inversion layer are laterally removed to the substrate, completing a path for a gate-induced drain leakage (GIDL) current. In CMOS circuits this leakage current contributes to standby power.

2.3 Power Dissipation in CMOS


The first ICs ever fabricated used a PMOS process, owing to the simplicity of fabricating a p-channel enhancement-mode MOS field-effect transistor (PMOST) with threshold voltage $V_{Tp} < 0$. The charge mobility factor then caused the move to the NMOS process, and the power dissipation problem in turn caused the change to CMOS.

This advantage of CMOS over NMOS has proven to be important enough that the shortcomings of CMOS are overlooked. The CMOS process is more complex than the NMOS, the CMOS requires use of guard-rings to get around the latch-up problem, and CMOS circuits require more transistors than the equivalent NMOS circuits.

The threshold voltages place a limit on the minimum supply voltage that can be used without incurring unreasonable delay penalties. If the threshold voltage is too low, the static component of the power due to subthreshold currents becomes significant.

2.3.1 Short-Circuit Dissipation


The short-circuit dissipation of the gate varies with the output load and the input signal slope. As the output load is increased toward a critical value, the short-circuit dissipation decreases roughly linearly, both in absolute terms and as a fraction of the total dissipation; beyond that value it increases again rapidly.

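For simplicity, a symmetrical inverter (i.e., $\beta_N = \beta_P$ and $V_{Tn} = -V_{Tp}$) and a symmetrical input signal (rise time = fall time) are considered.
$I = \frac{\beta}{2}(V_{in} - V_T)^2$ for $0 \le I \le I_{max}$
$I_{mean} = \frac{1}{T}\int_0^T I(t)\,dt = 2 \cdot \frac{2}{T}\int_{t_1}^{t_2} \frac{\beta}{2}\left(V_{in}(t) - V_T\right)^2 dt$

Assuming the rising and falling portions of the input voltage waveform to be linear ramps of duration $\tau$, $V_{in}(t) = \frac{V_{DD}}{\tau}t$, so that $t_1 = \frac{V_T}{V_{DD}}\tau$ and $t_2 = \frac{\tau}{2}$:
$I_{mean} = 2 \cdot \frac{2}{T}\int_{(V_T/V_{DD})\tau}^{\tau/2} \frac{\beta}{2}\left(\frac{V_{DD}}{\tau}t - V_T\right)^2 dt$

Let $\xi = \frac{V_{DD}}{\tau}t - V_T$, so that $dt = \frac{\tau}{V_{DD}}\,d\xi$:
$I_{mean} = \frac{2\beta}{T}\,\frac{\tau}{V_{DD}}\int_0^{V_{DD}/2 - V_T} \xi^2\,d\xi = \frac{1}{12}\,\frac{\beta}{V_{DD}}\,(V_{DD} - 2V_T)^3\,\frac{\tau}{T}$
The short-circuit power dissipation of an unloaded inverter is
$P_{SC} = V_{DD} I_{mean} = \frac{\beta}{12}(V_{DD} - 2V_T)^3\,\frac{\tau}{T}$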
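As a sanity check on the derivation, the sketch below compares the closed form beta/12 * (VDD - 2*VT)^3 * tau/T, which follows from integrating the ramp input between t1 = (VT/VDD)*tau and t2 = tau/2, against a direct midpoint-rule integration of the same current. The device values are hypothetical, and the function names are illustrative.

```python
def short_circuit_power(beta, v_dd, v_t, tau, period):
    """Closed-form P_SC for an unloaded symmetrical inverter, in watts."""
    if v_dd <= 2.0 * v_t:
        return 0.0  # both devices are never on together
    return beta / 12.0 * (v_dd - 2.0 * v_t) ** 3 * tau / period

def short_circuit_power_numeric(beta, v_dd, v_t, tau, period, steps=20000):
    """Same quantity via midpoint integration of I = beta/2 * (Vin - VT)^2."""
    t1 = v_t / v_dd * tau
    t2 = tau / 2.0
    if t2 <= t1:
        return 0.0
    dt = (t2 - t1) / steps
    charge = 0.0
    for k in range(steps):
        t = t1 + (k + 0.5) * dt
        v_in = v_dd * t / tau  # linear input ramp
        charge += 0.5 * beta * (v_in - v_t) ** 2 * dt
    i_mean = 4.0 * charge / period  # 2 * (2/T) * integral
    return v_dd * i_mean

# Hypothetical inverter: beta = 100 uA/V^2, VDD = 3.3 V, VT = 0.7 V,
# 1 ns input edges, 10 ns period.
args = (1e-4, 3.3, 0.7, 1e-9, 10e-9)
print(short_circuit_power(*args), short_circuit_power_numeric(*args))
```

The two values should agree closely; when VDD is at or below 2*VT the integration window is empty and both functions return zero.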

If the inverter is lightly loaded, causing output rise and fall times that are relatively shorter than the input rise and fall times, the short-circuit dissipation increases to become comparable to dynamic dissipation. To minimize dissipation, an inverter should be designed in such a way so that the input rise and fall times are about equal to the output rise and fall times.

2.3.2 Dynamic Dissipation


Assuming that the input $V_{in}$ is a square wave having a period $T$ and that the rise and fall times of the input are much less than the repetition period, the dynamic dissipation is given by
$P_D = C_L V_{DD}^2 / T$

When $V = V_{DD}$, $E_{0\to1} = C_L V_{DD}^2$. Since the energy stored in a capacitor with capacitance $C_L$ and voltage $V_{DD}$ across its plates is $C_L V_{DD}^2/2$, the rest of the energy, another $C_L V_{DD}^2/2$, is converted into heat.
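The energy split on a 0-to-1 transition can be checked numerically. The sketch below assumes, purely for illustration, that the pull-up device behaves as a constant resistance r_on, so the charging current is (VDD/r_on) * exp(-t/(r_on*C_L)); the slides do not specify a device model.

```python
import math

def charge_energy_split(c_load, v_dd, r_on, steps=100000, n_tau=20.0):
    """Integrate supply energy and resistive heat while C_L charges to VDD.

    r_on is a hypothetical constant pull-up resistance.
    Returns (energy_from_supply, energy_dissipated_as_heat) in joules.
    """
    rc = r_on * c_load
    dt = n_tau * rc / steps
    e_supply = 0.0
    e_heat = 0.0
    for k in range(steps):
        t = (k + 0.5) * dt
        i = v_dd / r_on * math.exp(-t / rc)  # charging current
        e_supply += v_dd * i * dt            # energy drawn from VDD
        e_heat += i * i * r_on * dt          # energy lost in the pull-up
    return e_supply, e_heat

c_load, v_dd = 1e-12, 1.0
e_supply, e_heat = charge_energy_split(c_load, v_dd, r_on=1e3)
print(e_supply / (c_load * v_dd ** 2), e_heat / (c_load * v_dd ** 2))
```

The ratios come out close to 1 and 0.5 regardless of the value chosen for r_on, which is why the resistance does not appear in the dynamic power expression.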


Networks of pass transistors


2.3.3 The Load Capacitance


The overall load capacitance is modeled as the parallel combination of four capacitances: the gate capacitance $C_g$, the overlap capacitance $C_{ov}$, the diffusion capacitance $C_{diff}$, and the interconnect capacitance $C_{int}$.

2.3.3.2 The Overlap Capacitance


$C_{gd1} = C_{gd2} = 2 C_{ox} x_d W$
$C_{gd3} = C_{gd4} = C_{gs3} = C_{gs4} = C_{ox} x_d W$
The total overlap capacitance is simply the sum of all the above:
$C_{ov} = C_{gd1} + C_{gd2} + C_{gd3} + C_{gd4} + C_{gs3} + C_{gs4}$
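A short sketch of the overlap capacitance sum. It assumes that C_ox is the oxide capacitance per unit area of an SiO2 gate dielectric (permittivity divided by thickness) and that all transistors share the same width W, as the expressions above do; the dimensions are hypothetical.

```python
EPS_OX = 3.9 * 8.8541878128e-12  # SiO2 permittivity in F/m (assumed dielectric)

def overlap_capacitance(t_ox, x_d, width):
    """Total overlap capacitance C_ov in farads (lengths in metres)."""
    c_ox = EPS_OX / t_ox                # capacitance per unit area, F/m^2
    c_gd12 = 2.0 * c_ox * x_d * width   # Cgd1 = Cgd2
    c_rest = c_ox * x_d * width         # Cgd3 = Cgd4 = Cgs3 = Cgs4
    return 2.0 * c_gd12 + 4.0 * c_rest

# Hypothetical geometry: 5 nm oxide, 50 nm overlap, 1 um wide devices.
print(overlap_capacitance(t_ox=5e-9, x_d=50e-9, width=1e-6))
```

With equal widths the sum collapses to 8 * C_ox * x_d * W.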


2.3.3.3 Diffusion Capacitance


Two components: the bottom-wall area capacitance and the sidewall capacitance

2.4.1 Principles of Low-Power Design


Using the lowest possible supply voltage
Using the smallest geometry, highest frequency devices but operating them at the lowest possible frequency
Using parallelism and pipelining to lower required frequency of operation
Power management by disconnecting the power source when the system is idle
Designing systems to have lowest requirements on subsystem performance for the given user level functionality

2.4.3 Fundamental Limits


The limit from thermodynamic principles results from the need to have, at any node with an equivalent resistor $R$ to ground, the signal power $P_s$ exceed the available noise power $P_{avail}$. The quantum theoretic limit on low power comes from the Heisenberg uncertainty principle. In order to be able to measure the effect of a switching transition of duration $\Delta t$, it must involve an energy greater than $h/\Delta t$:
$P \ge h/(\Delta t)^2$
where $h$ is Planck's constant.

Finally, the fundamental limit based on electromagnetic theory requires the velocity of propagation of a high-speed pulse on an interconnect to be always less than the speed of light in free space, $c_0$:
$\tau \ge L/c_0$
where $L$ is the length of the interconnect and $\tau$ is the interconnect transit time.
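The quantum and electromagnetic limits are simple enough to evaluate directly. The sketch below assumes SI units, a switching duration Delta-t for the bound P >= h/(Delta-t)^2, and an interconnect length L for the bound tau >= L/c0; the function names and example values are illustrative.

```python
H_PLANCK = 6.62607015e-34  # Planck's constant, J*s
C0 = 299_792_458.0         # speed of light in free space, m/s

def quantum_power_limit(delta_t):
    """Lower bound on switching power, P >= h / delta_t^2, in watts."""
    return H_PLANCK / delta_t ** 2

def min_transit_time(length_m):
    """Lower bound on interconnect transit time, tau >= L / c0, in seconds."""
    return length_m / C0

# Hypothetical examples: a 1 ps transition and a 1 cm interconnect.
print(quantum_power_limit(1e-12))  # watts
print(min_transit_time(1e-2))      # seconds
```

Both bounds are far below what practical circuits achieve, which is why material, device, circuit, and system limits matter more in practice.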

2.4.4 Material Limits


The attributes of a semiconductor material that determine the properties of a device built with the material are
Carrier mobility $\mu$
Carrier saturation velocity $v_s$
Self-ionizing electric field strength $E_c$
Thermal conductivity $K$

Consider an SOI structure formed by surrounding the above generic device with a hemispherical shell of SiO2 of radius $r_i$; the shell represents a two-order-of-magnitude reduction in thermal conductivity.

The response time of the global interconnect circuit is
$\tau = (2.3 R_{tr} + R_{int}) C_{int}$
where $R_{tr}$ is the output resistance of the driving transistor and $R_{int}$ and $C_{int}$ are the total resistance and capacitance, respectively, of the global interconnect.
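A small sketch of the global interconnect response time (2.3 * R_tr + R_int) * C_int, assuming resistances in ohms and capacitance in farads. The example values are hypothetical.

```python
def interconnect_response_time(r_tr, r_int, c_int):
    """Response time of a driver plus global interconnect, in seconds."""
    return (2.3 * r_tr + r_int) * c_int

# Hypothetical driver and wire: 1 kOhm driver, 500 Ohm wire, 1 pF wire.
tau = interconnect_response_time(r_tr=1e3, r_int=500.0, c_int=1e-12)
print(f"{tau:.2e} s")
```

For a fixed wire, the driver resistance carries the 2.3 weighting, so a stronger driver helps only until the wire resistance R_int dominates.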

2.4.7 System Limits


The architecture of the chip
The power-delay product of the CMOS technology used to implement the chip
The heat removal capacity of the chip package
The clock frequency
Its physical size

Energy characterization
Transition-sensitive energy models
  Single energy tables
    Bit-independent modules, e.g., flip-flops
  Multiple energy tables
    Large bit-dependent modules, e.g., 32-b adders
    Large multi-element modules, e.g., register files
  Transition-sensitive energy equations
    System-level interconnect capacitance values
Analytical energy models
  Cache and main memory

Transition-sensitive energy model


Must first design and lay out a functional unit and then simulate it to capture switch capacitances
Bit independent: bus lines, pipeline registers
  One bit switching does not affect other bit slices' operations
Bit dependent: ALU, decoders
Once constructed, the models can be reused in simulations of other architectures built with the same technology

Switch Capacitance Table


Previous Input Vector    Current Input Vector    Switch Capacitance
0...00                   0...00                  cap_{0,0}
0...00                   0...01                  cap_{0,1}
...                      ...                     ...
1...11                   1...11                  cap_{2^n-1, 2^n-1}

Table Compression
Problem
Results in a large uncompressed table (e.g., 16-bit adder: $2^{32}$ rows)
Excessive simulation (e.g., $2^{32}$ simulations!)

Solution
Clustering algorithm
Reference: Huzefa Mehta, et al., Module Energy Characterization using Clustering, DAC '96
For a 16-bit adder, to keep 12% average error: 1000 simulation points, 97 rows

2:1 Multiplexer Table


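Uncompressed: 64 rows (first rows shown)
Previous    Current    Switch capacitance
000         000        0.00
000         001        0.00
000         010        0.00
000         011        0.00
000         100        0.04
000         101        0.05
000         110        0.04
000         111        0.05
001         000        0.00
001         001        0.00

Compressed: 32 rows (first rows shown)
Previous    Current    Switch capacitance
000         0xx        0.00
000         100        0.04
000         101        0.05
000         110        0.04
000         111        0.05
001         0xx        0.00

Reduced: 11 rows (first rows shown)
Previous    Current    Switch capacitance
000         0xx        0.00
000         1xx        0.045
001         0xx        0.00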
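One way to use a compressed table is to treat each x as a don't-care bit when matching a transition. The sketch below is a hypothetical lookup routine, not the clustering algorithm itself; the table contents are only the rows listed for the 2:1 multiplexer, so transitions outside those rows raise an error.

```python
# Rows copied from the listed excerpts; 'x' marks a don't-care bit.
COMPRESSED_MUX_TABLE = [
    ("000", "0xx", 0.00),
    ("000", "100", 0.04),
    ("000", "101", 0.05),
    ("000", "110", 0.04),
    ("000", "111", 0.05),
    ("001", "0xx", 0.00),
]
REDUCED_MUX_TABLE = [
    ("000", "0xx", 0.00),
    ("000", "1xx", 0.045),
    ("001", "0xx", 0.00),
]

def matches(pattern, vector):
    """True if every pattern bit equals the vector bit or is a don't-care."""
    return len(pattern) == len(vector) and all(
        p in ("x", v) for p, v in zip(pattern, vector)
    )

def switch_capacitance(table, previous, current):
    """Return the capacitance of the first row matching the transition."""
    for prev_pat, cur_pat, cap in table:
        if matches(prev_pat, previous) and matches(cur_pat, current):
            return cap
    raise KeyError((previous, current))

print(switch_capacitance(COMPRESSED_MUX_TABLE, "000", "101"))  # prints 0.05
print(switch_capacitance(REDUCED_MUX_TABLE, "000", "101"))     # prints 0.045
```

The second lookup shows the trade-off: the reduced table returns an averaged value for the merged rows, accepting a small error in exchange for fewer rows.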



Memory System Energy Model


Parameterizable analytical energy models for the on-chip memories that capture
Energy dissipated by bitlines: precharge, read and write cycles
Energy dissipated by wordlines: when a particular row is being read and written
Energy dissipated by storage cells on access
Energy dissipated by address decoders
Energy dissipated by peripheral circuits: cache control logic, comparators, etc.

Off-chip main memory energy is based on per-access cost

Cache energy model example


On-chip cache
Energy = Ebus + Ecell + Epad
Ecell = β * (Wl_length) * (Bl_length + 4.8) * (Nhit + 2 * Nmiss)
Wl_length = m * (T + 8L + St)
Bl_length = C / (m * L)
Nhit = number of hits; Nmiss = number of misses; C = cache size; L = cache line size in bytes; m = set associativity; T = tag size in bits; St = number of status bits per line
β = 1.44e-14 (technology-based cell access cost of SRAM)
Em = 4.95e-9 (technology-based access cost of DRAM)
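The cell-array term of the cache model can be evaluated as below. This is a sketch under assumptions: the cache size C is taken to be in bytes, the leading coefficient of Ecell is taken to be the 1.44e-14 SRAM cell access cost, and only Ecell is computed because the slide does not define Ebus or Epad. The cache configuration and access counts are hypothetical.

```python
BETA_SRAM = 1.44e-14  # technology-based SRAM cell access cost (from the slide)

def cache_cell_energy(cache_size, line_size, assoc, tag_bits, status_bits,
                      n_hit, n_miss):
    """Ecell = beta * Wl_length * (Bl_length + 4.8) * (Nhit + 2 * Nmiss)."""
    wl_length = assoc * (tag_bits + 8 * line_size + status_bits)  # m*(T+8L+St)
    bl_length = cache_size / (assoc * line_size)                  # C/(m*L)
    return BETA_SRAM * wl_length * (bl_length + 4.8) * (n_hit + 2 * n_miss)

# Hypothetical 8 KB, 2-way cache with 32-byte lines, 20 tag bits, 2 status bits.
e_cell = cache_cell_energy(
    cache_size=8192, line_size=32, assoc=2, tag_bits=20, status_bits=2,
    n_hit=1000, n_miss=100,
)
print(f"Ecell = {e_cell:.3e}")
```

Misses are weighted twice as heavily as hits in this model, so the miss rate affects the cell energy as well as the off-chip access count.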


Architectural Level Analysis Considerations


Very computationally efficient
Requires predefined analytical and transition-sensitive energy characterization models
Requires design only to RTL (with some idea as to the kind of functional units planned)
Coarse-grain use of gated clocks implicit

Reasonably accurate (within 5% - 15% of SPICE)

Simulation based, so it can be used to support architectural, compiler, OS, and application-level experimentation
WattWatcher (Sente), DesignPower and PowerCompiler (Synopsys), prototype academic tools (Wattch from Princeton, SimplePower from PSU)