LOW POWER DESIGN

CIRCUIT LEVEL
• Static CMOS circuit design techniques do not consume any static power except leakage
• Increasing requirements for speed and functionality tend to push classic static CMOS logic to the limits of acceptable power consumption
• Different CMOS logic styles and special circuit design techniques have been proposed to improve the power characteristics as well as the speed
• Speed and area must also be taken into account when comparing the different design techniques
• Many factors can influence the efficiency of each of the proposed techniques
• Signal probabilities are the determining factor in deciding whether or not to use dynamic logic in a design
• Pseudo-nMOS reduces power when used for complex logic functions with high-frequency switching
• In this chapter we will discuss
• Several logic styles in terms of performance, area and power
consumption
• Overview of latches and flip flops focusing on power
characteristics
• Reducing power dissipation based on transistor sizing and
reordering
• Energy issues in the design of drivers for large loads
LOGIC STYLE
• Discussion about the influence of each logic
style on speed, size and power dissipation
• Concept and power considerations relative to other logic families
• Static logic
• Static means that the output of a logic gate at
every time point is connected through a low
resistance path to the power supply rails
• Static CMOS gates are considered first
• A static CMOS gate consists of a pull-up network and a pull-down network
• Static CMOS has the following characteristics
• Ratioless logic
• High noise margins which offer low sensitivity to
noise
• Sufficient speed, especially for small gates
• Comparable rise and fall times under
appropriate scaling
• Ease of design
POWER CONSIDERATIONS
• Static CMOS has three power components: dynamic, short-circuit and leakage power
• If transistors are properly sized, short-circuit power is less than 10% of the total power
• Recent techniques attempt to operate circuits at supply voltages below the sum of the nMOS and pMOS threshold voltages; the pull-up and pull-down transistors then never conduct at the same time, so no short-circuit current flows
• Leakage power is due to (a) leakage currents of the reverse-biased diodes present at a transistor's drain and (b) subthreshold leakage currents due to carrier diffusion between source and drain when the gate-to-source voltage of a transistor is below the threshold voltage
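As a rough numerical illustration of the power components above, the sketch below evaluates the dynamic term $P = P_{0 \to 1} C_L V_{dd}^2 f_c$ (the same form used for the NOR2 example later in these notes) and checks the condition $V_{dd} < V_{tn} + |V_{tp}|$ under which the pull-up and pull-down devices can never conduct at the same time. All numbers are hypothetical, and the function names are illustrative only.

```python
def dynamic_power(p_0_to_1: float, c_load: float, vdd: float, f_clock: float) -> float:
    """Dynamic switching power P = P(0->1) * C_L * Vdd^2 * f_c, in watts."""
    return p_0_to_1 * c_load * vdd ** 2 * f_clock


def short_circuit_free(vdd: float, vtn: float, vtp: float) -> bool:
    """True if Vdd < Vtn + |Vtp|.

    Both devices of an inverter conduct only when Vtn < Vin < Vdd - |Vtp|;
    that input window is empty once Vdd drops below Vtn + |Vtp|.
    """
    return vdd < vtn + abs(vtp)


# Hypothetical numbers: 50 fF load, 1.2 V supply, 100 MHz clock, P(0->1) = 0.25
p = dynamic_power(0.25, 50e-15, 1.2, 100e6)
print(f"dynamic power = {p * 1e6:.2f} uW")            # 1.80 uW
print(short_circuit_free(vdd=0.7, vtn=0.4, vtp=-0.4))  # True  (0.7 < 0.8)
print(short_circuit_free(vdd=1.2, vtn=0.4, vtp=-0.4))  # False (1.2 > 0.8)
```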
• A variation of static logic, called branch-based logic (BBL), provides a layout optimization for low power by reducing parasitic node capacitances. Logic cells are designed exclusively with branches, each implemented by transistors in series between a supply rail and the output
• This approach achieves improved speed and lower power consumption compared to conventional libraries with a large number of complex gates
• Example: a 16-bit BBL adder has been shown to have lower static and dynamic power dissipation than an equivalent complementary CMOS adder
DYNAMIC LOGIC
• Dynamic logic is used to reduce the transistor count, increase speed and avoid static power consumption; it operates in two phases: precharge and conditional evaluation
• Power consumption
• Power is consumed during the precharge phase each time the output capacitor was discharged in the preceding cycle
• A dynamic gate can consume power even if the inputs remain constant
• The output transition probability, which determines the dynamic power consumption, depends only on the signal probabilities
• For uniformly distributed inputs: $P_{0 \to 1} = N_0 / 2^N$
• $N_0$: number of 0's for the output signal in the truth table
• $N$: number of inputs
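The difference between a precharged output and a static output can be seen with a small cycle-by-cycle model. This is a sketch under simple assumptions (one input vector per clock cycle, output initially high; `count_charging_events` is a hypothetical helper): a static gate charges its output only when the output actually rises from 0 to 1, whereas a dynamic gate recharges its output after every evaluation to 0, even when the inputs do not change.

```python
from typing import Callable, Iterable, Tuple


def count_charging_events(func: Callable[..., int], inputs: Iterable[Tuple[int, ...]]) -> dict:
    """Count 0->1 charging events of the output node over a sequence of input vectors.

    static:  the output is charged only when it rises from 0 to 1
             between two consecutive cycles.
    dynamic: the output is precharged high at the start of every cycle, so each
             cycle that follows an evaluation to 0 recharges the output, even if
             the inputs did not change.
    """
    static_events = 0
    dynamic_events = 0
    prev_out = 1  # assume the output starts high (precharged / reset state)
    for vector in inputs:
        out = func(*vector)
        if prev_out == 0:
            dynamic_events += 1       # precharge restores the discharged node
            if out == 1:
                static_events += 1    # static output rises 0 -> 1
        prev_out = out
    return {"static": static_events, "dynamic": dynamic_events}


def nor2(a: int, b: int) -> int:
    return int(not (a or b))


# Inputs held constant at (1, 0) for 10 cycles: the NOR2 output stays at 0.
print(count_charging_events(nor2, [(1, 0)] * 10))  # {'static': 0, 'dynamic': 9}
```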
Power considerations
• Leakage power must be carefully taken into account when a circuit operates in standby mode for long periods of time
• A variation of static logic, branch-based logic (BBL), provides layout optimization for low power by reducing parasitic node capacitance
• Cells are designed exclusively with branches, each implemented by transistors in series between supply and output
Branch Based Logic
• Used to build a cell library with a limited number
of standard cells
• Achieves improved speed and low power
consumption compared to conventional libraries
with a large number of complex gates
• Better optimization enabled by a limited choice
of cells
• A 16-bit BBL adder has been shown to have lower static and dynamic dissipation than an equivalent CMOS adder
Dynamic logic
• Reduce transistor count
• Increase speed
• Avoid the static power consumption present in pseudo-nMOS
• Dynamic gates are clocked and operate in a sequence of two phases
• Precharge and evaluation
• Precharge: the pMOS transistor conducts and the output node is precharged, while the nMOS evaluation transistor is cut off
• No DC current flows regardless of input signals
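• Evaluation: the pMOS is off and the nMOS evaluation transistor is on; depending on the inputs, a conditional path forms between the output and ground
• If there is no path, the output node keeps its precharged value, giving a high output
• At most one transition can occur during the evaluation phase
• If charge redistribution occurs, it can corrupt the output node voltage
• Drawback: single-phase dynamic gates cannot be cascaded directly, because the precharged-high output of one gate can discharge the next gate during evaluation before the first gate has settled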
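The charge-redistribution problem can be estimated to first order from charge conservation: if the output node ($C_L$, precharged to $V_{dd}$) is connected during evaluation to a previously discharged internal node ($C_a$) and no path to ground exists, the shared voltage is $V_{out} = V_{dd} C_L / (C_L + C_a)$. The sketch ignores the transistor cutting off as its gate-source voltage falls, so the computed drop is pessimistic; capacitances and the fan-out switching threshold are hypothetical.

```python
def charge_sharing_vout(vdd: float, c_load: float, c_internal: float) -> float:
    """Output voltage after charge sharing, from charge conservation.

    Before evaluation: output node at vdd (capacitance c_load), internal node
    at 0 V (capacitance c_internal). During evaluation the two nodes are connected
    but there is no path to ground, so the charge c_load * vdd is shared.
    Transistor cut-off is ignored, so the computed drop is pessimistic.
    """
    return vdd * c_load / (c_load + c_internal)


def output_still_high(vdd: float, c_load: float, c_internal: float, v_switch: float) -> bool:
    """True if the shared voltage stays above the fan-out gate's switching threshold."""
    return charge_sharing_vout(vdd, c_load, c_internal) > v_switch


# Hypothetical values: 30 fF output node, 10 fF internal node, 1 V supply
print(f"{charge_sharing_vout(1.0, 30e-15, 10e-15):.3f} V")  # 0.750 V
print(output_still_high(1.0, 30e-15, 10e-15, v_switch=0.5))  # True
print(output_still_high(1.0, 30e-15, 40e-15, v_switch=0.5))  # False (about 0.43 V)
```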

Power considerations
• In a dynamic implementation, power is consumed every time the output equals '0'
• A dynamic gate can consume power even if the inputs remain constant
• The output transition probability, which determines the dynamic power consumption, depends only on the signal probabilities
• For uniformly distributed inputs, the transition probability is:


• $P_{0 \to 1} = N_0 / 2^N$
• $N_0$: number of 0's for the output signal in the truth table; $N$: number of inputs
NOR2 gate
• Dynamic power consumption of the dynamic implementation: $P_{NOR} = 0.75\, C_L V_{dd}^2 f_c$
• The power consumption of the static implementation is significantly smaller: $P_{NOR} = \frac{3}{16}\, C_L V_{dd}^2 f_c$
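The NOR2 coefficients above can be checked by enumerating the truth table. The sketch assumes uniformly distributed inputs that are independent from cycle to cycle; under that assumption a static gate switches 0→1 with probability $P_0 P_1 = (N_0/2^N)(1 - N_0/2^N)$, while the dynamic gate switches with $N_0/2^N$. Function names are illustrative.

```python
from fractions import Fraction
from itertools import product


def output_zero_probability(func, n_inputs: int) -> Fraction:
    """N0 / 2**N: fraction of equally likely input vectors that give output 0."""
    vectors = list(product((0, 1), repeat=n_inputs))
    n0 = sum(1 for v in vectors if func(*v) == 0)
    return Fraction(n0, 2 ** n_inputs)


def p01_dynamic(func, n_inputs: int) -> Fraction:
    """Dynamic gate: every 0 output is followed by a precharge, so P(0->1) = N0 / 2**N."""
    return output_zero_probability(func, n_inputs)


def p01_static(func, n_inputs: int) -> Fraction:
    """Static gate, temporally independent inputs: P(0->1) = P0 * P1 = P0 * (1 - P0)."""
    p0 = output_zero_probability(func, n_inputs)
    return p0 * (1 - p0)


def nor2(a: int, b: int) -> int:
    return int(not (a or b))


print(p01_dynamic(nor2, 2))                        # 3/4  -> 0.75 * C_L * Vdd^2 * f_c
print(p01_static(nor2, 2))                         # 3/16 -> 3/16 * C_L * Vdd^2 * f_c
print(p01_dynamic(nor2, 2) / p01_static(nor2, 2))  # 4
```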

• The switched capacitance is smaller in dynamic logic than in a static implementation
• Total power must also include the power dissipated in driving the capacitance of the clock lines
Advantages of static vs. dynamic logic

• Two- and four-phase clocking strategies have been developed to overcome the problem of cascading dynamic gates, but they require many clock signals (8 clock signals) and are not suited for low power
• To correct the problem, modifications to the basic dynamic logic style have been proposed:
• Domino, np-CMOS and NORA logic
Comparison
| Aspect | Static logic | Dynamic logic |
|---|---|---|
| Glitches | 30% energy increase | Intrinsically does not have this problem |
| Short-circuit current | Less than 10% of the total power | – |
| Parasitic capacitance | – | Fewer transistors, reduced switched capacitance |
| Switching activity | Depends on previous state | Does not depend on previous state; generally higher activity factor |
| Power-down modes | Effectively used | Not well suited |
| Clock power | No clock | Due to gate capacitance of the precharge MOS transistors |
Pass transistor logic
• Used to reduce the physical capacitance
• Boolean functions are implemented as a network of switches realized by pass transistors
• Series connection implements the AND function; parallel connection implements the OR function
• Relatively expensive for simple monotonic gates
• Efficient in terms of transistor count for XOR and MUX
• XOR implementation in static CMOS: 12 transistors
• In pass-transistor logic: 4 transistors
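• A full adder requires 28 transistors in static CMOS
• In pass-transistor logic: 24 transistors
• Pass transistors present the inherent problem of the threshold drop across a transistor: an nMOS pass transistor passes a high level only up to $V_{dd} - V_{tn}$
• The degraded high level causes static power dissipation in the driven inverter and requires the addition of level-restoring transistors
• n-channel (nMOS-only) pass-transistor logic:
• Not suited for low supply voltages
• The most important pass-transistor logic style is complementary pass-transistor logic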
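A switch-level sketch of the threshold drop: a 2:1 MUX built from two nMOS pass transistors passes a clean 0 but only $V_{dd} - V_{tn}$ for a 1, and that degraded level can leave the pMOS of the driven inverter partly on. The constants `VDD`, `VTN` and `VTP_ABS` are hypothetical, and the body effect (which makes the real drop larger) is ignored.

```python
from typing import Optional

VDD = 3.3      # hypothetical supply voltage (V)
VTN = 0.6      # hypothetical nMOS threshold, body effect ignored (V)
VTP_ABS = 0.5  # hypothetical |Vtp| of the pMOS in the driven inverter (V)


def nmos_pass(v_in: float, v_gate: float) -> Optional[float]:
    """Switch-level model of an nMOS pass transistor.

    Returns None when the gate is low (transistor off, output floating).
    When on, a low level passes unchanged, but a high level is clipped at
    v_gate - VTN: the threshold drop of pass-transistor logic.
    """
    if v_gate <= VTN:
        return None
    return min(v_in, v_gate - VTN)


def mux2(a: float, b: float, sel: int) -> float:
    """2:1 MUX from two nMOS pass transistors: sel=0 passes a, sel=1 passes b."""
    v_sel = VDD if sel else 0.0
    v_sel_bar = 0.0 if sel else VDD
    conducting = [v for v in (nmos_pass(a, v_sel_bar), nmos_pass(b, v_sel)) if v is not None]
    assert len(conducting) == 1, "exactly one branch conducts"
    return conducting[0]


def inverter_leaks(v_in_high: float) -> bool:
    """The driven inverter's pMOS stays partly on if Vdd - Vin exceeds |Vtp|."""
    return VDD - v_in_high > VTP_ABS


high = mux2(VDD, 0.0, sel=0)
print(f"{high:.2f} V")                   # 2.70 V instead of 3.30 V
print(f"{mux2(VDD, 0.0, sel=1):.2f} V")  # 0.00 V: a low level passes cleanly
print(inverter_leaks(high))              # True: static current in the next stage
```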
Complementary pass transistor logic (CPL)
• Consists of two nMOS logic networks (for the true and complementary signals)
• Two small pull-up pMOS transistors for level restoration
• CPL logic is ratioless, and its high noise margins enable reliable operation even at low voltages
• CPL offers:
• Good output driving capability due to the output inverters
• A fast differential stage due to the cross-coupled pMOS
• Small input loads reduce the overall switched capacitance
• Power consumption is lower and rise/fall times are faster

• Application: high-performance multipliers
Power consideration
• CPL gates use fewer and smaller transistors, giving smaller node capacitances
• Significant power reduction can be achieved if the threshold drop, and the resulting static power dissipation of the output inverter, are properly addressed
Example
• A pass-gate family adder with zero-threshold pass transistors at a supply voltage of 4 V consumes 30% less energy than a conventional static design
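Full adder simulation results
| Logic family | Delay (ns), 3.3 V | Delay (ns), 1.5 V | Power (mW), 3.3 V | Power (mW), 1.5 V | Power-delay (normalized), 3.3 V | Power-delay (normalized), 1.5 V |
|---|---|---|---|---|---|---|
| CMOS | 1.89 | 7.88 | 32.9 | 6.4 | 1.00 | 1.00 |
| CPL | 1.39 | 8.33 | 34.1 | 6.0 | 0.76 | 0.99 |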
• Providing 50% energy savings
• A multiplier based on the CPL full adder and using modified Booth encoding: power savings of 18% and a speed improvement of 30%
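The normalized power-delay column of the full adder table can be reproduced from the delay and power columns: the power-delay product is delay × power (ns × mW = pJ), divided by the CMOS value at the same supply voltage. The dictionary below simply re-enters the table values.

```python
# Full adder results from the table above: {family: {Vdd: (delay in ns, power in mW)}}
results = {
    "CMOS": {3.3: (1.89, 32.9), 1.5: (7.88, 6.4)},
    "CPL": {3.3: (1.39, 34.1), 1.5: (8.33, 6.0)},
}


def normalized_pdp(family: str, vdd: float, reference: str = "CMOS") -> float:
    """Power-delay product of `family` divided by that of `reference` at the same Vdd."""
    delay, power = results[family][vdd]
    delay_ref, power_ref = results[reference][vdd]
    return (delay * power) / (delay_ref * power_ref)


for vdd in (3.3, 1.5):
    print(vdd, round(normalized_pdp("CPL", vdd), 2))
# 3.3 0.76
# 1.5 0.99
```

The CPL advantage at 3.3 V (0.76) nearly vanishes at 1.5 V (0.99) because the CPL delay grows faster than its power falls at the lower supply.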
PASS TRANSISTOR
• Static CMOS proves to be superior to all pass-transistor logic styles in both delay and power for all logic gates except the full adder at higher supply voltages
• Pass-transistor logic is therefore not the best choice for low-power design
• The full adder is the exception because it is built from XOR gates and 2:1 MUXes, which are well suited to pass transistors
SPL
• The number of full adders in a design is limited compared to other logic gates and flip-flops
• Single-rail pass-transistor logic (SPL) is a viable alternative if low power and compatibility are required
• Interest in pass-transistor logic has increased during the last few years
• Proved by the large number of designs
• And by synthesis methodologies that target pass-transistor logic
• Starting from a higher-level, technology-independent design specification
Single rail pass transistor logic
• Also known as single-ended pass-transistor logic and referred to as LEAP (LEAn integration using Pass transistors)
• Offers a promising approach to low-power circuit design
• It is the simplest member of the pass-transistor logic family
• Like CPL, it uses only nMOS transistors in the pass network
• It does not implement the second pass-transistor network for the complementary signals, which are generated locally if required
• Three main components:
• Input inverters that buffer the inputs and generate all signals for the pass-transistor network
• A pass-transistor network that implements the logic function; since it uses only n-type transistors, the output swing at the end of the network is 0 V to $V_{dd} - V_{tn}$
• Output buffers for speed improvement, including a weak pMOS transistor for voltage-level restoration and elimination of short-circuit currents in the output inverter
• For an optimum power-delay product, an output buffer must be inserted every 3 to 4 stages, which puts more buffers in the critical path
• The basic element of the pass-transistor network is the 2-input MUX; each MUX is a node in the BDD (binary decision diagram)
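The MUX-per-BDD-node correspondence can be sketched by building a reduced ordered BDD with Shannon expansion: each internal node (variable, low, high) is one 2:1 MUX whose select is that input, redundant nodes are dropped and identical sub-networks are shared. This is an illustrative construction (exhaustive, so only for small input counts), not a specific SPL synthesis tool; a real mapping may simplify further, e.g. a node whose data inputs are the constants 0 and 1 is just its select variable.

```python
from itertools import product
from typing import Callable, Dict, Tuple, Union

# A node is a terminal (0 or 1) or a tuple (var, low, high) = one 2:1 MUX
# that passes `high` when input x_var is 1 and `low` when it is 0.
Node = Union[int, Tuple[int, "Node", "Node"]]


def build_bdd(func: Callable[..., int], n: int) -> Tuple[Node, int]:
    """Reduced ordered BDD of func (variable order x0, x1, ...) by Shannon expansion.

    Returns the root node and the number of internal nodes (MUXes).
    Exhaustive expansion costs 2**n evaluations, so this is for small n only.
    """
    unique: Dict[Tuple, Tuple] = {}

    def make(var: int, low: Node, high: Node) -> Node:
        if low == high:                     # both cofactors equal: no MUX needed
            return low
        key = (var, low, high)
        return unique.setdefault(key, key)  # identical sub-networks are shared

    def expand(var: int, assignment: Tuple[int, ...]) -> Node:
        if var == n:
            return int(func(*assignment))
        low = expand(var + 1, assignment + (0,))
        high = expand(var + 1, assignment + (1,))
        return make(var, low, high)

    root = expand(0, ())
    return root, len(unique)


def evaluate(node: Node, inputs: Tuple[int, ...]) -> int:
    """Follow the MUX chain from the root to a terminal, as the pass network does."""
    while not isinstance(node, int):
        var, low, high = node
        node = high if inputs[var] else low
    return node


def xor3(a: int, b: int, c: int) -> int:
    return a ^ b ^ c


root, n_mux = build_bdd(xor3, 3)
print(n_mux)  # 5
print(all(evaluate(root, v) == xor3(*v) for v in product((0, 1), repeat=3)))  # True
```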
Advantages
• The SPL library has no more than 10 components and only 3 main component types
• Most functions are based on MUX and XOR, which are implemented efficiently in terms of transistor count
• The pass-transistor network contains only nMOS transistors, resulting in a compact layout with fast operation
• Circuit synthesis starting from a BDD can be automated
POWER
• Power consumption is very sensitive to the minimum operating voltage
• nMOS pass transistors work well down to 1 V when $V_t = 0.4$ V
• A lower threshold voltage (needed for lower supplies) may become a severe constraint for low-power design, since it increases subthreshold leakage
Report
• 7-input, 4-output circuit: power reduction of around 15.5% with a 31% speed increase
• 4-bit adder/subtractor circuit: no significant power reduction
• At 3.3 V in 0.35 μm technology, the power consumption of a full adder and of a 4-bit adder is about ½
• Delay is worse, and significantly greater at lower supply voltages
• SPL does not work at low supply voltage (1.5 V)
• Final conclusion: the power-delay performance of SPL logic has to be investigated for future deep-submicron technologies
Other logic styles
• Pseudo nMOS
• For N inputs, only N+1 transistors are required, resulting in smaller area and smaller parasitic capacitance
• The logic is ratioed, so transistor sizes have to be selected carefully
• Static power: a minimum-size gate consumes 1 mW while its output is low
• 100,000 gates consume 50 W (assuming half of them have a low output)
• Power is reduced for complex logic functions switching at high frequency, where dynamic power is lower due to the reduced capacitance
• When the output makes a low-to-high transition, the large pMOS is on, giving a large current and a fast transition
• Suited for gates that should switch only during certain time periods (e.g., decoders)
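The 50 W figure follows directly from the per-gate static power: a pseudo-nMOS gate draws DC current only while its output is low (always-on pMOS load fighting the conducting pull-down), so block power is gates × fraction low × power per low gate. The 5 V / 0.2 mA split of the 1 mW per-gate figure is a hypothetical example, not a value from these notes.

```python
def gate_static_power(vdd: float, i_dc: float) -> float:
    """Static power of one pseudo-nMOS gate while its output is low: P = Vdd * I_DC.

    With the output low, the always-on pMOS load and the conducting pull-down
    network form a DC path from Vdd to ground. With the output high, no DC
    current flows.
    """
    return vdd * i_dc


def block_static_power(n_gates: int, fraction_low: float, p_gate_low: float) -> float:
    """Average static power of n_gates gates, a fraction of which have a low output."""
    return n_gates * fraction_low * p_gate_low


# Figures from the notes above: 1 mW per gate with a low output,
# 100,000 gates, half of them low on average.
print(f"{block_static_power(100_000, 0.5, 1e-3):.0f} W")  # 50 W
# Hypothetical decomposition of the 1 mW figure: 5 V supply, 0.2 mA DC current
print(f"{gate_static_power(5.0, 0.2e-3) * 1e3:.1f} mW")   # 1.0 mW
```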

Differential cascode voltage switch logic (DCVSL)
• Eliminates the static current of pseudo-nMOS by using dual-rail logic
• Advantages
• Faster switching due to reduced capacitance
• Compared with pseudo-nMOS, DCVSL has the advantage that there is no static power
• Current during switching increases due to the large pull-up transistor
• Two pull-down networks increase area, but both true and complementary outputs are available
• Dynamic DCVSL is power hungry

Differential current switch logic
(DCSL)
• Suitable for high fan-in gates; restricts internal node voltage swings (1 V for a 5 V supply)
• Once evaluation is complete, a DCSL gate no longer responds to its inputs, so it acts as a latch followed by a combinational circuit
• For moderate tree sizes, DCSL reduces power dissipation by about 1/2
DCSL problems
• Being a precharged differential logic, it has a high activity factor
• Sensitive to noise
• AND/NAND cannot be implemented
• Requires balanced layout techniques
Charge recycling differential logic
• Power consumption is reduced by reusing part of the already used charge during precharge
• A half-supply precharge level is achieved, 50% of that with full swing
• 0.8 µm, 5 V Manchester carry chain: 27% improvement in power-delay product compared with DCVSL
Push-pull pass transistor logic (PPL)
• Similar to CPL
• The logic employs two complementary pass-transistor networks
• The complementary network turns on the corresponding pull-up or pull-down transistor to compensate for the threshold voltage drop
• Push-pull logic eliminates the need for output buffers with restoring transistors
• A good choice of logic style for low power
• Results for a 40-stage full adder indicate a power-delay product only 60% of that of SPL
• For a multiplier in 0.8 μm, 3.3 V CMOS technology, the relative power-delay products are 42% for PPL, 63% for CPL and 78% for SPL
Logic styles - Discussion
• It is not possible to conclude that a specific logic style is optimal in terms of both performance and power consumption
• SPL has been a promising logic style in the era of low-power designs because of its reduced transistor count for complex functions; its drawback appears in submicron technologies at supply voltages below 1 V
• Analysis should not focus only on dynamic power due to charging and discharging; short-circuit power dissipation, energy consumption due to glitching activity, subthreshold currents and the ability to benefit from future developments (ultra-low supply voltages) should also be taken into account
• Static logic remains the most reliable logic style
• It is simple and robust, offers relatively low power, is easy to design with, and has the advantage of being supported by the majority of electronic design automation (EDA) tools
Latches and Flip-flops
• The clocked registers of a system are flip-flops and latches, driven by the clock distribution network
• The clock distribution network accounts for a substantial part of the system's total power consumption
• To save power, the clocked capacitance should be minimized
LATCHES
• Dynamic latches are the simplest and most efficient
timing circuits.
• Classic latch: a transmission gate followed by an inverter; the fastest latch (a true single stage)
• C²MOS latch: slower, robust and more power efficient (no contacts at the intermediate nodes)
• Both have four clocked transistors (including the inverter; three of them load the clock input)
• True single-phase clocked (TSPC) half latch: isolates high inputs when the clock is low
• Non-precharged TSPC latch: built from two TSPC half-latch circuits; slower, more robust, and with only two clocked transistors
• Dual-rail CVSL latch: depends on transistor ratios; the n-transistors must be stronger than the p-transistors to flip the latch
• Dynamic single-transistor-clocked (DSTC) latch: uses a common clocked transistor to save power; fast and power efficient but sensitive to input glitches (in the hold state)
• In simulation, the non-precharged true single-phase clocked (NPTSPC) latch has the lowest power (due to its two clocked transistors and the lack of precharging)
• The TSPC and PTSPC latches are next in power consumption
STATIC LATCHES
• Use positive feedback (cross-coupled inverters)
• 3GATE latch: more complex, low power consumption (four clocked transistors)
• TGATE latch: classical transmission-gate latch with 6 clocked transistors
• The two above are based on standard CMOS cell libraries
• SRAM-cell-based latch (6 transistors): depends on transistor ratios
• SSTC latch: static version of the single-transistor-clocked latch
• SSTC and SRAM latches: highest speed and low power consumption; SSTC operates faster than SRAM. The p versions of these latches perform worse and depend on transistor ratios
• The 3GATE latch consumes less power than TGATE but is slower

DYNAMIC FLIP FLOP
• Dynamic flip-flops are constructed by cascading two latches with opposite polarities; this is used in the classic, C²MOS and NPTSPC flip-flops
• An efficient flip-flop is constructed from a p-type TSPC half latch and a PTSPC latch (9T); an inverter is used for the complementary output
• Dual-rail flip-flop using the DSTC latch, with only two clocked transistors
• Simulation results show that the DSTC, NPTSPC and 9T flip-flops have the lowest power consumption. The NPTSPC and 9T flip-flops can improve speed when the complementary output is not needed

STATIC FLIP FLOP
• 3GATE, TGATE and SSTC flip-flops are considered
• SSTC uses only two clocked transistors, giving very low power consumption and minimum delay
• The 3GATE flip-flop consumes less power than TGATE
DOUBLE EDGE TRIGGERED F/F
• Triggered on both clock edges, so the clock frequency can be halved for the same data rate, which reduces the power dissipated in the clock distribution
• Double-edge-triggered (DET) and single-edge-triggered (SET) flip-flops, static and dynamic, both contain two D-type latches; in DET the latches are arranged in parallel, whereas in SET they are in series
• DET power consumption is 20% lower than SET
• The dynamic SET flip-flop is slightly faster but requires a clock at twice the frequency of the dynamic DET
• Dynamic flip-flops consume less power than static ones
• System-level energy savings are possible with DET; one example shows that DET can save about 17% power
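The clock-network argument for DET flip-flops can be sketched with $P_{clk} = C_{clk} V_{dd}^2 f_{clk}$ (the clock toggles every cycle). For the same data rate a DET design needs half the clock frequency, so with equal clocked capacitance the clock power halves. The capacitance, supply and data rate are hypothetical, and the equal-capacitance assumption is an idealization: the DET flip-flop's own clocked load differs, which is one reason reported savings such as those above are well below 50%.

```python
def clock_power(c_clock: float, vdd: float, f_clock: float) -> float:
    """Clock network power: the clock toggles every cycle (activity 1), P = C * Vdd^2 * f."""
    return c_clock * vdd ** 2 * f_clock


def clock_frequency(data_rate: float, double_edge: bool) -> float:
    """A DET flip-flop samples on both edges, so it needs half the clock frequency."""
    return data_rate / 2 if double_edge else data_rate


C_CLK = 100e-12  # hypothetical total clocked capacitance (F), assumed equal for SET and DET
VDD = 1.2        # hypothetical supply voltage (V)
RATE = 500e6     # hypothetical data rate (samples/s)

p_set = clock_power(C_CLK, VDD, clock_frequency(RATE, double_edge=False))
p_det = clock_power(C_CLK, VDD, clock_frequency(RATE, double_edge=True))
print(f"SET: {p_set * 1e3:.0f} mW, DET: {p_det * 1e3:.0f} mW")  # SET: 72 mW, DET: 36 mW
```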
TRANSISTOR ORDERING
• The relative placement and ordering of transistors can reduce power
• Reordering the transistors in a CMOS gate changes the switching activity of its internal nodes
• Power dissipation can be reduced in two ways: by minimizing the switched drain/source capacitance and by signal-probability-based algorithms
• Transistors with the highest capacitance have to be placed closest to the supply and ground (fig. b)
• For the pull-down network, the signal-probability algorithm was reported to be better (fig. a)
• The reported reduction in power dissipation is 15.1% in the worst case and 7.2% on average
• MUX, adder and ALU: 12% average power reduction with a 4% increase in delay
• Reordering rules reduce power by about 10% on average and by up to 30% in some cases
TRANSISTOR SIZING
• Transistor sizing minimizes the power consumption under a given delay constraint. Two kinds of algorithms:
• Algorithms that start with a circuit satisfying the timing constraint and reduce the size of the gates to reduce power dissipation
• Algorithms that perform an initial power-optimal sizing of each gate; if the power-minimal layout satisfies the delay constraint, the process terminates
• This algorithm is more complex: it takes into account circuit capacitance and short-circuit power dissipation
• The power consumption of a CMOS circuit is a convex function of the transistor sizes
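A toy sketch of the first kind of sizing algorithm: start from a timing-feasible chain of gates and greedily shrink the gate that saves the most capacitance while the delay constraint still holds. The delay model (each stage delay = R/w × next-stage input capacitance, parasitics ignored), the normalized constants and the greedy rule are all assumptions made for illustration; published algorithms use more detailed delay and power models.

```python
"""Toy sketch of greedy transistor downsizing under a delay constraint."""

R_UNIT = 1.0   # hypothetical on-resistance of a unit-width gate (normalized)
C_UNIT = 1.0   # hypothetical input capacitance per unit width (normalized)
C_LOAD = 64.0  # hypothetical load driven by the last gate (normalized)


def chain_delay(widths):
    """RC delay of a gate chain: stage i drives the input of stage i+1 (or C_LOAD)."""
    delay = 0.0
    for i, w in enumerate(widths):
        c_next = C_UNIT * widths[i + 1] if i + 1 < len(widths) else C_LOAD
        delay += (R_UNIT / w) * c_next
    return delay


def switched_capacitance(widths):
    """Capacitance switched per transition; dynamic power is proportional to it."""
    return C_UNIT * sum(widths) + C_LOAD


def greedy_downsize(widths, max_delay, shrink=0.9, min_width=1.0):
    """Start from a timing-feasible sizing and shrink gates while the constraint holds."""
    w = list(widths)
    if chain_delay(w) > max_delay:
        raise ValueError("initial sizing violates the delay constraint")
    while True:
        best = None
        for i in range(len(w)):
            new_w = max(min_width, w[i] * shrink)
            if new_w >= w[i]:
                continue  # already at minimum width
            trial = w[:i] + [new_w] + w[i + 1:]
            if chain_delay(trial) <= max_delay:
                saving = w[i] - new_w  # capacitance saved by this move
                if best is None or saving > best[0]:
                    best = (saving, trial)
        if best is None:
            return w  # no feasible shrink left
        w = best[1]


start = [1.0, 4.0, 16.0]  # delay = 12 in normalized units
sized = greedy_downsize(start, max_delay=15.0)
print(chain_delay(sized), switched_capacitance(start), switched_capacitance(sized))
```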
DRIVERS FOR LARGE LOADS
• Drivers are needed to drive high-capacitance loads at reasonable speed with short rise/fall times
• Long rise/fall times cause larger short-circuit power consumption
• Examples: clock networks, clock drivers, long buses, long interconnects and chip outputs
• To drive big loads, a tapered inverter chain is used; choosing the scaling factor of a uniformly tapered buffer minimizes the power-delay product
• Simulation shows 15-35% savings in power-delay product compared with a design for minimum propagation delay
• Non-uniform tapering shows an 8% improvement in dynamic switching energy; the improvement drops to 3-5% for total switching energy
• A uniform buffer is used because it is much simpler and provides better insight into the optimization of the power-delay product
• $\alpha$: tapering factor; $Y = C_L / C_i$
• $C_i$: input capacitance of the first inverter
• $\alpha^N = Y$, so $N = \ln Y / \ln \alpha$
Total delay time: $\tau_{tot} = \dfrac{\ln Y}{\ln \alpha}\, \alpha\, \tau_i$, where $\tau_i$ is the propagation delay of the first inverter
Differentiating with respect to $\alpha$ gives the delay-optimal value $\alpha = e \approx 2.72$
Total capacitance of the chain: $C_{tot} = \dfrac{\alpha (Y-1)\, C_i}{\alpha - 1}$
Power-delay product of the inverter chain:
$P_{tot}\, \tau_{tot} \propto \dfrac{\ln Y}{\ln \alpha}\, \alpha \cdot \dfrac{\alpha (Y-1)}{\alpha - 1}$
Differentiating with respect to $\alpha$ gives the optimum power-delay product at $\alpha = 4.25$
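Both optima above can be checked numerically by minimizing the two expressions over $\alpha$ with a golden-section search. Because $Y$ only enters as a constant factor ($\ln Y$ and $Y - 1$), the optimal tapering factors do not depend on the load ratio; `Y = 1000` below is an arbitrary hypothetical value. The search lands at about 4.24 for the power-delay product, consistent with the value of about 4.25 quoted above.

```python
import math


def delay_factor(alpha: float, y: float) -> float:
    """tau_tot / tau_i = (ln Y / ln alpha) * alpha."""
    return math.log(y) / math.log(alpha) * alpha


def pdp_factor(alpha: float, y: float) -> float:
    """P_tot * tau_tot up to a constant: (ln Y / ln alpha) * alpha * alpha * (Y - 1) / (alpha - 1)."""
    return delay_factor(alpha, y) * alpha * (y - 1) / (alpha - 1)


def golden_section_min(f, lo: float, hi: float, tol: float = 1e-6) -> float:
    """Minimize a unimodal function f on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    while b - a > tol:
        if f(c) < f(d):
            b = d  # minimum lies in [a, d]
        else:
            a = c  # minimum lies in [c, b]
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
    return (a + b) / 2


Y = 1000.0  # hypothetical load ratio C_L / C_i; the optima do not depend on it
alpha_delay = golden_section_min(lambda a: delay_factor(a, Y), 1.5, 20.0)
alpha_pdp = golden_section_min(lambda a: pdp_factor(a, Y), 1.5, 20.0)
print(f"delay-optimal alpha = {alpha_delay:.2f}")  # 2.72 (= e)
print(f"PDP-optimal alpha   = {alpha_pdp:.2f}")    # 4.24
```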

• As the tapering factor increases, the total capacitance decreases, and so does the power consumption
• When α is increased from 3.5 to 9, the power consumption overhead drops from 80% to 25%, at the cost of a 20% increase in delay
• For very large values of α, the delay becomes too large

Conclusions
• Different logic styles, circuit structures and circuit design techniques have been presented