
Circuit-Level Low-Power Techniques

Naehyuck Chang
Dept. of EECS/CSE
Seoul National University
naehyuck@snu.ac.kr

Contents
 Traditional low-power circuits
 Power-aware circuit design techniques
 Leakage-aware circuit design techniques

Embedded Low-Power Laboratory 2


Traditional Low-Power Circuit Design
 Circuit design styles
 Non-clocked
 Fully complementary logic, pass transistor, etc.
 Slower, but consumes less power
 Clocked
 Domino
 Faster, but consumes more power
 Trade-off between performance and power
 Faster logic consumes more power



Non-clocked: Fully Complementary Logic
 AKA. CMOS
 Active mode
 Switching/short-circuit current
 Glitches or spurious transitions due to different delays through
different paths of the circuit
 Stand-by mode
 Leakage current
 High noise margin
 Can reduce the threshold voltage
 Performance degrading factor
 Large PMOS
 Large input capacitance
 Weak output driving



Non-clocked: NMOS and Pseudo-NMOS
 AKA. Ratioed logic
 Pull-up resistance is higher than pull-down resistance
 Smaller area than CMOS
 Good for large fan-in gates
 Higher power dissipation than CMOS due to the static current
 DC current

NMOS Pseudo-NMOS
(depletion-mode)



Non-clocked: DCVS
 Differential cascode voltage switch
 A differential output signal is available
 Eliminates the static power in the ratioed logic
 f(network 1) = ~f(network 2)
 Larger switched capacitance
 Higher switching power
 Can be reduced by the sharing between two networks



Non-clocked: Pass Transistor Logic (PTL)
 AND: connected in series
 OR: connected in parallel
 NMOS: good to transmit “0”, but not for “1”
 CPL: Complementary Pass-transistor Logic
 Differential input/output signals
 Power-delay product is 10% better than CMOS

[Figure: PTL AND gate with level restorer; CPL NAND/AND; CPL XOR/XNOR]


Clocked: Domino
 Clock = 0: output is precharged
 Clock = 1: evaluated (conditionally discharged)
 Only implements non-inverting logic gates
 Good for large fan-in gates
 Clock switching: high power

[Figure: domino NAND gate with keeper]



Power-Aware Circuit Design Techniques
 Switched capacitance
 CMOS dynamic power consumption is linearly proportional to the switched capacitance


 Logic-level power optimization
 Try to minimize the total switched capacitance of the target logic
 Assign the job of low switching activity to the node with large load
capacitance
 Remove unnecessary switching such as glitches
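
For reference, the proportionality between dynamic power and switched capacitance is usually written as below. The symbols are the conventional ones and are assumptions here, since the slide does not define them: α is the switching activity of a node, C_L its load capacitance, V_DD the supply voltage, and f the clock frequency. The product α·C_L is the switched capacitance that the logic-level techniques try to minimize.

```latex
P_{\mathrm{dyn}} = \alpha \, C_L \, V_{DD}^{2} \, f
```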



Power-Aware Circuit Design Techniques
 Combinational logic optimization
 Block the generation of unnecessary spurious transitions
 Reconstruct the logic to reduce the switched capacitance while
maintaining I/O semantics of target logic
(ex) Path balancing, logic factorization, technology mapping, “Don’t-
care” optimization, etc.
 Sequential logic optimization
 Reduce the switching in registers and the switching caused by their values
 Block the propagation of unnecessary asynchronous signals
(ex) State encoding, retiming, clock gating, precomputation, etc.
 Dynamic power optimization techniques
 Path balancing, “Don’t-care” optimization, logic factorization,
technology mapping, state encoding, retiming, clock gating, etc.



Path Balancing
 Equalize the delay of input paths of each gate to reduce the
possibility of spurious transitions
 Spurious transitions are reported to amount to 10~40% of all
switching activities
 Balancing the paths
 Increase the possibility of simultaneous transition at the input
 Decrease the possibility of hazards at the output



Path Balancing
 Balance the paths by restructuring the logic circuit

 Balance the paths by inserting unit-delay buffers
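
The effect of path balancing can be illustrated with a small discrete-time sketch. It assumes a hypothetical two-input XOR gate whose inputs both derive from one source signal that rises at t = 0; `delay_a` and `delay_b` are illustrative path delays in unit-delay steps, not values from the slides. With unequal delays the output glitches; with equal delays (for example after inserting two unit-delay buffers on the faster path) it stays quiet.

```python
def count_output_transitions(delay_a, delay_b, horizon=10):
    """Count output transitions of y = a XOR b when a single source
    signal rises at t = 0 and reaches input a after delay_a time units
    and input b after delay_b time units."""
    def source(t):
        # The source signal is 0 before t = 0 and 1 afterwards.
        return 1 if t >= 0 else 0

    previous = source(-1 - delay_a) ^ source(-1 - delay_b)
    transitions = 0
    for t in range(horizon):
        y = source(t - delay_a) ^ source(t - delay_b)
        if y != previous:
            transitions += 1
        previous = y
    return transitions


# Unbalanced paths: the output pulses high while only one input has changed.
unbalanced = count_output_transitions(delay_a=1, delay_b=3)
# Balanced paths: both inputs change in the same time step.
balanced = count_output_transitions(delay_a=3, delay_b=3)
print(unbalanced, balanced)
```

In this toy case both transitions of the unbalanced gate are spurious, since the final output value equals the initial one.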



“Don’t-care” Optimization
 Traditionally have been used for area minimization
 Include appropriate “don’t-care” sets in either the ON set or the
OFF set
 Exploit the “don’t-care” set so as to decrease the output
transition probability
 Include the “don’t-care” set in the ON set if Pone(F) > 0.5
 Include the “don’t-care” set in the OFF set if Pone(F) < 0.5

* Transition probability of a CMOS gate with output F: Ptransition(F) = 2 Pone(F) (1 - Pone(F))
 Maximized when Pone(F) = 0.5 (Pone(F): the probability of F being 1)
[Figure: Ptransition(F) at the output F of a CMOS gate]
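
The don't-care rule can be checked numerically. The sketch below assumes equally likely input minterms, so that Pone(F) is simply the fraction of minterms in the ON set; the function names and the example counts (5 ON minterms and 2 don't-care minterms out of 8) are illustrative only.

```python
def transition_probability(p_one):
    """Ptransition(F) = 2 * Pone(F) * (1 - Pone(F))."""
    return 2.0 * p_one * (1.0 - p_one)


def assign_dont_cares(n_on, n_dc, n_total):
    """Decide whether the don't-care minterms go to the ON set or the OFF set.

    Assumes all input minterms are equally likely. Returns the chosen set
    and the resulting output transition probability.
    """
    p_if_on = (n_on + n_dc) / n_total   # don't-cares treated as 1
    p_if_off = n_on / n_total           # don't-cares treated as 0
    if transition_probability(p_if_on) <= transition_probability(p_if_off):
        return "ON", transition_probability(p_if_on)
    return "OFF", transition_probability(p_if_off)


# Pone(F) = 5/8 > 0.5, so moving the don't-cares into the ON set pushes
# Pone(F) further from 0.5 and lowers the transition probability.
choice, p_transition = assign_dont_cares(n_on=5, n_dc=2, n_total=8)
print(choice, p_transition)
```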



Logic Factorization

 Has been commonly used for area optimization


 Reduce literal count to minimize the number of transistors being
used to represent the target logic

 Perform the factorization to reduce the switched capacitance


 The smaller literal count does not guarantee the smaller switched
capacitance unlike the case of area optimization
 Should consider both the transition probability at the input and
the load capacitance



Logic Factorization

 Should select the circuit (a) for area optimization


 Should select the circuit (b) for power optimization

(a)

(b)



Technology Mapping
 The process of binding a set of logic equations to the gates in
target cell library
 Have been originally developed to optimize area and delay
 Hide nodes with high switching activity inside the gates
 Generally, internal capacitances in gates are much smaller than
external load capacitances
 Select the library with same function but different
capacitances while meeting the delay constraints
 Most technology libraries include the same logic element with
different sizes



Technology Mapping

 Should select the circuit (a) for area optimization


 Should select the circuit (b) for power optimization

[Figure: candidate mappings (a) and (b)]

Gate    Area   Intrinsic cap.   Input load cap.
INV      928   0.1029           0.0514
NAND2   1392   0.1421           0.0747
AOI22   2320   0.3410           0.1033



State Encoding

 The process of assigning a unique binary code to each state in an FSM (Finite State Machine)
 Have been studied well for area minimization
 Assign codes with smaller Hamming distance to states with
larger state transition probability when focusing on low power
 Minimize the following cost function
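
The cost function itself is not reproduced in the text. A commonly used form, given here as an assumption rather than as the slide's own formula, weights the Hamming distance between two state codes by the probability of the transition between those states. The notation p_ij (transition probability from S_i to S_j), H (Hamming distance), and enc (state code) is illustrative.

```latex
\text{cost} = \sum_{(S_i,\, S_j)} p_{ij} \; H\big(\mathrm{enc}(S_i),\ \mathrm{enc}(S_j)\big)
```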



State Encoding
 Gray coding example
[Figure: state transition diagram with states S1 to S8 arranged in a ring]

State                      Gray   Binary
S1                         000    000
S2                         001    001
S3                         011    010
S4                         010    011
S5                         110    100
S6                         111    101
S7                         101    110
S8                         100    111
Total # of transitions     8      14
Max. transitions / cycle   1      3

 Need to consider not only the switching activity in the state registers
but also in the combinational logic affected by assigned codewords
for further optimization
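
The totals in the Gray coding example can be reproduced with a few lines of code. The sketch assumes the FSM cycles through its eight states in order and wraps from S8 back to S1; the helper names are illustrative.

```python
def hamming(a, b):
    """Number of bit positions in which two codes differ."""
    return bin(a ^ b).count("1")


def ring_transitions(codes):
    """Total and maximum bit flips when cycling once through all codes,
    including the wrap-around from the last state to the first."""
    n = len(codes)
    flips = [hamming(codes[i], codes[(i + 1) % n]) for i in range(n)]
    return sum(flips), max(flips)


binary_codes = list(range(8))                   # 000, 001, 010, ...
gray_codes = [i ^ (i >> 1) for i in range(8)]   # 000, 001, 011, 010, ...

gray_total, gray_max = ring_transitions(gray_codes)
binary_total, binary_max = ring_transitions(binary_codes)
print(gray_total, gray_max, binary_total, binary_max)
```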
Embedded Low-Power Laboratory 19
Retiming
 The process of repositioning registers (FFs) in a pipelined
circuit (while maintaining I/O functionality)
 First proposed to minimize the number of registers or the delay
of the critical path (the longest pipeline stage)
 Pipeline the circuit by adding a register
 Block the glitch propagation to the large load cap. (CL)
 Generally, the input load cap. of a register is much smaller than CL



Retiming
 Move registers to nodes with higher switching activity
 Maintain I/O functionality and (sequential) timing
 May change the switching activity at one or more nodes
 Choose the circuit with less switched capacitance



Retiming
 Add a register with different clock phase
 Maintain I/O functionality and (sequential) timing while placing
more registers in the pipeline
 Replace an existing register with multiple non-overlapping level-
clocked latches or registers synchronous to different phase clocks
and reposition them over the circuit



Clock Gating
 Provide a way to selectively stop the clock
 Force the circuit to make no switching whenever the computation
at the next cycle is unnecessary
 Should be implemented as follows
 Construct an idleness-detecting circuit which is small (i.e., consumes
little power) and accurate (i.e., able to stop the clock whenever idle)
 Design gated-clock distribution circuit with minimum routing
overhead
 Keep clock skew under tight control



Clock Gating
 Gate the clock of infrequently accessed blocks for large
circuits
 For example, gate the clock of register files, which need not be
updated every cycle in general, for processors
 Synthesize FSMs with gated clock
 Add a new activation signal to selectively stop the local clock for
the FSM when no state transition occurs
 Used by high-level power management such as DPM
 Have a smaller turn-on/off performance overhead than supply
shutdown
 However, still consume static power in the target circuit and dynamic
power in the clock circuit



Clock Gating
 Applicable to sequential circuits
 Modeled as FSMs or sequential networks
 Exploit internal idleness:
 Find conditions under which state and outputs do not change
 Disable clock (by gating) under these conditions
 Save power
 On the local clock line
 In the registers
 In the combinational logic



Clock Gating
 One activation function and a latch
 fa: activation function



Clock Gating
 FSM conversion
 Transform Mealy FSM to Moore FSM



Clock Gating
 Synthesis and optimization
 Determine disabling conditions:
 Analyze state transitions and frequencies
 Split states to isolate Moore states
 Activation function:
 Idleness observation block
 Defined by Moore-states self-loops
 Implement activation function:
 Trade off the power of the disabling circuit against the power
saved by stopping the clock
 Implement activation sub function



Clock Gating
 Activation function design
 Given an activation function:
 Find a sub-function:
 Whose literal cost is minimum and such that its probability to be true is a
predefined fraction of that of the activation function
 Exact and heuristic solution:
 Based on prime implicant generation and on constrained covering



Clock Gating
 Overall procedure
 Transform Mealy model into locally-Moore model
 Extract the activation functions ƒa from the Moore states
 Compute the probability p(ƒa)
 Determine the primes of ƒa
 Find minimum-literal sub function fa
 While satisfying the pre-determined probability
 Use fa as don’t care set for reducing the FSM combinational
component
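
The first steps of the procedure can be sketched for a tiny Moore FSM. Everything in the example is hypothetical: the three-state machine, the state probabilities, and the input probabilities are illustrative, and the activation function is taken to be true when the clock can be stopped (a self-loop), which is an assumption about the slide's convention. Prime generation and the minimum-literal sub-function are not covered.

```python
def idle_conditions(next_state):
    """Return the (state, input) pairs with a self-loop, i.e. the
    conditions under which the state register does not change."""
    return {
        (state, x)
        for state, row in next_state.items()
        for x, target in row.items()
        if target == state
    }


def idle_probability(next_state, state_prob, input_prob):
    """Probability that the activation function is true, assuming the
    input is independent of the current state."""
    return sum(
        state_prob[state] * input_prob[x]
        for state, x in idle_conditions(next_state)
    )


# Hypothetical Moore FSM with one binary input.
next_state = {
    "IDLE": {0: "IDLE", 1: "RUN"},
    "RUN": {0: "RUN", 1: "DONE"},
    "DONE": {0: "IDLE", 1: "IDLE"},
}
state_prob = {"IDLE": 0.6, "RUN": 0.3, "DONE": 0.1}   # assumed values
input_prob = {0: 0.75, 1: 0.25}                       # assumed values

conditions = idle_conditions(next_state)
p_fa = idle_probability(next_state, state_prob, input_prob)
print(sorted(conditions), p_fa)
```

A high p(ƒa) suggests that gating is worthwhile, provided the circuit that computes ƒa costs less power than the gating saves.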



Clock Gating
 Impact of the clock gating
 Power reduction of about 30% on standard benchmarks with
random test vectors
 Larger saving on reactive circuits
 Computing time limited by don’t care based optimization
 Major limitation is representing explicitly FSM tables with many
states



Clock Gating
 Partitioned control
 Implement control function as interconnected FSM units
 Each FSM unit controls a portion of the overall control flow
 Determine idle FSM units
 Selectively clock FSM units
 Most of the time only one unit is not idle
 Two units active during transition of control
 Save power in the controller and (possibly) in the data path



Clock Gating
 Partitioned state diagram



Clock Gating
 Partitioned control unit



Clock Gating
 Partitioning driven by behavioral models
 Analyze the sequencing graph and determine its mutually-
exclusive sections
 Partition control-unit into corresponding blocks
 Disable clock at registers of control-unit blocks which are not
executing
 Decouple datapaths by constraining resource sharing



Body Bias Techniques (Introduction)
 Since the mid-1970s,
 RBB (Reverse Body Biasing) has been widely used in memory
chips to lower the risk of latch-up and memory data destruction
 Since the mid-1990s,
 It has been applied in logic chips for power reduction
 Lowest acceptable threshold voltage is determined by
 Sub-threshold leakage current
 Die-to-die and within-die threshold voltage variations
 Doping concentration in the channel area
 The threshold voltage can also be varied through the polarity of the voltage difference
between the source and body terminals during circuit operation
 RBB – increases the threshold voltage
 FBB (Forward Body Biasing) – decreases the threshold voltage
 Bidirectional body bias circuit



RBB (Reverse Body Biasing) Techniques
 Apply a negative voltage across the source-to-substrate p-n
junction

NMOS transistor PMOS transistor



RBB Techniques
 Variation of the charge distribution in the depletion region and
inversion layer of the MOSFET
 Positive charge on the gate is balanced by the sum of the
electronic charge in the inversion layer and the negative ionic
charge in the depletion region

Zero body biased NMOS Reverse body biased NMOS


 The gate voltage needs to be increased to achieve the charge
balance → threshold voltage of a MOSFET increases
Embedded Low-Power Laboratory 38
RBB Techniques
 Threshold voltage changes due to the body effect

 VBS: substrate potential


 VTH0: threshold voltage for VBS = 0V
 γ: body effect coefficient
 Φb: substrate Fermi potential
 tox: gate oxide thickness
 εox: dielectric constant of silicon dioxide
 εsi: permittivity of silicon
 NA: doping concentration density of the substrate
 Ni: carrier concentration in intrinsic silicon
 k: Boltzmann’s constant
 q: electronic charge
 T: absolute temperature
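
The equations that these symbols belong to are missing from the text. The standard long-channel body-effect expressions use exactly the listed symbols and are given here as an assumed reconstruction, not as the slide's own formulas. In them, ε_ox is treated as the permittivity of the gate oxide, and V_BS is negative for a reverse-biased NMOS body, so the threshold voltage rises.

```latex
\begin{aligned}
V_{TH} &= V_{TH0} + \gamma \left( \sqrt{\left| 2\Phi_b - V_{BS} \right|} - \sqrt{\left| 2\Phi_b \right|} \right) \\
\gamma &= \frac{t_{ox}}{\varepsilon_{ox}} \sqrt{2 q \, \varepsilon_{si} N_A} \\
\Phi_b &= \frac{kT}{q} \ln\!\left( \frac{N_A}{N_i} \right)
\end{aligned}
```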
Embedded Low-Power Laboratory 39
Impacts of the RBB Techniques
 Effects
 Reducing the sub-threshold leakage current during the standby
and burn-in modes
 Applicable to an idle portion of an IC to reduce the active leakage power without
degrading speed
 Significant reduction up to 10000 times (1.2VDD, DCT processor, 0.3
um CMOS)
 Side effects
 Increasing the tunneling leakage current
 Low RBB: junction band-to-band tunneling leakage current is
dominated by gate-induced drain leakage (GIDL)
 High RBB (typically above 0.5V): band-to-band tunneling current in
the bulk is dominant component of the junction leakage current



Impacts of the RBB Techniques
 There is an optimum reverse body bias voltage (specific to a
process technology)

Variation of total standby power as a function of reverse body bias voltage



Adaptive RBB Circuit Techniques
 Effective in reducing variations (supply voltage, temperature
and die-to-die process parameters)
 Adaptive body bias control scheme
 Dynamically varies the body bias voltage depending upon local
speed and power requirement



Adaptive RBB Circuit Techniques
 Reduces die-to-die delay variations from 45% to 30%
 Provides further opportunity to scale the threshold voltage without dissipating excessive leakage power



RBB with Technology Scaling
 Technology scaling may result in losing control of the charge
distribution in the channel area.
 Effectiveness of the RBB technique is reduced due to a
weaker body effect with technology scaling
 RBB alleviates short-channel effects by increasing the width of
the junction depletion regions

Long-channel MOSFET Short-channel MOSFET


RBB with Technology Scaling
 Effect of the RBB on short-channel effects and threshold voltage roll-off
 Low VT devices are more sensitive to variations in the critical dimensions

0.25 um CMOS technology, NBB= No Body Bias



FBB (Forward Body Biasing) Techniques
 Apply a positive voltage across the source-to-substrate p-n
junction

NMOS transistor PMOS transistor



FBB (Forward Body Biasing) Techniques
 Variation of the charge distribution in the depletion region and
inversion layer of the MOSFET

Zero body biased NMOS Forward body biased NMOS

 In order to maintain the charge balance, the threshold voltage of a MOSFET decreases



Impacts of the FBB Techniques
 Standby mode
 Zero body bias
 High threshold voltage transistor to maintain the standby leakage
current below a target limit
 Active mode
 FBB
 Threshold voltage is reduced by applying FBB to achieve a target
circuit speed
 Maximum FBB voltage is limited by the junction diode current
 Junction diode currents increase the active leakage power
 Voltage swing at an output node can be degraded due to the
junction diode currents



Impacts of the FBB Techniques
 Side effects
 Diode currents oppose the transition of the voltage state of a node
→ degrading the effective switching current and therefore switching speed
 Increasing source-to-body and drain-to-body junction capacitances (CJ1 and CJ2)
→ increasing the active mode switching power and propagation delay



Impacts of the FBB Techniques

 Variation of the propagation delay and energy consumption
 Variation of the energy-delay product

0.18 um CMOS technology, 101 stage ring oscillator


FBB with Technology Scaling
 The FBB technique reduces short-channel effects and drain-induced
barrier lowering (DIBL) while enhancing the body effect.

Effect of forward body bias on short channel effects in NMOS transistor

 FBB techniques are more attractive than RBB in future nanometer CMOS technology generations



Bidirectional Body Bias (BBB) Techniques
 With technology scaling, the effectiveness of the RBB circuit
technique will not satisfy the speed and power requirement beyond
70 nm technology.
 Due to the weaker body effect, FBB-only solution also fails to satisfy
these performance requirements beyond the 50 nm technology
 Beyond 50 nm technology, bidirectional body bias circuit technique
is desirable
 The threshold voltage of a transistor can be set to an intermediate value by controlling the
channel doping concentration
 Increase the circuit speed – FBB technique
 Reduce the circuit speed and leakage power – RBB technique
 Produce the wider choice of dynamically adjusted
threshold voltages



Effects of BBB in Process Variations

 RBB: die-to-die parameter variation ↓, within-die parameter variation ↑ (increases short-channel effects)
 FBB: within-die parameter variation ↓ (reduces short-channel effects)
 62 Microprocessor test circuits (leakage and clock frequency test)
 0.15 um CMOS

 Zero body bias


 50% passed
 LFB > HFB
 BBB technique
 100% passed

LFB = Low Frequency Bin, HFB = Higher Frequency Bin


Effects of BBB in Process Variations
 Applying a single set of adaptive BBB is ineffective for
reducing within-die parameter variations
 32% of dies are acceptable in HFB (100% yield)
 Independently generated adaptive body bias voltage is
applied to each circuit zone
 99% acceptable rate in HFB (100% yield)



Generalized Multiple VT Problem
 Power minimization problem
 Given:
 A random logic network of N static CMOS gates
 The critical path delay is less than or equal to Tmax
 The device technology used
 Activity profiles at each input node
 Determine:
 Supply voltage VDD
 Threshold voltage VT
 Channel width (size) W
 Such that:
 Static leakage and dynamic power are minimized
 The area is within the bound
 Generally subthreshold leakage is minimized



Generalized Multiple VT Problem
 Special cases
 VDD takes a single value
 VT can take one of the following
 A continuous VTCMOS
 Multiple discrete VTCMOS
 Dual VT: high-VT and low-VT - the most popular case
 Limitation of multiple VT
 Each transistor in general can have individual VT (high or low)
 Actual VT assignment must be either stack-based or gate-based,
not individual transistor-based
 Due to closely-spaced transistors in a stack, their channels are
too close to achieve distinct channel doping
 Gate-based approach is more suitable for a standard cell design



Generalized Multiple VT Problem
 Problem formulation

 Two dynamic power components

Load capacitance Internal nodes



Generalized Multiple VT Problem
 Simultaneous transistor sizing and VT assignment
 Transistors may become unnecessarily fast in the critical paths if
they have low-VT while the size is fixed
 Lower VT causes 8 to 10% increase of the gate capacitance
 Strong inversion is reached earlier, so the channel forms earlier during
the input transition
 Setting a low-VT to transistors w/o transistor sizing can degrade the
performance



Dual VT Circuit Optimization
 Transistor is assigned either a high or low VT
 Low-VT transistor
 Reduced delay
 Increased leakage
 Speed critical path: low-VT
 Rest: high-VT

                     Low-VT: 0.8 V   High-VT: 0.8 V   Low-VT: 1.2 V   High-VT: 1.2 V
Normalized leakage   1               0.05             1               0.049
Normalized delay     1               1.36             1               1.30



Dual VT Circuit Optimization
 Objective
 Find an implementation between the two extremes of all low VT
and all high VT, trading off leakage power for delay
 Delay constraint must be met



Dual VT Circuit Optimization
 Example
 Dual VT assignment approach
 Transistor on critical path: low VT
 Non-critical transistor: high VT

[Figure: bar chart comparing the all low VT and dual VT designs, vertical axis from 0 to 1.2]



Dual VT Circuit Optimization
 VT assignment
 Greedy approach: backward traversal of circuit
 Select high VT gate in critical path
 Set gate to low VT
 Re-compute critical paths
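
The greedy loop above can be sketched on a small gate-level netlist. This is a minimal illustration under stated assumptions, not the published algorithm: the five-gate netlist, the delay values (1.0 for low VT and 1.36 for high VT, borrowed from the normalized-delay table), and the delay bound of 3.5 are hypothetical, and leakage is not modeled beyond keeping as many gates as possible at high VT.

```python
def arrival_times(gates, order, low_vt):
    """Latest arrival time at each gate output; `order` must be topological."""
    at = {}
    for g in order:
        delay = gates[g]["d_low"] if g in low_vt else gates[g]["d_high"]
        at[g] = max((at[f] for f in gates[g]["fanin"]), default=0.0) + delay
    return at


def critical_path(gates, order, low_vt):
    """Return the critical delay and the critical path, listed from the
    output back toward the inputs (backward traversal)."""
    at = arrival_times(gates, order, low_vt)
    g = max(order, key=lambda n: at[n])
    path = [g]
    while gates[g]["fanin"]:
        g = max(gates[g]["fanin"], key=lambda n: at[n])
        path.append(g)
    return at[path[0]], path


def greedy_dual_vt(gates, order, t_max):
    """Start from all high VT; move critical-path gates to low VT one at
    a time until the delay constraint is met."""
    low_vt = set()
    while True:
        delay, path = critical_path(gates, order, low_vt)
        if delay <= t_max:
            return low_vt, delay
        candidates = [g for g in path if g not in low_vt]
        if not candidates:
            raise ValueError("delay constraint cannot be met")
        low_vt.add(candidates[0])


def gate(*fanin):
    # Hypothetical gate model: normalized delays for the two VT options.
    return {"fanin": list(fanin), "d_high": 1.36, "d_low": 1.0}


gates = {
    "g1": gate(), "g2": gate(),
    "g3": gate("g1", "g2"), "g4": gate("g3"), "g5": gate("g2"),
}
order = ["g1", "g2", "g3", "g4", "g5"]

low_vt, delay = greedy_dual_vt(gates, order, t_max=3.5)
print(sorted(low_vt), round(delay, 2))
```

In this example only the two gates nearest the output of the long path end up at low VT, while the side gate g5 and the input gates stay at high VT.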




Dual VT Circuit Optimization
 VT assignment granularity
 Gate based assignment
 Pull up network / Pull down network based assignment
 Single VT in P pull up or N pull down trees
 Stack based assignment
 Single VT in series connected transistors
 Individual assignment within transistor stacks
 Possible area penalty



Dual VT Circuit Optimization
 Examples

[Figure: gate-based, PU/PD-based, and stack-based VT assignment]



Dual VT Circuit Optimization
 Gate sizing after VT assignment
 Resizing necessary after VT assignment to improve obtained
trade-off
 Transistors changed to low VT become oversized
 Input capacitance increases with low VT assignment



Dual VT Circuit Optimization
 Simultaneous VT and sizing approach
 Determine the size (width) and threshold voltage for each
transistor
 Area
 Performance
 Leakage current
 Both the performance of the circuit and its leakage vary non-
linearly with device widths and their VT
 The width domain is continuous while the VT domain is discrete
 Heuristic approach



Dual VT Circuit Optimization
 Sensitivity based VT selection
 In each iteration, pick a transistor with the best trade-off
between leakage and delay, weighted by its path slack

Delay change on timing arc α when transistor T is changed to low VT: Δdα(T)



Mixed-VT (MVT) CMOS Design
 Mixed-VT (MVT) CMOS design technique
 Transistor-level dual VT design technique
 Transistors within a gate can have different VT
 Use multiple types of transistors within each gate
 MVT1: Same threshold voltage for all transistors in N or P networks
 MVT2: Same threshold voltage only for all transistors of a series stack
 No limitation (possible in some processes)



Mixed-VT (MVT) CMOS Design
 MVT CMOS Design Algorithm
 Assume all low VT transistors
 For each transistor of each gate,
 Find the increase in the gate delay if high VT is used (Δtd)
 Find the decrease in the gate leakage if high VT is used (Δleak)

 Calculate:

 Higher value means more leakage can be saved using one unit of
slack
 The transistors are processed based on their priority (i) values
 After modifying each transistor, the slack values have to be
recalculated
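
The priority-driven loop can be sketched as follows. The priority formula is not reproduced in the text, so the sketch assumes priority = Δleak / Δtd (leakage saved per unit of slack consumed), which matches the description but is an assumption. It also collapses the timing graph into a single shared slack budget, a strong simplification of the slack recalculation step. Transistor names and numbers are hypothetical.

```python
def assign_high_vt(transistors, slack):
    """Greedily move transistors from low VT to high VT.

    transistors: dict name -> (delta_td, delta_leak), where delta_td > 0 is
    the delay increase and delta_leak the leakage decrease if that
    transistor is changed to high VT.
    slack: timing slack shared by all transistors (simplifying assumption).
    """
    remaining = dict(transistors)
    high_vt = []
    while remaining:
        # Only transistors whose delay penalty fits in the slack qualify.
        feasible = {n: v for n, v in remaining.items() if v[0] <= slack}
        if not feasible:
            break
        # Assumed priority: leakage saved per unit of slack consumed.
        best = max(feasible, key=lambda n: feasible[n][1] / feasible[n][0])
        high_vt.append(best)
        slack -= remaining.pop(best)[0]   # stand-in for slack recalculation
    return high_vt, slack


transistors = {
    "M1": (0.10, 40.0),   # priority 400
    "M2": (0.20, 50.0),   # priority 250
    "M3": (0.05, 30.0),   # priority 600
}
high_vt, slack_left = assign_high_vt(transistors, slack=0.2)
print(high_vt, round(slack_left, 3))
```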



Mixed-VT (MVT) CMOS Design
 MVT design results



IVC (Input Vector Control)
 The idea is based on the transistor stack effect

Least subthreshold leakage Least gate leakage Largest gate leakage



IVC (Input Vector Control)
 Subthreshold leakage and gate leakage currents are
dependent on the input vectors

[Figure: a one-input gate driven by X0 and a two-input gate driven by X0 and X1]

Input vector (X0)   Leakage (nA)
0                   100.3
1                   227.2

Input vector (X0X1)   Leakage (nA)
00                    37.84
01                    100.30
10                    95.17
11                    454.50

Cadence Spectre simulation, 0.18 um technology



IVC (Input Vector Control)
 Example of IVC (32-Bit Full Adder)
 Leakage current varies by 30-40%, depending on the input
vector

Distribution of standby leakage current in the 32-bit adder (random input vector)



IVC (Input Vector Control)
 Implementation of IVC
 Concept
 IVC During the Sleep Mode
 Providing the minimum leakage vector (MLV) to the target logic during
the sleep (or standby) mode

[Figure: a multiplexer controlled by Sleep selects the primary input vector (0) or the MLV (1) as the input of the target logic]



IVC (Input Vector Control)
 Implementation of IVC
 Modification of a scan-chain
registers
 Original MLV is stored in left
FFs
 Sleep mode
 Sleep = 1
 Test = 1
 MLV is applied (right FFs)
 Operational mode
 Sleep= 0
 Test = 0
 Inputs are directly applied to
the target logic



IVC (Input Vector Control)
 Main Advantages of the IVC
 Reduction is not as high as the one achieved by the power gating
method
 However, IVC technique does not suffer from the implementation
overheads
 No modification in the process technology
 No change in internal logic gates of the circuit
 No reduction in voltage swing
 Technology scaling does not have a negative effect (even stronger
effect with technology scaling as DIBL worsens)
 From 10% to 55% reduction in the leakage is expected



IVC (Input Vector Control)
 Modifying the internal logic gates for further leakage
reduction
 Due to logic dependencies of the internal signals, driving a circuit
with its MLV does not guarantee the lowest-leakage state at every internal gate
 Increase controllability in the standby mode

Replacing an internal signal line with a two-input AND gate
Modifying CMOS gate



IVC (Input Vector Control)
 IVC solution methods
 Finding MLV (minimum leakage vector)
 NP-complete
 Finding the MLV
 Random search
 Heuristic Algorithms
 Genetic algorithm, greedy algorithm (leakage observability for each
primary input, Node controllability)
 Formulations of existing problems
 Pseudo-Boolean satisfiability (SAT) problem, Integer linear programming
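
For very small circuits the MLV can be found by exhaustive enumeration, which also shows why heuristics are needed: the search space doubles with every primary input. The sketch below is illustrative only. It assumes a hypothetical three-gate circuit of two-input NAND gates and reuses the two-input leakage figures quoted earlier (in nA) as a per-gate lookup table; treating those figures as NAND2 data is an assumption.

```python
from itertools import product

# Per-gate leakage in nA, indexed by the gate's two input values.
# Assumed to describe a NAND2 gate; the numbers are the two-input
# leakage figures quoted earlier in the deck.
LEAKAGE_NA = {(0, 0): 37.84, (0, 1): 100.30, (1, 0): 95.17, (1, 1): 454.50}


def nand(a, b):
    return 1 - (a & b)


def circuit_leakage(vector):
    """Total leakage of out = NAND(NAND(a, b), NAND(c, d))."""
    a, b, c, d = vector
    n1 = nand(a, b)
    n2 = nand(c, d)
    return LEAKAGE_NA[(a, b)] + LEAKAGE_NA[(c, d)] + LEAKAGE_NA[(n1, n2)]


def find_mlv():
    """Exhaustive search over all 2**4 primary input vectors."""
    return min(product((0, 1), repeat=4), key=circuit_leakage)


mlv = find_mlv()
mlv_leakage = circuit_leakage(mlv)
print(mlv, round(mlv_leakage, 2))
```

In this toy circuit the best vector puts both first-level gates in their lowest-leakage state but forces the output gate into its highest-leakage state, because the internal signals cannot be chosen independently of the primary inputs.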



IVC (Input Vector Control)
 Percentage of leakage reduction using an IVC technique
Reduction (%)



Combining VT and IVC
 Given a known input state in standby mode, only off
transistors set to high VT
 All other transistors are kept at low VT

[Figure: bar chart comparing all low VT, dual VT, and dual VT with input state, vertical axis from 0 to 1.2]



MTCMOS: Sleep Transistor Insertion
 Basic concept
 Multi-threshold CMOS: sleep transistor insertion
 To use both high-VT and low-VT cells in a logic block
 Based on the observation that a circuit’s overall performance is often
determined by a few critical paths
 Transistors and gates along the critical paths are set to a low-VT
 Transistor size is fixed
 Overall circuit performance can be enhanced significantly
 Leakage is kept within bounds
 Operating frequency of a logic block is limited by the maximum path
delay



MTCMOS: Sleep Transistor Insertion
 Multi-threshold CMOS: sleep transistor insertion
 Uses both high- and low-threshold voltage MOSFETs
 Active mode: SL is set to high/sleep mode: SL is set to low
 The “on” resistance of sleep transistors is small
 Some designs only use either header or footer
 Cell-based MTCMOS: area penalty / easy to design
 Block-based MTCMOS: area efficiency / hard to design



MTCMOS: Sleep Transistor Insertion
 Sleep transistor
 Also called guarding, power gating, ground gating, using sleep
transistor, etc.
 Sleep transistors are inserted between VDD and the logic, and between the logic
and GND



MTCMOS: Sleep Transistor Insertion
 One sleep transistor can be used
 Mostly NMOS is used due to the
higher mobility
 However, PMOS usually has less
leakage
 Shared sleep transistor
 Small number of sleep transistors
 Less area overhead, dynamic power
and leakage power



MTCMOS: Sleep Transistor Insertion
 Active-mode operation



MTCMOS: Sleep Transistor Insertion
 Idle-mode operation



MTCMOS: Sleep Transistor Insertion
 Principle of operation
 A low-VT block is gated with high-VT power switches that are
controlled by SLEEP signal
 When SLEEP=1
 The high-VT transistor is turned on
 The low-VT logic gates are connected to virtual GND and power
 The sleep transistor can be modeled as a resistor R
 Vx=IR



MTCMOS: Sleep Transistor Insertion
 Proper sleep transistor sizing is a key element to efficient
MTCMOS design
 Critical path input vector is highly dependent on the internal
discharge pattern
 Critical path depends on the sleep transistor size
 Exponential complexity
 Many techniques that can complete the computation in reasonable
amount of time



MTCMOS: Sleep Transistor Insertion
 Principle of operation
 Voltage drop: Vx=IR
 Reduces the gate driving capability from VDD to VDD-Vx
 Increases the threshold voltage of the low-VT pull-down devices due to the
body effect
 Both effects degrade the speed of the circuit
 Low R is desirable
 Larger sleep transistor is required
 Expense of extra area and power
 DSM circuits use low VDD
 Even larger size sleep transistors are required



MTCMOS: Sleep Transistor Insertion
 The worst case
 Low-VT blocks switch at the same time
 I=I1+I2+I3
 The best case
 Low-VT blocks switch exclusively (no time overlap)
 I=max(I1, I2, I3)
 In general
 Partially overlap



MTCMOS: Sleep Transistor Insertion
 Average current method
 Estimation of the optimum size of the sleep transistor
 If the average current flowing through the sleep transistor and the
maximum speed penalty of the MTCMOS block are known, the
minimum size of the sleep transistor can be estimated
 Assumes that the current consumed in the MTCMOS block is constant, so the
voltage drop across the sleep transistor is also constant
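
A back-of-the-envelope version of this estimate is sketched below. It uses Vx = I·R from the earlier slides and, as an assumption not stated in the deck, approximates the on-resistance of the sleep transistor in the linear region as R ≈ 1 / (μCox · (W/L) · (VDD − VTH)). All numeric values (block currents, allowed virtual-GND drop, μCox, VDD, VTH) are illustrative.

```python
def required_resistance(v_x_max, current):
    """Largest sleep-transistor resistance that keeps Vx = I * R below v_x_max."""
    return v_x_max / current


def sleep_transistor_w_over_l(r_max, mu_cox, vdd, vth_high):
    """Minimum W/L for a target on-resistance, using the assumed
    linear-region approximation R = 1 / (mu_cox * (W/L) * (vdd - vth_high))."""
    return 1.0 / (r_max * mu_cox * (vdd - vth_high))


block_currents = [1.0e-3, 2.0e-3, 1.5e-3]   # A, hypothetical low-VT blocks
i_worst = sum(block_currents)               # all blocks switch simultaneously
i_best = max(block_currents)                # blocks switch exclusively

v_x_max = 0.05                              # V, allowed virtual-GND rise
r_worst = required_resistance(v_x_max, i_worst)
r_best = required_resistance(v_x_max, i_best)

mu_cox, vdd, vth_high = 300e-6, 1.2, 0.4    # A/V^2, V, V (assumed values)
wl_worst = sleep_transistor_w_over_l(r_worst, mu_cox, vdd, vth_high)
wl_best = sleep_transistor_w_over_l(r_best, mu_cox, vdd, vth_high)
print(round(r_worst, 2), round(r_best, 2), round(wl_worst, 1), round(wl_best, 1))
```

The gap between the two W/L values shows how much area can be saved when the blocks sharing a sleep transistor are known not to discharge at the same time.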



MTCMOS: Sleep Transistor Insertion
 MTCMOS delay calculation
 Switch-level simulation for the case where N gates share the same virtual
GND and discharge simultaneously

 Saturation current

 Virtual GND potential

 Gain factor for MOSFET j



MTCMOS: Sleep Transistor Insertion
 Drawback of the switch-level simulation
 Modeling the discharge current purely with saturation current is
incorrect
 Impact of parasitic capacitance on the virtual GND is not
considered
 Complicated gates are modeled as a simple inverter
 Internal node glitch is not considered
 Velocity saturation and body effect are not considered



MTCMOS: Sleep Transistor Insertion
 Samsung’s sleep transistor insertion
 Use of conventional P&R



MTCMOS: Sleep Transistor Insertion
 Example of the sleep transistor placement



MTCMOS: Sleep Transistor Insertion
 Design flow



MTCMOS: Sleep Transistor Insertion
 Sleep transistor merging
 Individual sleep transistor sizing

 Merging gates based on mutually exclusive discharging



MTCMOS: Sleep Transistor Insertion
 Sleep transistor merging
 Merging through parallel combination

 Combining two sub-circuits with very similar virtual GND transient behaviors results in unchanged virtual GND characteristics
 In general, equivalent resistance



MTCMOS Limitations
 Area overhead
 Sleep transistors
 Performance overhead
 Slow operation when the circuit is active
 Wake up delay
 Process modification for dual VT



MTCMOS Limitations
 Electro-migration effect on vias and wires
 The number of vias is determined based on the average and the
maximum allowable currents for each via
 MTCMOS cannot be easily applied to flip-flops
 Loss of the FF state during sleep



MTCMOS Limitations
 MTCMOS circuit is not easily interfaced with non MTCMOS
circuits
 Short circuit current occurs if an MTCMOS circuit drives non
MTCMOS circuits
 Ground bounce
 During the sleep period, internal nodes in the MTCMOS block are
charged to VDD
 When the sleep transistor is turned on, a current spike flows to
GND due to large VDS
 Ground bounce occurs



MTCMOS Limitations
 Impact on virtual GND capacitance
 Wire and junction capacitance actually helps to reduce the
ground bounce acting like bypass capacitors
 However, the capacitance is not large enough to offset poor sleep
transistor sizing
 If the capacitance is large enough, it will also take a long time for the
virtual GND to discharge back to GND on wake-up
 Instead of having large capacitance, reducing the effective
resistance (appropriate sleep transistor sizing) is desirable



MTCMOS Limitations
 Reverse conduction paths through virtual GND
 Current flows from the virtual GND through the low-VT pull-down device
and charges up the load capacitance
 The output voltage rises up to Vx
 The circuit becomes faster
 However, noise margin becomes worse
 Proper sleep transistor sizing is required for adequate noise margin



Zig Zag Sleep Transistors
 Reduce the wake up overhead



Sleepy Stack Sleep Transistor
 Combination of the stack and sleep transistor
 State preserving



Sleepy Stack Sleep Transistor
 Active and idle mode operations



MTCMOS Sequential Circuits
 MTCMOS latch
 Use of always on gates



MTCMOS Sequential Circuits
 No leakage MTCMOS FF

