Digital Circuit-Level Low-Power Techniques

Circuit-Level Low-Power Techniques
Naehyuck Chang
Dept. of EECS/CSE
Seoul National University
naehyuck@snu.ac.kr
Contents
 Traditional low-power circuits
 Power-aware circuit design techniques
 Leakage-aware circuit design techniques
Embedded Low-Power Laboratory 2

Traditional Low-Power Circuit Design
 Circuit design styles
 Non-clocked
 Fully complementary logic, pass transistor, etc.
 Slower, but consumes less power
 Clocked
 Domino
 Faster, but consumes more power
 Trade-off between performance and power
 Faster logic consumes more power

Non-clocked: Fully Complementary Logic
 Also known as static CMOS
 Active mode
 Switching/short-circuit current
 Glitches or spurious transitions due to different delays through
different paths of the circuit
 Stand-by mode
 Leakage current
 High noise margin
 Can reduce the threshold voltage
 Performance degrading factor
 Large PMOS
 Large input capacitance
 Weak output driving

Non-clocked: NMOS and Pseudo-NMOS
 Also known as ratioed logic
 Pull-up resistance is higher than pull-down resistance
 Smaller area than CMOS
 Good for large fan-in gates
 Higher power dissipation than CMOS due to the static current
 DC current
[Figure: NMOS gate (depletion-mode load) and pseudo-NMOS gate]

Non-clocked: DCVS
 Differential cascode voltage switch
 A differential output signal is available
 Eliminates the static power in the ratioed logic
 f(network 1) = ~f(network 2)
 Larger switched capacitance
 Higher switching power
 Can be reduced by the sharing between two networks

Non-clocked: Pass Transistor Logic (PTL)
 AND: connected in series
 OR: connected in parallel
 NMOS: good at transmitting “0”, but passes a degraded “1” (the output only reaches VDD - VT)
 CPL: Complementary Pass-transistor Logic
 Differential input/output signals
 Power-delay product is 10% better than CMOS
[Figure: PTL AND with level restorer; CPL NAND/AND; CPL XOR/XNOR]

Clocked: Domino
 Clock = 0: output is precharged
 Clock = 1: evaluated (conditionally discharged)
 Only implements non-inverting logic gates
 Good for large fan-in gates
 Clock switching: high power
[Figure: domino NAND gate with keeper]

Power-Aware Circuit Design Techniques
 Switched capacitance
 CMOS dynamic power consumption is linearly proportional to the switched capacitance

 Logic-level power optimization
 Try to minimize the total switched capacitance of the target logic
 Assign the job of low switching activity to the node with large load
capacitance
 Remove unnecessary switching such as glitches

Power-Aware Circuit Design Techniques
 Combinational logic optimization
 Block the generation of unnecessary spurious transitions
 Reconstruct the logic to reduce the switched capacitance while
maintaining I/O semantics of target logic
(ex) Path balancing, logic factorization, technology mapping, “don’t-care” optimization, etc.
 Sequential logic optimization
 Reduce the switching in registers or invoked by their values
 Block the propagation of unnecessary async. signals
(ex) State encoding, retiming, clock gating, precomputation, etc.
 Dynamic power optimization techniques
 Path balancing, “don’t-care” optimization, logic factorization, technology mapping, state encoding, retiming, clock gating, etc.
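
The switched-capacitance argument behind the techniques listed above can be sketched numerically. The sketch below assumes the common model P = α · C · VDD² · f, where α is the probability of a power-consuming transition per clock cycle; the node values, supply voltage and frequency are illustrative assumptions, not data from these slides.

```python
def dynamic_power(nodes, vdd, freq):
    """Dynamic power of a set of nodes.

    nodes: iterable of (switching_activity, capacitance_in_farads) pairs.
    The sum of activity * capacitance is the switched capacitance.
    """
    switched_cap = sum(activity * cap for activity, cap in nodes)
    return switched_cap * vdd ** 2 * freq


# Two signals (activity 0.5 and 0.1) must be mapped onto two nodes (10 fF and 100 fF).
busy_on_large_node = [(0.5, 100e-15), (0.1, 10e-15)]
busy_on_small_node = [(0.5, 10e-15), (0.1, 100e-15)]

p_bad = dynamic_power(busy_on_large_node, vdd=1.0, freq=1e9)
p_good = dynamic_power(busy_on_small_node, vdd=1.0, freq=1e9)
print(p_bad, p_good)
```

Placing the low-activity signal on the large load gives the smaller switched capacitance, which is the assignment rule stated for logic-level power optimization.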

Path Balancing
 Equalize the delays of the input paths of each gate to reduce the possibility of spurious transitions
 Spurious transitions are reported to account for 10 to 40% of all switching activity
 Balancing the paths
 Increase the possibility of simultaneous transition at the input
 Decrease the possibility of hazards at the output

Path Balancing
 Balance the paths by restructuring the logic circuit
 Balance the paths by inserting unit-delay buffers

“Don’t-care” Optimization
 Traditionally used for area minimization
 Include appropriate “don’t-care” sets in either the ON set or the
OFF set
 Exploit the “don’t-care” set so as to decrease the output
transition probability
 Include the “don’t-care” set in the ON set if Pone(F) > 0.5
 Include the “don’t-care” set in the OFF set if Pone(F) < 0.5
* Transition probability of a CMOS gate output F: Ptransition(F) = 2 · Pone(F) · (1 − Pone(F))
 Pone(F): the probability of F being 1
 Ptransition(F) is maximized when Pone(F) = 0.5
[Figure: CMOS gate with output F]
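
A minimal sketch of the don't-care rule above, assuming temporally independent output values so that Ptransition(F) = 2 · Pone(F) · (1 − Pone(F)). The function names and the probabilities in the example are hypothetical.

```python
def transition_probability(p_one):
    """Probability that the output changes between two consecutive cycles."""
    return 2.0 * p_one * (1.0 - p_one)


def assign_dont_cares(p_on, p_dc):
    """Apply the rule from the slide.

    p_on: probability that the inputs fall in the ON set (Pone without don't-cares).
    p_dc: probability that the inputs fall in the don't-care set.
    Returns the chosen set and the resulting Pone(F).
    """
    if p_on > 0.5:
        return "ON", p_on + p_dc   # push Pone(F) further above 0.5
    return "OFF", p_on             # keep Pone(F) as low as possible


choice, p_one = assign_dont_cares(p_on=0.6, p_dc=0.1)
print(choice, transition_probability(0.6), transition_probability(p_one))
```

Moving Pone(F) away from 0.5 in either direction lowers the transition probability, which is why the direction of the assignment depends on which side of 0.5 the function already sits.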

Logic Factorization
 Commonly used for area optimization
 Reduce the literal count to minimize the number of transistors needed to implement the target logic
 For low power, perform the factorization so as to reduce the switched capacitance
 Unlike in area optimization, a smaller literal count does not guarantee a smaller switched capacitance
 Consider both the transition probabilities at the inputs and the load capacitances

Logic Factorization
 Select circuit (a) for area optimization
 Select circuit (b) for power optimization
[Figure: candidate factorizations (a) and (b)]

Technology Mapping
 The process of binding a set of logic equations to the gates in the target cell library
 Originally developed to optimize area and delay
 Hide nodes with high switching activity inside the gates
 Generally, the internal capacitances of a gate are much smaller than the external load capacitances
 Select library cells with the same function but different capacitances while meeting the delay constraints
 Most technology libraries include the same logic element in different sizes

Technology Mapping
 Select circuit (a) for area optimization
 Select circuit (b) for power optimization
[Figure: candidate mappings (a) and (b)]
Gate    Area   Intrinsic cap.   Input load cap.
INV      928   0.1029           0.0514
NAND2   1392   0.1421           0.0747
AOI22   2320   0.3410           0.1033

State Encoding
 The process of assigning a unique binary code to each state in an FSM (Finite State Machine)
 Well studied for area minimization
 For low power, assign codes with a smaller Hamming distance to state pairs with a larger state transition probability
 Minimize the following cost function
State Encoding
 Gray coding example
[Figure: state diagram with states S1 to S8 arranged in a ring]
State   Gray   Binary
S1      000    000
S2      001    001
S3      011    010
S4      010    011
S5      110    100
S6      111    101
S7      101    110
S8      100    111
Total # of transitions       8     14
Max. transitions / cycle     1      3
 Need to consider not only the switching activity in the state registers
but also in the combinational logic affected by assigned codewords
for further optimization
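
The transition counts in the Gray coding example can be checked with a short sketch. It assumes a cost of the form sum over state pairs of weight(i, j) × Hamming distance(code_i, code_j), which is the usual reading of the rule about Hamming distance and transition probability; the exact cost function of the slide is not reproduced in the text. The ring of eight states with weight 1 per transition is taken from the example.

```python
def hamming(a, b):
    """Number of bit positions in which two codes differ."""
    return bin(a ^ b).count("1")


def encoding_cost(codes, transition_weight):
    """Weighted sum of register bit flips over all state transitions."""
    return sum(w * hamming(codes[i], codes[j])
               for (i, j), w in transition_weight.items())


states = ["S1", "S2", "S3", "S4", "S5", "S6", "S7", "S8"]
gray = dict(zip(states, [0b000, 0b001, 0b011, 0b010, 0b110, 0b111, 0b101, 0b100]))
binary = dict(zip(states, range(8)))

# One traversal of the ring S1 -> S2 -> ... -> S8 -> S1, each transition weighted 1.
ring = {(states[k], states[(k + 1) % 8]): 1 for k in range(8)}

gray_cost = encoding_cost(gray, ring)
binary_cost = encoding_cost(binary, ring)
print(gray_cost, binary_cost)
```

Replacing the unit weights with measured transition probabilities turns the same function into a cost that a state-assignment search could minimize.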
Retiming
 The process of repositioning registers (FFs) in a pipelined circuit (while maintaining I/O functionality)
 First proposed to minimize the number of registers or the delay of the critical path (the longest pipeline stage)
 Pipeline the circuit by adding a register
 Block the glitch propagation to the large load capacitance (CL)
 Generally, the input load capacitance of a register is much smaller than CL

Retiming
 Move registers to nodes with higher switching activity
 Maintain I/O functionality and (sequential) timing
 May change the switching activity at one or more nodes
 Choose the circuit with less switched capacitance

Retiming
 Add a register with different clock phase
 Maintain I/O functionality and (sequential) timing while placing
more registers in the pipeline
 Replace an existing register with multiple non-overlapping level-
clocked latches or registers synchronous to different phase clocks
and reposition them over the circuit

Clock Gating
 Provide a way to selectively stop the clock
 Prevent the circuit from switching whenever the computation in the next cycle is unnecessary
 Should be implemented as follows
 Construct an idleness-detecting circuit that is small (i.e., consumes little power) and accurate (i.e., able to stop the clock whenever the circuit is idle)
 Design the gated-clock distribution circuit with minimum routing overhead
 Keep the clock skew under tight control

Clock Gating
 Gate the clock of infrequently accessed blocks for large
circuits
 For example, gate the clock of register files, which need not be
updated every cycle in general, for processors
 Synthesize FSMs with gated clock
 Add a new activation signal to selectively stop the local clock for
the FSM when no state transition occurs
 Used by high-level power management such as DPM
 Have a smaller turn-on/off performance overhead than supply
shutdown
 However, still consume static power in the target circuit and dynamic
power in the clock circuit

Clock Gating
 Applicable to sequential circuits
 Modeled as FSMs or sequential networks
 Exploit internal idleness:
 Find conditions under which state and outputs do not change
 Disable clock (by gating) under these conditions
 Save power
 On the local clock line
 In the registers
 In the combinational logic

Clock Gating
 One activation function and a latch
 fa: activation function

Clock Gating
 FSM conversion
 Transform Mealy FSM to Moore FSM

Clock Gating
 Synthesis and optimization
 Determine disabling conditions:
 Analyze state transitions and frequencies
 Split states to isolate Moore states
 Activation function:
 Idleness observation block
 Defined by the self-loops of the Moore states
 Implement activation function:
 Trade off the power of the disabling circuit against the power saved by stopping the clock
 Implement an activation sub-function

Clock Gating
 Activation function design
 Given an activation function:
 Find a sub-function:
 Whose literal cost is minimum and such that its probability to be true is a
predefined fraction of that of the activation function
 Exact and heuristic solution:
 Based on prime implicant generation and on constrained covering

Clock Gating
 Overall procedure
 Transform the Mealy model into a locally-Moore model
 Extract the activation function fa from the Moore states
 Compute the probability P(fa)
 Determine the primes of fa
 Find a minimum-literal sub-function of fa
 While satisfying the predetermined probability
 Use fa as a don’t-care set for reducing the FSM combinational component
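
A minimal behavioral sketch of the idea behind the procedure above: the activation function is true on self-loops (the state does not change), and the local clock is stopped in those cycles. The two-state FSM, its transition table and the input trace are hypothetical; the sketch only counts clock pulses delivered to the state register and does not model the latch or the sub-function selection.

```python
def simulate(transitions, start, inputs):
    """Count clock pulses reaching the state register with and without gating.

    transitions: dict mapping (state, input) -> next state.
    Returns (final_state, ungated_pulses, gated_pulses).
    """
    state = start
    ungated = gated = 0
    for x in inputs:
        next_state = transitions[(state, x)]
        idle = next_state == state      # activation function: self-loop detected
        ungated += 1                    # free-running clock always toggles the register clock pin
        if not idle:
            gated += 1                  # gated clock only pulses when the state must change
        state = next_state
    return state, ungated, gated


fsm = {
    ("IDLE", 0): "IDLE", ("IDLE", 1): "RUN",
    ("RUN", 0): "IDLE",  ("RUN", 1): "RUN",
}
final_state, ungated_pulses, gated_pulses = simulate(fsm, "IDLE", [0, 0, 1, 1, 0, 0])
print(final_state, ungated_pulses, gated_pulses)
```

The fraction of idle cycles in the trace bounds the clock-power saving; the power of the idleness detector itself has to be subtracted from it.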

Clock Gating
 Impact of the clock gating
 Power reduction of about 30% on standard benchmarks with
random test vectors
 Larger saving on reactive circuits
 Computing time limited by don’t care based optimization
 Major limitation is representing explicitly FSM tables with many
states

Clock Gating
 Partitioned control
 Implement control function as interconnected FSM units
 Each FSM unit controls a portion of the overall control flow
 Determine idle FSM units
 Selectively clock FSM units
 Most of the time only one unit is not idle
 Two units active during transition of control
 Save power in the controller and (possibly) in the data path

Clock Gating
 Partitioned state diagram

Clock Gating
 Partitioned control unit

Clock Gating
 Partitioning driven by behavioral models
 Analyze the sequencing graph and determine its mutually-
exclusive sections
 Partition control-unit into corresponding blocks
 Disable clock at registers of control-unit blocks which are not
executing
 Decouple datapaths by constraining resource sharing

Body Bias Techniques (Introduction)
 Since the mid-1970s,
 RBB (Reverse Body Biasing) has been widely used in memory chips to lower the risk of latch-up and memory data destruction
 Since the mid-1990s,
 It has been applied in logic chips for power reduction
 The lowest acceptable threshold voltage is determined by
 Sub-threshold leakage current
 Die-to-die and within-die threshold voltage variations
 Doping concentration in the channel area
 The threshold voltage also varies with the polarity of the voltage difference between the source and body terminals during circuit operation
 RBB: increases the threshold voltage
 FBB (Forward Body Biasing): decreases the threshold voltage
 Bidirectional body bias circuit

RBB (Reverse Body Biasing) Techniques
 Apply a negative voltage across the source-to-substrate p-n
junction
[Figure: reverse body bias applied to an NMOS transistor and a PMOS transistor]

RBB Techniques
 Variation of the charge distribution in the depletion region and
inversion layer of the MOSFET
 Positive charge on the gate is balanced by the sum of the
electronic charge in the inversion layer and the negative ionic
charge in the depletion region
[Figure: charge distribution in a zero-body-biased NMOS and a reverse-body-biased NMOS]

 The gate voltage needs to be increased to achieve the charge
balance → threshold voltage of a MOSFET increases
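
The threshold increase described above can be illustrated numerically with the standard body-effect relation. The sketch below is for an NMOS transistor, where a negative body-to-source voltage is a reverse bias; the parameter values (VTH0, γ and 2Φb) are illustrative assumptions, not values from these slides.

```python
import math


def threshold_voltage(v_bs, v_th0=0.40, gamma=0.40, two_phi_b=0.60):
    """NMOS threshold voltage for a body-to-source voltage v_bs (volts).

    Uses VTH = VTH0 + gamma * (sqrt(|2*phi_b - VBS|) - sqrt(|2*phi_b|)).
    gamma is in V**0.5; all default values are hypothetical.
    """
    return v_th0 + gamma * (math.sqrt(abs(two_phi_b - v_bs))
                            - math.sqrt(abs(two_phi_b)))


vth_zero = threshold_voltage(0.0)     # zero body bias
vth_rbb = threshold_voltage(-0.5)     # reverse body bias (body below source)
vth_fbb = threshold_voltage(0.3)      # forward body bias (body above source)
print(vth_zero, vth_rbb, vth_fbb)
```

With these assumed parameters the reverse bias raises the threshold and the forward bias lowers it, matching the qualitative charge-balance argument.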
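RBB Techniques
 Threshold voltage changes due to the body effect; in the standard form consistent with the symbols below:
V_{TH} = V_{TH0} + \gamma \left( \sqrt{\left| 2\Phi_b - V_{BS} \right|} - \sqrt{\left| 2\Phi_b \right|} \right)
\gamma = \frac{t_{ox}}{\varepsilon_{ox}} \sqrt{2 q \varepsilon_{si} N_A}
\Phi_b = \frac{kT}{q} \ln \frac{N_A}{n_i}
 VBS: substrate potential (body-to-source voltage)
 VTH0: threshold voltage for VBS = 0 V
 γ: body effect coefficient
 Φb: substrate Fermi potential
 tox: gate oxide thickness
 εox: permittivity of silicon dioxide
 εsi: permittivity of silicon
 NA: doping concentration of the substrate
 ni: carrier concentration in intrinsic silicon
 k: Boltzmann’s constant
 q: electronic charge
 T: absolute temperature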
Impacts of the RBB Techniques
 Effects
 Reduces the sub-threshold leakage current during the standby and burn-in modes
 Can be applied to an idle portion of an IC to reduce the active leakage power without degrading speed
 Significant reduction, up to 10,000 times (1.2VDD, DCT processor, 0.3 µm CMOS)
 Side effects
 Increasing the tunneling leakage current
 Low RBB: junction band-to-band tunneling leakage current is
dominated by gate-induced drain leakage (GIDL)
 High RBB (typically above 0.5V): band-to-band tunneling current in
the bulk is dominant component of the junction leakage current

Impacts of the RBB Techniques
 There is an optimum reverse body bias voltage (specific to a
process technology)
[Figure: variation of total standby power as a function of reverse body bias voltage]

Adaptive RBB Circuit Techniques
 Effective in reducing variations (supply voltage, temperature
and die-to-die process parameters)
 Adaptive body bias control scheme
 Dynamically varies the body bias voltage depending upon local
speed and power requirement

Adaptive RBB Circuit Techniques
 Reduces die-to-die delay variations from 45% to 30%
 Provides further opportunity to scale the threshold voltage without dissipating excessive leakage power

RBB with Technology Scaling
 Technology scaling may result in losing control of the charge distribution in the channel area
 The effectiveness of the RBB technique is reduced by the weaker body effect that comes with technology scaling
 RBB aggravates short-channel effects by increasing the width of the junction depletion regions
[Figure: long-channel MOSFET and short-channel MOSFET]

RBB with Technology Scaling
 Effect of the RBB on short-channel effects and threshold voltage roll-off
 Low VT devices are more sensitive to variations in the critical dimensions
[Figure: 0.25 µm CMOS technology; NBB = No Body Bias]

FBB (Forward Body Biasing) Techniques
 Apply a positive voltage across the source-to-substrate p-n
junction
[Figure: forward body bias applied to an NMOS transistor and a PMOS transistor]

FBB (Forward Body Biasing) Techniques
 Variation of the charge distribution in the depletion region and inversion layer of the MOSFET
[Figure: charge distribution in a zero-body-biased NMOS and a forward-body-biased NMOS]
 In order to maintain the charge balance, the threshold voltage of a MOSFET decreases

Impacts of the FBB Techniques
 Standby mode
 Zero body bias
 High threshold voltage transistors keep the standby leakage current below a target limit
 Active mode
 FBB
 The threshold voltage is reduced by applying FBB to achieve a target circuit speed
 The maximum FBB voltage is limited by the junction diode current
 Junction diode currents increase the active leakage power
 The voltage swing at an output node can be degraded by the junction diode currents

 Side effects
 Diode currents oppose the transition of the voltage state of a node, degrading the effective switching current and therefore the switching speed
 Increased source-to-body and drain-to-body junction capacitances (CJ1 and CJ2) increase the active-mode switching power and propagation delay

[Figure: variation of the propagation delay and energy consumption; variation of the energy-delay product; 0.18 µm CMOS technology, 101-stage ring oscillator]

FBB with Technology Scaling
 The FBB technique reduces short-channel and drain-induced barrier lowering (DIBL) effects while enhancing the body effect
[Figure: effect of forward body bias on short-channel effects in an NMOS transistor]
 FBB techniques are more attractive than RBB in future nanometer CMOS technology generations

Bidirectional Body Bias (BBB) Techniques
 With technology scaling, the RBB circuit technique alone will not satisfy the speed and power requirements beyond the 70 nm technology
 Due to the weaker body effect, an FBB-only solution also fails to satisfy these requirements beyond the 50 nm technology
 Beyond the 50 nm technology, the bidirectional body bias circuit technique is desirable
 The threshold voltage of a transistor can be set to an intermediate value by controlling the channel doping concentration
 Increase the circuit speed: FBB technique
 Reduce the circuit speed and leakage power: RBB technique
 Provides a wider choice of dynamically adjusted threshold voltages

Effects of BBB in Process Variations
 RBB: die-to-die parameter variations ↓, within-die parameter variations ↑ (increases short-channel effects)
 FBB: within-die parameter variations ↓ (reduces short-channel effects)
 62 microprocessor test circuits (leakage and clock frequency test)
 0.15 µm CMOS
 Zero body bias
 50% passed
 LFB > HFB
 BBB technique
 100% passed
LFB = Low Frequency Bin, HFB = Higher Frequency Bin

Effects of BBB in Process Variations
 Applying a single set of adaptive BBB is ineffective for
reducing within-die parameter variations
 32% of dies are acceptable in HFB (100% yield)
 Independently generated adaptive body bias voltage is
applied to each circuit zone
 99% acceptable rate in HFB (100% yield)

Generalized Multiple VT Problem
 Power minimization problem
 Given:
 A random logic network of N static CMOS gates
 The critical path delay is less than or equal to Tmax
 The device technology used
 Activity profiles at each input node
 Determine:
 Supply voltage VDD
 Threshold voltage VT
 Channel width (size) W
 Such that:
 Static leakage and dynamic power are minimized
 The area is within the bound
 Generally subthreshold leakage is minimized

 Special cases
 VDD takes a single value
 VT can take one of the following
 Continuous-VT CMOS
 Multiple discrete-VT CMOS
 Dual VT: high-VT and low-VT, the most popular case
 Limitation of multiple VT
 In principle, each transistor can have an individual VT (high or low)
 In practice, VT assignment must be either stack-based or gate-based, not individual transistor-based
 Transistors in a stack are spaced so closely that their channels cannot be given distinct doping
 The gate-based approach is more suitable for a standard cell design

 Problem formulation
 Two dynamic power components: load capacitance and internal nodes

 Simultaneous transistor sizing and VT assignment
 Transistors may become unnecessarily fast in the critical paths if they have low-VT while the size is fixed
 Lower VT causes an 8 to 10% increase of the gate capacitance
 With a lower VT, strong inversion and channel formation occur earlier during the input transition
 Assigning low-VT to transistors without resizing them can degrade the performance

Dual VT Circuit Optimization
 Transistor is assigned either a high or low VT
 Low-VT transistor
 Reduced delay
 Increased leakage
 Speed critical path: low-VT
 Rest: high-VT
                     Low-VT: 0.8 V   High-VT: 0.8 V   Low-VT: 1.2 V   High-VT: 1.2 V
Normalized leakage   1               0.05             1               0.049
Normalized delay     1               1.36             1               1.30

 Objective
 Find an implementation between the two extremes of all low VT,
all high VT, trading off leakage power for delay
 Delay constraint must be met

 Example
 Dual VT assignment approach
 Transistor on critical path: low VT
 Non-critical transistor: high VT
[Figure: bar chart comparing the all-low-VT and dual-VT designs]

 VT assignment
 Greedy approach: backward traversal of circuit
 Select high VT gate in critical path
 Set gate to low VT
 Re-compute critical paths
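
The greedy steps above can be sketched as follows, assuming the design starts with every gate at high VT and that each gate has one delay value per VT choice. The netlist, the delay numbers (taken loosely from the normalized delay table) and all function names are hypothetical.

```python
def critical_path(gates, fanin, low_vt):
    """Return (delay, path) of the longest path through a combinational DAG."""
    memo = {}

    def arrival(g):
        if g not in memo:
            d = gates[g]["d_low"] if g in low_vt else gates[g]["d_high"]
            best = max((arrival(p) for p in fanin.get(g, [])),
                       key=lambda t: t[0], default=(0.0, []))
            memo[g] = (best[0] + d, best[1] + [g])
        return memo[g]

    return max((arrival(g) for g in gates), key=lambda t: t[0])


def greedy_dual_vt(gates, fanin, t_max):
    """Switch gates on the critical path to low VT until the delay bound is met."""
    low_vt = set()
    while True:
        delay, path = critical_path(gates, fanin, low_vt)   # re-compute critical paths
        if delay <= t_max:
            return low_vt, delay
        # Backward traversal: start from the output end of the critical path.
        candidates = [g for g in reversed(path) if g not in low_vt]
        if not candidates:
            raise ValueError("delay bound cannot be met even with low VT")
        low_vt.add(candidates[0])                           # set gate to low VT


gates = {g: {"d_high": 1.36, "d_low": 1.0} for g in ("g1", "g2", "g3", "g4")}
fanin = {"g3": ["g1", "g2"], "g4": ["g3"]}
low_vt_gates, final_delay = greedy_dual_vt(gates, fanin, t_max=3.5)
print(sorted(low_vt_gates), final_delay)
```

Only the gates that had to be sped up end up at low VT; the remaining gates keep the low-leakage high-VT setting.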

 VT assignment granularity
 Gate based assignment
 Pull up network / Pull down network based assignment
 Single VT in P pull up or N pull down trees
 Stack based assignment
 Single VT in series connected transistors
 Individually assignment within transistor stacks
 Possible area penalty

 Examples
[Figure: gate-based, PU/PD-based, and stack-based VT assignment]

 Gate sizing after VT assignment
 Resizing necessary after VT assignment to improve obtained
trade-off
 Transistors changed to low VT become oversized
 Input capacitance increases with low VT assignment

 Simultaneous VT and sizing approach
 Determine the size (width) and threshold voltage for each
transistor
 Area
 Performance
 Leakage current
 Both the performance of the circuit and its leakage vary non-
linearly with device widths and their VT
 The width domain is continuous while the VT domain is discrete
 Heuristic approach

 Sensitivity based VT selection
 In each iteration, pick a transistor with the best trade-off
between leakage and delay, weighted by its path slack
Delay change on timing arc α when transistor T is changed to low VT: Δdα(T)

Mixed-VT (MVT) CMOS Design
 Mixed-VT (MVT) CMOS design technique
 Transistor-level dual VT design technique
 Transistors within a gate can have different VT
 Use multiple types of transistors within each gate
 MVT1: Same threshold voltage for all transistors in N or P networks
 MVT2: Same threshold voltage only for all transistors of a series stack
 No limitation (possible in some processes)

 MVT CMOS Design Algorithm
 Assume all low VT transistors
 For each transistor of each gate,
 Find the increase in the gate delay if high VT is used (Δtd)
 Find the decrease in the gate leakage if high VT is used (Δleak)
 Calculate:
 Higher value means more leakage can be saved using one unit of
slack
 The transistors are processed based on their priority (i) values
 After modifying each transistor, the slack values have to be
recalculated
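
A minimal sketch of the priority-driven loop above. The priority expression is not reproduced in the text; the sketch assumes priority = Δleak / Δtd, which matches the statement that a higher value saves more leakage per unit of slack. It also simplifies timing by charging each transistor's delay increase to a single named path; the data and names are hypothetical.

```python
def mvt_assign(transistors, path_slack):
    """Move transistors from low VT to high VT in priority order.

    transistors: list of dicts with keys name, path, d_td (delay increase)
                 and d_leak (leakage decrease) when switched to high VT.
    path_slack:  dict mapping path name -> available timing slack.
    Returns (names switched to high VT, remaining slack per path).
    """
    slack = dict(path_slack)
    high_vt = []
    by_priority = sorted(transistors,
                         key=lambda t: t["d_leak"] / t["d_td"], reverse=True)
    for t in by_priority:
        if t["d_td"] <= slack[t["path"]]:
            high_vt.append(t["name"])
            slack[t["path"]] -= t["d_td"]   # slack is recalculated after each change
    return high_vt, slack


transistors = [
    {"name": "A", "path": "p", "d_td": 1.0, "d_leak": 10.0},
    {"name": "B", "path": "p", "d_td": 1.0, "d_leak": 4.0},
    {"name": "C", "path": "q", "d_td": 0.5, "d_leak": 1.0},
]
chosen, remaining = mvt_assign(transistors, {"p": 1.5, "q": 0.25})
print(chosen, remaining)
```

A full implementation would rerun static timing analysis after each change, since one transistor usually lies on many paths.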

 MVT design results

IVC (Input Vector Control)
 The idea is based on the transistor stack effect
[Figure: input states with the least subthreshold leakage, the least gate leakage, and the largest gate leakage]

 Subthreshold leakage and gate leakage currents are
dependent on the input vectors
[Figure: a one-input gate driven by X0 and a two-input gate driven by X0 and X1]
Input vector (X0)   Leakage (nA)
0                   100.3
1                   227.2

Input vector (X0X1)   Leakage (nA)
00                    37.84
01                    100.30
10                    95.17
11                    454.50
Cadence Spectre simulation, 0.18 µm technology

 Example of IVC (32-Bit Full Adder)
 Leakage current varies by 30-40%, depending on the input
vector
[Figure: distribution of the standby leakage current in the 32-bit adder (random input vectors)]

 Implementation of IVC
 Concept
 IVC During the Sleep Mode
 Providing the minimum leakage vector (MLV) to the target logics during
the sleep (or standby) mode
[Figure: a multiplexer controlled by Sleep selects the primary input vector (0) or the MLV (1) as the input of the target logic]

 Implementation of IVC
 Modification of scan-chain registers
 The original MLV is stored in the left FFs
 Sleep mode
 Sleep = 1
 Test = 1
 The MLV is applied (right FFs)
 Operational mode
 Sleep = 0
 Test = 0
 Inputs are directly applied to the target logic

 Main Advantages of the IVC
 Reduction is not as high as the one achieved by the power gating
method
 However, IVC technique does not suffer from the implementation
overheads
 No modification in the process technology
 No change in internal logic gates of the circuit
 No reduction in voltage swing
 Technology scaling does not have a negative effect (even stronger
effect with technology scaling as DIBL worsens)
 From 10% to 55% reduction in the leakage is expected

 Modifying the internal logic gates for further leakage
reduction
 Due to logic dependencies of the internal signals, driving a circuit
with its MLV does not guarantee that every internal gate reaches its lowest-leakage state
 Increase controllability in the standby mode
[Figure: replacing an internal signal line with a two-input AND gate; modifying a CMOS gate]

 IVC solution methods
 Finding MLV (minimum leakage vector)
 NP-complete
 Finding the MLV
 Random search
 Heuristic Algorithms
 Genetic algorithm, greedy algorithm (leakage observability for each
primary input, Node controllability)
 Formulations of existing problems
 Pseudo-Boolean satisfiability (SAT) problem, Integer linear programming

 Percentage of leakage reduction using an IVC technique
[Figure: leakage reduction (%) achieved with IVC]
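
For small circuits the MLV can be found by exhaustive search, which also shows why the problem becomes hard: the number of candidate vectors doubles with every primary input. The sketch below uses the two-input leakage values listed earlier in this section and assumes they belong to a NAND2 gate; the two-gate circuit is hypothetical.

```python
from itertools import product

# Leakage (nA) per input state of a two-input gate, from the values listed earlier.
# Treating the gate as a NAND2 is an assumption of this sketch.
NAND2_LEAKAGE_NA = {(0, 0): 37.84, (0, 1): 100.30, (1, 0): 95.17, (1, 1): 454.50}


def nand(a, b):
    return 1 - (a & b)


def circuit_leakage(a, b, c):
    """Hypothetical circuit: n1 = NAND(a, b); out = NAND(n1, c)."""
    n1 = nand(a, b)
    return NAND2_LEAKAGE_NA[(a, b)] + NAND2_LEAKAGE_NA[(n1, c)]


def find_mlv(leakage_fn, num_inputs):
    """Try all 2**num_inputs vectors and return the one with minimum leakage."""
    return min(product((0, 1), repeat=num_inputs),
               key=lambda v: leakage_fn(*v))


mlv = find_mlv(circuit_leakage, 3)
mlv_leakage = circuit_leakage(*mlv)
print(mlv, mlv_leakage)
```

The second gate never sees its own best input state (0, 0) at the minimum, because that would force the first gate into its worst state; this is the logic dependency that motivates modifying internal gates.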

Combining VT and IVC
 Given a known input state in standby mode, only off
transistors set to high VT
 All other transistors are kept at low VT
[Figure: bar chart comparing all low VT, dual VT, and dual VT with known input state]

MTCMOS: sleep transistor insertion
 Basic concept
 Multi-threshold CMOS: sleep transistor insertion
 To use both high-VT and low-VT cells in a logic block
 Based on the observation that a circuit’s overall performance is often
determined by a few critical paths
 Transistors and gates along the critical paths are set to a low-VT
 Transistor size is fixed
 Overall circuit performance can be enhanced significantly
 Leakage is kept within bounds
 Operating frequency of a logic block is limited by the maximum path
delay

MTCMOS: Sleep Transistor Insertion
 Multi-threshold CMOS: sleep transistor insertion
 Uses both high- and low-threshold voltage MOSFETs
 Active mode: SL is set to high/sleep mode: SL is set to low
 The “on” resistance of sleep transistors is small
 Some designs only use either header or footer
 Cell-based MTCMOS: area penalty / easy to design
 Block-based MTCMOS: area efficiency / hard to design

 Sleep transistor
 Also called guarding, power gating, ground gating, using sleep
transistor, etc.
 A sleep transistor is inserted between VDD and the logic, and between the logic and GND

 A single sleep transistor can be used
 Mostly NMOS is used due to its higher mobility
 However, PMOS usually has less leakage
 Shared sleep transistor
 Small number of sleep transistors
 Less area overhead, dynamic power and leakage power

 Active-mode operation

 Idle-mode operation

 Principle of operation
 A low-VT block is gated with high-VT power switches that are
controlled by the SLEEP signal
 When SLEEP=1
 The high-VT transistor is turned on
 The low-VT logic gates are connected to the virtual GND and power
 The sleep transistor can be modeled as a resistor R
 The voltage drop across it is Vx = IR

 Proper sleep transistor sizing is a key element to efficient
MTCMOS design
 Critical path input vector is highly dependent on the internal
discharge pattern
 Critical path depends on the sleep transistor size
 Exponential complexity
 Many techniques that can complete the computation in reasonable
amount of time

 Principle of operation
 Voltage drop: Vx = IR
 Reduces the gate driving capability from VDD to VDD - Vx
 Increases the threshold voltage of the low-VT pull-down devices due to the body effect
 Both effects degrade the speed of the circuit
 Low R is desirable
 A larger sleep transistor is required
 At the expense of extra area and power
 DSM circuits use low VDD
 Even larger sleep transistors are required

 The worst case
 Low-VT blocks switch at the same time
 I = I1 + I2 + I3
 The best case
 Low-VT blocks switch exclusively (no time overlap)
 I = max(I1, I2, I3)
 In general
 Partially overlap

 Average current method
 Estimation of the optimum size of the sleep transistor
 If the average current flowing through the sleep transistor and the maximum speed penalty of the MTCMOS block are known, the minimum size of the sleep transistor can be estimated
 Assumes that the current consumed in the MTCMOS block is constant, so that the voltage drop across the sleep transistor is also constant
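
The sizing argument above can be sketched with Vx = IR. The sketch bounds the current with the worst case (sum) and best case (max) from the slides, derives the largest tolerable resistance for a chosen virtual-GND drop, and converts it to a W/L ratio. The conversion assumes the sleep transistor operates in the linear region with R ≈ 1 / (µnCox · (W/L) · (VDD − VT,high)); that model and every numeric value are assumptions, not data from these slides.

```python
def worst_case_current(block_currents):
    """All low-VT blocks discharge at the same time."""
    return sum(block_currents)


def best_case_current(block_currents):
    """Blocks discharge one at a time, with no overlap."""
    return max(block_currents)


def max_sleep_resistance(current, vx_max):
    """Largest sleep-transistor resistance that keeps Vx = I * R below vx_max."""
    return vx_max / current


def min_w_over_l(r_max, mu_cox, vdd, vt_high):
    """Smallest W/L giving an on-resistance of at most r_max (linear-region model)."""
    return 1.0 / (r_max * mu_cox * (vdd - vt_high))


currents = [1e-3, 2e-3, 3e-3]                      # amperes, hypothetical
r_worst = max_sleep_resistance(worst_case_current(currents), vx_max=0.05)
r_best = max_sleep_resistance(best_case_current(currents), vx_max=0.05)
wl_worst = min_w_over_l(r_worst, mu_cox=300e-6, vdd=1.2, vt_high=0.5)
wl_best = min_w_over_l(r_best, mu_cox=300e-6, vdd=1.2, vt_high=0.5)
print(r_worst, r_best, wl_worst, wl_best)
```

The gap between the two W/L values is the area that can be saved when the discharge patterns of the blocks are known to be mutually exclusive.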

 MTCMOS delay calculation
 Switch-level simulation for when N gates shares the same virtual
GND and discharge simultaneously
 Saturation current
 Virtual GND potential
 Gain factor for MOSFET j

 Drawback of the switch-level simulation
 Modeling the discharge current purely with saturation current is
incorrect
 Impact of parasitic capacitance on the virtual GND is not
considered
 Complicated gates are modeled as a simple inverter
 Internal node glitch is not considered
 Velocity saturation and body effect are not considered

 Samsung’s sleep transistor insertion
 Use of conventional P&R

 Example of the sleep transistor placement

 Design flow

 Sleep transistor merging
 Individual sleep transistor sizing
 Merging gates based on mutually exclusive discharging

 Sleep transistor merging
 Merging through parallel combination
 Combining two sub-circuits with very similar virtual GND transient

behaviors results in unchanged virtual GND characteristics
 In general, equivalent resistance

MTCMOS Limitations
 Area overhead
 Sleep transistors
 Performance overhead
 Slow operation when the circuit is active
 Wake up delay
 Process modification for dual VT

MTCMOS Limitations
 Electro-migration effect on vias and wires
 The number of vias is determined based on the average and the
maximum allowable currents for each via
 MTCMOS cannot be easily applied to flip-flops
 The FF state is lost during sleep

MTCMOS Limitations
 MTCMOS circuit is not easily interfaced with non MTCMOS
circuits
 Short circuit current occurs if an MTCMOS circuit drives non
MTCMOS circuits
 Ground bounce
 During the sleep period, internal nodes in the MTCMOS block are
charged to VDD
 When the sleep transistor is turned on, a current spike flows to
GND due to large VDS
 Ground bounce occurs

MTCMOS Limitations
 Impact on virtual GND capacitance
 Wire and junction capacitance actually helps to reduce the
ground bounce, acting like bypass capacitors
 However, the capacitance is not large enough to compensate for poor sleep transistor sizing
 If the capacitance is large enough, it also takes a long time for the virtual GND to discharge back to GND at wake-up
 Instead of adding a large capacitance, reducing the effective resistance (appropriate sleep transistor sizing) is desirable

MTCMOS Limitations
 Reverse conduction paths through virtual GND
 Current flows from the virtual GND through the low-VT pull-down device and charges up the load capacitance
 The output voltage rises up to Vx
 The circuit becomes faster
 However, the noise margin becomes worse
 Proper sleep transistor sizing is required for an adequate noise margin

Zig Zag Sleep Transistors
 Reduce the wake up overhead

Sleepy Stack Sleep Transistor
 Combination of the stack and sleep transistor
 State preserving

Sleepy Stack Sleep Transistor
 Active and idle mode operations

MTCMOS Sequential Circuits
 MTCMOS latch
 Use of always on gates

MTCMOS Sequential Circuits
 No leakage MTCMOS FF
