
Synthesis for Low Power: A New VLSI Design Paradigm for DSM

Ajit Pal

Professor
Department of Computer Science and Engineering
Indian Institute of Technology Kharagpur
INDIA -721302
Outline
• Why Low Power?
• Sources of power dissipation
– Dynamic Power
– Static Power
• Degrees of Freedom
• Low Power Techniques
– Supply Voltage Scaling Techniques
– Minimizing switching capacitances
– Leakage Power reduction Techniques
Ajit Pal IIT Kharagpur
Why Low-power?
• Until recently, performance was synonymous with circuit speed or processing power, e.g. MIPS or MFLOPS.
• Implementation involved an area-time tradeoff. Power consumption was estimated as P = k·A·f, where k = 0.063 W/(cm²·MHz), A is the area in cm² and f is the frequency in MHz.
• Power consumption was of secondary concern.

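As a rough illustration of the area-time estimate P = k·A·f, the Python sketch below evaluates it for a hypothetical 1 cm² die clocked at 100 MHz. Only k comes from the slide; the function name and the example area and frequency are assumptions.

```python
# Area-time power estimate from the slide: P = k * A * f.
# Only k is taken from the slide; the die size and clock below are
# hypothetical example values.
K_W_PER_CM2_MHZ = 0.063  # W / (cm^2 * MHz)


def area_time_power(area_cm2: float, freq_mhz: float,
                    k: float = K_W_PER_CM2_MHZ) -> float:
    """Estimated power in watts for a die of area_cm2 clocked at freq_mhz."""
    return k * area_cm2 * freq_mhz


if __name__ == "__main__":
    # Hypothetical 1 cm^2 die at 100 MHz.
    print(f"{area_time_power(1.0, 100.0):.2f} W")  # 6.30 W
```

The model is linear in both area and frequency. Doubling either one doubles the estimate, which is why area and speed were treated as the only levers under this model.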


Why Low-power?
• Contemporary high-performance processors consume a great deal of power.
• The cost associated with packaging and cooling such devices is prohibitive.
• A low-power methodology is needed to reduce the cost of packaging and cooling.

Processor Power
Processor       Clock (MHz)   Technology (μm)   Vdd (V)   Peak Power (W)
Ultra Sparc     167           0.45              3.3       30
Intel Pentium   200           0.50              3.3       26
Alpha 21064     200           0.50              3.3       30
Alpha 21164     300           0.45              3.3       50
Alpha 21264     667           0.35              2.0       72
Alpha 21364     1000          0.25              1.5       100




Why Low-power?
• Portable computing and communication equipment, such as laptops, palmtops and cell phones, has emerged, and its growth rate is very high.
• As these devices are battery operated, battery life is of primary concern. Unfortunately, battery technology has not kept up with the energy requirements of portable equipment.
• The commercial success of these products depends on weight, cost and battery life.
• A low-power design methodology is therefore very important to make them commercially viable.
Why Low-power?
• Reliability is closely related to power dissipation: every 10°C rise in temperature roughly doubles the failure rate.

Figure: Onset temperatures of various failure mechanisms, in °C above normal operating temperature (0 to 300): thermal runaway, gate dielectric, junction diffusion, electromigration diffusion, electrical parameter shift, package-related failure and silicon interconnect fatigue.

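The 10°C rule of thumb compounds exponentially. The following is a minimal sketch, assuming the doubling interval is exactly 10°C. The function name is hypothetical.

```python
def failure_rate_factor(delta_t_c: float, doubling_interval_c: float = 10.0) -> float:
    """Relative failure rate after a temperature rise of delta_t_c (deg C),
    assuming the rate doubles for every doubling_interval_c degrees."""
    return 2.0 ** (delta_t_c / doubling_interval_c)


for rise in (10, 20, 30):
    print(f"+{rise} C -> {failure_rate_factor(rise):.0f}x failure rate")
```

A 30°C rise therefore means roughly eight times the failure rate, which is why cooling and low power are tied so closely to reliability.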

Why Low-power?
• According to an estimate by the U.S. Environmental Protection Agency (EPA), 80% of the power consumed by office equipment is due to computing equipment, and a large part of that is consumed by unused equipment.
• Power is dissipated mostly in the form of heat, and cooling systems such as air conditioning transfer this heat to the environment.
• To reduce the adverse effect on the environment, efforts such as the EPA's Energy Star program, which led to power-management standards for desktops and laptops, have emerged.
• There is a drive towards the Green PC.
Sources of Power Dissipation

• CMOS has emerged as the technology of choice for low-power applications and is likely to remain so in the near future.
• The sources of power dissipation in CMOS circuits are:
– Dynamic power
– Static power


Dynamic Power Dissipation
• Dynamic power
– Switching power
– Short-circuit power
– Glitching power
• Static power
– Due to reverse-biased junction diode currents when the transistors are off
– Due to subthreshold leakage current


Dynamic Power Dissipation
• Switching power
– Due to the charging and discharging of load and parasitic capacitors:
$$P_{dynamic} = \alpha_L C_L V_{DD}^2 f + \sum_i \alpha_i C_i V_{DD}(V_{DD} - V_T) f$$
where α_L and C_L are the switching activity and capacitance of the output node, and α_i and C_i are those of internal node i, whose voltage swing is limited to V_DD − V_T.

Figure: Charging current flows from V_DD through the pull-up network into C_L (pull-up ON, pull-down OFF); discharging current flows from C_L through the pull-down network (pull-up OFF, pull-down ON).
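The sketch below evaluates the switching-power expression term by term. It assumes that the clock frequency f multiplies both the output-node term and the internal-node terms. The function name and all numeric values (a 50 fF load, 1.8 V supply, 0.4 V threshold, 100 MHz clock, two 5 fF internal nodes) are illustrative assumptions, not data from the slides.

```python
from typing import Iterable, Tuple


def dynamic_power(alpha_l: float, c_l: float, vdd: float, vt: float, f: float,
                  internal_nodes: Iterable[Tuple[float, float]] = ()) -> float:
    """Switching power in watts.

    Output node: alpha_L * C_L * Vdd^2 * f (full-swing transitions).
    Internal nodes: alpha_i * C_i * Vdd * (Vdd - Vt) * f, because their
    swing is limited to Vdd - Vt.
    """
    power = alpha_l * c_l * vdd ** 2 * f
    for alpha_i, c_i in internal_nodes:
        power += alpha_i * c_i * vdd * (vdd - vt) * f
    return power


# Hypothetical gate: 50 fF load with activity 0.1, two 5 fF internal nodes,
# 1.8 V supply, 0.4 V threshold, 100 MHz clock.
p = dynamic_power(0.1, 50e-15, 1.8, 0.4, 100e6,
                  internal_nodes=[(0.1, 5e-15), (0.05, 5e-15)])
print(f"P_dynamic = {p * 1e6:.3f} uW")
```

With these numbers, the output node dominates and the internal nodes add roughly 10%. The quadratic V_DD dependence of the first term is what makes supply scaling so effective.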
Switching Power
• During the transition of the output from 0 to V_dd, the energy drawn from the power supply is
$$E_{0\to1} = \int p(t)\,dt = \int V_{dd}\, i(t)\,dt, \qquad i(t) = C_L\frac{dV_0}{dt}$$
• Substituting for i(t), we get
$$E_{0\to1} = V_{dd}\int_0^{V_{dd}} C_L\, dV_0 = C_L V_{dd}^2$$
• If a square wave of repetition frequency f = 1/T is applied at the input, the power dissipated per unit time is
$$P_d = \frac{1}{T} C_L V_{dd}^2 = C_L V_{dd}^2 f$$
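The result E = C_L·V_dd² does not depend on the resistance of the pull-up path. The sketch below checks this numerically by integrating V_dd·i(t) for an RC charging current, using two different series resistances. The resistance, load and supply values are hypothetical, and a simple midpoint rule stands in for the exact integral. Half of this energy ends up stored on C_L; the other half is dissipated in the pull-up network.

```python
import math


def supply_energy_0_to_1(c_l: float, vdd: float, r: float,
                         steps: int = 200_000, n_tau: float = 20.0) -> float:
    """Integrate E = integral of Vdd * i(t) dt while C_L charges from 0 to Vdd
    through a series resistance r (midpoint rule over n_tau time constants)."""
    tau = r * c_l
    dt = n_tau * tau / steps
    energy = 0.0
    for k in range(steps):
        t = (k + 0.5) * dt
        i = (vdd / r) * math.exp(-t / tau)  # i(t) = C_L * dV0/dt for an RC charge
        energy += vdd * i * dt
    return energy


C_L, VDD, F = 100e-15, 1.2, 200e6  # hypothetical load, supply and clock
for r in (1e3, 10e3):              # two different pull-up resistances
    e = supply_energy_0_to_1(C_L, VDD, r)
    print(f"R = {r:8.0f} ohm: E = {e:.4e} J (C_L*Vdd^2 = {C_L * VDD**2:.4e} J)")
print(f"P_d = C_L*Vdd^2*f = {C_L * VDD**2 * F * 1e6:.1f} uW")
```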
Dynamic Power Dissipation (contd.)
• Short-circuit power dissipation
– When the input changes slowly, the nMOS and pMOS transistors conduct simultaneously for part of the transition, so power is dissipated even when there is no load or parasitic capacitance. The resulting direct current from V_DD to ground is known as the short-circuit current:
$$I_{sc} = \frac{1}{12}\cdot\frac{k\tau f}{V_{DD}}\cdot(V_{DD} - 2V_T)^3$$
– Note that short-circuit power dissipation is strongly affected by supply-voltage scaling and is also proportional to the frequency f and to the rise/fall time τ of the input signal.
Short Circuit Power Dissipation


Short Circuit Power Dissipation
$$I_{mean} = \frac{2}{T}\left[\int_{t_1}^{t_2} i(t)\,dt + \int_{t_2}^{t_3} i(t)\,dt\right]$$
Because of symmetry we may write
$$I_{mean} = \frac{4}{T}\int_{t_1}^{t_2} i(t)\,dt$$
Since the nMOS transistor operates in the saturation region,
$$I_{mean} = \frac{4}{T}\int_{t_1}^{t_2} \frac{\beta}{2}\left(V_{in}(t) - V_t\right)^2 dt$$


Short Circuit Power Dissipation
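With a linear input ramp V_in(t) = (V_dd/τ)t, the nMOS transistor turns on at t_1 = (V_t/V_dd)τ and the current peaks at t_2 = τ/2, so
$$I_{mean} = \frac{2\beta}{T}\int_{\frac{V_t}{V_{dd}}\tau}^{\tau/2}\left(\frac{V_{dd}}{\tau}t - V_t\right)^2 dt$$
This results in
$$I_{mean} = \frac{\beta}{12V_{dd}}(V_{dd} - 2V_t)^3\frac{\tau}{T}$$
The short-circuit power is given by
$$P_{sc} = V_{dd} I_{mean} = \frac{\beta}{12}(V_{dd} - 2V_t)^3\,\tau f$$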
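The sketch below checks the step from the integral to the closed form. It integrates the saturation current numerically and compares the result with β(V_dd − 2V_t)³τ/(12V_dd·T). It assumes matched transistors (equal β and |V_t| for nMOS and pMOS) and a linear input ramp. The parameter values are hypothetical.

```python
def i_mean_numeric(beta, vdd, vt, tau, period, steps=100_000):
    """(4/T) * integral from t1 to t2 of (beta/2)(Vin(t) - Vt)^2 dt,
    with the linear ramp Vin(t) = Vdd*t/tau, t1 = Vt*tau/Vdd, t2 = tau/2."""
    if vdd <= 2 * vt:
        return 0.0  # nMOS and pMOS are never on at the same time
    t1, t2 = vt * tau / vdd, tau / 2
    dt = (t2 - t1) / steps
    total = 0.0
    for k in range(steps):
        t = t1 + (k + 0.5) * dt  # midpoint rule
        total += 0.5 * beta * (vdd * t / tau - vt) ** 2 * dt
    return 4.0 * total / period


def i_mean_closed(beta, vdd, vt, tau, period):
    """Closed form: beta/(12 Vdd) * (Vdd - 2Vt)^3 * tau/T."""
    if vdd <= 2 * vt:
        return 0.0
    return beta / (12 * vdd) * (vdd - 2 * vt) ** 3 * tau / period


def p_short_circuit(beta, vdd, vt, tau, f):
    """P_sc = Vdd * I_mean = beta/12 * (Vdd - 2Vt)^3 * tau * f."""
    return vdd * i_mean_closed(beta, vdd, vt, tau, 1.0 / f)


# Hypothetical values: beta = 100 uA/V^2, Vdd = 1.8 V, Vt = 0.4 V,
# input rise time 100 ps, clock 100 MHz (T = 10 ns).
args = dict(beta=100e-6, vdd=1.8, vt=0.4, tau=100e-12)
print(i_mean_numeric(period=10e-9, **args), i_mean_closed(period=10e-9, **args))
print(f"P_sc = {p_short_circuit(f=100e6, **args) * 1e9:.2f} nW")
```

The guard for V_dd ≤ 2V_t reflects the same model: the two transistors can only conduct together while V_t < V_in < V_dd − V_t.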
Glitching Power

Figure: Output waveform showing a glitch at output O2.


Leakage Current Mechanisms of Deep-Submicrometer Transistors


Static Power Dissipation

• I1 = Reverse-bias p-n junction diode leakage current
• I2 = Band-to-band tunneling current
• I3 = Subthreshold leakage current
• I4 = Gate-oxide tunneling current
• I5 = Gate current due to hot-carrier injection
• I6 = Channel punch-through current
• I7 = Gate-induced drain leakage current


Reverse Biased Leakage

Figure: Cross-section of a CMOS inverter (p-type substrate with an n-well) with the nMOS ON and the pMOS OFF (V_out = "0"), showing the drain leakage and the reverse leakage current.


p-n Junction Reverse-Biased Current

Figure: nMOS inverter and its physical structure.


The current for one diode is given by
$$I_{rdlc} = A J_s\left(e^{\frac{qV_d}{nkT}} - 1\right)$$
where A is the junction area, J_s is the reverse saturation current density, V_d is the diode voltage, n is the emission coefficient of the diode (often taken as 1), q is the electronic charge (1.602 × 10^-19 C), k is Boltzmann's constant (1.38 × 10^-23 J/K) and T is the absolute temperature in K.
The total static power dissipation due to diode leakage current for one million transistors is then
$$P = V_{dd}\sum_{i=1}^{10^6} I_{di} \approx 0.01\ \mu W$$
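The following is a small sketch of the diode equation and the summed leakage power. The junction area (1 μm²) and saturation current density (10^-2 A/m²) are hypothetical values, chosen only so that one million junctions give the ≈0.01 μW order of magnitude quoted above. Under reverse bias the exponential term vanishes and each current saturates at −A·J_s.

```python
import math

Q = 1.602e-19   # electronic charge (C)
K_B = 1.38e-23  # Boltzmann constant (J/K)


def diode_current(area_m2: float, js_a_per_m2: float, vd: float,
                  temp_k: float = 300.0, n: float = 1.0) -> float:
    """I = A * Js * (exp(q*Vd / (n*k*T)) - 1); negative under reverse bias."""
    return area_m2 * js_a_per_m2 * (math.exp(Q * vd / (n * K_B * temp_k)) - 1.0)


def diode_leakage_power(vdd: float, currents) -> float:
    """P = Vdd * sum of |I_di| over all reverse-biased junctions."""
    return vdd * sum(abs(i) for i in currents)


# Hypothetical junction: 1 um^2 area, Js = 1e-2 A/m^2, reverse-biased by 1 V.
i_rev = diode_current(1e-12, 1e-2, -1.0)
print(f"I_reverse = {i_rev:.3e} A")  # close to -A*Js
print(f"P (1e6 junctions) = {diode_leakage_power(1.0, [i_rev] * 10**6):.3e} W")
```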


Band-to-Band Tunneling Current

A high electric field across a reverse-biased p-n junction causes a significant current to flow by band-to-band tunneling. This BTBT current dominates the p-n junction leakage current.
Band-to-Band Tunneling Current

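The tunneling current density is given by
$$J_{b-b} = A\,\frac{E\,V_{app}}{E_g^{1/2}}\exp\left(-B\,\frac{E_g^{3/2}}{E}\right)$$
where
$$A = \frac{\sqrt{2m^*}\,q^3}{4\pi^3\hbar^2}, \qquad B = \frac{4\sqrt{2m^*}}{3q\hbar}$$
E is the electric field at the junction, V_app is the applied reverse bias, E_g is the band gap, m* is the effective mass of the electron, q is the electronic charge and ħ is the reduced Planck constant.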
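The sketch below evaluates J_b-b in SI units, with E_g converted to joules. The silicon band gap (1.12 eV) and the effective mass m* = 0.1 m_0 are assumptions chosen for illustration. The field and bias values are hypothetical. The point of the sketch is the strong, exponential dependence on the junction field E.

```python
import math

Q = 1.602e-19      # electronic charge (C)
HBAR = 1.0546e-34  # reduced Planck constant (J*s)
M0 = 9.109e-31     # free electron mass (kg)


def btbt_current_density(e_field: float, v_app: float,
                         eg_ev: float = 1.12, m_eff: float = 0.1) -> float:
    """J_b-b in A/m^2 for junction field e_field (V/m) and reverse bias v_app (V).
    eg_ev: band gap in eV; m_eff: effective mass as a fraction of M0."""
    m_star = m_eff * M0
    eg = eg_ev * Q  # band gap in joules
    a = math.sqrt(2 * m_star) * Q ** 3 / (4 * math.pi ** 3 * HBAR ** 2)
    b = 4 * math.sqrt(2 * m_star) / (3 * Q * HBAR)
    return a * e_field * v_app / math.sqrt(eg) * math.exp(-b * eg ** 1.5 / e_field)


for e in (1e8, 2e8, 5e8):  # hypothetical junction fields in V/m
    print(f"E = {e:.0e} V/m -> J = {btbt_current_density(e, 1.0):.3e} A/m^2")
```

Because the field appears in the denominator of the exponent, a modest increase in junction doping (and hence field) raises BTBT leakage by orders of magnitude.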


Sub-threshold Leakage Current
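• Static power is also due to subthreshold leakage current:
$$I_{sub} = A\,e^{\frac{q}{n'kT}\left(V_G - V_S - V_{th0} - \delta' V_S + \eta V_{DS}\right)}\left(1 - e^{\frac{-qV_{DS}}{kT}}\right)$$
where V_th0 is the zero-bias threshold voltage, δ' is the linearized body-effect coefficient, η is the DIBL coefficient and n' is the subthreshold swing coefficient.
• This current increases drastically with temperature.
• It also increases as the threshold voltage is scaled down along with the power-supply voltage for better performance.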
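The following sketch implements the subthreshold expression above. The values for A, n', δ' and η are hypothetical. The prefactor A is held constant here for simplicity (in detailed models it also grows with temperature), so the sketch shows only the exponential terms. The current rises with temperature for an OFF device, and it grows by a factor e^{ΔV/(n'kT/q)} when V_th0 is lowered by ΔV.

```python
import math

K_B = 1.380649e-23   # Boltzmann constant (J/K)
Q = 1.602176634e-19  # electronic charge (C)


def i_sub(vg, vs, vds, vth0, temp_k=300.0, a=1e-6,
          n_prime=1.5, delta_prime=0.1, eta=0.05):
    """Subthreshold current (A):
    A * exp(q(VG - VS - Vth0 - delta'*VS + eta*VDS)/(n'kT)) * (1 - exp(-q*VDS/kT)).
    a, n_prime, delta_prime and eta are hypothetical fitting parameters."""
    v_thermal = K_B * temp_k / Q  # kT/q
    exponent = (vg - vs - vth0 - delta_prime * vs + eta * vds) / (n_prime * v_thermal)
    return a * math.exp(exponent) * (1.0 - math.exp(-vds / v_thermal))


# An OFF nMOS (VG = VS = 0) with VDS = 1 V.
for temp in (300.0, 350.0, 400.0):
    print(f"T = {temp:.0f} K: Isub = {i_sub(0.0, 0.0, 1.0, 0.3, temp_k=temp):.3e} A")
print("Vth0 0.3 V -> 0.2 V, leakage ratio:",
      i_sub(0.0, 0.0, 1.0, 0.2) / i_sub(0.0, 0.0, 1.0, 0.3))
```

With n' = 1.5 at 300 K, lowering V_th0 by 100 mV raises the leakage by more than an order of magnitude in this model. This is the tradeoff that limits threshold-voltage scaling.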


Subthreshold Leakage Current

• Various mechanisms that affect the subthreshold leakage current are:
– Drain-induced barrier lowering (DIBL)
– Body effect
– Narrow-width effect
– Effect of channel length and Vth roll-off
– Effect of temperature


Contributions of Various Power Dissipations
• Switching power: 80% to 90%
• Leakage power: 10% to 30%
• Short-circuit power: 0% to 5%


Why Leakage Power is an Issue?
• In standby applications, the leakage component becomes a significant percentage of total power.
• Leakage power approaches 10% of total power in deep-submicron technology.

Figure: Active power and standby power (W, log scale from 10^-6 to 10^2) versus technology generation (1.0 to 0.18 μm).

• Leakage power is becoming a large component of total power dissipation.
Degrees of freedom
• There are three degrees of freedom inherent in the low-power design space:
– Supply voltage
– Physical capacitance
– Switching activity
• Optimizing power consumption invariably involves reducing one or more of these three parameters.
• The supply voltage has the most dominant effect on power dissipation because of its quadratic relationship with power.
Low-Power Design Methodology
• Low-power design methodologies are to be applied throughout the design process, from system level to layout level, gradually refining or detailing the abstract specification or model of the design.
• Starting with the system specification, the following steps are performed to get the layout:
– System specification => System-level design
– Behavioral description => High-level synthesis
– Structural RTL description => Logic synthesis
– Logic-level netlist => Layout synthesis => Layout
Power Reduction at Different Levels

• Power optimization approaches at the high level are significant, since research results indicate that higher levels of abstraction have greater potential for power reduction.


Supply Voltage Scaling

• Device feature size scaling
• Architectural-level approaches
– Parallelism
– Pipelining
• Voltage scaling using high-level transformations
• Dynamic voltage scaling


Supply Voltage Scaling for Low Power
• A factor-of-two reduction in supply voltage yields a factor-of-four decrease in energy.
• The theoretical lower limit of the supply voltage for CMOS circuits is 0.2 V.
• Unfortunately, as the supply voltage is lowered, delay increases, leading to a drastic reduction in performance.

Figure: Normalized energy and normalized delay (0.0 to 1.0) versus V_dd (0 to 5 V).
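The energy/delay tradeoff can be sketched with E ∝ V_dd² and the delay model t_d ∝ V_dd/(V_dd − V_t)^α. The delay model (the alpha-power law with α = 2, i.e. long-channel square-law devices) and V_t = 0.5 V are assumptions, not values from the slide. The function names are hypothetical.

```python
def normalized_energy(vdd: float, vdd_ref: float) -> float:
    """Energy per transition is proportional to Vdd^2."""
    return (vdd / vdd_ref) ** 2


def normalized_delay(vdd: float, vdd_ref: float, vt: float = 0.5,
                     alpha: float = 2.0) -> float:
    """Delay ~ Vdd / (Vdd - Vt)^alpha (alpha-power law; alpha = 2 is the square law)."""
    if min(vdd, vdd_ref) <= vt:
        raise ValueError("supply voltage must exceed the threshold voltage")

    def delay(v: float) -> float:
        return v / (v - vt) ** alpha

    return delay(vdd) / delay(vdd_ref)


for v in (5.0, 3.3, 2.5, 1.5, 1.0):
    print(f"Vdd = {v:.1f} V: energy x{normalized_energy(v, 5.0):.2f}, "
          f"delay x{normalized_delay(v, 5.0):.2f}")
```

Halving V_dd from 5 V to 2.5 V quarters the energy but, under this model, makes the circuit about 2.5 times slower. The delay blows up as V_dd approaches V_t.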
Device Feature Size Scaling
Figure: MOSFET cross-section (gate, oxide, source, drain, substrate) showing the scaled parameters, where S is the scaling factor:
1. t_ox' = t_ox / S
2. N_D' = N_D × S
3. L' = L / S
4. X_j' = X_j / S
5. W' = W / S
6. Substrate doping: N_A' = N_A × S


Constant Field Scaling



Constant Voltage Scaling

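The standard first-order factors for constant-field and constant-voltage scaling can be derived in a few lines. The sketch below assumes a long-channel square-law current I ∝ (W/L)·C_ox·V², gate delay t ∝ C·V/I and power P = I·V, with every dimension (W, L, t_ox) divided by S. Doping changes are not modeled. Under constant-field scaling the supply voltage is also divided by S; under constant-voltage scaling it is kept fixed. The function name is hypothetical.

```python
def scaling_factors(s: float, constant_field: bool) -> dict:
    """First-order factors by which device quantities change when all
    dimensions (W, L, t_ox) are divided by s.  The supply voltage is divided
    by s for constant-field scaling and left unchanged for constant-voltage."""
    dim = 1.0 / s
    v = 1.0 / s if constant_field else 1.0
    c_ox = 1.0 / dim                  # C_ox = eps_ox / t_ox
    c_gate = c_ox * dim * dim         # C_g = C_ox * W * L
    w_over_l = dim / dim              # W/L is unchanged
    current = w_over_l * c_ox * v ** 2  # I ~ (W/L) * C_ox * V^2 (square law)
    delay = c_gate * v / current      # t ~ C * V / I
    power = current * v               # P ~ I * V
    density = power / (dim * dim)     # P / (W * L)
    return {"voltage": v, "current": current, "gate_cap": c_gate,
            "delay": delay, "power": power, "power_density": density}


for mode in (True, False):
    name = "constant field" if mode else "constant voltage"
    print(name, scaling_factors(2.0, mode))
```

With S = 2, both schemes halve the gate capacitance. Constant-field scaling halves the delay and keeps power density unchanged. Constant-voltage scaling cuts the delay to a quarter but raises power density eightfold, which is why constant-voltage scaling could not continue.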


Architecture-Level Approaches:
Parallelism
• Parallel processing can be an important technique for reducing power consumption in CMOS circuits.
• The key approach is to trade area for power while maintaining the same throughput.
• In simple terms, if the supply voltage is reduced by half, the power is reduced to one-fourth and performance is lowered by half.
• The loss in performance can be compensated by parallel processing.
Architecture-Level Approaches:
Parallelism
• Example: Two 16-bit registers supply two operands, A and B, to an adder. The critical-path delay of the adder is 10 ns, so the operating frequency is f_ref = 100 MHz.

Figure: Reference datapath: latched 16-bit inputs A and B feed a 16-bit adder clocked at f_ref.

• The estimated dynamic power of the circuit is
$$P_{ref} = C_{ref} V_{ref}^2 f_{ref}$$


Architecture-Level Approaches:
Parallelism
• Here the adder has been duplicated, but the input registers are clocked at half the frequency, f_ref/2. This allows the supply voltage to be reduced such that the critical-path delay is not more than 20 ns.

Figure: Parallel datapath: two 16-bit adders with input latches for A and B clocked at f_ref/2, followed by a multiplexer and an output latch clocked at f_ref.

• The estimated dynamic power is
$$P_{par} = 2.2\,C_{ref}\left(\frac{V_{ref}}{2}\right)^2\frac{f_{ref}}{2}$$


Architecture-Level Approaches:
Pipelining
In this realization, an 8-bit addition, instead of a 16-bit addition, is performed in each stage. The critical-path delay through the 8-bit adder stage is about half that of the 16-bit adder stage. Therefore, the 8-bit adder will operate at a clock frequency of 100 MHz with a reduced supply voltage of V_ref/2.

Figure: Two-stage pipelined adder: the first stage adds a0-7 and b0-7 to give s0-7, the second stage adds a8-15 and b8-15 to give s8-15, with latches between the stages; the output s0-15 is produced at f_ref.

• The estimated power is
$$P_{pipe} = C_{pipe} V_{pipe}^2 f_{pipe} = (1.15\,C_{ref})\left(\frac{V_{ref}}{2}\right)^2 f_{ref} \approx 0.29\,P_{ref}$$


Architecture-Level Approaches:
Pipelining and Parallelism
• Here, more than one parallel structure is used and each structure is pipelined. Both the supply voltage and the frequency of operation are reduced to achieve a substantial overall reduction in power dissipation.

Figure: Two parallel structures, each a two-stage pipeline of 8-bit adders (a0-7 + b0-7, then a8-15 + b8-15, giving s0-15), with inputs latched at f_ref/2 and a multiplexer producing s0-15 at f_ref.

• The estimated power is
$$P_{parpipe} = (2.5\,C_{ref})(0.4\,V_{ref})^2\left(\frac{f_{ref}}{2}\right) = 0.2\,P_{ref}$$
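All three estimates follow P = C·V²·f with C, V and f expressed relative to the reference adder. The sketch below multiplies out the factors quoted on these slides: 2.2, 1.15 and 2.5 for capacitance; V_ref/2 and 0.4V_ref for voltage; f_ref/2 or f_ref for frequency. The function and dictionary names are hypothetical.

```python
def relative_power(c_ratio: float, v_ratio: float, f_ratio: float) -> float:
    """P / P_ref for P = C * V^2 * f, each factor given relative to the reference."""
    return c_ratio * v_ratio ** 2 * f_ratio


# (C/C_ref, V/V_ref, f/f_ref) as quoted for each architecture
architectures = {
    "reference": (1.0, 1.0, 1.0),
    "parallel": (2.2, 0.5, 0.5),
    "pipelined": (1.15, 0.5, 1.0),
    "parallel + pipelined": (2.5, 0.4, 0.5),
}
for name, factors in architectures.items():
    print(f"{name:22s} P/P_ref = {relative_power(*factors):.3f}")
```

The parallel and pipelined versions both land a little under 0.3 P_ref. Combining the two brings the estimate down to 0.2 P_ref, despite the largest capacitance overhead.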
Dynamic Voltage Scaling (DVS)
Figure: Processor voltage (0 to 2.5 V) versus CPU clock frequency (74 to 221), showing the destructive, operational and non-functional regions.
Figure: Normalized energy versus workload r (0.1 to 1) with no voltage scaling and with ideal DVS.
Figure: The DAG after unrolling and using distributivity and constant propagation.
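The idealized DVS curve can be approximated by assuming that the supply voltage scales in proportion to the required clock frequency. Under that assumption the energy per operation scales as r², while without voltage scaling it stays constant. The linear V-f relationship and the optional minimum-voltage floor (v_floor) are simplifying assumptions; real processors lose voltage headroom as V approaches V_t. The function name is hypothetical.

```python
def energy_per_operation(r: float, dvs: bool = True, v_floor: float = 0.0) -> float:
    """Normalized energy per operation at workload r (0 < r <= 1).

    Assumes the clock is scaled to f = r * f_max and, with DVS, the supply
    is scaled in proportion (V = r * V_max) but not below v_floor * V_max.
    Without DVS the supply stays at V_max, so energy per operation is 1."""
    if not 0.0 < r <= 1.0:
        raise ValueError("workload r must be in (0, 1]")
    if not dvs:
        return 1.0
    v_ratio = max(r, v_floor)
    return v_ratio ** 2


for r in (0.2, 0.5, 0.8, 1.0):
    print(f"r = {r:.1f}: fixed V -> {energy_per_operation(r, dvs=False):.2f}, "
          f"ideal DVS -> {energy_per_operation(r):.2f}")
```

At half load, ideal DVS needs only a quarter of the energy per operation. A voltage floor of 0.6 V_max would limit that saving to 0.36.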
Dynamic Voltage Scaling (DVS)
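Figure: Basic scheme of DVS: tasks λ1, λ2, ..., λn arrive in a task queue; a workload monitor estimates the workload r and sets the output V(r) of a DC/DC converter (supplied from V_fixed) and the clock frequency f(r) of the variable-voltage processor, whose service rate is μ(r).
Figure: DC/DC converter to be used in DVS: a comparator against a reference voltage drives pulse-width modulation, whose output passes through a low-pass filter to supply V_0 to the load R_L.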


Minimizing Switched Capacitance

• Hardware-software tradeoff
• Choice of a suitable data representation (bus encoding)
• Two's complement vs. sign magnitude
• Architectural optimization
• Logic styles


Hardware Software Tradeoff
• The same functionality can be realized in hardware, in software, or by a combination of both.
• Hardware-based approach:
– Faster
– Costlier
– Consumes more power
• Software-based approach:
– Cheaper
– Slower
– Consumes less power


Superscalar Versus VLIW Approach
Conventional superscalar out-of-order CPUs use hardware to create and dispatch micro-ops that can be executed in parallel.


Superscalar Versus VLIW Approach
In the VLIW approach, a compiler generates long instructions containing multiple operations meant for different functional units.


Transmeta’s Crusoe Processor

• A long instruction word, called a molecule, can be 64 or 128 bits long.
• A molecule can contain up to four RISC-like instructions, called atoms.
• All atoms in a molecule are executed in parallel.
• Molecules are executed in order.


Code Morphing Software

The Code Morphing software mediates between x86 software and the Crusoe processor.


Code Morphing Software
• It is fundamentally a dynamic translation system:
– a program that compiles instructions for one instruction set architecture (ISA) into instructions for another ISA
– here, x86 code is compiled into VLIW code
• Code Morphing software insulates x86 programs from the hardware engine's native instruction set:
– the native instruction set can be changed arbitrarily without affecting any x86 software at all
– only the Code Morphing software needs to be ported


Comparison of Heat Dissipation

• A Pentium III processor plays a DVD at 105°C.
• A Crusoe processor, model TM5400, plays a DVD at 48°C.


Bus Encoding
• Communicating data bits in an appropriately coded form can reduce the switching activity.
• Goals of coding:
– Remove undesired correlation among information bits, or
– Introduce controlled correlation
• Coding for reduced switching activity falls under the second category:
– Sample-to-sample correlation is introduced such that the total number of bit transitions is reduced


One Hot Coding
• Two chips are connected using m = 2^n wires, and an n-bit data word is encoded by placing '1' on the ith wire, where 0 ≤ i ≤ 2^n − 1 is the binary value corresponding to the bit pattern, and '0' on the remaining m − 1 wires.
• This guarantees precisely one 0→1 and one 1→0 bit transition when a different data word is sent.
• The number of wires required increases exponentially with the word size of the data.
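The following is a minimal sketch of one-hot bus coding, modelling the 2^n wires as the bits of a Python integer (wire i is bit i). The function names are hypothetical. The transition count confirms the two-transition property stated above.

```python
def one_hot_encode(value: int, n_bits: int) -> int:
    """Drive '1' on wire `value` and '0' on the other 2**n_bits - 1 wires."""
    if not 0 <= value < (1 << n_bits):
        raise ValueError("value does not fit in n_bits")
    return 1 << value


def one_hot_decode(wires: int) -> int:
    """Recover the n-bit value from the index of the single high wire."""
    if wires <= 0 or wires & (wires - 1):
        raise ValueError("exactly one wire must be high")
    return wires.bit_length() - 1


def bit_transitions(prev: int, curr: int) -> int:
    """Number of wires that toggle between two bus states."""
    return bin(prev ^ curr).count("1")


n = 4  # 4-bit data -> 16 wires
words = [3, 9, 9, 0, 15]
codes = [one_hot_encode(w, n) for w in words]
print([bit_transitions(a, b) for a, b in zip(codes, codes[1:])])  # [2, 0, 2, 2]
```

The price is visible in the wire count: 4-bit data already needs 16 wires, and 32-bit data would need 2^32.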


Gray Coding Vs Binary Coding
• A Gray code sequence is a set of numbers in which adjacent numbers differ in only one bit.


Gray Coding Vs Binary Coding
• It is useful when the data is sequential and highly correlated, as with instruction addresses.
• For sequential data, Gray coding gives exactly one bit transition per step, whereas binary coding averages close to two.
• For random data, the numbers of transitions for binary and Gray code are approximately equal.
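The sketch below converts a sequential 10-bit address stream to Gray code with g = b XOR (b >> 1) and counts the bit transitions between consecutive addresses in both codes. The stream length is an arbitrary choice, and the function names are hypothetical.

```python
def binary_to_gray(b: int) -> int:
    return b ^ (b >> 1)


def gray_to_binary(g: int) -> int:
    b = 0
    while g:  # b = g ^ (g >> 1) ^ (g >> 2) ^ ...
        b ^= g
        g >>= 1
    return b


def avg_transitions(words) -> float:
    """Average number of bit transitions between consecutive bus words."""
    total = sum(bin(a ^ b).count("1") for a, b in zip(words, words[1:]))
    return total / (len(words) - 1)


addresses = list(range(1024))  # sequential 10-bit address stream
print("binary:", avg_transitions(addresses))
print("gray:  ", avg_transitions([binary_to_gray(a) for a in addresses]))
```

For a long sequential run, binary counting approaches two transitions per address, while Gray code stays at exactly one.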


Comparison of the temporal activity

Figure: Bit transitions per instruction executed.
Temporal Transition Activity Comparison for Instruction Addresses
Bit transitions per instruction executed:
Benchmark     Binary coded   Gray coded
Chat          2.43           1.54
Browse        2.51           1.64
Boyer         2.76           2.09
Nand          2.42           1.57
Semigroup     2.68           1.99
Circuit       2.33           1.47
Reducer       2.57           1.71
Qsort         2.64           1.33
Fastqueens    2.46           1.03


Temporal Transition Activity Comparison for Data Addresses
Bit transitions per instruction executed:
Benchmark     Binary coded   Gray coded
Chat          1.32           1.2
Browse        1.32           1.4
Boyer         1.76           1.72
Nand          1.25           1.16
Semigroup     1.38           1.34
Circuit       1.33           1.18
Reducer       1.47           1.4
Qsort         1.32           1.25
Fastqueens    0.85           0.91
Bus Inversion Coding
• It is a redundant coding scheme where m = n + 1.
• If the ith data word is S_i, then either S_i or its complement ~S_i is transmitted, whichever results in fewer bit transitions.
• An extra bit P encodes the polarity of the data word.
• The coding technique works better for smaller values of n:
– For n = 2, the switching activity reduction is 25%
– For n = 32, the switching activity reduction is 11%


Bus Inversion Coding
• For larger values of n, the n-bit data bus can be divided into smaller groups, and each group is coded independently by associating a polarity bit with it.
• When a 32-bit bus is divided into four groups of 8 bits, the reduction in switching activity is 18.3%, as opposed to 11%.
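The following is a minimal bus-invert encoder and decoder, assuming the data lines and the polarity line both start at 0. A word is inverted whenever more than half of the n data lines would otherwise toggle. The function names, the 8-bit width and the random test data are hypothetical. For a partitioned bus, the same encoder would be applied to each group with its own polarity bit.

```python
import random


def bus_invert_encode(words, n):
    """Return (bus_value, polarity) pairs; the bus is assumed to start at 0."""
    mask = (1 << n) - 1
    prev_bus = 0
    encoded = []
    for w in words:
        w &= mask
        hamming = bin(w ^ prev_bus).count("1")
        if 2 * hamming > n:               # more than n/2 lines would toggle
            bus, polarity = ~w & mask, 1  # send the complement instead
        else:
            bus, polarity = w, 0
        encoded.append((bus, polarity))
        prev_bus = bus
    return encoded


def bus_invert_decode(encoded, n):
    mask = (1 << n) - 1
    return [(bus ^ mask) if p else bus for bus, p in encoded]


def count_transitions(states):
    return sum(bin(a ^ b).count("1") for a, b in zip(states, states[1:]))


random.seed(0)
n = 8
data = [random.getrandbits(n) for _ in range(10_000)]
enc = bus_invert_encode(data, n)
raw = count_transitions([0] + data)
coded = (count_transitions([0] + [b for b, _ in enc])
         + count_transitions([0] + [p for _, p in enc]))
print(f"raw: {raw}, bus-invert (incl. polarity line): {coded}")
```

By construction, at most n/2 data lines toggle per transfer. The polarity line adds some transitions back, which is why the overall saving shrinks as n grows.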
Bus Inversion Coding

Figure: Predicted reduction in switching activity.
T0 Encoding
• Gray coding provides the asymptotically best performance of a single transition for each address generated when infinite streams of consecutive addresses are considered.
• However, the code is optimal only in the class of irredundant codes, i.e. codes that employ exactly n-bit patterns to encode a maximum of 2^n words.
• By adding some redundancy to the code, better performance can be achieved with the T0 encoding scheme, which requires a redundant line INC.
• The T0 code provides the zero-transition property for infinite streams of consecutive addresses. With b(t) the address, B(t) the value on the bus and S the stride between consecutive addresses:
– Encoding:
$$(B(t), INC(t)) = \begin{cases} (B(t-1),\ 1) & \text{if } t > 0 \text{ and } b(t) = b(t-1) + S \\ (b(t),\ 0) & \text{otherwise} \end{cases}$$
– Decoding:
$$b(t) = \begin{cases} b(t-1) + S & \text{if } INC = 1 \text{ and } t > 0 \\ B(t) & \text{if } INC = 0 \end{cases}$$
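The following sketch implements T0 encoding and decoding for an unbounded integer address stream; wrap-around at the bus width is not modelled. The stride S defaults to 1 here as an assumption. The function names and the example stream are hypothetical.

```python
def t0_encode(addresses, stride=1):
    """Return (bus, inc) pairs: freeze the bus and raise INC for in-sequence addresses."""
    encoded = []
    for t, b in enumerate(addresses):
        if t > 0 and b == addresses[t - 1] + stride:
            bus, inc = encoded[-1][0], 1  # B(t) = B(t-1), INC = 1
        else:
            bus, inc = b, 0               # B(t) = b(t), INC = 0
        encoded.append((bus, inc))
    return encoded


def t0_decode(encoded, stride=1):
    decoded = []
    for t, (bus, inc) in enumerate(encoded):
        if inc == 1 and t > 0:
            decoded.append(decoded[-1] + stride)  # b(t) = b(t-1) + S
        else:
            decoded.append(bus)                   # b(t) = B(t)
    return decoded


def bus_transitions(values):
    return sum(bin(a ^ b).count("1") for a, b in zip(values, values[1:]))


stream = list(range(0x100, 0x140)) + [0x200, 0x201, 0x202, 0x080]
enc = t0_encode(stream)
print("binary bus transitions:", bus_transitions(stream))
print("T0 bus transitions:    ", bus_transitions([bus for bus, _ in enc]))
```

Inside a run of consecutive addresses, the address lines do not move at all; only jumps cost transitions, plus one toggle of INC at the start and end of each run.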
Reducing Glitching Activity
• Static designs can exhibit spurious transitions due to finite propagation delays from one logic block to the next.
• These "extra" transitions can be minimized by:
– Balancing all signal paths
– Reducing logic depth

Figure: Reducing the glitching activity.
Logic Styles for High Performance and Low Power
• Potential logic styles:
– Static CMOS logic
– Dynamic CMOS logic
– Pass-transistor logic (PTL)
• Experimental results
• Conclusions


Static CMOS Logic

Figure: Static CMOS gate with a pull-up network to V_DD and a pull-down network to V_SS, driven by INPUT and producing output f across load C_L.

• Advantages
– Ease of fabrication
– Good noise margin
– Robust
– Lower switching activity
– Good input/output decoupling
– No charge-sharing problem
– Availability of mature logic synthesis tools and techniques
• Disadvantages
– Larger number of transistors (larger chip area and delay)
– Spurious transitions (glitches) due to finite propagation delays, leading to extra power dissipation and incorrect operation
– Short-circuit power dissipation
– Weak output driving capability
– Large number of standard cells, requiring substantial engineering effort for technology mapping
Dynamic CMOS Logic

Figure: Dynamic CMOS gate with clock φ, a pull-down network driven by INPUT, and load C_L. During precharge (φ = 0) the output f is high; during evaluation (φ = 1) it stays high or falls low depending on the inputs.

• Advantages
– Combines the low power of static CMOS with the low chip area of pseudo-nMOS
– Reduced number of transistors compared to static CMOS (n + 2 versus 2n)
– Faster than static CMOS logic
– No short-circuit power dissipation
– No spurious transitions or glitching power dissipation
• Disadvantages
– Higher switching activity
– Not as robust as static CMOS logic
– Clock-skew problem in cascaded realization
– Suffers from the charge-sharing problem
– Mature synthesis tools are not available
Pass-Transistor Logic
• Advantages
– Lower area due to a smaller number of transistors and smaller input loads
– Ratioless PTL allows minimum-dimension transistors and hence makes area-efficient circuit realization possible
– No short-circuit current, leading to lower power dissipation
• Disadvantages
– Increased delay due to long chains of pass transistors
– Multi-threshold voltage drop
– Dual-rail logic is needed to provide all signals in complementary form
– Possibility of sneak paths
Problems in PTL Synthesis

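• Multi-threshold voltage drop (V_dd, V_dd − V_t, V_dd − 2V_t, ..., V_dd − mV_t)
• Sneak path
• Long chain of pass transistors: for a chain of n pass transistors T1 to Tn, each with on-resistance R and with capacitance C_L at each node, the delay is
$$T = 0.69\,R\,C_L\,\frac{n(n+1)}{2}$$

Figure: Examples of multi-threshold voltage drop, a sneak path between pass-logic outputs, and a chain of pass transistors T1 to Tn driving C_L.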
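The quadratic growth with n comes from the Elmore delay of an RC ladder: node i is reached through i series resistances. The sketch below assumes equal on-resistance R and equal node capacitance for every stage, with hypothetical values of 10 kΩ and 5 fF. It sums the ladder term by term and compares the result with the closed form.

```python
def elmore_chain_delay(n: int, r: float, c: float) -> float:
    """50% delay (0.69 * Elmore time constant) of n pass transistors in series,
    each with on-resistance r, with capacitance c at every node."""
    # Node i is charged through i series resistances, so it adds i*r*c.
    tau = sum(i * r * c for i in range(1, n + 1))
    return 0.69 * tau


def closed_form_delay(n: int, r: float, c: float) -> float:
    return 0.69 * r * c * n * (n + 1) / 2


R, C = 10e3, 5e-15  # hypothetical 10 kohm on-resistance, 5 fF per node
for n in (1, 2, 4, 8):
    print(f"n = {n}: {elmore_chain_delay(n, R, C) * 1e12:.1f} ps")
```

Doubling the chain length from 4 to 8 raises the delay by a factor of 3.6, not 2. This is why long pass-transistor chains are usually broken up with buffers.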
Experimental Results
• Static CMOS circuits have been realized using the Berkeley SIS tool (script.rugged to optimize the netlist, and technology mapping with 44-2.genlib and the minimum-area option).
• A large number of benchmark circuits are realized in the three logic styles using C/C++ programs on a Sun system.
• Area is approximated by the number of transistors.
• Estimation models for calculating the delay and switching power dissipation of circuits in the three logic styles have been proposed, and their accuracy has been verified with SPICE and Design Analyzer in Cadence.
• MOSFET parameters are taken from a 0.18 μm process technology.
Experimental Results
(Area in number of transistors, delay in ns, power in mW.)
Benchmark   Static CMOS circuits       Dynamic CMOS circuits      PTL circuits
            Area    Delay   Power      Area    Delay   Power      Area    Delay   Power
C432        692     3.32    122        581     1.82    88         546     2.08    91
C499        1880    2.23    367        1506    1.69    248        1428    1.62    167
C880        1412    2.21    293        1249    1.79    166        988     1.18    267
C1355       1880    2.61    400        1603    1.51    280        1203    1.04    379
C1908       1756    2.91    367        1689    1.80    251        1088    1.57    298
C2670       1804    2.94    493        1584    1.74    395        1010    1.54    449
C3540       4214    4.57    409        2815    2.63    314        2782    2.58    294
C5315       7058    3.65    830        5970    2.69    515        5364    1.62    778
C6288       11222   11.84   409        8716    4.64    504        6060    4.69    445
C7552       8214    2.99    1604       7328    1.98    1173       5682    1.66    1328
Average % reduction compared to static CMOS circuits:
                                       -16%    -37%    -25%       -33%    -47%    -17%


Comparison of Area in Terms of the Number of Transistors
Figure: Number of transistors (0 to 12000) for static CMOS, dynamic CMOS and PTL realizations of benchmarks C432, C499, C880, C1355, C1908, C2670, C3540, C5315, C6288 and C7552.
Comparison of Delay
Figure: Delay (ns) for static CMOS, dynamic CMOS and PTL realizations of benchmarks C432 to C7552.


Comparison of Energy Requirement
Figure: Switching energy (fJ) for static CMOS, dynamic CMOS and PTL realizations of benchmarks C432 to C7552.


Leakage Power Limits Vt Scaling



Threshold Voltage Scaling
• Scale down the threshold voltage
– As Vth is reduced, the subthreshold leakage current increases, leading to an increase in power dissipation

Figure: Normalized delay and normalized subthreshold leakage current I_sub (log scale) versus threshold voltage V_th (0.0 to 0.8 V).
Threshold Voltage (VT) Scaling
• Scale down the threshold voltage for low-voltage, low-power circuits to increase performance.
– V_T ↓ ⇒ delay ↓, I_leakage ↑: low V_T provides high performance.
– V_T ↑ ⇒ delay ↑, I_leakage ↓: high V_T reduces subthreshold leakage.
– Typically, 0.2V_DD ≤ V_T ≤ 0.5V_DD.
Threshold Voltage Scaling
• Fabrication of multiple threshold voltages:
– Multiple channel doping
– Multiple oxide thickness
– Multiple channel length
– Multiple body bias
• Various approaches:
– Variable-threshold-voltage CMOS (VTCMOS) approach
– Multi-threshold-voltage CMOS (MTCMOS) approach
– Dual-Vt assignment approach


VTCMOS Approach
Figure: Typical n-well CMOS structure (p substrate, n-well, p+ and n+ diffusions).


Multi-threshold-voltage CMOS (MTCMOS)
• MTCMOS (multi-threshold CMOS) (S. Mutoh et al., 1996)

Figure: MTCMOS circuit scheme: a circuit with low-Vth transistors is connected to Vdd and ground through high-Vth sleep control transistors Q1 and Q2 (driven by SL), which create the virtual supply rails VDDV and GNDV.
MTCMOS Performance
• Simulation results

Figure: Normalized energy and normalized delay versus supply voltage (0.5 to 2.0 V) for conventional CMOS with all high-Vth transistors, conventional CMOS with all low-Vth transistors, and MTCMOS.


Advantages and Limitations of MTCMOS

• MTCMOS can be easily implemented using existing circuits.
• MTCMOS reduces only the standby power.
• The large inserted MOSFETs increase area and delay.
• An extra high-Vth memory circuit is needed to maintain the data in standby mode.


Dual-Vth Assignment

Reference: L. Wei, Z. Chen, M. Johnson, K. Roy, and V. De, "Design and Optimization of Low Voltage High Performance Dual Threshold CMOS Circuits," Proc. IEEE/ACM Design Automation Conference (DAC), 1998.


Dual Threshold CMOS Technology
• Low-Vth transistors in the critical path for high performance
• Some high-Vth transistors in non-critical paths to reduce leakage

Figure: Example circuit with nodes a to h, distinguishing nodes in the critical path (low-Vth), other nodes with low-Vth, and nodes with high-Vth.
Figure: The same circuit; the darker gates are on the critical path.
Figure: The same circuit with high-Vt = 0.25 V assigned to all gates off the critical path.
Figure: The same circuit with high-Vt = 0.396 V assigned to some gates off the critical path.
Figure: The same circuit with high-Vt = 0.46 V assigned to some gates off the critical path.
Dual-Vth Assignment Problem
• However, not all the transistors in non-critical paths can be assigned a high Vth.
• How can dual Vth be assigned selectively to achieve the best leakage saving under a performance constraint?

Figure: Standby leakage power versus Vth2 (0.15 to 0.55 V) for Vdd = 1 V, Leff = 0.32 μm, Wpeff = 10.5 μm, Wneff = 3 μm and Tox = 9.8 nm.
Solution of Dual-Vth Assignment (Contd.)
• Represent the circuit as a DAG (from primary inputs PI to primary outputs PO), where each node represents a gate and each edge represents a connection.
1. Initialize all nodes with low-Vth.
2. Compute the critical path(s).
3. Using BFS traversal, assign high-Vth to a node only if this does not alter the critical path.
4. Optimal high-Vth calculation: repeat the assignment with different high-Vth values (0.2Vdd < high-Vth < 0.5Vdd, Vdd = 1 V) and choose the one for which the maximum number of nodes can be assigned, and hence minimum leakage power is possible.
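The sketch below implements these steps on a small gate-level DAG (Python 3.9+ for graphlib). Gate delay and leakage as functions of V_t are modelled with assumptions that do not come from the slides: delay follows the alpha-power law with α = 1.3, and leakage falls by one decade per 100 mV increase in V_t. "Does not alter the critical path" is interpreted as "does not increase the circuit delay". The circuit, the low-V_t value of 0.2 V, the candidate high-V_t values and all function names are hypothetical.

```python
from collections import deque
from graphlib import TopologicalSorter

VDD, VT_LOW = 1.0, 0.2   # hypothetical supply and low threshold (V)
ALPHA, SWING = 1.3, 0.1  # assumed alpha-power exponent and V/decade of leakage


def gate_delay(base: float, vt: float) -> float:
    """Delay of a gate whose low-Vt delay is `base`, when built with threshold vt."""
    return base * ((VDD - VT_LOW) / (VDD - vt)) ** ALPHA


def gate_leakage(weight: float, vt: float) -> float:
    """Leakage normalized to the same gate at low Vt (one decade per SWING volts)."""
    return weight * 10 ** (-(vt - VT_LOW) / SWING)


def circuit_delay(preds, base, vt_of) -> float:
    """Longest PI-to-PO path delay (preds maps node -> list of fan-in nodes)."""
    arrival = {}
    for node in TopologicalSorter(preds).static_order():
        start = max((arrival[p] for p in preds[node]), default=0.0)
        arrival[node] = start + gate_delay(base[node], vt_of[node])
    return max(arrival.values())


def assign_high_vt(preds, base, vt_high):
    """Steps 1-3: start all low-Vt, visit gates in BFS order from the primary
    inputs, and keep high-Vt wherever the circuit delay does not grow."""
    fanout = {n: [] for n in preds}
    for n, ins in preds.items():
        for p in ins:
            fanout[p].append(n)
    vt_of = {n: VT_LOW for n in preds}
    target = circuit_delay(preds, base, vt_of)
    queue = deque(n for n, ins in preds.items() if not ins)
    seen = set(queue)
    while queue:
        node = queue.popleft()
        vt_of[node] = vt_high
        if circuit_delay(preds, base, vt_of) > target * (1 + 1e-9):
            vt_of[node] = VT_LOW  # would stretch the critical path: undo
        for nxt in fanout[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return vt_of


def best_high_vt(preds, base, weight, candidates):
    """Step 4: try each candidate high-Vt and keep the lowest total leakage."""
    best = None
    for vt_high in candidates:
        vt_of = assign_high_vt(preds, base, vt_high)
        leakage = sum(gate_leakage(weight[n], vt_of[n]) for n in preds)
        if best is None or leakage < best[1]:
            best = (vt_high, leakage, vt_of)
    return best


# Hypothetical circuit: a and b are driven by PIs; g drives the PO.
preds = {"a": [], "b": [], "c": ["a", "b"], "d": ["c"], "e": ["b"], "g": ["d", "e"]}
base = {"a": 1.0, "b": 1.0, "c": 1.0, "d": 1.0, "e": 1.5, "g": 1.0}
weight = {n: 1.0 for n in preds}
vt_high, leakage, vt_of = best_high_vt(preds, base, weight, [0.25, 0.30, 0.35, 0.40, 0.45])
print(vt_high, round(leakage, 3), vt_of)
```

In this toy circuit only gate e has slack. It can absorb a high V_t up to 0.35 V but not 0.40 V, so the sweep settles on 0.35 V. That illustrates why the best high-V_t is neither the lowest nor the highest candidate.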
Optimal Dual-Vth Assignment: Another Approach
• N. Tripathy, A. Bhosle, D. Samanta and A. Pal, "Optimal Assignment of High Threshold Voltage for Synthesizing Dual Threshold CMOS Circuits," Proc. VLSI Design 2001, pp. 227-232, Bangalore, January 2001.


Delay-Constrained Dual-VT Static CMOS Circuits
Figure: A circuit whose critical path is marked, with the remaining gates assigned high-VT transistors.


Delay-Constrained Dual-VT Assignment
• Represent the circuit as a DAG (PI to PO), where each node represents a gate and each edge represents a connection.
Algorithm:
1. Assume low-VT < high-VT < 0.5VDD.
2. Initialize all nodes with high-VT.
3. Compute the critical path(s).
4. Using DFS traversal, assign low-VT to a node on the critical path.
5. Go to Step 3 until all the nodes on the critical path are assigned low-VT.
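The sketch below applies this delay-constrained procedure to a small hypothetical DAG (Python 3.9+ for graphlib). Gate delay follows an assumed alpha-power law with α = 1.3 and a hypothetical low-VT of 0.2 V. The DFS step is realized by walking the current critical path from its primary input towards its primary output and switching the first high-VT gate found to low-VT. When the loop stops, the critical path consists only of low-VT gates, so the circuit is exactly as fast as an all-low-VT design. The function names are hypothetical.

```python
from graphlib import TopologicalSorter

VDD, VT_LOW, ALPHA = 1.0, 0.2, 1.3  # hypothetical supply, low Vt and alpha-power exponent


def gate_delay(base: float, vt: float) -> float:
    return base * ((VDD - VT_LOW) / (VDD - vt)) ** ALPHA


def critical_path(preds, base, vt_of):
    """Return (gates from PI to PO on the longest path, its delay)."""
    arrival, via = {}, {}
    for node in TopologicalSorter(preds).static_order():
        worst_in = max(preds[node], key=lambda p: arrival[p], default=None)
        start = arrival[worst_in] if worst_in is not None else 0.0
        arrival[node] = start + gate_delay(base[node], vt_of[node])
        via[node] = worst_in
    node = max(arrival, key=arrival.get)
    delay = arrival[node]
    path = []
    while node is not None:
        path.append(node)
        node = via[node]
    return path[::-1], delay


def delay_constrained_dual_vt(preds, base, vt_high):
    """Step 2: every gate starts at high Vt.  Steps 3-5: while the critical
    path still contains a high-Vt gate, switch the first one met walking
    from the PI to low Vt."""
    if not VT_LOW < vt_high < 0.5 * VDD:
        raise ValueError("need low-Vt < high-Vt < 0.5 Vdd")
    vt_of = {n: vt_high for n in preds}
    while True:
        path, delay = critical_path(preds, base, vt_of)
        high_on_path = [n for n in path if vt_of[n] == vt_high]
        if not high_on_path:
            return vt_of, delay
        vt_of[high_on_path[0]] = VT_LOW


preds = {"a": [], "b": [], "c": ["a", "b"], "d": ["c"], "e": ["b"], "g": ["d", "e"]}
base = {"a": 1.0, "b": 1.0, "c": 1.0, "d": 1.0, "e": 1.5, "g": 1.0}
for vt in (0.35, 0.40):
    vt_of, delay = delay_constrained_dual_vt(preds, base, vt)
    print(vt, round(delay, 3), sorted(n for n, v in vt_of.items() if v == vt))
```

With a high-VT of 0.35 V, gate e keeps its high threshold. At 0.40 V it no longer has enough slack, and the procedure falls back to an all-low-VT circuit, which is why the high-VT value itself has to be swept.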
Delay-Constrained Dual-VT Assignment
• Repeat the assignment with different high-VT values (0.2VDD < high-VT < 0.5VDD) and choose the one for which the maximum number of nodes can be assigned, and hence minimum leakage power is possible.

Figure: Leakage power (μW) versus high-VT (0.15 to 0.55 V), showing the optimal high-VT.
Experimental Results
Comparison of our results with [Wei+99] (%Redn = percentage reduction; CPU time in seconds)

             With approach [Wei+99]                           Our approach
Benchmark    #Transistors  %Redn in   %Redn in  CPU           #Transistors  %Redn in   %Redn in  CPU
                           standby    total     time (s)                    standby    total     time (s)
                           leakage    power                                 leakage    power
C432         278           59.65      16.52     20            348           87.35      26.17     36
C499         604           51.09      7.18      118           796           64.45      12.68     174
C880         1126          84.87      14.07     55            1208          88.65      19.41     89
C1355        1232          49.36      8.51      198           1346          59.95      13.15     346
C1908        1430          76.21      15.46     225           1684          83.45      21.75     412
C2670        2736          81.24      19.27     269           3092          92.96      24.80     485
C3540        3430          85.60      21.43     301           3698          90.42      32.29     541
C5315        5432          83.12      18.44     342           5516          89.69      31.05     619
C6288        5768          43.38      19.89     564           8950          83.69      45.42     890
C7552        7102          76.41      20.35     387           7786          87.65      22.36     609
Average                    69.01%     16.11%                                82.82%     24.92%


ISCAS Benchmarks Results
Figure: Leakage power (μW) of the ISCAS benchmarks with single low Vth and with dual Vth.
• Average 85.84% savings in leakage power, compared to 60.91% for the earlier result.
Comparison
More reduction in leakage power (on average, 25% more reduction in leakage power)
Figure: Leakage power (μW) with single low Vth, dual Vth (old approach) and dual Vth (new approach).
Comparison
Larger number of transistors assigned
Figure: Number of transistors assigned, old approach versus new approach.


Comparison
Higher time complexity
Figure: CPU time (s), old approach versus new approach.
List of publications
1. Debasis Samanta and Ajit Pal, "Synthesis of Low Power High Performance Dual-VT PTL Circuits," Proc. 17th International Conference on VLSI Design, Mumbai, pp. 85-90, January 2004.
2. D. Samanta and Ajit Pal, "Logic Styles for High Performance and Low Power," Proc. 12th International Workshop on Logic and Synthesis (IWLS-2003), pp. 355-362, May 2003.
3. D. Samanta, M. C. Dharmadeep and Ajit Pal, "Synthesis of High Performance Low Power PTL Circuits," Proc. ASP-DAC 2003, Kitakyushu, Japan, pp. 209-212, January 2003.
4. D. Samanta and A. Pal, "Synthesis of Dual-VT Dynamic CMOS Circuits," Proc. VLSI Design 2003, New Delhi, India, pp. 121-128, January 2003.
5. D. Samanta, N. Sinha and A. Pal, "Synthesis of High Performance Low Power Dynamic CMOS Circuits," Proc. ASP-DAC/VLSI Design 2002, Bangalore, India, pp. 99-104, January 2002.
6. D. Samanta and A. Pal, "Optimal Dual-VT Assignment for Low-Voltage Energy-Constraint CMOS Circuits," Proc. ASP-DAC/VLSI Design 2002, Bangalore, India, pp. 193-198, January 2002.
7. N. Tripathi, A. Bhosle, D. Samanta and Ajit Pal, "Optimal Assignment of High-VT for Synthesizing Dual-VT CMOS Circuits," Proc. VLSI Design 2001, Bangalore, India, pp. 227-232, January 2001.
Thanks!

