Jan M. Rabaey
Low Power Design Essentials ©2008
Chapter 4
Optimizing Power @ Design Time: Circuits
Dejan Marković
Borivoje Nikolić

Chapter Outline
• Optimization framework for energy-delay trade-off
• Dynamic power optimization
– Multiple supply voltages
– Transistor sizing
– Technology mapping
• Static power optimization
– Multiple thresholds
– Transistor stacking

Energy/Power Optimization Strategy
• For given function and activity, an optimal operation point can be derived in the energy-performance space
• Time of optimization depends upon activity profile
• Different optimizations apply to active and static power
[Table: columns Fixed Activity, Variable Activity, No Activity (Standby); rows Active, Static; entries Design time, Run time, Sleep]

Energy-Delay Optimization and Trade-off
Maximize throughput for given energy, or
minimize energy for given throughput
[Figure: trade-off space in the Energy/op versus Delay plane, bounded by E_min, E_max, D_min, and D_max, with the unoptimized design marked]
Other important metrics: Area, Reliability, Reusability

The Design Abstraction Stack
A very rich set of design parameters to consider!
It helps to consider options in relation to their abstraction layer:
System/Application: choice of algorithm
Software: amount of concurrency
(Micro-)Architecture: parallel versus pipelined, general purpose versus application specific
Logic/RT: logic family, standard cell versus custom
Circuit: sizing, supply, thresholds
Device: bulk versus SOI
[Figure: the abstraction stack, with the layers covered here marked “This Chapter”]

Optimization Can/Must Span Multiple Levels
[Figure: three coupled levels: Architecture, Micro-Architecture, Circuit (Logic & FFs)]
Design optimization combines top-down and bottom-up: “meet-in-the-middle”

Energy-Delay Optimization
Globally optimal energy-delay curve for a given function
[Figure: Energy/op versus Delay curves for topology A and topology B, shown in two panels]

Some Optimization Observations
Energy-Delay Sensitivities
$$S_A = \left. \frac{\partial E / \partial A}{\partial D / \partial A} \right|_{A = A_0}$$
[Figure: Energy versus Delay, showing the curves f(A_0, B) and f(A, B_0) through the design point (A_0, B_0) at delay D_0, with slopes S_A and S_B]
[Ref: V. Stojanovic, ESSCIRC’02]

Finding the Optimal Energy-Delay Curve
$$\Delta E = S_A \cdot (-\Delta D) + S_B \cdot \Delta D$$
On the optimal curve, all sensitivities must be equal
Pareto-optimal: the best that can be achieved without disadvantaging at least one metric.
[Figure: Energy versus Delay, showing the curves f(A_0, B), f(A, B_0), and f(A_1, B) around the design point (A_0, B_0) at delay D_0, with a delay step ∆D]
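The balance condition can be illustrated numerically. The sketch below evaluates ∆E = S_A·(−∆D) + S_B·∆D for two cases; the sensitivity values are made up for illustration and do not come from the slides.

```python
def energy_change(s_a, s_b, delta_d):
    """Net energy change when variable A shortens the delay by delta_d
    and variable B lengthens it by the same amount (total delay unchanged).

    Implements dE = S_A * (-dD) + S_B * dD. Sensitivities are dE/dD and
    are normally negative (more delay, less energy).
    """
    return s_a * (-delta_d) + s_b * delta_d


# Hypothetical sensitivities, for illustration only.
unbalanced = energy_change(s_a=-2.0, s_b=-5.0, delta_d=0.1)
balanced = energy_change(s_a=-3.0, s_b=-3.0, delta_d=0.1)

print(unbalanced)  # negative: energy is saved at the same total delay
print(balanced)    # zero: no further gain once sensitivities are equal
```

As long as the two sensitivities differ, trading delay between the variables still lowers energy, which is why equal sensitivities characterize the optimal curve.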

Reducing Active Energy @ Design Time
$$E_{active} \sim \alpha \cdot C_L \cdot V_{swing} \cdot V_{DD}$$
$$P_{active} \sim \alpha \cdot C_L \cdot V_{swing} \cdot V_{DD} \cdot f$$
• Reducing voltages
– Lowering the supply voltage (V_DD) at the expense of clock speed
– Lowering the logic swing (V_swing)
• Reducing transistor sizes (C_L)
– Slows down logic
• Reducing activity (α)
– Reducing switching activity through transformations
– Reducing glitching by balancing logic
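A minimal sketch of the two proportionalities above, useful for quick what-if estimates. The numbers in the example (activity, capacitance, voltages) are assumptions chosen for illustration, and the proportionality constant is taken as 1.

```python
def active_energy(alpha, c_load, v_swing, v_dd):
    """Active energy per operation, E ~ alpha * C_L * V_swing * V_DD."""
    return alpha * c_load * v_swing * v_dd


def active_power(alpha, c_load, v_swing, v_dd, freq):
    """Active power, P ~ alpha * C_L * V_swing * V_DD * f."""
    return active_energy(alpha, c_load, v_swing, v_dd) * freq


# Hypothetical full-swing gate (V_swing = V_DD): scale the supply by 0.7.
e_ref = active_energy(alpha=0.1, c_load=10e-15, v_swing=1.2, v_dd=1.2)
e_low = active_energy(alpha=0.1, c_load=10e-15, v_swing=0.84, v_dd=0.84)
energy_ratio = e_low / e_ref
print(energy_ratio)  # close to 0.49, since both V_swing and V_DD scale
```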

Observation
• Downsizing and/or lowering the supply on the critical path lowers the operating frequency
• Downsizing non-critical paths reduces energy for free, but
– Narrows down the path delay distribution
– Increases impact of variations, impacts robustness
[Figure: two histograms of the number of paths versus path delay t_p(path), each with the target delay marked]

Circuit Optimization Framework
$$\text{minimize } Energy(V_{DD}, V_{TH}, W) \quad \text{subject to} \quad Delay(V_{DD}, V_{TH}, W) \le D_{con}$$
Constraints:
$$V_{DD}^{min} < V_{DD} < V_{DD}^{max}, \qquad V_{TH}^{min} < V_{TH} < V_{TH}^{max}, \qquad W^{min} < W$$
• Reference case
– D_min sizing @ V_DD^max, V_TH^ref
[Figure: Energy/op versus Delay for topology A and topology B]
[Ref: V. Stojanovic, ESSCIRC’02]
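
Optimization Framework: Generic Network
Gate in stage i loaded by fanout (stage i+1)
[Figure: gate in stage i (supply V_DD,i, input capacitance C_i, self-loading γC_i) driving the wire capacitance C_w and the input capacitance C_{i+1} of stage i+1 (supply V_DD,i+1)]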

Alpha-power based Delay Model
$$t_p = \frac{K_d \, V_{DD}}{(V_{DD} - V_{on})^{\alpha_d}} \cdot \frac{\gamma C_i + C_w + C_{i+1}}{C_i} = \tau_{nom} \left( 1 + \frac{C'_{i+1}}{\gamma C_i} \right), \qquad C'_{i+1} = C_w + C_{i+1}$$
Fit parameters: V_on, α_d, K_d, γ
V_DD^ref = 1.2 V, technology 90 nm
[Figure: FO4 delay (normalized) versus V_DD / V_DD^ref, simulation and model; V_on = 0.37 V, α_d = 1.53; 90 nm technology]
[Figure: delay t_p (ps) versus fanout (C_{i+1}/C_i), simulation and model; τ_nom = 6 ps, γ = 1.35]
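The fitted values quoted on this slide can be plugged into the model directly. The sketch below is an illustration under two assumptions: delay is normalized to V_DD^ref so that K_d cancels, and the fanout dependence uses the τ_nom(1 + fanout/γ) form with C_w neglected.

```python
V_ON = 0.37        # V, fitted value from the slide
ALPHA_D = 1.53     # fitted velocity-saturation index from the slide
V_DD_REF = 1.2     # V, reference supply from the slide
TAU_NOM_PS = 6.0   # ps, fitted value from the slide
GAMMA = 1.35       # fitted self-loading ratio from the slide


def voltage_factor(v_dd):
    """Supply-dependent part of the delay: V_DD / (V_DD - V_on)^alpha_d."""
    if v_dd <= V_ON:
        raise ValueError("model only valid for V_DD above V_on")
    return v_dd / (v_dd - V_ON) ** ALPHA_D


def normalized_delay(v_dd):
    """Delay relative to the delay at V_DD_REF (K_d and load cancel)."""
    return voltage_factor(v_dd) / voltage_factor(V_DD_REF)


def stage_delay_ps(fanout):
    """Inverter delay at the reference supply, wire capacitance neglected."""
    return TAU_NOM_PS * (1.0 + fanout / GAMMA)


for ratio in (1.0, 0.8, 0.6, 0.5):
    print(ratio, normalized_delay(ratio * V_DD_REF))
print(stage_delay_ps(4))  # fanout-of-4 inverter delay in ps
```

The steep rise of the normalized delay toward V_DD / V_DD^ref = 0.5 is the supply-voltage cost that the optimization framework trades against the quadratic energy saving.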

Combined with Logical Effort Formulation
For Complex Gates
$$t_p = \tau_{nom} \left( p_i + \frac{f_i \, g_i}{\gamma} \right)$$
• Parasitic delay p_i – depends upon gate topology
• Electrical effort f_i ≈ S_{i+1}/S_i
• Logical effort g_i – depends upon gate topology
• Effective fanout h_i = f_i g_i
[Ref: I. Sutherland, Morgan-Kaufmann’99]
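As an illustration of the combined formulation, the sketch below sums t_p = τ_nom(p_i + f_i g_i/γ) over a short path. τ_nom and γ are the fitted values from the previous slide; the NAND2 values (g = 4/3, p = 2) are the usual textbook logical-effort numbers and are an assumption here, not data from the slides, as are the electrical efforts.

```python
from dataclasses import dataclass

TAU_NOM_PS = 6.0  # ps, from the delay-model slide
GAMMA = 1.35      # from the delay-model slide


@dataclass
class Stage:
    p: float  # parasitic delay (inverter = 1)
    g: float  # logical effort (inverter = 1)
    f: float  # electrical effort, roughly S_{i+1} / S_i


def stage_delay_ps(stage):
    """Delay of one stage: tau_nom * (p + f * g / gamma)."""
    return TAU_NOM_PS * (stage.p + stage.f * stage.g / GAMMA)


def path_delay_ps(stages):
    """Path delay is the sum of the stage delays."""
    return sum(stage_delay_ps(s) for s in stages)


# Hypothetical two-stage path: NAND2 driving an inverter.
path = [Stage(p=2.0, g=4.0 / 3.0, f=3.0), Stage(p=1.0, g=1.0, f=4.0)]
print(path_delay_ps(path))
print([s.f * s.g for s in path])  # effective fanouts h_i = f_i * g_i
```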

Dynamic Energy
[Figure: gate in stage i (supply V_DD,i, input capacitance C_i, self-loading γC_i) driving the wire capacitance C_w and the input capacitance C_{i+1} of stage i+1 (supply V_DD,i+1)]
$$E_{dyn} = (\gamma C_i + C_w + C_{i+1}) \cdot V_{DD,i}^2 = C_i (\gamma + f'_i) \cdot V_{DD,i}^2$$
$$C_i = K_e S_i, \qquad f'_i = (C_w + C_{i+1}) / C_i = S'_{i+1} / S_i$$
$$E_i = K_e S_i \left( V_{DD,i-1}^2 + \gamma V_{DD,i}^2 \right)$$
E_i = energy consumed by logic gate i
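The per-gate energy E_i = K_e S_i (V_DD,i−1² + γ V_DD,i²) is straightforward to evaluate for a chain. In the sketch below, K_e = 1 (arbitrary units), the sizes, and the supplies are illustrative assumptions; the final load and wire capacitance are not included.

```python
GAMMA = 1.35  # self-loading ratio, fitted value from the delay-model slide


def gate_energy(k_e, size, v_prev, v_own, gamma=GAMMA):
    """Energy attributed to gate i: its input capacitance K_e*S_i is charged
    from the previous stage's supply, its self-load from its own supply."""
    return k_e * size * (v_prev ** 2 + gamma * v_own ** 2)


def chain_energy(k_e, sizes, v_dds, v_in):
    """Sum E_i over a chain; v_in is the supply of whatever drives gate 1."""
    total = 0.0
    v_prev = v_in
    for size, v_own in zip(sizes, v_dds):
        total += gate_energy(k_e, size, v_prev, v_own)
        v_prev = v_own
    return total


# Hypothetical 3-stage chain at a single 1.2 V supply, then with the
# last stage moved to 0.84 V.
sizes = [1.0, 4.0, 16.0]
e_single = chain_energy(1.0, sizes, [1.2, 1.2, 1.2], v_in=1.2)
e_mixed = chain_energy(1.0, sizes, [1.2, 1.2, 0.84], v_in=1.2)
print(e_single, e_mixed)
```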

Optimizing Return on Investment (ROI)
Depends on Sensitivity (∂E/∂D)
• Gate Sizing
$$\frac{\partial E / \partial S_i}{\partial D / \partial S_i} = -\frac{E_i}{\tau_{nom} (h_i - h_{i-1})}$$
∞ for equal h (D_min)
• Supply Voltage
$$\frac{\partial E / \partial V_{DD}}{\partial D / \partial V_{DD}} = -\frac{E}{D} \cdot \frac{2 \, (1 - V_{on}/V_{DD})}{\alpha_d - 1 + V_{on}/V_{DD}}$$
max at V_DD(max) (D_min)
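The supply-voltage sensitivity can be cross-checked numerically. The sketch below assumes E ∝ V_DD² and D ∝ V_DD/(V_DD − V_on)^α_d with all constants set to 1 (they cancel in the comparison), and compares the closed form against a central finite difference.

```python
V_ON = 0.37     # V, fitted value from the delay-model slide
ALPHA_D = 1.53  # fitted value from the delay-model slide


def delay(v_dd):
    """Alpha-power delay with K_d and the load factor set to 1."""
    return v_dd / (v_dd - V_ON) ** ALPHA_D


def energy(v_dd):
    """Switching energy with the capacitance set to 1."""
    return v_dd ** 2


def sensitivity_analytic(v_dd):
    """-(E/D) * 2(1 - x) / (alpha_d - 1 + x), with x = V_on / V_DD."""
    x = V_ON / v_dd
    return -(energy(v_dd) / delay(v_dd)) * 2.0 * (1.0 - x) / (ALPHA_D - 1.0 + x)


def sensitivity_numeric(v_dd, step=1e-6):
    """(dE/dV_DD) / (dD/dV_DD) by central differences."""
    d_energy = energy(v_dd + step) - energy(v_dd - step)
    d_delay = delay(v_dd + step) - delay(v_dd - step)
    return d_energy / d_delay


for v in (0.8, 1.0, 1.2):
    print(v, sensitivity_analytic(v), sensitivity_numeric(v))
```

The magnitude grows with V_DD, in line with the note that the sensitivity is largest at the maximum supply.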

Example: Inverter Chain
[Figure: chain of N inverters with sizes S_1 = 1, S_2, S_3, …, S_N driving the load C_L]
• Properties of inverter chain
– Single path topology
– Energy increases geometrically from input to output
• Goal
– Find optimal sizing S = [S_1, S_2, …, S_N], supply voltage, and buffering strategy to achieve the best energy-delay tradeoff

Inverter Chain: Gate Sizing
$$S_i^2 = \frac{S_{i-1} \cdot S_{i+1}}{1 + \mu \cdot S_{i-1}}, \qquad \mu = -\frac{2 \cdot K_e \cdot V_{DD}^2}{\tau_{nom} \cdot F_S}, \qquad F_S = -\frac{E_i}{\tau_{nom} (h_i - h_{i-1})}$$
• Variable taper achieves minimum energy
• Reduce number of stages at large d_inc
[Figure: effective fanout h (0 to 25) versus stage (1 to 7) for d_inc = 0%, 1%, 10%, 30%, and 50%, with nominal (nom) and optimized (opt) curves labeled]
[Ref: Ma, JSSC’94]
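To see how the sizing recursion produces a variable taper, the sketch below relaxes S_i² = S_{i−1}·S_{i+1}/(1 + μ·S_{i−1}) with fixed end points. It is an illustration only: μ is treated as a given constant rather than derived from the sensitivity, the load is expressed as an equivalent gate size, and indices are 0-based (sizes[0] corresponds to S_1 = 1).

```python
import math


def relax_sizes(n_stages, load, mu, sweeps=500):
    """Gauss-Seidel relaxation of S_i^2 = S_{i-1} S_{i+1} / (1 + mu S_{i-1}).

    sizes[0] = 1 and sizes[n_stages] = load stay fixed; interior sizes
    start from a geometric taper and are updated in place.
    """
    sizes = [load ** (i / n_stages) for i in range(n_stages + 1)]
    for _ in range(sweeps):
        for i in range(1, n_stages):
            sizes[i] = math.sqrt(
                sizes[i - 1] * sizes[i + 1] / (1.0 + mu * sizes[i - 1])
            )
    return sizes


def fanouts(sizes):
    """Effective fanout of each stage, h = S_{i+1} / S_i."""
    return [b / a for a, b in zip(sizes, sizes[1:])]


print(fanouts(relax_sizes(4, 256.0, mu=0.0)))   # constant taper
print(fanouts(relax_sizes(4, 256.0, mu=0.05)))  # fanout grows toward the load
```

With μ = 0 the recursion reduces to the familiar geometric-mean rule; a positive μ shrinks the early stages and shifts the effective fanout toward the output, which is the variable taper shown in the plot.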

Inverter Chain: V_DD Optimization
• V_DD reduces energy of the final load first
• Variable taper achieved by voltage scaling
[Figure: per-stage V_DD / V_DD^nom (0 to 1.0) versus stage (1 to 7) for d_inc = 0%, 1%, 10%, 30%, and 50%, with nominal (nom) and optimized (opt) curves labeled]

Inverter Chain: Optimization Results
• Parameter with the largest sensitivity has the largest potential for energy reduction
• Two discrete supplies mimic per-stage V_DD
[Figure: energy reduction (%) versus d_inc (%) and normalized sensitivity versus d_inc (%), with curves for cV_DD, S, gV_DD, and 2V_DD]

Example: Kogge-Stone Tree Adder
• Tree adder
– Long wires
– Re-convergent paths
– Multiple active outputs
[Figure: 16-bit tree adder with inputs (A_0, B_0) … (A_15, B_15) and C_in, and outputs S_0 … S_15]
[Ref: P. Kogge, Trans. Comp’73]

Tree Adder: Sizing vs. Dual-V_DD Optimization
• Reference design: all paths are critical
• Internal energy ⇒ S more effective than V_DD
– S: E(-54%), 2V_DD: E(-27%) at d_inc = 10%
[Figure: three panels: reference (D = D_min); sizing, E(-54%) at d_inc = 10%; 2V_DD, E(-27%) at d_inc = 10%]

Tree Adder: Multi-dimensional Search
• Can get pretty close to optimum with only 2 variables
• Getting the maximum speed (minimum delay) is very expensive
[Figure: optimization results plotted against Delay / D_min for the Reference design and the variable sets (S, V_DD), (V_DD, V_TH), (S, V_TH), and (S, V_DD, V_TH)]

Multiple Supply Voltages
• Block-level supply assignment
– Higher throughput/lower latency functions are implemented in higher V_DD
– Slower functions are implemented with lower V_DD
– This leads to so-called “voltage islands” with separate supply grids
– Level conversion performed at block boundaries
• Multiple supplies inside a block
– Non-critical paths moved to lower supply voltage
– Level conversion within the block
– Physical design challenging

Using Three V_DD’s
V_1 = 1.5 V, V_TH = 0.3 V
[Figure: power reduction ratio as a function of V_2 (V) and V_3 (V), shown as a contour plot with a point marked “+” and as a surface plot]
[Ref: T. Kuroda, ICCAD’02]
© IEEE 2002

Optimum Number of V_DD’s
• The more V_DD’s the less power, but the effect saturates
• Power reduction effect decreases with scaling of V_DD
• Optimum V_2/V_1 is around 0.7
[Figure: three panels for the supply sets {V_1, V_2}, {V_1, V_2, V_3}, and {V_1, V_2, V_3, V_4}, each plotting the V_DD ratios (V_2/V_1, V_3/V_1, V_4/V_1) and the power ratio (P_2/P_1, P_3/P_1, P_4/P_1) versus V_1 (V)]
[Ref: M. Hamada, CICC’01]
© IEEE 2001

Lessons: Multiple Supply Voltages
• Two supply voltages per block are optimal
• Optimal ratio between the supply voltages is 0.7
• Level conversion is performed on the voltage boundary, using a level-converting flip-flop (LCFF)
• An option is to use an asynchronous level converter
– More sensitive to coupling and supply noise
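A back-of-the-envelope sketch of what a second supply at the 0.7 ratio buys in switching energy. It assumes full-swing logic (energy scales with V_DD²) and ignores level-converter overhead and leakage; the fraction of switched capacitance moved to the low supply is a hypothetical input.

```python
def dual_supply_energy_ratio(fraction_low, v_ratio):
    """Switching energy relative to a single-supply design when a fraction
    of the switched capacitance runs at V_DDL = v_ratio * V_DDH."""
    if not 0.0 <= fraction_low <= 1.0:
        raise ValueError("fraction_low must be between 0 and 1")
    return (1.0 - fraction_low) + fraction_low * v_ratio ** 2


# Hypothetical split: half of the switched capacitance on the low supply.
ratio = dual_supply_energy_ratio(fraction_low=0.5, v_ratio=0.7)
print(ratio)  # about 0.745, i.e. roughly a 25% saving under these assumptions
```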

Distributing Multiple Supply Voltages
[Figure: two layout styles, Conventional and Shared N-well, each showing a V_DDH circuit (i1, o1) and a V_DDL circuit (i2, o2) with V_DDH, V_DDL, and V_SS rails]

Conventional
[Figure: V_DDH circuit and V_DDL circuit with N-well isolation and V_DDH, V_DDL, and V_SS rails; floor plans for (a) dedicated row, with V_DDH rows and V_DDL rows, and (b) dedicated region, with a V_DDH region and a V_DDL region]

Shared N-Well
[Figure: V_DDH circuit and V_DDL circuit in a shared N-well, with V_DDH, V_DDL, and V_SS rails; (a) floor plan image showing V_DDL and V_DDH circuits]
[Shimazaki et al, ISSCC’03]

Example: Multiple Supplies in a Block
“Clustered voltage scaling”
Lower V_DD portion is shared
[Figure: Conventional Design versus CVS Structure; logic between flip-flops (FF) with the critical path marked in both, and level-shifting F/Fs in the CVS structure]
[Ref: M. Takahashi, ISSCC’98]
© IEEE 1998

Level Converting Flip-Flops (LCFFs)
Pulsed Half-Latch versus Master-Slave LCFFs
• Smaller # of MOSFETs / clock loading
• Faster level conversion using half-latch structure
• Shorter D-Q path from pulsed circuit
[Figure: schematics of the Master-Slave LCFF and the Pulsed Half-Latch LCFF, each with its level-conversion stage marked]
[Ref: F. Ishihara, ISLPED’03]
© IEEE 2003

Dynamic Realization of Pulsed LCFF
• Pulsed precharge LCFF (PPR)
– Fast level conversion by precharge mechanism
– Suppressed charge/discharge toggle by conditional capture
– Short D-Q path
[Figure: schematic of the Pulsed Precharge Latch with its level-conversion stage marked]
[Ref: F. Ishihara, ISLPED’03]
© IEEE 2003

Case Study: ALU for 64-bit µProcessor
[Figure: ALU block diagram with V_DDH and V_DDL circuits distinguished: inputs ain and bin, 9:1 MUXes, 5:1 MUX, 2:1 MUX, gp gen., carry gen., partial sum, sum sel., logical unit, clock gen., and the sumb long loop-back bus (0.5 pF) with INV1 and INV2]
[Ref: Y. Shimazaki, ISSCC’03]
© IEEE 2003

Low-Swing Bus and Level Converter
• INV2 is placed near 9:1 MUX to increase noise immunity
• Level conversion is done by a domino 9:1 MUX
[Figure: sum driven onto the sumb bus through INV1 and INV2, with V_DDH and V_DDL supplies, feeding the domino level converter (9:1 MUX) with keeper, pc, and inputs ain0 and sel (V_DDH)]
[Ref: Y. Shimazaki, ISSCC’03]
© IEEE 2003

Measured Results: Energy and Delay
[Figure: measured Energy [pJ] versus T_CYCLE [ns] at room temperature for the single-supply and shared-well (V_DDH = 1.8 V) designs; 1.16 GHz point marked]
V_DDL = 1.4 V: Energy -25.3%, Delay +2.8%
V_DDL = 1.2 V: Energy -33.3%, Delay +8.3%
[Ref: Y. Shimazaki, ISSCC’03]
© IEEE 2003

Practical Transistor Sizing
• Continuous sizing of transistors only an option in custom design
• In ASIC design flows, options set by available library
• Discrete sizing options made possible in standard-cell design methodology by providing multiple options for the same cell
– Leads to larger libraries (> 800 cells)
– Easily integrated into technology mapping

Technology Mapping
Larger gates reduce capacitance, but are slower
[Figure: example logic network with signals a, b, c, d and output f; slack = 1]

Technology Mapping
Example: 4-input AND
• (a) Implemented using 4-input NAND + INV
• (b) Implemented using 2-input NAND + 2-input NOR
Gate type | Area (cell unit) | Input cap. (fF) | Average delay (ps), Library 1: High-Speed | Average delay (ps), Library 2: Low-Power
INV | 3 | 1.8 | 7.0 + 3.8 C_L | 12.0 + 6.0 C_L
NAND2 | 4 | 2.0 | 10.3 + 5.3 C_L | 16.3 + 8.8 C_L
NAND4 | 5 | 2.0 | 13.6 + 5.8 C_L | 22.7 + 10.2 C_L
NOR2 | 3 | 2.2 | 10.7 + 5.4 C_L | 16.7 + 8.9 C_L
(delay formula: C_L in fF)
(numbers calibrated for 90 nm)

Technology Mapping – Example
4-input AND | (a) NAND4 + INV | (b) NAND2 + NOR2
Area | 8 | 11
HS: Delay (ps) | 31.0 + 3.8 C_L | 32.7 + 5.4 C_L
LP: Delay (ps) | 53.1 + 6.0 C_L | 52.4 + 8.9 C_L
Sw Energy (fF) | 0.1 + 0.06 C_L | 0.83 + 0.06 C_L
• Area
– 4-input more compact than 2-input (2 gates vs. 3 gates)
• Timing
– both implementations are 2-stage realizations
– 2nd stage INV (a) is better driver than NOR2 (b)
– For more complex blocks, simpler gates will show better performance
• Energy
– Internal switching increases energy in the 2-input case
– Low-power library has worse delay, but lower leakage (see later)
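The area and delay entries of the comparison table follow from the library data on the previous slide: the first gate drives the input capacitance of the second, and the second drives the external load C_L. The sketch below recomputes them; the dictionary layout and function names are illustrative choices, not part of any tool flow.

```python
# (intrinsic delay in ps, slope in ps per fF of load), from the library table.
LIBRARIES = {
    "HS": {"INV": (7.0, 3.8), "NAND2": (10.3, 5.3),
           "NAND4": (13.6, 5.8), "NOR2": (10.7, 5.4)},
    "LP": {"INV": (12.0, 6.0), "NAND2": (16.3, 8.8),
           "NAND4": (22.7, 10.2), "NOR2": (16.7, 8.9)},
}
INPUT_CAP_FF = {"INV": 1.8, "NAND2": 2.0, "NAND4": 2.0, "NOR2": 2.2}
AREA = {"INV": 3, "NAND2": 4, "NAND4": 5, "NOR2": 3}


def two_stage_delay(library, first, second):
    """Return (constant ps, slope ps/fF) of `first` driving `second`,
    with `second` driving the external load C_L."""
    d0_first, k_first = LIBRARIES[library][first]
    d0_second, k_second = LIBRARIES[library][second]
    constant = d0_first + k_first * INPUT_CAP_FF[second] + d0_second
    return constant, k_second


area_a = AREA["NAND4"] + AREA["INV"]        # (a) NAND4 + INV
area_b = 2 * AREA["NAND2"] + AREA["NOR2"]   # (b) two NAND2 + NOR2
print(area_a, area_b)
for lib in ("HS", "LP"):
    print(lib, two_stage_delay(lib, "NAND4", "INV"),
          two_stage_delay(lib, "NAND2", "NOR2"))
```

The constants come out within rounding of the tabulated 31.0, 32.7, 53.1, and 52.4 ps, and the slopes are simply those of the output gate.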

Gate-Level Tradeoffs for Power
• Technology mapping
– Gate selection
– Sizing
– Pin assignment
• Logical Optimizations
– Factoring
– Restructuring
– Buffer insertion/deletion
– Don’t care optimization

Logic Restructuring
Logic restructuring to minimize spurious transitions
Buffer insertion for path balancing
[Figure: example gate networks illustrating restructuring and buffer insertion]

Algebraic Transformations
Idea: Modify network to reduce capacitance
Caveat: This may increase activity!
p_a = 0.1; p_b = 0.5; p_c = 0.5
[Figure: two implementations of f with inputs a, b, c; node signal probabilities p_1 = 0.05, p_2 = 0.05, p_3 = 0.075 in the first network and p_4 = 0.75, p_5 = 0.075 in the second]
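The node probabilities in the figure are consistent with f = ab + ac versus the factored form f = a(b + c); that mapping of p_1 … p_5 to nodes is an inference from the numbers, not stated on the slide. The sketch below reproduces the probabilities assuming independent inputs, and estimates the 0→1 transition probability of each node as p(1 − p), a common simplification that is also an assumption here.

```python
import math


def p_and(*probs):
    """Probability that all independent inputs are 1."""
    return math.prod(probs)


def p_or(*probs):
    """Probability that at least one independent input is 1."""
    return 1.0 - math.prod(1.0 - p for p in probs)


def activity(p):
    """0->1 transition probability for temporally independent values."""
    return p * (1.0 - p)


p_a, p_b, p_c = 0.1, 0.5, 0.5

# Network 1: f = ab + ac. The two AND outputs share input a, so the OR
# output is evaluated through the equivalent form a * (b + c).
p1 = p_and(p_a, p_b)
p2 = p_and(p_a, p_c)
p3 = p_a * p_or(p_b, p_c)

# Network 2: f = a(b + c).
p4 = p_or(p_b, p_c)
p5 = p_and(p_a, p4)

activity_1 = activity(p1) + activity(p2) + activity(p3)
activity_2 = activity(p4) + activity(p5)
print(p1, p2, p3, p4, p5)
print(activity_1, activity_2)
```

The factored network has one gate fewer, but its internal node b + c toggles far more often, which is exactly the caveat on the slide.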

Lessons from Circuit Optimization
• Joint optimization over multiple design parameters possible using sensitivity-based optimization framework
– Equal marginal costs ⇔ Energy-efficient design
• Peak performance is VERY power inefficient
– About 70% energy reduction for 20% delay penalty
– Additional variables for higher energy-efficiency
• Two supply voltages in general sufficient; 3 or more supply voltages only offer small advantage
• Choice between sizing and supply voltage parameters depends upon circuit topology
• But … leakage not considered so far

Considering Leakage @ Design Time
• Considering leakage as well as dynamic power is essential in sub-100 nm technologies
• Leakage is not necessarily a bad thing
– Increased leakage leads to improved performance, allowing for lower supply voltages
– Again a trade-off issue …

Leakage – Not Necessarily a Bad Thing
$$\left( \frac{E_{Lk}}{E_{Sw}} \right)_{opt} = \frac{2}{\ln\left( \frac{L_d}{\alpha_{avg}} \right) - K}$$
Optimal designs have high leakage (E_Lk/E_Sw ≈ 0.5)
Topology | Inv | Add | Dec
(E_Lk/E_Sw)_opt | 0.8 | 0.5 | 0.2
Must adapt to process and activity variations
[Figure: normalized energy E_norm versus E_static/E_dynamic (log scale, 10^-2 to 10^1). Version 1: V_th^ref - 180 mV, 0.81 V_DD^max. Version 2: V_th^ref - 140 mV, 0.52 V_DD^max]
[Ref: D. Markovic, JSSC’04]
© IEEE 2004

Refining the Optimization Model
• Switching energy
$$E_{dyn} = \alpha_{0 \to 1} \, K_e \, S \, (\gamma + f) \, V_{DD}^2$$
• Leakage energy
$$E_{stat} = S \, I_0(\Psi) \, e^{\frac{-V_{TH} + \lambda_d V_{DD}}{kT/q}} \, V_{DD} \, T_{cycle}$$
with:
I_0(Ψ): normalized leakage current with inputs in state Ψ
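A minimal sketch of the leakage-energy expression above, written to show how strongly E_stat reacts to V_TH. All parameter values in the example (size, I_0, λ_d, cycle time, temperature) are placeholders for illustration; note also that the expression as written uses kT/q without a subthreshold slope factor, so real devices respond less steeply.

```python
import math

BOLTZMANN = 1.380649e-23            # J/K
ELEMENTARY_CHARGE = 1.602176634e-19  # C


def thermal_voltage(temp_k=300.0):
    """kT/q in volts."""
    return BOLTZMANN * temp_k / ELEMENTARY_CHARGE


def static_energy(size, i0, v_th, v_dd, lambda_d, t_cycle, temp_k=300.0):
    """E_stat = S * I_0 * exp((-V_TH + lambda_d * V_DD) / (kT/q)) * V_DD * T_cycle."""
    exponent = (-v_th + lambda_d * v_dd) / thermal_voltage(temp_k)
    return size * i0 * math.exp(exponent) * v_dd * t_cycle


# Placeholder operating point, for illustration only.
base = static_energy(size=1.0, i0=1e-6, v_th=0.35, v_dd=1.2,
                     lambda_d=0.1, t_cycle=1e-9)
lower_vth = static_energy(size=1.0, i0=1e-6, v_th=0.25, v_dd=1.2,
                          lambda_d=0.1, t_cycle=1e-9)
print(lower_vth / base)  # exp(0.1 V / (kT/q)) under this model
```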

Reducing Leakage @ Design Time
• Using longer transistors
– Limited benefit
– Increase in active current
• Using higher thresholds
– Channel doping
– Stacked devices
– Body biasing
• Reducing the voltage!!

Longer Channels
• 10% longer gates reduce leakage by 50%
• Increases switching power by 18% with W/L = const.
• Doubling L reduces leakage by 5x
• Impacts performance
– Attractive when W does not have to be increased (e.g. memory)
[Figure: normalized switching energy and normalized leakage power versus transistor length (100 to 200 nm), 90 nm CMOS]

Using Multiple Thresholds
• There is no need for level conversion
• Dual thresholds can be added to standard design flows
– High-V_TH and low-V_TH libraries are standard in sub-0.18 µm processes
– For example: synthesize using only high-V_TH cells, then swap in low-V_TH cells in place to improve timing
– Second V_TH insertion can be combined with resizing
• Only two thresholds are needed per block
– Using more than two yields small improvements

Three V_TH’s
V_DD = 1.5 V, V_TH.1 = 0.3 V
Impact of third threshold very limited
[Figure: leakage reduction ratio as a function of V_TH.2 (V) and V_TH.3 (V), shown as a contour plot with a point marked “+” and as a surface plot]
[Ref: T. Kuroda, ICCAD’02]
© IEEE 2002

Using Multiple Thresholds
• Cell-by-cell V_TH assignment (not at block level)
• Achieves all-low-V_TH performance with substantial reduction in leakage
[Figure: logic between flip-flops (FF), with low-V_TH and high-V_TH cells distinguished]
[Ref: S. Date, SLPE’94]

Dual-V_T Domino
Low-threshold transistors used only in critical paths
[Figure: domino stage schematic with clocks Clk_n and Clk_n+1, data D_n and D_n+1, device P_1, and inverters Inv_1, Inv_2, Inv_3; shaded transistors are low threshold]

Multiple Thresholds and Design Methodology
• Easily introduced in standard cell design methodology by extending cell libraries with cells with different thresholds
– Selection of cells during technology mapping
– No impact on dynamic power
– No interface issues (as was the case with multiple V_DD’s)
• Impact: Can reduce leakage power substantially

Dual-V_TH Design for High-Performance Design
 | High-V_TH Only | Low-V_TH Only | Dual V_TH
Total Slack | -53 psec | 0 psec | 0 psec
Dynamic Power | 3.2 mW | 3.3 mW | 3.2 mW
Static Power | 914 nW | 3873 nW | 1519 nW
All designs synthesized automatically using Synopsys Flows
[Courtesy: Synopsys, Toshiba, 2004]
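The static-power row of the table can be turned into relative numbers with a few lines of arithmetic. The values are taken from the table; the dictionary keys are illustrative names.

```python
static_power_nw = {"high_vth": 914.0, "low_vth": 3873.0, "dual_vth": 1519.0}

# Leakage saved by the dual-V_TH design relative to the all-low-V_TH design,
# both of which meet timing (0 psec slack).
reduction_vs_low = 1.0 - static_power_nw["dual_vth"] / static_power_nw["low_vth"]

# Leakage paid relative to the all-high-V_TH design, which misses timing.
overhead_vs_high = static_power_nw["dual_vth"] / static_power_nw["high_vth"]

print(f"{reduction_vs_low:.1%} lower static power than low-V_TH only")
print(f"{overhead_vs_high:.2f}x the static power of high-V_TH only")
```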

Example: High- vs. Low-Threshold Libraries
[Figure: leakage power (nW, 0 to 8000) for selected combinational tests (i10, des, C7552, seq, pair, AVER), with bars for LVth, LVth+HVth, HVth, and HVth+LVth; 130 nm CMOS]
[Courtesy: Synopsys 2004]

Complex Gates Increase I_on/I_off Ratio
• I_on and I_off of single NMOS versus stack of 10 NMOS transistors
• Transistors in stack are sized up to give similar drive
[Figure: I_off (nA) versus V_DD (V) and I_on (µA) versus V_DD (V), each for “No stack” and “Stack”; 90 nm technology]

Complex Gates Increase I_on/I_off Ratio
Stacking transistors suppresses submicron effects
• Reduced velocity saturation
• Reduced DIBL effect
• Allows for operation at lower thresholds
[Figure: I_on/I_off ratio (× 10^5) versus V_DD (V) for “Stack” and “No stack”, annotated “Factor 10!”; 90 nm technology]

Complex Gates Increase I_on/I_off Ratio
• Example: 4-input NAND, Fan-in (2) versus Fan-in (4)
With transistors sized for similar performance:
Leakage of Fan-in(2) = Leakage of Fan-in(4) x 3
(Averaged over all possible input patterns)
[Figure: leakage current (nA) versus input pattern for Fan-in (2) and Fan-in (4)]

Example: 32 bit Kogge-Stone Adder
[Figure: % of input vectors versus standby leakage current (µA), annotated “factor 18”]
Reducing the threshold by 150 mV increases leakage of single NMOS transistor by factor 60
[Ref: S. Narendra, ISLPED’01]
© Springer 2001

Summary
• Circuit optimization can lead to substantial energy reduction at limited performance loss
• Energy-delay plots are the perfect mechanism for analyzing energy-delay trade-offs
• Well-defined optimization problem over W, V_DD and V_TH parameters
• Increasingly better support by today’s CAD flows
• Observe: leakage is not necessarily bad, if appropriately managed

References
Books:
• A. Bellaouar, M.I. Elmasry, Low-Power Digital VLSI Design Circuits and Systems, Kluwer Academic Publishers, 1st Ed, 1995.
• D. Chinnery, K. Keutzer, Closing the Gap Between ASIC and Custom, Springer, 2002.
• D. Chinnery, K. Keutzer, Closing the Power Gap Between ASIC and Custom, Springer, 2007.
• J. Rabaey, A. Chandrakasan, B. Nikolic, Digital Integrated Circuits: A Design Perspective, 2nd ed, Prentice Hall 2003.
• I. Sutherland, B. Sproull, D. Harris, Logical Effort: Designing Fast CMOS Circuits, Morgan-Kaufmann, 1st Ed, 1999.

Articles:
• R.W. Brodersen, M.A. Horowitz, D. Markovic, B. Nikolic, V. Stojanovic, “Methods for True Power Minimization,” Int. Conf. on Computer-Aided Design (ICCAD), pp. 35-42, Nov. 2002.
• S. Date, N. Shibata, S. Mutoh, and J. Yamada, “1V 30MHz Memory-Macrocell-Circuit Technology with a 0.5µm Multi-Threshold CMOS,” Proceedings of the 1994 Symposium on Low Power Electronics, San Diego, CA, pp. 90-91, Oct. 1994.
• M. Hamada, Y. Ootaguro, T. Kuroda, “Utilizing Surplus Timing for Power Reduction,” IEEE Custom Integrated Circuits Conf., (CICC), pp. 89-92, Sept. 2001.
• F. Ishihara, F. Sheikh, B. Nikolic, “Level conversion for dual-supply systems,” Int. Conf. Low Power Electronics and Design, (ISLPED), pp. 164-167, Aug. 2003.
• P.M. Kogge and H.S. Stone, “A Parallel Algorithm for the Efficient Solution of a General Class of Recurrence Equations,” IEEE Trans. Comput., vol. C-22, no. 8, pp. 786-793, Aug. 1973.
• T. Kuroda, “Optimization and control of V_DD and V_TH for low-power, high-speed CMOS design,” Proceedings ICCAD 2002, pp. , San Jose, Nov. 2002.
• H.C. Lin and L.W. Linholm, “An Optimized Output Stage for MOS Integrated Circuits,” IEEE J. Solid-State Circuits, vol. SC-10, no. 2, pp. 106-109, Apr. 1975.
• S. Ma and P. Franzon, “Energy Control and Accurate Delay Estimation in the Design of CMOS Buffers,” IEEE J. Solid-State Circuits, vol. 29, no. 9, pp. 1150-1153, Sept. 1994.
• D. Markovic, V. Stojanovic, B. Nikolic, M.A. Horowitz, R.W. Brodersen, “Methods for True Energy-Performance Optimization,” IEEE Journal of Solid-State Circuits, vol. 39, no. 8, pp. 1282-1293, Aug. 2004.
• MathWorks, http://www.mathworks.com
• S. Narendra, S. Borkar, V. De, D. Antoniadis, A. Chandrakasan, “Scaling of stack effect and its applications for leakage reduction,” Int. Conf. Low Power Electronics and Design, (ISLPED), pp. 195-200, Aug. 2001.
• T. Sakurai and R. Newton, “Alpha-Power Law MOSFET Model and its Applications to CMOS Inverter Delay and Other Formulas,” IEEE J. Solid-State Circuits, vol. 25, no. 2, pp. 584-594, Apr. 1990.
• Y. Shimazaki, R. Zlatanovici, B. Nikolic, “A shared-well dual-supply-voltage 64-bit ALU,” Int. Conf. Solid-State Circuits, (ISSCC), pp. 104-105, Feb. 2003.
• V. Stojanovic, D. Markovic, B. Nikolic, M.A. Horowitz, R.W. Brodersen, “Energy-Delay Tradeoffs in Combinational Logic using Gate Sizing and Supply Voltage Optimization,” European Solid-State Circuits Conf., (ESSCIRC), pp. 211-214, Sept. 2002.
• M. Takahashi et al., “A 60mW MPEG video codec using clustered voltage scaling with variable supply-voltage scheme,” IEEE Int. Solid-State Circuits Conf., (ISSCC), pp. 36-37, Feb. 1998.