Low Power VLSI Design
ECE Department
PSG College of Technology
Introduction
Progress in semiconductor technology: steadily shrinking minimum feature size
Consequences:
- reduced device capacitances
- higher integration densities
- performance improvements
- increased circuit complexities
Power
In the past, the major design concerns were area, performance, cost and reliability.
In recent years, power is being given weight comparable to area and speed considerations.
Power and Energy Figures of Merit
Power consumption in Watts
- determines battery life in hours
Peak power
- determines power and ground wiring designs
- sets packaging limits
- impacts signal noise margin and reliability analysis
Energy efficiency (energy in Joules)
- power consumed over time: Energy = power * delay (Joules = Watts * seconds)
- a lower energy number means less power is needed to perform a computation at the same frequency
Low Power VLSI Design: the art of power analysis and optimization
INTEREST IN LOW POWER CHIPS AND SYSTEMS
BUSINESS NEEDS
- A growing class of personal computing devices, as well as wireless communications and imaging systems, demand high-speed computation, complex functionality and often real-time processing capabilities with low power consumption.
TECHNICAL NEEDS
- Excessive power consumption is becoming the limiting factor in integrating more transistors on a single chip or on a multichip module.
- Unless power consumption is reduced, the resulting heat will limit the feasible packing density and performance of VLSI circuits and systems.
Need for Low Power VLSI Chips
Driving forces in the evolution of integrated circuits:
Increased Market Demand
High Performance Computing Systems
Environmental concerns
Why Power Matters
Packaging costs
Power supply rail design
Chip and system cooling costs
Noise immunity and system reliability
Battery life (in portable systems)
Environmental concerns
- Office equipment accounted for 5% of total US commercial energy usage in 1993
- Energy Star compliant systems
CMOS Energy & Power Equations
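$$E = C_L V_{DD}^2 P_{0\to1} + t_{sc} V_{DD} I_{peak} P_{0\to1} + \frac{V_{DD} I_{leakage}}{f_{clock}}$$
$$f_{0\to1} = P_{0\to1} \cdot f_{clock}$$
$$P = \underbrace{C_L V_{DD}^2 f_{0\to1}}_{\text{dynamic power}} + \underbrace{t_{sc} V_{DD} I_{peak} f_{0\to1}}_{\text{short-circuit power}} + \underbrace{V_{DD} I_{leakage}}_{\text{leakage power}}$$
E is the energy per clock cycle, so P = E * fclock.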
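A minimal Python sketch that evaluates the three power terms and the per-cycle energy term by term. The numbers in the example call are hypothetical and only illustrate units (CL in farads, VDD in volts, currents in amperes, tsc in seconds, fclock in hertz).

```python
def cmos_power(c_load, vdd, p01, f_clock, t_sc, i_peak, i_leak):
    """Return (dynamic, short_circuit, leakage, total) power in watts."""
    f01 = p01 * f_clock                   # 0->1 transition rate: f01 = P01 * fclock
    p_dyn = c_load * vdd ** 2 * f01       # CL * VDD^2 * f01
    p_sc = t_sc * vdd * i_peak * f01      # tsc * VDD * Ipeak * f01
    p_leak = vdd * i_leak                 # VDD * Ileakage
    return p_dyn, p_sc, p_leak, p_dyn + p_sc + p_leak


def energy_per_cycle(c_load, vdd, p01, f_clock, t_sc, i_peak, i_leak):
    """Energy per clock cycle, written term by term so that P = E * fclock."""
    return (c_load * vdd ** 2 * p01
            + t_sc * vdd * i_peak * p01
            + vdd * i_leak / f_clock)


# Hypothetical values: 50 fF load, 1.2 V supply, P01 = 0.25, 1 GHz clock,
# tsc = 20 ps, Ipeak = 100 uA, Ileakage = 10 nA
args = (50e-15, 1.2, 0.25, 1e9, 20e-12, 100e-6, 10e-9)
dyn, sc, leak, total = cmos_power(*args)
```

With these made-up numbers the dynamic term dominates: about 18 µW, against 0.6 µW of short-circuit power and 12 nW of leakage.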
Dynamic Power Dissipation
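Figure: CMOS inverter (input Vin, output Vout) driving load capacitance CL from supply Vdd.
$$\text{Energy/transition} = C_L V_{dd}^2$$
$$\text{Power} = (\text{Energy/transition}) \times \text{frequency} = C_L V_{dd}^2 f$$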
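CMOS Inverter: Transient Response
Figure: (a) low-to-high transition: Vin = 0, CL charges from VDD through the PMOS on-resistance Rp; (b) high-to-low transition: Vin = VDD, CL discharges through the NMOS on-resistance Rn.
$$t_{pHL} = f(R_{on} C_L) = 0.69\, R_{on} C_L$$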
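Where the 0.69 comes from, as a sketch: if the conducting transistor is modeled as a linear resistor Ron (a simplification), the output decays as VDD e^(-t/(Ron CL)) and crosses VDD/2 at t = ln(2) Ron CL ≈ 0.69 Ron CL. The resistance and capacitance below are hypothetical.

```python
import math


def t_phl(r_on, c_load):
    """50% propagation delay of a first-order RC discharge: ln(2) * Ron * CL."""
    return math.log(2) * r_on * c_load


def v_out(t, vdd, r_on, c_load):
    """Output voltage of the RC discharge, starting from VDD at t = 0."""
    return vdd * math.exp(-t / (r_on * c_load))


# Hypothetical values: 10 kOhm effective on-resistance, 100 fF load
tp = t_phl(10e3, 100e-15)   # about 0.69 ns
```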
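Dynamic Power: Charging and Discharging of a Capacitance
According to the laws of physics,
$$i_c(t) = C_L \frac{dv_c(t)}{dt}$$
During the charging cycle, the energy drawn from the supply V is
$$E_s = \int_{t_0}^{t_1} V\, i_c(t)\, dt = C_L V^2$$
Dynamic power contd.
Energy stored in the capacitor:
$$E_{cap} = \int_{t_0}^{t_1} v_c(t)\, i_c(t)\, dt = \frac{1}{2} C_L V^2$$
Energy dissipated in the charging resistance Rc:
$$E_c = E_s - E_{cap} = \frac{1}{2} C_L V^2$$
Dynamic power contd.
During the discharge cycle, the energy stored in the capacitor, $E_{cap} = \frac{1}{2} C_L V^2$, is dissipated in the discharging resistance Rd ($i_c(t) < 0$ while discharging):
$$E_d = -\int_{t_1}^{t_2} v_c(t)\, i_c(t)\, dt = \frac{1}{2} C_L V^2$$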
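The energy split above can be checked numerically. This sketch assumes an ideal supply V, a linear charging resistor R and an initially empty capacitor; it integrates i(t) = (V/R) e^(-t/RC) with the midpoint rule over 20 time constants. The component values are hypothetical.

```python
import math


def rc_charge_energies(c, v, r, steps=100_000, n_tau=20.0):
    """Charge capacitor c from 0 V to supply v through resistor r.

    Returns (energy drawn from supply, energy stored in c, energy lost in r).
    """
    tau = r * c
    dt = n_tau * tau / steps
    e_supply = e_resistor = 0.0
    for k in range(steps):
        i = (v / r) * math.exp(-(k + 0.5) * dt / tau)   # midpoint of each step
        e_supply += v * i * dt          # supply delivers V * i(t)
        e_resistor += i * i * r * dt    # resistor dissipates i(t)^2 * R
    v_cap = v * (1.0 - math.exp(-n_tau))  # capacitor voltage at the end
    e_cap = 0.5 * c * v_cap ** 2
    return e_supply, e_cap, e_resistor


# Hypothetical values: 1 pF, 5 V, 1 kOhm
e_s, e_cap, e_r = rc_charge_energies(1e-12, 5.0, 1e3)
```

The supply delivers CL V^2 = 25 pJ; half is stored and half is burned in R, and changing R only changes how fast this happens, not the split.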
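Dynamic Power Consumption
Figure: CMOS inverter (Vin, Vout) driving load CL from Vdd.
$$\text{Energy/transition} = C_L V_{DD}^2 P_{0\to1}$$
$$P_{dyn} = \text{Energy/transition} \times f = C_L V_{DD}^2 P_{0\to1} f$$
$$P_{dyn} = C_{EFF} V_{DD}^2 f, \quad \text{where } C_{EFF} = P_{0\to1} C_L$$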
Not a function of transistor sizes!
Data dependent - a function of switching activity!
Lowering Dynamic Power
$$P_{dyn} = C_L V_{DD}^2 P_{0\to1} f$$
- Capacitance: a function of fan-out, wire length and transistor sizes
- Supply voltage: has been dropping with successive generations
- Activity factor: how often, on average, do wires switch?
- Clock frequency: increasing
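Short Circuit Power Consumption
Figure: CMOS inverter (input Vin, output Vout, load CL) showing the short-circuit current Isc flowing from VDD to GND.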
Finite slope of the input signal causes a direct current path
between VDD and GND for a short period of time during
switching when both the NMOS and PMOS transistors are
conducting.
Short Circuit Current Determinants
$$E_{sc} = t_{sc} V_{DD} I_{peak} P_{0\to1}$$
$$P_{sc} = t_{sc} V_{DD} I_{peak} f_{0\to1}$$
- Duration and slope of the input signal, tsc
- Ipeak, which is
  - determined by the saturation current of the P and N transistors, which depends on their sizes, process technology, temperature, etc.
  - a strong function of the ratio between input and output slopes
  - a function of CL
Impact of CL on Psc
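Figure: inverter switching with (left) a large capacitive load, where Isc ≈ 0, and (right) a small capacitive load, where Isc ≈ Imax.
- Large capacitive load: the output fall time is significantly larger than the input rise time.
- Small capacitive load: the output fall time is substantially smaller than the input rise time.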
Ipeak as a Function of CL
Figure: Ipeak (A, ×10^-4) vs. time (s, ×10^-10) for CL = 20 fF, 100 fF and 500 fF, with a 500 ps input slope. When the load capacitance is small, Ipeak is large.
Short-circuit dissipation is minimized by matching the rise/fall times of the input and output signals (slope engineering).
Psc as a Function of Rise/Fall Times
Figure: power normalized with respect to the zero-input-rise-time dissipation, plotted vs. tsin/tsout for VDD = 3.3 V, 2.5 V and 1.5 V (W/Lp = 1.125 µm/0.25 µm, W/Ln = 0.375 µm/0.25 µm, CL = 30 fF).
When the load capacitance is small (tsin/tsout > 2 for VDD > 2 V), the power is dominated by Psc.
If VDD < VTn + |VTp|, then Psc is eliminated, since both devices are never on at the same time.
Leakage (Static) Power Consumption
Figure: CMOS inverter (output Vout) with the leakage components that make up VDD Ileakage: drain junction leakage, gate leakage and sub-threshold current.
Sub-threshold current is the dominant factor.
All increase exponentially with temperature!
Leakage as a Function of VT
Continued scaling of the supply voltage, and the resulting scaling of the threshold voltage, will make subthreshold conduction a dominant component of power dissipation.
Figure: leakage current on a logarithmic scale (10^-12 to 10^-2) as a function of VT.
A 90 mV/decade VT roll-off means that each 255 mV increase in VT gives roughly 3 orders of magnitude (255/90 ≈ 2.8 decades) reduction in leakage, but adversely affects performance.
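A small sketch of the roll-off arithmetic above. It models only the VT dependence, Ileak ∝ 10^(-VT/S) with S the roll-off in mV/decade, and ignores every other effect on leakage.

```python
def decades(delta_vt_mv, slope_mv_per_decade=90.0):
    """Decades of leakage reduction for a VT increase of delta_vt_mv."""
    return delta_vt_mv / slope_mv_per_decade


def leakage_reduction(delta_vt_mv, slope_mv_per_decade=90.0):
    """Factor by which subthreshold leakage drops when VT rises by delta_vt_mv."""
    return 10 ** decades(delta_vt_mv, slope_mv_per_decade)


# 255 mV at 90 mV/decade: about 2.8 decades, a factor of roughly 680
factor = leakage_reduction(255)
```

A full three decades (a factor of 1000) at 90 mV/decade would need a 270 mV increase.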
Leakage
Figure: CMOS inverter (Vdd, Vout) showing drain junction leakage and sub-threshold current.
Sub-threshold current is one of the most compelling issues in low-energy circuit design!
Sub-Threshold Current Dominant Factor
Reverse-Biased Diode Leakage
Figure: p+ source/drain diffusions beside the GATE; the junction reverse biased by Vdd carries a reverse leakage current.
$$I_{DL} = J_S \times A$$
- Occurs when the source or drain of an N transistor is at Vdd.
- JS = 1-5 pA/µm² for a 1.2 µm CMOS technology
- PN junctions are formed at the source or drain of transistors because of a parasitic effect of the bulk CMOS device structure.
- The junction current at the source or drain of the transistors is picked up through the bulk or well contact.
- JS = 10-100 pA/µm² at 25 °C for 0.25 µm CMOS
- JS doubles for every 9 °C increase in temperature!
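A sketch of IDL = JS × A with the "doubles every 9 °C" rule applied as a pure exponential, JS(T) = JS(25 °C) · 2^((T - 25)/9). That exact exponential form, the junction area and the JS value in the example are assumptions for illustration.

```python
def junction_leakage(area_um2, js_25c_pa_per_um2, temp_c, doubling_c=9.0):
    """IDL = JS * A in pA, with JS doubling every `doubling_c` degrees above 25 C."""
    js = js_25c_pa_per_um2 * 2 ** ((temp_c - 25.0) / doubling_c)
    return js * area_um2


# Hypothetical: 1 um^2 junction, JS = 10 pA/um^2 at 25 C, heated to 70 C
i_hot = junction_leakage(1.0, 10.0, 70.0)   # 2^(45/9) = 32x -> 320 pA
```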
Subthreshold Leakage Component
Even though the transistor is logically turned off, there is a non-zero leakage current through the channel at the microscopic level.
TSMC Processes: Leakage and VT
                        CL018 G   CL018 LP   CL018 ULP   CL018 HS   CL015 HS   CL013 HS
Vdd                     1.8 V     1.8 V      1.8 V       2 V        1.5 V      1.2 V
Tox (effective, Å)      42        42         42          42         29         24
Lgate (µm)              0.16      0.16       0.18        0.13       0.11       0.08
IDSat n/p (µA/µm)       600/260   500/180    320/130     780/360    860/370    920/400
Ioff, leakage (pA/µm)   20        1.60       0.15        300        1,800      13,000
VTn                     0.42 V    0.63 V     0.73 V      0.40 V     0.29 V     0.25 V
FET perf. (GHz)         30        22         14          43         52         80
Exponential Increase in Leakage Currents
Figure: Ileakage (nA/µm) vs. temperature (°C). From De, 1999.
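Static Power Consumption
Figure: gate with a static current path Istat from Vdd while Vin = 5 V (output Vout, load CL).
$$P_{stat} = P(In{=}1) \cdot V_{dd} \cdot I_{stat}$$
- Wasted energy; should be avoided in almost all cases
- Dominates over dynamic consumption
- Not a function of switching frequency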
Review: Energy & Power Equations
$$E = C_L V_{DD}^2 P_{0\to1} + t_{sc} V_{DD} I_{peak} P_{0\to1} + \frac{V_{DD} I_{leakage}}{f_{clock}}$$
$$f_{0\to1} = P_{0\to1} \cdot f_{clock}$$
$$P = C_L V_{DD}^2 f_{0\to1} + t_{sc} V_{DD} I_{peak} f_{0\to1} + V_{DD} I_{leakage}$$
- Dynamic power (~90% today and decreasing relatively)
- Short-circuit power (~8% today and decreasing absolutely)
- Leakage power (~2% today and increasing)
Power and Energy Design Space
Constant throughput/latency: design time and non-active modules; variable throughput/latency: run time.
Energy (active):
- Design time: logic design, reduced Vdd, sizing, multi-Vdd
- Non-active modules: clock gating
- Run time: DFS, DVS (dynamic frequency and voltage scaling)
Leakage:
- Design time: + multi-VT
- Non-active modules: sleep transistors, multi-Vdd, variable VT
- Run time: + variable VT
Approaches for Low Power
Reducing Chip and Package Capacitance
- achieved through process development such as SOI
- Closer packing of P and N transistors
- Lower Parasitic substrate Capacitance
- Achieved through advanced interconnect structures
Scaling down the supply voltage
- Power Consumption is quadratically dependent on supply
voltage
- Supporting circuits can be employed
Employing better power Management techniques
- Careful Management of performance and throughput of the
system based on its computational needs
Employing better design techniques
CMOS INVERTER with CL=1 pF
Schematic: CMOS inverter (vin, vout) with PMOS and NMOS both W = 22 µm, L = 2 µm, driving a 1 pF load capacitor.
Figure: Simulation output of inverter with CL = 1 pF: input v(vin) and output v(vout), voltage (0-5 V) vs. time (0-100 ns).
INVERTER with CL=10 pF
Schematic: CMOS inverter (vin, vout) with PMOS and NMOS both W = 22 µm, L = 2 µm, driving a 10 pF load capacitor.
Figure: Simulation output of inverter with CL = 10 pF: input v(vin) and output v(vout), voltage (0-5 V) vs. time (0-50 ns).
CMOS INVERTER with CL=25 pF
Schematic: CMOS inverter (vin, vout) with PMOS and NMOS both W = 22 µm, L = 2 µm, driving a 25 pF load capacitor.
Figure: Simulation output of inverter with CL = 25 pF: output v(vout) (voltage axis 0-1.5 V) and input v(vin) (0-5 V) vs. time (0-100 ns).
POWER RESULTS (1.25 µm technology)
CL       POWER (Watts)
1 pF     3.16 × 10^-4
10 pF    3.6 × 10^-3
25 pF    1.486 × 10^-2
Figure: Simulation output of inverter (1.25 µm technology): output v(vout) and input v(vin), voltage (0-5 V) vs. time (0-100 ns).
Figure: Simulation output of inverter (0.18 µm technology): output v(vout) (voltage axis -0.5 to 2.0 V) and input v(vin) (0-1.5 V) vs. time (0-100 ns).
POWER RESULTS
Technology   POWER (Watts)
1.25 µm      2.9 × 10^-3
0.18 µm      6.73 × 10^-5
Modification for Circuits with Reduced Swing
Figure: gate driving CL whose output swings only from 0 to Vdd - Vt.
$$E_{0\to1} = C_L V_{dd} (V_{dd} - V_t)$$
Can exploit reduced swing to lower power
(e.g., reduced bit-line swing in memory)
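A short sketch comparing the two energy expressions: with a reduced swing the charge drawn is CL (Vdd - Vt), but it is still drawn from the Vdd supply, which gives CL Vdd (Vdd - Vt). The component values are hypothetical.

```python
def energy_full_swing(c_load, vdd):
    """E(0->1) for a rail-to-rail output: CL * Vdd^2."""
    return c_load * vdd ** 2


def energy_reduced_swing(c_load, vdd, vt):
    """E(0->1) when the output only rises to Vdd - Vt: charge CL*(Vdd - Vt) drawn at Vdd."""
    return c_load * vdd * (vdd - vt)


# Hypothetical values: 1 pF, Vdd = 2.5 V, Vt = 0.5 V
full = energy_full_swing(1e-12, 2.5)               # 6.25 pJ
reduced = energy_reduced_swing(1e-12, 2.5, 0.5)    # 5.0 pJ, i.e. 80% of full swing
```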
Figure: Simulation output of inverter with input 1111100000 (panel label SV=11110000): output v(vout) and input v(vin), voltage (V) vs. time (0-50 ns).
Figure: Simulation output of inverter with input 1010101010 (panel label SV=10101010): output v(vout) and input v(vin), voltage (V) vs. time (0-50 ns).
Figure: Simulation output of inverter with input 1111100000 (panel label SV=11101011): output v(vout) and input v(vin), voltage (V) vs. time (0-50 ns).
Bit stream    Average power (Watts)
1111100000    9.5 × 10^-5
1010101010    2.8 × 10^-4
1110000000    1.44 × 10^-4
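A sketch for counting the switching in each bit stream, assuming the stream is applied once (not repeated). For an inverter, the output 0->1 transitions, which draw CL Vdd^2 from the supply, are the input 1->0 transitions.

```python
def toggles(bits):
    """Number of changes between consecutive bits of a stream such as '1010101010'."""
    return sum(1 for x, y in zip(bits, bits[1:]) if x != y)


def output_rises(bits):
    """0->1 transitions at an inverter output = 1->0 transitions at its input."""
    return sum(1 for x, y in zip(bits, bits[1:]) if x == "1" and y == "0")


for stream in ("1111100000", "1010101010", "1110000000"):
    print(stream, toggles(stream), output_rises(stream))
```

This prints 1 and 1 for 1111100000, 9 and 5 for 1010101010, and 1 and 1 for 1110000000. The alternating stream has by far the most activity, which agrees with it having the highest measured power.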
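NAND GATE
clear all;
close all;
clc;
% 0->1 output transition probability of a 2-input NAND: P0*P1 = (1 - pa*pb)*(pa*pb)
P = @(pa, pb) (1 - pa.*pb).*(pa.*pb);   % element-wise ops so the plot can evaluate it on a grid
figure(1);
ezsurfc(P, [0, 1.0, 0, 1.0]);
title('NAND GATE');                     % set after plotting: ezsurfc replaces the title
view(40, 60);
NOR GATE
clc;
clear all;
% NOR: P0*P1 = (1 - pa - pb + pa*pb)*(pa + pb - pa*pb)
P = @(pa, pb) (1 - pa - pb + pa.*pb).*(pa + pb - pa.*pb);
figure(3);
ezsurf(P, [0, 1.0, 0, 1.0]);
title('NOR GATE');
view(30, 30);
XOR GATE
clc;
clear all;
% XOR: P0*P1 = (pa + pb - 2*pa*pb)*(1 - pa - pb + 2*pa*pb)
P = @(pa, pb) (pa + pb - 2*pa.*pb).*(1 - pa - pb + 2*pa.*pb);
figure(5);
ezsurfc(P, [0, 1.0, 0, 1.0]);
title('XOR GATE');
NOT GATE
clc;
clear all;
% NOT: one input, so the result is a curve rather than a surface
P = @(pa) (1 - pa).*pa;
figure(7);
fplot(P, [0, 1.0]);
title('NOT GATE');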
COMPARISON TABLE
GATE          EXPRESSION                                P0->1 for Pa = Pb = 0.5
NAND / AND    PaPb(1 - PaPb)                            0.1875
NOR / OR      (Pa + Pb - PaPb)(1 - Pa - Pb + PaPb)      0.1875
XNOR / XOR    (1 - Pa - Pb + 2PaPb)(Pa + Pb - 2PaPb)    0.25
NOT           Pa(1 - Pa)                                0.25
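The table entries follow from P0->1 = P0 · P1 of the gate output. A sketch, assuming independent inputs and temporally independent successive values (the same assumptions behind the expressions above):

```python
def p01(p_one):
    """0->1 transition probability of a node whose signal probability is p_one."""
    return (1 - p_one) * p_one


# Probability that each gate's output is 1, for independent inputs a and b
GATE_P1 = {
    "AND": lambda pa, pb: pa * pb,
    "NAND": lambda pa, pb: 1 - pa * pb,
    "OR": lambda pa, pb: pa + pb - pa * pb,
    "NOR": lambda pa, pb: 1 - (pa + pb - pa * pb),
    "XOR": lambda pa, pb: pa + pb - 2 * pa * pb,
    "XNOR": lambda pa, pb: 1 - (pa + pb - 2 * pa * pb),
}


def gate_p01(gate, pa, pb=None):
    """P0->1 at the output of `gate` for input signal probabilities pa (and pb)."""
    if gate == "NOT":
        return p01(1 - pa)
    return p01(GATE_P1[gate](pa, pb))
```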
SWITCHING POWER DISSIPATION
Figure: Four-input AND gate built using two-input AND gates, inputs x1-x4, output F: (a) chain structure, (b) tree structure.
Probabilities for Tree and Chain Topologies (P0->1 at the outputs of the first gate, the second gate, and F)
          Gate 1   Gate 2   F
Chain     3/16     7/64     15/256
Tree      3/16     3/16     15/256
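The chain and tree entries can be reproduced exactly with fractions. This sketch assumes independent inputs, each with signal probability 1/2, temporal independence and zero gate delay, so glitches are not modeled.

```python
from fractions import Fraction


def p_and(pa, pb):
    """Signal probability of a 2-input AND with independent inputs."""
    return pa * pb


def p01(p):
    """0->1 transition probability under temporal independence."""
    return (1 - p) * p


half = Fraction(1, 2)

# Chain: O1 = x1 x2, O2 = O1 x3, F = O2 x4
o1 = p_and(half, half)
o2 = p_and(o1, half)
f_chain = p_and(o2, half)
chain = [p01(o1), p01(o2), p01(f_chain)]

# Tree: O1 = x1 x2, O2 = x3 x4, F = O1 O2
t1 = p_and(half, half)
t2 = p_and(half, half)
f_tree = p_and(t1, t2)
tree = [p01(t1), p01(t2), p01(f_tree)]
```

Summed over all three nodes, the chain gives 91/256 and the tree 111/256, so under this zero-delay model the chain switches less.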
Implementing the function F=A(B+C) by using TREE and CHAIN
CHAIN IMPLEMENTATION
Schematic diagrams: the B+C sub-circuit (inputs bin, cin, output out), the AND gate (inputs ain, xin, output out) and the INVERTER (in, out); all transistors W = 22 µm, L = 2 µm.
FUNCTION F=A(B+C): top-level schematic built from the B+C, inverter and AND blocks, with inputs ain, bin and cin.
WAVEFORM 0.5 PROBABILITY
Figure: A-, B- and C-input waveforms v(ain), v(bin), v(cin), each with probability 0.5, and the OUTPUT waveform v(out); voltage (0-5 V) vs. time (0-100 ns).
FUNCTION F=A(B+C): (UNEQUAL PROBABILITY)
WAVEFORM FOR UNEQUAL PROBABILITY
Figure: A-, B- and C-input waveforms v(ain), v(bin), v(cin) with unequal probabilities, and the OUTPUT waveform v(out); voltage (0-5 V) vs. time (0-100 ns).
TREE IMPLEMENTATION OF F=A(B+C)
Schematic diagrams: NAND gates computing A NAND B (inputs ain, bin) and A NAND C (inputs ain, cin), an INVERTER (XIN, XOUT) and a NOR gate (XIN, YIN, OUT); all transistors W = 22 µm, L = 2 µm.
TREE F=A(B+C): top-level schematic in which A NAND B and A NAND C are each followed by an inverter and feed a NOR gate, whose output is inverted to give out.
WAVEFORM 0.5 PROBABILITY
Figure: A-, B- and C-input waveforms v(ain), v(bin), v(cin), each with probability 0.5, and the OUTPUT waveform v(out) of the tree implementation; voltage (V) vs. time (0-100 ns).
WAVEFORM FOR UNEQUAL PROBABILITY
Figure: A-, B- and C-input waveforms v(ain), v(bin), v(cin) with unequal probabilities, and the OUTPUT waveform v(out) of the tree implementation; voltage (V) vs. time (0-100 ns).
COMPARISON OF POWER FOR VARIOUS CONFIGURATIONS
FUNCTION F=A(B+C)
Circuit implementation   0.5 input probability   Unequal input probability
Chain implementation     1.245 × 10^-3 W         3.321 × 10^-4 W
Tree implementation      1.913 × 10^-3 W         5.282 × 10^-4 W
POWER ANALYSIS
CIRCUIT LEVEL
GATE LEVEL
ARCHITECTURAL LEVEL
REDUCING THE POWER DISSIPATION AT THE DEVICE AND CIRCUIT LEVELS
AT THE DEVICE LEVEL,
Silicon on insulator (SOI) technology.
Place and route optimization.
Transistor sizing.
Using submicron devices.
Reducing the sub threshold voltages.
AT THE CIRCUIT AND LOGIC LEVELS,
Reduce gate capacitance.
Reduced logic swing.
Low power support circuitry.
Logic level power down.
Multi threshold circuit technology.
Scaled multi buffer stages.
Circuit Level
Transistor and Gate Sizing
Network Restructuring
Special Latches and Flip Flops
Transistor sizing for leakage power reduction and speed increase
Figure: three sizings of a gate's PMOS (Wp/Lp) and NMOS (Wn/Ln) transistors, with Trise, Tfall and Pleakage normalized to the first:
- Wp/Lp = 20/1, Wn/Ln = 10/1: Trise = 1, Tfall = 1, Pleakage = 1
- Wp/Lp = 20/1, Wn/Ln = 10/2: Trise = 1, Tfall = 2, Pleakage = 0.1
- Wp/Lp = 20/0.5, Wn/Ln = 10/1: Trise = 0.5, Tfall = 1, Pleakage = 1
For all cases, the static probability is 0.99.
Network Restructuring
Four different circuit implementations of Y = A(B + C)
Figure: four transistor-level implementations (a)-(d) of Y = A(B+C), each powered from VDD, with inputs A, B and C.
Logic Level
Signal Gating
Logic Encoding
Precomputation Logic
POWER REDUCTION AT LOGIC LEVEL
Logic level power optimization techniques:
Reduction of switching activities.
Logic Encoding
Data representation
Boolean function implementation
Elimination of stray switching activities
GATE REORGANIZATION
Network reorganization is applied to the gate-level network to produce logically equivalent networks with different qualities of power, area and delay.
Logic Reconstruction Techniques
LOCAL RECONSTRUCTION
Transform one logic circuit into another that is functionally equivalent.
Local reconstruction rules.
Gate reorganization applies a series of local transformations.
The best among the many generated and evaluated circuits is chosen.
TRANSFORMATION OPERATORS
COMBINE -> hides high-frequency nodes inside a cell so that their node capacitance is not switched.
DECOMPOSE and DUPLICATE -> separate the critical path from non-critical paths.
DELETE -> reduces circuit size.
ADD -> provides an intermediate circuit that might eventually yield a better one.
SIGNAL GATING
Mask unwanted switching activities.
Methods
AND or OR gates.
Latch or Flip flops.
Transmission gate or Tristate buffer.
Clock signals, address buses, data buses, and signals with high frequency or glitches are good candidates for gating.
Signal Gating
Figure: signal gating using (a) a simple gate, (b) a tri-state buffer, (c) a latch / FF, (d) a transmission gate.
Signal gating masks unwanted switching activities so that they do not propagate forward and cause unnecessary power dissipation.
Different ways of implementation:
1. Put an AND / OR gate in the signal path to stop the propagation of the signal when it needs to be masked.
2. Use a latch / FF to block the propagation of the signal.
3. A transmission gate or a tri-state buffer can be used in place of a latch if charge leakage is not a concern.
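A behavioral sketch of method 1 (AND-gate gating): downstream logic sees the signal only while a hypothetical `enable` is 1, so the toggles outside that window never propagate. The signal and enable patterns are made up for illustration.

```python
def gated(signal, enable):
    """AND-gate signal gating: the output follows `signal` only while enable = 1."""
    return [s & e for s, e in zip(signal, enable)]


def toggle_count(values):
    """Number of transitions seen by the logic driven by `values`."""
    return sum(1 for a, b in zip(values, values[1:]) if a != b)


# Hypothetical high-activity signal that is only needed while enable = 1
signal = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]
enable = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
masked = gated(signal, enable)   # downstream toggles drop from 9 to 4
```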
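Binary and Gray Code Counting Sequences
(No. toggles: bits that change to reach each code from the previous one; the first row counts the wrap-around from the last code.)
Binary code               Gray code
Sequence   No. toggles    Sequence   No. toggles
000        3              000        1
001        1              001        1
010        2              011        1
011        1              010        1
100        3              110        1
101        1              111        1
110        2              101        1
111        1              100        1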
Toggle Activities of Binary vs. Gray Code Counter (per full counting cycle)
No. of bits   Binary Bn = 2(2^n - 1)   Gray Gn = 2^n   Bn / Gn
2             6                        4               1.5
3             14                       8               1.75
4             30                       16              1.88
5             62                       32              1.94
6             126                      64              1.97
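The toggle counts can be checked by brute force. This sketch counts bit changes over one full counting cycle, including the wrap-around, and uses the standard binary-reflected Gray code g = i XOR (i >> 1).

```python
def gray(i):
    """Binary-reflected Gray code of i."""
    return i ^ (i >> 1)


def cycle_toggles(codes):
    """Total bit toggles over one full counting cycle, including the wrap-around."""
    n = len(codes)
    return sum(bin(codes[k] ^ codes[(k + 1) % n]).count("1") for k in range(n))


def counter_toggles(bits):
    """(binary toggles, Gray toggles) for an n-bit counter."""
    binary = list(range(2 ** bits))
    return cycle_toggles(binary), cycle_toggles([gray(i) for i in binary])
```

For example, counter_toggles(3) returns (14, 8). The ratio 2(2^n - 1)/2^n approaches 2 as n grows.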
Data Representation
- 2's complement: sign extension (the MSB sign bits all switch on positive-to-negative transitions)
- Sign magnitude: one bit allocated for the sign bit (switching is minimum)
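A sketch of why sign magnitude switches less when small values change sign. It counts the bus lines that toggle between the bit patterns of +x and -x; the 8-bit word width is an arbitrary choice for the example.

```python
def twos_complement(x, bits):
    """Two's complement bit pattern of integer x."""
    return x & ((1 << bits) - 1)


def sign_magnitude(x, bits):
    """Sign-magnitude bit pattern: MSB is the sign, the remaining bits hold |x|."""
    sign = 1 if x < 0 else 0
    return (sign << (bits - 1)) | (abs(x) & ((1 << (bits - 1)) - 1))


def toggles(a, b):
    """Number of bus lines that switch between patterns a and b."""
    return bin(a ^ b).count("1")


# +1 -> -1 on an 8-bit bus: 00000001 -> 11111111 vs. 00000001 -> 10000001
tc = toggles(twos_complement(1, 8), twos_complement(-1, 8))   # 7 lines switch
sm = toggles(sign_magnitude(1, 8), sign_magnitude(-1, 8))     # 1 line switches
```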
LOGIC ENCODING (bus-invert coding)
Maximum toggling reduces from n to n/2.
For mutually uncorrelated signals:
Num. bits   Regular bus E[P]   Invert bus E[Q]   Invert / Regular E[Q]/E[P]
2           1                  0.75              0.75
4           2                  1.56              0.781
8           4                  3.27              0.817
16          8                  6.83              0.854
32          16                 14.19             0.886
64          32                 29.27             0.915
128         64                 59.96             0.937
256         128                122.1             0.954
∞           -                  -                 1.00
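A sketch of bus-invert coding that reproduces the E[Q] column. It assumes n data lines plus one invert line, and that the Hamming distance H over those n + 1 lines between consecutive bus words is Binomial(n + 1, 1/2) for uncorrelated data. The encoder sends either the plain word or its complement, whichever toggles fewer lines, so each transfer costs min(H, n + 1 - H) toggles.

```python
from math import comb


def expected_regular(n):
    """Expected transitions per transfer on an n-bit bus with uncorrelated data."""
    return n / 2


def expected_bus_invert(n):
    """Expected transitions per transfer with bus-invert coding (n data + 1 invert line)."""
    lines = n + 1
    total = sum(comb(lines, h) * min(h, lines - h) for h in range(lines + 1))
    return total / 2 ** lines


def bus_invert_encode(prev_bus, data, n):
    """Bus word: data in bits 0..n-1, invert line in bit n."""
    mask = (1 << (n + 1)) - 1
    plain = data & ((1 << n) - 1)           # invert line = 0
    h = bin(prev_bus ^ plain).count("1")
    if h > (n + 1) / 2:
        return ~plain & mask                # complemented data, invert line = 1
    return plain
```

For example, sending 1111 after an all-zero 4-bit bus produces 10000: only the invert line toggles.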
STATE MACHINE ENCODING
State transition graph.
Encoding of state machine.
TRANSITION ANALYSIS OF STATE ENCODING
Expected number of bit transitions in the state register.
Expected number of transitions of the output.
E[M]: expected number of state bit transitions.
Lower E[M] -> more power efficient:
- fewer transitions of the state register
- fewer transitions propagated into the combinational logic of the machine
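Figure: two encodings, M1 and M2, of the same state transition graph; edges are labeled with transition probabilities (0.1 to 0.4) and states with codes such as 00, 01 and 11.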
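A sketch of how E[M] can be evaluated for a given encoding: weight the Hamming distance between the codes of each edge's source and destination by that edge's probability. The edge probabilities are assumed to be the steady-state fraction of clock cycles on which each edge is taken. The 3-state machine below is hypothetical and is not the M1/M2 example in the figure.

```python
def hamming(a, b):
    """Number of state bits that differ between codes a and b."""
    return bin(a ^ b).count("1")


def expected_state_bit_transitions(edges, code):
    """E[M] = sum over STG edges of P(edge) * Hamming(code[src], code[dst]).

    `edges` maps (src_state, dst_state) -> probability; `code` maps state -> int code.
    """
    return sum(p * hamming(code[s], code[t]) for (s, t), p in edges.items())


# Hypothetical 3-state STG and two candidate encodings
edges = {("S0", "S1"): 0.4, ("S1", "S2"): 0.3, ("S2", "S0"): 0.3}
enc_a = {"S0": 0b00, "S1": 0b01, "S2": 0b11}
enc_b = {"S0": 0b00, "S1": 0b11, "S2": 0b01}
em_a = expected_state_bit_transitions(edges, enc_a)   # 1.3
em_b = expected_state_bit_transitions(edges, enc_b)   # 1.4
```

Encoding A gives the lower E[M] because the most probable edge (S0 to S1) changes only one state bit.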
OUTPUT DON'T CARE ENCODING
Proper assignment of don't-care output values.
Reduces the expected number of transitions in the output signal.
PRECOMPUTATION LOGIC
Trades area for power in synchronous digital circuits.
Identify logic conditions at some inputs of a combinational logic block under which the output is invariant to the remaining inputs.
Transitions on those remaining inputs can then be gated or disabled.
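Precomputation Logic
Figure: the inputs are split into precomputed inputs (register R1) and gated inputs (register R2), both feeding the combinational logic f(x), which drives the outputs; the precomputation logic g(x) uses the R1 inputs to generate the load-disable signal of R2.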
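Binary Comparator using Precomputation Logic
Figure: n-bit comparator computing A>B. The MSBs An and Bn are held in register R1; the remaining bits An-1 ... A1 and Bn-1 ... B1 are held in register R2. The precomputation condition An ≠ Bn drives the load-disable of R2.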
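An exhaustive check of the comparator idea, as a sketch: whenever the MSBs differ, A>B equals An AND NOT Bn, so R2 need not load. Assuming uniformly random, independent inputs, this precomputation condition holds for half of all input pairs.

```python
from itertools import product


def precompute_check(n):
    """For an n-bit comparator, return (fraction of (A, B) pairs with An != Bn,
    whether A>B always equals An AND NOT Bn in those cases)."""
    msb = n - 1
    hits, correct = 0, True
    for a, b in product(range(2 ** n), repeat=2):
        an, bn = (a >> msb) & 1, (b >> msb) & 1
        if an != bn:
            hits += 1
            g = an & (1 - bn)              # output predicted from the MSBs alone
            correct = correct and (g == int(a > b))
    return hits / 4 ** n, correct
```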
DESIGN ISSUES IN PRECOMPUTATION LOGIC
Select the precomputation architecture.
Determine the precomputed inputs R1 and the gated inputs R2, given the function f(x).
With R1 and R2 selected, find the precomputation logic g(x).
Evaluate the probability of the precomputation condition and the potential power savings.
ARCHITECTURAL LEVEL POWER ESTIMATION
Figure: simplified system consisting of two building blocks (BLOCK A, BLOCK B) and interconnection buses (BUS A, BUS B, BUS C).
ESTIMATING THE POWER DISSIPATION
Block activity factor αn (number of operations per second)
Output signal activity factor βn
Normalized block energy En
The total power dissipated by the building blocks is given by
$$P_{BB} = \sum_{n \in R} \alpha_n \beta_n E_n$$
where R is the set of building blocks.
Architecture and System Level
Operation Reduction
Pipelining and Parallelism
Retiming
Unfolding and Folding
Power and Performance Management
Pipelining and Parallel Processing
Pipelining
leads to a reduction in the critical path
Either increases the clock speed (or sampling speed) or reduces the power
consumption at same speed in a DSP system
Parallel Processing
Multiple outputs are computed in parallel in a clock period
The effective sampling speed is increased by the level of parallelism
Can also be used to reduce the power consumption
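Two cascaded operations:
Figure: (a) non-pipelined architecture, with the two operations between input and output registers; (b) two-stage pipelined architecture, with an extra register between the two operations.
Pipelined version: Cap = 1.2C, Voltage = 0.6V, Frequency = f (C, V and f are those of the non-pipelined reference)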
A two-datapath parallel system:
Figure: a DeMUX distributes the inputs to Datapath 1 and Datapath 2, and a MUX merges their outputs.
Cap = 2.2C, Voltage = 0.6V, Frequency = 0.5f
Power dissipation of parallel and pipelined systems, relative to the reference $P_{uni} = C V^2 f$:
$$P_{par} = (2.2C)(0.6V)^2(0.5f) = 0.396\,P_{uni}$$
$$P_{pip} = (1.2C)(0.6V)^2 f = 0.432\,P_{uni}$$
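The two ratios follow from P = C V^2 f with each factor scaled. A minimal sketch:

```python
def relative_power(cap_factor, volt_factor, freq_factor):
    """Power relative to Puni = C V^2 f when C, V and f are scaled by the given factors."""
    return cap_factor * volt_factor ** 2 * freq_factor


p_par = relative_power(2.2, 0.6, 0.5)   # two-datapath parallel system: 0.396
p_pip = relative_power(1.2, 0.6, 1.0)   # two-stage pipeline: 0.432
```

Most of the saving comes from the voltage term: 0.6^2 = 0.36 outweighs the extra capacitance of the added registers and the MUX/DeMUX.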
Combining parallelism with pipelining to balance pipe-stage delays:
Figure: pipeline registers, a DeMUX feeding two parallel copies B(1) and B(2) of block B, and a MUX merging their outputs.
Reducing operations maintaining throughput
Figure: X^2 + XA + B computed directly (X·X and X·A, then two additions) vs. as (X + A)X + B, which needs one multiplication fewer.
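Reducing operations with less throughput
Figure: X^3 + X^2A + XB + C computed directly (X, X^2, X^3, X^2A and XB, then additions) vs. by repeated factoring: (X + A)X = X^2 + AX, plus B gives X^2 + AX + B, times X gives X^3 + X^2A + BX, plus C gives X^3 + X^2A + BX + C.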
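A sketch of the two rewrites, which are Horner-style factorings. Fewer multipliers are needed. In the cubic case, however, the remaining operations form one serial chain, which is consistent with the "less throughput" label above.

```python
def direct_quadratic(x, a, b):
    return x * x + x * a + b                  # 2 multiplications, 2 additions


def factored_quadratic(x, a, b):
    return (x + a) * x + b                    # 1 multiplication, 2 additions


def direct_cubic(x, a, b, c):
    return x * x * x + x * x * a + x * b + c  # 4 multiplications if X^2 is shared


def horner_cubic(x, a, b, c):
    return ((x + a) * x + b) * x + c          # 2 multiplications, 3 additions, all serial
```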
Table: Power results for benchmark circuit (C17)
Power (W)   Binary code     Chebyshev distance   Hamming distance   Gray code
Average     7.186733e-004   1.602692e-003        1.602683e-003      7.185032e-004
Maximum     2.710073e-002   2.319953e-002        1.837399e-002      1.837406e-002
Minimum     5.911938e-007   7.837975e-007        9.031063e-008      8.799572e-008
Power results for 5 bit even parity generator
Power (W)   Binary code     Chebyshev distance   Hamming distance   Gray code
Average     6.983956e-006   6.985183e-006        6.662661e-006      6.506795e-006
Maximum     4.742954e-002   4.742954e-002        1.678512e-002      1.689332e-002
Minimum     6.399358e-008   2.674323e-007        2.203997e-008      2.203997e-008
The power dissipation of the unit-distance (Gray code) reordering method is lower than that of straight binary, Chebyshev distance and Hamming distance ordering.
Hence, Gray code ordering can be employed to reduce the power dissipation of combinational logic circuits during testing.
Also, reordering the test vectors using distance-based techniques does not alter the fault coverage.
RETIMING
Retiming is a mapping from a given DFG, G, to a retimed DFG, Gr, such that the transfer functions of G and Gr differ only by a pure delay z^-L.
Purposes
- To reduce the clock cycle time
- To reduce the number of registers needed
- To reduce the power consumed by the circuit
Properties of Retiming
- The weight of the retimed path p = V0 -> V1 -> ... -> Vk is given by
$$w_r(p) = w(p) + r(V_k) - r(V_0)$$
- Retiming does NOT change the total number of delays in each cycle.
- Retiming does not change the loop bound or iteration bound of the DFG.
- If the same constant integer j is added to the retiming value of every node v in a DFG G, the retimed graph Gr is unaffected; that is, the weights (number of delays) of the retimed graph remain the same.
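A sketch of retiming on an edge list, using the edge form of the path formula, wr(e) = w(e) + r(v) - r(u) for an edge u -> v. The small DFG and the retiming values are hypothetical. The checks confirm two of the properties above: cycle delay totals are preserved, and shifting every r by the same constant changes nothing.

```python
def retime(edges, r):
    """`edges`: list of (u, v, w) for an edge u -> v with w delays.
    Returns the retimed edges with wr = w + r[v] - r[u]."""
    return [(u, v, w + r[v] - r[u]) for (u, v, w) in edges]


def is_legal(edges):
    """A retiming is legal only if no edge ends up with a negative delay count."""
    return all(w >= 0 for (_, _, w) in edges)


def cycle_delays(edges, cycle):
    """Total delays along a cycle [v0, v1, ..., v0]; assumes one edge per node pair."""
    weight = {(u, v): w for (u, v, w) in edges}
    return sum(weight[(cycle[i], cycle[i + 1])] for i in range(len(cycle) - 1))


# Hypothetical DFG and retiming values
edges = [("A", "B", 0), ("B", "C", 0), ("C", "A", 2), ("A", "C", 1)]
r = {"A": 0, "B": 1, "C": 1}
retimed = retime(edges, r)
shifted = retime(edges, {k: v + 5 for k, v in r.items()})
```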
Unfolding
Unfolding is a transformation technique that can be applied to a DSP program to create a new program describing more than one iteration of the original program.
Figure: bit-parallel adder designed by unfolding the bit-serial adder using J = 4.
Figure: digit-serial adder designed by unfolding the bit-serial adder using J = 2.
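A behavioral sketch of the unfolded adder, assuming the bit-serial adder is one full adder plus a single carry register and that operands arrive LSB first. Unfolding by J places J full-adder copies in each clock cycle, with the carry rippling through them and only the last carry going to the register. J = 1 is the bit-serial adder, J = 2 a digit-serial adder, and J = word width a bit-parallel adder. Results wrap modulo 2^width.

```python
def full_adder(a, b, cin):
    total = a + b + cin
    return total & 1, total >> 1            # (sum bit, carry out)


def unfolded_serial_add(x, y, width, J):
    """Add x and y with J full adders per clock cycle. Returns (sum, clock cycles)."""
    if width % J:
        raise ValueError("width must be a multiple of J")
    carry_reg, result, cycles = 0, 0, 0
    for cycle in range(width // J):
        carry = carry_reg                   # the one delay of the bit-serial adder
        for k in range(J):                  # J unfolded copies of the full adder
            i = cycle * J + k
            s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
            result |= s << i
        carry_reg = carry
        cycles += 1
    return result, cycles
```

For an 8-bit add, J = 1 needs 8 cycles and J = 2 needs 4. A 4-bit add with J = 4 finishes in one cycle.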
FOLDING
Clock cycle   Adder input (left)   Adder input (top)   System output
0             a(0)                 b(0)
1             a(0) + b(0)          c(0)
2             a(1)                 b(1)                a(0) + b(0) + c(0)
3             a(1) + b(1)          c(1)
4             a(2)                 b(2)                a(1) + b(1) + c(1)
5             a(2) + b(2)          c(2)
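A cycle-by-cycle sketch of the folded schedule above, assuming a single adder with one pipeline stage (a value entering in cycle t leaves in cycle t + 1) and folding factor 2. Even cycles add a(k) + b(k); odd cycles add the fed-back partial sum to c(k).

```python
def folded_sum3(a, b, c):
    """y(k) = a(k) + b(k) + c(k) on ONE pipelined adder. Returns {cycle: output}."""
    n = len(a)
    in_adder = None                          # value held in the adder's pipeline register
    outputs = {}
    for cycle in range(2 * n + 1):
        leaving = in_adder                   # adder output available this cycle
        k, phase = divmod(cycle, 2)
        if phase == 0 and k >= 1:
            outputs[cycle] = leaving         # a(k-1) + b(k-1) + c(k-1)
        if k < n:
            left, top = (a[k], b[k]) if phase == 0 else (leaving, c[k])
            in_adder = left + top
        else:
            in_adder = None
    return outputs
```

For example, folded_sum3([1, 2, 3], [10, 20, 30], [100, 200, 300]) returns {2: 111, 4: 222, 6: 333}. The outputs appear in the same cycles as in the table.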
RESULTS
Estimation and Optimization
Analysis precedes optimization
Accurate analysis techniques must be developed so that they can serve as
proper estimation functions for optimization tools
Optimization precedes Synthesis
A strong foundation in optimization is required before synthesis can proceed
to the next level
State of the Art in Commercial EDA Tools for Low Power
Transistor level
Power estimation at the transistor level can be done by computing the current flow.
- SPICE is the widely accepted reference
- EPIC PowerMill
Circuit-Level Power Optimization
Transistor sizing:
- Adjusting the size of each gate or transistor for minimum power
Voltage scaling:
- Lower supply voltages use less power, but go slower
Voltage islands:
- Different blocks can be run at different voltages, saving power
- Level shifters are required
Variable VDD:
- The voltage for a single block can be varied during operation (high & low)
Multiple threshold voltages:
- Modern processes can build transistors with different thresholds
- Power can be saved by using a mixture of CMOS transistors with two or more different threshold voltages
Power gating:
- Uses high-Vt sleep transistors, which cut off a circuit block when the block is not switching
- Also known as MTCMOS
Long channel transistors:
- Transistors of more than minimum length leak less, but are bigger and slower
Stacking and parking states:
- Logic gates may leak differently during logically equivalent input states
- State machines may have less leakage in certain states
Logic styles:
- Dynamic and static logic, for example, have different speed/power tradeoffs