Professional Documents
Culture Documents
LowPowerMethodsRTL PDF
LowPowerMethodsRTL PDF
Jin-Fu Li
Advanced Reliable Systems
y
((ARES)) Lab.
Department of Electrical Engineering
National Central University
Jhongli, Taiwan
Outline
Introduction
Low-Power Gate-Level Design
Low-Power Architecture-Level Design
Algorithmic-Level Power Reduction
RTL Techniques
T h i
for
f Optimizing
O i i i Power
P
EE4012VLSI Design
Introduction
Power consumption
Peak power
p
Average power
EE4012VLSI Design
D
Dynamic
i power di
dissipation
i ti d
during
i switching
it hi
Cinput
Cdrain
National Central University
interconnect
EE4012VLSI Design
Cinput
pMOS
network
Vout
VA
VB
Pavg
nMOS
network
T
dVout
dVout
1 T /2
[ Vout (Cload
)dt (VDD Vout )(Cload
)dt ]
T /2
T 0
dt
dt
EE4012VLSI Design
1
2
2
C load V DD
C load V DD
f CLK
T
EE4012VLSI Design
VA
Vinternal
VB
VB
Cinternal
Vinternal
Vout
VA
Cload
VB
Vout
# ofnodes
Ti C iV i V DD f CLK
i 1
EE4012VLSI Design
P i (1 P i ) C
EE4012VLSI Design
0.0384
B 0.2
A
B
C
D
0.0196
0.0099
C 0.5
D 0
0.5
5
A 0.2
0.0384
B 0.2
C 0.5
D 0
0.5
5
National Central University
0.0099
0.1875
EE4012VLSI Design
EE4012VLSI Design
10
a
Switchin
ng activity
Switching activityy
c
b
a
d
c
b
a
b
c
d
a
b
c
d
EE4012VLSI Design
11
Glitches
spurious transitions due to imbalanced path delays
A
B
D
E
C
D
E
EE4012VLSI Design
12
Chain structure
C
D
A
Tree structure
B
C
D
National Central University
EE4012VLSI Design
13
REG
R1
REG
R1
Combinational Logic
Combinational Logic
REG
R2
REG
R2
Precomputation
Logic
g
National Central University
EE4012VLSI Design
14
REG
R1
A<n-2:0>
REG
R2
Enable
Precomputation logic
B<n-2:0>
1-bit Comparator
(MSB)
(n-1)-bit
Comparator
REG
R4
REG
R3
EE4012VLSI Design
15
D Q
D Q
D Q
clk
T
D Q
D Q
D Q
D Q
clk
EE4012VLSI Design
16
clk
+
select
l t
f2
EE4012VLSI Design
17
Clock--Gating in LowClock
Low-Power Flip
Flip--Flop
CK
EE4012VLSI Design
18
Q
multiiplexer
Output
CK(f/2)
Flip-flops are operated at full voltage and half the clock frequency.
Source: Prof. V. D. Agrawal
National Central University
EE4012VLSI Design
19
Power
P
(W)
33 0
33.0
1535
16.5
887
8 25
8.25
738
10
1.0
No
ormalize
ed power
Deg. Of
D
parallelism
P = CVDD2f/n
05
0.5
0.25
00
0.0
2
4
Degree of parallelism, n
Source: Prof. V. D. Agrawal
EE4012VLSI Design
20
16
R
16x16
multiplier
fref
16
16
16
32
fref/2
16x16
multiplier
32
M 32
U
X
fref/2
fref
Assume that With the same 16x16
multiplier, the power supply can
be reduced from Vref to Vref/1.83.
Vreff
Pparallel 2.2Cref (
)
1.83
f reff
2
R
fref/2
0.33Pref
16
B
fref
16x16
multiplier
p
32
R
16
fref/2
EE4012VLSI Design
21
(A ,B)
32
REG
Half
multiplier
REG
Half
multiplier
32
fref
Ppipeline 1 .2 C ref (
National Central University
V ref
1 .83
EE4012VLSI Design
22
(1)
x(n)
( )
y(n))
y(
D
w(n)
(1)
x(n)
( )
D
2D
(1)
(2)
b
retiming
D
w1(n)
w2(n)
(2)
y(n))
y(
EE4012VLSI Design
a
(2) 2D
b
(2)
23
REG
C1
((6ns))
C2
((2ns))
REG
REG
C2
(2ns)
C3
(4ns)
fref
REG
C1
(6ns)
C3
(4ns)
fref
EE4012VLSI Design
24
EE4012VLSI Design
25
Architecture--Level Design
Architecture
Power Management
C2
C1
C1_FREEZE
C2_FREEZE
C2
C1
C1_FREEZE
FREEZE
C2_FREEZE
EE4012VLSI Design
26
Architecture--Level Design
Architecture
Bus Segmentation
A segmented
g
bus structure
Switched capacitance during each bus access is
significantly reduced
Overall routing area may be increased
EE4012VLSI Design
27
Architecture--Level Design
Architecture
Bus Segmentation
Cbus
Interface
Bus
Cbus1
Cbus1
National Central University
EE4012VLSI Design
28
Algorithmic--Level Design
Algorithmic
factivity Reduction
Gray Code
000
001
010
011
100
101
110
111
000
001
011
010
110
111
101
100
Decimal Equivalent
EE4012VLSI Design
0
1
2
3
4
5
6
7
29
Gray-code
Gray
code counter is more power efficient.
State sequence,
q
, 00 01 10 11 00
Six bit transitions in four clock cycles
6/4 = 1.5 transitions pper clock
State
St t sequence, 00 01 11 10 00
Four bit transitions in four clock cycles
4/4 = 11.0
0 ttransition
iti per clock
l k
EE4012VLSI Design
30
Present
state
a
Next state
A = ab + ab
B = ab + ab
A
B
CK
CLR
Source: Prof. V. D. Agrawal
EE4012VLSI Design
31
Next state
A = ab + ab
B = ab + ab
a
A
B
CK
CLR
Source: Prof. V. D. Agrawal
EE4012VLSI Design
32
Three--Bit Counters
Three
Binary
Gray-code
State
gg
No. of toggles
State
No. of toggles
gg
000
000
001
001
010
011
011
010
100
110
101
111
110
101
111
100
000
000
Av. Transitions/clock = 1
Source: Prof. V. D. Agrawal
EE4012VLSI Design
33
T(binary)
T(gray)
T(gray)/T(binary)
1.0
0.6667
14
0.5714
30
16
0.5333
62
32
0.5161
126
64
0.5079
0.5000
Source: Prof. V. D. Agrawal
EE4012VLSI Design
34
06
0.6
06
0.6
11
0.3
0.4
00
0.6
0.1
0.3
01
0.1
01
01
0.4
09
0.9
00
0.6
0.1
01
0.1
11
09
0.9
EE4012VLSI Design
35
FSM: ClockClock-Gating
Xk/Zk
C oc ca
Clock
can be stopped
when (Xk, Sk) combination
occurs.
Xj/Zk
EE4012VLSI Design
36
PI
Flip-fflops
Combinational
logic
Clock
activation
logic
CK
Latch
PO
L. Benini and G
L
G. De Micheli
Micheli,
Dynamic Power Management,
Boston: Springer, 1998.
Source: Prof. V. D. Agrawal
EE4012VLSI Design
37
N/2
N/2
N b off bi
Number
bit transitions
ii
EE4012VLSI Design
38
Sen
nt data
R
Receive
ed data
Polarity
decision
logic
Bus register
Polarity bit
EE4012VLSI Design
39
RTL--Level Design
RTL
Simple Decoder
module decoder (a, sel);
input [1:0[ a;
ouput [3:0] sel;
reg [3:0]
[3 0] sel;
l
always @(a) begin
case (a)
2b00: sel=4b0001;
2b01: sel=4b0010;
2b10: sel=4b0100;
2b11: sel=4b1000;
endcase
end
endmodule
Signal Gating
EE4012VLSI Design
40
RTL--Level Design
RTL
Datapath Reordering
Initial
A<B
Reordered
stable
Mux
M
Mux
glitchy
glitchy
lit h
A<B
Mux
Mux
stable
EE4012VLSI Design
41
RTL--Level Design
RTL
Memory Partition
128x32
din
addr
write
dout
32
noe
pre addr
pre_addr
[ ]
q addr[7:0]
M 32
U
dout
X
addr[7:1]
dd [7 1]
noe
clk
write
addr
din
addr0
EE4012VLSI Design
dout
32
128x32
42
RTL--Level Design
RTL
Memory Partition
64K bytes
Reads
Data
ARM
Core
Addr
R/W
28K
4K
32K
64K
EE4012VLSI Design
Addr
Range
43
RTL--Level Design
RTL
Memory Partition
Decoder
Data
Addr
R/W
CS
EE4012VLSI Design
Data
Addr
R/W
CS
Data
Addr
R/W
CS
ARM
Core
44