
ISLPED 2004 8/10/2004

Power-Optimal Pipelining in Deep Submicron Technology


Seongmoo Heo and Krste Asanović
Computer Architecture Group, MIT CSAIL

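Traditional Pipelining
Goal: Maximum performance
[Figure: timing of one pipeline stage at supply voltage Vdd. The clock period between two clock edges consists of clk-Q delay, propagation delay, and setup time.]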

Pipelining as a Low-Power Tool
Goal: Low-Power, Fixed Throughput
[Figure: timing of one pipeline stage at supply voltage Vdd. Clk-Q delay, propagation delay, and setup time no longer fill the clock period; the remaining time slack is traded for power through supply voltage scaling.]
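
As a rough illustration of trading time slack for power, the sketch below assumes an alpha-power-law delay model (gate delay proportional to Vdd / (Vdd - Vth)^a). That model is not taken from the slides, and the threshold voltage, exponent, and nominal supply are hypothetical values. It uses bisection to find the lowest supply voltage at which a stage with fewer FO4 of logic still fits in the original clock period.

```python
def gate_delay(vdd, vth=0.2, a=1.3):
    """Relative gate delay under an assumed alpha-power law (arbitrary units)."""
    return vdd / (vdd - vth) ** a


def scaled_vdd(n_old, n_new, vdd_nom=1.0, vth=0.2, a=1.3):
    """Lowest Vdd at which n_new gates fit in the time n_old gates take at vdd_nom."""
    if not 0 < n_new <= n_old:
        raise ValueError("need 0 < n_new <= n_old")
    budget = n_old * gate_delay(vdd_nom, vth, a)  # fixed clock period
    # Delay grows without bound near Vth and meets the budget at vdd_nom.
    lo, hi = vth + 1e-6, vdd_nom
    for _ in range(60):  # bisection on a monotonically decreasing delay
        mid = 0.5 * (lo + hi)
        if n_new * gate_delay(mid, vth, a) > budget:
            lo = mid  # too slow: the supply must be higher
        else:
            hi = mid  # fits: try a lower supply
    return hi


if __name__ == "__main__":
    v = scaled_vdd(n_old=24, n_new=12)
    print(f"scaled Vdd: {v:.3f}, Vdd^2 ratio: {v ** 2:.3f}")
```

The Vdd^2 ratio printed at the end is the factor by which switching energy per transition would fall under this assumed model; it ignores the added flip-flops.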

Pipelining as a Low-Power Tool
* Clock frequency fixed
[Figure: power versus delay. Pipelining adds flip-flop power overhead and creates time slack; supply voltage scaling then turns the slack into a power saving.]

Power-Optimal Pipelining
Power reduction from pipelining is limited by the power overhead of the increased number of flip-flops
[Figure: power versus delay, marking too shallow pipelining, too deep pipelining, and optimal pipelining with its optimal power saving.]

Contribution
Pipelining is an old idea.
Research focus has been on the performance impact of pipelining.
The idea of using pipelining to lower power [Chandrakasan 92] has not been fully explored in deep submicron technology.
Analysis and circuit-level simulation of Power-Optimal Pipelining for different regimes of Vth, activity factor, and clock gating

Bottom-to-Top Approach
1. Impact of pipelining on each power component
2. Impact of pipelining on total power (with/without clock-gating)
[Figure: total power versus time over active, inactive, and active periods, drawn once for the clock-gated case and once for the non-clock-gated case, broken into switching, leakage, and idle power components.]
*Idle power = power consumed when circuit is idle and not clock-gated
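
One way to read the two total-power plots is as a time average over active and inactive periods. The sketch below makes that reading explicit. It assumes the activity factor is the fraction of time the circuit is active, that an active circuit burns switching plus leakage power, and that an inactive circuit burns only leakage when clock-gated and the full idle power otherwise. The function name and parameters are illustrative, not from the slides.

```python
def average_power(af, p_switching, p_leakage, p_idle, clock_gated):
    """Time-averaged power for a block that is active a fraction `af` of the time.

    p_idle is the power of an idle, non-clock-gated circuit, so it already
    includes leakage (assumed decomposition).
    """
    if not 0.0 <= af <= 1.0:
        raise ValueError("af must be in [0, 1]")
    active = p_switching + p_leakage
    inactive = p_leakage if clock_gated else p_idle
    return af * active + (1.0 - af) * inactive


if __name__ == "__main__":
    for gated in (True, False):
        print(gated, average_power(0.5, 10.0, 2.0, 5.0, gated))
```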

Methodology
Target digital system: fixed throughput, highly parallel computation, logic-dominant
Test bench
o BPTM (Berkeley Predictive Technology Model) 70nm process: LVT (0.17/-0.2), MVT (0.19/-0.22), HVT (0.21/-0.24)
o Hspice simulation at 100C, Clock = 2 GHz
Baseline
[Figure: one pipeline stage, N FO4 inverters (N = 2 ~ 24) between TG flip-flops.]

Pipelining and Switching Power: Analytical Trend
Quadratic reduction of logic switching power: Vdd^2 ∝ N^2, giving O(N^2)
Flip-flop overhead: O(1/N)
[Figure: switching power versus number of FO4 per stage, N, marking the optimal FO4 and the optimal saving.]
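
The two trends on this slide can be combined into a toy curve to show why an optimum exists. The sketch below assumes relative switching power is simply the sum of an O(N^2) logic term and an O(1/N) flip-flop overhead term. The coefficients are arbitrary, hypothetical values and are not fitted to the simulation results.

```python
def switching_power(n, c_logic=1.0, c_ff=400.0):
    """Toy two-term trend: O(N^2) logic term plus O(1/N) flip-flop overhead."""
    return c_logic * n ** 2 + c_ff / n


def optimal_n(power, candidates=range(2, 25)):
    """Integer N in `candidates` that minimizes power(N)."""
    return min(candidates, key=power)


def continuous_optimum(c_logic=1.0, c_ff=400.0):
    """Stationary point of c_logic*N^2 + c_ff/N, i.e. N = (c_ff / (2*c_logic))^(1/3)."""
    return (c_ff / (2.0 * c_logic)) ** (1.0 / 3.0)


if __name__ == "__main__":
    print(optimal_n(switching_power), continuous_optimum())
```

The closed form follows from setting the derivative 2*c_logic*N - c_ff/N^2 to zero; the integer scan uses the same N range as the test bench (2 to 24).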

Pipelining and Leakage Power: Analytical Trend
Superlinear reduction of logic leakage power: Vdd * e^(η Vdd) ∝ N^α (DIBL effect), giving O(N^α) (1 < α < 2)
Flip-flop overhead: O(1/N)
[Figure: leakage power versus number of FO4 per stage, N, marking the optimal FO4 and the optimal saving.]
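
To see where a superlinear exponent between 1 and 2 could come from, the sketch below takes relative leakage power as Vdd * exp(eta * Vdd) and assumes Vdd is proportional to N, as in the switching-power trend. Under those assumptions the local log-log slope is 1 + eta * Vdd. The symbol eta and its value are hypothetical stand-ins for a lumped DIBL term.

```python
import math


def leakage_power(vdd, eta=1.5):
    """Relative leakage power Vdd * exp(eta * Vdd); eta is a hypothetical lumped DIBL term."""
    return vdd * math.exp(eta * vdd)


def local_exponent(vdd, eta=1.5, rel_step=1e-4):
    """Numerical d ln(P) / d ln(Vdd): the exponent alpha if Vdd is proportional to N."""
    hi, lo = vdd * (1.0 + rel_step), vdd * (1.0 - rel_step)
    rise = math.log(leakage_power(hi, eta)) - math.log(leakage_power(lo, eta))
    return rise / (math.log(hi) - math.log(lo))


if __name__ == "__main__":
    for vdd in (0.3, 0.5, 0.6):
        print(vdd, local_exponent(vdd), 1.0 + 1.5 * vdd)
```

With these assumed numbers the exponent stays below 2 as long as eta * Vdd is below 1.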

Pipelining and Idle Power: Analytical Trend
Clock-gating is not always possible
o Increased control complexity
o Insufficient setup time of clock enable signal
Idle Power = Leakage Power + Flip-flop Switching Power
o Scaling falls between leakage power scaling and flip-flop switching power scaling, depending on leakage level
Linear reduction of flip-flop switching power: 1/N * Vdd^2 ∝ N, giving O(N)
[Figure, two panels: relative idle power versus number of FO4 per stage, N. Leakage power scale: O(N^α) (1 < α < 2) against O(1/N). Flip-flop switching power scale: O(N) against O(1/N). Each panel marks the optimal FO4 and the optimal saving.]
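
The statement that idle power scales somewhere between the leakage trend and the flip-flop switching trend can be checked with a small blend of two power laws. The sketch below assumes idle power is a weighted sum of an O(N^alpha) leakage term and an O(N) flip-flop switching term and returns the local log-log slope of that sum. The weight `leak_share` and the value of alpha are hypothetical.

```python
def idle_reduction_exponent(x, leak_share, alpha=1.5):
    """Local log-log slope of idle power versus N for a blend of two power laws.

    x is N relative to a reference depth; leak_share is the leakage fraction
    of idle power at x = 1 (hypothetical parameter).
    """
    leak = leak_share * x ** alpha   # leakage term, O(N^alpha)
    ff = (1.0 - leak_share) * x      # flip-flop switching term, O(N)
    # d ln(leak + ff) / d ln(x) is the share-weighted average of alpha and 1.
    return (alpha * leak + ff) / (leak + ff)


if __name__ == "__main__":
    for share in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(share, idle_reduction_exponent(0.5, share))
```

The slope equals 1 with no leakage, equals alpha when leakage is all of the idle power, and lies in between otherwise.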

Simulation Results: Power Components
Fixed Throughput @ 2 GHz

Power component    Right-hand-side curve         Saving*                  N*
Switching Power    O(N^2)                        79 (HVT) ~ 82 (LVT) %    6
Leakage Power      O(N^α) (1 < α < 2)            70 (LVT) ~ 75 (HVT) %    6
Idle Power         O(N) or O(N^α) (1 < α < 2)    55 (HVT) ~ 70 (LVT) %    8

N = Number of FO4 inverters per stage (not including flip-flop delay)
N* = Optimal N
Saving* = Optimal power saving by pipelining

Optimal Power Saving
Clock Gating: Optimal FO4 = 6
No Clock Gating: Optimal FO4 = 6~8
*2 GHz
*Flip-flop delay not included in optimal FO4
[Figure, two panels: relative power versus activity factor. Clock gating panel: regions labeled leakage power and switching power. No clock gating panel: regions labeled idle power and switching power. The LVT curve is labeled.]

Discussion
LVT can be fast and power-efficient
o enables lower Vdd

Flip-flop delay more important than flip-flop power for power-optimal pipelining

Limitation of This Work
Factors not modeled, each with an effect on optimal logic depth and on optimal power saving:
o Super-linear growth of flip-flops
o Additional memory
o Reduced glitches
o Parasitic wire capacitance

Conclusion
Pipelining is an effective low-power tool when used to support voltage scaling in a digital system implementing highly parallel computation.
Optimal Logic Depth: 6-8 FO4
o ~ 8-10 FO4 including flip-flop delay
Optimal Power Saving: 55-80%
o It depends on Vth, AF, Clock-Gating
Insights:
o Pipelining is more effective with lower Vth
   - Except for when leakage power is dominant.
o Pipelining is more effective with High AF
   - Pipelining is most effective at saving switching power
o Pipelining is more effective with clock-gating
   - Reduced flip-flop overhead.

Acknowledgments
Thanks to SCALE group members and anonymous reviewers.
Funded by NSF CAREER award CCR-0093354, NSF ITR award CCR-0219545, and a donation from Intel Corporation.

BACKUP SLIDES
