
ISSCC 2008 / SESSION 22 / VARIATION COMPENSATION & MEASUREMENT / 22.3

22.3 A Process-Variation-Tolerant Floating-Point Unit with Voltage Interpolation and Variable Latency

Xiaoyao Liang, David Brooks, Gu-Yeon Wei

Harvard University, Cambridge, MA

Process variation will greatly impact the power and performance of future microprocessors. Design approaches based on multiple supply or threshold voltage assignment provide techniques to statically tune critical path delays for energy savings [1]. However, under process variation, the delay of critical paths may vary, and a large number of critical paths in circuits reduces the maximum operating frequency of pipelined processors. One proposed post-fabrication solution is to adaptively tune the back-body bias to combat variations for logic structures [2]. Dynamic voltage switching between two power supplies, using level shifters to cross voltage domains, has also been proposed, primarily to reduce power [3-5]. This paper explores two fine-grained, post-fabrication circuit-tuning techniques to combat process variation for pipelined logic components: voltage interpolation and variable latency. These techniques are applied to a single-precision floating-point unit (FPU) designed using a standard CAD synthesis flow in a 0.13µm CMOS logic process with 8 metal layers. Measured results from fabricated chips show that both techniques provide a wide frequency tuning range to deal with frequency fluctuations arising from process variations with minimal power overhead, and in some configurations, power savings.

Figure 22.3.1 illustrates the circuit architecture. The FPU is pipelined into 6 stages with two power supplies (VddH, VddL) provided across the unit. Each pipeline stage independently selects one of the two voltages, resulting in 64 different voltage configurations. By maintaining a small difference between VddH and VddL, the design does not need level shifters. Latch-based clocking enables time borrowing across pipeline stages such that choosing different voltage configurations leads to different effective voltages, somewhere between VddH and VddL, across the FPU. This spatial voltage dithering provides broad frequency tunability. Each pipeline stage is divided into two clocking domains, controlled by complementary clocks (Φ1, Φ2). To increase borrowing, an additional stage can be introduced by adding one extra latch in the middle (Stage 3) and at the end (Stage 6) of the pipeline, as shown in Figure 22.3.2. For long pipeline units (without tight loops) in a microprocessor, the additional cycle of latency introduces very little system-level performance degradation. When the system is configured in 6-stage mode, the extra latches are set to let data flow through. In 7-stage mode, the two latches connected to the pipeline add two half stages. These extra stages are used purely for time borrowing and add almost one cycle of timing slack to the overall pipeline. Clock selection circuits feed each latch with the proper clock phase, as shown in Figure 22.3.2.

With two supply voltages, one concern is the potential for increased static current at the voltage domain boundaries. If a VddL stage drives a VddH stage, the interface PMOS transistors connected to the VddH domain may not fully shut off, resulting in short-circuit current. Figure 22.3.3 plots the static power measured for the FPU when set to a worst-case voltage configuration to highlight this problem. The amount of short-circuit power consumption depends on ΔV, as well as VddH. For ΔV less than 200mV (less than Vtp), the increase in static power consumption is negligible and dominated by leakage. Measured results show that ΔV ≅ 200mV is sufficient to enable ~30% frequency tuning and cover large delay variations. Hence, the design does not use level shifters and avoids associated overheads. At low voltages and large ΔV settings, circuit operation fails.

Measured results of the frequency tuning provided by voltage interpolation for the 6-stage mode are presented in Figure 22.3.4. The voltage-interpolated configurations use two power supplies: VddH = VddNom + ΔV/2 and VddL = VddNom - ΔV/2. The max frequencies measured for all 64 voltage configurations at 8 different VddNom and ΔV settings are overlaid onto the traditional frequency tuning versus nominal voltage curve (dark line). Voltage interpolation provides a well-distributed frequency tuning range about the nominal frequencies. This tuning range depends on the selection of ΔV and the nominal voltage. By linearly scaling ΔV with respect to nominal voltage, ~30% frequency tuning range is achieved across all nominal voltage levels. Figure 22.3.5 presents a scatter plot of the measured power versus the delay for all of the configurations in Figure 22.3.4 and demonstrates good tracking with respect to the nominal power-delay curve. The zoomed-in region of the plot shows that some voltage configurations achieve equivalent frequency with lower power consumption.

Figure 22.3.4 (inset) shows the effectiveness of voltage interpolation to combat variability across 15 measured FPU chips. The maximum frequency and power consumption of each FPU with a single 1V supply is plotted, showing frequency and power variations around a 240MHz median frequency. With voltage interpolation, where all FPUs use the same VddH (1.085V) and VddL (0.915V), all FPUs can be binned to one median frequency with minimum power for each. The slowest FPU (#14) can be sped up at the expense of higher power. A faster FPU (#2) can trade frequency for reduction in power. These results show that voltage interpolation can be an effective post-fabrication performance-tuning knob to combat process variation.

Variable-latency operation further mitigates the effects of process variation and saves energy when combined with voltage interpolation. If a 6-stage FPU fails to meet timing due to large delay variations, the 7-stage mode provides 17% additional frequency headroom and the opportunity to reduce power. Figure 22.3.6 shows the measured power-delay space for a 7-stage FPU with voltage interpolation, and compares it to the power-delay curves (dashed lines) for 6-stage and 7-stage modes generated by sweeping a single nominal voltage. To achieve the same delay, the 7-stage pipeline consumes less power, and the voltage-interpolated configurations again scatter close to the nominal 7-stage power-delay curve. Figure 22.3.6 (inset) plots the measured power in 7-stage mode operating at 233MHz across the 64 voltage configurations. Configuration #64 saves 12.5% of power when compared to the 6-stage FPU with a fixed 1V supply.

Leveraging time borrowing in latch-based designs, voltage interpolation offers fine-grain effective voltage tuning with only two supply voltages. Variable latency provides an additional knob to combat process variations. This tunability is important for variation-tolerant design since different units on the same chip may have localized worst-case operating frequencies that deviate from the nominal. Combining voltage interpolation with traditional voltage-frequency binning covers both fine- and coarse-grain variations, and global adjustment of VddH and VddL balances the current loads on the two supplies. A die micrograph with floorplan overlay is shown in Figure 22.3.7.

Acknowledgements:
This work is funded by NSF CCF-0429782. We thank D. Kahn and M. Hempstead for help in testing and UMC for chip fabrication.

References:
[1] K. Usami, M. Horowitz, “Clustered voltage scaling technique for low-power design,” Int. Workshop on Low Power Design, Apr. 1995.
[2] J. Tschanz, et al., “Adaptive body bias for reducing impacts of die-to-die and within-die parameter variations on microprocessor frequency and leakage,” ISSCC Dig. Tech. Papers, Feb. 2002.
[3] H. Li, et al., “Combined circuit and architecture level variable supply-voltage scaling for low power,” Trans. VLSI, May 2005.
[4] C. Tran, et al., “95% Leakage-Reduced FPGA using Zigzag power-gating, dual-Vth/Vdd and micro-Vdd-hopping,” ASSCC Dig. Tech. Papers, Nov. 2005.
[5] K. Agarwal, K. Nowka, “Dynamic Power management by combination of dual static supply voltages,” Int. Symp. Quality Elec. Design, Mar. 2007.

404 • 2008 IEEE International Solid-State Circuits Conference 978-1-4244-2010-0/08/$25.00 ©2008 IEEE

ISSCC 2008 / February 6, 2008 / 9:30 AM


[Figure 22.3.1 diagram. Labeled signals: VddH, VddL, cfgH[5:0], cfgL[5:0], Din[31:0]; each stage has a clock-select (ck sel) block.]

Figure 22.3.1: Pipelined FPU block diagram with per-stage Vdd and clock selection circuitry.
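
The configuration space described for this architecture can be sketched in a few lines. This is a minimal illustration, not the authors' tooling: the function names and the 'H'/'L' encoding are assumptions made here, while the rail formula and the 1V / 170mV operating point come from the text and figures.

```python
from itertools import product


def supply_rails(vdd_nom, delta_v):
    """Return (VddH, VddL) placed symmetrically about the nominal voltage."""
    return vdd_nom + delta_v / 2, vdd_nom - delta_v / 2


def enumerate_configs(num_stages=6):
    """List every per-stage rail selection as a tuple of 'H'/'L' flags.

    The 'H'/'L' encoding is a hypothetical stand-in for the per-stage
    supply-select bits; it is not taken from the paper.
    """
    return list(product("HL", repeat=num_stages))


vdd_h, vdd_l = supply_rails(vdd_nom=1.0, delta_v=0.170)
configs = enumerate_configs()
print(len(configs))                      # 64
print(round(vdd_h, 3), round(vdd_l, 3))  # 1.085 0.915
```

Two choices per stage over six stages gives 2^6 = 64 configurations, matching the count quoted in the text.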

Figure 22.3.2: Variable latency clocking schemes for 6-stage and 7-stage modes. Illustrates extra time borrowing for 7-stage mode. Only 3 out of the 6 stages are shown.

[Figure 22.3.3 plot. x-axis: VddH - VddL (mV), 0 to 400; y-axis: Power (mW), log scale; one curve per VddH setting from 0.6V to 1.3V.]
Figure 22.3.3: Static power vs. ΔV vs. VddH settings for worst-case voltage interpolation setting. Data points corresponding to inoperable voltage settings omitted.
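
The short-circuit concern arises wherever a VddL stage drives a VddH stage, so one way to reason about a "worst-case" configuration is to count such boundaries. The sketch below does that under stated assumptions: stages are listed in pipeline order, each stage drives only the next one, and the 'H'/'L' encoding is hypothetical. The paper does not say which configuration was used for the measurement, so the result here is only an illustration.

```python
from itertools import product


def low_to_high_boundaries(config):
    """Count adjacent stage pairs where a VddL stage drives a VddH stage."""
    return sum(1 for a, b in zip(config, config[1:]) if a == "L" and b == "H")


# All 64 rail selections for a 6-stage pipeline (assumed 'H'/'L' encoding).
configs = list(product("HL", repeat=6))
worst = max(low_to_high_boundaries(c) for c in configs)
worst_configs = ["".join(c) for c in configs if low_to_high_boundaries(c) == worst]
print(worst, worst_configs)  # 3 ['LHLHLH']
```

Under this simple model, alternating rails starting from VddL maximizes the number of low-to-high interfaces.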

[Figure 22.3.4 plot. x-axis: Nominal Voltage (V), 0.6 to 1.3; y-axis: Maximum Frequency (MHz); dark curve: VddH = VddL = VddNom; 64 voltage configurations overlaid at each nominal voltage, annotated with offsets from ±24mV to ±128mV. Inset: Power (mW) vs. Frequency (MHz) at VddNom = 1V with VddH = 1.085V, VddL = 0.915V; FPU #14 and FPU #2 marked.]
Figure 22.3.4: Maximum frequency vs. voltage with interpolation for 6-stage pipeline.

[Figure 22.3.5 plot. x-axis: clock period (ns); y-axis: Power (mW); curve: VddH = VddL = VddNom. Inset: Power (mW) vs. Frequency (MHz) for the 64 voltage configurations at VddH = 1.085V, VddL = 0.915V, VddNom = 1V, ΔV = 170mV.]
Figure 22.3.5: 6-stage pipeline power vs. clock period with voltage interpolation.
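
The binning step described for the Figure 22.3.4 inset (bring every FPU to one target frequency at minimum power) amounts to a constrained selection over the measured configurations. A minimal sketch follows; the function name, the data layout, and all numbers are made up for illustration and are not measurements from the paper.

```python
def pick_config(measurements, target_mhz):
    """Return the lowest-power configuration that meets the target frequency.

    measurements: dict mapping a configuration id to (max_freq_mhz, power_mw).
    Returns None when no configuration reaches the target.
    """
    feasible = {
        cfg: (freq, power)
        for cfg, (freq, power) in measurements.items()
        if freq >= target_mhz
    }
    if not feasible:
        return None
    return min(feasible, key=lambda cfg: feasible[cfg][1])


# Hypothetical values for one chip; NOT measured data from the paper.
example = {
    "HHHHHH": (262.0, 21.5),
    "HHLHLH": (248.0, 19.0),
    "HLLHLL": (241.0, 17.8),
    "LLLLLL": (221.0, 15.2),
}
print(pick_config(example, target_mhz=240.0))  # HLLHLL
```

A slow chip would end up with more stages on VddH (higher power), while a fast chip could move stages to VddL and save power, which is the trade-off the inset illustrates for FPU #14 and FPU #2.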


[Figure 22.3.6 plot. x-axis: clock period (ns), 2 to 7; y-axis: Power (mW); curves: 6-stage (all VddH) and 7-stage (all VddH), with 7-stage configurations at VddH = 1V, VddL = 0.9V. Inset: Power @233MHz (mW) vs. Voltage Configuration #, marking the 6-stage power @1V and the savings.]
Figure 22.3.6: Power vs. clock period for all 7-stage voltage configurations across multiple voltage settings. Power savings shown for variable latency (7-stages) with voltage interpolation.

[Figure 22.3.7 image. Extracted labels: Vdd select (repeated).]
Figure 22.3.7: Die micrograph with floorplan overlay.
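
One back-of-the-envelope reading of the 17% headroom quoted for 7-stage mode is sketched below. It rests on an assumption not stated in the paper: that the same total logic delay is spread over seven clock periods instead of six, with time borrowing distributing the slack ideally. Here T_6 and T_7 denote the minimum clock periods in the two modes and f_6, f_7 the corresponding maximum frequencies.

```latex
\[
T_{7} \approx \tfrac{6}{7}\,T_{6}
\quad\Longrightarrow\quad
\frac{f_{7}}{f_{6}} = \frac{T_{6}}{T_{7}} \approx \frac{7}{6} \approx 1.167
\]
```

That is roughly 17% more frequency. Since the text says the extra latches add "almost" one cycle of slack, this ratio is best read as an upper bound under the stated assumption.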
