A 90-nm Low-Power FPGA for Battery-Powered Applications
IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 26, NO. 2, FEBRUARY 2007
I. INTRODUCTION
A key challenge in the IC scaling era is delivering high-performance
solutions while minimizing power and cost. Programmable logic devices such as field-programmable gate arrays (FPGAs) address this
challenge by providing a cost-efficient solution for low- to mid-volume applications, owing to their low non-recurring engineering costs. Also,
with in-field programmability, FPGAs provide a platform solution
with faster time to market and longer product lifetime. Despite their
many advantages, FPGAs are not widely found in today's mobile
applications.
Mobile applications generally have two power requirements: active
power and standby power. A typical user application has an extremely low
duty cycle, where the device is active for a short period of time (less
than 1 h) and then is inactive for a long period of time (days or weeks).
During the periods of activity, the device must be energy efficient,
that is, perform the necessary functions while consuming minimum
energy. During the idle periods, the device must consume little or no
power to extend battery life. The active power requirement for a typical
mobile IC is on the order of hundreds of milliwatts, while its standby power
requirement is on the order of tens to hundreds of microwatts [2], [3].
Despite the computational energy efficiency advantage of FPGAs over
digital signal processors (DSPs) [4], [5], DSPs are today widely used
in battery-operated applications, primarily due to their extensive power
management capabilities that enable very low power consumption
during standby. In contrast, existing FPGAs, designed for high-throughput, high-duty-cycle applications, have few or no power management features. Current low-cost FPGAs consume up to hundreds of
milliwatts of standby power [23], while high-end FPGAs can consume
over 1 W [26]. Compared to mobile ICs, FPGA standby power is
at least two orders of magnitude higher than what is required.
In this paper, we present the design and implementation of Pika,
a low-power FPGA core targeting battery-powered applications. We
Manuscript received March 16, 2006; revised July 7, 2006. This paper was
recommended by Associate Editor K. Bazargan.
T. Tuan, A. Rahman, S. Das, and S. Trimberger are with Xilinx, Inc., San
Jose, CA 95124 USA (e-mail: tim.tuan@xilinx.com; arif.rahman@xilinx.com;
satyaki.das@xilinx.com; steve.trimberger@xilinx.com).
S. Kao is with Newport Media, Lake Forest, CA 92630 USA.
Digital Object Identifier 10.1109/TCAD.2006.885731
Fig. 3 shows the speed and the static power of a test circuit, a 4-LUT
driving two double switches, at different voltages. Looking purely at
energy efficiency, one would typically set the supply voltage at the point
where the power-delay product (PDP) is minimal. However, in this
example, PDP continues to drop even below 0.8 V, where performance
degradation becomes prohibitively large and reliability becomes a
concern. Consequently, considering performance, energy efficiency,
and reliability, we choose 1.0 V as our core operating voltage. This
leads to power reduction in all core blocks except for the configuration
memory, which is excluded because it can be more effectively
addressed as described in the next section.
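The voltage selection reasoning above can be sketched in a few lines of Python. The delay and static power figures below are made-up placeholders, not values read from Fig. 3, and the `pick_voltage` helper and its `max_delay_ns` bound are hypothetical. The sketch only illustrates that an unconstrained PDP minimum can land at a voltage that is too slow, so a performance bound is applied before minimizing.

```python
def pick_voltage(points, max_delay_ns):
    """Return the supply voltage with minimum power-delay product (PDP)
    among the operating points that meet the delay bound.

    points: list of (vdd_volts, delay_ns, static_power_uw) tuples.
    """
    feasible = [p for p in points if p[1] <= max_delay_ns]
    if not feasible:
        raise ValueError("no operating point meets the delay bound")
    # PDP is taken here as delay * power for each operating point.
    return min(feasible, key=lambda p: p[1] * p[2])[0]


# Placeholder sweep (assumed numbers): delay rises and power falls as Vdd drops.
sweep = [(1.2, 1.0, 10.0), (1.0, 1.3, 4.0), (0.8, 2.2, 1.5), (0.7, 3.5, 0.8)]

unconstrained = pick_voltage(sweep, max_delay_ns=float("inf"))
constrained = pick_voltage(sweep, max_delay_ns=1.5)
```

With these placeholder numbers the unconstrained minimum falls at the lowest voltage in the sweep, while the delay-bounded choice moves up to a higher voltage, mirroring the tradeoff described in the text.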
B. Low-Leakage Configuration Memory
Fig. 1.
multipliers, and digital clock managers that provide efficient implementation of complex functions common to many applications.
These specialty blocks are beyond the scope of this paper, but the
optimizations described in this paper can be applied to these blocks
to achieve similar savings.
The power consumption of an FPGA depends on block capacitance, block leakage, switching activity, configuration state, resource
utilization, and temperature. To estimate typical FPGA power, we
obtain block capacitance and block leakage through exhaustive SPICE
simulations of each block under a wide range of input states and
configuration states. Typical switching activity is set to 12.5%. Typical
utilization for the given architecture is obtained by analyzing over
100 proprietary benchmark designs. Temperature for a typical design
is defined as 25 °C for an idle device and 85 °C for an active device.
The accuracy of this characterization methodology has been validated
in previous work [24], [25].
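As a minimal sketch of how such characterization data might be combined, the Python snippet below applies the standard activity × C × V² × f model for dynamic power and sums per-instance leakage for static power. The `Block` fields, the `estimate_power` helper, the clock frequency, and all numeric values are illustrative assumptions, not the paper's data; only the 12.5% switching activity comes from the text. Counting leakage for every instance, used or not, is also a simplifying assumption.

```python
from dataclasses import dataclass


@dataclass
class Block:
    name: str
    cap_farads: float   # switched capacitance per used instance (assumed)
    leak_watts: float   # leakage per instance at the chosen temperature (assumed)
    count: int          # instances in the array
    utilization: float  # fraction of instances used by a typical design


def estimate_power(blocks, vdd, freq_hz, activity=0.125):
    """Return (dynamic_watts, static_watts) for a typical design."""
    # Dynamic power: only used instances switch.
    dynamic = sum(
        activity * b.cap_farads * vdd ** 2 * freq_hz * b.count * b.utilization
        for b in blocks
    )
    # Static power: every instance is assumed to leak, used or not.
    static = sum(b.leak_watts * b.count for b in blocks)
    return dynamic, static


# Hypothetical single-block array for illustration.
blocks = [Block("routing_switch", 1e-12, 1e-6, 1000, 0.5)]
dyn_w, stat_w = estimate_power(blocks, vdd=1.0, freq_hz=100e6)
```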
Fig. 2 shows the typical core power consumption of the baseline
architecture excluding the specialty blocks. Active power consists of
the dynamic and static power of an active device, while standby power
consists of the static power of an idle device. For the array sizes shown,
active power is on the order of tens to hundreds of milliwatts, while standby
power is on the order of ones to tens of milliwatts. This active power is
comparable to that of existing mobile devices, while the standby power
is about two orders of magnitude higher. Therefore, we prioritize our
efforts to target dramatic standby power reduction, and approach active
power as a secondary goal.
We further break down total power to identify high-power components. Routing switches make up most of the total active power, while
both routing switches and configuration memory represent significant
parts of the total static power.
IV. POWER OPTIMIZATIONS
In this section, we describe each of the power optimizations applied
to the baseline architecture.
A. Voltage Scaling
Voltage scaling is particularly effective for reducing power because dynamic power and static power are quadratic and exponential
functions of the supply voltage, respectively, while circuit speed is
approximately a linear function of the supply voltage [1]. Designs such
as FPGAs that are optimized for performance typically use relatively
high supply voltages to gain speed at the expense of higher power.
Consequently, lowering the supply voltage of a high-speed design can
often yield a more energy-efficient solution.
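To make these scaling relations concrete, here is a small Python sketch under simplifying assumptions: dynamic power scales as V², circuit speed is linear in V (so delay scales as 1/V), and static power is modeled as exp(k·V). The fitting constant `k`, the 1.2 V reference voltage, and the `scale_factors` helper are purely illustrative and are not taken from the paper.

```python
import math


def scale_factors(v_new, v_ref, k=5.0):
    """Relative dynamic power, static power, and delay at v_new versus v_ref.

    k is an assumed exponential fitting constant (1/V) for static power.
    """
    dynamic = (v_new / v_ref) ** 2            # quadratic in supply voltage
    static = math.exp(k * (v_new - v_ref))    # exponential in supply voltage
    delay = v_ref / v_new                     # speed assumed linear in voltage
    return dynamic, static, delay


# Example: scaling from an assumed 1.2 V reference down to 1.0 V.
dyn, stat, delay = scale_factors(1.0, 1.2)
```

Under these assumptions, the power terms shrink faster than the delay grows, which is the intuition behind trading some speed for energy efficiency.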
Fig. 3. Delay, static power, and PDP of the test circuit at various voltages.
a separate work [22]. We opt for power gating at the level of individual
tiles as a compromise between the two extremes and because it leads to
efficient physical layout. Averaged over more than 100 benchmarks, we
find that 25% of the tiles are unused and can be power gated. A diagram
of the power gating architecture is shown in Fig. 5. Configuration
memory cells are not power gated because they consume very little
power owing to their midoxide design and because the ability to retain state in a low-power mode is valuable.
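The effect of gating unused tiles on leakage can be illustrated with a short Python sketch. Only the 25% unused fraction comes from the text. The per-tile leakage, the tile count, the `residual` fraction of leakage that remains in a gated tile, and the `gated_leakage` helper are all assumptions for illustration. Configuration memory leakage is left out, since those cells are not gated.

```python
def gated_leakage(num_tiles, tile_leak_uw, unused_fraction=0.25, residual=0.05):
    """Total gateable leakage (uW) when unused tiles are power gated.

    residual: assumed fraction of a tile's leakage remaining once gated.
    """
    unused = num_tiles * unused_fraction
    used = num_tiles - unused
    return used * tile_leak_uw + unused * tile_leak_uw * residual


# Hypothetical array: 1000 tiles at 2.0 uW of gateable leakage each.
baseline_uw = 1000 * 2.0
gated_uw = gated_leakage(1000, 2.0)
saving = 1.0 - gated_uw / baseline_uw
```

In this simple model the saving is bounded by the unused fraction itself, so tile-level gating of unused resources alone cannot remove more than about a quarter of the gateable leakage in a typical design.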
2) Power Gate Design: Another design consideration is the implementation of the power gating transistors. The most straightforward
power gating implementation is to use both PMOS and NMOS power
gates, but by using only one or the other, one can achieve comparable
power savings with much less area overhead. For the same size,
NMOS power gates generally give better speed characteristics than
Fig. 6. (a) Sleep transistors introduce new gate leakage paths. (b) Sizing of thin-oxide and midoxide power gates produces leakage-delay tradeoffs.
TABLE I
SUMMARY OF EACH TECHNIQUE'S IMPACT ON TOTAL CORE
Fig. 7. Active and standby power comparison between baseline and Pika for
various size arrays.
V. RESULTS
A. Power
We designed and laid out Pika in a 90-nm dual-Vt, triple-oxide
CMOS process. To characterize Pika's power, we determined the
capacitance and leakage of individual resources with postlayout simulations, and then used this characterization data to estimate the active
and standby power of a typical user design using the same methodology
described in Section III.
Overall, Pika's active power is 46% less than that of an equivalent
Spartan-3 core, while its standby power is reduced by 99%. The active
power improvement comes from voltage scaling as well as static power
improvements in the configuration memory cell design. The standby
power reduction primarily comes from static power improvements and
the standby mode. Because the implemented techniques are not circuit
specific, the power breakdown by resource type is approximately the
same as that in Fig. 2. Table I shows a breakdown of the power reduction. For each technique, the reduction is what we would achieve if we
applied that technique alone. Since the techniques are not independent,
the individual reductions do not add up to the total reduction.
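A simple way to see why per-technique reductions need not sum to the total is to model each technique as a successive multiplicative factor on the same power budget. This model, the `combined_reduction` helper, and the example percentages are assumptions for illustration only; they are not values from Table I, and the real techniques interact in more complex ways.

```python
def combined_reduction(reductions):
    """Overall fractional reduction when each technique removes a fraction
    of whatever power remains after the previous ones (assumed model)."""
    remaining = 1.0
    for r in reductions:
        remaining *= 1.0 - r
    return 1.0 - remaining


# Hypothetical stand-alone reductions of 50% and 80%.
alone = [0.5, 0.8]
combined = combined_reduction(alone)
naive_sum = sum(alone)
```

Here the naive sum would exceed 100%, which is impossible, while the combined figure stays below 100% because the second technique can only act on the power left over by the first.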
Fig. 7 compares the power of Spartan-3 cores and Pika cores of
equivalent sizes. Since Pika does not have specialty blocks, those
blocks are excluded from the Spartan-3 cores in this comparison.
For equivalent cores of approximately 1500 logic cells to 15 000
logic cells (representing low to medium density Spartan-3 parts),
Pika's typical active power consumption ranges from 13 to 130 mW,
and its standby power (in sleep mode) ranges from 46 to 460 µW. The
latter range falls within the aforementioned requirement of tens to hundreds
of microwatts.
Fig. 8.
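To relate these figures to the low-duty-cycle usage pattern described in the introduction, the following Python sketch computes a time-averaged power. The active and standby values are the small-array figures quoted above (13 mW and 46 µW). The duty cycle of one active hour per week and the `average_power` helper are assumptions chosen for illustration.

```python
def average_power(active_w, standby_w, duty_cycle):
    """Time-averaged power (W) for a device that is active for the given
    fraction of the time and in standby otherwise."""
    return duty_cycle * active_w + (1.0 - duty_cycle) * standby_w


# Assumed usage: 1 h active per week (168 h).
duty = 1.0 / 168.0
avg_w = average_power(13e-3, 46e-6, duty)
standby_share = (1.0 - duty) * 46e-6 / avg_w
```

Even at microwatt-level standby power, the standby term remains a substantial share of the average at such a low duty cycle, which is consistent with the emphasis on standby power reduction.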
B. Area
Area results are obtained by measuring the physical layout area.
The Pika tile is 40% larger than the equivalent Spartan-3 tile. This
additional area increases dynamic power consumption and delay by
approximately 5%. These increases are included in our results. These
increases are modest because large contributors of the total delay and
dynamic power, such as interconnect buffers and logic blocks, are