
IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 26, NO. 2, FEBRUARY 2007

A 90-nm Low-Power FPGA for Battery-Powered Applications
Tim Tuan, Arif Rahman, Satyaki Das,
Steve Trimberger, and Sean Kao

Abstract: Programmable logic devices such as field-programmable gate
arrays (FPGAs) are useful for a wide range of applications. However,
FPGAs are not commonly used in battery-powered applications because
they consume more power than application-specific integrated circuits
and lack power management features. In this paper, we describe the design
and implementation of Pika, a low-power FPGA core targeting battery-powered applications. Our design is based on a commercial low-cost FPGA
and achieves substantial power savings through a series of power optimizations. The resulting architecture is compatible with existing commercial
design tools. The implementation is done in a 90-nm triple-oxide CMOS
process. Compared to the baseline design, Pika consumes 46% less active
power and 99% less standby power. Furthermore, it retains circuit and
configuration state during standby mode and wakes up from standby mode
in approximately 100 ns.
Index Terms: Field-programmable gate array (FPGA), programmable
logic devices, standby power.

I. INTRODUCTION
A key challenge in the IC scaling era is delivering high-performance
solutions while minimizing power and cost. Programmable logic devices such as field-programmable gate arrays (FPGAs) address this
challenge by providing a cost-efficient solution for low- to mid-volume applications due to low non-recurring engineering costs. Also,
with in-field programmability, FPGAs provide a platform solution
with faster time to market and longer product lifetime. Despite their
many advantages, FPGAs are not widely found in today's mobile
applications.
Mobile applications generally have two power requirements: active
power and standby power. A typical user application has an extremely low
duty cycle, where the device is active for a short period of time (less
than 1 h) and then is inactive for a long period of time (days or weeks).
During the periods of activity, the device must be energy efficient,
that is, perform the necessary functions while consuming minimum
energy. During the idle periods, the device must consume little or no
power to extend battery life. The active power requirement for a typical
mobile IC is on the order of hundreds of milliwatts, while its standby power
requirement is on the order of tens to hundreds of microwatts [2], [3].
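To make the two requirements concrete, the following sketch computes the time-averaged power of a device that alternates between active and idle periods. The usage pattern (1 h active, then one week idle) and the power figures (200 mW active, 100 uW or 10 mW standby) are illustrative assumptions chosen from the ranges quoted above, not measurements.

```python
def average_power_mw(active_mw, standby_mw, active_hours, idle_hours):
    """Time-averaged power over one active/idle cycle, in milliwatts."""
    total_hours = active_hours + idle_hours
    return (active_mw * active_hours + standby_mw * idle_hours) / total_hours


# Assumed usage pattern: 1 h active, then one week (168 h) idle.
ACTIVE_H, IDLE_H = 1.0, 168.0

low_standby = average_power_mw(200.0, 0.1, ACTIVE_H, IDLE_H)    # 100 uW standby
high_standby = average_power_mw(200.0, 10.0, ACTIVE_H, IDLE_H)  # 10 mW standby

print(f"100 uW standby: {low_standby:.2f} mW average")
print(f"10 mW standby:  {high_standby:.2f} mW average")
```

Under these assumptions, a milliwatt-range standby power dominates the average, which is why the standby requirement is the stricter of the two for low-duty-cycle devices.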
Despite the computational energy efficiency advantage of FPGAs over
digital signal processors (DSPs) [4], [5], DSPs are widely used today
in battery-operated applications, primarily due to their extensive power
management capabilities that enable very low power consumption
during standby. In contrast, existing FPGAs, designed for high-throughput, high-duty-cycle applications, have little or no power management features. Current low-cost FPGAs consume up to hundreds of
milliwatts of standby power [23], while high-end FPGAs can consume
over 1 W [26]. Compared to mobile ICs, FPGA standby power is
at least two orders of magnitude higher than what is required.
In this paper, we present the design and implementation of Pika,
a low-power FPGA core targeting battery-powered applications. We
base our design on a low-cost commercial FPGA [23]. Due to practical
concerns, we constrain the design to be compatible with existing
software and process technology. Compared to the baseline Spartan-3
core, Pika consumes 46% less active power and 99% less standby
power. Furthermore, it retains circuit and configuration state during
standby mode and wakes up from standby mode in approximately
100 ns.

Manuscript received March 16, 2006; revised July 7, 2006. This paper was
recommended by Associate Editor K. Bazargan.
T. Tuan, A. Rahman, S. Das, and S. Trimberger are with Xilinx, Inc., San
Jose, CA 95124 USA (e-mail: tim.tuan@xilinx.com; arif.rahman@xilinx.com;
satyaki.das@xilinx.com; steve.trimberger@xilinx.com).
S. Kao is with Newport Media, Lake Forest, CA 92630 USA.
Digital Object Identifier 10.1109/TCAD.2006.885731

II. RELATED WORK
Low-power FPGA design has recently become an area of research
interest. Software solutions for power reduction have been proposed
to optimize FPGA power by performing power-aware technology
mapping, placement, routing, and lookup-table (LUT) reprogramming
[10]–[13]. In the area of hardware design, low-power techniques
such as low-swing interconnect, heterogeneous interconnect, multi-Vt,
multi-Vdd, and fine-grain power gating have been proposed to improve
FPGA power consumption [14]–[21].
Some of the power optimization techniques presented in this paper
have been applied to application-specific integrated circuit (ASIC)
and processor design. An early application of power gating was a
0.5-μm multithreshold CMOS (MTCMOS) DSP that achieved 1000X
power reduction in standby mode. An example of aggressive voltage
scaling is a commercial 0.18-μm microprocessor that operates down
to 0.75 V to achieve dramatic power reduction at low frequencies [7].
More recently, power gating achieved a 300X standby power reduction
in processor IP blocks in a 0.13-μm process [8] and, combined with
multithreshold design, a 40X leakage reduction in a 90-nm DSP
processor [9].
The main contribution of this paper is the application of proven low-power ASIC and processor design techniques to a 90-nm commercial
FPGA. In doing so, we uncover and address new design issues associated with FPGA architecture, advanced process challenges, cost and
performance constraints, and software compatibility.
III. BASELINE ARCHITECTURE AND POWER
For our baseline architecture, we use the Xilinx Spartan-3
FPGA [23], a low-cost FPGA built in a 1.2-V 90-nm CMOS process.
We choose a low-cost architecture because many low-power applications are also cost sensitive. To facilitate comparison and to ensure
our FPGA can be manufactured and programmed without substantial
effort on process technology and software design, we keep our design changes compatible with existing manufacturing processes and
computer-aided-design tools.
The Spartan-3 core architecture (Fig. 1) comprises an array of
configurable logic blocks (CLBs). Each CLB is coupled with a
programmable interconnect switch matrix that connects the CLB to
adjacent and nearby CLBs. We will refer to each CLB/switch matrix
pair as a tile.
Each CLB has four logic slices. Each logic slice has two four-input LUTs (4LUTs), two configurable flip-flops (FFs), and some additional circuitry for fast arithmetic operations and wide-input functions.
Each interconnect switch matrix comprises numerous programmable
switches that drive CLB outputs to other CLBs or select CLB inputs
from signals in the interconnect. Each programmable switch is a
buffered multiplexer controlled by a set of configuration memory cells.
Although the programmable switches are simple in structure, they
typically dominate the CLB power consumption because they make
up a large portion of the total area.
In addition to the CLBs and the switch matrices, the FPGA core
also has a number of specialty blocks such as block RAMs (BRAMs),
multipliers, and digital clock managers that provide efficient implementation of complex functions common to many applications.
These specialty blocks are beyond the scope of this paper, but the
optimizations described in this paper can be applied to these blocks
to achieve similar savings.

Fig. 1. Architecture of the baseline Spartan-3 core.

0278-0070/$25.00 © 2007 IEEE

The power consumption of an FPGA depends on block capacitance, block leakage, switching activity, configuration state, resource
utilization, and temperature. To estimate typical FPGA power, we
obtain block capacitance and block leakage through exhaustive SPICE
simulations of each block under a wide range of input states and
configuration states. Typical switching activity is set to 12.5%. Typical
utilization for the given architecture is obtained by analyzing over
100 proprietary benchmark designs. Temperature for a typical design
is defined as 25 °C for an idle device and 85 °C for an active device.
The accuracy of this characterization methodology has been validated
in previous work [24], [25].
Fig. 2 shows the typical core power consumption of the baseline
architecture excluding the specialty blocks. Active power consists of
the dynamic and static power of an active device, while standby power
consists of the static power of an idle device. For the array sizes shown,
active power is on the order of tens to hundreds of milliwatts, while standby
power is on the order of ones to tens of milliwatts. This active power is
comparable to that of existing mobile devices, while the standby power
is about two orders of magnitude higher. Therefore, we prioritize our
efforts to target dramatic standby power reduction, and approach active
power as a secondary goal.
We further break down total power to identify high-power components. Routing switches make up most of the total active power, while
both routing switches and configuration memory represent significant
parts of the total static power.
IV. POWER OPTIMIZATIONS
In this section, we describe each of the power optimizations applied
to the baseline architecture.
A. Voltage Scaling
Voltage scaling is particularly effective for reducing power because dynamic power and static power are quadratic and exponential
functions of the supply voltage, respectively, while circuit speed is
approximately a linear function of the supply voltage [1]. Designs such
as FPGAs that are optimized for performance typically use relatively
high supply voltages to gain speed at the expense of higher power.
Consequently, lowering the supply voltage of a high-speed design can
often yield a more energy-efficient solution.
Fig. 3 shows the speed and the static power of a test circuit, a 4LUT
driving two double switches, at different voltages. Looking purely at
energy efficiency, one would typically set the supply voltage at a point
where the power-delay product (PDP) is minimal. However, in this
example, PDP continues to drop even below 0.8 V, where performance
degradation becomes prohibitively large and reliability becomes a
concern. Consequently, considering performance, energy efficiency,
and reliability, we choose 1.0 V as our core operating voltage. This
leads to power reduction in all core blocks except for the configuration
memory cells, which are excluded because they can be more effectively
addressed as described in the next section.
B. Low-Leakage Configuration Memory
As shown in Fig. 2, configuration memory represents 44% of the
core leakage power. Because it is not timing critical, it is a good
candidate for aggressive power optimization, even where performance is
adversely affected. It has been suggested to use high-Vt transistors for
configuration memory to save leakage power [11], [19]. However,
such a scheme does not address gate leakage, which can be more
than 50% of the total leakage power at low temperatures. Therefore,
any technique that fails to address gate leakage cannot reduce leakage
power by more than 50% under those conditions. Should future process generations
successfully adopt high-κ dielectrics to achieve dramatic gate leakage
reduction, high-Vt devices will again become suitable for power
gating.
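The 50% bound is simple arithmetic: a technique that cuts only subthreshold leakage leaves the gate leakage share untouched. A minimal sketch follows; the function name, the 50% gate leakage share, and the 100x subthreshold reduction are assumptions for illustration only.

```python
def remaining_leakage(gate_fraction, subthreshold_reduction):
    """Fraction of total leakage left when only subthreshold leakage is cut.

    gate_fraction: share of total leakage due to gate leakage (0..1).
    subthreshold_reduction: factor by which subthreshold leakage is divided.
    """
    subthreshold_fraction = 1.0 - gate_fraction
    return gate_fraction + subthreshold_fraction / subthreshold_reduction


# Assumed example: gate leakage is 50% of the total and a high-Vt device
# cuts subthreshold leakage by 100x.
left = remaining_leakage(0.5, 100.0)
print(f"remaining leakage: {left:.3f} of baseline")
```

However large the subthreshold reduction, the result never drops below the gate leakage share, which is why a thicker gate oxide is needed as well.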
The presence of significant gate leakage mandates that a low-leakage device must have thicker gate oxide. Standard thick-oxide
devices used for IO buffers are much too large to be used for millions
of configuration memory cells. Instead, we use a midoxide, high-Vt device for the configuration memory cells. The midoxide transistor
is available in the triple-oxide process used by the Virtex-4 FPGA
family [26]. Therefore, its use does not require a new process
technology.
Using midoxide, high-Vt transistors dramatically cuts both subthreshold leakage and gate leakage in the configuration memory. As
shown in Fig. 4, total memory leakage is reduced by nearly two orders
of magnitude. The use of midoxide transistors does require additional
mask cost and causes some die area increase (the latter will be covered
in Section V), but the power savings sufficiently justify the added cost.
C. Power Gating for Active Leakage Reduction
Power gating is a well-known power reduction technique in ASICs
and microprocessors [6], [8], [9]. To support power gating, one or more
power transistors are inserted between a circuit block and its power
and/or ground. When the power gates are turned on, active operation
is unaffected except for a small performance penalty due to the
on-resistance of the power gates. When the power gates are turned off,
circuit current is limited to the leakage current of the power gates.
Typically, power gating is applied to coarse-grain functional blocks
to reduce standby leakage when the block is temporarily idle. In this
paper, we also use power gating to reduce the chip's active leakage
power by power gating unused blocks.

Fig. 2. Typical power consumption of Spartan-3 cores.
Fig. 3. Delay, static power, and PDP of the test circuit at various voltages.
Fig. 4. Leakage of thin-oxide, regular-Vt and midoxide, high-Vt memory cells.
Fig. 5. Proposed power gating architecture, where configuration memory cells are not power gated.

1) Granularity: One of the main design decisions in power gating
design is the granularity of the smallest block that can be independently power gated. At one extreme, each LUT, FF, and routing
switch may be independently gated [16]. This approach leads to a
larger fraction of unused blocks and hence greater power savings from
switching them off, but the area overhead is also greater. At the other
extreme, clusters of 20 or more tiles may be controlled by a single
power gate [11]. This approach has the benefit of less area overhead
from power gating [8], but fewer clusters will be completely unused.
The tradeoff of power gating granularity is studied in greater detail in
a separate work [22]. We opt for power gating at the level of individual
tiles as a compromise between the two extremes and because it leads to
efficient physical layout. Averaged over more than 100 benchmarks, we
find that 25% of the tiles are unused and can be power gated. A diagram
of the power gating architecture is shown in Fig. 5. Configuration
memory cells are not power gated because they consume very little
power due to the midoxide design and because the ability to retain state in a low-power mode is valuable.
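The granularity tradeoff can be illustrated with a toy model: a region can be power gated only if every tile in it is unused, so coarser regions are gated less often. The placement grid and the function below are hypothetical and are not taken from the benchmark study in [22].

```python
def gateable_fraction(used, cluster):
    """Fraction of tiles that sit in fully unused cluster x cluster regions.

    used: 2D list of booleans, True where a tile is used by the design.
    cluster: side length, in tiles, of the region sharing one power gate.
    """
    rows, cols = len(used), len(used[0])
    gated = 0
    for r0 in range(0, rows, cluster):
        for c0 in range(0, cols, cluster):
            region = [used[r][c]
                      for r in range(r0, min(r0 + cluster, rows))
                      for c in range(c0, min(c0 + cluster, cols))]
            if not any(region):
                gated += len(region)  # the whole region can be switched off
    return gated / (rows * cols)


# Hypothetical 4x4 placement with 4 of 16 tiles unused (25%).
placement = [
    [True, True, True,  True],
    [True, True, True,  False],
    [True, True, False, False],
    [True, True, True,  False],
]
per_tile = gateable_fraction(placement, 1)   # one power gate per tile
per_block = gateable_fraction(placement, 2)  # one power gate per 2x2 block
print(per_tile, per_block)
```

In this example, per-tile gating recovers all of the unused tiles, while 2x2 clusters recover none because each cluster contains at least one used tile.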
2) Power Gate Design: Another design consideration is the implementation of the power gating transistors. The most straightforward
power gating implementation is to use both PMOS and NMOS power
gates, but by using only one or the other, one can achieve comparable
power savings with much less area overhead. For the same size,
NMOS power gates generally give better speed characteristics than
PMOS power gates due to greater carrier mobility. In our simulations,
we find the power savings are similar for both. Thus, we choose
NMOS-only power gates.
Conventional power gating uses thin-oxide high-Vt transistors [6],
which are susceptible to gate leakage in a 90-nm process. Fig. 6(a)
shows the gate leakage path introduced by thin-oxide high-Vt power
gates. Fig. 6(b) shows the power and delay behavior of thin-oxide and
midoxide power gating. Each data point represents a different power
gate size. Although thin-oxide power gating is potentially faster, it
also consumes higher power than midoxide power gating due to the
additional gate leakage. Consequently, we implement midoxide power
gates. Using simulation, we size the power gates such that performance
is degraded by no more than 10%.
D. Standby Modes
A simple extension to tile-level power gating enables a low-power
standby mode where all tiles are power gated while the circuit configuration is retained. To retain the circuit state stored in FFs, we capture
the FF content into dedicated configuration memory cells before
entering the standby mode, and restore the FF states when coming out
of the standby mode. A simple controller is designed to coordinate
the necessary standby and wake up sequences to prevent contention.

Fig. 6. (a) Sleep transistors introduce new gate leakage paths. (b) Sizing of thin-oxide and midoxide power gates produces leakage-delay tradeoffs.
TABLE I
SUMMARY OF EACH TECHNIQUE'S IMPACT ON TOTAL CORE

Many applications need a small amount of functionality to remain
active during standby mode to detect wake-up events. Tile-based
power gating enables such a partial standby mode. Since each tile can
be independently power gated, any combination of tiles can be selected
to remain active during standby mode. This selection is programmable
by setting the value of a single configuration bit per tile. In Fig. 5,
a configurable multiplexer (labeled "partial standby") chooses whether
each tile remains awake during partial standby mode.
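The standby and wake-up behavior described in this section can be summarized with a small behavioral model. This is only a sketch: the class, attribute, and method names are hypothetical, and it models the capture, power-off, and restore order rather than the actual controller circuit.

```python
class Tile:
    """Toy model of one tile: FF state, retention cells, and a power gate."""

    def __init__(self, ff_state, stay_awake=False):
        self.ff_state = list(ff_state)  # volatile flip-flop contents
        self.retention = None           # always-on memory cells for FF state
        self.stay_awake = stay_awake    # per-tile partial-standby bit
        self.powered = True

    def enter_standby(self):
        if self.stay_awake:
            return                      # tile keeps running in partial standby
        self.retention = list(self.ff_state)  # capture before power is cut
        self.ff_state = None            # FF contents are lost when gated
        self.powered = False

    def wake(self):
        if self.powered:
            return
        self.powered = True
        self.ff_state = list(self.retention)  # restore the captured state


tiles = [Tile([1, 0]), Tile([0, 1], stay_awake=True)]
for t in tiles:
    t.enter_standby()
powered_in_standby = [t.powered for t in tiles]
for t in tiles:
    t.wake()
restored = [t.ff_state for t in tiles]
print(powered_in_standby, restored)
```

The second tile stands in for wake-up detection logic: its partial-standby bit keeps it powered while the first tile is gated and later restored.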

Fig. 7. Active and standby power comparison between baseline and Pika for
various size arrays.

V. RESULTS
A. Power
We designed and laid out Pika in a 90-nm dual-Vt, triple-oxide
CMOS process. To characterize Pika's power, we determined the
capacitance and leakage of individual resources with postlayout simulations, and then used this characterization data to estimate the active
and standby power of a typical user design with the same methodology
described in Section III.
Overall, Pika's active power is 46% less than that of an equivalent
Spartan-3 core, while its standby power is reduced by 99%. The active
power improvement comes from voltage scaling as well as static power
improvements in the configuration memory cell design. The standby
power reduction primarily comes from static power improvements and
the standby mode. Because the implemented techniques are not circuit
specific, the power breakdown by resource type is approximately the
same as that in Fig. 2. Table I shows a breakdown of the power reduction. For each technique, the reduction is what we would achieve if we
applied that technique alone. Since the techniques are not independent,
the individual reductions do not add up to the total.
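Two pieces of arithmetic help in reading these numbers. First, under the quadratic model of Section IV-A, lowering the core supply from 1.2 V to 1.0 V by itself scales dynamic power to about 69% of its original value. Second, when several techniques each leave a fraction of the same power component, the fractions multiply, so standalone percentages cannot simply be summed. The 30% and 20% reductions below are hypothetical and are not taken from Table I, and treating both as acting on the same component is a simplifying assumption.

```python
def dynamic_power_ratio(v_new, v_old):
    """Dynamic power ratio under the P ~ V^2 model (same C, f, activity)."""
    return (v_new / v_old) ** 2


def combined_reduction(reductions):
    """Overall reduction when each technique scales the same power component."""
    remaining = 1.0
    for r in reductions:
        remaining *= 1.0 - r
    return 1.0 - remaining


ratio = dynamic_power_ratio(1.0, 1.2)
combined = combined_reduction([0.30, 0.20])  # hypothetical standalone savings
naive_sum = 0.30 + 0.20

print(f"dynamic power after scaling: {ratio:.3f} of baseline")
print(f"combined reduction: {combined:.2f}, naive sum: {naive_sum:.2f}")
```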
Fig. 8. Power consumption of a single tile entering and exiting standby mode.

Fig. 7 compares the power of Spartan-3 cores and Pika cores of
equivalent sizes. Since Pika does not have specialty blocks, those
blocks are excluded from the Spartan-3 cores in this comparison.
For equivalent cores of approximately 1500 logic cells to 15 000
logic cells (representing low- to medium-density Spartan-3 parts),
Pika's typical active power consumption ranges from 13 to 130 mW,
and its standby power (in sleep mode) ranges from 46 to 460 μW. The
latter range falls within the aforementioned requirement of tens to hundreds
of microwatts.
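As a consistency check, the Pika ranges and the quoted 46% and 99% reductions imply baseline figures that can be compared with the orders of magnitude given in Section III. This is a back-of-the-envelope sketch; because the percentages are rounded, the implied baselines are approximate.

```python
def baseline_from_reduced(reduced, reduction):
    """Baseline value implied by a reduced value and a fractional reduction."""
    return reduced / (1.0 - reduction)


# Pika figures quoted above for the smallest and largest arrays.
pika_active_mw = (13.0, 130.0)
pika_standby_uw = (46.0, 460.0)

implied = []
for active, standby in zip(pika_active_mw, pika_standby_uw):
    base_active_mw = baseline_from_reduced(active, 0.46)
    base_standby_mw = baseline_from_reduced(standby, 0.99) / 1e3  # uW to mW
    implied.append((base_active_mw, base_standby_mw))
    print(f"baseline active ~{base_active_mw:.0f} mW, "
          f"standby ~{base_standby_mw:.1f} mW")
```

The implied baselines, roughly 24 to 241 mW active and 4.6 to 46 mW standby, agree with the tens to hundreds of milliwatts and ones to tens of milliwatts reported for the baseline architecture.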
B. Area
Area results are obtained by measuring the physical layout area.
The Pika tile is 40% larger than the equivalent Spartan-3 tile. This
additional area increases dynamic power consumption and delay by
approximately 5%. These increases are included in our results. These
increases are modest because large contributors of the total delay and
dynamic power, such as interconnect buffers and logic blocks, are
unaffected by the increases in routing wire length. Area increase also
negatively affects chip cost, although when accounting for the costs of
testing, assembly, and packaging, die cost is only a fraction of the total
chip cost.
C. Performance
We estimated a user design's performance by determining the delay
of individual resources through postlayout simulation and totaling the
delays of a typical critical path. The total performance impact in Pika is
approximately 27%. Of that, approximately 7% is due to power gating,
5% is due to layout area increase, and the rest is from voltage scaling.
The performance penalty from power gating is lower than the 10% we
budgeted because, in our physical layout, we were able to upsize the
power gates to fill in open spaces.
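The share attributed to voltage scaling can be derived from the figures above and compared with the first-order model of Section IV-A, in which speed is approximately linear in supply voltage. The sketch assumes that the three contributions add linearly, which the text implies but does not state.

```python
total_penalty = 0.27   # overall performance impact reported above
power_gating = 0.07    # share due to power gating
area_increase = 0.05   # share due to layout area increase

# Remainder attributed to voltage scaling, assuming the parts add linearly.
voltage_scaling = total_penalty - power_gating - area_increase

# First-order check: speed roughly linear in supply voltage (1.2 V to 1.0 V).
linear_estimate = 1.0 - 1.0 / 1.2

print(f"voltage scaling share: {voltage_scaling:.2f}")
print(f"linear-model estimate: {linear_estimate:.3f}")
```

The remainder of about 15% is close to the roughly 17% predicted by the linear model, so the reported breakdown is consistent with the chosen 1.0-V operating point.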
D. Mode Transition Behavior
One of the objectives of this design is fast wake-up time from
standby mode, which involves restoring power and state to the power
gated tiles. Because configuration data is retained in the device, no
reconfiguration is needed, which saves a considerable amount of wake-up time.
Fig. 8 shows the power curve of a single tile entering and exiting
standby mode. The exit or wake-up time is shown to be approximately
100 ns. Most of this time is spent charging the gate of the power
gate and discharging the virtual ground. Since all tiles are woken in
parallel, the entire core also wakes up in approximately 100 ns.
VI. CONCLUSION
FPGAs present numerous advantages in deep-submicrometer IC
design. However, high power consumption, in particular high standby
power, has thus far prevented FPGAs from being widely adopted in
battery-powered applications. We have presented a low-power FPGA
core based on a commercial FPGA architecture that achieves dramatic
active and standby power reductions with limited performance and
area degradation.
REFERENCES
[1] J. Rabaey, A. Chandrakasan, and B. Nikolic, Digital Integrated Circuits: A Design Perspective, 2nd ed. Englewood Cliffs, NJ: Prentice-Hall, 2003.
[2] Intel Corp., PXA270 Processor Datasheet. [Online]. Available: http://www.intel.com/design/pca/products/pxa27x/techdocs.htm
[3] Texas Instruments, OMAP5910 Dual-Core Processor Data Manual. [Online]. Available: http://focus.ti.com/docs/prod/folders/print/omap5910.html
[4] T. Claasen, "High speed: Not the only way to exploit the intrinsic computational power of silicon," in Proc. ISSCC, 1999, pp. 22–25.
[5] P. Schumacher et al., "An efficient JPEG2000 encoder implemented on a platform FPGA," in Proc. SPIE Annu. Meeting, 2003, pp. 306–313.
[6] S. Mutoh et al., "A 1 V multi-threshold voltage CMOS DSP with an efficient power management technique for mobile phone application," in Proc. ISSCC, 1996, pp. 168–169.
[7] L. Clark, N. Deutscher, F. Ricci, and S. Demmons, "Standby power management for a 0.18 μm microprocessor," in Proc. ISLPED, 2002, pp. 7–12.
[8] R. Puri, L. Stok, and S. Bhattacharya, "Keeping hot chips cool," in Proc. Des. Autom. Conf., 2005, pp. 285–288.
[9] P. Royannez, "90 nm low-leakage SoC design techniques for wireless applications," in Proc. Int. Solid-State Circuits Conf., 2005, pp. 138–589.
[10] J. Lamoureux and S. Wilton, "On the interaction between power-aware FPGA CAD algorithms," in Proc. Int. Conf. CAD, 2003, pp. 701–708.
[11] A. Gayasen et al., "Reducing leakage energy in FPGAs using region-constrained placement," in Proc. Int. Symp. FPGA, 2004, pp. 51–58.
[12] J. Anderson and F. Najm, "Power-aware technology mapping for LUT-based FPGAs," in Proc. FPT, 2002, pp. 211–218.
[13] J. Anderson, F. Najm, and T. Tuan, "Active leakage power optimization for FPGAs," in Proc. Int. Symp. FPGAs, 2004, pp. 33–41.
[14] V. George, H. Zhang, and J. Rabaey, "The design of low energy FPGA," in Proc. Int. Symp. Low Power Electron. Des., 1999, pp. 188–193.
[15] A. Rahman, "Evaluation of low leakage design techniques for field programmable gate arrays," in Proc. Int. Symp. FPGA, 2004, pp. 23–30.
[16] B. Calhoun, F. Honore, and A. Chandrakasan, "Design methodology for fine-grained leakage control in MTCMOS," in Proc. ISLPED, 2003, pp. 104–109.
[17] J. Anderson and F. Najm, "Low-power programmable routing circuitry for FPGAs," in Proc. Custom Integr. Circuits Conf., 2004, pp. 602–609.
[18] A. Rahman, S. Das, T. Tuan, and A. Rahut, "Heterogeneous routing architecture for low power FPGA fabric," in Proc. CICC, 2005, pp. 183–186.
[19] F. Li, Y. Lin, L. He, and J. Cong, "Low-power FPGA using predefined Dual-Vdd/Dual-Vt fabrics," in Proc. Int. Symp. FPGA, 2004, pp. 42–50.
[20] , "FPGA power reduction using configurable Dual-Vdd," in Proc. Des. Autom. Conf., 2004, pp. 735–740.
[21] F. Li, Y. Lin, and L. He, "Vdd programmability to reduce FPGA interconnect power," in Proc. Int. Conf. CAD, 2004, pp. 760–765.
[22] A. Rahman, S. Das, T. Tuan, and S. Trimberger, "Determination of power gating granularity for FPGA fabric," in Proc. CICC, 2006.
[23] Xilinx Inc., Spartan-3 FPGA Family Datasheet. [Online]. Available: http://direct.xilinx.com/bvdocs/publications/ds099.pdf
[24] T. Tuan and B. Lai, "Leakage power analysis of a 90 nm FPGA," in Proc. Custom Integr. Circuits Conf., 2003, pp. 57–60.
[25] V. Degalahal and T. Tuan, "Methodology for high level estimation of FPGA power consumption," in Proc. ASP-DAC, 2005, pp. 657–660.
[26] Xilinx Inc., Virtex-4 FPGA Family Datasheet. [Online]. Available: http://direct.xilinx.com/bvdocs/publications/ds302.pdf
