BITS Pilani, Pilani Campus
Today’s Lecture
FPGA Programming Technologies
[Figure: FPGA technology, divided into antifuse based and memory based]
• The technology defines how the different blocks (logic blocks, interconnect, input/output) are
physically realized on the silicon chip.
• Antifuse technology is limited to the realization of interconnections.
• The memory-based paradigm is used for the computation as well as the interconnections.
Anti-fuse
• An anti-fuse is the opposite of a regular fuse.
• It normally presents a high-impedance state and can be "fused" into a
low-impedance state when programmed by a high voltage.
• A high programming current (about 5 mA) is forced through the anti-fuse to blow it.
• The construction of the anti-fuse differs between FPGAs from different
companies.
• The link is permanent.
• ACT FPGA: rows of logic modules are interspersed with horizontal
routing channels containing predefined wiring segments of various
lengths and offsets.
• Other wiring segments run vertically through the modules and across
the channels.
• Each logic module computes a single output function of several
inputs. Each module input is connected to a dedicated vertical wiring
segment spanning either the channel above or below the module.
• Each output signal appears on a dedicated vertical wiring segment of
somewhat longer length.
• An antifuse is provided at each intersection of a horizontal and vertical
segment, permitting them to be connected.
• For example, in the ACT FPGA figure, the output of the driver, Module 3, is connected to the
Module 2 input through programmed antifuses to horizontal segments,
which in turn are connected to input segments.
• In the top channel, an antifuse is used to link two adjacent horizontal
segments end-to-end, making it possible to reach an input of Module
1.
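The segment-and-antifuse routing described above can be modelled as a connectivity problem: each wiring segment is a node, and blowing an antifuse permanently merges two segments into one electrical net. The following is a minimal Python sketch under that assumption; the class `AntifuseFabric` and the segment names are hypothetical labels loosely following the Module 1/2/3 example, not taken from any Actel tool.

```python
class AntifuseFabric:
    """Toy model: wiring segments joined by one-time-programmable antifuses."""

    def __init__(self, segments):
        # Every segment starts isolated: no antifuse has been blown yet.
        self.parent = {s: s for s in segments}
        self.blown = set()

    def _root(self, s):
        # Union-find lookup with path halving.
        while self.parent[s] != s:
            self.parent[s] = self.parent[self.parent[s]]
            s = self.parent[s]
        return s

    def program(self, a, b):
        """Blow the antifuse at the junction of segments a and b.

        There is deliberately no inverse operation: the link is permanent.
        """
        self.blown.add(frozenset((a, b)))
        ra, rb = self._root(a), self._root(b)
        if ra != rb:
            self.parent[ra] = rb

    def connected(self, a, b):
        return self._root(a) == self._root(b)


# Hypothetical segment names for the Module 3 -> Module 2 / Module 1 example.
fabric = AntifuseFabric(["mod3_out", "h_mid", "mod2_in",
                         "h_top_left", "h_top_right", "mod1_in"])
fabric.program("mod3_out", "h_mid")   # output segment to a horizontal segment
fabric.program("h_mid", "mod2_in")    # horizontal segment to an input segment
print(fabric.connected("mod3_out", "mod2_in"))  # True
print(fabric.connected("mod3_out", "mod1_in"))  # False: not routed yet

# Reaching Module 1: link two top-channel segments end to end.
fabric.program("mod3_out", "h_top_left")
fabric.program("h_top_left", "h_top_right")
fabric.program("h_top_right", "mod1_in")
print(fabric.connected("mod3_out", "mod1_in"))  # True
print(len(fabric.blown))  # 5 antifuses programmed
```

Union-find is used because programming only ever adds connections; the absence of any "unprogram" method is the point of the model.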
Poly-diffusion Anti-fuse: Actel PLICE
(PLICE: programmable low-impedance circuit element)
• Terminals: polysilicon on one side, n+ diffusion on the other
• An oxide-nitride-oxide (ONO) dielectric is sandwiched between the
n+ diffusion and the polysilicon.
• Applying a high voltage (16 V) melts the dielectric and establishes
the connection.
• The high current density breaks down the dielectric and
forms a conducting link approximately 20 nm in diameter, with a
resistance of 100-600 ohms.
• The programming process drives dopant atoms from the
poly and diffusion electrodes into the link.
• The final level of doping determines the resistance of
the link.
Metal Anti-fuse: QuickLogic ViaLink
Two metal terminal layers (titanium-tungsten)
The dielectric approach used in PLICE is replaced by metal-to-metal
anti-fuses, formed by sandwiching an insulating material such as amorphous
silicon or silicon oxide between two metal layers.
Again, a high voltage breaks down the anti-fuse and causes it to
conduct.
The on-resistance can be between 20 and 100 ohms, and the fuse itself
requires no silicon area.
Used in recent FPGAs from Actel and QuickLogic.
A metal-metal antifuse has two advantages over a poly-diffusion antifuse:
– Connections to a metal-metal antifuse are made directly to the metal
wiring layers. Connections from a poly-diffusion antifuse to the wiring
layers require extra space and create additional parasitic capacitance.
– The direct connection to the low-resistance metal layers makes it
easier to use larger programming currents to reduce the antifuse
resistance.
• Advantages:
– Low area. With metal-to-metal anti-fuses, no silicon area is required to make connections,
which decreases the area overhead of programmability.
– Lower on-resistance and parasitic capacitance than other programming
technologies. The low area, resistance, and capacitance of the fuses mean it is possible
to include more switches per device than is practical in other technologies.
– Non-volatile: the device works instantly once programmed. This lowers
system cost, since additional memory for storing the programming information is not
required, and it also allows the FPGA to be used in situations that require operation
immediately upon power-up.
– Since programming, and hence transmitting the bitstream to the FPGA, need only be
done once, it can be done in a secure environment, which improves the security of the
design on the FPGA. To further enable this security, current devices offer security modes
which disable access through the programming interface once the device is
programmed.
Disadvantages:
• The size of an antifuse is limited by the resolution of the lithography equipment used to make
ICs.
• To connect the antifuse to the metal layers requires contacts that take up more space than the
antifuse itself, reducing the advantage of the small antifuse size.
• Requires a non-standard CMOS process.
• Unwanted parasitic elements can add considerable RC interconnect delay if the number of
antifuses connected in series is not kept to an absolute minimum.
• The fundamental mechanism of programming, which involves significant changes to the
properties of the materials in the fuse, leads to scaling challenges when new IC fabrication
processes are considered. Indeed, there is some evidence that anti-fuses are no longer scaling:
the most advanced anti-fuse devices use 0.15 μm technology, which is many generations behind
the technology used for new standard CMOS devices.
• The inability to reprogram anti-fuses also makes them unsuitable for applications where
configuration changes are required. Unlike alternative technologies, in-system programming is
not possible with these devices. Instead, special programmers must be used to program a
device before it is mounted on a final product.
• Finally, the one-time programmability of anti-fuses makes it impossible for manufacturing tests
to detect all possible faults. Some faults will only be uncovered after programming and,
therefore, the yield after programming will be less than the 100% yield of SRAM or floating-
gate devices.
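The RC-delay concern about antifuses in series can be made concrete with the Elmore delay approximation, which for an RC ladder sums, over every node, the node capacitance times the total resistance between the driver and that node. The sketch below assumes each programmed antifuse has 500 ohms (inside the 100-600 ohm PLICE range quoted earlier) and is followed by a wire segment of 100 fF; the capacitance is an illustrative assumption, not a measured figure, and driver resistance is ignored.

```python
def elmore_delay(resistances, capacitances):
    """Elmore delay (seconds) of an RC ladder.

    resistances[k] is the series resistance in front of node k and
    capacitances[k] is the capacitance at node k.
    """
    delay = 0.0
    upstream_r = 0.0
    for r, c in zip(resistances, capacitances):
        upstream_r += r          # total resistance from the driver to this node
        delay += upstream_r * c  # this node's contribution
    return delay


R_ANTIFUSE = 500.0   # ohms, within the 100-600 ohm PLICE range
C_SEGMENT = 100e-15  # farads per wire segment: an assumed, illustrative value

for n in (1, 2, 4, 8):
    t = elmore_delay([R_ANTIFUSE] * n, [C_SEGMENT] * n)
    print(f"{n} antifuses in series: {t * 1e12:.0f} ps")
```

For n identical sections the sum reduces to n(n+1)/2 · RC, so delay grows roughly quadratically with the number of series antifuses; the lower 20-100 ohm on-resistance of metal-metal antifuses scales the whole curve down proportionally.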
SRAM (Memory based)
• SRAM is the most widely used method for storing configuration information in commercially available FPGAs.
• An SRAM is used to store all possible values of a function.
• The value of the function for a given input is retrieved by using the inputs as the SRAM address.
• An SRAM that implements a function in this way is called a look-up table (LUT).
• SRAM is popular because it provides fast and unlimited reconfiguration in a well-known technology. In these devices, static memory cells are
distributed throughout the FPGA to provide configurability.
• There are two primary uses for the SRAM cells.
• Most are used to set the select lines of multiplexers that steer interconnect signals.
• The majority of the remaining SRAM cells are used to store the data in the look-up tables (LUTs) that are typically used in SRAM-based
FPGAs to implement logic functions. Figures (b) and (c) illustrate these two different approaches.
• An SRAM cell can also control a wire interconnection through a pass transistor, either an NMOS transistor as shown in figure (a) or a
PMOS transistor.
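The look-up-table idea and the SRAM-controlled routing multiplexer can both be sketched in a few lines: the configuration bits are the truth table, and the logic inputs (or routing select bits) simply form an address. This Python sketch is a behavioural illustration only; the class and function names are hypothetical, and the bit ordering (first input as the least significant address bit) is an arbitrary assumption rather than any vendor's convention.

```python
class LUT:
    """k-input look-up table: 2**k SRAM bits addressed by the inputs."""

    def __init__(self, k, config_bits):
        self.k = k
        self.configure(config_bits)

    def configure(self, config_bits):
        # Writing new SRAM contents implements a new function.
        if len(config_bits) != 2 ** self.k:
            raise ValueError(f"a {self.k}-input LUT needs {2 ** self.k} bits")
        self.sram = [b & 1 for b in config_bits]

    def evaluate(self, *inputs):
        if len(inputs) != self.k:
            raise ValueError(f"expected {self.k} inputs")
        # The inputs form the SRAM address (first input = least significant bit).
        address = sum((bit & 1) << i for i, bit in enumerate(inputs))
        return self.sram[address]


def routing_mux(select_bits, signals):
    """Interconnect multiplexer whose select lines are driven by SRAM cells."""
    index = sum((bit & 1) << i for i, bit in enumerate(select_bits))
    return signals[index]


lut = LUT(2, [0, 1, 1, 0])          # truth table of a XOR b
print([lut.evaluate(a, b) for b in (0, 1) for a in (0, 1)])  # [0, 1, 1, 0]
lut.configure([0, 0, 0, 1])         # reprogram the same LUT as a AND b
print(lut.evaluate(1, 1), lut.evaluate(1, 0))                # 1 0
print(routing_mux([0, 1], ["net0", "net1", "net2", "net3"]))  # net2
```

The same hardware implements XOR and then AND purely by rewriting its SRAM contents, which is exactly why SRAM-based FPGAs can be reconfigured on the fly.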
• SRAM has two primary advantages: reprogrammability and the use of standard CMOS
process technology.
• From a practical standpoint, an SRAM cell can be programmed an indefinite
number of times.
• Dedicated circuitry on the FPGA initializes all the SRAM bits on power-up and
configures the bits with a user-supplied configuration.
• Unlike other programming technologies, the use of SRAM cells requires no
special integrated circuit processing steps beyond standard CMOS. As a result,
SRAM-based FPGAs can use the latest CMOS technology available and,
therefore, benefit from the increased integration, the higher speeds and the
lower dynamic power consumption of new processes with smaller minimum
geometries.
• A new function is implemented by writing new values into the LUT.
• An SRAM-based FPGA can therefore be reprogrammed (configured) on the fly.
Because a LUT is volatile, its configuration is lost when the system is switched
off.
• The chip area required by the SRAM is relatively large.
A Xilinx SRAM Cell
• The SRAM dissipates significant static power because of leakage current.
• SRAM does not maintain its contents without power, which means that at
power-up the FPGA is not configured and must be programmed using off-chip
logic and storage.
• This can be accomplished with a nonvolatile memory store to hold the
configuration and a micro-controller to perform the programming procedure.
While this may seem to be a trivial task, it adds to the component count and
complexity of a design and prevents the SRAM-based FPGA from being a truly
single-chip solution.
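The power-up sequence described above (an off-chip nonvolatile store plus a microcontroller that writes the configuration into the volatile SRAM cells) can be sketched as follows. Everything here is hypothetical: the `SramFpga` class, the image layout, and the CRC-32 integrity check (computed with Python's standard `zlib.crc32`) are illustrative assumptions, not a description of any real configuration interface.

```python
import zlib


class SramFpga:
    """Toy SRAM-based FPGA whose configuration memory is volatile."""

    def __init__(self, n_config_bits):
        self.n_config_bits = n_config_bits
        self.config = None  # unconfigured at power-up

    def power_cycle(self):
        # SRAM cells lose their contents when power is removed.
        self.config = None

    def write_config(self, bits):
        if len(bits) != self.n_config_bits:
            raise ValueError("bitstream size does not match the device")
        self.config = list(bits)

    @property
    def configured(self):
        return self.config is not None


def boot_from_flash(fpga, image):
    """Hypothetical microcontroller routine: verify the stored image, then load it."""
    bits = image["bits"]
    if zlib.crc32(bytes(bits)) != image["crc32"]:
        raise ValueError("configuration image is corrupted")
    fpga.write_config(bits)


bits = [1, 0, 1, 1, 0, 0, 1, 0]  # made-up 8-bit "bitstream"
flash = {"bits": bits, "crc32": zlib.crc32(bytes(bits))}

fpga = SramFpga(len(bits))
boot_from_flash(fpga, flash)
print(fpga.configured)   # True
fpga.power_cycle()
print(fpga.configured)   # False: must be reloaded from the external store
boot_from_flash(fpga, flash)
print(fpga.configured)   # True
```

The extra store and loader routine are exactly the additional components that keep an SRAM-based FPGA from being a single-chip solution.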
FPGA Programming
Technologies
EPROM
• Altera MAX 5000 EPLDs and Xilinx EPLDs both use UV-erasable electrically programmable
read-only memory (EPROM) cells as their programming technology.
• EPROM devices are based on a floating gate.
• If the floating gate is not charged (i.e. neutral), the device operates almost like a normal
MOSFET.
• If, however, the floating gate is negatively charged, the threshold voltage Vt of the transistor
increases, so the transistor stays off at nominal gate voltages.
• Programming (putting electrons onto the floating gate) means writing a 0; erasing (removing
the charge from the floating gate) means resetting the cell contents to 1. In other words, a
programmed cell stores a logic 0 and an erased cell stores a logic
1.
• The device is programmed by applying a high voltage (10–21 V) to the
control gate and a comparatively smaller voltage (12 V) to the drain of the transistor.
• This leaves the floating gate negatively charged: electrons tunnel from the transistor
channel through the thin lower gate oxide onto the floating gate, a process called
Fowler-Nordheim tunneling. The charge is retained until the cell is erased.
• Another technique is channel hot-carrier injection, which uses a high current in the
channel to give electrons enough energy to "boil" out of the channel and cross the
tunnel oxide onto the floating gate, changing the threshold voltage of the transistor.
• This negative potential on the floating gate counteracts the voltage on the control gate and
keeps the transistor off.
• When electrons are present on the floating gate, current can't flow through the transistor
and the bit state is 0. This is the state of a floating-gate transistor whose bit has been
programmed.
• When electrons are removed from the floating gate (on exposure to UV light), current is
allowed to flow and the bit state is 1.
EPROM Cell
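The relationship between floating-gate charge, threshold voltage, and the stored bit can be summarised in a small behavioural model. The threshold and read voltages below are assumed round numbers chosen only so that an uncharged gate conducts and a charged gate does not; they are not datasheet values for any Altera or Xilinx part, and the class name is hypothetical.

```python
class EpromCell:
    """Behavioural model of a UV-erasable floating-gate cell."""

    VT_ERASED = 1.0      # V: assumed threshold with a neutral floating gate
    VT_PROGRAMMED = 7.0  # V: assumed threshold with a negatively charged floating gate
    V_READ = 5.0         # V: assumed nominal control-gate voltage during a read

    def __init__(self):
        self.vt = self.VT_ERASED  # start in the erased (logic 1) state

    def program(self):
        # Electrons placed on the floating gate raise the threshold voltage.
        self.vt = self.VT_PROGRAMMED

    def uv_erase(self):
        # UV exposure removes the stored charge and restores the low threshold.
        self.vt = self.VT_ERASED

    def read(self):
        # Current flows only if the gate voltage exceeds the threshold.
        return 1 if self.V_READ > self.vt else 0


cell = EpromCell()
print(cell.read())  # 1: erased cell conducts
cell.program()
print(cell.read())  # 0: charged floating gate keeps the transistor off
cell.uv_erase()
print(cell.read())  # 1
```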
Flash EEPROM
• In the flash EEPROM cells used in the logic tiles of the Actel ProASIC chips, two transistors share the
floating gate, which stores the programming information.
• The smaller programming/sensing transistor is used for programming the floating gate (injecting charge
that remains even while the power is off) while the larger switching transistor serves as the
programmable switch.
• The switching transistor also helps when erasing the device.
• The sensing transistor is only used for writing and verification of the floating gate voltage whereas the
other is used as switch. This can be used to connect or disconnect routing nets to or from the
configured logic.
• This flash-based programming technology offers several unique advantages, most importantly non-
volatility.
• This feature eliminates the need for the external resources required to store and load configuration data.
Additionally, a flash-based device can function immediately upon power-up instead of having to wait for
the loading of configuration data.
• The flash approach is also more area efficient than SRAM-based technology which requires up to six
transistors to implement the programmable storage.
• The drawback is that the programming circuitry, such as the high and low voltage buffers which are
needed to program the cell, contributes an area overhead which is not present in SRAM-based devices.
However, this cost is relatively modest as it is amortized across numerous programmable elements.
• In comparison to anti-fuses, flash-based FPGAs are reconfigurable and can be programmed without
being removed from a printed circuit board.
• The use of a floating gate to control the switching transistor adds design complexity, because care must
be taken to keep the source-drain voltage sufficiently low; otherwise, charge could be injected into the
floating gate.
• One other disadvantage of flash-based devices is that they cannot be reprogrammed an infinite number
of times. Charge buildup in the oxide eventually prevents a flash-based device from being properly
erased and programmed.
• For example, devices such as the Actel ProASIC3 are only rated for 500 programming cycles. For most
uses of FPGAs, this programming count is more than sufficient.
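The trade-offs above (non-volatile, reprogrammable in-system, but with a finite rated number of programming cycles) can be captured in a toy model of one flash-controlled switch. The 500-cycle default mirrors the ProASIC3 rating quoted above; treating the rating as a hard cut-off is a simplifying assumption, since real wear-out from oxide charge buildup is gradual, and the class names are hypothetical.

```python
class EnduranceExceeded(Exception):
    """Raised when a switch is programmed more often than it is rated for."""


class FlashSwitch:
    """Toy model of a flash-controlled programmable routing switch."""

    def __init__(self, rated_cycles=500):
        self.rated_cycles = rated_cycles
        self.cycles_used = 0
        self.closed = False  # assumption: an erased switch is open

    def reprogram(self, closed):
        # In-system reprogramming is allowed, but only up to the rating.
        if self.cycles_used >= self.rated_cycles:
            raise EnduranceExceeded(
                f"rated for {self.rated_cycles} programming cycles")
        self.cycles_used += 1
        self.closed = closed

    def power_cycle(self):
        # Non-volatile: the floating-gate charge, and hence the switch
        # state, survives power-off, so nothing needs to be reloaded.
        pass


sw = FlashSwitch(rated_cycles=3)
for state in (True, False, True):
    sw.reprogram(state)
sw.power_cycle()
print(sw.closed, sw.cycles_used)  # True 3
try:
    sw.reprogram(False)
except EnduranceExceeded as err:
    print("refused:", err)
```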
FPGA Structures
Row based
• consists of alternating rows of
logic blocks and channels
• Horizontal routing via
horizontal channels for non-
dedicated signal nets
• Channels are divided into segments
(L-2, L-4, L-8, ...)
• Vertical connections via
dedicated vertical tracks
• Dedicated vertical routing
tracks are used for the global
clock networks and for power
and ground tracks
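Segmenting a channel into lengths such as L-2, L-4, and L-8 trades wasted track against the number of end-to-end links (each an extra programmed switch in series). The sketch below assumes that L-k denotes a segment spanning k logic blocks and that segments start aligned with the net's first column; both are simplifying assumptions for illustration, and the function name is hypothetical.

```python
import math


def cover_span(span, seg_len):
    """Segments of one length needed to cover a horizontal span of logic blocks."""
    segments = math.ceil(span / seg_len)
    return {
        "segments": segments,
        "links": segments - 1,                # end-to-end joints between segments
        "unused": segments * seg_len - span,  # track length left over
    }


for length in (2, 4, 8):
    print(f"L-{length}:", cover_span(10, length))
```

For a 10-block net, L-2 segments waste no track but need four joints, while L-8 segments need only one joint but leave six blocks of track unused; mixing segment lengths in a channel lets the router balance the two.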
Sea of gates
• It is not an array of logic blocks embedded in a
general routing structure; instead, the architecture consists of
fine-grained logic blocks covering the entire floor of the device.
• Connectivity is realized using dedicated
neighbor-to-neighbor routes and routing over the gates,
since many metal layers are available.
• Actel ProASIC FPGA family
• The device uses a four-level hierarchy of routing
resources to connect the logic tiles: the local
resources, the long-line resources, the very
long-line resources and the global networks.
• The local resources allow the output of the tile
to be connected to the inputs of one of the
eight surrounding tiles
• The long-line resources provide routing for
longer distances and higher fanout
connections
• Very long lines span the entire device. They
are used to route very long or very high-fanout
nets.
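The local resources described above connect a tile only to its eight surrounding tiles. A small helper that enumerates those neighbours, clipped at the array edges, makes the reach of this lowest routing level explicit; it assumes a plain rectangular grid of tiles indexed by (row, column), which is a simplification of the real ProASIC floorplan, and the function name is hypothetical.

```python
def local_neighbors(row, col, rows, cols):
    """Tiles reachable from (row, col) through local routing: the up-to-eight
    surrounding tiles that lie inside a rows x cols array."""
    result = []
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue  # a tile is not its own neighbour
            r, c = row + dr, col + dc
            if 0 <= r < rows and 0 <= c < cols:
                result.append((r, c))
    return result


print(len(local_neighbors(5, 5, 10, 10)))  # 8: interior tile
print(local_neighbors(0, 0, 10, 10))       # [(0, 1), (1, 0), (1, 1)]: corner tile
```

Any connection outside this 3 x 3 window has to use the long-line, very-long-line, or global resources instead.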
Island style (mesh based)
Hierarchical FPGA
• Like ASIC designers, FPGA designers started facing challenges such as long and unpredictable
routing times, too many design iterations, and difficulty achieving timing closure.
• They therefore started adopting ASIC-style hierarchical design methodologies. Coupled with advanced
analysis capabilities applied early in the design cycle, these enable designers to catch potential
physical implementation problems before place and route.
• Most logic designs exhibit locality of connections, which implies a hierarchy in the placement
and routing of connections between logic blocks
• Hierarchical routing architectures exploit this locality by dividing FPGA logic blocks into
separate groups/clusters with minimum interconnections between them
• These clusters are recursively connected to form a hierarchical structure.
• This saves time and effort when optimizing multiple parameters.
• Design locking is possible with advances in EDA tools.
• Examples: Altera's Stratix II, FLEX 10K, and APEX architectures
• For instance, in Altera APEX20 and APEX II FPGAs, 10
or so logic elements are connected to form what
Altera calls a Logic Array Block (LAB), and then
several LABs are connected to form a MEGALAB.
Thus, there is a hierarchy in the organization of these
FPGAs.
• These FPGAs contain clusters of logic blocks with
localized resources for interconnection.
• The global interconnect network is used for the
interconnections between the clusters of logic
blocks.
• Elements with the lowest granularity are at the
lowest level of the hierarchy. They are grouped to form the
elements of the next level. Each element of level i
consists of a given number of elements from level
i−1.
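The recursive definition above (each level-i element is made of a fixed number of level-(i−1) elements) can be turned into two small calculations: how many basic logic elements each level contains, and the lowest level whose interconnect is needed to join two given logic elements. The fan-out of 10 logic elements per LAB follows the APEX example; the 16 LABs per MegaLAB and 4 MegaLABs per device are hypothetical numbers chosen for illustration, and the numbering scheme (leaves numbered consecutively cluster by cluster) is an assumption.

```python
def elements_per_level(fanouts):
    """fanouts[i - 1] = number of level-(i - 1) elements inside one level-i element.

    Returns, for every level, how many basic (level-0) logic elements it contains.
    """
    counts = [1]  # level 0: a single logic element
    for f in fanouts:
        counts.append(counts[-1] * f)
    return counts


def joining_level(a, b, fanouts):
    """Lowest level whose single element contains logic elements a and b.

    Assumes level-0 elements are numbered consecutively, cluster by cluster.
    Returns None if a and b are not in the same top-level element.
    """
    for level, size in enumerate(elements_per_level(fanouts)):
        if a // size == b // size:
            return level
    return None


FANOUTS = [10, 16, 4]  # LEs per LAB (APEX-like); LABs per MegaLAB and MegaLABs: hypothetical
print(elements_per_level(FANOUTS))     # [1, 10, 160, 640]
print(joining_level(3, 7, FANOUTS))    # 1: same LAB, local interconnect suffices
print(joining_level(3, 25, FANOUTS))   # 2: different LABs in the same MegaLAB
print(joining_level(3, 300, FANOUTS))  # 3: needs the top-level (global) interconnect
```

Because most connections in a typical design are local, most pairs resolve at level 1 or 2, which is the locality that hierarchical routing exploits.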
Hybrid FPGA
• Manufacturers have been trying to put more pre-designed
and well-tested hard macros in their chips.
• Many modern FPGAs contain an entire processor core.
• This is extremely useful for hybrid solutions, where part
of a system runs on a programmable processor and part of
the system is implemented in hardware.
• Resources such as memory, which are used in almost all
designs, can be found directly on the chip.
• This allows the designer to use well-tested and efficient
modules.
• Hard macros are implemented more efficiently and are
faster than the same macros implemented on the universal
function generators.
• The resources often available on hybrid FPGAs are
RAMs, clock managers, arithmetic modules, network
interface modules and processors.
• The Xilinx Virtex-II Pro contains up to four embedded
IBM PowerPC 405 RISC hard processor cores, which
can be clocked at more than 300 MHz.
Case Study – I
Xilinx XC4000
• Third-generation field-programmable gate arrays
– Abundant flip-flops
– Flexible function generators
– On-chip ultra-fast RAM
– Dedicated high-speed carry-propagation circuit
– Wide edge decoders
– Hierarchy of interconnect lines
– Internal 3-state bus capability
– Eight global low-skew clock or signal distribution networks
• Flexible array architecture
– Programmable logic blocks and I/O blocks
– Programmable interconnects and wide decoders
• 0.35 micron CMOS process
– High-speed logic and interconnect
– Low power consumption