
Reconfigurable Computing

ES ZG554 / MEL ZG 554


Session 6 Pawan Sharma
BITS Pilani ps@pilani.bits-pilani.ac.in
Pilani Campus 13/02/2021
Last Lecture

• Reconfigurable Computing Hardware (T1. Ch. 2)


• Overview of FPGA Architecture
• Example

Today’s Lecture

• FPGA Programming Technologies [T1. Chapter 2. 4.1]


• FPGA Structures [T1 4.3]
• FPGA granularity [T1 5] (self study)

FPGA Programming
Technologies
FPGA technology
– Antifuse based
– Memory based: SRAM, EEPROM, Flash

• The programming technology defines how the different blocks (logic blocks, interconnect, input/output) are physically realized on the silicon chip.
• Antifuse technology is limited to the realization of interconnections.
• The memory-based paradigm is used for computation as well as for interconnections.
FPGA Programming
Technologies
Anti-fuse
• An anti-fuse is the opposite of a regular fuse.
• It normally presents a high-impedance state and can be “fused” into a low-impedance state when programmed by a high voltage.
• A high programming current (about 5 mA) is forced through the anti-fuse to blow it.
• The anti-fuses used in FPGAs from different companies differ in construction.
• The link is permanent.
• ACT FPGA: rows of logic modules are interspersed with horizontal routing channels containing predefined wiring segments of various lengths and offsets.
• Other wiring segments run vertically through the modules and across the channels.
• Each logic module computes a single output function of several inputs. Each module input is connected to a dedicated vertical wiring segment spanning either the channel above or the channel below the module.
• Each output signal appears on a dedicated vertical wiring segment of somewhat longer length.
• An antifuse is provided at each intersection of a horizontal and a vertical segment, permitting them to be connected.
• For example, the output of the driver, Module 3, is connected to the Module 2 input by programmed antifuses to horizontal segments which, in turn, are connected to input segments.
• In the top channel, an antifuse is used to link two adjacent horizontal segments end-to-end, making it possible to reach an input of Module 1.
ACT FPGA
FPGA Programming
Technologies
Poly-diffusion Anti-fuse: Actel PLICE
• PLICE: programmable low-impedance circuit element
• Poly-silicon terminal
• Oxide-nitride-oxide dielectric sandwiched between n+ diffusion and polysilicon
• Applying a high voltage (16 V) melts the dielectric and establishes the connection.
• The high current density breaks down the dielectric and forms a conducting link, approximately 20 nm in diameter, with a resistance of 100–600 ohms.
• The programming process drives dopant atoms from the poly and diffusion electrodes into the link.
• The final level of doping determines the resistance of the link.
Metal Anti-fuse: QuickLogic ViaLink
• Two metal terminal layers (titanium-tungsten)
• The dielectric approach used in PLICE is replaced by metal-to-metal anti-fuses.
• Formed by sandwiching an insulating material, such as amorphous silicon or silicon oxide, between two metal layers.
• Again, a high voltage breaks down the anti-fuse and causes it to conduct.
• The on-resistance can be between 20 and 100 ohms, and the fuse itself requires no silicon area.
• Used in recent FPGAs from Actel and QuickLogic.
• Two advantages over a poly-diffusion antifuse:
– Connections to a metal-metal antifuse are made directly to metal, which forms the wiring layers. Connections from a poly-diffusion antifuse to the wiring layers require extra space and create additional parasitic capacitance.
– The direct connection to the low-resistance metal layers makes it easier to use larger programming currents to reduce the antifuse resistance.
• Advantages:
– Low area. With metal-to-metal anti-fuses, no silicon area is required to make connections, which decreases the area overhead of programmability.
– Lower on-resistance and parasitic capacitance than other programming technologies. The low area, resistance and capacitance of the fuses make it possible to include more switches per device than is practical in other technologies.
– Non-volatile, meaning that the device works instantly once programmed. This lowers system cost, since no additional memory is needed to store the programming information, and it allows the FPGA to be used in situations that require operation immediately upon power-up.
– Since programming, and hence transmitting the bitstream to the FPGA, needs to be done only once, it can be done in a secure environment, which improves the security of the design on the FPGA. To strengthen this further, current devices offer security modes that disable access through the programming interface once the device is programmed.
Disadvantages:
• The size of an antifuse is limited by the resolution of the lithography equipment used to make ICs.
• Connecting the antifuse to the metal layers requires contacts that take up more space than the antifuse itself, reducing the advantage of the small antifuse size.
• Uses a non-standard CMOS process.
• Unwanted parasitic elements can add considerable RC interconnect delay if the number of antifuses connected in series is not kept to an absolute minimum.
• The fundamental mechanism of programming, which involves significant changes to the properties of the materials in the fuse, leads to scaling challenges when new IC fabrication processes are considered. Indeed, there is some evidence that anti-fuses are no longer scaling, as the most advanced devices use 0.15 μm technology, which is many generations behind the technology used for new standard CMOS devices.
• The inability to reprogram anti-fuses also makes them unsuitable for applications where configuration changes are required. Unlike alternative technologies, in-system programming is not possible with these devices. Instead, special programmers must be used to program a device before it is mounted on a final product.
• Finally, the one-time programmability of anti-fuses makes it impossible for manufacturing tests to detect all possible faults. Some faults will only be uncovered after programming and, therefore, the yield after programming will be less than the 100% yield of SRAM or floating-gate devices.
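The warning about series-connected antifuses can be made concrete with a first-order Elmore delay estimate. The sketch below is only an illustration: it assumes each programmed antifuse adds the same on-resistance and each wiring segment the same lumped capacitance, and it ignores wire resistance. The value of 100 ohms lies within the ranges quoted above, while the 0.1 pF segment capacitance is a hypothetical figure chosen for the example.

```python
def elmore_delay(resistances_ohm, capacitances_farad):
    """Elmore delay of an RC ladder: each node capacitance is charged
    through all of the resistance between it and the driver."""
    delay = 0.0
    upstream_resistance = 0.0
    for r, c in zip(resistances_ohm, capacitances_farad):
        upstream_resistance += r
        delay += upstream_resistance * c
    return delay


def antifuse_path_delay(n_fuses, r_fuse_ohm, c_segment_farad):
    """Delay of a route that passes through n_fuses identical antifuses,
    each followed by one wiring segment (simplifying assumption)."""
    return elmore_delay([r_fuse_ohm] * n_fuses, [c_segment_farad] * n_fuses)


R_FUSE = 100.0        # ohms, within the on-resistance ranges quoted above
C_SEGMENT = 0.1e-12   # farads, hypothetical segment capacitance

for n in (1, 2, 4, 8):
    print(n, "antifuses in series:", antifuse_path_delay(n, R_FUSE, C_SEGMENT), "s")
```

Under these assumptions the delay grows as n(n+1)/2, so doubling the number of series antifuses more than doubles the delay, which is why routes are kept to as few antifuses as possible.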

FPGA Programming
Technologies
SRAM (Memory based)
• SRAM is the most widely used method for storing configuration information in commercially available FPGAs.
• An SRAM is used to store all possible values of a function.
• The value of the function for a given input is retrieved by using the inputs as the SRAM address.
• An SRAM implementing a function is called a look-up table (LUT).
• SRAM is popular because it provides fast and unlimited reconfiguration in a well-known technology. In these devices, static memory cells are distributed throughout the FPGA to provide configurability.
• There are two primary uses for the SRAM cells:
– Most are used to set the select lines of multiplexers that steer interconnect signals.
– The majority of the remaining SRAM cells store the data in the look-up tables (LUTs) that are typically used in SRAM-based FPGAs to implement logic functions. Figures (b) and (c) illustrate these two uses.
• An SRAM cell can also control a wire interconnection through a pass transistor, either NMOS, as shown in figure (a), or PMOS.
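The LUT idea described above can be sketched in a few lines of Python. This is a behavioural illustration only, not a model of any particular device: the function names `make_lut`, `lut_read` and `mux` are hypothetical, and the first input is assumed to be the most significant address bit.

```python
from itertools import product


def make_lut(func, n_inputs):
    """Store every value of func in a 2**n_inputs-entry table
    (this table plays the role of the SRAM contents)."""
    return [func(*bits) & 1 for bits in product((0, 1), repeat=n_inputs)]


def lut_read(table, *inputs):
    """Use the input bits as the SRAM address; the first input is
    taken as the most significant address bit."""
    address = 0
    for bit in inputs:
        address = (address << 1) | (bit & 1)
    return table[address]


def mux(select_bit, a, b):
    """Routing multiplexer whose select line is held by a configuration cell."""
    return b if select_bit else a


# Configure a 3-input LUT as a majority function.
majority = make_lut(lambda a, b, c: int(a + b + c >= 2), 3)
# Writing new table contents implements a different function on the same LUT.
xor3 = make_lut(lambda a, b, c: a ^ b ^ c, 3)

print(majority, lut_read(majority, 1, 0, 1))
print(xor3, lut_read(xor3, 1, 1, 1))
```

The same lookup hardware serves both functions; only the stored bits differ, which is what makes reconfiguration a matter of rewriting memory.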
FPGA Programming
Technologies
• Two primary advantages: re-programmability and the use of standard CMOS process technology.
• From a practical standpoint, an SRAM cell can be programmed an indefinite number of times.
• Dedicated circuitry on the FPGA initializes all the SRAM bits on power-up and configures the bits with a user-supplied configuration.
• Unlike other programming technologies, SRAM cells require no special integrated circuit processing steps beyond standard CMOS. As a result, SRAM-based FPGAs can use the latest CMOS technology available and therefore benefit from the increased integration, higher speeds and lower dynamic power consumption of new processes with smaller minimum geometries.
• A new function is implemented by writing new values into the LUT.
• An SRAM-based FPGA can therefore be reprogrammed (configured) on the fly. Since a LUT is volatile, its configuration is lost when the system is switched off.
• The chip area required by the SRAM is relatively large.
• SRAM dissipates significant static power because of leakage current.
• SRAM does not maintain its contents without power, so at power-up the FPGA is not configured and must be programmed using off-chip logic and storage.
• This can be accomplished with a nonvolatile memory that holds the configuration and a micro-controller that performs the programming procedure. While this may seem trivial, it adds to the component count and complexity of a design and prevents the SRAM-based FPGA from being a truly single-chip solution.
A Xilinx SRAM Cell
FPGA Programming
Technologies
EPROM
• Altera MAX 5000 EPLDs and Xilinx EPLDs both use UV-erasable electrically programmable read-only memory (EPROM) cells as their programming technology.
• EPROM devices are based on a floating gate.
• If the floating gate is not charged (i.e. neutral), the device operates almost like a normal MOSFET.
• If, however, the floating gate is negatively charged, the threshold voltage Vt of the transistor is increased, so the transistor stays off at nominal voltages.
• Programming (putting electrons onto the floating gate) means writing a 0; erasing (removing the charge from the floating gate) means resetting the cell contents to 1. In other words, a programmed cell stores a logic 0 and an erased cell stores a logic 1.
• The device is programmed by applying a high voltage (10–21 V) at the control gate and a comparatively smaller voltage (12 V) at the drain of the transistor. The charge then stays on the floating gate until the cell is erased.
• The floating gate becomes negatively charged as electrons tunnel from the transistor channel through the lower gate oxide onto the floating gate, a process called Fowler–Nordheim tunneling.
• Another technique is channel hot-carrier injection, which uses a high current in the channel to give electrons enough energy to "boil" out of the channel and break through the tunnel oxide layer, changing the threshold voltage of the floating-gate transistor.
• The negative potential on the floating gate counteracts the voltage on the control gate and keeps the transistor off.
• When electrons are present on the floating gate, current cannot flow through the transistor and the bit state is 0. This is the state of a floating-gate transistor when a bit is programmed.
• When electrons are removed from the floating gate (on exposure to UV light), current is allowed to flow and the bit state is 1.
EPROM Cell
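The programmed/erased behaviour described above can be summarised as a small behavioural model. This is a sketch only: the class name and the threshold and read voltages are hypothetical illustration values, not device parameters from any datasheet.

```python
class FloatingGateCell:
    """Behavioural model of a floating-gate (EPROM-style) cell."""

    def __init__(self, vt_erased=1.0, vt_programmed=7.0):
        # Hypothetical threshold voltages (volts) for the two states.
        self.vt_erased = vt_erased
        self.vt_programmed = vt_programmed
        self.charged = False  # erased state: floating gate is neutral

    def program(self):
        """Put electrons onto the floating gate (writes a logic 0)."""
        self.charged = True

    def erase_uv(self):
        """Remove the charge, e.g. by UV exposure (resets to logic 1)."""
        self.charged = False

    def threshold(self):
        """Negative charge on the floating gate raises the threshold."""
        return self.vt_programmed if self.charged else self.vt_erased

    def read(self, v_read=5.0):
        """The cell reads 1 if the transistor conducts at the read voltage."""
        conducts = v_read > self.threshold()
        return 1 if conducts else 0


cell = FloatingGateCell()
print(cell.read())   # erased cell conducts
cell.program()
print(cell.read())   # programmed cell stays off
cell.erase_uv()
print(cell.read())   # erased again
```

The model captures only the logic-level consequence of the stored charge; it says nothing about how long programming takes or how the charge is injected.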

Flash EEPROM
• In the flash EEPROMs used as logic tile cells in the Actel ProASIC chips, two transistors share the floating gate, which stores the programming information.
• The smaller programming/sensing transistor is used for programming the floating gate (injecting charge that remains even while the power is off), while the larger switching transistor serves as the programmable switch.
• The switching transistor also helps when erasing the device.
• The sensing transistor is used only for writing and verifying the floating-gate voltage, whereas the other transistor is used as the switch. The switch can connect or disconnect routing nets to or from the configured logic.
• This flash-based programming technology offers several unique advantages, most importantly non-volatility.
• Non-volatility eliminates the need for the external resources required to store and load configuration data. Additionally, a flash-based device can function immediately upon power-up instead of having to wait for the configuration data to load.
• The flash approach is also more area-efficient than SRAM-based technology, which requires up to six transistors to implement the programmable storage.
• The drawback is that the programming circuitry, such as the high- and low-voltage buffers needed to program the cell, contributes an area overhead that is not present in SRAM-based devices. However, this cost is relatively modest, as it is amortized across numerous programmable elements.
• In comparison to anti-fuses, flash-based FPGAs are reconfigurable and can be programmed without being removed from a printed circuit board.
• The use of a floating gate to control the switching transistor adds design complexity, because care must be taken to keep the source–drain voltage sufficiently low; otherwise charge would be injected into the floating gate.
• Another disadvantage of flash-based devices is that they cannot be reprogrammed an unlimited number of times. Charge buildup in the oxide eventually prevents a flash-based device from being properly erased and programmed.
• For example, devices such as the Actel ProASIC3 are rated for only 500 programming cycles. For most uses of FPGAs, this programming count is more than sufficient.
FPGA Structures

• FPGAs consist of a set of programmable logic cells placed on the device so as to build an array of computing resources.
• The resulting structure is vendor-dependent.
• According to the arrangement of the logic blocks and the interconnection paradigm between them, FPGAs can be classified into four categories:
– symmetrical array,
– row-based,
– hierarchy-based
– sea of gates
Symmetrical Array
• Logic blocks organized in a matrix-like structure
• Coarse granularity of logic blocks
• Typically contain 8 x 8 arrays in the smaller chips and 100 x 100 or larger arrays in the bigger chips
• 2D array of processing elements (PEs) embedded in an interconnection network
• Interconnection points at the horizontal-vertical intersections
• Used in most Xilinx devices

Row based
• Consists of alternating rows of logic blocks and channels
• Horizontal routing via horizontal channels for non-dedicated signal nets
• Channels divided into segments (L-2, L-4, L-8, …)
• Vertical connections via dedicated vertical tracks
• Dedicated vertical routing tracks are used for the global clock networks and for power and ground tracks
Sea of gates
• Is not an array of logic blocks embedded in a general routing structure.
• Instead, the architecture consists of fine-grain logic blocks covering the entire floor of the device.
• Connectivity is realized using dedicated neighbor-to-neighbor routes and routing over the gates, as many metal layers are available.
• Example: Actel ProASIC FPGA family.
• The device uses a four-level hierarchy of routing resources to connect the logic tiles: the local resources, the long-line resources, the very long-line resources and the global networks.
• The local resources allow the output of a tile to be connected to the inputs of one of the eight surrounding tiles.
• The long-line resources provide routing for longer distances and higher-fanout connections.
• Very long lines span the entire device. They are used to route very long or very high-fanout nets.
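The reach of the local resources, one of the eight surrounding tiles, can be illustrated with a short sketch. The function name and the (row, column) tile coordinates are hypothetical, and the handling of tiles on the device edge (which simply have fewer neighbours here) is an assumption of this example rather than a statement about the real device.

```python
def local_neighbours(row, col, n_rows, n_cols):
    """Return the tiles reachable from (row, col) through local resources:
    the up-to-eight tiles that surround it on the grid."""
    return [
        (r, c)
        for r in range(row - 1, row + 2)
        for c in range(col - 1, col + 2)
        if (r, c) != (row, col) and 0 <= r < n_rows and 0 <= c < n_cols
    ]


# An interior tile of a 4 x 4 grid has all eight neighbours.
print(local_neighbours(1, 1, 4, 4))
# A corner tile has only three in this simplified model.
print(local_neighbours(0, 0, 4, 4))
```

Any connection to a tile outside this neighbourhood would have to use the long-line, very long-line or global resources.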
Island style (mesh based)

• CLBs look like islands in a sea of routing interconnect.
• CLBs are arranged on a 2D grid and are interconnected by a programmable routing network.
• I/O blocks on the periphery of the FPGA chip are also connected to the programmable routing network.
• Programmable switches are organized in horizontal and vertical routing channels.
• The routing network of an FPGA occupies 80–90% of the total area, whereas the logic occupies only 10–20%.
• The flexibility of an FPGA is mainly dependent on its programmable routing network.
• A mesh-based FPGA routing network consists of horizontal and vertical routing tracks which are interconnected through switch boxes (SB).
• Logic blocks are connected to the routing network through connection boxes (CB).

Hierarchical FPGA
• Like ASIC designers, FPGA designers started facing challenges such as long and unpredictable route times, too many design iterations, and difficulty achieving timing closure.
• They started adopting ASIC-style hierarchical design methodologies which, coupled with advanced analysis capabilities applied early in the design cycle, enable designers to catch potential physical implementation problems prior to place and route.
• Most logic designs exhibit locality of connections, which implies a hierarchy in the placement and routing of connections between logic blocks.
• Hierarchical routing architectures exploit this locality by dividing FPGA logic blocks into separate groups/clusters with minimal interconnections between them.
• These clusters are recursively connected to form a hierarchical structure.
• This saves the time and effort needed to optimize multiple parameters.
• Design locking is possible with advances in EDA tools.
• Examples: Altera’s Stratix II, FLEX 10K and APEX architectures.
• For instance, in Altera APEX 20K and APEX II FPGAs, 10 or so logic elements are connected to form what Altera calls a Logic Array Block (LAB), and then several LABs are connected to form a MegaLAB. Thus, there is a hierarchy in the organization of these FPGAs.
• These FPGAs contain clusters of logic blocks with localized resources for interconnection.
• The global interconnect network is used for the interconnections between the clusters of logic blocks.
• Elements with the lowest granularity are at the lowest level of the hierarchy. They are grouped to form the elements of the next level. Each element of level i consists of a given number of elements from level i−1.
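The rule that each level-i element is made of a fixed number of level-(i−1) elements can be turned into a quick count of how many lowest-level elements sit under one element at each level. The sketch below is illustrative: the function name is hypothetical, 10 logic elements per LAB follows the "10 or so" figure above, and 16 LABs per MegaLAB is an assumed value standing in for "several".

```python
def elements_per_level(group_sizes):
    """group_sizes[i] is the number of level-i elements grouped into one
    level-(i+1) element. Returns, for each level, how many lowest-level
    elements one element of that level contains."""
    counts = [1]  # a level-0 element (e.g. a logic element) contains itself
    for size in group_sizes:
        counts.append(counts[-1] * size)
    return counts


LES_PER_LAB = 10        # "10 or so" logic elements per LAB
LABS_PER_MEGALAB = 16   # assumed value for illustration only

print(elements_per_level([LES_PER_LAB, LABS_PER_MEGALAB]))
```

With these figures one MegaLAB would hold 160 logic elements; connections among them can stay on local interconnect, and only traffic between MegaLABs needs the global network.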

Hybrid FPGA
• Manufacturers have been trying to put more pre-designed and well-tested hard macros in their chips.
• Many modern FPGAs contain an entire processor core.
• This is extremely useful when designers use hybrid solutions, where part of a system runs on a programmable processor and part is implemented in hardware.
• Resources that are used in almost all designs, such as memory, are available directly on the chip.
• This allows the designer to use well-tested and efficient modules.
• Hard macros are implemented more efficiently and are faster than macros implemented on the universal function generators.
• The resources often available on hybrid FPGAs are RAMs, clock managers, arithmetic modules, network interface modules and processors.
• The Xilinx Virtex-II Pro contains up to four embedded IBM PowerPC 405 RISC hard-core processors, which can be clocked at more than 300 MHz.

Case Study – I
Xilinx XC 4000
• Third-generation field-programmable gate arrays
– Abundant flip-flops
– Flexible function generators
– On-chip ultra-fast RAM
– Dedicated high-speed carry-propagation circuit
– Wide edge decoders
– Hierarchy of interconnect lines
– Internal 3-state bus capability
– Eight global low-skew clock or signal distribution networks
• Flexible array architecture
– Programmable logic blocks and I/O blocks
– Programmable interconnects and wide decoders
• 0.35 micron CMOS process
– High-speed logic and interconnect
– Low power consumption



• Max RAM bits: each CLB can be configured as a small SRAM storing up to 32 bits, so a device with 64 CLBs provides 2048 bits (64 × 32).
• Max gates: each CLB can perform the same function as about 25 gates in a discrete design (1600/64).
• The XC4003E does even better, at 30 gates per CLB (3000/100).
• The last column of the table must have been written by marketing, since the high end of each “typical” gate range is higher than the maximum in the column before it.
• This column assumes that 20–30% of the CLBs are used for memory rather than logic, at 32 bits of SRAM per CLB.
• The minimum gate-level implementation of an SRAM cell is a D latch built from four gates, so each CLB counts as 128 gates (32 × 4) when it is used as SRAM. The number in the last column is therefore the number of gates used for logic plus the number of equivalent gates used for SRAM.
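The gate-count arithmetic above can be reproduced with a short calculation. This is a sketch of the reasoning in the bullets, not the vendor's actual method: the function name is hypothetical, and fractional CLB counts are allowed for simplicity.

```python
def equivalent_gates(n_clbs, gates_per_logic_clb, ram_fraction,
                     ram_bits_per_clb=32, gates_per_ram_bit=4):
    """Gates used for logic plus equivalent gates for CLBs used as SRAM.
    ram_fraction is the share of CLBs configured as memory (0.0 to 1.0)."""
    ram_clbs = n_clbs * ram_fraction
    logic_clbs = n_clbs - ram_clbs
    ram_gates = ram_clbs * ram_bits_per_clb * gates_per_ram_bit
    return logic_clbs * gates_per_logic_clb + ram_gates


# 64-CLB device at about 25 gates per CLB (1600/64).
print("max RAM bits:", 64 * 32)
print("all logic:   ", equivalent_gates(64, 25, 0.0))
print("20% as RAM:  ", equivalent_gates(64, 25, 0.2))
print("30% as RAM:  ", equivalent_gates(64, 25, 0.3))

# 100-CLB XC4003E at 30 gates per CLB (3000/100).
print("XC4003E, 20% as RAM:", equivalent_gates(100, 30, 0.2))
```

Because a CLB used as SRAM is counted as 128 gates instead of 25 or 30, moving even a fifth of the CLBs to memory pushes the total well above the all-logic maximum, which explains why the "typical" range exceeds the "max gates" figure.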



Configuration Memory
• The static memory cell used for the configuration memory has been designed specifically for high reliability and noise immunity. The bitstream that configures the device is stored in configuration memory, and any change in it may corrupt the whole application, so reliability is essential.
• The basic memory cell consists of two CMOS inverters plus a pass transistor used for writing and reading cell data.
• The cell is written only during configuration and read only during readback.
• During normal operation, the cell provides continuous control; the pass transistor is off and does not affect cell stability.
• This is quite different from the operation of conventional memory devices, in which the cells are frequently read and rewritten.
• The absence of address decoding and sense amplifiers gives the cell high stability.
• The cell is not affected by extreme power-supply excursions or very high levels of alpha particle radiation.



I/O Block



Two 4-Input Functions, XC4000 CLB
