
FPGA

(Field Programmable Gate
Array)

Presenters
Abu Shohel Ahmed
Md. Kamrul Abedin Tarafder
Debashis Roy
History
Programmable Read Only Memory (PROM)
- Address lines act as inputs
- Data lines act as outputs
Problem:
Most logic functions do not need every input combination, so much of the PROM is wasted.
Programmable Logic Array (PLA)
- Programmable AND plane followed by a programmable or wired OR plane.
- Sum-of-products form
- Two levels of programmability add delay (problem)
Next
PAL (Programmable Array Logic)
- Programmable AND plane and fixed OR plane.
- PLAs and PALs are both simple programmable logic devices (SPLDs).
- The logic plane structure grows rapidly with the number of inputs (problem)
Next
To mitigate the problem:
Complex programmable logic devices (CPLDs)
- Programmably interconnect multiple SPLDs.
- Extending to higher density is difficult (problem)
- Less flexibility (problem)
Comparison
What is FPGA?

A field programmable gate array (FPGA) is a semiconductor device
containing programmable logic components, programmable interconnects,
and programmable I/O blocks.
FPGA
Logic Blocks
Purpose: implement combinational and
sequential logic functions.
A logic block may consist of any of the following:
- Transistor pairs
- Basic small gates such as two-input NANDs or exclusive-ORs
- Multiplexers
- Look-up tables (LUTs)
- Wide fan-in AND-OR structures
Logic Block Architecture
Granularity:
Granularity can be measured by the number of Boolean functions a logic block can implement, the number of gates, the number of transistors, the total normalized area, or the number of inputs and outputs.
By granularity, there are two types of blocks:
a. Fine Grain Logic Blocks
b. Coarse Grain Logic Blocks
Fine Grain
- The Cross Point FPGA
1. Transistors are interconnected.
2. The logic block is implemented using transistor pair tiles.
Fine Grain
Advantage:
1. Blocks are fully utilized.
Disadvantage:
Requires large numbers of wire segments and programmable switches, so it is costly in delay and area.
Coarse Grain Logic Blocks
- Many types exist, depending on the implementation
- Multiplexer-based and look-up-table-based blocks are the most common
The Xilinx Logic Block:
An SRAM functions as a LUT.
The address lines of the SRAM are the inputs.
The output of the SRAM gives the logic output.
Xilinx Logic Block
Advantage: high functionality; a k-input LUT can implement any function of k inputs.
Disadvantage: the LUT becomes unacceptably large as k grows, since it needs 2^k memory bits.
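The idea of an SRAM acting as a LUT can be sketched in a few lines of Python. This is only an illustration, not Xilinx's implementation: the names make_lut and read_lut are hypothetical, and the choice that input i drives address bit i is an assumption.

```python
def make_lut(k, func):
    """Fill a 2**k x 1-bit memory with the truth table of func (hypothetical helper)."""
    sram = []
    for addr in range(2 ** k):
        # Assumption: input i is wired to address bit i.
        bits = [(addr >> i) & 1 for i in range(k)]
        sram.append(func(*bits) & 1)
    return sram


def read_lut(sram, inputs):
    """Use the input values as the SRAM address and return the stored bit."""
    addr = sum(bit << i for i, bit in enumerate(inputs))
    return sram[addr]


# A 3-input majority function stored in a 3-input LUT.
majority = make_lut(3, lambda a, b, c: (a & b) | (b & c) | (a & c))
print(len(majority))                  # number of memory bits, 2**3
print(read_lut(majority, (1, 0, 1)))  # two of three inputs are 1
```

Any other 3-input function only changes the stored bits, not the structure, which is why the block is so flexible and also why its size doubles with every extra input.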
Switch Box
- Wherever a vertical and a horizontal channel intersect, there is a switch box.
- In this architecture, when a wire enters a switch box, three programmable switches allow it to connect to three other wires in adjacent channel segments.
Programming technologies
Used in switches
- SRAM programming technology
Static RAM cells control pass gates or multiplexers.
1 = closed switch (connection made)
0 = open switch
For a multiplexer, the SRAM cells determine which input is selected.
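As a rough behavioral sketch of the two uses of an SRAM cell described above (the function names are hypothetical, an open switch is modeled as None, and select bits are assumed to be listed least-significant first):

```python
def pass_gate(sram_bit, signal):
    """Pass the signal through when the SRAM cell holds 1; otherwise the switch is open."""
    return signal if sram_bit == 1 else None  # None models "not connected"


def mux(select_bits, inputs):
    """The SRAM cells form the select word that picks one multiplexer input."""
    # Assumption: select_bits[0] is the least significant select bit.
    index = sum(bit << i for i, bit in enumerate(select_bits))
    return inputs[index]


print(pass_gate(1, 1))             # closed switch: the signal appears on the other side
print(pass_gate(0, 1))             # open switch: nothing is connected
print(mux((1, 0), [0, 1, 0, 0]))   # select word 1 picks inputs[1]
```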
SRAM
Disadvantages
- SRAM is volatile, so the configuration must be reloaded at power-up
- Requires large area
Advantages
- Fast re-programmability
- Uses standard integrated circuit process technology
Programming Tech
- Antifuse
A two-terminal device that presents very high resistance in its unprogrammed state.
Applying a high voltage creates a low-resistance link.
Advantages
1. Small size
2. Low series resistance
Programming Tech
- Floating gate programming technology
Uses the same floating-gate technology as EPROM and EEPROM devices.
The switch is disabled by injecting charge onto the floating gate (gate 2) using a high voltage between the control gate (gate 1) and the drain.
In the EPROM case, the charge is removed by UV light.
Summary
Effects of Granularity on FPGA
Density and Performance
Tradeoff
- As granularity increases, fewer blocks are needed to implement a design.

- More functional blocks each need more area.

Area is normally measured by the total number of bits needed to implement the design.
Consider the following example.
Example
Experimental Results
- A 4-input, 1-output lookup table yields the minimum total area.
- The best k is determined by the ratio of memory bit area to the fixed overhead area.
Routing Architectures

The way programmable switches and wiring segments are positioned for interconnections.
Core Elements:
a. Wire segment
b. Track
c. Routing Channel
d. Connection Block
e. Switch Block
Why better?
- FPGAs are programmed using electrically programmable switches
- Routing architectures are complex
- Logic is implemented using multiple levels of lower fan-in gates
- Shorter time to market
- Ability to re-program in the field to fix bugs
- Lower non-recurring engineering costs
FPGA Disadvantages
- FPGAs are generally slower than their application-specific integrated circuit (ASIC) counterparts.
- They cannot handle as complex a design, and they draw more power.
FPGA Design and Programming
To define the behavior of the FPGA, the user provides a hardware description language (HDL) description or a schematic design.
Then, using an electronic design automation tool, a technology-mapped netlist is generated.
The netlist is then fitted to the actual FPGA architecture using a process called place-and-route.
The user validates the map, place, and route results via timing analysis, simulation, and other verification methodologies.
Once the design and validation process is complete, the generated binary file is used to configure the FPGA.
Applications
1. Reconfigurable computing.
2. DSP and software-defined radio.
3. The inherent parallelism of the logic
resources on the FPGA allows for
considerable compute throughput.
FPGA Optimization
- DAG-Map: graph-based FPGA technology mapping for delay optimization
- DAG-Map reduces both the network depth and the number of lookup tables.

Problem Formulation:
- A Boolean network can be represented as a directed acyclic graph (DAG) where each node represents a logic gate and there is a directed edge (i, j) if the output of gate i is an input of gate j.
- A primary input (PI) node has no incoming edge and a primary output (PO) node has no outgoing edge.
- We use input(v) to denote the set of nodes which supply inputs to gate v.
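One possible way to hold such a network in code is a dictionary that maps each gate to input(v). The sketch below is only an illustration of the definitions above; the dictionary layout and the helper names are assumptions, not part of DAG-Map itself.

```python
def all_nodes(fanin):
    """Every node that appears as a gate or as an input of a gate."""
    nodes = set(fanin)
    for inputs in fanin.values():
        nodes.update(inputs)
    return nodes


def primary_inputs(fanin):
    """PI nodes: nodes with no incoming edge, i.e. input(v) is empty."""
    return {v for v in all_nodes(fanin) if not fanin.get(v)}


def primary_outputs(fanin):
    """PO nodes: nodes with no outgoing edge, i.e. they feed no other gate."""
    used_as_input = {u for inputs in fanin.values() for u in inputs}
    return all_nodes(fanin) - used_as_input


# fanin[v] plays the role of input(v); there is an edge (u, v) for every u in fanin[v].
fanin = {
    "g1": ["a", "b"],
    "g2": ["c", "d"],
    "g3": ["g1", "g2"],
    "g4": ["g3", "e"],
}
print(sorted(primary_inputs(fanin)))
print(sorted(primary_outputs(fanin)))
```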
DAG-Map Algorithm
- The DAG-Map algorithm consists of three major steps.
  - The first step transforms an arbitrary Boolean network into a two-input network.
  - The second step maps the two-input network into a K-LUT FPGA network with minimum delay.
  - The third step performs a postprocessing area optimization of the FPGA network without increasing the network delay.
First Step: Transforming Arbitrary Networks into Two-Input Networks

algorithm decompose-multi-input-gate (DMIG)
    let V = input(v) = {u1, u2, ..., um};
    while |V| > 2 do
        let ui and uj be the two nodes of V with the smallest levels;
        introduce a new node x;
        input(x) = {ui, uj};
        level(x) = max(level(ui), level(uj)) + 1;
        V = (V - {ui, uj}) ∪ {x}
    end-while;
    connect the only two nodes left in V to v as its inputs;
    return the binary tree T(v) rooted at v;
end-algorithm.
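The DMIG pseudocode above can be sketched in Python as follows. This is a minimal illustration under some assumptions: a heap is used to find the two lowest-level nodes, new node names are generated as "<v>_x<n>", ties in level are broken by node name, and the final level of v is computed with the same max + 1 rule even though the pseudocode does not state it.

```python
import heapq
from itertools import count


def decompose_multi_input_gate(v, inputs, level):
    """Return (fanin, level) for a binary tree T(v) built from the inputs of v."""
    level = dict(level)          # do not modify the caller's levels
    fanin = {}
    fresh = count()
    # V, ordered by level so the two smallest-level nodes are popped first.
    heap = [(level[u], u) for u in inputs]
    heapq.heapify(heap)
    while len(heap) > 2:
        li, ui = heapq.heappop(heap)
        lj, uj = heapq.heappop(heap)
        x = f"{v}_x{next(fresh)}"            # hypothetical naming for the new node
        fanin[x] = [ui, uj]
        level[x] = max(li, lj) + 1
        heapq.heappush(heap, (level[x], x))  # V = (V - {ui, uj}) ∪ {x}
    # Connect the nodes left in V to v as its inputs.
    fanin[v] = [u for _, u in sorted(heap)]
    level[v] = max(l for l, _ in heap) + 1   # assumption: same level rule for v
    return fanin, level


tree, levels = decompose_multi_input_gate(
    "v", ["u1", "u2", "u3", "u4"], {"u1": 0, "u2": 0, "u3": 1, "u4": 3}
)
for gate, ins in tree.items():
    print(gate, ins, levels[gate])
```

Combining the lowest-level nodes first keeps late-arriving inputs (here u4) close to the root, so they pass through as few new gates as possible.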
Transforming Arbitrary Networks into Two-Input Networks
2nd Step: Technology Mapping for Delay Minimization

- Our algorithm consists of two steps.
  - We first label the network to determine the level of each node in the final mapping solution.
  - We then generate the logically equivalent network of K-LUTs.
Continued…
- The first step assigns a label h(v) to each node v of the two-input network, with h(v) equal to the level of the K-LUT containing v in the final mapping solution.
- Clearly, we want h(v) to be as small as possible in order to achieve delay minimization.
- We label the nodes in a topological order starting from the PI nodes.
- The label of each PI node is zero.
Continued…
- If node v is not a PI node, let p be the maximum label of the nodes in input(v).
- We use Np(v) to denote the set of predecessors of v with label p.
- Then,

if |input(Np(v) ∪ {v})| ≤ K,
we assign h(v) = p;
otherwise, we assign h(v) = p + 1.
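The labeling rule can be tried out with a short Python sketch. It is not the authors' implementation. It assumes the network is a dictionary from each gate to input(v), that input(S) for a set S means the nodes outside S that feed a node in S, and that PI nodes are left out of Np(v) because they are not gates that can be packed into a LUT (the slides do not spell this out). The helper names are hypothetical.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def predecessors(fanin, v):
    """All nodes with a directed path to v."""
    seen, stack = set(), list(fanin.get(v, ()))
    while stack:
        u = stack.pop()
        if u not in seen:
            seen.add(u)
            stack.extend(fanin.get(u, ()))
    return seen


def cluster_inputs(fanin, cluster):
    """input(S): nodes outside S that supply an input to some node in S."""
    return {u for n in cluster for u in fanin.get(n, ()) if u not in cluster}


def label_network(fanin, k):
    """Assign h(v) to every node, visiting nodes in topological order."""
    h = {}
    for v in TopologicalSorter(fanin).static_order():
        if not fanin.get(v):
            h[v] = 0                      # PI nodes get label zero
            continue
        p = max(h[u] for u in fanin[v])
        # Np(v): predecessors of v with label p (assumption: gates only, no PIs).
        n_p = {u for u in predecessors(fanin, v) if h[u] == p and fanin.get(u)}
        h[v] = p if len(cluster_inputs(fanin, n_p | {v})) <= k else p + 1
    return h


fanin = {
    "g1": ["a", "b"],
    "g2": ["c", "d"],
    "g3": ["g1", "g2"],
    "g4": ["g3", "e"],
}
labels = label_network(fanin, k=3)
print({g: labels[g] for g in fanin})
```

With K = 3, g3 cannot share a LUT with g1 and g2 because that cluster would need the four inputs a, b, c, d, so its label increases by one; g4 can share a LUT with g3.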
Continued…
- The second step generates K-LUTs in the mapping solution.
- Let L represent the set of outputs which are to be implemented using K-LUTs.

- Initially, L contains all the PO nodes. We process the nodes in L one by one.

- For each node v in L, we remove v from L and generate a K-LUT v' to implement the function of gate v such that
input(v') = input(N_h(v)(v)), where N_h(v)(v) is the set of predecessors of v with label h(v).
Continued…

- Then, we update the set L to be L ∪ input(v').
- The second step ends when L consists of only PI nodes in the original network.
- Clearly, we obtain a network of K-LUTs that is logically equivalent to the original network.
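The LUT generation step can be sketched as below, given labels h(v) that were already computed. This is an illustration under assumptions: the cluster for v is taken to be v together with its non-PI predecessors that carry the same label, PI nodes are simply skipped instead of being kept in L, and the labels in the example are supplied by hand for K = 3. All names are hypothetical.

```python
def predecessors(fanin, v):
    """All nodes with a directed path to v."""
    seen, stack = set(), list(fanin.get(v, ()))
    while stack:
        u = stack.pop()
        if u not in seen:
            seen.add(u)
            stack.extend(fanin.get(u, ()))
    return seen


def cluster_inputs(fanin, cluster):
    """input(S): nodes outside S that supply an input to some node in S."""
    return {u for n in cluster for u in fanin.get(n, ()) if u not in cluster}


def generate_luts(fanin, h, outputs):
    """Return {LUT root: set of LUT inputs}, starting from the PO nodes."""
    luts = {}
    pending = list(outputs)                 # the set L
    while pending:
        v = pending.pop()
        if v in luts or not fanin.get(v):   # already generated, or a PI node
            continue
        # Assumption: the LUT rooted at v covers v and its same-label gate predecessors.
        same_label = {u for u in predecessors(fanin, v)
                      if h[u] == h[v] and fanin.get(u)}
        luts[v] = cluster_inputs(fanin, same_label | {v})
        pending.extend(luts[v])             # L = L ∪ input(v')
    return luts


fanin = {
    "g1": ["a", "b"],
    "g2": ["c", "d"],
    "g3": ["g1", "g2"],
    "g4": ["g3", "e"],
}
# Labels for K = 3, supplied by hand for this example.
h = {"a": 0, "b": 0, "c": 0, "d": 0, "e": 0, "g1": 0, "g2": 0, "g3": 1, "g4": 1}
luts = generate_luts(fanin, h, outputs=["g4"])
for root in sorted(luts):
    print(root, sorted(luts[root]))
```

In this example g3 is absorbed into the LUT rooted at g4, so four gates are covered by three 3-input LUTs.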
DAG-Map Algorithm
Applying DAG-Map Algorithm
Mapping by DAG-Map (3 levels)
3rd Step: Area Optimization Without Increasing Delay

- Two operations are used:
  - Gate Decomposition
  - Predecessor Packing
Gate Decomposition
- If node v is a simple gate with multiple inputs in the mapping solution, then for any two of its inputs ui and uj that are single-fanout nodes, we can decompose v into two nodes vij and v' such that v' is of the same type as v, vij is of the same type as v in non-negated form, input(vij) = {ui, uj}, and input(v') = (input(v) ∪ {vij}) - {ui, uj}.
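A small sketch of this decomposition follows. It is only an illustration: the gate types are limited to AND, NAND, OR, and NOR, the new node is named "<v>_<ui>_<uj>", and v' simply keeps the name v. These choices and the function names are assumptions.

```python
# Non-negated form of each simple gate type handled by this sketch.
NON_NEGATED = {"AND": "AND", "NAND": "AND", "OR": "OR", "NOR": "OR"}


def fanout_counts(fanin):
    """Number of gates each node feeds."""
    counts = {}
    for inputs in fanin.values():
        for u in inputs:
            counts[u] = counts.get(u, 0) + 1
    return counts


def decompose_gate(fanin, gate_type, v, ui, uj):
    """Split v into vij (fed by ui, uj) and v' (kept under the name v)."""
    if ui == uj or ui not in fanin[v] or uj not in fanin[v]:
        raise ValueError("ui and uj must be two distinct inputs of v")
    counts = fanout_counts(fanin)
    if counts[ui] != 1 or counts[uj] != 1:
        raise ValueError("ui and uj must be single-fanout nodes")
    vij = f"{v}_{ui}_{uj}"                     # hypothetical name for the new node
    new_fanin, new_types = dict(fanin), dict(gate_type)
    new_fanin[vij] = [ui, uj]
    new_types[vij] = NON_NEGATED[gate_type[v]]
    # input(v') = (input(v) ∪ {vij}) - {ui, uj}; v' has the same type as v.
    new_fanin[v] = [u for u in fanin[v] if u not in (ui, uj)] + [vij]
    return new_fanin, new_types


fanin = {"v": ["a", "b", "c"]}
types = {"v": "NAND"}
new_fanin, new_types = decompose_gate(fanin, types, "v", "a", "b")
print(new_fanin, new_types)
```

For the 3-input NAND in the example, the result is NAND(AND(a, b), c), which computes the same function as the original gate.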
Example of Gate decomposition
Predecessor Packing
- For each node v, we examine all of its input nodes.
- If |input(v) ∪ input(ui)| ≤ K for some input node ui, and ui has only a single fanout, then v and ui are merged into a single K-LUT. In this case we also say that nodes ui and v are mergeable, and call v the base of the merge. This operation reduces the number of K-LUTs by one.
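Predecessor packing can be sketched on a LUT network stored as a dictionary from each LUT to its set of inputs. The sketch applies the test exactly as written above, which counts ui itself and is therefore slightly conservative, since the merged LUT no longer needs ui as an input. It also assumes that a merged node ui is not itself a primary output. The function name and the repeat-until-no-merge loop are assumptions.

```python
def pack_predecessors(luts, k):
    """Repeatedly merge a single-fanout input LUT ui into its base v."""
    luts = {v: set(ins) for v, ins in luts.items()}
    merged = True
    while merged:
        merged = False
        # Recompute fanout counts after every merge.
        fanout = {}
        for ins in luts.values():
            for u in ins:
                fanout[u] = fanout.get(u, 0) + 1
        for v in list(luts):
            for ui in sorted(luts[v]):
                mergeable = (
                    ui in luts                       # ui is a LUT, not a PI
                    and fanout[ui] == 1              # single fanout
                    and len(luts[v] | luts[ui]) <= k
                )
                if mergeable:
                    luts[v] = (luts[v] | luts[ui]) - {ui}
                    del luts[ui]                     # one K-LUT fewer
                    merged = True
                    break
            if merged:
                break
    return luts


network = {"v": {"u", "c"}, "u": {"a", "b"}}
print(pack_predecessors(network, k=4))
print(pack_predecessors(network, k=3))
```

With K = 4 the two LUTs collapse into one LUT with inputs a, b, c; with K = 3 the test as written fails and the network is left unchanged.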
Example of Predecessor
Packing

THANKS