ES - MEL - AEL ZG554 - Lec4

Reconfigurable Computing
AEL ZG554 / ES ZG 554 / MEL ZG 554

Session 4
Pawan Sharma
BITS Pilani ps@pilani.bits-pilani.ac.in
Pilani Campus 13-08-2022
Last Lecture
Complex Programmable Logic Devices
BITS Pilani, Pilani Campus

Today’s Lecture
FPGA Basics

Field‐Programmable Gate Arrays
Xilinx FPGAs are based on Configurable Logic Blocks (CLBs). More generally called
Programmable logic cells

Programmable Logic Cells
All FPGAs contain a basic programmable logic cell replicated in a

regular array across the chip
– configurable logic block, logic element, logic module, logic
unit, logic array block, …
– many other names
There are three different types of basic logic cells:
– multiplexer based
– look‐up table based
– programmable array logic (PAL‐like)

Logic Cell Considerations
How are functions implemented?
– fixed functions (manipulate inputs only)
– programmable functionality (interconnect components)
Coarse‐grained logic cells:
– support complex functions, need fewer blocks, but they are bigger
so less of them on chip
Fine‐grained logic cells:
– support simple functions, need more blocks, but they are smaller
so more of them on chip

Logic Cell Considerations
When designing (or selecting) the type of logic cell for an

FPGA, some basic questions are important:
How many inputs? How many functions?

• all functions of n inputs or eliminate some combinations?
• what inputs go to what parts of the logic cell?
Any specialized logic?

adder, etc.
What register features?
BITS Pilani, Pilani

Camp s
FPGA Architectures
How can we implement any circuit in an FPGA?

– First, focus on combinational logic
– Example: Half adder
• Combinational logic represented by truth table
• What kind of hardware can implement a truth table?

Look‐up‐tables (LUTs)
Implement truth table in small memories (LUTs)

– Usually SRAM
Logic inputs connect to

address inputs, logic
output is memory output

Alternatively, could have used a 2‐input, 2‐output LUT
– Outputs commonly use same inputs

Full adder
– Combinational logic can be implemented in a LUT with same number of
inputs and outputs
• 3‐input, 2‐ouput LUT

Why aren’t FPGAs just a big LUT?
– Size of truth table grows exponentially based on # of inputs
• 3 inputs = 8 rows, 4 inputs = 16 rows, 5 inputs = 32 rows, etc.
– Same number of rows in truth table and LUT
– LUTs grow exponentially based on # of inputs
Number of SRAM bits in a LUT = 2i * o
– i = # of inputs, o = # of outputs
– Example: 64 input combinational logic with 1 output would require 264
SRAM bits
• 1.84 x 1019
Clearly, not feasible to use large LUTs
– So, how do FPGAs implement logic with many inputs?

Fortunately, we can map circuits onto multiple LUTs
– Divide circuit into smaller circuits that fit in LUTs (same # of inputs and
outputs)
– Example: 3‐input, 2‐output LUTs

What if circuit doesn’t map perfectly?
– More inputs in LUT than in circuit
• Truth table handles this problem

• Unused inputs are ignored
– More outputs in LUT than in circuit
• Extra outputs simply not used

– Space is wasted, so should use multiple outputs whenever possible

Important Point
– The number of gates in a circuit has no effect on the mapping
into a LUT
• All that matters is the number of inputs and outputs
• Unfortunately, it isn’t common to see large circuits with a few inputs
1 gate 1,000,000 gates
Both of these circuits can be implemented in a

single 3‐input, 1‐output LUT
Sequential Logic
Problem: How to handle sequential logic
– Truth tables don’t work
Possible solution:
– Add a flip‐flop to the output of LUT
3‐in, 1‐out 3‐in, 2‐out

LUT LUT
etc.
FF FF FF

Sequential Logic
Example: 8‐bit register using 3‐input, 2‐output LUTs
– Input: x, Output: y
x(7) x(6) x(5) x(4) x(3) x(2) x(1) x(0)
3‐in, 2‐out 3‐in, 2‐out 3‐in, 2‐out 3‐in, 2‐out

LUT LUT LUT LUT
FF FF FF FF FF FF FF FF
y(7) y(6) y(5) y(4) y(3) y(2) y(1) y(0)
– What does LUT need to do to implement register?

Sequential Logic
Example, cont.
– LUT simply passes inputs to appropriate output

Sequential Logic
Isn’t it a waste to use LUTs for registers?
YES! (when it can be used for something else)
– Commonly used for pipelined circuits
• Example: Pipelined adder
+ + 3‐in, 2‐out 3‐in, 2‐out

LUT LUT ....
Register Register
FF FF FF FF
+
Register Adder and output register combined – not
a separate LUT for each
Sequential Logic
Existing FPGAs don’t have a flip flop connected to LUT
outputs
Why not?
– Flip flop has to be used!
• Impossible to have pure combinational logic

– Adds latency to circuit
Actual Solution:
– Configurable Logic Blocks (CLBs)

Configurable Logic Blocks (CLBs)
CLBs: the basic FPGA functional unit
– First issue: How to make flip‐flop optional?
• Simplest way: use a mux

– Circuit can now use output from LUT or from FF
– Where does select come from?
3‐in, 1‐out
LUT
CLB
FF
2x1

CLBs usually contain more than 1 LUT
– Why?
• Efficient way of handling common I/O between adjacent LUTs
• Saves routing resources
2x1
3-in, 2-out 3-in, 2-out

CLB LUT LUT
FF FF FF FF
2x1 2x1 2x1 2x1

Example: Ripple‐carry adder
– Each LUT implements 1 full adder
– Use efficient connections between LUTs for carry signals
A(1) B(1) A(0) B(0) Cin(0)
Cin(1)
2x1
3‐in, 2‐out 3‐in, 2‐out

CLB LUT LUT
FF FF FF FF
2x1 2x1 2x1 2x1

Cout(0)
Cout(1) S(1) S(0)

CLBs often have specialized connections between adjacent
CLBs
– Further improves carry chains
– Avoids routing resources
Some commercial CLBs even more complex
– Xilinx Virtex 4 CLB consists of 4 “slices”
• 1 slice = 2 LUTs + 2 FFs + other stuff
• 1 Virtex 4 CLB = 8 LUTs
– Altera devices has LABs (Logic Array Blocks)
• Consist of 16 LEs (logic elements) which each have 4 input LUTs

Reconfigurable Interconnect
FPGAs need some way of connecting CLBs together

– Reconfigurable interconnect
– But, we can only put fixed wires on a chip Problem:
How to make reconfigurable connections with
fixed wires?
– Main challenge:
• Should be flexible enough to support almost any
circuit
FPGA Architecture from Intel
Detailed Routing Architecture
I=2N+2 (for N less than or equal to 16) is sufficient for good logic utilization.
The ‘Logic Utilization’ is defined as the average number of BLEs per logic block that a circuit is able to use
divided by the total number of BLEs per logic block, N.
Problem 2: If FPGA doesn’t know which CLBs will be connected, where does
it put wires?
Solution:
– Put wires everywhere!
• Referred to as channel wires, routing channels, routing tracks,
many others
– CLBs typically arranged in a grid, with wires on all sides
CLB CLB CLB
CLB CLB CLB

Problem 3: How to connect CLB to wires?
Solution: Connection box
– Device that allows inputs and outputs of CLB to connect to
different wires
Connection box
CLB CLB

Connection box characteristics
– Flexibility
• The number of wires a CLB input/output can connect

to
Flexibility = 2 Flexibility = 3
CLB CLB CLB CLB
*Dots represent possible connections

Connection box characteristics
– Topology
• Defines the specific wires each CLB I/O can
connect to
• Examples: same flexibility, different topology
CLB CLB CLB CLB
*Dots represent possible connections

• How to conne ct CLBs to wire s ?
• Classify routing structures according to

connection topology:
• full cross bar type
• one ‐dimensional array type
• two‐dimensional array type
• hierarchical type
• All of them a re composed of wiring tracks and
programmable switches ,
• Routing paths a re determined according to the values of
the configuration memory bits

Full cros s bar type
• Crossbar switches are designed to implement

all permutations of connections among inputs
and outputs.
• ability to establish multiple parallel data paths the
high design complexity of crossbars limits
their scalability in size, that is, the number of
input and output terminals they interconnect.
• Thus, networks of crossbars are employed to
interconnect a large number of terminals and
also when longer distances have to be
accommodated.
• Fig shows a 2x2 buffer‐less switch, which
was named crossbar because it could be in one
of two states, cross or bar, requires additional
MUXs (2N)( for N inputs)

1- Dimensional Type
• logic blocks are arranged in a column and

the routing channels are provided in the row
direction.
• Channels are connected by feed‐through
wiring, which corresponds to the Actel’s ACT
series FPGA.
• In general, a one‐dimensional array type wiring
part tends to increase the number of switches.
• Since the ACT series FPGA uses anti‐fuse‐
type switches, even if the number of switches
is somewhat larger, the area overhead could be
mitigated.
• However, since SRAM‐based switches are
mainstream in recent FPGAs, the one‐
dimensional array type and full crossbar type
are not adopted.

Hierarchical type
• Altera Flex 10K , Apex , and Apex II have a hierarchical
routing architecture.
• The HSRA has a three‐level hierarchical structure
• There is a switch at each level, at the intersection of wire
tracks.
• At the higher hierarchical level, the number of wire tracks
per channel increases. At level 1, the lowest hierarchy,
wirings between multi‐logic blocks are performed.
• As a merit of the hierarchical FPGA, the number of
switches required for signal propagation within the same
hierarchy can be reduced.
• Therefore, high‐speed operations are possible.
• On the other hand, if the application does not match the
hierarchical FPGA, the usage rate of the logic block of each
hierarchy is extremely low.
• clear boundary between two hierarchy levels, once a
routing path crosses the hierarchy, it has to pay a delay
penalty.
• For example, if logic blocks physically close to each other
are not connected at the same hierarchical level, the delay
increases.

2-dimens ional type
Island style
• The island style is adopted by most FPGAs and there
are routing channels in the vertical and horizontal
directions among logic blocks.
• Connections between logic blocks and routing blocks
are generally a two‐point connection or a four‐point
connection,
• Also, with the uniform structure of the logic tiles, the
execution time of routing process can be reduced.

• The routing structure of an FPGA is classified into global and local

routing architectures.
• The global routing architecture does not consider switch level
details, such as connections between logical blocks and the
number of tracks per wiring channel.
• Local routing architecture includes detailed connection such as
switch arrangement between logic block and the routing channel.
• In the detailed routing architecture, the switch arrangement
between logic blocks and wire channels, and the length of the
wire segment are determined.

• W denotes tracks per channel and there are
several wire segment lengths.
• There are two types of connection blocks
(CBs), one for the input and the other for the
output.
• The flexibility of the input CB is defined as
Fcin, and as Fcout for the output.
• A switch block (SB) exists at the intersection
of the routing channel in the horizontal
direction and the vertical direction.
• The switch block flexibility is defined by Fs.
• In this example, W = 4, Fcin = 2/ 4 = 0.5, and
Fcout = 1/ 4 = 0.25.
• Also, an SB has inputs from three directions
with respect to the one output, Fs = 3.

• The wiring elements composed of a CB and a SB have a very large
influence on the area and the circuit delay of FPGA
• Regarding the area, it means that the layout area obtained by the CB and SB is
large.
• As for the delay, in the recent process technology, the wiring delay is
dominant over the gate delay. When deciding the detailed routing
•architecture, it is necessary to consider
•
• •the relationship between logic blocks and wire channels, the wiring segment
length and its ratio, and the transistor level circuit of the routing switch.
• However, there is a complicated tradeoff between routing flexibility and
performance.
• Increasing the number of switches can emphasize flexibility, but also increases
the area and delay.
• On the other hand, if the number of switches is reduced, routing resources
become insufficient, rendering routing impossible.
• The routing architecture is determined considering the balanced use of pass
transistors and tristate buffers.

Wire s egment length
• In the placement and routing using CAD tools, a

routing path that satisfies the constraints of
speed and electric power is determined.
• But difficult to conduct an ideal (shortest)
wiring for all circuits because of wiring
congestion and the number of switch stages.
• Necessary to perform shortcut routing using
medium‐distance or long‐distance segment
length.
• Various segment lengths (e.g. single, quad, and
long line) in routing tracks used.
• The distance of the wire segment length is
determined by the number of logic blocks it
crosses.
• There are single lines, double lines, and quad
lines. Here, the single lines are distributed at a
ratio of 40%, the double lines by 40%, and the
quad lines by 20%.

S witch boxe s
– S hort cha nne ls
• Us e ful for connecting a dja ce nt CLBs
– Long cha nne ls
• Us e ful for connecting CLBs tha t a re s e pa ra te d
• Allows for re duced routing dela y for non-adjace nt CLBs

Connection boxes allow CLBs to connect to routing wires

– But, that only allows us to move signals along a
single wire
– Not very useful
Problem 4: How do FPGAs connect wires together?

Solution: Switch boxes, switch matrices

– Connects horizontal and vertical routing channels. Its flexibility, Fs, defines for a
wiring segment entering the S block the number of other wiring segments it can
be connected to or the number of programming switches between one terminal
and others

• The topology of the S blocks is very important since it is possible
to choose two different topologies with the same flexibility Fs
that result in very different routabilities. For example, figure 2
shows that meanwhile topology 1 can’t connect wire A with B,
topology 2 can.

Switch block topology
Disjoint type:
• Is used in Xilinx’s XC 4000 series,
which is also called Xilinx‐type SB.
• Disjoint‐type SB connects wiring
tracks of the same number in four
directions with Fs = 3.
• When W = 4, the left track L0 is
connected to T0, R0, B0,
• The disjoint‐type SB requires a
small number of switches, but since
it only can connect tracks of the
same index value, the flexibility is
low.

Universal type
• A switch module M with W terminals on each side is said
to be universal if every set of nets satisfying the
dimensional constraint, that is, the number of nets on
each side of M is at most W, is simultaneously routable
through M.
• A universal switch module has the maximum routing
capacity, and thus it is desirable to design such a switch
module using the minimum number of switches
• two pairs of wire tracks can be connected in the SB.
• When W = 4, wire tracks 0, 3 and 1, 2 are paired,
respectively,
• If there is no pair, it has the same connection
configuration as the disjoint‐type SB.
• Total number of wiring tracks can be reduced with the
universal type when compared with disjoint‐type SB.
• However, the universal‐type SB assumes only the single
line and does not correspond to other wire lengths.

Wilton type
• In the disjoint and the universal‐type SBs, only the wiring tracks
of the same number or two pairs of wiring tracks which are
paired are connected.
• In the Wilton‐type SB , it is possible to connect wiring tracks of
different values
• W = 4, the wire track L0 in the left direction is connected to
the wire track T0, B3 and R0.
• allows change in the track number assigned to a net as it turns. This
way, two different global routes may reach two different routing
tracks at the same destination channel. This forms two disjoint
paths, a feature called the diversity of a network.
• allows greater flexibility in the initial track selection near the
source.
• At least one wire track is connected to the wiring track
(W − 1) which is the longest distance. As a result, when
routing is performed across several SBs, the routing flexibility is
higher than that of other topologies.

Why do fle xiblity a nd topology ma tte r?

– Routa bility: a me as ure of the number of circuits that can be routed
• Highe r fle xibility = be tte r routa bility
• Wilton s witch box topology = be tte r routa bility

FPGA Fabrics
FPGA layout called a “fabric”

– 2‐dimensional array of CLBs and programmable interconnect
– Sometimes referred to as an “island style” architecture
– Can implement any circuit ...

• But, should fabric include something else?
BITS Pilani,
PilaBnITi SCPaim
lanp
i, u
Pislani Campus
FPGAFabrics
What about memory?

– Could use FF’s in CLBs to create a memory
• Example: Create a 1 MB memory with:
– CLB with a single 3‐input, 2‐output LUT
• Each CLB = 2 bits of memory (because of 2 outputs)
• Total CLBs = (1 MB * 8 bits/byte) / 2 bits/CLB
– 4 million CLBs!!!!
– FPGAs commonly have tens of thousands of LUTs
» Large devices have 100‐200k LUTs
» State‐of‐the‐art devices ~800k LUTs
– Even if FPGAs were large enough, using a chip to implement 1 MB of
memory is not smart
– Conclusion:
• Bad Idea!! Huge waste of resources!

FPGA Memory Components
Solution 1: Use LUTs for logic or memory
– LUTs are small SRAMs, why not use them as memory?
– Xilinx refers to as distributed RAM
Solution 2: Include dedicated RAM components in the
FPGA fabric
– Xilinx refers to as Block RAM
• Can be single/dual‐ported
• Can be combined into arbitrary sizes
• Can be used as FIFO
– Different clock speeds for reads/writes
– Altera has Memory Blocks
• M4K: 4k bits of RAM
• Others: M9K, M20k, M144K
BITS Pilani,
PilaBnITi SCPaim
lanp
i, u
Pislani Campus
FPGA MemoryComponents
Fabric with Block RAM

– Block RAM can be placed anywhere
– Typically, placed in columns of the fabric

DSPComponents
FPGAs commonly used for DSP apps

– Makes sense to include custom DSP units instead of mapping onto LUTs
• Custom unit = faster/smaller
Example: Xilinx DSP48
– Includes multipliers, adders, subtractors, etc.
• 18x18 multiplication
• 48‐bit addition/subtraction
– Provides efficient way of implementing
• Add/subtract/multiply
• MAC (Multiply‐accumulate)
• Barrel shifter
• FIR Filter
• Square root
• Etc.
Altera devices have multiplier blocks
– Can be configured as 18x18 or 2 separate 9x9 multipliers

Example Fabric
Existing FPGAs are 2‐dimensional arrays of CLBs, DSP, Block RAM, and
programmable interconnect
– Actual layout/placement differs for different FPGAs

Other res ources
I/O
– Virte x 7 has 1,200 pins
– Communication is s till ofte n a bottleneck
• Pins don’t incre a s e with ne w FPGAs , but logic doe s
– Tre nd: High-spe e d s e rial trans ce ivers
Clock re s ource s
– Us ing re configurable interconne ct for clock introduces timing
problems
• S ke w, jitte r
– FPGAs often provided clock tree s , both globa lly and loca lly
– e .g. Virte x 7
http://www.xilinx.com/s upport/docume nta tion/use r_guide s/ug472_7S
e ries _Clocking.pdf
BITS Pilani,
PilaBnITi SCPaim
lanp
i, u
Pislani Campus
Example Fabrics
Virte x 7 (ima ge from Xilinx 7-s e rie s ove rvie w)

ES - MEL - AEL ZG554 - Lec4

Uploaded by

Document Information

Original Description:

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

ES - MEL - AEL ZG554 - Lec4

Uploaded by

Copyright:

Available Formats

Reconfigurable Computing

AEL ZG554 / ES ZG 554 / MEL ZG 554

BITS Pilani, Pilani Campus

BITS Pilani, Pilani Campus

BITS Pilani, Pilani Campus

All FPGAs contain a basic programmable logic cell replicated in a

BITS Pilani, Pilani Campus

BITS Pilani, Pilani Campus

When designing (or selecting) the type of logic cell for an

How many inputs? How many functions?

Any specialized logic?

BITS Pilani, Pilani

How can we implement any circuit in an FPGA?

BITS Pilani, Pilani Campus

Implement truth table in small memories (LUTs)

Logic inputs connect to

BITS Pilani, Pilani Campus

BITS Pilani, Pilani Campus

BITS Pilani, Pilani Campus

BITS Pilani, Pilani Campus

BITS Pilani, Pilani Campus

• Truth table handles this problem

• Extra outputs simply not used

BITS Pilani, Pilani Campus

1 gate 1,000,000 gates

Both of these circuits can be implemented in a

3‐in, 1‐out 3‐in, 2‐out

BITS Pilani, Pilani Campus

x(7) x(6) x(5) x(4) x(3) x(2) x(1) x(0)

3‐in, 2‐out 3‐in, 2‐out 3‐in, 2‐out 3‐in, 2‐out

– What does LUT need to do to implement register?

BITS Pilani, Pilani Campus

BITS Pilani, Pilani Campus

+ + 3‐in, 2‐out 3‐in, 2‐out

• Impossible to have pure combinational logic

BITS Pilani, Pilani Campus

• Simplest way: use a mux

BITS Pilani, Pilani Campus

3-in, 2-out 3-in, 2-out

2x1 2x1 2x1 2x1

BITS Pilani, Pilani Campus

3‐in, 2‐out 3‐in, 2‐out

2x1 2x1 2x1 2x1

Cout(1) S(1) S(0)

BITS Pilani, Pilani Campus

BITS Pilani, Pilani Campus

FPGAs need some way of connecting CLBs together

CLB CLB CLB

CLB CLB CLB

BITS Pilani, Pilani Campus

BITS Pilani, Pilani Campus

• The number of wires a CLB input/output can connect

CLB CLB CLB CLB

*Dots represent possible connections

BITS Pilani, Pilani Campus

CLB CLB CLB CLB

*Dots represent possible connections

• Classify routing structures according to

BITS Pilani, Pilani Campus

• Crossbar switches are designed to implement

BITS Pilani, Pilani Campus

• logic blocks are arranged in a column and