You are on page 1of 56

Reconfigurable Computing

AEL ZG554 / ES ZG 554 / MEL ZG 554


Session 4
Pawan Sharma
BITS Pilani ps@pilani.bits-pilani.ac.in
Pilani Campus 13-08-2022
Last Lecture
Complex Programmable Logic Devices

BITS Pilani, Pilani Campus


Today’s Lecture
FPGA Basics

BITS Pilani, Pilani Campus


Field‐Programmable Gate Arrays
Xilinx FPGAs are based on Configurable Logic Blocks (CLBs). More generally called
Programmable logic cells

BITS Pilani, Pilani Campus


Programmable Logic Cells

All FPGAs contain a basic programmable logic cell replicated in a


regular array across the chip
– configurable logic block, logic element, logic module, logic
unit, logic array block, …
– many other names
There are three different types of basic logic cells:
– multiplexer based
– look‐up table based
– programmable array logic (PAL‐like)

BITS Pilani, Pilani Campus


Logic Cell Considerations
How are functions implemented?
– fixed functions (manipulate inputs only)
– programmable functionality (interconnect components)
Coarse‐grained logic cells:
– support complex functions, need fewer blocks, but they are bigger
so less of them on chip
Fine‐grained logic cells:
– support simple functions, need more blocks, but they are smaller
so more of them on chip

BITS Pilani, Pilani Campus


Logic Cell Considerations

When designing (or selecting) the type of logic cell for an


FPGA, some basic questions are important:

How many inputs? How many functions?


• all functions of n inputs or eliminate some combinations?
• what inputs go to what parts of the logic cell?

Any specialized logic?


adder, etc.
What register features?

BITS Pilani, Pilani


Camp s
FPGA Architectures

How can we implement any circuit in an FPGA?


– First, focus on combinational logic
– Example: Half adder
• Combinational logic represented by truth table
• What kind of hardware can implement a truth table?

BITS Pilani, Pilani Campus


Look‐up‐tables (LUTs)

Implement truth table in small memories (LUTs)


– Usually SRAM

Logic inputs connect to


address inputs, logic
output is memory output

BITS Pilani, Pilani Campus


Look‐up‐tables (LUTs)
Alternatively, could have used a 2‐input, 2‐output LUT
– Outputs commonly use same inputs

BITS Pilani, Pilani Campus


Look‐up‐tables (LUTs)
Full adder
– Combinational logic can be implemented in a LUT with same number of
inputs and outputs
• 3‐input, 2‐ouput LUT

BITS Pilani, Pilani Campus


Look‐up‐tables (LUTs)
Why aren’t FPGAs just a big LUT?
– Size of truth table grows exponentially based on # of inputs
• 3 inputs = 8 rows, 4 inputs = 16 rows, 5 inputs = 32 rows, etc.
– Same number of rows in truth table and LUT
– LUTs grow exponentially based on # of inputs
Number of SRAM bits in a LUT = 2i * o
– i = # of inputs, o = # of outputs
– Example: 64 input combinational logic with 1 output would require 264
SRAM bits
• 1.84 x 1019
Clearly, not feasible to use large LUTs
– So, how do FPGAs implement logic with many inputs?

BITS Pilani, Pilani Campus


Look‐up‐tables (LUTs)
Fortunately, we can map circuits onto multiple LUTs
– Divide circuit into smaller circuits that fit in LUTs (same # of inputs and
outputs)
– Example: 3‐input, 2‐output LUTs

BITS Pilani, Pilani Campus


Look‐up‐tables (LUTs)
What if circuit doesn’t map perfectly?
– More inputs in LUT than in circuit

• Truth table handles this problem


• Unused inputs are ignored
– More outputs in LUT than in circuit

• Extra outputs simply not used


– Space is wasted, so should use multiple outputs whenever possible

BITS Pilani, Pilani Campus


Look‐up‐tables (LUTs)
Important Point
– The number of gates in a circuit has no effect on the mapping
into a LUT
• All that matters is the number of inputs and outputs
• Unfortunately, it isn’t common to see large circuits with a few inputs

1 gate 1,000,000 gates

Both of these circuits can be implemented in a


single 3‐input, 1‐output LUT
BITS Pilani, Pilani Campus
Sequential Logic
Problem: How to handle sequential logic
– Truth tables don’t work

Possible solution:
– Add a flip‐flop to the output of LUT

3‐in, 1‐out 3‐in, 2‐out


LUT LUT
etc.

FF FF FF

BITS Pilani, Pilani Campus


Sequential Logic
Example: 8‐bit register using 3‐input, 2‐output LUTs
– Input: x, Output: y

x(7) x(6) x(5) x(4) x(3) x(2) x(1) x(0)

3‐in, 2‐out 3‐in, 2‐out 3‐in, 2‐out 3‐in, 2‐out


LUT LUT LUT LUT

FF FF FF FF FF FF FF FF
y(7) y(6) y(5) y(4) y(3) y(2) y(1) y(0)

– What does LUT need to do to implement register?

BITS Pilani, Pilani Campus


Sequential Logic
Example, cont.
– LUT simply passes inputs to appropriate output

BITS Pilani, Pilani Campus


Sequential Logic
Isn’t it a waste to use LUTs for registers?
YES! (when it can be used for something else)
– Commonly used for pipelined circuits
• Example: Pipelined adder

+ + 3‐in, 2‐out 3‐in, 2‐out


LUT LUT ....
Register Register
FF FF FF FF
+
Register Adder and output register combined – not
a separate LUT for each
BITS Pilani, Pilani Campus
Sequential Logic
Existing FPGAs don’t have a flip flop connected to LUT
outputs
Why not?
– Flip flop has to be used!

• Impossible to have pure combinational logic


– Adds latency to circuit

Actual Solution:
– Configurable Logic Blocks (CLBs)

BITS Pilani, Pilani Campus


Configurable Logic Blocks (CLBs)
CLBs: the basic FPGA functional unit
– First issue: How to make flip‐flop optional?

• Simplest way: use a mux


– Circuit can now use output from LUT or from FF
– Where does select come from?

3‐in, 1‐out
LUT

CLB
FF

2x1

BITS Pilani, Pilani Campus


Configurable Logic Blocks (CLBs)
CLBs usually contain more than 1 LUT
– Why?
• Efficient way of handling common I/O between adjacent LUTs
• Saves routing resources

2x1

3-in, 2-out 3-in, 2-out


CLB LUT LUT

FF FF FF FF

2x1 2x1 2x1 2x1

BITS Pilani, Pilani Campus


Configurable Logic Blocks (CLBs)
Example: Ripple‐carry adder
– Each LUT implements 1 full adder
– Use efficient connections between LUTs for carry signals
A(1) B(1) A(0) B(0) Cin(0)

Cin(1)
2x1

3‐in, 2‐out 3‐in, 2‐out


CLB LUT LUT

FF FF FF FF

2x1 2x1 2x1 2x1


Cout(0)

Cout(1) S(1) S(0)

BITS Pilani, Pilani Campus


Configurable Logic Blocks (CLBs)
CLBs often have specialized connections between adjacent
CLBs
– Further improves carry chains
– Avoids routing resources
Some commercial CLBs even more complex
– Xilinx Virtex 4 CLB consists of 4 “slices”
• 1 slice = 2 LUTs + 2 FFs + other stuff
• 1 Virtex 4 CLB = 8 LUTs
– Altera devices has LABs (Logic Array Blocks)
• Consist of 16 LEs (logic elements) which each have 4 input LUTs

BITS Pilani, Pilani Campus


Reconfigurable Interconnect

FPGAs need some way of connecting CLBs together


– Reconfigurable interconnect
– But, we can only put fixed wires on a chip Problem:
How to make reconfigurable connections with
fixed wires?
– Main challenge:
• Should be flexible enough to support almost any
circuit
FPGA Architecture from Intel
Detailed Routing Architecture

I=2N+2 (for N less than or equal to 16) is sufficient for good logic utilization.
The ‘Logic Utilization’ is defined as the average number of BLEs per logic block that a circuit is able to use
divided by the total number of BLEs per logic block, N.
Reconfigurable Interconnect

Problem 2: If FPGA doesn’t know which CLBs will be connected, where does
it put wires?
Solution:
– Put wires everywhere!
• Referred to as channel wires, routing channels, routing tracks,
many others
– CLBs typically arranged in a grid, with wires on all sides

CLB CLB CLB

CLB CLB CLB

BITS Pilani, Pilani Campus


Reconfigurable Interconnect
Problem 3: How to connect CLB to wires?
Solution: Connection box
– Device that allows inputs and outputs of CLB to connect to
different wires

Connection box

CLB CLB

BITS Pilani, Pilani Campus


Reconfigurable Interconnect
Connection box characteristics
– Flexibility

• The number of wires a CLB input/output can connect


to

Flexibility = 2 Flexibility = 3

CLB CLB CLB CLB

*Dots represent possible connections

BITS Pilani, Pilani Campus


Reconfigurable Interconnect
Connection box characteristics
– Topology
• Defines the specific wires each CLB I/O can
connect to
• Examples: same flexibility, different topology

CLB CLB CLB CLB

*Dots represent possible connections


BITS Pilani, Pilani Campus
• How to conne ct CLBs to wire s ?

• Classify routing structures according to


connection topology:
• full cross bar type
• one ‐dimensional array type
• two‐dimensional array type
• hierarchical type
• All of them a re composed of wiring tracks and
programmable switches ,
• Routing paths a re determined according to the values of
the configuration memory bits

BITS Pilani, Pilani Campus


Full cros s bar type

• Crossbar switches are designed to implement


all permutations of connections among inputs
and outputs.
• ability to establish multiple parallel data paths the
high design complexity of crossbars limits
their scalability in size, that is, the number of
input and output terminals they interconnect.
• Thus, networks of crossbars are employed to
interconnect a large number of terminals and
also when longer distances have to be
accommodated.
• Fig shows a 2x2 buffer‐less switch, which
was named crossbar because it could be in one
of two states, cross or bar, requires additional
MUXs (2N)( for N inputs)

BITS Pilani, Pilani Campus


1- Dimensional Type

• logic blocks are arranged in a column and


the routing channels are provided in the row
direction.
• Channels are connected by feed‐through
wiring, which corresponds to the Actel’s ACT
series FPGA.
• In general, a one‐dimensional array type wiring
part tends to increase the number of switches.
• Since the ACT series FPGA uses anti‐fuse‐
type switches, even if the number of switches
is somewhat larger, the area overhead could be
mitigated.
• However, since SRAM‐based switches are
mainstream in recent FPGAs, the one‐
dimensional array type and full crossbar type
are not adopted.

BITS Pilani, Pilani Campus


Hierarchical type
• Altera Flex 10K , Apex , and Apex II have a hierarchical
routing architecture.
• The HSRA has a three‐level hierarchical structure
• There is a switch at each level, at the intersection of wire
tracks.
• At the higher hierarchical level, the number of wire tracks
per channel increases. At level 1, the lowest hierarchy,
wirings between multi‐logic blocks are performed.
• As a merit of the hierarchical FPGA, the number of
switches required for signal propagation within the same
hierarchy can be reduced.
• Therefore, high‐speed operations are possible.
• On the other hand, if the application does not match the
hierarchical FPGA, the usage rate of the logic block of each
hierarchy is extremely low.
• clear boundary between two hierarchy levels, once a
routing path crosses the hierarchy, it has to pay a delay
penalty.
• For example, if logic blocks physically close to each other
are not connected at the same hierarchical level, the delay
increases.

BITS Pilani, Pilani Campus


2-dimens ional type

Island style
• The island style is adopted by most FPGAs and there
are routing channels in the vertical and horizontal
directions among logic blocks.
• Connections between logic blocks and routing blocks
are generally a two‐point connection or a four‐point
connection,
• Also, with the uniform structure of the logic tiles, the
execution time of routing process can be reduced.

BITS Pilani, Pilani Campus


Detailed Routing Architecture

• The routing structure of an FPGA is classified into global and local


routing architectures.
• The global routing architecture does not consider switch level
details, such as connections between logical blocks and the
number of tracks per wiring channel.
• Local routing architecture includes detailed connection such as
switch arrangement between logic block and the routing channel.
• In the detailed routing architecture, the switch arrangement
between logic blocks and wire channels, and the length of the
wire segment are determined.

BITS Pilani, Pilani Campus


Detailed Routing Architecture
• W denotes tracks per channel and there are
several wire segment lengths.
• There are two types of connection blocks
(CBs), one for the input and the other for the
output.
• The flexibility of the input CB is defined as
Fcin, and as Fcout for the output.
• A switch block (SB) exists at the intersection
of the routing channel in the horizontal
direction and the vertical direction.
• The switch block flexibility is defined by Fs.
• In this example, W = 4, Fcin = 2/ 4 = 0.5, and
Fcout = 1/ 4 = 0.25.
• Also, an SB has inputs from three directions
with respect to the one output, Fs = 3.

BITS Pilani, Pilani Campus


• The wiring elements composed of a CB and a SB have a very large
influence on the area and the circuit delay of FPGA
• Regarding the area, it means that the layout area obtained by the CB and SB is
large.
• As for the delay, in the recent process technology, the wiring delay is
dominant over the gate delay. When deciding the detailed routing
•architecture, it is necessary to consider

• •the relationship between logic blocks and wire channels, the wiring segment
length and its ratio, and the transistor level circuit of the routing switch.
• However, there is a complicated tradeoff between routing flexibility and
performance.
• Increasing the number of switches can emphasize flexibility, but also increases
the area and delay.
• On the other hand, if the number of switches is reduced, routing resources
become insufficient, rendering routing impossible.
• The routing architecture is determined considering the balanced use of pass
transistors and tristate buffers.

BITS Pilani, Pilani Campus


Wire s egment length

• In the placement and routing using CAD tools, a


routing path that satisfies the constraints of
speed and electric power is determined.
• But difficult to conduct an ideal (shortest)
wiring for all circuits because of wiring
congestion and the number of switch stages.
• Necessary to perform shortcut routing using
medium‐distance or long‐distance segment
length.
• Various segment lengths (e.g. single, quad, and
long line) in routing tracks used.
• The distance of the wire segment length is
determined by the number of logic blocks it
crosses.
• There are single lines, double lines, and quad
lines. Here, the single lines are distributed at a
ratio of 40%, the double lines by 40%, and the
quad lines by 20%.

BITS Pilani, Pilani Campus


Reconfigurable Interconnect

S witch boxe s
– S hort cha nne ls
• Us e ful for connecting a dja ce nt CLBs
– Long cha nne ls
• Us e ful for connecting CLBs tha t a re s e pa ra te d
• Allows for re duced routing dela y for non-adjace nt CLBs

BITS Pilani, Pilani Campus


Reconfigurable Interconnect

Connection boxes allow CLBs to connect to routing wires


– But, that only allows us to move signals along a
single wire
– Not very useful
Problem 4: How do FPGAs connect wires together?

BITS Pilani, Pilani Campus


Reconfigurable Interconnect

Solution: Switch boxes, switch matrices


– Connects horizontal and vertical routing channels. Its flexibility, Fs, defines for a
wiring segment entering the S block the number of other wiring segments it can
be connected to or the number of programming switches between one terminal
and others

BITS Pilani, Pilani Campus


• The topology of the S blocks is very important since it is possible
to choose two different topologies with the same flexibility Fs
that result in very different routabilities. For example, figure 2
shows that meanwhile topology 1 can’t connect wire A with B,
topology 2 can.

BITS Pilani, Pilani Campus


Switch block topology
Disjoint type:
• Is used in Xilinx’s XC 4000 series,
which is also called Xilinx‐type SB.
• Disjoint‐type SB connects wiring
tracks of the same number in four
directions with Fs = 3.
• When W = 4, the left track L0 is
connected to T0, R0, B0,
• The disjoint‐type SB requires a
small number of switches, but since
it only can connect tracks of the
same index value, the flexibility is
low.

BITS Pilani, Pilani Campus


Universal type
• A switch module M with W terminals on each side is said
to be universal if every set of nets satisfying the
dimensional constraint, that is, the number of nets on
each side of M is at most W, is simultaneously routable
through M.
• A universal switch module has the maximum routing
capacity, and thus it is desirable to design such a switch
module using the minimum number of switches
• two pairs of wire tracks can be connected in the SB.
• When W = 4, wire tracks 0, 3 and 1, 2 are paired,
respectively,
• If there is no pair, it has the same connection
configuration as the disjoint‐type SB.
• Total number of wiring tracks can be reduced with the
universal type when compared with disjoint‐type SB.
• However, the universal‐type SB assumes only the single
line and does not correspond to other wire lengths.

BITS Pilani, Pilani Campus


Wilton type
• In the disjoint and the universal‐type SBs, only the wiring tracks
of the same number or two pairs of wiring tracks which are
paired are connected.
• In the Wilton‐type SB , it is possible to connect wiring tracks of
different values
• W = 4, the wire track L0 in the left direction is connected to
the wire track T0, B3 and R0.
• allows change in the track number assigned to a net as it turns. This
way, two different global routes may reach two different routing
tracks at the same destination channel. This forms two disjoint
paths, a feature called the diversity of a network.
• allows greater flexibility in the initial track selection near the
source.
• At least one wire track is connected to the wiring track
(W − 1) which is the longest distance. As a result, when
routing is performed across several SBs, the routing flexibility is
higher than that of other topologies.

BITS Pilani, Pilani Campus


Reconfigurable Interconnect

Why do fle xiblity a nd topology ma tte r?


– Routa bility: a me as ure of the number of circuits that can be routed
• Highe r fle xibility = be tte r routa bility
• Wilton s witch box topology = be tte r routa bility

BITS Pilani, Pilani Campus


FPGA Fabrics

FPGA layout called a “fabric”


– 2‐dimensional array of CLBs and programmable interconnect
– Sometimes referred to as an “island style” architecture

– Can implement any circuit ...


• But, should fabric include something else?

BITS Pilani,
PilaBnITi SCPaim
lanp
i, u
Pislani Campus
FPGAFabrics

What about memory?


– Could use FF’s in CLBs to create a memory
• Example: Create a 1 MB memory with:
– CLB with a single 3‐input, 2‐output LUT
• Each CLB = 2 bits of memory (because of 2 outputs)
• Total CLBs = (1 MB * 8 bits/byte) / 2 bits/CLB
– 4 million CLBs!!!!
– FPGAs commonly have tens of thousands of LUTs
» Large devices have 100‐200k LUTs
» State‐of‐the‐art devices ~800k LUTs
– Even if FPGAs were large enough, using a chip to implement 1 MB of
memory is not smart
– Conclusion:
• Bad Idea!! Huge waste of resources!

BITS Pilani, Pilani Campus


FPGA Memory Components
Solution 1: Use LUTs for logic or memory
– LUTs are small SRAMs, why not use them as memory?
– Xilinx refers to as distributed RAM
Solution 2: Include dedicated RAM components in the
FPGA fabric
– Xilinx refers to as Block RAM
• Can be single/dual‐ported
• Can be combined into arbitrary sizes
• Can be used as FIFO
– Different clock speeds for reads/writes
– Altera has Memory Blocks
• M4K: 4k bits of RAM
• Others: M9K, M20k, M144K

BITS Pilani,
PilaBnITi SCPaim
lanp
i, u
Pislani Campus
FPGA MemoryComponents

Fabric with Block RAM


– Block RAM can be placed anywhere
– Typically, placed in columns of the fabric

BITS Pilani, Pilani Campus


DSPComponents

FPGAs commonly used for DSP apps


– Makes sense to include custom DSP units instead of mapping onto LUTs
• Custom unit = faster/smaller
Example: Xilinx DSP48
– Includes multipliers, adders, subtractors, etc.
• 18x18 multiplication
• 48‐bit addition/subtraction
– Provides efficient way of implementing
• Add/subtract/multiply
• MAC (Multiply‐accumulate)
• Barrel shifter
• FIR Filter
• Square root
• Etc.
Altera devices have multiplier blocks
– Can be configured as 18x18 or 2 separate 9x9 multipliers

BITS Pilani, Pilani Campus


Example Fabric

Existing FPGAs are 2‐dimensional arrays of CLBs, DSP, Block RAM, and
programmable interconnect
– Actual layout/placement differs for different FPGAs

BITS Pilani, Pilani Campus


Other res ources

I/O
– Virte x 7 has 1,200 pins
– Communication is s till ofte n a bottleneck
• Pins don’t incre a s e with ne w FPGAs , but logic doe s
– Tre nd: High-spe e d s e rial trans ce ivers
Clock re s ource s
– Us ing re configurable interconne ct for clock introduces timing
problems
• S ke w, jitte r
– FPGAs often provided clock tree s , both globa lly and loca lly
– e .g. Virte x 7
http://www.xilinx.com/s upport/docume nta tion/use r_guide s/ug472_7S
e ries _Clocking.pdf

BITS Pilani,
PilaBnITi SCPaim
lanp
i, u
Pislani Campus
Example Fabrics

Virte x 7 (ima ge from Xilinx 7-s e rie s ove rvie w)

BITS Pilani, Pilani Campus

You might also like