Module 4
S5 HONOURS
Module 4:
Placement and Routing: Programmable interconnect - Partitioning and Placement,
Routing resources, delays; Applications - Embedded system design using FPGAs,
DSP using FPGAs.
PROGRAMMABLE INTERCONNECTS
• A key element of an FPGA is the general-purpose programmable interconnect
interspersed between the programmable logic blocks.
• There are different types of interconnection resources in all commercial
FPGAs.
• Every vendor has its own specific names for the different types of
interconnects in their FPGA.
Interconnects in Symmetric Array FPGAs
• Many FPGAs use switch matrices to provide interconnections between the
routing wires connected to them; this is the general-purpose interconnect.
General-Purpose Interconnect:
• A typical switch matrix has a switch at each intersection (i.e., wherever the lines cross).
• A switch matrix that supports every possible connection from every
wire to every other wire is very expensive.
• The connectivity is often limited to some subset of a full crossbar connection; moreover, not all
connections might be possible simultaneously.
• In the switch matrix illustrated in Figure (b), each wire entering one
side of the switch matrix can be routed to other wires using some combination of the switches.
• To support this type of connection, each cross point in the switch matrix must support six
possible interconnections (one for each pair of the four wire ends meeting at the cross point), as shown in Figure (c).
• Depending on the programming technology, SRAM cells, flash memory
cells, or antifuse connections control the configuration of the switches.
• The switch matrices interspersed between the logic blocks in an FPGA
allow general-purpose interconnectivity between arbitrary points in the chip.
• However, switch matrices are expensive in both area and time (delay).
• A signal that passes through several switch matrices can accumulate a
significant delay.
• Moreover, the delay is variable and hard to predict, because it depends on the number of
switch matrices each signal traverses.
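The growth of delay with the number of switch matrices on a route can be illustrated with a simple RC model. The following is a minimal sketch, assuming each switch behaves like a series resistance and each wire segment behind it like a capacitance to ground, combined with the Elmore delay approximation. The function name `elmore_delay`, the parameters `r_switch` and `c_segment`, and their numeric values are illustrative assumptions, not figures for any real device.

```python
def elmore_delay(num_switches, r_switch=1e3, c_segment=50e-15):
    """Elmore delay (in seconds) of a route crossing `num_switches` switches.

    Assumed model (for illustration only): every switch is a series
    resistance `r_switch` (ohms), followed by a wire segment with
    capacitance `c_segment` (farads) to ground. The Elmore delay adds,
    for each capacitor, its capacitance times the total resistance
    between that capacitor and the driver.
    """
    delay = 0.0
    for k in range(1, num_switches + 1):
        upstream_resistance = k * r_switch  # k switches sit before segment k
        delay += upstream_resistance * c_segment
    return delay


if __name__ == "__main__":
    for n in (1, 2, 4, 8):
        print(f"{n} switches: {elmore_delay(n) * 1e12:.0f} ps")
```

Under this model the delay grows roughly with the square of the number of switches in series, which is one way to see why routes that cross many switch matrices are slow and why the delay differs from one signal to another.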
ACT1 INTERCONNECTION ARCHITECTURE
Horizontal tracks
▪ Antifuses join the wire segments. The designer programs the
interconnections by blowing antifuses to make connections
between wire segments; unwanted connections are left
unprogrammed.
Vertical tracks
▪ 8 vertical tracks per LM are available for inputs (4 from the LM above the
channel and 4 from the LM below). These connections are the input stubs.
▪ The single LM output connects to a vertical track that extends across the 2
channels above the module and across the 2 channels below the module
(output stub). Since this is a dedicated connection, no antifuse is needed.
▪ Thus module outputs use 4 vertical tracks per module (counting 2 tracks from the
modules below, and 2 tracks from the modules above each channel).
▪ If the Logic Module at the end of a net is less than two rows away
from the driver module, a connection requires 2 antifuses, 1
vertical track, and 2 horizontal segments.
Xilinx LCA interconnect. (a) The LCA architecture (notice the matrix element size is
larger than a CLB). (b) A simplified representation of the interconnect resources.
Each of the lines is a bus.
Components of interconnect delay in a Xilinx LCA array. (a) A portion of the interconnect
around the CLBs. (b) A switching matrix. (c) A detailed view inside the switching matrix
showing the pass-transistor arrangement. (d) The equivalent circuit for the connection
between nets 6 and 20 using the matrix. (e) A view of the interconnect at a Programmable
Interconnection Point (PIP). (f) and (g) The equivalent schematic of a PIP connection.
ALTERA MAX INTERCONNECT SCHEME
• Altera MAX 5000/7000 devices use a Programmable Interconnect
Array (PIA).
• The PIA is a cross-point switch for logic signals traveling between
LABs.
• The advantage of this architecture is that it uses a fixed number of
connections, so the routing delay is also fixed.
• Its simple, regular structure also improves the speed of
the placement and routing software.
• For example, the delay between LAB1 and LAB2 is the same as the delay
between LAB1 and LAB6.
ALTERA MAX 5000 AND 7000 INTERCONNECT SCHEME
A simplified block diagram of the Altera MAX interconnect scheme. (a) The PIA
(Programmable Interconnect Array) is deterministic: delay is independent of the
path length. (b) Each LAB (Logic Array Block) contains a programmable AND array.
2/25/2022
PLACEMENT AND ROUTING
Placement
• Placement determines which logic block within the FPGA should implement each of the logic
blocks required by the circuit.
• It is the physical assignment of all blocks on the target FPGA in a way that minimizes one
or more specific objective cost functions (e.g., wirelength, delay).
Objectives:
• Minimize the required wiring (wire-length driven placement)
• Balance the wiring density across the FPGA (routability-driven placement)
• Maximize circuit speed (timing-driven placement)
3 major placement algorithms:
– min-cut (partitioning-based) placement
– simulated annealing based placement
– analytic placement
PARTITIONING BASED ALGORITHM
• Also referred to as min-cut methods.
• Partitioning-based placement recursively applies a partitioning step: pick a
region containing some circuit modules, divide the region into a set of
subregions, and assign each module to one of the subregions so as to
optimize a predefined metric (e.g., wirelength or cut size).
• A net is said to be cut if it connects components in one region to
components in another region.
• The number of nets cut is the cut size.
• Each step aims to minimize the number of nets cut across the boundary between two
partitions, which places highly connected blocks in the same partition.
• These steps are repeated recursively until the number of modules in each
region is smaller than a threshold.
• The advantage of partitioning-based placement algorithms is that
they run fast and scale well to
large designs.
• Because they use a divide-and-conquer strategy, in which a large problem
is divided into small sub-problems, partitioning-based methods
significantly reduce the problem search space.
• Quality is often limited by the lack of global information at
the top (coarse) levels and the lack of flexibility at the bottom (fine)
levels, especially for designs with large whitespace.
• Moreover, since the cut size is not an exact function of wirelength,
timing, or routability, the quality is often not as good as that of other placement
strategies.
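The recursive min-cut procedure described above can be sketched as follows. This is a minimal illustration, assuming nets are given as tuples of module names; the function names (`cut_size`, `best_bisection`, `mincut_place`) and the parameter `max_region_size` are hypothetical. The exhaustive search over balanced bisections is a stand-in that only works for tiny examples; a practical placer would use a heuristic partitioner instead (for example Kernighan-Lin or Fiduccia-Mattheyses). Recursion here stops when a region holds at most `max_region_size` modules, and connections to modules outside the current region are ignored for simplicity.

```python
from itertools import combinations


def cut_size(nets, left, right):
    """Number of nets that have modules in both `left` and `right`."""
    left, right = set(left), set(right)
    return sum(1 for net in nets if left & set(net) and right & set(net))


def best_bisection(modules, nets):
    """Try every balanced split and return (cut, left, right) with minimum cut.

    Exhaustive search: exponential in the number of modules, so it is
    only meant for small illustrative inputs.
    """
    modules = sorted(modules)
    half = len(modules) // 2
    best = None
    for left in combinations(modules, half):
        right = [m for m in modules if m not in left]
        cost = cut_size(nets, left, right)
        if best is None or cost < best[0]:
            best = (cost, list(left), right)
    return best


def mincut_place(modules, nets, max_region_size=1):
    """Recursively bisect until each region holds at most `max_region_size`
    modules (must be >= 1). Returns the regions in left-to-right order."""
    if len(modules) <= max_region_size:
        return [sorted(modules)]
    _, left, right = best_bisection(modules, nets)
    return (mincut_place(left, nets, max_region_size)
            + mincut_place(right, nets, max_region_size))


if __name__ == "__main__":
    nets = [("a", "b"), ("c", "d"), ("b", "c")]
    print(best_bisection(["a", "b", "c", "d"], nets))
    print(mincut_place(["a", "b", "c", "d"], nets))
```

In the small example, the split {a, b} | {c, d} cuts only the net (b, c), so the two tightly connected pairs each stay within one partition.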
SIMULATED ANNEALING
• Simulated annealing (SA) is an optimization method that provides
a probability-based mechanism for accepting “uphill” moves (i.e., moves to a state/solution with a
higher cost) in order to escape from a local minimum; the acceptance
probability depends on the magnitude of the “uphill” move and on the elapsed
search time.
• Because SA allows a state of higher cost to replace its previous state (i.e., an “uphill” move), it can escape from a local minimum and can often find a high-quality solution.
• For an SA-based placer, a solution is often given by the assignment of physical
locations for all the modules, and its solution space is a collection of all the
feasible assignments.
• By changing the location(s) of one module or more, we can identify a
neighboring state (a new placement) which is evaluated by a predefined cost
function to determine whether this neighboring state is kept.
• The well-known Versatile Place and Route (VPR) tool is a classical SA-based FPGA
placer.
• SA is general and robust, and well suited to designs with
multiple objectives.
• However, SA is often time-consuming: as design complexity increases, the
runtime of an SA-based FPGA placer can become prohibitively long.
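The SA loop described above can be sketched as a small placer. This is a minimal sketch under stated assumptions, not VPR's algorithm: the cost function is the total half-perimeter wirelength of the nets, a move swaps the contents of two grid sites, an uphill move of size delta is accepted with probability exp(-delta / T), and the temperature T is multiplied by a constant factor after each batch of moves. All names (`hpwl`, `anneal_place`) and parameter values are hypothetical.

```python
import math
import random


def hpwl(placement, nets):
    """Total half-perimeter wirelength of all nets for a placement
    given as {module: (x, y)}."""
    total = 0
    for net in nets:
        xs = [placement[m][0] for m in net]
        ys = [placement[m][1] for m in net]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total


def anneal_place(modules, nets, width, height, t_start=10.0, t_end=0.01,
                 alpha=0.9, moves_per_temp=100, seed=0):
    """Place `modules` on a width x height grid by simulated annealing."""
    rng = random.Random(seed)
    sites = [(x, y) for x in range(width) for y in range(height)]
    if len(modules) > len(sites):
        raise ValueError("not enough sites for all modules")
    # slots[i] is the module at sites[i], or None for an empty site.
    slots = list(modules) + [None] * (len(sites) - len(modules))
    rng.shuffle(slots)  # random initial placement

    def as_placement():
        return {m: sites[i] for i, m in enumerate(slots) if m is not None}

    cost = hpwl(as_placement(), nets)
    best, best_cost = as_placement(), cost
    t = t_start
    while t > t_end:
        for _ in range(moves_per_temp):
            # Neighboring state: swap the contents of two sites.
            i, j = rng.sample(range(len(slots)), 2)
            slots[i], slots[j] = slots[j], slots[i]
            new_cost = hpwl(as_placement(), nets)
            delta = new_cost - cost
            # Always accept downhill moves; accept uphill moves with a
            # probability that shrinks with delta and with falling t.
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                cost = new_cost
                if cost < best_cost:
                    best, best_cost = as_placement(), cost
            else:
                slots[i], slots[j] = slots[j], slots[i]  # undo the swap
        t *= alpha  # cool down
    return best, best_cost


if __name__ == "__main__":
    nets = [("a", "b"), ("b", "c"), ("c", "d")]
    placement, cost = anneal_place(["a", "b", "c", "d"], nets, 2, 2)
    print(placement, cost)
```

Recomputing the full cost after every move keeps the sketch short; evaluating only the nets touched by a move would be the obvious refinement, and it is one reason runtime becomes a concern as designs grow.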
ANALYTICAL PLACEMENT
• Recently, FPGA global placement (and even legalization) has seen a
significant paradigm shift from simulated annealing to analytical formulations.
• Analytical placement computes the desired locations of modules under given
constraints using a mathematical formulation.
• The key issues lie in the analytical models of wirelength and in the integration and
optimization of the objective functions.
• Wirelength model:
– The half-perimeter wirelength (HPWL) of a net is the most popular wirelength model for
placement.
– Mathematical formulations:
▪ Quadratic wirelength models use the squared Euclidean wirelength to approximate HPWL.
▪ Non-quadratic wirelength models, such as the weighted-average and log-sum-exponential models, were proposed to approximate HPWL.
– Quadratic models are intrinsically faster but less accurate, while non-quadratic models are
more accurate but slower.
• Integration:
– A placer needs to handle the simultaneous optimization of multiple objectives.
– As a result, it is desirable for an analytical formulation to integrate these objectives for
effective co-optimization.
– The most popular approach, the penalty method, first integrates two objective functions W and D (say,
wirelength and density, respectively) as W + λD, where λ is the penalty multiplier.
– λ is then adjusted until the desired balance between W and D is achieved.
Ref 11
• W=?
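The wirelength models and the penalty formulation named above can be compared numerically. The following is a minimal sketch, assuming one-dimensional pin coordinates for a single net (a two-dimensional wirelength is the sum of the x and y parts). The function names, the smoothing parameter `gamma`, the clique form of the quadratic model, and the value passed as the density term are illustrative assumptions; the sketch does not reproduce the formulation of any particular placer.

```python
import math


def hpwl_1d(coords):
    """Exact half-perimeter wirelength along one axis: max - min."""
    return max(coords) - min(coords)


def quadratic_1d(coords):
    """Quadratic model (clique form): sum of squared pairwise distances."""
    return sum((a - b) ** 2
               for i, a in enumerate(coords) for b in coords[i + 1:])


def lse_1d(coords, gamma=1.0):
    """Log-sum-exponential model: a smooth over-estimate of max - min
    that approaches HPWL as gamma approaches 0."""
    def smooth_max(values):
        m = max(values)  # subtract the maximum for numerical stability
        return m + gamma * math.log(
            sum(math.exp((v - m) / gamma) for v in values))
    return smooth_max(coords) + smooth_max([-c for c in coords])


def wa_1d(coords, gamma=1.0):
    """Weighted-average model: a smooth under-estimate of max - min
    that approaches HPWL as gamma approaches 0."""
    hi, lo = max(coords), min(coords)
    w_max = [math.exp((c - hi) / gamma) for c in coords]
    w_min = [math.exp(-(c - lo) / gamma) for c in coords]
    smooth_max = sum(c * w for c, w in zip(coords, w_max)) / sum(w_max)
    smooth_min = sum(c * w for c, w in zip(coords, w_min)) / sum(w_min)
    return smooth_max - smooth_min


def penalized_cost(wirelength, density, lam):
    """Penalty-method objective W + lambda * D."""
    return wirelength + lam * density


if __name__ == "__main__":
    pins = [0.0, 3.0, 1.0]
    print("HPWL     :", hpwl_1d(pins))
    print("quadratic:", quadratic_1d(pins))
    for gamma in (1.0, 0.1):
        print(f"gamma={gamma}: LSE={lse_1d(pins, gamma):.4f} "
              f"WA={wa_1d(pins, gamma):.4f}")
    # Hypothetical density penalty of 2.0 with an increasing multiplier.
    for lam in (0.1, 1.0, 10.0):
        print(f"lambda={lam}: cost={penalized_cost(hpwl_1d(pins), 2.0, lam)}")
```

The quadratic value is cheap to compute and minimize but does not track HPWL closely, whereas the log-sum-exponential and weighted-average values bracket HPWL and tighten as `gamma` shrinks, at the cost of evaluating exponentials. Raising `lam` makes the density term dominate the combined objective.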
DELAY
• Different parts of a circuit contribute to path delay: the I/O pads, the logic blocks,
and the interconnect.
wire