
FPGA INTRODUCTION

AUTHOR
Nandhakumar
Field Programmable Gate Array
What is an FPGA?
 Configurable logic blocks (CLBs), interconnection
resources, and I/O pads.


 Dominant digital design implementation
 Ability to re-configure FPGA to implement any
digital logic function
 Partial re-configuration allows a portion of the
FPGA to be continuously running while another
portion is being re-configured
 FPGAs also contain analog circuitry features,
including programmable slew rate and drive
strength, and differential comparators on I/O
designed to be connected to differential signaling
channels.
 Mixed-signal FPGAs contain ADCs and DACs with
analog signal conditioning blocks, allowing them to
operate as a system-on-chip (SoC)
Who makes FPGAs?
 There are (at least) four companies making FPGAs in the world. The first two
(Xilinx and Altera) hold the bulk of the market.
 Xilinx invented FPGAs and is the biggest name in the FPGA world.

 Altera is the second FPGA heavyweight, also a well-known name.

 Lattice and Actel are smaller players.


Elements of FPGA
The basic elements of a Field Programmable Gate Array are:
 Configurable logic blocks (CLBs)
 Configurable input/output blocks (IOBs)
 Two-layer metal network of vertical and horizontal lines for interconnecting
the CLBs and IOBs (programmable interconnect)
 Logic resources (combinational, flip-flops)
   Combinational: LUTs, multiplexers, gates
 Programmable interconnections: SRAM, Flash, anti-fuse
 Special resources: PLL/DLL, RAMs, FIFOs
   Memory controllers, network interfaces, processors
How does an FPGA work?
 A CLB consists of two slices, each of which
contains look-up tables (LUTs), registers,
multiplexers, and carry logic.
 LUT implement logic functions
 Registers store data
 Multiplexers select the desired output
 Carry logic enables fast arithmetic function

 Interconnections are routing resources,
including channels, switch boxes, clock
distribution networks, etc.
Field-Programmable Gate Array structure
 Configurable logic blocks
   To implement combinational and sequential logic
 Interconnect
   Wires to connect inputs and outputs to logic blocks
 I/O blocks
   Special logic blocks at periphery of device for external connections
Configurable Logic Blocks
 CLBs consist of:
 Look-up Tables (LUTs), which implement the entries of a logic function's truth
table
   Some FPGAs can use LUTs to implement small Random Access Memory (RAM)
 Carry and Control Logic
   Implements fast arithmetic operations (adders/subtractors)
   Can also be configured for additional operations (built-in self-test iterative-OR
chain)
 Memory Elements
   Configurable flip-flops (FFs)/latches (programmable clock edges, set/reset, and
clock enable)
   These memory elements can usually be configured as shift registers
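The way LUTs and carry logic cooperate in an adder can be sketched in Python. This is only an illustrative model, assuming one common arrangement (a LUT computes the propagate signal, a dedicated carry multiplexer passes the carry along, and a dedicated XOR forms the sum bit); the function and helper names are hypothetical.

```python
def carry_chain_add(a_bits, b_bits, carry_in=0):
    """Add two equal-length, little-endian bit lists using a carry chain model."""
    sums = []
    carry = carry_in
    for a, b in zip(a_bits, b_bits):
        propagate = a ^ b                   # computed by the LUT
        sums.append(propagate ^ carry)      # dedicated XOR forms the sum bit
        carry = carry if propagate else a   # carry MUX: pass carry or generate
    return sums, carry


def to_bits(value, width):
    """Little-endian bit list of a non-negative integer."""
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits):
    """Integer value of a little-endian bit list."""
    return sum(bit << i for i, bit in enumerate(bits))


sum_bits, carry_out = carry_chain_add(to_bits(9, 4), to_bits(5, 4))
print(from_bits(sum_bits), carry_out)  # 9 + 5 = 14, no carry out of 4 bits
```

Because the carry only passes through one multiplexer per bit, the chain is much faster than routing the carry through general-purpose LUTs and interconnect.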
CLB (Example Xilinx)

 A CLB is made up of several slices.
Each Xilinx Virtex-5 CLB has two slices,
which come in two types: SLICEL
(logic) and SLICEM (memory).
 In addition to the basic CLB
architecture, the Virtex-5
contains wide-function MUXs
which can implement:
   4:1 MUX using 1 LUT
   8:1 MUX using 2 LUTs
   16:1 MUX using 4 LUTs


How LUT works

 An FPGA uses its LUTs as the primary
resource to implement any logic function.
This is a two-phase process.
 First, the output values for each
combination of input variables constituting
the Boolean function are stored in the SRAM
cells of the LUT. Then, depending on the
combination of input variables supplied by the
user, the appropriate memory bit appears
at the LUT’s output pin. This is because
the user-provided input bits act as the
select lines for the multiplexer(s) present
inside the LUT(s).
 A LUT can easily be reconfigured to implement OR,
XOR, NAND, and NOR gates, which are the
basic building blocks for more complex functions.
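The two-phase process described above can be sketched in Python. This is a minimal software model, not vendor code; the class name `LUT` and its methods are hypothetical and exist only for illustration.

```python
class LUT:
    """Model of a k-input look-up table backed by 2**k memory cells."""

    def __init__(self, num_inputs):
        self.num_inputs = num_inputs
        self.sram = [0] * (2 ** num_inputs)

    def configure(self, function):
        """Phase 1: store the function's output for every input combination."""
        for address in range(len(self.sram)):
            inputs = [(address >> i) & 1 for i in range(self.num_inputs)]
            self.sram[address] = int(bool(function(*inputs)))

    def evaluate(self, *inputs):
        """Phase 2: the input bits address the memory and select one stored bit."""
        address = sum(bit << i for i, bit in enumerate(inputs))
        return self.sram[address]


lut = LUT(2)
lut.configure(lambda a, b: a ^ b)          # configure as XOR
xor_table = list(lut.sram)
lut.configure(lambda a, b: not (a and b))  # reconfigure the same LUT as NAND
nand_table = list(lut.sram)
print(xor_table, nand_table)
```

The same object implements XOR and then NAND; only the stored bits change, which mirrors how reconfiguration changes the function without changing the hardware.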
Look-up Tables (2:1 MUX Example)
 Configuration memory holds the outputs of the truth table entries
 Internal signals connect to the control signals of the MUXs to select the value of the
truth table for any given input signals
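As a rough illustration of this slide, the sketch below builds the LUT read path as a tree of 2:1 MUXs and loads it with the truth table of a 2:1 MUX. The input ordering (in0, in1, sel) and all names are assumptions made for the example.

```python
def mux2(sel, d0, d1):
    """A 2:1 multiplexer: returns d1 when sel is 1, otherwise d0."""
    return d1 if sel else d0


def lut_read(config_bits, inputs):
    """Select one configuration bit with a tree of 2:1 MUXs.

    inputs[0] controls the first MUX level (least significant address bit).
    """
    level = list(config_bits)
    for sel in inputs:
        level = [mux2(sel, level[i], level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0]


# Truth table of a 2:1 MUX, address = in0 + 2*in1 + 4*sel
MUX_CONFIG = [mux2((addr >> 2) & 1, addr & 1, (addr >> 1) & 1)
              for addr in range(8)]

print(MUX_CONFIG)
print(lut_read(MUX_CONFIG, [1, 0, 0]))  # sel=0 selects in0
```

Each input signal drives one level of the MUX tree, so three inputs reduce eight configuration bits to a single output.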
Interconnect
 Horizontal and vertical mesh of wire segments interconnected by programmable switches called programmable
interconnect points (PIPs). These PIPs are implemented using a transmission gate controlled by a memory bit from
the configuration memory.
 Global routing connects programmable logic blocks (PLBs) to I/O buffers, non-adjacent PLBs, and other embedded components.
Local routing connects PLBs to other adjacent PLBs and PLBs to global routing (done through a switch matrix)
 Several types of PIPs are used
   Cross-point = connects vertical or horizontal wire segments, allowing turns
   Breakpoint = connects or isolates 2 wire segments
   Decoded MUX = group of 2^n cross-points connected to a single output, configured by n configuration bits
   Non-decoded MUX = n wire segments, each with a configuration bit (n segments)
   Compound cross-point = 6 breakpoint PIPs (can isolate two isolated signal nets)
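The difference between the decoded and non-decoded MUX PIPs can be sketched as follows. This is a behavioral model only; the function names are hypothetical, and treating more than one enabled segment as an error is an assumption of the sketch, not something stated on the slide.

```python
def decoded_mux(wires, config_bits):
    """n configuration bits select one of 2**n wire segments."""
    assert len(wires) == 2 ** len(config_bits)
    index = sum(bit << i for i, bit in enumerate(config_bits))
    return wires[index]


def non_decoded_mux(wires, config_bits):
    """One configuration bit per wire segment; at most one may be enabled."""
    assert len(wires) == len(config_bits)
    selected = [w for w, bit in zip(wires, config_bits) if bit]
    if len(selected) > 1:
        raise ValueError("more than one segment drives the output")
    return selected[0] if selected else None  # None models an undriven output


segments = [0, 1, 0, 0]
print(decoded_mux(segments, [1, 0]))            # 2 bits pick segment 1
print(non_decoded_mux(segments, [0, 1, 0, 0]))  # 4 bits, one per segment
```

The decoded form needs fewer configuration bits (n for 2^n inputs), while the non-decoded form spends one bit per segment.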
I/O Blocks

 Bi-directional buffers
   Programmable for inputs or outputs
   Tri-state controls bi-directional operation
   Pull-up/down resistors
 FFs/latches are used to improve timing
   Set-up and hold times
   Clock-to-out delay
 Routing resources
   Connections to core of array
 Programmable I/O voltage and current levels
FPGA Design Flow
Design Entry
 There are different techniques for design entry:
schematic based, hardware description language (HDL)
based, or a combination of both. The choice
of method depends on the design and the designer.
 If the designer wants to deal more with the
hardware, then schematic entry is the better
choice.
 If the design is complex, or the designer thinks of the
design in an algorithmic way, then
HDL (VHDL/Verilog) is the better choice.
Synthesis

 This process translates
HDL code into a device
netlist format.
 The synthesis process checks
code syntax and analyzes the
design hierarchy, which ensures that
the design is optimized for the design
architecture the designer
has selected.
 The resulting netlist is saved
to an NGC (Native Generic
Circuit) file.
Implementation
 This process consists of a sequence of three steps:
 Translate
   The Translate process combines all the input netlists and constraints into a logic design file. This information is
saved as an NGD (Native Generic Database) file. This can be done using the NGDBuild program.
   Here, defining constraints means assigning the ports in the design to the physical elements (e.g.
pins, switches, buttons) of the targeted device and specifying the timing requirements of the design. This
information is stored in a UCF (User Constraints File).
 Map
   The Map process divides the whole circuit of logical elements into sub-blocks such that they fit into
the FPGA logic blocks.
   That is, the Map process fits the logic defined by the NGD file into the targeted FPGA elements
(Configurable Logic Blocks (CLBs), Input/Output Blocks (IOBs)) and generates an NCD (Native Circuit
Description) file, which physically represents the design mapped to the components of the FPGA.
 Place and Route
   The place and route (PAR) process places the sub-blocks from the Map process into logic blocks according to the
constraints and connects the logic blocks.
   For example, if a sub-block is placed in a logic block that is very near an I/O pin, it may save time but
may affect some other constraint. The place and route process therefore takes the trade-off between all the
constraints into account.
   The PAR tool takes the mapped NCD file as input and produces a completely routed NCD
file as output.
Device Programming
 Now the design must be loaded onto
the FPGA. First, the design must be
converted to a format that the
FPGA can accept. The BITGEN
program performs this conversion.
 The routed NCD file is given
to the BITGEN program to generate
a bitstream (a .BIT file), which can
be used to configure the target
FPGA device.
 The bitstream can be loaded using a JTAG
cable. Selection of cable depends
on the design.
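The file hand-offs from synthesis through device programming can be summarized as a small Python table. This is only a bookkeeping sketch of the flow described on these slides; the `Stage` type and `check_flow` helper are hypothetical and do not invoke any real tool.

```python
from typing import NamedTuple


class Stage(NamedTuple):
    name: str
    inputs: tuple
    output: str


# File types as described on the slides
FLOW = [
    Stage("Synthesis", ("HDL",), "NGC"),
    Stage("Translate", ("NGC", "UCF"), "NGD"),
    Stage("Map", ("NGD",), "NCD"),
    Stage("Place and Route", ("NCD",), "routed NCD"),
    Stage("Bitstream generation", ("routed NCD",), "BIT"),
]


def check_flow(flow, sources):
    """Check that each stage's inputs come from the designer or an earlier stage."""
    available = set(sources)
    for stage in flow:
        missing = [item for item in stage.inputs if item not in available]
        if missing:
            raise ValueError(f"{stage.name} is missing {missing}")
        available.add(stage.output)
    return flow[-1].output


final_artifact = check_flow(FLOW, {"HDL", "UCF"})
print(final_artifact)
```

The designer supplies only the HDL source and the constraints file; every other file is produced by an earlier step in the flow.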
Behavioral Simulation
 Behavioral Simulation (RTL Simulation)
 This is the first of the simulation steps encountered throughout the
design flow.
 This simulation is performed before synthesis process to verify RTL
(behavioral) code and to confirm that the design is functioning as intended.
 Behavioral simulation can be performed on either VHDL or Verilog designs. In
this process, signals and variables are observed, procedures and functions are
traced and breakpoints are set.
 Since the design is not yet synthesized to gate level, timing and resource
usage properties are still unknown.
Functional simulation (Post Translate Simulation)
 Functional simulation gives information about the logic operation of the
circuit.
 The designer can verify the functionality of the design using this process after the
Translate process.
 If the functionality is not as expected, then the designer has to make changes
in the code and follow the design flow steps again.
Static Timing Analysis
 This can be done after the MAP or PAR processes. The post-MAP timing report lists
signal path delays of the design derived from the design logic.
 The post-Place-and-Route timing report incorporates timing delay information to
provide a comprehensive timing summary of the design.
 Timing simulation: simulates the real-time operation of the circuit, with
timing models of blocks, for the specified test vectors.
   Time consuming for exhaustive simulation
 STA: analyzes the various path delays from block and wire delays.
   A path that is never used in circuit operation may be reported (false paths)
   Registers which are not enabled every clock cycle may be reported (multi-cycle paths)
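The core idea of STA, summing block and wire delays along every path and reporting the worst one, can be sketched in Python. This is a simplified model, assuming an acyclic network of combinational blocks; the block names and delay values are made up for the example, and real tools use far more detailed timing models.

```python
from graphlib import TopologicalSorter


def longest_path_delay(block_delay, wires):
    """Return (delay, path) of the slowest path.

    block_delay: {block: delay}; wires: {(src, dst): wire delay}.
    """
    predecessors = {block: set() for block in block_delay}
    for src, dst in wires:
        predecessors[dst].add(src)

    arrival, best_pred = {}, {}
    # Visit blocks so that every predecessor is processed first
    for block in TopologicalSorter(predecessors).static_order():
        arrival[block] = block_delay[block]
        best_pred[block] = None
        for src in predecessors[block]:
            candidate = arrival[src] + wires[(src, block)] + block_delay[block]
            if candidate > arrival[block]:
                arrival[block], best_pred[block] = candidate, src

    node = max(arrival, key=arrival.get)
    delay, path = arrival[node], []
    while node is not None:
        path.append(node)
        node = best_pred[node]
    return delay, path[::-1]


# Hypothetical delays in arbitrary integer time units
blocks = {"A": 10, "B": 20, "C": 15, "D": 5}
nets = {("A", "B"): 3, ("A", "C"): 2, ("B", "D"): 4, ("C", "D"): 1}
print(longest_path_delay(blocks, nets))
```

Like real STA, this sketch considers every structural path, so it would also report a false path if one happened to be the longest.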
STA: Sequential Circuit
 The register-to-register path decides
the clock frequency. However, if either of the other two
delays (clock to pad, setup to clock) exceeds it, the
maximum of the three must be taken as the minimum
clock period.
 In real life, this is often not a great
concern, because many times we are
designing IP blocks that go
inside the chip and interface to other
blocks close by. Even when inputs
and outputs are brought to external
pins, proper placement should take
care of these delays.
STA: Sequential Circuit

 Clock to Setup: register-to-register path with longest delay
   Clock to Setup on destination clock <clk_signal>
 Clock to Pad: FF output delay – from FF output to chip output pin
   Clock <clk_signal> to Pad
 Setup to Clock: setup/hold time of FF with respect to input pin/pad
   Setup/Hold to clock <clk_signal>
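The rule for picking the minimum clock period from these three report entries can be written as a short calculation. The function names and the delay values below are hypothetical and serve only to illustrate the rule.

```python
def minimum_clock_period(clock_to_setup, clock_to_pad, setup_to_clock):
    """The slowest of the three reported delays sets the minimum clock period."""
    return max(clock_to_setup, clock_to_pad, setup_to_clock)


def maximum_frequency_mhz(period_ns):
    """Convert a clock period in nanoseconds to a frequency in MHz."""
    return 1000.0 / period_ns


# Case 1: the register-to-register path dominates
period_1 = minimum_clock_period(8.0, 5.0, 3.0)
# Case 2: the clock-to-pad delay exceeds the register-to-register delay
period_2 = minimum_clock_period(4.0, 5.0, 3.0)
print(period_1, maximum_frequency_mhz(period_1))
print(period_2, maximum_frequency_mhz(period_2))
```

In the second case the register-to-register path alone would suggest a 4 ns period, but the clock-to-pad delay limits the design to 5 ns.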
False Path
Multi-cycle path
Critical Path