You are on page 1of 74

Architecture Details

XILINX
5/7/2014 For Academic Use Only 2
Virtex-II

CPLDs
Low Power
FPGAs
SRAM-based
Feature Rich
High Performance
Spartan-IIE

Density (System Gates)
F
e
a
t
u
r
e
s

FPGAs
SRAM-based
Feature Rich
Low Cost
10K 600K 10M
Xilinx
Your Programmable Logic Solution
Configurable Elements of SPARTAN IIE
Family FPGA

IOBs provide the interface between the package pins and
the internal logic

CLBs (Configurable Logic Blocks) provide the functional
elements for constructing most logic

Dedicated block RAM memories of 4096 bits each

Clock DLLs (Delay Locked Loop) for clock-distribution
delay compensation and clock domain control

Versatile multi-level interconnect structure
Configurable Elements of SPARTAN IIE
Family FPGA

. . .
. . .
.

.

.

.

.

.

I
O
B
I
O
B
I
O
B
I
O
B
CLB CLB
R
A
M
R
A
M
R
A
M
R
A
M
IOB IOB DLL DLL
CLB CLB
DLL IOB IOB DLL
System Interfaces
Spartan-IIE Features
Features:
Support a wide variety of I/O
signaling standards
Voltages: 3.3V,2.5V, 1.8V,
1.5V
19 I/O standards ,8 flexible
I/O banks
The 3 IOB registers
3 IOB registers: in, out, 3-
state

. . .
. . .
.

.

.

.

.

.

I
O
B
I
O
B
I
O
B
I
O
B
CLB CLB
R
A
M
R
A
M
R
A
M
R
A
M
IOB IOB DLL DLL
CLB CLB
DLL IOB IOB DLL
Input/Output Block
SelectI/O
TM
Standards
V
CCO
defines output
voltage
Standard V
REF
V
CCO
Chip to Chip Interface
LVTTL na 3.3
LVCMOS2 na 2.5
LVCMOS18 na 1.8
LVDS na 2.5
LVPECL na 3.3
Backplane Interface
PCI 33MHz 3.3V na 3.3
PCI 66MHz 3.3V na 3.3
GTL 0.80 na
GTL+ 1.00 na
AGP-2X 1.32 3.3
Bus LVDS na 2.5
Memory Interface
HSTL-I 0.75 1.5
HSTL-III & IV 0.90 1.5
SSTL3-I & II 1.50 3.3
SSTL2-I & II 1.25 2.5
CTT 1.50 3.3
User I/O Pin
V
CCO
V
REF
Internal
Reference
Output
Input
V
REF
defines input threshold reference voltage
Available as user I/O when using internal
reference

The 3 IOB registers function as edge triggered
D flip flops or as level sensitive latches

Each IOB has a clock signal shared by the 3
registers and independent Clock enable signals,
Set/Reset signal

All pads are protected against damage from
electrostatic discharge (ESD) and from over-
voltage transients.

Pull-up and Pull-down drivers can be
individually controlled





Input/ Output Block

A buffer in the Spartan-IIE IOB input path routes
the input signal directly to internal logic or through
an optional input flip-flop.

There are optional pull-up and pull-down
resistors at each input for use after configuration.

An optional delay element at the D input of this
FF eliminates pad to pad hold time


Input Path
Output Path
The output path includes a 3-state output buffer that
drives the output signal onto the pad.

The output signal can be routed to the buffer directly
from the internal logic or through an optional IOB output
flip-flop.

The 3 state control of the output can also be routed
directly from the internal logic or through a flip flop that
provides a synchronous enable and disable.


D
EC
Q
SR
D
EC
Q
SR
D
EC
Q
SR
Three-State
Control
Output Path
Input Path
Three-State
Output
Clock
Set/Reset
Direct Input
Registered
Input
FF Enable
FF Enable
FF Enable
Basic I/O Block Structure
Standard V
REF
V
CCO
Chip to Chip Interface
LVTTL na 3.3
LVCMOS2 na 2.5
LVCMOS18 na 1.8
LVDS na 2.5
LVPECL na 3.3
Backplane Interface
PCI 33MHz 3.3V na 3.3
PCI 66MHz 3.3V na 3.3
GTL 0.80 na
GTL+ 1.00 na
AGP-2X 1.32 3.3
Bus LVDS na 2.5
Memory Interface
HSTL-I 0.75 1.5
HSTL-III & IV 0.90 1.5
SSTL3-I & II 1.50 3.3
SSTL2-I & II 1.25 2.5
CTT 1.50 3.3
User I/O Pin
V
CCO
V
REF
Internal
Reference
Output
Input
VREF defines input threshold
reference voltage
Available as user I/O when using
internal reference
Select I/O Standards
CLB
Spartan-IIE Features

The basic building block of the Spartan-IIE
CLB is the logic cell (LC).

An LC includes a 4-input function generator,
carry logic, and storage element.

Each Spartan-IIE CLB contains four LCs,
organized in two similar slices

Two 3-state buffers (BUFT) associated with
each CLB, accessible by all CLB outputs

CLB contains logic that combines function
generators to provide functions of five or six
inputs.



Configurable Logic Block
F5IN
CIN
CLK
CE
COUT
D Q
CK
S
R
EC
D Q
CK
R
EC
O
G4
G3
G2
G1
Look-Up
Table
Carry
&
Control
Logic
O
YB
Y
F4
F3
F2
F1
XB
X
Look-Up
Table
BY
SR
S
Carry
&
Control
Logic
SLICE
COUT
D Q
CK
S
R
EC
D Q
CK
R
EC
O
G4
G3
G2
G1
Look-Up
Table
Carry
&
Control
Logic
O
YB
Y
F4
F3
F2
F1
XB
X
Look-Up
Table
F5IN
BY
SR
S
Carry
&
Control
Logic
CIN
CLK
CE
SLICE
CLB Structure
CLB Slice Structure
Each slice contains two sets of the
following:
Four-input LUT
Any 4-input logic function
Or 16-bit x 1 sync RAM
Or 16-bit shift register

CLB Slice Structure (Contd..)
Carry & Control
Includes an XOR gate to implement 1 bit full
adder within an LC
A dedicated AND gate improves the efficiency
of multiplier implementation
Fast arithmetic logic
Fast DSP functions FIR filters


CLB Slice Structure (Contd..)
Storage element
Latch or flip-flop
Clock and clock enable signals
Synchronous set and reset signals (SR and
BY)
The D inputs can be driven either by
function generators or directly from slice
inputs
Synchronous or asynchronous Control

Inputs(ABCD) Output(Z)
0000 0
0001 0
0010 0
0011 1
..
1110 1
1111 0
Truth Table
LUT
=
4-input logic function
C
D
Z
A
B
Four-Input LUT
Implements combinatorial logic
Any 4-input logic function
Cascaded for wide-input
functions

CLB
MUXF6
Slice
LUT
LUT
MUXF5
Slice
LUT
LUT
MUXF5
Dedicated Expansion Multiplexers
MUXF5 combines 2 LUTs to create

4x1 multiplexer

Or any 5-input function (LUT5)

Or selected functions up to 9
inputs
Dedicated Expansion Multiplexers (Contd..)
MUXF6 combines 2 slices to form
8x1 multiplexer
Or any 6-input function (LUT6)
Or selected functions up to 19 inputs
Dedicated muxes are faster and more
space efficient

RAM16X1S
O
D
WE
WCLK
A0
A1
A2
A3
RAM32X1S
O
D
WE
WCLK
A0
A1
A2
A3
A4
RAM16X2S
O1
D0
WE
WCLK
A0
A1
A2
A3
D1
O0
=
=
LUT
LUT
or
LUT
RAM16X1D
SPO
D
WE
WCLK
A0
A1
A2
A3
DPRA0 DPO
DPRA1
DPRA2
DPRA3
or
Distributed RAM
CLB LUT configurable as
Distributed RAM
A LUT equals 16x1 RAM
Implements Single and Dual-
Ports
Cascade LUTs to increase
RAM size
Synchronous write
Synchronous/Asynchronous
read
Accompanying flip-flops used
for synchronous read
D Q
CE
D Q
CE
D Q
CE
D Q
CE
LUT
IN
CE
CLK
DEPTH[3:0]
OUT
LUT
=
Shift Register
Each LUT can be
configured as shift
register
Serial in, serial out
Dynamically addressable
delay up to 16 cycles
For programmable
pipeline
Cascade for greater cycle
delays
Use CLB flip-flops to add
depth
CLB Arithmetic Logic
Dedicated carry logic
Provides high performance for
counters & arithmetic functions

Discrete XOR component for
single level sum completion
Two separate carry chains in CLB
allow for 3 operand functions
Can also be used to cascade LUTs for wide-
input logic functions
Single-level Sum
LUT
0 1
LUT
0 1
LUT
0 1
LUT
0 1
COUT
Look-Up
Table
SLICE0
CIN
COUT
O
Look-Up
Table
Carry
&
Control
Logic
Look-Up
Table
SLICE1
CIN
CLB
Look-Up
Table
B1
B0
A1
A0
C1
C0
SUM1
SUM0
PARTIAL0
PARTIAL1
Carry
&
Control
Logic
Carry
&
Control
Logic
Carry
&
Control
Logic
3 Operand Adder Function
A, B, C are two-bits wide
SUM = A + B + C or PARTIAL + C, where PARTIAL = A + B
Implementation
First 2-operand sum A+B is performed in Slice 0
Second 2-operand sum PARTIAL + C is performed in Slice 1
Fast local feedback connection within the CLB
Very small delay for on PARTIAL
12- Input AND Function
Utilization
3 LUTs and 3 MUXCYs
As opposed to 4 LUTs
Performance
1 logic level
As opposed to 2 logic
levels
0 1
INIT=8000
0 1
INIT=8000
0 1
INIT=8000
Output
Vcc
LUT1
LUT2
LUT3
D
C
B
A
H
G
F
E
L
K
J
I
MUXCY
MUXCY
MUXCY
4-Input AND Truth Table
Inputs(ABCD) Output(Z) Output(HEX)
0000 0
0001 0
0010 0
0011 0
.. ..
1011 0 ..
1100 0
1101 0
1110 0
1111 1
0
8
12- Input OR Function
0 1
INIT=0001
0 1
INIT=0001
0 1
INIT=0001
Vcc
Vcc
Vcc
Output
LUT1
LUT2
LUT3
D
C
B
A
H
G
F
E
L
K
J
I
MUXCY
MUXCY
MUXCY
4-Input NOR Truth Table
Inputs(ABCD) Output(Z) Output(HEX)
0000 1
0001 0
0010 0
0011 0
.. ..
1011 0 ..
1100 0
1101 0
1110 0
1111 0
1
0
Utilization
3 LUTs and 3 MUXCYs
As opposed to 4 LUTs
Performance
1 logic level
As opposed to 2 logic
levels
CO
DI CI
S
LUT
CY_MUX
CY_XOR
MULT_AND
A
B
A x B
Dedicated AND gate
Dedicated CLB Multiplier Logic
Dedicated AND gate
Highly efficient Shift & Add implementation
For a 16x16 Multiplier
30% reduction in area and one less logic level
Lower Operating Power
1.8V core supply
Reduces power consumption
Advanced signaling standards
Smaller voltage transitions
Reduces switching power
DLLs reduce clock speed requirements
Faster clock propagation
Internal multiplication of clock
Reduces power on clock nets




Logic Summary
Flexible Configurable Logic Block (CLB)
implementations
Logic
Distributed RAM
Shift register
CLB configurable for simple to complex
logic
Any 6 input function into one logic level
Excellent for fast arithmetic operations
Specialized carry logic for arithmetic
operations
Fast DSP functions FIR filters
Device Part No
Spartan 2E family
Choice of packages
Logic & Routing
Spartan-IIE Features
Programmable Routing
The longest delay path limits the speed of any
design
Spartan IIE architecture and its place and route
software jointly used to minimize the long path
delays.
The software automatically uses the best
available routing based on user timing
requirements.
High-Performance Routing
Local routing
Direct connections
General Routing Matrix
(GRM)
Single line, Long line, Hex
line
Dedicated routing
Global routing, Clock distribution,
internal 3-state busses
I/O routing
Interface between CLBs and IOBs
SINGLE
HEX
LONG
SINGLE
HEX
LONG
S
I
N
G
L
E
H
E
X
L
O
N
G
S
I
N
G
L
E
H
E
X
L
O
N
G
TRISTATE BUSSES
SWITCH
MATRIX
SLICE SLICE
Local
Feedback
C
A
R
R
Y
C
A
R
R
Y
CLB
C
A
R
R
Y
C
A
R
R
Y
INTERNAL BUSSES
Single-length lines
Buffered Hex lines
Direct connections
Long lines and Global lines
Internal 3-state busses
General Routing
Matrix (GRM)
Local Routing
Interconnect among LUTs, FFs, GRM
CLB feedback path for connections to LUTs in same
CLB with minimal routing dealy
Direct path between horizontally adjacent CLBs
Local Routing
General Purpose Routing

SINGLE
HEX
LONG
SINGLE
HEX
LONG
S
I
N
G
L
E
H
E
X
L
O
N
G
S
I
N
G
L
E
H
E
X
L
O
N
G
TRISTATE BUSSES
SWITCH
MATRIX
SLICE SLICE
Local
Feedback
C
A
R
R
Y
C
A
R
R
Y
CLB
C
A
R
R
Y
C
A
R
R
Y
DIRECT
CONNECTION
INTERNAL BUSSES
Single-length lines
Buffered Hex lines
Direct connections
Long lines and Global lines
Internal 3-state Bus
General Purpose Routing (Contd..)
General Purpose Routing resources are:

General Routing Matrix is the switch matrix through
which horizontal and vertical routing resources are
connected.
24 single-length lines
Route GRM signals to adjacent GRMs in 4 directions
96 buffered hex lines
Route GRM signals to another GRMs six blocks away
in each of the four directions
12 buffered bidirectional Long lines
Routing across top and bottom, left and right





CLB Array
Routing Summary
Vector-based routing
Predictable routing delays independent
of device size and routing direction
Core-friendly architecture
Quick Place and Route times
Design to system at 100,000 gates per minute
Easier re-routing
Internal 3-state bussing
Eliminates bus routing contention
Improves density and performance
Embedded
Memory
Spartan-IIE Features
Embedded memory

Block RAM memory blocks are organized in
columns, usually in two columns, one along
each vertical edge.

These complement the distributed RAM Look-
Up Tables that provide shallow memory
structures implemented in CLBs.

DSP Coefficients
Small FIFOs
Scratch Pad
16x1
Distributed RAM
Single-port
Dual port
Cascadable
Cache Tag memory
Large FIFOs
Packet buffers
Video line buffers
Block RAMs
4Kbit blocks
True dual-port
SDRAM
SGRAM
PB SRAM
DDR SRAM
ZBT SRAM
QDR SRAM
High-Performance External
Memory Interfaces
DDR I/O
SSTL, HSTL, CTT
Spartan-IIE Memory Hierarchy
D
CL
K
A3
A2
A1
A0
Q
SRL16
D
CL
K
A3
A2
A1
A0
Q SRL16E
C
E
Shift Register LUT
16 registers, 1 LUT
Compact & fast
Pipelining
Buffers
Block RAM
4Kx1
2Kx2
1Kx4
512x8
256x16
P
o
r
t

A

P
o
r
t


B

Collaboration with
memory vendors
IDT, Cypress,
Micron, NEC,
Samsung, Toshiba...
Bytes
Kilobytes
RAM16X1S
O
D
WE
WCLK
A0
A1
A2
A3
RAM32X1S
O
D
WE
WCLK
A0
A1
A2
A3
A4
RAM16X2S
O1
D0
WE
WCLK
A0
A1
A2
A3
D1
O0
=
=
LUT
LUT
or
LUT
RAM16X1D
SPO
D
WE
WCLK
A0
A1
A2
A3
DPRA0 DPO
DPRA1
DPRA2
DPRA3
or
Distributed RAM
CLB LUT configurable as
Distributed RAM
A LUT equals 16x1 RAM
Implements single and
dual ports
Cascade LUTs to
increase RAM size
Synchronous write
Synchronous/Asynchronous
read
Accompanying flip-flops used for
synchronous read
SRL-16 and SRL-16E
D
CLK
A3
A2
A1
A0
Q SRL16
16-bit Shift Register Look-Up-Table
D
CLK
A3
A2
A1
A0
Q SRL16E
CE
16-bit Shift Register Look-Up-Table with
Clock Enable
D Q
CE
D Q
CE
D Q
CE
D Q
CE
LUT
IN
CE
CLK
ADDR[3:0]
OUT
Slice
LUT
LUT
Slice
LUT
LUT
CLB
DPA[3:0]
A[3:0]
WE
D
WCLK
DPO
SPO
RAM16X1D
16 x 1
RAM
16 x 1
RAM
Distributed RAM
Dual-Port Implementation
2 LUTs equal 16x1 dual-port
RAM
A Port
Uses A[3:0] address
Write and read
B Port
Uses DPA[3:0] address
Read only
Excellent for FIFOs, scratch
pads.
Block RAM
Spartan-IIE
True Dual-Port
Block RAM
P
o
r
t

A

P
o
r
t


B

Block RAM
Most efficient memory implementation
Dedicated blocks of memory
Ideal for most memory requirements
8 to 72 memory blocks
4096 bits per blocks
Use multiple blocks for larger memories
Builds both single and true dual-port RAMs
CORE Generator provides custom-sized block
RAMs
Quickly generates optimized RAM implementation
Device No. of Blocks Block RAM Bits
XC2S50E 8 32,768
XC2S100E 10 40,960
XC2S150E 12 49,152
XC2S200E 14 57,344
XC2S300E 16 65,536
XC2S400E 40 163,840
XC2S600E 72 294,912
Block RAM
Configurable synchronous Block RAM
Single-port RAM
True dual-port RAM
Two independent single-port RAMs
Block count increases with FPGA size
Block RAM
Flexible 4096-bit block Variable aspect
ratio
4096 x 1
2048 x 2
1024 x 4
512 x 8
256 x 16
Increase memory depth or width by
cascading blocks
RAMB4_S4_S16
Port A Out
4-Bit Width
Port B In
256-Bit Depth
Port A In
1K-Bit Depth
Port B Out
16-Bit Width
DOA[3:0]
DOB[15:0]
WEA
ENA
RSTA
ADDRA[9:0]
CLKA
DIA[3:0]
WEB
ENB
RSTB
ADDRB[7:0]
CLKB
DIB[15:0]
Dual-Port Bus Flexibility
Each port can be configured with a different data
bus width
Provides easy data width conversion without any
additional logic
Content Addressable Memory (CAM)
Storage array like a RAM
Functionally opposite of a RAM
Quickly find the location of a particular stored value
Output the address and toggle the MATCH line, if data match is
found



Used in telecommunications, networking, Ethernet,
ATM switches
Xilinx provides reference designs and application notes
1024x8
ADD[9:0]
DATA [7:0]
RAM
1024x8
DATA[7:0] ADD [9:0]
CAM
MATCH
CAM in Block RAM
External Memory Interface
Easy access to high-
speed external memory
External Memory Type SelectI /O Stardard
SRAM SSTL
SGRAM HSTL
ZBT SRAM/NoBL LVTTL
QDR SRAM HSTL
SDRAM LVTTL
DDR SRAM SSTL2
EDO TTL
FPM TTL
PB TTL
PC100/133 LVTTL / SSTL
SelectI/O
TM
provides
interface to most
memory types
Embedded Memory Summary
Fast distributed RAM
Data right beside logic
Memory requirements solved by Block
RAM
Single and True Dual-Port RAM
implementations
FIFO for buffering data
Data width conversion
Cache
Register stacks
CAM for high-speed parallel searches
Many more
Direct connection to external high-speed
memory
System Clock
Management
Spartan-IIE Features
System Clock Management
Delay Locked Loops Lower Board Costs
. . .
. . .
.

.

.
.

.

.
I
O
B
I
O
B
I
O
B
I
O
B
CLB CLB
R
A
M
R
A
M
R
A
M
R
A
M
IOB IOB DLL DLL
CLB CLB
DLL IOB IOB DLL
100% Digital DLL Design
Noise insensitive
Scalable to new processes
Excellent Jitter
specifications
+/- 100ps, <<50ps Typical
No cumulative phase error
Used in advanced
memories
Every Spartan-IIE device
has
4 DLLs
External clock outputs
4 DLLS in every device
Delay Locked Loop
Eliminates skew between the clock input and
internal clock input pins throughout the device
DLL monitors the input clock and the distributed
clock, and automatically adjusts a clock delay
element.
Additional delay is introduced such that clock
edges reach internal flip flops one clock period
after they arrive at the input.
Delay Delay Delay Delay
CLKIN
Phase Delay
Control
CLKOUT
CLKFB
Clock
Distribution
Network
Generic DLL Operation
A DLL inserts delay on the clock net until the
clock input rising edge is in phase with the clock
feedback rising edge
Requires a well-designed clock distribution
network: the clock edges arrive simultaneously
everywhere in the part
DLL Capabilities
Easy clock duplication
System clock distribution
Cleans and reconditions incoming clock
Quick and easy frequency adjustment
Single crystal easily generates multiple clocks

Faster state machine utilizing different clock
phases
Excellent for advance memory types

De-skew incoming clock
Generate fast setup and hold time or fast
clock-to-outs
Clock
De-skew
System Clock Management
Summary
All digital DLL Implementation
Input noise rejection
50/50 duty cycle correction
Clock mirror provides system clock
distribution
Multiply input clock by 2x or 4x
Divide clock by 1.5, 2, 2.5, 3, 4, 5, 8, or 16
Provides 0, 90, 180, and 270 clock phase
shift
De-skew clock for fast setup, hold, or
clock-to-out times
Configuration
Spartan-IIE Features
Configuration Basics
Spartan-IIE device
Is SRAM-based and hence volatile
Needs a configuration data source
Needs to be re-configured (re-programmed) upon power-up
ISP
Re-programmable/upgradable in the field
Configuration
Programming the device with design logic
Simple Serial Interface
System Integrated Serial
High Performance Parallel

Configuration
Data Source

Software Tools
Programmable
Silicon
Pervasive
Networking
IRL
Configuration
Configuration data source
PROM
Serial/Parallel PROMs
Hard disk
Microprocessor memory
Configuration interface
Simple serial
High-speed parallel
JTAG or boundary scan
IRL
Microprocessor
CPLD
Master Serial Mode
Spartan-IIE device acts like a master
Generates configuration clock (CCLK) using internal
oscillator
PROM stores the configuration data
Configuration rate selectable from 4-60 MHz
-30% to +45% variance due to process dependence
Slave Serial Mode
Spartan-IIE device acts like a slave
An external clock source drives the CCLK pin
Configuration data is stored in PROM, flash,
micro- controller or microprocessor memory
Maximum configuration rate of 66 MHz



Slave Parallel Mode
Single or multiple Spartan-IIE devices connected in
parallel
Slave Parallel Mode (contd)
Spartan-IIE device acts like a slave
An external clock source drives the CCLK pin
Microprocessor, Microcontroller or CPLD controls
configuration
Configuration data is stored in parallel PROM,
flash, Microcontroller or Microprocessor memory
Fastest configuration mode
8 bits per CCLK cycle
50MHz configuration rate (400 Mbit/sec)
Supports Readback
Bi-directional read/write port for configuration and
readback
JTAG Basics
Also known as
IEEE/ANSI standard 1149.1
Boundary scan
Set of design rules that facilitate
Testing
Programming
Debug
Can be done at the chip, board, and systems
level
Can also have user-defined instructions
Example: vendor-specific instructions: configure and
verify
JTAG Basics (contd)
Rapid and automatic detection and isolation of
defects due to common failures
Detect opens and shorts
Ensure all components on PCB are
Mounted properly
In right place
Have proper interconnects among them
Allows complete control and access to the
boundary pins of a device without the need for
Bed-of-nails
Other test equipment
JTAG Compliant Device
Includes a boundary-scan cell connected to
each input, output or bi-directional pin
Transparent and inactive under normal conditions
Test mode
Input signals captured and output signals set to affect
other devices on the board

JTAG Mode
Supports readback through boundary scan port
Can mix any Xilinx device (FPGA, CPLD,
PROM) and non-Xilinx devices in the chain
JTAG Mode (Contd)
Dedicated TDI, TCK, TDO and TMS pins
must operate at LVTTL
V
CCO
for bank 2 must be at 3.3V
Maximum configuration rate of 33 MHz

You might also like