4/4/2011

EE 811 Advanced Digital System Design
Dr. Arshad Aziz

Basic FPGA Architecture

Technology Timeline
Figure: Technology timeline from 1945 to 2000 showing when transistors, ICs (general), SRAMs & DRAMs, microprocessors, SPLDs, CPLDs, ASICs and FPGAs appeared.

The Design Warrior’s Guide to FPGAs Devices, Tools, and Flows. ISBN 0750676043 Copyright © 2004 Mentor Graphics Corp. (www.mentor.com)


Major FPGA vendors
SRAM-based FPGAs
• Xilinx Inc. – www.xilinx.com
• Altera Corp. – www.altera.com
• Atmel Corp. – www.atmel.com
• Lattice Semiconductor Corp. – www.latticesemi.com
Antifuse and flash-based FPGAs
• Actel Corp. – www.actel.com
• QuickLogic Corp. – www.quicklogic.com

| Feature | SRAM | Antifuse | E2PROM / FLASH |
|---|---|---|---|
| Technology node | State-of-the-art | One or more generations behind | One or more generations behind |
| Reprogrammable | Yes (in system) | No | Yes (in-system or offline) |
| Reprogramming speed (inc. erasing) | Fast | ---- | 3x slower than SRAM |
| Volatile (must be programmed on power-up) | Yes | No | No (but can be if required) |
| Requires external configuration file | Yes | No | No |
| Good for prototyping | Yes (very good) | No | Yes (reasonable) |
| Instant-on | No | Yes | Yes |
| IP security | Acceptable (especially when using bitstream encryption) | Very good | Very good |
| Size of configuration cell | Large (six transistors) | Very small | Medium-small (two transistors) |
| Power consumption | Medium | Low | Medium |
| Rad hard | No | Yes | Not really |


The Programmable Marketplace
Q1 Calendar Year 2005
• PLD segment: Xilinx 51%, Altera 33%, Lattice 7%, Actel 5%, QuickLogic 2%, Other 2%
• FPGA sub-segment: Xilinx 58%, Altera 31%, All others 11%

Source: Company reports. Latest information available; computed on a 4-quarter rolling basis.

FPGA Families
• Low-cost
  – Xilinx: Spartan-3, Spartan-3E, Spartan-3L
  – Altera: Cyclone II
• High-performance
  – Xilinx: Virtex-4 LX / SX / FX, Virtex-5 LX
  – Altera: Stratix II, Stratix II GX


Xilinx
• Primary products: FPGAs and the associated CAD software
  – Programmable logic devices
  – ISE Alliance and Foundation Series design software
• Main headquarters in San Jose, CA
• Fabless* semiconductor and software company, manufacturing through:
  – UMC (Taiwan) {*Xilinx acquired an equity stake in UMC in 1996}
  – Seiko Epson (Japan)
  – TSMC (Taiwan)
Source: [Xilinx Inc.]

Xilinx FPGA Families
• Old families
  – XC3000, XC4000, XC5200
  – Old 0.5 µm, 0.35 µm and 0.25 µm technology. Not recommended for modern designs.
• High-performance families
  – Virtex (220 nm)
  – Virtex-E, Virtex-EM (180 nm)
  – Virtex-II (150 nm), Virtex-II Pro (130 nm)
  – Virtex-4 (90 nm)
  – Virtex-5 (65 nm)
• Low-cost family
  – Spartan/XL – derived from XC4000
  – Spartan-II – derived from Virtex
  – Spartan-IIE – derived from Virtex-E
  – Spartan-3 (90 nm)
  – Spartan-3E (90 nm)
  – Spartan-3A (90 nm)
Source: [Xilinx Inc.]

General Structure of an FPGA
Figure: General structure of an FPGA.

Xilinx FPGA
Figure: Xilinx FPGA layout showing configurable logic blocks, block RAMs and I/O blocks.

Generic FPGA Architecture
Figure: Configurable logic blocks (CLBs), connection blocks, wire segments, switch blocks, routing channels and I/O pads.

Xilinx CLB
Figure: A configurable logic block (CLB) made of four slices, each containing two logic cells.

Xilinx Point of Reference
• A Xilinx CLB has FOUR slices
  – Each slice has TWO logic cells
  – Each logic cell has ONE 4-input LUT plus other logic (carry and control) plus a flip-flop/latch
• For SLICEL slices, the LUTs can be configured as:
  1. LUT
• For SLICEM slices, the LUTs can be configured as:
  1. LUT
  2. 16 x 1 distributed RAM (16 words x 1 bit/word)
  3. 16-bit shift register

CLB Structure of Spartan-3
• Each Spartan-3 CLB, like a Virtex-II CLB, contains four slices
  – Local routing provides feedback between slices in the same CLB, and it provides routing to neighboring CLBs
  – A switch matrix provides access to general routing resources
Figure: CLB with a switch matrix, slices S0 to S3, local routing, BUFTs, and the carry (CIN/COUT) and shift (SHIFT) chains.

Simplified View of a Xilinx Logic Cell
Figure: A 4-input LUT (inputs a, b, c, d; output y) that can also act as a 16-bit shift register or a 16x1 RAM, a multiplexer (extra input e) and a flip-flop (clock, clock enable, set/reset; output q).

Simplified Slice Structure
• Each slice has four outputs
  – Two registered outputs, two non-registered outputs
  – Two BUFTs associated with each CLB, accessible by all 16 CLB outputs
• Carry logic runs vertically, up only
  – Two independent carry chains per CLB
Figure: Slice 0 with two LUTs, carry logic and two storage elements (D, Q, CE, PRE, CLR).

Detailed Slice Structure
• The next few slides discuss the slice features
  – LUTs
  – MUXF5, MUXF6, MUXF7, MUXF8 (only the F5 and F6 MUX are shown in this diagram)
  – Carry logic
  – MULT_ANDs
  – Sequential elements

Look-Up Tables
• Combinatorial logic is stored in Look-Up Tables (LUTs)
  – Also called Function Generators (FGs)
  – Capacity is limited by the number of inputs, not by the complexity
• Delay through the LUT is constant
Figure: Combinatorial logic with inputs A, B, C, D and output Z, stored as the truth table:
A B C D | Z
0 0 0 0 | 0
0 0 0 1 | 0
0 0 1 0 | 0
0 0 1 1 | 1
0 1 0 0 | 1
0 1 0 1 | 1
. . .
1 1 0 0 | 0
1 1 0 1 | 0
1 1 1 0 | 0
1 1 1 1 | 1

SRAM Cell (Pass Transistor)
• An SRAM cell can drive the gate (G) terminal of an NMOS transistor. If the SRAM cell (M) holds 1, the signal passes from S to D.
• An SRAM cell can be attached to the select line of a MUX to control it.

Look Up Table (LUT)
• The LUT is used to realize any Boolean function.
• Assume the function to be realized is y = (a&b) | !c. This could be achieved by loading the LUT with the appropriate output values.

LUT (Look-Up Table) Functionality
Figure: A 4-input LUT (inputs x1, x2, x3, x4; output y) with two example truth tables. In both, the rows run from x1 x2 x3 x4 = 0000 to 1111 in binary counting order:
  – Example 1: y = 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0
  – Example 2: y = 0 1 0 0 0 1 0 1 0 1 0 0 1 1 0 0
• Look-up tables are the primary elements for logic implementation
• Each LUT can implement any function of 4 inputs
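To make "loading the LUT with the appropriate output values" concrete, here is a minimal Python sketch of an n-input LUT as a list of 2^n configuration bits. The class `LUT` and its methods are hypothetical illustrations, not a vendor API; the first input is assumed to be the most significant address bit, matching the row order of the truth tables above.

```python
from itertools import product
from typing import Callable, List


class LUT:
    """Software model of an n-input look-up table.

    The LUT holds 2**n configuration bits, one per truth-table row.
    The input signals form an address that selects one stored bit.
    """

    def __init__(self, n_inputs: int) -> None:
        self.n = n_inputs
        self.bits: List[int] = [0] * (1 << n_inputs)

    def program(self, func: Callable[..., int]) -> None:
        # "Configure" the LUT: evaluate func once for every input combination.
        # product() yields rows in binary counting order, first argument = MSB.
        for row, inputs in enumerate(product((0, 1), repeat=self.n)):
            self.bits[row] = func(*inputs) & 1

    def evaluate(self, *inputs: int) -> int:
        # Build the address from the inputs (first input = MSB) and read the bit.
        if len(inputs) != self.n:
            raise ValueError(f"expected {self.n} inputs, got {len(inputs)}")
        addr = 0
        for bit in inputs:
            addr = (addr << 1) | (bit & 1)
        return self.bits[addr]


# y = (a & b) | !c loaded into a 3-input LUT
lut3 = LUT(3)
lut3.program(lambda a, b, c: (a & b) | (1 - c))
print(lut3.bits)  # [1, 0, 1, 0, 1, 0, 1, 1]

# Example 1 above is y = NOT(x1 AND x2)
lut4 = LUT(4)
lut4.program(lambda x1, x2, x3, x4: 1 - (x1 & x2))
print(lut4.bits)  # twelve 1s followed by four 0s
```

The LUT never evaluates the expression at run time: `evaluate` is only an indexed read, which is why the delay through a LUT is the same whatever function it holds.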

5-Input Functions Implemented Using Two LUTs
• One CLB slice can implement any function of 5 inputs
• The logic function is partitioned between two LUTs
• The F5 multiplexer selects between the two LUT outputs
Figure: Two LUTs (each also usable as ROM or RAM) feeding the F5 multiplexer, whose select input is BX.
Example 5-input function (rows X5 X4 X3 X2 X1 in binary counting order):
  – X5 = 0 (rows 00000 to 01111), held in one LUT: Y = 0 1 0 0 1 1 0 0 1 0 0 1 1 1 1 1
  – X5 = 1 (rows 10000 to 11111), held in the other LUT: Y = 0 0 0 0 0 0 0 1 0 1 0 1 0 1 0 0
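The split of the 5-input table into two 16-row halves is a Shannon expansion on X5: f = X5 ? f(1, X4..X1) : f(0, X4..X1). The Python sketch below checks that two 4-input LUTs plus a 2:1 mux reproduce every row of the table. It assumes X5 is the signal driving the F5 select (BX in the figure); the helper names are hypothetical.

```python
# Y column of the 5-input truth table above, rows X5 X4 X3 X2 X1 = 00000 .. 11111
Y_TABLE = [0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 0, 1, 1, 1, 1, 1,
           0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0]

# Shannon expansion on X5: each 4-input LUT stores one 16-row half
LUT_X5_0 = Y_TABLE[:16]   # contents used when X5 = 0
LUT_X5_1 = Y_TABLE[16:]   # contents used when X5 = 1


def lut4(contents, x4, x3, x2, x1):
    """Read a 16-entry LUT addressed by X4..X1 (X4 is the MSB)."""
    return contents[(x4 << 3) | (x3 << 2) | (x2 << 1) | x1]


def muxf5(sel, in0, in1):
    """Dedicated 2:1 multiplexer: passes in1 when sel = 1."""
    return in1 if sel else in0


def five_input_function(x5, x4, x3, x2, x1):
    # Both LUTs see X4..X1; X5 drives the F5 multiplexer select.
    low = lut4(LUT_X5_0, x4, x3, x2, x1)
    high = lut4(LUT_X5_1, x4, x3, x2, x1)
    return muxf5(x5, low, high)


# Check every row against the original table
for row in range(32):
    x5, x4, x3, x2, x1 = ((row >> k) & 1 for k in (4, 3, 2, 1, 0))
    assert five_input_function(x5, x4, x3, x2, x1) == Y_TABLE[row]
print("all 32 rows match")
```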

Dedicated Expansion Multiplexers
• MUXF5 combines 2 LUTs to create
  – Any 5-input function (LUT5)
  – Or selected functions up to 9 inputs
  – Or a 4x1 multiplexer
• MUXF6 combines 2 slices to form
  – Any 6-input function (LUT6)
  – Or selected functions up to 19 inputs
  – Or an 8x1 multiplexer
• Dedicated muxes are faster and more space-efficient
Figure: Two slices, each with two LUTs and a MUXF5, combined by a MUXF6.

Connecting Look-Up Tables
• MUXF5 combines the LUTs in each slice
• MUXF6 combines slices S0 and S1
• MUXF6 combines slices S2 and S3
• MUXF7 combines the two MUXF6 outputs
• MUXF8 combines the two MUXF7 outputs (from the CLB above or below)
Figure: CLB with slices S0 to S3 and the F5, F6, F7 and F8 multiplexers.
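The way the dedicated multiplexers build wider muxes can be sketched at the logic level, assuming each 4-input LUT is programmed as a 2:1 mux (two data inputs and one select, one input unused), a MUXF5 joins two LUTs into a 4:1 mux within a slice, and a MUXF6 joins two slices into an 8:1 mux. These Python functions are hypothetical models of the logic only and say nothing about timing.

```python
def lut_as_mux2(d0: int, d1: int, s0: int) -> int:
    """A 4-input LUT programmed as a 2:1 mux (the fourth input is unused)."""
    return d1 if s0 else d0


def dedicated_mux(sel: int, in0: int, in1: int) -> int:
    """MUXF5 and MUXF6 both behave as plain 2:1 muxes at the logic level."""
    return in1 if sel else in0


def mux4_one_slice(d, s0, s1):
    # Two LUTs + MUXF5 inside one slice -> 4:1 mux
    return dedicated_mux(s1, lut_as_mux2(d[0], d[1], s0), lut_as_mux2(d[2], d[3], s0))


def mux8_two_slices(d, s0, s1, s2):
    # Two slices combined by MUXF6 -> 8:1 mux
    return dedicated_mux(s2, mux4_one_slice(d[0:4], s0, s1), mux4_one_slice(d[4:8], s0, s1))


data = [0, 1, 1, 0, 1, 0, 0, 1]
print([mux8_two_slices(data, sel & 1, (sel >> 1) & 1, (sel >> 2) & 1) for sel in range(8)])
# [0, 1, 1, 0, 1, 0, 0, 1]: the select value s2 s1 s0 picks data[sel]
```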

Programmable Logic Block
• Early devices were based on the concept of a programmable logic block, which comprised a 3-input lookup table (LUT), a register that could act as a flip-flop or a latch, and a multiplexer, along with a few other elements.

3-, 4-, 5-, or 6-input LUTs?
• The key feature of an n-input LUT is that it can implement any possible n-input combinational logic function.
• Adding more inputs allows you to represent more complex functions, but every time you add an input, you double the number of SRAM cells!
• The first FPGAs were based on 3-input LUTs.
• FPGA vendors and researchers studied the relative merits of 3-, 4-, 5- and even 6-input LUTs.
• The consensus at the time of the 2004 reference was that 4-input LUTs offered the optimal balance of pros and cons; later families such as Virtex-5, Virtex-6 and Spartan-6 moved to 6-input LUTs.
• In the past, some devices were created using a mixture of different LUT sizes because this offered the promise of optimal device utilization. However, current logic synthesis tools prefer uniformity and regularity.
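The trade-off described above can be tabulated directly: an n-input LUT needs 2^n SRAM cells, and because each cell can independently hold 0 or 1, it can store any of 2^(2^n) functions. A minimal Python sketch:

```python
def lut_cost(n: int):
    """Return (SRAM configuration cells, number of distinct functions) for an n-input LUT."""
    cells = 2 ** n          # one cell per truth-table row
    functions = 2 ** cells  # every cell can independently hold 0 or 1
    return cells, functions


for n in range(2, 7):
    cells, functions = lut_cost(n)
    print(f"{n}-input LUT: {cells:2d} SRAM cells, {functions} possible functions")
```

Each extra input doubles the cell count, while the number of representable functions grows far faster, which is the tension the LUT-size studies were weighing.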

FPGA Function Generators
• LUT example: implement the function F = ABD + BC D + A B C using:
  – 2-input LUTs
  – 3-input LUTs
  – 4-input LUTs
Figure: Implementations of F with 2-input, 3-input and 4-input LUTs.

Fast Carry Logic
• Increases efficiency and performance of adders, subtractors, accumulators, comparators, and counters
• Carry logic is independent of normal logic and routing resources
• Each CLB contains separate logic and routing for the fast generation of sum & carry signals
Figure: Carry logic and dedicated carry routing running from the LSB to the MSB.

Fast Carry Logic
• Simple, fast, and complete arithmetic logic
  – Dedicated XOR gate for single-level sum completion
  – Uses dedicated routing resources
  – All synthesis tools can infer carry logic
Figure: Two carry chains per CLB. The first runs from CIN through slices S2 and S3, with COUT going to CIN of S2 of the next CLB; the second runs from CIN through slices S0 and S1, with COUT going to S0 of the next CLB.

Accessing Carry Logic
• All major synthesis tools can infer carry logic for arithmetic functions
  – Addition (SUM <= A + B)
  – Subtraction (DIFF <= A - B)
  – Comparators (if A < B then…)
  – Counters (count <= count + 1)
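Below is a bit-level Python model of an adder built from LUTs plus a dedicated carry chain. It assumes the usual Xilinx-style arrangement: the LUT forms the propagate signal p = a XOR b, a dedicated carry multiplexer passes the incoming carry when p = 1 and otherwise passes a (which then equals b), and the dedicated XOR gate forms sum = p XOR carry. It illustrates the structure only, not any device's timing; the function name is hypothetical.

```python
def carry_chain_add(a: int, b: int, width: int, cin: int = 0):
    """Bit-level model of a LUT + dedicated carry-chain adder.

    Per bit i:
      p     = a_i XOR b_i            (computed in the LUT: "propagate")
      sum_i = p XOR carry            (dedicated XOR gate)
      carry = carry if p else a_i    (dedicated carry multiplexer)
    Returns (sum, carry_out).
    """
    carry = cin & 1
    total = 0
    for i in range(width):  # LSB to MSB, like the chain running up the column
        ai = (a >> i) & 1
        bi = (b >> i) & 1
        p = ai ^ bi
        total |= (p ^ carry) << i
        # p = 0 means a_i == b_i: both 1 generates a carry, both 0 kills it
        carry = carry if p else ai
    return total, carry


print(carry_chain_add(200, 100, 8))  # (44, 1): 300 = 256 + 44
print(carry_chain_add(41, 1, 8))     # a counter step, count + 1 -> (42, 0)
```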

Flexible Sequential Elements
• Either flip-flops or latches
• Two in each slice; eight in each CLB
• Inputs come from LUTs or from an independent CLB input
• Separate set and reset controls
  – Can be synchronous or asynchronous
• All controls are shared within a slice
  – Control signals can be inverted locally within a slice
Figure: Storage-element primitives FDRSE (D, CE, R, S, Q), FDCPE (D, CE, PRE, CLR, Q) and LDCPE (latch: D, CE, G, PRE, CLR, Q).

Shift Register LUT
• Each LUT can be configured as a shift register
  – Serial in, serial out
• Dynamically addressable delay up to 16 cycles
• For programmable pipelines
• Cascade for greater cycle delays
• Use CLB flip-flops to add depth
Figure: A LUT configured as a chain of D flip-flops (IN, CE, CLK), with the DEPTH[3:0] address selecting the tap that drives OUT.
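A shift-register LUT behaves like a 16-stage shift register with a 4-bit tap address, giving a delay of address + 1 clock cycles. The Python class below is a hypothetical behavioral model (named after the SRL16 mode mentioned in the Spartan-3 material), not simulation code for the actual primitive.

```python
class SRL16Model:
    """Behavioral sketch of a LUT used as a 16-bit addressable shift register.

    On each clock edge with CE = 1, the serial input D is shifted in.
    The output Q is the tap selected by the 4-bit address, so the delay
    from D to Q is address + 1 clock cycles.
    """

    def __init__(self) -> None:
        self.taps = [0] * 16  # taps[0] holds the most recently shifted-in bit

    def clock(self, d: int, ce: int = 1) -> None:
        if ce:
            self.taps = [d & 1] + self.taps[:-1]

    def q(self, addr: int) -> int:
        # Dynamic addressing: the tap can change from cycle to cycle
        return self.taps[addr & 0xF]


srl = SRL16Model()
pattern = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
outputs = []
for bit in pattern:
    srl.clock(bit)
    outputs.append(srl.q(4))   # address 4 -> 5-cycle delay
print(outputs)  # [0, 0, 0, 0, 1, 0, 1, 1, 0, 0]
```

Outputs are sampled just after each clock edge, so with address 0 the model already acts like one flip-flop (one cycle of delay); address 4 adds four more stages, which is why the printed list is shifted by four positions.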

Shift Register
Figure: Two 64-bit data paths reconverge. Path 1 passes through Operation A (4 cycles) and Operation B (8 cycles), 12 cycles in total; path 2 passes through Operation C (3 cycles). The result is a 9-cycle imbalance.
• Register-rich FPGA
  – Allows for addition of pipeline stages to increase throughput
• Data paths must be balanced to keep the desired functionality

Shift Register LUT Example
Figure: The same paths with Operation D, a 9-cycle NOP built from shift-register LUTs, added after Operation C, so that both paths take 12 cycles. The paths are statically balanced.
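The balancing in this example is simple arithmetic: each path is padded up to the latency of the slowest one. The Python sketch below computes the padding and compares the cost of a 9-cycle delay on a 64-bit bus built from SRL16 LUTs (up to 16 cycles each) against plain flip-flops. The function names are hypothetical, and routing and the optional extra CLB flip-flop are ignored.

```python
import math


def balance_paths(latencies):
    """Extra delay (cycles) each reconverging path needs to match the slowest path."""
    target = max(latencies)
    return [target - lat for lat in latencies]


def srl16_luts_needed(delay_cycles: int, bus_width: int) -> int:
    """LUTs needed to delay every bit of a bus with cascaded SRL16s (<= 16 cycles each)."""
    if delay_cycles <= 0:
        return 0
    return bus_width * math.ceil(delay_cycles / 16)


path_1 = 4 + 8   # Operation A (4 cycles) followed by Operation B (8 cycles)
path_2 = 3       # Operation C (3 cycles)
padding = balance_paths([path_1, path_2])
print(padding)                            # [0, 9]
print(srl16_luts_needed(padding[1], 64))  # 64 LUTs: one SRL16 per bit of the bus
print(padding[1] * 64)                    # 576 flip-flops if built from ordinary registers
```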

Distributed RAM
• CLB LUT configurable as distributed RAM
  – A LUT equals a 16x1 RAM
  – Cascade LUTs to increase RAM size
• Synchronous write
• Asynchronous read
  – Can create a synchronous read by using extra flip-flops
  – Naturally, distributed RAM read is asynchronous
• Two LUTs can make
  – 32 x 1 single-port RAM
  – 16 x 2 single-port RAM
  – 16 x 1 dual-port RAM
Figure: Distributed RAM primitives RAM16X1S (D, WE, WCLK, A0 to A3, O), RAM32X1S (adds A4), RAM16X2S (D0, D1, O0, O1) and RAM16X1D (adds read address DPRA0 to DPRA3 and outputs SPO and DPO).

Xilinx Multipurpose LUT
Figure: The Xilinx multipurpose LUT.
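A minimal behavioral model of the 16 x 1 dual-port distributed RAM (RAM16X1D) shows the synchronous-write / asynchronous-read split. The class and method names are hypothetical; only the port names follow the figure.

```python
class RAM16X1D:
    """Behavioral sketch of a 16 x 1 dual-port distributed RAM built from LUTs.

    Write is synchronous: D is stored at address A on a clock edge when WE = 1.
    Reads are asynchronous: SPO shows the bit at A and DPO the bit at DPRA
    as soon as the addresses change, with no clock involved.
    """

    def __init__(self) -> None:
        self.mem = [0] * 16

    def clock_edge(self, we: int, a: int, d: int) -> None:
        # Synchronous write port (address A is shared with SPO)
        if we:
            self.mem[a & 0xF] = d & 1

    def spo(self, a: int) -> int:
        return self.mem[a & 0xF]      # single-port (read/write) output

    def dpo(self, dpra: int) -> int:
        return self.mem[dpra & 0xF]   # dual-port, read-only output


ram = RAM16X1D()
ram.clock_edge(we=1, a=3, d=1)              # write a 1 to address 3
print(ram.spo(3), ram.dpo(3), ram.dpo(4))   # 1 1 0
```

A synchronous read, as mentioned above, would simply register the `spo`/`dpo` values in a flip-flop on the next clock edge.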

RAM Blocks and Multipliers in Xilinx FPGAs
Figure: RAM blocks and multipliers in a Xilinx FPGA.

Embedded RAM Blocks
• A lot of applications require the use of memory, so FPGAs now include relatively large chunks of embedded RAM called e-RAM or Block RAM (BRAM).
• Depending on the architecture of the component, these blocks might be positioned around the periphery of the device or organized as columns.
• These blocks can be used for a variety of purposes, such as implementing standard single- or dual-port RAMs, FIFOs, etc.

Spartan-3 Dual-Port Block RAM
Figure: Spartan-3 dual-port block RAM with Port A and Port B.

Block RAM
• Most efficient memory implementation
  – Dedicated blocks of memory
• Ideal for most memory requirements
  – 4 to 104 memory blocks
  – 18 kbits = 18,432 bits per block (16 k without parity bits)
  – Use multiple blocks for larger memories
• Builds both single and true dual-port RAMs
• Synchronous write and read (different from distributed RAM)

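Spartan-3 Block RAM Amounts
Block RAM can have various configurations (port aspect ratios), each holding 16K data bits (plus parity bits in the 9- and 18-bit widths):
• 16K x 1 (addresses 0 to 16,383)
• 8K x 2 (addresses 0 to 8,191)
• 4K x 4 (addresses 0 to 4,095)
• 2K x (8+1) (addresses 0 to 2,047)
• 1K x (16+2) (addresses 0 to 1,023)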
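Because every aspect ratio above holds the same 16K data bits, choosing a configuration is a packing problem. The Python sketch below prints the address-bus width for each ratio and picks the ratio that needs the fewest blocks for a given depth x width. It is a hypothetical helper that ignores the extra decode and multiplexing logic needed when blocks are cascaded in depth.

```python
import math

# Spartan-3 block RAM port configurations listed above: (depth, width incl. parity)
ASPECT_RATIOS = [(16384, 1), (8192, 2), (4096, 4), (2048, 9), (1024, 18)]


def address_bits(depth: int) -> int:
    """Address bus width for a given depth (e.g. 1K deep -> ADDR[9:0])."""
    return (depth - 1).bit_length()


def blocks_needed(depth: int, width: int):
    """Return (blocks, depth, width) for the aspect ratio using the fewest
    block RAMs for a depth x width memory, cascading in depth and width."""
    best = None
    for d, w in ASPECT_RATIOS:
        count = math.ceil(depth / d) * math.ceil(width / w)
        if best is None or count < best[0]:
            best = (count, d, w)
    return best


for d, w in ASPECT_RATIOS:
    print(f"{d:5d} x {w:2d}: ADDR[{address_bits(d) - 1}:0]")

print(blocks_needed(4096, 32))  # (8, 4096, 4): eight 4K x 4 blocks
```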

Block RAM Port Aspect Ratios

Single-Port Block RAM

Dual-Port Block RAM

Dual-Port Bus Flexibility
Figure: RAMB16_S18_S9 block RAM with ports of different widths:
  – Port A: 1K-word depth, 18-bit width (WEA, ENA, RSTA, CLKA, ADDRA[9:0], DIA[17:0], DOA[17:0])
  – Port B: 2K-word depth, 9-bit width (WEB, ENB, RSTB, CLKB, ADDRB[10:0], DIB[8:0], DOB[8:0])
• Each port can be configured with a different data bus width
• Provides easy data width conversion without any additional logic
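The Python model below sketches how one block RAM can look 16 bits wide on one port and 8 bits wide on the other, as in the 1K x 18 / 2K x 9 example above (parity bits omitted). The LSB-first bit mapping, where word n of a w-bit port holds bits n*w to n*w + w - 1, is an assumption for illustration; the class name is hypothetical.

```python
class MixedWidthDualPortRAM:
    """Sketch of one block RAM seen through two ports of different widths.

    Both ports address the same 16,384 data bits (parity ignored).
    Assumed mapping: word n of a w-bit port holds bits n*w .. n*w + w - 1,
    so a 16-bit word at port-A address n overlaps the 8-bit words at
    port-B addresses 2n (low byte) and 2n + 1 (high byte).
    """

    DATA_BITS = 16384

    def __init__(self, width_a: int = 16, width_b: int = 8) -> None:
        self.bits = [0] * self.DATA_BITS
        self.width = {"A": width_a, "B": width_b}

    def write(self, port: str, addr: int, value: int) -> None:
        w = self.width[port]
        for i in range(w):
            self.bits[addr * w + i] = (value >> i) & 1

    def read(self, port: str, addr: int) -> int:
        w = self.width[port]
        return sum(self.bits[addr * w + i] << i for i in range(w))


ram = MixedWidthDualPortRAM()
ram.write("A", 5, 0xABCD)       # 16-bit write on port A
print(hex(ram.read("B", 10)))   # 0xcd (low byte)
print(hex(ram.read("B", 11)))   # 0xab (high byte)
```

The width conversion costs no logic because both ports simply index the same bit array at different granularities.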

Two Independent Single-Port RAMs
Figure: RAMB16_S1_S1 block RAM used as two independent 8K x 1 RAMs:
  – Port A: address MSB tied to 0, ADDR[12:0] on the remaining address lines, 1-bit width (WEA, ENA, RSTA, CLKA, DIA[0], DOA[0])
  – Port B: address MSB tied to 1, ADDR[12:0] on the remaining address lines, 1-bit width (WEB, ENB, RSTB, CLKB, DIB[0], DOB[0])
• Added advantage of true dual-port
  – No wasted RAM bits
• Can split a dual-port 16K RAM into two single-port 8K RAMs
  – Simultaneous independent access to each RAM
• To access the lower RAM
  – Tie the MSB address bit to logic low
• To access the upper RAM
  – Tie the MSB address bit to logic high

Embedded Multipliers
• Some functions, like multipliers, are inherently slow if they are implemented by connecting a large number of programmable logic blocks together.
• Current FPGAs incorporate special hard-wired multiplier blocks, which are typically located in close proximity to the embedded RAM blocks (arithmetic-based applications).

18 x 18 Embedded Multiplier
• Fast arithmetic functions
  – Optimized to implement multiply/accumulate modules
• 18 x 18 signed multiplier
• Fully combinational
• Optional registers with CE & RST (pipeline)
• Independent from adjacent block RAM

18 x 18 Multiplier
• Embedded 18-bit x 18-bit multiplier
  – 2's complement signed operation
• Multipliers are organized in columns
Figure: 18 x 18 multiplier with inputs Data_A (18 bits) and Data_B (18 bits) and a 36-bit output.
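The 36-bit output width follows from the operand ranges: the largest-magnitude product of two 18-bit two's complement numbers is (-2^17) x (-2^17) = 2^34, which still fits in a signed 36-bit result. A short Python sketch (the function names are hypothetical):

```python
def to_signed(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def mult18x18(a_bits: int, b_bits: int) -> int:
    """18 x 18 two's complement multiply returning the 36-bit result pattern."""
    a = to_signed(a_bits, 18)
    b = to_signed(b_bits, 18)
    return (a * b) & ((1 << 36) - 1)


MIN18 = -(1 << 17)  # -131072, the most negative 18-bit value
product_bits = mult18x18(MIN18 & 0x3FFFF, MIN18 & 0x3FFFF)
print(to_signed(product_bits, 36))  # 17179869184 (= 2**34)
```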

Positions of Multipliers

Asynchronous 18-bit Multiplier

18-bit Multiplier with Register

A Simple Clock Tree
Figure: The clock signal from the outside world enters through a special clock pin and pad and is distributed to the flip-flops by a clock tree.

Digital Clock Manager (DCM)
Figure: The clock signal from the outside world enters through a special clock pin and pad and feeds a clock manager, which generates daughter clocks used to drive internal clock trees or output pins.

Digital Clock Managers (DCM)
• The clock pin is usually connected to a special hard-wired function called a clock manager that generates “daughter clocks”.
• The daughter clocks may be used to drive internal clock trees or external output pins that can be used to provide clocking services to other devices on the host circuit board.
• There might be multiple clock managers supporting only a subset of features (jitter removal, frequency synthesis, …).

DCM: Jitter Removal
• In the real world, clock edges may arrive a little early or a little late.
• The result is a fuzzy clock (jitter), caused by the delays the clock signal encounters.
• The FPGA clock manager can be used to detect and correct for this jitter and provide a “clean” daughter clock signal for use inside the device.

DCM: Frequency Synthesis
• The frequency of the clock signal being presented to the FPGA from the outside world might not be exactly what the design engineer wishes for.
• The clock manager can be used to generate daughter clocks with frequencies that are derived by multiplying or dividing the original signal.
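Frequency synthesis of this kind can be expressed as f_out = f_in x M / D. The Python sketch below searches M and D for the closest match to a target frequency. The ranges M = 2..32 and D = 1..32 are an assumption modeled on the Spartan-3 DCM's synthesized (CLKFX) output; real devices also limit the allowed input and output frequencies, which this sketch ignores. The function name is hypothetical.

```python
def best_clkfx(f_in_mhz: float, f_target_mhz: float,
               m_range=range(2, 33), d_range=range(1, 33)):
    """Return (M, D, f_out) with f_in * M / D closest to the target.
    The M and D ranges are assumptions; check the device data sheet."""
    best = None
    for d in d_range:
        for m in m_range:
            f_out = f_in_mhz * m / d
            err = abs(f_out - f_target_mhz)
            if best is None or err < best[0]:  # keep the first (smallest D) on ties
                best = (err, m, d, f_out)
    _, m, d, f_out = best
    return m, d, f_out


m, d, f_out = best_clkfx(50.0, 133.0)
print(m, d, round(f_out, 3))  # 8 3 133.333
```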

DCM: Phase Shifting
• Certain designs require the use of clocks that are phase shifted (delayed) with respect to each other.
• Some clock managers allow you to select from fixed phase shifts of common values such as 120° and 240° (for a three-phase clocking scheme).

Basic I/O Block Structure
Figure: I/O block with a three-state control path (three-state FF), an output path (output FF), and an input path offering a direct input and a registered input (input FF). Each flip-flop (D, Q, EC, SR) has its own enable, with common clock and set/reset lines.

IOB Functionality
• The IOB provides the interface between the package pins and the CLBs
• Each IOB can work as a uni- or bi-directional I/O
• Outputs can be forced into high impedance
• Inputs and outputs can be registered
  – Advised for high-performance I/O
• Inputs can be delayed

Configurable I/O Impedances
• The signals used to connect devices on today’s circuit boards often have fast edge rates.
• To prevent signals from reflecting back, it is necessary to apply appropriate terminating resistors to the FPGA input and output pins.
• In the past, these resistors were applied as discrete components (outside the FPGA).
• Today's FPGAs allow the use of internal terminating resistors whose value can be configured by the user.
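Why termination matters can be quantified with the reflection coefficient of a transmission line, Γ = (Z_L - Z_0) / (Z_L + Z_0): a matched termination gives Γ = 0 (no reflection), while an unterminated (open) input reflects the whole wave back. The 50 Ω trace impedance below is an assumed example value.

```python
import math


def reflection_coefficient(z_load: float, z0: float) -> float:
    """Gamma = (Z_L - Z0) / (Z_L + Z0) for a resistive termination.
    0 means no reflection, +1 an open circuit, -1 a short circuit."""
    if math.isinf(z_load):
        return 1.0
    return (z_load - z0) / (z_load + z0)


Z0 = 50.0  # assumed characteristic impedance of the board trace, in ohms
for zl in (math.inf, 100.0, 50.0, 25.0, 0.0):
    print(f"Z_L = {zl:>5} ohm -> Gamma = {reflection_coefficient(zl, Z0):+.3f}")
```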

Spartan-3 Family Attributes

FPGA Nomenclature

Spartan-3 FPGA Family Members

2001 – Virtex-II FPGA Family
• Virtex-II FPGA introduced, followed by Virtex-II Pro in 2003
  – 18x18 multipliers and 18-kbit block RAMs introduced (up to 444 of each)
  – Gbit serial I/O communications and PowerPC processors introduced
  – Complex floating-point algorithm implementation now possible
• Virtex-II / Pro
  – 44,000 logic slices
  – 444 18-Kbit BRAMs
  – 444 18x18 multipliers
  – 2 PowerPC processors
  – 20 Gbit I/O
  – 1164 max user I/O

Virtex-II Pro Floorplan
Figure: Virtex-II Pro floorplan showing PowerPCs, logic cells and up to 16 serial transceivers.
• 622 Mbps to 3.125 Gbps serial transceivers
• 1 to 4 PowerPCs
• 4 to 16 multi-gigabit transceivers
• 12 to 216 multipliers
• 3,000 to 50,000 logic cells
• 200k to 4M bits RAM
• 204 to 852 I/Os

Virtex-II Pro (Selection)

Embedded Processor Cores (Hard and Soft)
• The majority of designs make use of microprocessors.
• These appeared as discrete devices on the circuit board.
• Lately, high-end FPGAs have become available that contain one or more embedded microprocessors (referred to as microprocessor cores).
• There are two types of cores:
  – A hard microprocessor core is implemented as a dedicated, predefined block (two approaches)
  – A soft microprocessor core is implemented by configuring a group of programmable logic blocks to act as a microprocessor.

Embedded Core (Inside)
• Xilinx and Altera tend to embed one or more microprocessor cores directly into the main FPGA fabric (for example, the PowerPC cores in Xilinx devices).
• In this case, the design tools have to be able to take account of the presence of these blocks in the fabric (any memory used by the core is formed from the embedded RAM blocks).
• The main advantage of this scheme is the speed gained from having the processor core in intimate proximity to the FPGA fabric.

Soft Core
• As opposed to embedding a microprocessor physically into the fabric of the chip, it is possible to configure a group of programmable logic blocks to act as a microprocessor.
• Soft cores are simpler (more primitive) and slower than their hard-core counterparts.
• Advantages:
  1. The user need only implement a core if he/she needs it.
  2. The user can instantiate as many cores as they require until they run out of resources!

Virtex Architectures
• Built for high-performance applications
• Other families include
  – Virtex-II Pro
  – Virtex-4
  – Virtex-5
• Latest family includes
  – Virtex-6

Virtex-II Pro Architecture
Contains embedded processors and multi-gigabit transceivers:
• High-performance true dual-port RAM
• SelectIO™-Ultra technology: 1164 I/O
• Advanced FPGA logic: 99k logic cells
• XtremeDSP functionality: embedded multipliers
• RocketIO™ and RocketIO X high-speed serial transceivers: 622 Mbps to 3.125 Gbps
• PowerPC™ processors: 400+ MHz clock rate
• XCITE digitally controlled impedance on any I/O
• DCM™ digital clock management: 12 DCMs
• 130 nm, 9-layer copper in 300 mm wafer technology

Virtex-4 Family
Advanced Silicon Modular Block (ASMBL) architecture, optimized for logic, embedded, and signal processing:

| Resource | LX | SX | FX |
|---|---|---|---|
| Logic | 14K–200K LCs | 23K–55K LCs | 12K–140K LCs |
| Memory | 0.9–6 Mb | 2.3–5.7 Mb | 0.6–10 Mb |
| DCMs | 4–12 | 4–8 | 4–20 |
| DSP slices | 32–96 | 128–512 | 32–192 |
| SelectIO | 240–960 | 320–640 | 240–896 |
| RocketIO | N/A | N/A | 0–24 channels |
| PowerPC | N/A | N/A | 1 or 2 cores |
| Ethernet MAC | N/A | N/A | 2 or 4 cores |

Virtex-4 Architecture
• RocketIO™ multi-gigabit transceivers: 622 Mbps–10.3 Gbps
• Smart RAM: new block RAM/FIFO
• Advanced CLBs: 200K logic cells
• Xesium clocking technology: 500 MHz
• Tri-mode Ethernet MAC: 10/100/1000 Mbps
• XtremeDSP™ technology slices (18x18): 256 GMACs
• PowerPC™ 405 with APU interface: 450 MHz, 680 DMIPS
• 1 Gbps SelectIO™
• ChipSync™ source-synchronous interfacing
• XCITE active termination

Virtex-5 Family
Optimized for logic, embedded, signal processing, and high-speed connectivity

Virtex™-5 Platforms
| Platform | Focus |
|---|---|
| LX | Logic |
| LXT | Logic/Serial |
| SXT | DSP/Serial |
| FXT | Embedded/Serial |
The platforms are compared on logic, on-chip RAM, DSP capabilities, parallel I/Os, serial I/Os and PowerPC® processors.

Virtex-5 Architecture
• Enhanced 36-Kbit dual-port block RAM / FIFO with integrated ECC
• 550 MHz clock management tile with DCM and PLL
• SelectIO with ChipSync technology and XCITE DCI
• Advanced configuration options
• 25x18 DSP slice with integrated ALU
• RocketIO™ transceiver options
  – Low-power GTP: up to 3.75 Gbps
  – High-performance GTX: up to 6.5 Gbps
• Tri-mode 10/100/1000 Mbps Ethernet MACs
• New, most advanced, high-performance real 6LUT logic fabric
• PCI Express® endpoint block
• System monitor function with built-in ADC
• Next-generation PowerPC® embedded processor

Spartan-3 Family
Built for high-volume, low-cost applications
• 18x18-bit embedded pipelined multipliers for efficient DSP
• Configurable 18K block RAMs + distributed RAM
• Up to eight on-chip digital clock managers to support multiple system clocks
• 4 I/O banks (Bank 0 to Bank 3), with support for all I/O standards including PCI, DDR333, RSDS, mini-LVDS

Spartan-3 Family
Based upon the Virtex-II architecture, optimized for lower cost:
• Smaller process = lower core voltage
  – 0.09 micron versus 0.15 micron
  – VCCINT = 1.2 V versus 1.5 V
• Logic resources
  – Only one-half of the slices support RAM or SRL16s (SLICEM)
  – Fewer block RAMs and multiplier blocks
• Clock resources
  – Fewer global clock multiplexers and DCM blocks
• I/O resources
  – Fewer pins per package
  – No internal 3-state buffers
  – Support for different standards
    • New standards: 1.2 V LVCMOS, 1.8 V HSTL, and SSTL
    • Default is LVCMOS, versus LVTTL

SLICEM and SLICEL
• Each Spartan™-3 CLB contains four slices
  – Similar to the Virtex™-II
• Slices are grouped in pairs
  – Left-hand SLICEM (memory): LUTs can be configured as memory or SRL16
  – Right-hand SLICEL (logic): LUTs can be used as logic only
Figure: CLB with a switch matrix, left-hand slices X0Y0 and X0Y1 (SLICEM), right-hand slices X1Y0 and X1Y1 (SLICEL), fast connects, and the SHIFTIN/SHIFTOUT and CIN/COUT chains.

Multiple Domain-optimized Platforms

Spartan-3E Features
• More gates per I/O than Spartan-3
• Removed some I/O standards
  – Higher-drive LVCMOS
  – GTL, GTLP
  – SSTL2_II
  – HSTL_II_18, HSTL_I, HSTL_III
  – LVDS_EXT, ULVDS
• 16 BUFGMUXes on left and right sides
  – Drive half the chip only
  – In addition to eight global clocks
• Pipelined multipliers
• Additional configuration modes
  – SPI, BPI
  – Multi-Boot mode
• DDR cascade
  – Internal data is presented on a single clock edge

Spartan-3A DSP Features
• Increased amount of block memory (BRAM)
  – 1512K in S3A1800 vs 648K in S3E1600
• More XtremeDSP DSP48A slices
  – Replace the embedded multipliers of Spartan-3E
  – 3400A: 126 DSP48As
  – 1800A: 84 DSP48As

Spartan-3A DSP: Tuning DSP Performance
• Integrated XtremeDSP slice (DSP48A)
  – Application-optimized capacity
  – Integrated pre-adder optimized for filters
  – 250 MHz operation, standard speed grade
  – Compatible with Virtex XtremeDSP
• Increased memory capacity and performance
  – Also important for embedded processing, complex IP, etc.

DSP48 Comparison

| Function | DSP48 | DSP48A | DSP48E | Benefit |
|---|---|---|---|---|
| Multiplier | 18 x 18 | 18 x 18 | 25 x 18 | Reduces FPGA resource needs for DSP algorithms. |
| Pre-adder | No | Yes | No | Often a speed-limiting path and important in FIR filter construction; reduces the critical path timing in FIR filter applications for better performance. |
| Cascade inputs | One | One | Two | Enables fast data path chaining of DSP48 blocks for larger filters. |
| Cascade output | Yes | Yes | Yes | Enables fast data path chaining of DSP48 blocks for larger filters. |
| Dedicated C input | No | Yes | Yes | The C input supports many 3-input mathematical functions, such as 3-input addition and 2-input multiplication with a single addition, and the very valuable rounding of multiplication away from zero. |
| Adder | 3-input, 48-bit | 2-input, 48-bit | 3-input, 48-bit | Supports simple add and accumulate functions. |
| Dynamic opmodes | Yes | Yes | Yes | Enables the selection of ALU function on a clock cycle basis; one DSP48 can provide more than one function (add, multiply, multiply-add, multiply-accumulate, etc.). |
| ALU logic functions | No | No | Yes | Similar to the ALU of a microprocessor; enables multiple functions to be selected (add, subtract, … or compare). |
| Pattern detect | No | No | Yes | Supports convergent rounding, underflow/overflow detection for saturation arithmetic, and auto-resetting counters/accumulators. |
| SIMD ALU support | No | No | Yes | Enables parallel ALU operations on multiple data sets. |
| Carry signals | Carry in | Carry in & out | Carry in & out | Supports fast carry functions between DSP blocks. |

Spartan-3A Device Table

| | Spartan-3A XC3S1400A | Spartan-3A DSP XC3SD1800A | Spartan-3A DSP XC3SD3400A |
|---|---|---|---|
| XtremeDSP DSP48A slices | – | 84 | 126 |
| Dedicated multipliers | 32 | 84 (DSP48As) | 126 (DSP48As) |
| Block RAM blocks | 32 | 84 | 126 |
| Block RAM (Kb) | 576 | 1,512 | 2,268 |
| Distributed RAM (Kb) | 176 | 260 | 373 |
| FFs/LUTs | 22,528 | 33,280 | 47,744 |
| Logic cells | 25,344 | 37,440 | 53,712 |
| DCMs | 8 | 8 | 8 |
| Max differential I/O pairs | 227 | 227 | 213 |
| User I/O, CS484 19x19 mm (0.8 mm pitch) | – | 309 | 309 |
| User I/O, *FG676 27x27 mm (1.0 mm pitch) | 502 | 519 | 469 |

Latest Families

Architecture Alignment
Figure: Architecture alignment between Virtex-6 FPGAs (760K logic cell device) and Spartan-6 FPGAs (150K logic cell device).
• Common resources: LUT-6 CLB, BlockRAM, DSP slices, high-performance clocking, parallel I/O, HSS transceivers*, PCIe® interface
• Virtex-6 specific: FIFO logic, tri-mode EMAC, system monitor
• Spartan-6 specific: hardened memory controllers, 3.3 V compatible I/O
*Optimized for target application in each family
• Enables IP portability
• Protects design investments

Addressing the Broad Range of Technical Requirements
Figure: Market size versus application market segments (annotated "+ 100s more designers"):
• Spartan-6 LX: lowest-cost logic + DSP
• Spartan-6 LXT: lowest-cost logic + high-speed serial
• Virtex-6 LXT: high logic density + serial connectivity
• Virtex-6 SXT: DSP + logic + serial connectivity
• Virtex-6 HXT: ultra high-speed serial connectivity + logic

• Higher system performance
  – More design margin to simplify designs
  – Higher integrated functionality
• Lower system cost
  – Reduce BOM
  – Implement design in a smaller device & lower speed grade
• Lower power
  – Help meet power budgets
  – Eliminate heat sinks & fans
  – Prevent thermal runaway

Virtex-6 Family

Virtex® Product & Process Evolution
• Virtex: 220 nm
• Virtex-E: 180 nm
• Virtex-II: 150 nm
• Virtex-II Pro: 130 nm
• Virtex-4: 90 nm
• Virtex-5: 65 nm
• Virtex-6: 40 nm
Spanning the 1st to the 6th generation, delivering balanced performance, power, and cost.

Virtex-6 Base Platform

Strong Focus on Power Reduction
• Static power reduction
  – Higher distribution of low-leakage transistors
• Dynamic power reduction
  – Reduced capacitance through device shrink
• Reduced core voltage devices
  – VCCINT = 0.9 V option allows a power/performance tradeoff
• I/O power improvements
  – Dynamic termination
• System monitor
  – Allows sophisticated monitoring of temperature and voltage
Lower overall power: up to 50% power reduction vs. the previous generation

Virtex-6 Logic Fabric
• Virtex-6 configurable logic block (CLB)
  – Each CLB contains two slices
  – Each slice contains four 6-input lookup tables (6LUT)
• Slices implement logic functions (SLICEL)
• Slices for memories and shift registers (SLICEM)
• LUT6 implements
  – All functions of up to 6 variables
  – Two functions of up to 5 variables each
  – Shift registers up to 32 stages long
  – Memories of 64 bits
• Multiple configurations within SLICEM
Benefits
• Performance: an increased ratio of memory-capable slices (SLICEM) makes memories available closer to the source or target logic
• Power: shift register mode greatly reduces power consumption over a flip-flop implementation
• Cost: logic and memory functions can be packed more efficiently
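A 6-input LUT that can also deliver two 5-input functions can be sketched as a 64-bit table read two ways. The model below assumes the behavior of the Xilinx LUT6_2 primitive as the author of this sketch understands it: O6 reads all 64 bits using I5..I0, while O5 reads only the lower 32 bits using I4..I0. With I5 tied high, O6 reads the upper half, so the LUT gives two independent 5-input functions sharing inputs I4..I0. The class and helper names are hypothetical.

```python
class LUT6_2Model:
    """Sketch of a fracturable 6-input LUT with two outputs.

    Assumed behavior:
      O6 = INIT[I5..I0]     (any function of six inputs)
      O5 = INIT[0, I4..I0]  (a function of I4..I0 from the lower 32 bits)
    """

    def __init__(self, init: int) -> None:
        self.init = init & ((1 << 64) - 1)

    def o6(self, addr6: int) -> int:
        return (self.init >> (addr6 & 0x3F)) & 1

    def o5(self, addr5: int) -> int:
        return (self.init >> (addr5 & 0x1F)) & 1


def pack_two_5input(f, g) -> int:
    """Build a 64-bit INIT with f in the lower half (read by O5) and g in the
    upper half (read by O6 when I5 = 1). f and g map a 5-bit address to 0/1."""
    init = 0
    for a in range(32):
        init |= (f(a) & 1) << a
        init |= (g(a) & 1) << (32 + a)
    return init


def parity5(a: int) -> int:
    return bin(a).count("1") & 1   # XOR of the five inputs


def all_high5(a: int) -> int:
    return int(a == 0b11111)       # AND of the five inputs


lut = LUT6_2Model(pack_two_5input(parity5, all_high5))
print(lut.o5(0b00111), lut.o6(0b100000 | 0b11111))  # 1 1
```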

Higher DSP Performance
• Most advanced DSP architecture
– New optional pre-adder for symmetric filters
– 25x18 multiplier: high-resolution filters, efficient floating-point support
– ALU-like second stage enables mapping of advanced operations: programmable op-code, SIMD support, addition / subtraction / logic functions
– Pattern detector
• Lowest power consumption
• Highest DSP slice capacity
– Up to 2K DSP slices

Virtex®-6 LXT / SXT FPGAs
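The pre-adder pays off for symmetric filters because coefficients come in mirrored pairs, h[k] = h[N-1-k]. The two samples that share a coefficient can be added first and multiplied once. This Python sketch models only the arithmetic. It does not model the DSP slice's pipeline registers or its operand widths, and the filter and sample values are made up.

```python
from typing import Sequence


def fir_direct(x: Sequence[int], h: Sequence[int], n: int) -> tuple[int, int]:
    """Direct form: one multiply per tap. Returns (output y[n], multiplies used)."""
    acc, mults = 0, 0
    for k, coeff in enumerate(h):
        acc += coeff * x[n - k]
        mults += 1
    return acc, mults


def fir_preadd(x: Sequence[int], h: Sequence[int], n: int) -> tuple[int, int]:
    """Pre-adder form, valid only for symmetric h: add mirrored samples, multiply once."""
    taps = len(h)
    acc, mults = 0, 0
    for k in range(taps // 2):
        acc += (x[n - k] + x[n - (taps - 1 - k)]) * h[k]  # pre-add, then one multiply
        mults += 1
    if taps % 2:  # odd length: the centre tap has no mirror partner
        mid = taps // 2
        acc += x[n - mid] * h[mid]
        mults += 1
    return acc, mults


h_odd = [1, 3, 5, 3, 1]           # hypothetical symmetric 5-tap filter
h_even = [2, 7, 7, 2]             # hypothetical symmetric 4-tap filter
x = [2, -1, 4, 7, 0, 3, 5]        # hypothetical input samples
print(fir_direct(x, h_odd, 6), fir_preadd(x, h_odd, 6))
```

Both forms give the same output, but the pre-adder version needs roughly half the multiplies. That is the DSP-slice saving the slide points to.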

Spartan-6 Family

Spartan-6
• Next-generation 45nm Spartan family
– Increased performance & density
– Evolutionary feature enhancements
– Dramatic cost & power reductions
• Two silicon platforms
– LX: cost-optimized logic, memory
– LXT: LX features plus high-speed serial connectivity
• More unified & integrated with Virtex
Delivering the Optimal Balance of Cost, Power & Performance

Spartan-6 Logic Evolution: Higher Performance, Increased Utilization
• Modified Virtex 6-input LUT
– 4 additional flip-flops per slice
– Higher utilization for register-intensive designs
• Efficient & capable
– Logic
– Arithmetic functions
– Distributed RAM & shift registers
– Interconnect
• Up to 25% higher performance
(Figure: the 4LUT / FF pair of the Spartan-3A series & earlier vs. the new, more efficient Spartan-6 6LUT / dual FF pair. Great general-purpose logic, with a 6-input LUT & 2nd flip-flop for higher utilization.)

Spartan-6 CLB Logic Slices
• SliceM (25%): LUT6, 8 registers, carry logic, wide function muxes, distributed RAM / SRL logic
• SliceL (25%): LUT6, 8 registers, carry logic, wide function muxes
• SliceX (50%): LUT6 optimized for logic, 8 registers
Slice mix chosen for the optimal balance of cost, power & performance

Spartan-6 Lowest Total Power
• Static power reductions
– Process & architectural innovations
• Dynamic power reduction
– Lower node capacitance & architectural innovations
• More hard IP functionality
– Integrated transceivers & other logic reduce power
– Hard IP uses less current & power than soft IP
• Lower I/O power
• Low-power option: -1L reduces power even further
• Fewer supply rails reduce power

Spartan-6 Hard Memory Controller
• New hard block memory controller
– Up to 4 controllers per device
• Why a hard memory block?
– Very common design component
– Multiple customer benefits
Customer requests and the benefits of the Spartan-6 hard block memory controller:
• Higher performance: up to 800 Mbps
• Lower cost: saves soft logic, smaller die
• Lower power: dedicated logic
• Easier designs: timing closure is no longer an issue; configurable multiport user interface; CoreGen/MIG wizard & EDK support
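The "dynamic power reduction" bullet (lower node capacitance) and the 1.2V / 1.0V voltage scaling listed for Spartan-6 both act through the standard first-order CMOS switching-power relation P = α·C·V²·f. This Python sketch evaluates that relation. The activity factor, capacitance and frequency values are hypothetical, and static (leakage) power is not modelled.

```python
def dynamic_power(alpha: float, capacitance_f: float, vdd: float, freq_hz: float) -> float:
    """First-order CMOS switching power in watts: P = alpha * C * V^2 * f."""
    return alpha * capacitance_f * vdd ** 2 * freq_hz


def relative_power(v_new: float, v_old: float, c_ratio: float = 1.0) -> float:
    """Dynamic-power ratio when only supply voltage and switched capacitance change."""
    return c_ratio * (v_new / v_old) ** 2


# Hypothetical design: 10% activity, 1 nF switched capacitance, 100 MHz clock.
p_12 = dynamic_power(0.1, 1e-9, 1.2, 100e6)
p_10 = dynamic_power(0.1, 1e-9, 1.0, 100e6)
print(f"{p_12 * 1e3:.2f} mW at 1.2V, {p_10 * 1e3:.2f} mW at 1.0V, "
      f"ratio {relative_power(1.0, 1.2):.3f}")
```

Because voltage enters squared, dropping the core supply from 1.2V to 1.0V alone cuts switching power by about 31%, before any capacitance reduction.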

Memory Controller
• Only low-cost FPGA with a "hard" memory controller
• Guaranteed memory interface performance, providing
– Reduced engineering & board design time
– DDR, DDR2, DDR3 & LPDDR support
– Up to 12.8 Gbps bandwidth for each memory controller
• Automatic calibration features
• Multiport structure for the user interface
– Six 32-bit programmable ports from the fabric
• Controller interface to 4-, 8- or 16-bit memory devices
(Figure: Spartan-6 interfacing to DRAM (DDR, DDR2, DDR3, LPDDR), SRAM, FLASH and EEPROM)

Integrated DSP Slice
• 250 MHz implementation
– Fast multiplier & 48-bit adder
– ASIC-like performance
• XtremeDSP DSP48A1 slice
• Input and output registers for higher speed
• Optimizes FIR filter applications
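Peak bandwidth of a memory interface is the per-pin data rate times the data-bus width. With the slide's 800 Mbps rate and a 16-bit device, this gives 12.8 Gbit/s per controller. This is a Python sketch of that arithmetic only. It gives peak figures, ignoring refresh, bank conflicts and read/write turnaround, which lower sustained throughput.

```python
def peak_bandwidth_gbps(rate_mbps_per_pin: float, width_bits: int) -> float:
    """Peak bandwidth in Gbit/s = per-pin data rate (Mbit/s) x data-bus width / 1000."""
    return rate_mbps_per_pin * width_bits / 1000.0


for width in (4, 8, 16):  # the device widths listed on the slide
    gbps = peak_bandwidth_gbps(800, width)
    print(f"x{width:<2d} at 800 Mbps/pin: {gbps:5.1f} Gbit/s = {gbps / 8:.2f} GByte/s")
```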

Better, More BRAM
• More block RAMs
– 2x higher BRAM-to-logic-cell ratio than the Spartan-3A platform
• More port flexibility
– An 18K BRAM can be split into two 9K BRAM blocks that can be independently addressed
(Figure: one 18K BRAM, or two 9K BRAMs)
• Improves buffering, caching & data storage
– Excellent for embedded processing and communication protocols
– Enables DSP blocks to provide more efficient video and surveillance algorithms
• Lower static power

Compare to Spartan-3A
Twice the Capabilities, Half the Power, Hard Blocks!

Feature | Extended Spartan-3A (90nm) | Spartan-6 (45nm)
Logic cells | Up to 55K | Up to 150K
LUT design | 4-input LUT + FF | 6-input LUT + 2 FF
Block RAM | Up to 2 Mbit | Up to 5 Mbit
Transceiver count / speed | No | Up to 8 / up to 3.125 Gbps
Voltage scaling | No (1.2V only) | Yes (1.2V, 1.0V)
Static power (typ) | 11 mW (smallest density) | Up to 60% less!
Memory interface | 400 Mbps | DDR3 800 Mbps
Max differential I/O | 640 Mbps | 1050 Mbps
Multipliers / DSP | Up to 126 multipliers / DSP | Up to 184 DSP48 blocks
Memory controllers | No | Up to 4 hard blocks
Clock management | DCM only | DCM & PLL
PCI Express endpoint | No | Yes, Gen 1
Security | Device DNA only | Device DNA & AES

Spartan-6 LX / LXT FPGAs

** All memory controllers support the x16 interface, except in the CS225 package, where only x8 is supported.


FPGA Design Flow


Design process (1)
Specification
Design and implement a simple unit that speeds up encryption with an RC5-like cipher, using a fixed key set, on an 8031 microcontroller. Unlike in experiment 5, this time your unit has to perform the encryption algorithm by itself, executing 32 rounds…

VHDL description (your VHDL source files)
library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.std_logic_unsigned.all;

-- Port list of the RC5 core
entity RC5_core is
  port(
    clock, reset, encr_decr : in  std_logic;
    data_input  : in  std_logic_vector(31 downto 0);  -- 32-bit data block in
    data_output : out std_logic_vector(31 downto 0);  -- 32-bit data block out
    out_full    : in  std_logic;
    key_input   : in  std_logic_vector(31 downto 0);
    key_read    : out std_logic                       -- last port: no trailing semicolon
  );
end RC5_core;                                         -- must match the entity name

Functional simulation

Synthesis

Post-synthesis simulation
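Functional simulation is often checked against a software "golden model" of the algorithm. The following is a hypothetical Python sketch for the RC5 core above. It assumes standard RC5 rounds with 16-bit words, so one 32-bit data_input word is one block, with A as the upper half and B as the lower half. It also assumes a placeholder fixed key table: RC5's magic-constant initialisation with no key mixing. None of these choices come from the assignment, and the outputs are not published RC5 test vectors (those use 32-bit words).

```python
W = 16                      # assumed word size: 32-bit block = two 16-bit words
MASK = (1 << W) - 1
ROUNDS = 32                 # from the specification
# Placeholder "fixed key set": RC5's P16/Q16 initialisation, key mixing omitted.
S = [(0xB7E1 + i * 0x9E37) & MASK for i in range(2 * ROUNDS + 2)]


def rotl(v: int, s: int) -> int:
    """Rotate a W-bit word left by s mod W (RC5 uses the low lg(W) bits of s)."""
    s %= W
    return ((v << s) | (v >> (W - s))) & MASK


def rotr(v: int, s: int) -> int:
    s %= W
    return ((v >> s) | (v << (W - s))) & MASK


def encrypt(block: int) -> int:
    a, b = (block >> W) & MASK, block & MASK
    a = (a + S[0]) & MASK
    b = (b + S[1]) & MASK
    for i in range(1, ROUNDS + 1):
        a = (rotl(a ^ b, b) + S[2 * i]) & MASK
        b = (rotl(b ^ a, a) + S[2 * i + 1]) & MASK
    return (a << W) | b


def decrypt(block: int) -> int:
    a, b = (block >> W) & MASK, block & MASK
    for i in range(ROUNDS, 0, -1):      # undo the rounds in reverse order
        b = rotr((b - S[2 * i + 1]) & MASK, a) ^ a
        a = rotr((a - S[2 * i]) & MASK, b) ^ b
    b = (b - S[1]) & MASK
    a = (a - S[0]) & MASK
    return (a << W) | b


print(hex(encrypt(0x12345678)))
```

A testbench would feed the same data_input words to the VHDL core and compare data_output with encrypt(); the decrypt() round trip checks the model itself.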

Design process (2)
Implementation (Mapping, Placing & Routing)

Timing simulation

Configuration

On-chip testing

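Design Process control from Active-HDL

Logic Synthesis
VHDL description:
library IEEE;
use IEEE.STD_LOGIC_1164.all;

-- Entity reconstructed from the signals the architecture uses (assumed port list)
entity MLU is
  port(
    A, B                : in  STD_LOGIC;
    NEG_A, NEG_B, NEG_Y : in  STD_LOGIC;   -- '1' inverts the operand / result
    L1, L0              : in  STD_LOGIC;   -- operation select
    Y                   : out STD_LOGIC
  );
end MLU;

architecture MLU_DATAFLOW of MLU is
  signal A1, B1, Y1 : STD_LOGIC;
  signal MUX_0, MUX_1, MUX_2, MUX_3 : STD_LOGIC;
  signal L : STD_LOGIC_VECTOR(1 downto 0);
begin
  A1 <= A when (NEG_A = '0') else not A;
  B1 <= B when (NEG_B = '0') else not B;
  Y  <= Y1 when (NEG_Y = '0') else not Y1;

  -- The four candidate logic operations
  MUX_0 <= A1 and B1;
  MUX_1 <= A1 or B1;
  MUX_2 <= A1 xor B1;
  MUX_3 <= A1 xnor B1;

  -- (L1 & L0) has no locally static type in VHDL-93, so it is
  -- assigned to a typed signal before being used as a selector
  L <= L1 & L0;
  with L select
    Y1 <= MUX_0 when "00",
          MUX_1 when "01",
          MUX_2 when "10",
          MUX_3 when others;
end MLU_DATAFLOW;

Synthesis produces: Circuit netlist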
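Before and after synthesis, the MLU's simulated outputs can be compared against a tiny reference model of the same dataflow. This Python sketch follows the VHDL signal names and the select encoding in the code ("00" and, "01" or, "10" xor, others xnor). It models only the '0'/'1' values, not the other std_logic values such as 'U' or 'X'.

```python
def mlu(a: int, b: int, neg_a: int, neg_b: int, neg_y: int, l1: int, l0: int) -> int:
    """Bit-level model of architecture MLU_DATAFLOW."""
    a1 = a ^ neg_a                # A1 <= A when NEG_A = '0' else not A
    b1 = b ^ neg_b                # B1 <= B when NEG_B = '0' else not B
    sel = (l1 << 1) | l0          # L1 & L0: L1 is the most significant bit
    if sel == 0:
        y1 = a1 & b1              # MUX_0
    elif sel == 1:
        y1 = a1 | b1              # MUX_1
    elif sel == 2:
        y1 = a1 ^ b1              # MUX_2
    else:
        y1 = 1 - (a1 ^ b1)        # MUX_3 (xnor)
    return y1 ^ neg_y             # Y <= Y1 when NEG_Y = '0' else not Y1


def truth_table() -> list[tuple[tuple[int, ...], int]]:
    """All 128 input combinations (A, B, NEG_A, NEG_B, NEG_Y, L1, L0) with expected Y."""
    rows = []
    for code in range(128):
        bits = tuple((code >> i) & 1 for i in range(7))
        rows.append((bits, mlu(*bits)))
    return rows


print(sum(y for _, y in truth_table()), "of 128 combinations give Y = '1'")
```

The inversion inputs make the unit more capable than its four operations suggest. With NEG_Y = '1', the AND setting becomes NAND, and inverting all three turns OR into AND by De Morgan's law.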

Synthesis Tools
Synplify Pro, XST … and others
Features of synthesis tools
• Interpret RTL code
• Synplify Pro: produces a synthesized circuit netlist in a standard EDIF (.edf) format
– Can optionally produce a .vhm file (VHDL code merged into one file) for post-synthesis simulation
• XST: produces a synthesized circuit netlist in NGC format
• The netlist is composed of gates in the particular Xilinx implementation library
– http://toolbox.xilinx.com/docsan/xilinx9/books/manuals.pdf has information on libraries
• Give preliminary performance estimates
• Some can display circuit schematics corresponding to the EDIF netlist

Timing report after synthesis
Performance Summary
Worst slack in design: -0.924

Starting Clock   Requested Freq.   Estimated Freq.   Requested Period (ns)   Estimated Period (ns)   Slack (ns)   Clock Type   Clock Group
exam1|clk        85.0 MHz          78.8 MHz          11.765                  12.688                  -0.924       inferred     Inferred_clkgroup_0
System           85.0 MHz          86.4 MHz          11.765                  11.572                  0.193        system       default_clkgroup

Implementation
• After synthesis, the entire implementation process is performed by the FPGA vendor's tools
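The columns of the synthesis timing report are related by two simple formulas. Frequency is the reciprocal of period, and slack is the requested period minus the estimated critical-path period, so negative slack means the estimate misses the constraint. This Python sketch recomputes the report rows from their printed periods.

```python
def freq_mhz(period_ns: float) -> float:
    """Clock frequency in MHz for a period given in nanoseconds."""
    return 1000.0 / period_ns


def slack_ns(requested_period_ns: float, estimated_period_ns: float) -> float:
    """Positive slack: the estimated critical path fits in the requested period."""
    return requested_period_ns - estimated_period_ns


for name, req, est in [("exam1|clk", 11.765, 12.688), ("System", 11.765, 11.572)]:
    print(f"{name:10s} req {freq_mhz(req):5.1f} MHz  est {freq_mhz(est):5.1f} MHz  "
          f"slack {slack_ns(req, est):+.3f} ns")
```

For exam1|clk this gives -0.923 ns, one thousandth away from the report's -0.924. The tool works with unrounded periods, while the printed ones are rounded to three decimals.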

Mapping
(Figure: design logic mapped onto LUT0 to LUT5 and flip-flops FF1, FF2)

Placing
(Figure: mapped slices placed into the FPGA's CLBs)

Routing
(Figure: placed blocks joined through the FPGA's programmable connections)

Map report header
Release 7.1.03i Map H.41
Xilinx Mapping Report File for Design 'exam1'

Design Information
------------------
Command Line   : c:\Xilinx\bin\nt\map.exe -p 2S200FG256-6 -o map.ncd -pr b -k 4 -cm area -c 100 -tx off exam1.ngd exam1.pcf
Target Device  : xc2s200
Target Package : fg256
Target Speed   : -6
Mapper Version : spartan2 -- $Revision: 1.26.6.4 $
Mapped Date    : Wed Nov 02 11:15:15 2005

Map report
Design Summary
--------------
Number of errors:   0
Number of warnings: 0
Logic Utilization:
Number of Slice Flip Flops: 144 out of 4,704 3%
Number of 4 input LUTs: 173 out of 4,704 3%
Logic Distribution:
Number of occupied Slices: 145 out of 2,352 6%
Number of Slices containing only related logic: 145 out of 145 100%
Number of Slices containing unrelated logic: 0 out of 145 0%
*See NOTES below for an explanation of the effects of unrelated logic
Total Number 4 input LUTs: 210 out of 4,704 4%
Number used as logic: 173
Number used as a route-thru: 5
Number used as 16x1 RAMs: 32
Number of bonded IOBs: 74 out of 176 42%
Number of GCLKs: 1 out of 4 25%
Number of GCLKIOBs: 1 out of 4 25%
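The utilization lines in the map report are used/available ratios. The "Total Number 4 input LUTs" line also counts route-thru and RAM uses on top of the logic LUTs. This Python sketch recomputes the report's percentages. It assumes the report truncates rather than rounds, which is consistent with every figure shown (173/4,704 is 3.7% but is reported as 3%).

```python
def util_pct(used: int, available: int) -> int:
    """Utilization percentage, truncated toward zero (assumed report convention)."""
    return used * 100 // available


lut_breakdown = {"logic": 173, "route-thru": 5, "16x1 RAM": 32}
total_luts = sum(lut_breakdown.values())   # should equal the report's total

report = {
    "Slice Flip Flops": (144, 4704),
    "4 input LUTs (logic)": (173, 4704),
    "Total 4 input LUTs": (total_luts, 4704),
    "Occupied Slices": (145, 2352),
    "Bonded IOBs": (74, 176),
    "GCLKs": (1, 4),
}
for resource, (used, avail) in report.items():
    print(f"{resource:22s} {used:5d} / {avail:5d}  {util_pct(used, avail):3d}%")
```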

Place & route report
Timing Score: 0
Asterisk (*) preceding a constraint indicates it was not met. This may be due to a setup or hold violation.

Constraint | Requested | Actual | Logic Levels
TS_clk = PERIOD TIMEGRP "clk" 11.765 ns HIGH 50% | 11.765ns | 11.622ns | 13
OFFSET = IN 11.765 ns BEFORE COMP "clk" | 11.765ns | 11.442ns | 2
OFFSET = OUT 11.765 ns AFTER COMP "clk" | 11.765ns | 11.491ns | 1

Post layout timing report
Timing summary:
Timing errors: 0 Score: 0
Constraints cover 42912 paths, 0 nets, and 1038 connections
Design statistics:
Minimum period: 11.622ns (Maximum frequency: 86.044MHz)
Minimum input required time before clock: 11.442ns
Minimum output required time after clock: 11.491ns
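Each line of the post-layout report is a met/not-met test: a constraint is met when the actual delay is no larger than the requested one. The maximum frequency is the reciprocal of the minimum period. This Python sketch replays the report's three constraints, mimicking its asterisk convention for failures.

```python
constraints = [
    # (constraint, requested ns, actual ns, logic levels) as printed in the report
    ('TS_clk = PERIOD TIMEGRP "clk"', 11.765, 11.622, 13),
    ('OFFSET = IN BEFORE COMP "clk"', 11.765, 11.442, 2),
    ('OFFSET = OUT AFTER COMP "clk"', 11.765, 11.491, 1),
]


def check(rows):
    """Return (name, slack_ns, met) per constraint; met means actual <= requested."""
    return [(name, round(req - act, 3), act <= req) for name, req, act, _ in rows]


for name, slack, met in check(constraints):
    print(f"{'' if met else '*'}{name:32s} slack {slack:+.3f} ns")

min_period_ns = constraints[0][2]
print(f"Maximum frequency: {1000.0 / min_period_ns:.3f} MHz")
```

Compared with the synthesis report, exam1|clk was estimated at 12.688 ns (negative slack) but met its 11.765 ns period after place and route. Synthesis timing is only a preliminary estimate.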

Post-place-and-route simulation
• After place-and-route is performed, you can do post-place-and-route simulation
– Now you have real timing information!
– You can also do static timing analysis: it shows the worst-case critical path in the circuit
Configuration
• Once a design is implemented, you must create a file that the FPGA can understand
– This file is called a bit stream: a BIT file (.bit extension)
• The BIT file can be downloaded directly to the FPGA, or it can be converted into a PROM file, which stores the programming information
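At its core, finding the worst-case critical path is a longest-path computation over the circuit's timing graph, visited in topological order. This Python sketch uses a hypothetical netlist and lumps each node's logic and routing delay into one number. Real static timing analysis tools also model setup/hold times, clock skew and rise/fall delays, none of which appear here.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Optional, Tuple


def critical_path(delays: Dict[str, float],
                  fanin: Dict[str, List[str]]) -> Tuple[float, List[str]]:
    """Longest-delay path through a DAG.
    delays: node -> delay in ns; fanin: node -> nodes that drive it."""
    arrival: Dict[str, float] = {}
    prev: Dict[str, Optional[str]] = {}
    for node in TopologicalSorter(fanin).static_order():   # drivers come first
        drivers = fanin.get(node, [])
        best = max(drivers, key=lambda d: arrival[d], default=None)
        start = arrival[best] if best is not None else 0.0
        arrival[node] = start + delays.get(node, 0.0)
        prev[node] = best
    end = max(arrival, key=lambda n: arrival[n])            # latest arrival time
    path = [end]
    while prev[path[-1]] is not None:                      # walk back along worst drivers
        path.append(prev[path[-1]])
    return arrival[end], path[::-1]


# Hypothetical netlist; node names and delays are illustrative only.
delays = {"in_a": 0.0, "in_b": 0.0, "lut1": 0.5, "lut2": 0.5, "lut3": 0.5, "out": 0.25}
fanin = {"lut1": ["in_a", "in_b"], "lut2": ["lut1"], "lut3": ["in_b"], "out": ["lut2", "lut3"]}
worst, path = critical_path(delays, fanin)
print(f"critical path {' -> '.join(path)}: {worst} ns")
```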

Configuration of SRAM-based FPGAs
The Design Warrior's Guide to FPGAs: Devices, Tools, and Flows. ISBN 0750676043. Copyright © 2004 Mentor Graphics Corp. (www.mentor.com)

System Gates vs. Real Gates
• One common metric used to measure the size of a device in the ASIC world is the equivalent gate (e-gate)
• Convention used: a 2-input NAND function represents one equivalent gate
• An equivalent gate consists of an arbitrary number of transistors
• Different vendors provide different functions in their cell libraries, and each implementation of each function requires a different number of transistors (which makes capacity/complexity difficult to compare)
• Solution: assign each function an equivalent gate value and sum all these values
• How can we establish a basis for comparison between FPGAs and ASICs?
• Can an ASIC of 500,000 equivalent gates that needs to be migrated into an FPGA fit into a particular FPGA?
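The "assign each function an equivalent gate value and sum" solution is a weighted count. This Python sketch shows it. Only the NAND2 = 1 convention comes from the slide. The other per-cell weights and the cell counts are illustrative placeholders, not any vendor's published figures.

```python
# Hypothetical e-gate weights per library cell, relative to a 2-input NAND = 1.
EGATE_WEIGHT = {"NAND2": 1.0, "INV": 0.5, "NOR2": 1.0, "XOR2": 2.5, "DFF": 6.0}


def equivalent_gates(cell_counts: dict[str, int]) -> float:
    """Sum each cell's instance count times its e-gate value."""
    return sum(EGATE_WEIGHT[cell] * n for cell, n in cell_counts.items())


design = {"NAND2": 1200, "INV": 400, "XOR2": 100, "DFF": 150}   # hypothetical netlist
print(equivalent_gates(design), "e-gates")
```

With different weights, the same netlist gets a different e-gate count. This is why e-gate figures from different vendors are hard to compare directly.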

FPGAs: System Gates
• System gates: a 4-input LUT can be used to represent anywhere between one and more than twenty 2-input primitive logic gates
• Rule of thumb? Divide the system gates value by three, so a three-million system-gate FPGA would equate to one million ASIC equivalent gates!
• However, to compare two different implementations on an FPGA (e.g., a floating-point adder vs. a fixed-point adder), designers should use the resources available in an FPGA:
– Number of 4-input LUTs used
– Number of embedded multipliers
– Number of embedded RAM blocks

State-of-the-Art FPGAs
• 65-90 nm process on 300 mm wafers
– Lower cost per function (LUT + register)
– Smaller and faster transistors: higher speed
• System speed up to 500 MHz
– Mainly through smart interconnects, dedicated circuits, flexible I/O and clock management
• More logic and better features:
– >100,000 LUTs & flip-flops
– >200 embedded RAMs, and the same number of 18 x 18 multipliers
– 1156 pins (balls) with >800 general-purpose I/O
– 50 I/O standards, incl. LVDS with internal termination
– 16 low-skew global clock lines
– Multiple clock management circuits
– On-chip microprocessor(s) and multi-Gbps transceivers
– Integrated transceivers running at 10 Gigabits/sec
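The divide-by-three rule of thumb gives a first answer to the earlier question of whether a 500,000 e-gate ASIC fits a given FPGA. Such a design needs an FPGA of roughly 1.5 million system gates. This Python sketch encodes only that rule. As the slide notes, a real fit check should instead count the LUTs, multipliers and RAM blocks the design needs.

```python
def asic_equivalent_gates(fpga_system_gates: int) -> float:
    """Rule of thumb: ASIC e-gates ~ FPGA system gates / 3."""
    return fpga_system_gates / 3


def system_gates_needed(asic_egates: int) -> int:
    """Inverse of the rule of thumb: system gates an FPGA should advertise."""
    return asic_egates * 3


def might_fit(asic_egates: int, fpga_system_gates: int) -> bool:
    """First-pass screen only, not a guarantee that the design will fit."""
    return asic_equivalent_gates(fpga_system_gates) >= asic_egates


print(system_gates_needed(500_000), might_fit(500_000, 1_000_000), might_fit(500_000, 2_000_000))
```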

Latest Devices: Capacity & Features
Xilinx Virtex-5
• 65nm process
• Up to 960 I/Os
• >200,000 logic cells
• Up to 552 18kb block RAMs (~10Mb RAM)
• 450 DSP slices (18x18 multiplier-accumulator)
• 20 digital clock managers (DCM)
• 24 high-speed serial transceivers (622Mb/s to 11.1Gb/s)
• Up to four PowerPC 405 cores
Altera Stratix-II
• 90nm process
• Up to 1170 I/Os
• 179,000 logic elements
• 9.6Mb embedded RAM
• 96 DSP blocks: 384 18x18 multipliers
• 12 PLLs
• Serial I/O up to 1Gb/s
• No hard processor cores

FPGAs Becoming More Attractive
(Chart, 1/91 to 1/99: capacity 21X bigger, speed 5X faster, price 50X less expensive. Source: Xilinx)

FPGA Shortcomings
• Circuit delay
– Delay increases due to programmable switches in the FPGA routing architecture
• Area
– Configuration cells and programmable resources incur a substantial area penalty
• Power
– Typically not suited for low-power applications
(Figure: ASIC vs. FPGA compared on performance, cost and time to market; "Need to improve FPGA")

Conclusion
• FPGAs are the main enabler of Reconfigurable Computing Systems (RCS)
• FPGAs fill the gap between instruction set processors (GPPs) and ASICs
– Advantages: flexible, programmable
– Disadvantages: power dissipation, performance w.r.t. ASIC
• The applicability of FPGAs relies on CAD tools provided by different vendors, such as Xilinx and Altera
• RCS can be realized with several technologies:
– FPGAs: fine/medium grain
– Coarse-Grain Reconfigurable Architectures (CGRAs)
