Training Course on

Advance FPGA based Digital System Design
by Fahad Al Ghazali

*Organized by Skill Development Council Islamabad
(Ministry of Professional & Technical Training Govt. of Pakistan)

Xilinx Xtreme DSP Architecture

DSP Implementation
• Digital signal processing can be implemented in both hardware and software
• The software-based approach implements it on a general-purpose processor
• The processor is programmed for the tasks of the particular application
FPGA based Digital Design using Verilog HDL (fpgacourse@yahoo.com)

DSP Implementation (2)
• The second approach is to use a special-purpose, customized processor whose architecture supports special signal processing tasks in the form of libraries, e.g. Texas Instruments, Tiger Shark, Da Vinci etc.
• Hard-wired, high performance

DSP Implementation (3)
• Application Specific Integrated Circuits can be fabricated for a unique application
• Feasible only if a large number of units are required

DSP Implementation (4)
DSP on FPGAs: Benefits
• Reduced chip count in case the design already requires programmable logic
• Useful in case of a greater number of channels
• Flexibility
• Debugging

DSP Design Challenges
• High throughput
• Multiple concurrent operations
• Multiple ALUs
• Requirement of memories, MACs

DSP Typical Operations
• DSPs operate on fixed-word-length data that arrive at regular intervals of time
• Multiplication and addition, commonly known as the MAC operation
• MAC functional units must be implemented efficiently and must give high performance
• Floating point / fixed point arithmetic
• Memory read/write
• Number of channels

Timing
• The operations can be distributed spatially in different blocks or in one block; this depends upon how many clock cycles we have before the next sample
• In case the whole of the binary word is being processed at the same time, then hardware resources ensure in-time delivery of the results
• The operations can be distributed over the latency factor, i.e. the time between the first input and the first valid output

High Speed processing requirement in DSP algorithms
*Source: Jan Rabaey, BWRC

DSP Architecture Support in Xilinx FPGAs
• Today's FPGA architectures address DSP implementation issues and offer specialized architectures
Reasons:
• The market is moving towards reduced chip count solutions to decrease the size of devices
• To gain market share among the devices used in the booming communication industry
• To exploit the parallel architecture offered by FPGAs

DSP options in FPGAs
The options are:
• Introduction of the DSP48 slice in the architecture
• Built-in cores of DSP functions, so that the user does not have to start the design from scratch
• On-chip soft/hard processor IPs with floating point unit and support for C

Xtreme DSP : Design Considerations
• The DSP48 slice is a new element in the Xilinx development model referred to as Application Specific Modular Blocks (ASMBL™) architecture
• Delivers off-the-shelf programmable devices with the best mix of logic, memory, I/O, processors, clock management, and digital signal processing
• Each XtremeDSP tile contains two DSP48 slices to form the basis of a versatile coarse-grain DSP architecture
• Supports independent functions, including multiplier, multiplier-accumulator (MACC), multiplier followed by an adder, three-input adder, barrel shifter

DSP48 Architecture
• The DSP48 slice is an 18 x 18 bit two's complement multiplier followed by a 48-bit sign-extended adder/subtracter/accumulator, a function that is widely used in digital signal processing (DSP)
• Its predecessors, which came in Spartan-3/3E, went by the name MULT18x18
• The inherently pipelined architecture enhances throughput
• The 48-bit internal bus offers high aggregation

Xilinx XtremeDSP
• Starting with the Virtex 4 family, Xilinx introduced the DSP48 block for high-speed DSP on FPGAs
• Essentially a multiply-accumulate core with many other features
• Now Spartan-3A and Virtex 5 also have DSP blocks

Xtreme DSP Interconnect in Virtex
• DSP48 and Block RAM have dedicated interconnect to prevent interconnect bandwidth issues

Features
1. The 18-bit A bus and B bus are concatenated, with the A bus being the most significant.
2. The X, Y, and Z multiplexers are 48-bit designs. Selecting any of the 36-bit inputs provides a 48-bit sign-extended output.
3. The multiplier outputs two 36-bit partial products, sign extended to 48 bits. The partial products feed the X and Y multiplexers. When OPMODE selects the multiplier, both X and Y multiplexers are utilized and the adder/subtracter combines the partial products into a valid multiplier result.

Features
4. The multiply-accumulate path for P is through the Z multiplexer. The P feedback through the X multiplexer enables accumulation of P cascade when the multiplier is not used.
5. The Right Wire Shift by 17 bits path truncates the lower 17 bits and sign extends the upper 17 bits.
6. The gray-colored multiplexers are programmed at configuration time.
7. The shared C register supports multiply-add, wide addition, or rounding.
8. Enabling SUBTRACT implements Z – (X+Y+CIN) at the output of the adder/subtracter.
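
The 17-bit wire shift in feature 5 can be pictured as an arithmetic right shift of the 48-bit word. The following is a minimal behavioral sketch in Python, not HDL; the helper names `to_signed` and `wire_shift_17` are made up for illustration and do not come from any Xilinx library.

```python
WIDTH = 48   # width of the P bus described in the slides
SHIFT = 17   # shift distance named in feature 5

def to_signed(value, bits=WIDTH):
    """Interpret the low `bits` bits of value as a two's complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value

def wire_shift_17(p):
    """Drop the 17 least significant bits and sign-extend what remains."""
    # Python's >> on integers is an arithmetic shift, so the sign is preserved.
    return to_signed(p) >> SHIFT

print(wire_shift_17(5 << 17))     # 5
print(wire_shift_17(-(3 << 17)))  # -3
```

A shift of this kind is what lets partial results from one slice be re-aligned before being added in the next one, for example when a multiplier wider than 18 x 18 is assembled from several slices.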

Simplified DSP Slice Model

A input Logic

B input Logic

C input Logic

P output Logic

DSP48 Slice: Virtex 4

DSP48 Tile

DSP48E Slice: Virtex 5

DSP48 Functionality
• Full speed operation is 500 MHz when using the pipeline registers
• Equation 1-1 summarizes the combination of X, Y, Z, and CIN by the adder/subtracter
• The CIN, X multiplexer output, and Y multiplexer output are always added together
• This combined result can be selectively added to or subtracted from the Z multiplexer output
Adder Out = (Z ± (X + Y + CIN))
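
To make the adder equation concrete, here is a small behavioral sketch in Python (not HDL). It assumes a 48-bit two's complement result that wraps on overflow; the function names `wrap48` and `adder_out` are illustrative only.

```python
WIDTH = 48  # adder/subtracter width given in the slides

def wrap48(value):
    """Wrap an integer into the 48-bit two's complement range."""
    value &= (1 << WIDTH) - 1
    return value - (1 << WIDTH) if value >> (WIDTH - 1) else value

def adder_out(z, x, y, cin=0, subtract=False):
    """Adder Out = Z +/- (X + Y + CIN), wrapped to 48 bits."""
    inner = x + y + cin            # these three are always added together
    return wrap48(z - inner if subtract else z + inner)

print(adder_out(10, 3, 4, cin=1))                 # 18
print(adder_out(10, 3, 4, cin=1, subtract=True))  # 2
```

Note that the subtraction applies to the whole bracket, so CIN is subtracted along with X and Y.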

DSP48 Functionality
• A and B are multiplied and the result is added to or subtracted from the C register
• Selecting the multiplier function consumes both X and Y multiplexer outputs to feed the adder
• The two 36-bit partial products from the multiplier are sign extended to 48 bits before being sent to the adder/subtracter
Adder Out = C ± (A × B + CIN)
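
The multiply-add form can be sketched the same way. This Python model is an illustration under stated assumptions: A and B are 18-bit signed operands, C is a 48-bit signed operand, and the result wraps to 48 bits. The two partial products are not modeled separately, only their sum A × B. The names `check_signed` and `mult_add` are hypothetical.

```python
def check_signed(value, bits, name):
    """Raise ValueError if value does not fit in `bits` two's complement bits."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    if not lo <= value <= hi:
        raise ValueError(f"{name}={value} does not fit in {bits} signed bits")

def wrap48(value):
    """Wrap an integer into the 48-bit two's complement range."""
    value &= (1 << 48) - 1
    return value - (1 << 48) if value >> 47 else value

def mult_add(a, b, c, cin=0, subtract=False):
    """Adder Out = C +/- (A * B + CIN) for 18-bit A, B and 48-bit C."""
    check_signed(a, 18, "a")
    check_signed(b, 18, "b")
    check_signed(c, 48, "c")
    inner = a * b + cin            # an 18 x 18 signed product fits in 36 bits
    return wrap48(c - inner if subtract else c + inner)

print(mult_add(3, -4, 100))                 # 88
print(mult_add(3, -4, 100, subtract=True))  # 112
```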

Simplified Form of DSP48

Mathematical Functions
DSP48 can perform mathematical functions such as:
• Add/Subtract
• Accumulate
• Multiply
• Multiply-Accumulate
• Multiplexer
• Barrel Shifter
• Counter
• Divide (multi-cycle)
• Square Root (multi-cycle)
Can also create filters such as:
• Serial FIR Filter (Xilinx calls these MACC filters)
• Parallel FIR Filter
• Semi-Parallel FIR Filter
• Multi-rate FIR Filters

MACC Filter

• Xilinx implementation of a serial FIR filter, called a MACC (multiply-accumulate) filter
• This example has 96 coefficients
• Max input sample rate = clock speed / number of taps
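
The sample-rate rule follows from the serial structure: a single multiply-accumulate unit handles one tap per clock, so each output costs as many clocks as there are taps. The Python sketch below models that behavior (it is not HDL and ignores pipeline latency and word widths); `max_sample_rate` and `macc_fir` are illustrative names.

```python
def max_sample_rate(clock_hz, num_taps):
    """Maximum input sample rate of a serial MACC FIR filter."""
    return clock_hz / num_taps

def macc_fir(samples, coeffs):
    """Serial FIR: one multiply-accumulate per tap for every input sample."""
    history = [0] * len(coeffs)          # delay line, newest sample first
    outputs = []
    for x in samples:
        history = [x] + history[:-1]     # shift the new sample in
        acc = 0
        for h, c in zip(history, coeffs):
            acc += h * c                 # one MACC operation (one clock)
        outputs.append(acc)
    return outputs

print(macc_fir([1, 0, 0, 0], [1, 2, 3]))  # [1, 2, 3, 0]
print(max_sample_rate(480e6, 96))         # 5000000.0
```

With the 96 coefficients of the example and a 500 MHz clock (the full-speed figure quoted for the DSP48), the same formula gives roughly 5.2 million samples per second.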

DSP48E1 in Virtex-6
• Enhancements to the DSP48E1 slice provide improved flexibility and utilization, improved efficiency of applications, reduced overall power consumption, and increased maximum frequency
• The high performance allows designers to implement multiple slower operations in a single DSP48E1 slice using time-multiplexing methods

Features of DSP48E1
The DSP48E1 slice supports many independent functions. These functions include:
• Multiply
• Multiply accumulate (MACC)
• Multiply add
• Three-input add
• Barrel shift
• Wide-bus multiplexing
• Magnitude comparator
• Bit-wise logic functions, pattern detect, and wide counter

Virtex-6 DSP48E1 Slice

Enhanced Features
• The Virtex-6 FPGA DSP48E1 slice includes all Virtex-5 FPGA DSP48E features plus a variety of enhancements

Enhanced Features (Cont'd)
The enhanced features in the Virtex-6 FPGA DSP48E1 slice are:
• 25-bit pre-adder with D register to enhance the capabilities of the A path
• INMODE control supports balanced pipelining when dynamically switching between multiply (A*B) and add operations (A:B)
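
One common use of a pre-adder is a symmetric FIR filter, where two samples that share a coefficient are added first so that a single multiplier serves both taps. The Python sketch below is a behavioral illustration only. It assumes the pre-adder computes D ± A and wraps to 25 bits, and that the result is then multiplied by B; the names `wrap` and `pre_add_multiply` are hypothetical.

```python
def wrap(value, bits):
    """Wrap an integer into a `bits`-wide two's complement range."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value

def pre_add_multiply(a, d, b, subtract=False):
    """(D +/- A) * B with a 25-bit pre-adder result (assumed to wrap)."""
    ad = wrap(d - a if subtract else d + a, 25)
    return ad * b

# Two samples (7 and 9) sharing the coefficient 3 need one multiply, not two.
print(pre_add_multiply(7, 9, 3))  # 48, the same as 7*3 + 9*3
```

Because the assumed pre-adder result is only 25 bits wide, keeping both inputs within 24 signed bits avoids the wrap-around shown by the model.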

DSP48E1 Tile and Interconnect
• Two DSP48E1 slices and dedicated interconnect form a DSP48E1 tile
• The DSP48E1 tiles stack vertically in a DSP48E1 column
• The height of a DSP48E1 tile is the same as five configurable logic blocks (CLBs) and also matches the height of one block RAM
• The block RAM in Virtex-6 devices can be split into two 18K block RAMs
• Each DSP48E1 slice aligns horizontally with an 18K block RAM
• Virtex-6 family members have 1, 2, 6, or 10 DSP48E1 columns

No. of DSP48E1 Slices offered in Virtex-6 Family

DSP48E1 Slice Primitive

Arithmetics ….
• Floating point / Fixed point
• Double / single precision
• Square Root, Multiply, Divide (float/fixed)
• Match with MATLAB results in the next session ….

Fixed Point Representation
Qn.m Format
• n bits are in the integer part and m bits are in the mantissa (fractional) part
10/15 ≈ 0000000 . 1 0 1 0 1 0 1
Weights of mantissa part: -1 -2 -3 -4 -5 -6 -7 (powers of 2)
0.5 + 0.125 + 0.03125 + 0.0078125 = 0.6640625, the 7-bit approximation of 10/15 = 0.6667
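
The conversion can be checked with a few lines of Python. This is a minimal sketch that assumes truncation (rounding toward minus infinity) and 7 fractional bits, as in the example above; `to_fixed` and `from_fixed` are illustrative helper names.

```python
import math

def to_fixed(value, frac_bits):
    """Quantize a real value to an integer holding `frac_bits` fractional bits."""
    return math.floor(value * (1 << frac_bits))

def from_fixed(raw, frac_bits):
    """Convert the stored integer back to the real value it represents."""
    return raw / (1 << frac_bits)

raw = to_fixed(10 / 15, 7)
print(format(raw, "07b"))   # 1010101
print(from_fixed(raw, 7))   # 0.6640625
```

The difference between 10/15 and 0.6640625 is the quantization error; each extra fractional bit halves its upper bound.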

Fixed Point Divider via Coregen

Optimize Your Design for Xilinx Architecture
CORE Generator System

What are Cores?
• A core is a ready-made function that you can instantiate into your design as a “black box”
• Cores can range in complexity
– Simple arithmetic operators, such as adders, accumulators, and multipliers
– System-level building blocks, including filters, transforms, and memories
– Specialized functions, such as bus interfaces, controllers, and microprocessors
• Some cores can be customized

Benefits of Using Cores
• Save design time
– Cores are created by expert designers who have in-depth knowledge of Xilinx FPGA architecture
– Guaranteed functionality saves time during simulation
• Increase design performance
– Cores that contain mapping and placement information have predictable performance that is constant over device size and utilization
– The data sheet for each core provides performance expectations
• Use timing constraints to achieve maximum performance

What is the CORE Generator System?
• Graphical User Interface (GUI) that allows central access to the cores themselves, plus:
– Data sheets
– Customizable parameters (available for some cores)
• Interfaces with design entry tools
– Creates graphical symbols for schematic-based designs
– Creates instantiation templates for HDL-based designs
• Web access from the Help menu
– The IP Center contains new cores to download and install
• You always have access to the latest cores
– Direct access to http://support.xilinx.com

Invoking the CORE Generator System
• Select Project → New Source
• Select IP (CoreGen & Architecture Wizard) and enter a filename
• Click Next, then select the type of core

Core Generator GUI

Xilinx CORE Generator System GUI

Core Customize Window

CORE Data Sheets

Schematic Design Flow
• Generate a core
– Use Edit → Project Options to select a schematic symbol instead of HDL templates
– Creates an EDIF file and schematic symbol
• Instantiate the symbol onto your schematic
– Treated as a “black box” - no underlying schematic
• Proceed with the normal schematic flow

HDL Design Flow: Compile Simulation Library
• Before your first behavioral simulation, you must run compxlib.exe to compile the XilinxCoreLib simulation library
– Located in $XILINX\bin\<platform>
– Supports ModelSim, Cadence NC-Verilog, VCS, Speedwave, and Scirocco
• If you download new or updated cores, additional simulation models will be automatically extracted during installation

HDL Design Flow: Core Generation and Integration
• Generate or purchase a core
– Netlist file (EDN)
– Instantiation template files (VHO or VEO)
– Behavioral simulation wrapper files (VHD or V)
• Instantiate the core into your HDL source
– Cut and paste from the templates provided in the VEO or VHO file
• Design is ready for synthesis and implementation
• Use the wrapper files for behavioral simulation
– ISE automatically uses wrapper files when cores are present in the design
– VHDL: Analyze the wrapper file for each core before analyzing the file that instantiates the core

DSP48 macro in Xilinx ISE
• DSP48 macro provides an easy-to-use interface that abstracts the XtremeDSP™ slice and simplifies its dynamic operation by enabling the specification of multiple operations via a set of user-defined arithmetic expressions
• Support for up to 64 instructions
• Configurable latency
• Choose between XtremeDSP slice or fabric implementation
• Support of signed, two's complement input data

DSP48 macro in Xilinx ISE (Cont'd)
• The user specifies 1 to 64 instructions in the core GUI that are translated into the various control signals for the XtremeDSP slice of the target device
• The instructions are stored in a ROM from which the user selects the appropriate instruction using the SEL port

Basic Core I/Os

Name      Direction  Optional  Description
CLK       Input      No        Clock – active rising edge
SCLR      Input      Yes       Synchronous Clear – synchronous reset (active High). Asserting SCLR synchronously with CLK resets all registers
A         Input      Yes       A port – input of operand to XtremeDSP
ACIN      Input      Yes       Cascaded A port. Driven by ACOUT
B         Input      Yes       B port – input of operand to XtremeDSP
CONCAT    Input      Yes       Concatenation of A and B ports
C         Input      Yes       C port – input to XtremeDSP slice add/sub
CARRYIN   Input      Yes       Carry in value from fabric
SEL       Input      Yes       SEL port – selects the instruction; width as per no. of instructions
P         Output     No        P port – output from XtremeDSP slice add/sub, provides the selected instruction's result. Max: 48 bits
CARRYOUT  Output     No        CARRYOUT of sub/add operation

Core Schematic Symbol

Configuration of Core
• A graphical user interface appears when DSP48 macro is selected to be generated via CoreGen
• First, the component name is provided by the user
• A number of instructions copied from the available instructions can be pasted onto the user-defined instructions
• There are 64 instructions

Pipeline Options
There are 3 options:
• Automatic : Fully automated (as per ISE)
• Tier1 : Configurable up to one tier (one axis)
• Expert : Fully configurable
Checkboxes appear to select whether a pipeline is to be inferred or not at a certain point of the hardware

Implementation

DSP48 through CoreGen

DSP48 Consumption

Instantiation Template

dsp481 YourInstanceName (
    .clk(clk),
    .a(a), // Bus [17 : 0]
    .b(b), // Bus [17 : 0]
    .c(c), // Bus [47 : 0]
    .carryin(carryin),
    .sel(sel), // Bus [2 : 0]
    .p(p)); // Bus [47 : 0]

High frequency synthesis

Timing Summary:
---------------
Speed Grade: -12
Minimum period: 1.244ns (Maximum Frequency: 804.001MHz)
Minimum input arrival time before clock: 2.514ns
Maximum output required time after clock: 4.152ns
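
The period and frequency in the timing summary are reciprocals of each other. The short Python check below shows the arithmetic; the function names are illustrative, and the explanation of the small mismatch (the period being printed rounded to three decimals) is an assumption rather than something stated in the report.

```python
def fmax_mhz(period_ns):
    """Maximum clock frequency in MHz for a minimum period in ns."""
    return 1000.0 / period_ns

def period_ns(freq_mhz):
    """Clock period in ns for a frequency in MHz."""
    return 1000.0 / freq_mhz

print(round(fmax_mhz(1.244), 2))    # 803.86
print(round(period_ns(804.001), 3)) # 1.244
```

Going from the reported 804.001 MHz back to a period gives about 1.2438 ns, which prints as 1.244 ns at three decimals, so the two reported numbers are consistent with each other.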

Device Utilization Summary of the design with 3 instructions

Device utilization summary:
---------------------------
Selected Device : 4vfx12ff668-12
Number of Slices:            61 out of 5472    1%
Number of Slice Flip Flops: 112 out of 10944   1%
Number of 4 input LUTs:      53 out of 10944   0%
Number of IOs:               54
Number of bonded IOBs:       53 out of 320    16%
Number of GCLKs:              1 out of 32      3%
Number of DSP48s:             1 out of 32      3%

Thanks …
