Implementing a Source Synchronous Interface between Altera FPGAs

Version 2.0

Page 1 of 47

August 2007

ABSTRACT

Previous Tech Notes [1] examined the methods of timing source synchronous inputs and outputs independently, when one device was an Altera FPGA, using the Quartus II Classic static timing analysis tool. This Tech Note examines in detail the special case in which both the transmitting and receiving devices are Altera FPGAs, using the new Quartus II TimeQuest static timing analysis tool. In addition to demonstrating how to do this for specific interfaces between two Altera FPGAs, the Appendices describe how to adapt this method to any source synchronous interface in which only one side, input or output, is an Altera device.

In the first example, a 550 MHz 144-bit SDR source synchronous interface is examined as a means of passing high-throughput data between two devices. In the next example, a 400 MHz 144-bit DDR source synchronous interface is examined. The method used in these examples maximizes timing margin on the data capture and demonstrates how to time the interface for both SDR and DDR connections. No practical design could run at these internal clock speeds in an FPGA today, but these examples demonstrate the speed and data width that an SDR or DDR source synchronous I/O interface is capable of when using Altera devices.

There are several different methods used to transfer high-throughput data between devices. Table 1 shows three common methods supported in Altera FPGAs.
| Method | Clocking | Data | MegaFunction | Devices | Data Rate | Max Width | Max Throughput |
|---|---|---|---|---|---|---|---|
| High-Speed SERDES | CDR | Serial | ALTGXB, ALT2GXB | Stratix GX, Stratix II GX | 400 Mbps to 6.375 Gbps | 20 | 128 Gbps |
| High-Speed LVDS | CDS (DPA) | Serial | ALTLVDS | Stratix GX, Stratix II GX, Stratix [2], Stratix II, Cyclone [3], Cyclone II [4] | 150 Mbps to 1.25 Gbps | 132 | 165 Gbps |
| Source Synchronous | Source Synchronous feedback mode of PLL | Parallel (SDR & DDR) | ALTDDIO | All FPGAs | 2 Mbps to 550 Mbps (SDR), 4 Mbps to 800 Mbps (DDR) | 144 [5] | 115 Gbps |

Table 1: Comparison of High-Throughput Data Transfer Methods

[1] "Centering the Clock in the Data Valid Window for Source Synchronous Inputs" and "Timing Analysis of Source Synchronous Outputs in the Quartus II Software".
[2] No DPA support in these devices.
[3] Implemented in Logic Cells. Differential LVDS supported with LVDS Sneak Path.
[4] Implemented in Logic Cells.
[5] This is a practical limit, not a hard limit.

High-Speed SERDES runs at very high serial data rates and has embedded clock-data recovery (CDR), making this a very reliable method that uses very few I/Os. This method requires complex silicon to implement, which is only present in specialized GX devices.

A less complex method, using clock-data synchronization (CDS) with optional dynamic phase alignment (DPA), is High-Speed LVDS. Because of the data rates, the differential LVDS I/O standard is used. An embedded SERDES does the serial-to-parallel and parallel-to-serial conversion using dedicated silicon that not all FPGAs have. For those families without the dedicated silicon, this functionality can be implemented in the core at somewhat slower bit rates.

The Source Synchronous method requires very little specialized silicon, only double data rate (DDR) registers in the I/O cell. It is also useful over long board distances since the clock and data are sent together and are usually trace-length matched to tight tolerances. Relatively high data throughput can be obtained. However, because this is a parallel interface, more I/Os are required. The first two methods determine the maximum timing margin dynamically by empirical means, and therefore cannot be, and do not need to be, timed. Source synchronous interfaces, by contrast, require accurate timing analysis to ensure proper operation, and there is no dynamic monitoring of timing margin. Despite this, reasonable margins can be obtained at the receiver, even at high rates, which makes dynamic monitoring unnecessary.

As you can see from Table 1, since our test design is implemented in a Stratix II device, a more practical method for transferring data would be High-Speed LVDS with DPA. However, we are using the Source Synchronous method to demonstrate this technique.
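The Max Throughput column of Table 1 can be cross-checked as the maximum per-line data rate multiplied by the maximum width. The following is a minimal sketch of that arithmetic; the function name and the row layout are illustrative only and are not taken from any Altera tool.

```python
# Rows of Table 1: (method, max data rate per line in Gbps, max width in lines)
TABLE_1 = [
    ("High-Speed SERDES", 6.375, 20),
    ("High-Speed LVDS", 1.25, 132),
    ("Source Synchronous (DDR)", 0.8, 144),
]


def max_throughput_gbps(rate_gbps, width):
    """Aggregate throughput = per-line data rate x number of data lines."""
    return rate_gbps * width


for method, rate, width in TABLE_1:
    print(f"{method}: {max_throughput_gbps(rate, width):.1f} Gbps")
```

The products come to 127.5, 165.0, and 115.2 Gbps, which Table 1 lists rounded to whole numbers.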

QUESTION: How can I create a simple 144-bit parallel single data rate (SDR) source-synchronous interface between two Altera FPGAs? ANSWER: Observe the simple register-to-register test circuit below in Figure 1. The same test circuit will be used for both FPGAs (except for the PLL output phase shift), one demonstrating the transmit function, and the other demonstrating the receive function. In this circuit, there are 144 data lines that come into registers in input I/O cells (Fast Input Registers), which are pipelined to internal registers, and then pipelined to registers in output I/O cells (DDIO_OUT). The clock comes into a PLL, and then the output of the PLL drives a clock to the data registers, and additionally to a register in the I/O cell that drives the source synchronous clock out (DDIO_OUT).

Figure 1: Simple SDR Test Circuit Schematic

Here are the step-by-step instructions on how to set up the source synchronous interface, first for the output transmit side, then for the input receive side.

SDR TEST CIRCUIT GIVENS:
- Stratix II EP2S130F1508 device used to get maximum LVDS I/Os.
- The C3 (fastest) speed grade device used, giving lowest max-to-min timing spread.
- A 60/40 duty cycle was used on the clock to better demonstrate that there is no timing relationship to the falling edge of the clock in the analysis for SDR.
- No attempt was made to place any of the I/Os. Keeping outputs close together can help minimize skew. An arbitrary output data skew goal of +/- 125 ps was used here.
- LVDS was used for the I/O standard on the outputs for the source synchronous output case, and on the inputs for the source synchronous input case.
- Nominal output pin loads of 10 pF were used.
- Board data trace lengths are assumed to be matched to the clock trace within +/- inch.

SDR Output (Transmit) FPGA


STEP 1: Drive the output data using the DDIO_OUT (or DDIO_BIDIR) MegaFunction.

Use the ALTDDIO_OUT MegaWizard to create a single MegaFunction instantiation for the entire width of the data output bus as shown in Figure 2. Connect the data bus from the FPGA core to both the datain_h and datain_l inputs to the DDIO_OUT instantiation as shown in Figure 1. Connect the output dataout to the output ports of the FPGA. [6]

Figure 2: ALTDDIO_OUT MegaWizard

At the high data rates used in this example, the LVDS differential I/O standard is used for the data and the source synchronous clock for signal integrity reasons; however, this is not required for timing analysis. These I/O standard settings can be made in the Assignment Editor. In addition, an output load should be specified for these outputs as shown in Figure 3. Keeping the data and clock output pads close together will improve the skew across the output data bus; however, this is not required even at the data rates used in this example. Skews as low as 50 ps can be obtained by assigning the outputs to a single bank in Stratix II devices.

[6] Even though this is SDR, we use DDIO_OUT registers to get the lowest possible output skew. If the lowest possible skew is not required, then standard output registers can be used instead. This method is not recommended for families that don't support DDIO_OUT registers in hard silicon (e.g. Cyclone, Cyclone II, etc.). For these devices, use a single register for the data out. Skew numbers will be larger, however.

Figure 3: Output Assignments

STEP 2: Drive the output clock using the DDIO_OUT MegaFunction.

Use the ALTDDIO_OUT MegaWizard to create a MegaFunction instantiation with a bus width of 1. Connect the datain_h input to GND, and the datain_l to VCC. Connecting these inputs to the opposite polarity has the effect of creating an output clock 180 degrees out of phase with the data, center-aligning the rising edge of the clock with respect to the data valid window. [7] Connect the output of the MegaFunction instantiation to the clock out port of the FPGA.

STEP 3: Connect a clock to the clock input of the data out and clock out registers.

Connect the input clock pin of the DDIO_OUT registers for both the data registers and the clock registers to the same clock source. In this example, a single output from a PLL was used; however, a clock port from the FPGA could also be used, or any other clock source in the FPGA. It is important to keep the clock skew low, so make sure that this clock goes through a global clock buffer. Separate clock outputs from the PLL can be used, but this is not necessary: one advantage of this approach is that no phase shift has to be calculated for the clock out. Using DDIO_OUT registers for both the clock and the data, and swapping the inputs on the clock DDIO_OUT input data pins, creates a center-aligned clock-data relationship that is very accurate.

STEP 4: Set the output constraints for the design.

TimeQuest uses SDC commands for timing constraints, so an SDC file, source_sync_out.sdc, will be used to set these output constraints in TimeQuest. Figure 4 shows an example of how to set up the SDC file. The first section sets up the input clock, clk_in. In the next section, the PLL output is defined, from which the output clock, clk_out, is created. Note the use of the -invert option, since the inputs to the DDIO_OUT were wired up to invert the clock.

The next section is used to specify the output clock jitter on the clk_out port. [8] The value for the clock uncertainty set here was determined by executing the derive_clock_uncertainty SDC command. [9] The last section constrains the outputs to a very tight skew range to guide the fitter. Since this is center-aligned source synchronous data being output, our clock offset is set to half the clock period. The output skew is set to 125 ps, which sets up the early and late margin settings.

Figure 4: SDR Source Synchronous Output SDC File [10]

[7] See Appendix A for using edge-aligned output or for using a clock directly from a PLL output instead of using a DDIO_OUT to generate the clock.
[8] Note that there is only setup uncertainty between the clk0 and clk_out because this is an intra-clock domain transfer. Only inter-clock domain transfers require both setup and hold uncertainty.
[9] The derive_clock_uncertainty command is required for HardCopy II, but can also be used with Stratix II and future families (e.g. Stratix III, Cyclone III, etc.). This command will not work for any other device families. It is usually not required for Stratix II due to the guard band added to the I/O delays, which covers the jitter in all but the most uncommon cases.
[10] Refer to the "Clock Setup and Hold Slack Analysis Explained" Tech Note for OMD/OmD equations.

These are then used to determine the output max delay (OMD) and output min delay (OmD) settings to constrain the outputs. [11] Figure 5 below shows the output timing and the relationships of the settings.
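As a worked illustration of how the clock offset and output skew lead to OMD and OmD, the sketch below evaluates the numbers for the 550 MHz SDR case. The relations used here are an assumption: they follow the common skew-method form and match the signs shown in Figure 5 (OMD positive, OmD negative), but they are not copied from the SDC file in Figure 4. The function and variable names are hypothetical.

```python
def output_delays(period_ns, clock_offset_ns, output_skew_ns):
    """Return (early_margin, late_margin, OMD, OmD) in ns.

    Assumed skew-method relations (not copied from Figure 4):
      early_margin = clock_offset - output_skew
      late_margin  = period - clock_offset - output_skew
      OMD          = early_margin      (positive in Figure 5)
      OmD          = -late_margin      (negative in Figure 5)
    """
    early = clock_offset_ns - output_skew_ns
    late = period_ns - clock_offset_ns - output_skew_ns
    return early, late, early, -late


period = 1000.0 / 550.0   # 550 MHz clock, about 1.818 ns
offset = period / 2.0     # center-aligned: half the clock period
early, late, omd, omd_min = output_delays(period, offset, 0.125)

print(f"period = {period:.3f} ns, early = {early:.3f} ns, late = {late:.3f} ns")
print(f"OMD = {omd:.3f} ns, OmD = {omd_min:.3f} ns")
```

Under these assumptions both margins come to about 0.784 ns, so OMD is about +0.784 ns and OmD about -0.784 ns.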
[Figure 5 diagram: clk_out and data_out waveforms over one Clock Period, annotated with Clock Offset, Early Margin, Late Margin, Output Max Delay (+), Output Min Delay (-), and Data Valid Windows bounded by Clock Period / 2 - Output Skew and + Output Skew]

Figure 5: SDR Output Timing Diagram

STEP 5: Set up Quartus II and compile the project.

Open Quartus II Settings and select TimeQuest for timing analysis processing as shown in Figure 6. Next, add the SDC file to the project. Save these settings and compile the project.

Figure 6: Timing Analysis Settings

[11] Throughout this Tech Note, the FPGA-Centric or Skew Method is used to calculate the input and output timing constraints as opposed to the System-Centric method. This is because most source synchronous inputs and outputs are specified with a clock offset and a skew. For a description of the System-Centric method and an explanation of the differences in methodology, refer to Altera AN 433.

STEP 6: Analyze the timing using TimeQuest.

Open the TimeQuest GUI. A Tcl script was created to run the timing analysis as shown in Figure 7.

Figure 7: Tcl Script for SDR Output Timing Analysis

The first section of this script initializes the timing netlist in preparation for reporting. The script is designed to be run multiple times, switching between slow and fast timing models if desired. The SDC file is read in, and then the constraints are applied with the update_timing_netlist command. The Early Margin, Late Margin, OMD, and OmD are reported to the console window. Next, the setup and hold slack is determined for the outputs and reported to the console window. Lastly, the output skew and balance of slacks are determined and reported to the console window. To run this script, type source source_sync_out.tcl in the console window. The results are shown below in Figure 8.

Figure 8: SDR Output Timing Results - Slow Timing Models

Note that these results use the default Slow timing models. There are 95 ps of setup slack and 4 ps of hold slack with our OMD and OmD constraints, and 151 ps of data output skew. To modify the script to run the Fast timing models, change the set model slow command to set model fast and re-run the script. The results are shown below in Figure 9. [12]

Figure 9: SDR Output Timing Results - Fast Timing Models

There are 95 ps of setup slack and 23 ps of hold slack with our OMD and OmD constraints, and 132 ps of output data skew using the Fast timing models. Since the skew is greater using the Slow timing models, 151 ps will be the worst-case skew. The results for the output timing analysis show that the 144 data outputs have met the 125 ps skew requirements from the rising edge of the source synchronous clk_out clock, and the overall data skew at the outputs is no greater than 151 ps.
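The reported slacks and skews can be sanity-checked against each other. With a +/- 125 ps skew goal the constraint window is 250 ps wide, and in both timing models the setup and hold slack add up to what is left of that window after the reported skew. The sketch below shows the arithmetic; treating this as a general relation is an observation about these reported numbers, not a formula stated in this Tech Note, and the names are illustrative.

```python
SKEW_GOAL_PS = 125  # +/- output skew goal from the givens

# (setup slack, hold slack, reported output skew) in ps, from Figures 8 and 9
RESULTS = {"slow": (95, 4, 151), "fast": (95, 23, 132)}


def remaining_window_ps(skew_goal_ps, measured_skew_ps):
    """Width of the +/- skew_goal window left over after the measured skew."""
    return 2 * skew_goal_ps - measured_skew_ps


for model, (setup, hold, skew) in RESULTS.items():
    left = remaining_window_ps(SKEW_GOAL_PS, skew)
    print(f"{model}: setup + hold = {setup + hold} ps, window left = {left} ps")

# The worst-case skew is the larger of the two reported values.
worst_case_skew_ps = max(skew for _, _, skew in RESULTS.values())
print(f"worst-case skew = {worst_case_skew_ps} ps")
```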

SDR Input (Receive) FPGA


STEP 1: Receive the data in registers in the I/O cells.

As mentioned, the same design used for the Transmit FPGA is used for the Receive FPGA for simplicity's sake; however, the inputs are constrained instead of the outputs. While not a requirement, tighter timing can be obtained by assigning the receiving registers to the I/O cells. This is accomplished with the Fast Input Register assignment. Also, the differential LVDS I/O assignments should be made for the data and input clock at these speeds for signal integrity reasons, as mentioned before.

[12] Both Slow and Fast timing model analysis can be run in the same Tcl script by making use of the set_operating_conditions command. See Quartus II Help for the syntax on how to use this command. This method wasn't used here, to keep the script simpler for this document.

Figure 10: Input Assignments

STEP 2: Set up the receive clock to the data registers.

This is the key to the input timing. Create a PLL instantiation using the ALTPLL MegaWizard with an output clock at the same frequency as the input clock, 550 MHz in this case, and with 0 ns offset. The important thing here is to set the feedback mode to Source Synchronous. Using this PLL feedback mode ensures that the clock-data relationship at the inputs is preserved at the registers. It also simplifies the process because a PLL offset phase shift does not need to be calculated to meet input timing. Connect the clk_in input clock port to the input pin of the PLL instantiation and the output pin of the PLL instantiation to the clock pin on the receive data registers.

Figure 11: PLL Instantiation using Source Synchronous Operation Mode


STEP 3: Set the input constraints for the design.

Again we use an SDC file, source_sync_in.sdc, to set the input constraints in TimeQuest. Figure 12 shows an example of how to set up the SDC file. The first section sets up the input clock, clk_in. In the next section, the PLL output is defined. There is no offset on the PLL output since we are trying to maintain the relationship between the clock and data at the ports. [13]

The next section is used to specify the input clock jitter on the clk_in port. The value for the clock uncertainty set here was determined by executing the derive_clock_uncertainty SDC command. Since this uncertainty is between the input and output of a PLL, the uncertainty is larger and affects both setup and hold, because this is an inter-clock domain transfer.

Next, set the input max delay (IMD) and input min delay (ImD) constraints. The clock offset is set to the value at the output of the Transmit FPGA, which in this case is half a period since it is center-aligned. The output skew is obtained from the Transmit FPGA's SDC file shown in Figure 4. The data trace skew of 83 ps represents inch of variance, a number that is easily obtainable for most board layout auto routers. The sum of these two numbers represents the data skew at the inputs. The setup (early margin) and hold (late margin) times are balanced and equal to half the period minus the data skew at the inputs. From the setup and hold requirements, we can set the IMD and ImD constraints as shown in the file. These values are then reported to the console window.
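The margin arithmetic described in this step can be sketched numerically. The data skew and the balanced margin follow directly from the text; the mapping from margin to IMD and ImD is an assumption, chosen so that both values are positive as drawn in Figure 13 and so that IMD minus ImD equals the total (plus and minus) skew on the data bus. The function name is hypothetical and is not part of the SDC file in Figure 12.

```python
def input_delays(period_ns, output_skew_ns, trace_skew_ns):
    """Return (data_skew, margin, IMD, ImD) in ns for a center-aligned SDR input.

    From the text: data skew at the inputs = output skew + trace skew, and the
    balanced setup/hold margin = period / 2 - data skew.
    Assumed mapping to constraints: IMD = period - margin, ImD = margin.
    """
    data_skew = output_skew_ns + trace_skew_ns
    margin = period_ns / 2.0 - data_skew
    return data_skew, margin, period_ns - margin, margin


period = 1000.0 / 550.0   # 550 MHz clock, about 1.818 ns
data_skew, margin, imd, imd_min = input_delays(period, 0.125, 0.083)

print(f"data skew = {data_skew:.3f} ns, margin = {margin:.3f} ns")
print(f"IMD = {imd:.3f} ns, ImD = {imd_min:.3f} ns")
```

With 125 ps of output skew and 83 ps of trace skew, the data skew is 208 ps and the balanced margin is about 701 ps on each side.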

[13] See Appendix B for using center-aligned input or for using no PLL on the input clock.

Figure 12: SDR Source Synchronous Input SDC File [14]

Figure 13 below shows the input timing and the relationships of the settings. It is interesting to note that the difference between the IMD and ImD is equal to the total skew on the data bus.
[Figure 13 diagram: clk_in and data_in waveforms over one Clock Period, annotated with Clock Offset, Early Margin, Late Margin, Input Max Delay (+), Input Min Delay (+), - Trace Skew and + Trace Skew, and Data Valid Windows bounded by Clock Period / 2 - Output Skew and + Output Skew]

Figure 13: SDR Input Timing Relationships

STEP 4: Set up Quartus II and compile the project.

Open Quartus II Settings and select TimeQuest for timing analysis processing as shown in Figure 6. Next, add the SDC file to the project. Save these settings and compile the project.

[14] Refer to the "Clock Setup and Hold Slack Analysis Explained" Tech Note for OMD/OmD equations.

STEP 5: Analyze the timing using TimeQuest.

Open the TimeQuest GUI. A Tcl script was created to run the timing analysis as shown in Figure 14. Again, the first section of this script initializes the timing netlist in preparation for reporting. The script is designed to be run multiple times, switching between slow and fast timing models if desired. The SDC file is read in, and then the constraints are applied with the update_timing_netlist command. The Early Margin, Late Margin, IMD, and ImD are reported to the console window. Next, the setup and hold slack is determined for the inputs and reported to the console window. Lastly, since our goal is to balance the setup and hold slack, the difference between the slack values is printed to the console window.

Figure 14: Tcl Script for SDR Input Timing Analysis

To run this script, type source source_sync_in.tcl in the console window.


Figure 15: SDR Input Timing Results - Slow Timing Models

Figure 16: SDR Input Timing Results - Fast Timing Models

The results for the Slow timing models are shown in Figure 15. There are 283 ps of setup slack and 474 ps of hold slack with our IMD and ImD constraints. The difference between the slack values is 191 ps. The Fast timing model results are obtained by changing the set model slow command in the script to set model fast and re-running the script. These results are shown in Figure 16. There are 425 ps of setup slack and 536 ps of hold slack. The difference between the slack values is 111 ps.

To consider the worst-case scenario, use the worst-case setup slack using the Slow timing models (283 ps) and the worst-case hold slack using the Fast timing models (536 ps). The difference between these two is only 253 ps. The results for the input timing analysis show that the 144 data inputs have met the setup and hold requirements with almost 300 ps of margin, and that the slack values are balanced to within 253 ps.
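The slack comparisons quoted above reduce to a few subtractions. The sketch below reproduces them from the reported values; the dictionary layout and function name are illustrative only.

```python
# (setup slack, hold slack) in ps, as reported in Figures 15 and 16
SLACKS = {"slow": (283, 474), "fast": (425, 536)}


def slack_imbalance_ps(setup_ps, hold_ps):
    """Difference between hold and setup slack; 0 would be perfectly balanced."""
    return abs(hold_ps - setup_ps)


# Imbalance within each timing model.
per_model = {m: slack_imbalance_ps(s, h) for m, (s, h) in SLACKS.items()}

# Cross-model comparison used in the text: Slow setup slack vs Fast hold slack.
cross_model = slack_imbalance_ps(SLACKS["slow"][0], SLACKS["fast"][1])

# Smallest slack seen in either model, i.e. the guaranteed margin.
min_margin = min(min(pair) for pair in SLACKS.values())

print(per_model, cross_model, min_margin)
```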

QUESTION: What changes would I have to make if I wanted to make the interface DDR instead of SDR?

ANSWER: Running at double data rate will require a few changes in both the transmit and receive FPGAs. Since the data is transitioning at twice the clock frequency, the center of the data valid window is now at 90 (or -90) degrees instead of 180. Since the shift required is not half a period as in the SDR case, we can no longer take advantage of using the DDIO_OUT to invert the clock on the transmit side. We will need to use the PLL to shift the clock a quarter of a period. We can do this in either the transmit or the receive FPGA. In this example, we will shift in the receive FPGA, so we will be transmitting edge-aligned data instead of center-aligned data.

Therefore, the changes on the transmit side are that the clock DDIO_OUT will not be inverted, and the data DDIO_OUT will have separate inputs. On the receive side, the biggest change is that the data must now be received in DDIO_IN (or DDIO_BIDIR) registers instead of regular I/O registers. In addition, because this is now double data rate, the clock speed must be lowered to 400 MHz (which is an effective data rate of 800 Mbps for each data line), and the PLL output must be shifted by a quarter period.
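To put numbers on the quarter-period shift, the sketch below converts the 400 MHz DDR clock into a period, a bit time, a 90 degree shift in nanoseconds, and the per-line data rate. The function name is hypothetical; the arithmetic uses only the values given above.

```python
def ddr_capture_shift(clock_mhz, shift_degrees=90.0):
    """Return (period_ns, bit_time_ns, shift_ns, data_rate_mbps) for a DDR link."""
    period_ns = 1000.0 / clock_mhz
    bit_time_ns = period_ns / 2.0                  # one bit per clock edge
    shift_ns = period_ns * shift_degrees / 360.0   # phase shift as a time
    data_rate_mbps = 2.0 * clock_mhz               # two bits per clock cycle
    return period_ns, bit_time_ns, shift_ns, data_rate_mbps


period, bit_time, shift, rate = ddr_capture_shift(400.0)
print(f"period = {period} ns, bit time = {bit_time} ns")
print(f"90 degree shift = {shift} ns, data rate = {rate} Mbps per line")
```

At 400 MHz the period is 2.5 ns, each bit lasts 1.25 ns, and a 90 degree shift of 0.625 ns places the capture edge in the middle of the bit.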


The modified circuit for DDR is shown below in Figure 17. The same test circuit will be used for both FPGAs (except for the PLL output phase shift). In this circuit, there are 144 data lines that come into registers in input I/O cells (DDIO_IN), which are pipelined to two sets of internal registers, and then pipelined to registers in output I/O cells (DDIO_OUT). Note that now there is a separate data stream connected to the datain_h and datain_l inputs to the DDIO_OUT instantiation. You will also see that the high and low data streams are swapped from the input DDIO_IN. This will be explained in detail when discussing the input FPGA. The clock comes in to a PLL, and then the output of the PLL drives a clock to the data registers, and additionally to a register in the I/O cell that drives the source synchronous clock out (DDIO_OUT).

Figure 17: Simple DDR Test Circuit Schematic

Here are the step-by-step instructions on how to set up the source synchronous interface, first for the output transmit side, then for the input receive side.

DDR TEST CIRCUIT GIVENS:
- Stratix II EP2S130F1508 device used to get maximum LVDS I/Os.
- The C3 (fastest) speed grade device used, giving lowest max-to-min timing spread.
- A 50/50 duty cycle was used on the clock; however, provisions are made to vary the duty cycle and observe the effects.
- No attempt was made to place any of the I/Os. Keeping outputs close together can help minimize skew. An arbitrary output data skew goal of +/- 125 ps was used here.
- LVDS was used for the I/O standard on the outputs for the source synchronous output case, and on the inputs for the source synchronous input case.
- Nominal output pin loads of 10 pF were used.
- Board data trace lengths are assumed to be matched to the clock trace within +/- inch.

DDR Output (Transmit) FPGA


STEP 1: Drive the output data using the DDIO_OUT (or DDIO_BIDIR) MegaFunction.

Use the ALTDDIO_OUT MegaWizard to create a single MegaFunction instantiation for the entire 144-bit width of the data output bus as shown in Figure 17. Connect the datain_l data bus from the FPGA core to the datain_h inputs and the datain_h data bus from the FPGA core to the datain_l inputs of the DDIO_OUT instantiation. Connect the output dataout to the output ports of the FPGA.

Again, the LVDS differential I/O standard is used for the data for signal integrity reasons at these high data rates. The I/O standard and Pin Load settings are made in the Assignment Editor and are the same as those shown in Figure 3. Keeping the data and clock output pads close together will improve the skew across the output data bus; however, this is not required even at the data rates used in this example. Skews as low as 50 ps can be obtained by assigning the outputs to a single bank in Stratix II devices.

STEP 2: Drive the output clock using the DDIO_OUT MegaFunction.

Use the ALTDDIO_OUT MegaWizard to create a MegaFunction instantiation with a bus width of 1. Connect the datain_h input to VCC, and the datain_l to GND. Connecting these inputs to the same polarity has the effect of creating an output clock that is in phase with the data, edge-aligning the rising edge of the clock with respect to the data valid window. Using edge-alignment instead of center-alignment as we did in the SDR case saves us from requiring a second output from the PLL. [15] Connect the output of the MegaFunction instantiation to the clock out port of the FPGA.

STEP 3: Connect a clock to the clock input of the data out and clock out registers.

Connect the input clock pin of the DDIO_OUT registers for both the data registers and the clock registers to the same clock source. In this example, a single output from a PLL was used; however, a clock port from the FPGA could also be used, or any other clock source in the FPGA. It is important to keep the clock skew low, so make sure that this clock goes through a global clock buffer. Separate clock outputs from the PLL can be used, but this is not necessary: one advantage of this approach is that no phase shift has to be calculated for the clock out. Using DDIO_OUT registers for both the clock and the data creates an edge-aligned clock-data relationship that is very accurate.

STEP 4: Set the output constraints for the design.

TimeQuest uses SDC commands for timing constraints, so an SDC file, source_sync_out.sdc, will be used to set these output constraints in TimeQuest. Figure 18 below shows the SDC file used to set these constraints. Note that the DDR case is much more involved than the SDR case. The first section sets up the input clock, clk_in. Note that since DDR references both clock edges, duty cycle comes into play and has been added to all the clock definitions. In the next section, the PLL output is defined, and then the output clock, clk_out, is created from this.

The next section is used to specify the output clock jitter on the clk_out port. The value for the clock uncertainty set here was determined by executing the derive_clock_uncertainty SDC command. Note that there is only setup uncertainty between the clk0 and clk_out because this is an intra-clock domain transfer.

The next section constrains the clock as a data input. Explaining this requires an understanding of how the DDIO_OUT MegaFunction operates. As shown in Figure 19, both the datain_h and datain_l registers are clocked on the rising edge of outclock. The register outputs are then multiplexed to dataout using the outclock as the select line. This provides three separate pathways to dataout as shown in the diagram. The circuit is designed such that the outclock is the critical path, which provides for glitch-free operation of the circuit. However, the timing netlist lists all three paths as taking the same amount of time. Therefore, any one of these paths could be used for timing analysis, and the results would be the same.

The problem with using one or both of the register paths is that only a rising edge output will be shown, since neither of the registers is clocked on the falling edge. This would be fine except when dealing with non-50% duty cycles. So, in order to have both edges reported, we prefer to use the outclock as the data selector for timing analysis instead of the register paths.
[15] See Appendix D for using center-aligned output or for using a clock directly from a PLL output instead of using a DDIO_OUT to generate the clock.

Figure 18: DDR Source Synchronous Output SDC File

Since outclock feeds a clock input, TimeQuest does not recognize this as a data path through the multiplexer. In order to get it to do so, we make a set_input_delay setting to the clk_in port with the clk0 PLL output as the reference clock. This trick makes TimeQuest treat the clk0 as both a clock source and a data source. The -source_latency_included option is used so that TimeQuest does not count the delay through the PLL twice. [16]

Figure 19: DDIO_OUT Output Timing Paths

The next section constrains the outputs to a very tight skew range to guide the fitter. Since this is edge-aligned source synchronous data being output, our clock offset is set to 0. The output skew is set to 125 ps, which sets up the early and late margin settings. These are then used to determine the output max delay (OMD) and output min delay (OmD) settings to constrain the outputs. [17] Since this is DDR and we are using both the rising and falling edges to clock the data, we need to repeat the OMD and OmD settings using the -clock_fall option.

[16] For Quartus II version 7.2 and beyond, TimeQuest will recognize this as a data path, and no constraints will be necessary for it to treat the clock as data.
[17] Throughout this Tech Note, the FPGA-Centric or Skew Method is used to calculate the input and output timing constraints as opposed to the System-Centric method. This is because most source synchronous inputs and outputs are specified with a clock offset and a skew. For a description of the System-Centric method and an explanation of the differences in methodology, refer to Altera AN 433.

The next section is for cutting paths we don't want to see. This section is optional since, without it, the worst-case slack will still be reported correctly. We make set_false_path settings to cut those paths which are not relevant. Since this is edge-aligned, we are launching and latching on the same edge, i.e. rise-to-rise and fall-to-fall. Therefore, for setup we want to cut the rise-to-fall and fall-to-rise paths. For hold we want to cut the rise-to-rise and fall-to-fall paths. In addition, we also cut the paths from the DDIO_OUT registers to the data_out ports (the paths shown in red in Figure 19).

The last section is used to correct the setup and hold relationships by using set_multicycle_path settings. With DDR that is edge-aligned on the outputs, the same edge that launches the data also latches it. The default is next-edge capture, so a destination multicycle setup of 0 is used to correct for this. It needs to be made on both the rising and falling edges. Also, if it is not made specifically on the rise-to-rise and fall-to-fall edges, then the hold relationships are not correct.
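The edge-aligned DDR settings can be illustrated with a short calculation. As an assumption, the sketch uses the same skew-method relations as a center-aligned output but with a clock offset of 0 and a data valid window of half a clock period, which yields negative OMD and OmD values as labelled in Figure 20. It also lists which launch and latch edge pairs remain for setup and hold analysis after the false paths described above. The function names are hypothetical and the relations are not copied from Figure 18.

```python
from itertools import product


def ddr_edge_aligned_delays(period_ns, output_skew_ns, clock_offset_ns=0.0):
    """Return (early_margin, late_margin, OMD, OmD) in ns.

    Assumed relations, with the data valid window being half a clock period:
      early_margin = clock_offset - output_skew
      late_margin  = period / 2 - clock_offset - output_skew
      OMD = early_margin, OmD = -late_margin (both negative, as in Figure 20)
    """
    early = clock_offset_ns - output_skew_ns
    late = period_ns / 2.0 - clock_offset_ns - output_skew_ns
    return early, late, early, -late


def analyzed_paths(check):
    """Edge pairs left for analysis after the false paths described in the text."""
    pairs = list(product(("rise", "fall"), repeat=2))
    if check == "setup":   # rise-to-fall and fall-to-rise are cut
        return [(launch, latch) for launch, latch in pairs if launch == latch]
    if check == "hold":    # rise-to-rise and fall-to-fall are cut
        return [(launch, latch) for launch, latch in pairs if launch != latch]
    raise ValueError(f"unknown check: {check}")


early, late, omd, omd_min = ddr_edge_aligned_delays(1000.0 / 400.0, 0.125)
print(f"OMD = {omd} ns, OmD = {omd_min} ns")
print("setup:", analyzed_paths("setup"))
print("hold: ", analyzed_paths("hold"))
```

For the 400 MHz case this gives OMD = -0.125 ns and OmD = -1.125 ns under the stated assumptions.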
[Figure 20 diagram: clk_out and data_out waveforms over one Clock Period, annotated with Clock Period / 2, Clock Offset, Early Margin, Late Margin, Output Max Delay (-), Output Min Delay (-), and Data Valid Windows bounded by - Output Skew and + Output Skew]

Figure 20: DDR Output Timing Diagram

STEP 5: Set up Quartus II and compile the project.

Open Quartus II Settings and select TimeQuest for timing analysis processing as shown in Figure 6. Next, add the SDC file to the project. Save these settings and compile the project.

STEP 6: Analyze the timing using TimeQuest.

Open the TimeQuest GUI. A Tcl script was created to run the timing analysis as shown in Figure 7. The first section of this script initializes the timing netlist in preparation for reporting. The script is designed to be run multiple times, switching between slow and fast timing models if desired. The SDC file is read in, and then the constraints are applied with the update_timing_netlist command. The Early Margin, Late Margin, OMD, and OmD are reported to the console window. Next, the setup and hold slack is determined for the outputs and reported to the console window. Lastly, the output skew and balance of slacks are determined and reported to the console window. To run this script, type source source_sync_out.tcl in the console window. The results are shown below in Figure 21.


Figure 21: DDR Output Timing Results Slow Timing Models

Note that these results are using the default Slow timing models. There are 95 ps of setup slack and 4 ps of hold slack with our OMD and OmD constraints, and 151 ps of data output skew. To modify the script to run the Fast timing models, change the set model slow command to set model fast and re-run the script. The results are shown below in Figure 22.

Figure 22: DDR Output Timing Results Fast Timing Models

There are 95 ps of setup slack and 23 ps of hold slack with our OMD and OmD constraints, and 132 ps of output data skew using the Fast timing models. Since the skew is greater using the Slow timing models, 151 ps will be the worst case skew. The results for the output timing analysis show that the 144 data outputs have met the 125 ps skew requirements from the rising edge of the source synchronous clk_out clock and the overall data skew at the outputs is no greater than 151 ps.
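The worst-case selection across the two timing models can be sketched in a few lines of Python. This is only an illustration of the bookkeeping: the numbers are the slack and skew values quoted above, while the dictionary layout and the worst_case helper are our own assumptions and are not part of the Tcl script.

```python
# Values quoted in the text (ps), grouped per timing model.
results = {
    "slow": {"setup_slack": 95, "hold_slack": 4, "data_skew": 151},
    "fast": {"setup_slack": 95, "hold_slack": 23, "data_skew": 132},
}


def worst_case(per_model):
    """Return the smallest slacks and the largest skew seen across all models."""
    models = list(per_model.values())
    return {
        "setup_slack": min(m["setup_slack"] for m in models),
        "hold_slack": min(m["hold_slack"] for m in models),
        "data_skew": max(m["data_skew"] for m in models),
    }


summary = worst_case(results)
print(summary)
```

Taking the minimum slack and the maximum skew over both models gives the figures that the interface must be judged against, here 4 ps of hold slack and 151 ps of data skew.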

DDR Input (Receive) FPGA


STEP 1: Receive the data in registers in the I/O cells. As mentioned, the same design used for the Transmit FPGA is used for the Receive FPGA for simplicity's sake; however, the inputs are constrained instead of the outputs. Since this is now DDR input, we need to use the ALTDDIO_IN MegaWizard to create a single MegaFunction instantiation for the entire width of the data input bus. Also, the differential LVDS I/O assignments should be made for the data and input clock at these speeds for signal integrity reasons as mentioned before. These I/O standard settings can be made in the Assignment Editor.

Note in the schematic shown in Figure 17 that the dataout_h output bus of the DDIO_IN instantiation is connected through a register to the datain_l of the DDIO_OUT and the dataout_l output bus is connected through a register to the datain_h. This crossing of busses requires an explanation. Figure 23 shows a behavioral simulation of a DDIO_OUT register. Note that the even bit stream is connected to the datain_h input and the odd bit stream is connected to the datain_l input of the DDIO_OUT instantiation to produce a bit-ordered serial stream at the dataout output. Figure 24 shows a behavioral simulation of a DDIO_IN register. The bit-ordered serial stream from the DDIO_OUT register feeds the datain input. The even bit stream is now on the dataout_l output and the odd bit stream is on the dataout_h output. This reversal of streams is due to the fact that the DDIO_OUT puts the high data out before the low data and the DDIO_IN captures the falling edge data on the low output followed by the rising edge data on the high output. It is not possible to avoid this bus reversal without adding an extra pipeline stage to the high data stream. Because of this bus reversal, the datain_h and datain_l paths are swapped on the Tx side as shown in the source schematic in Figure 17.
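The stream reversal described above can be illustrated with a small ordering model in Python. This is a sketch under simplifying assumptions: it models only the order in which bits leave the DDIO_OUT and land on the DDIO_IN outputs, and ignores clocks, register latency, and bus width. The function names ddio_out and ddio_in are hypothetical stand-ins, not the MegaFunction interfaces.

```python
def ddio_out(datain_h, datain_l):
    """Interleave two streams: each high bit is sent first, then the low bit."""
    serial = []
    for high_bit, low_bit in zip(datain_h, datain_l):
        serial.extend([high_bit, low_bit])
    return serial


def ddio_in(serial):
    """Opposite-edge capture: the first bit of each pair lands on dataout_l,
    the second bit of each pair lands on dataout_h."""
    dataout_l = serial[0::2]
    dataout_h = serial[1::2]
    return dataout_h, dataout_l


bits = list(range(8))                 # bit-ordered stream 0..7
even, odd = bits[0::2], bits[1::2]    # even and odd bit streams

serial = ddio_out(datain_h=even, datain_l=odd)
dataout_h, dataout_l = ddio_in(serial)

print("serial   :", serial)
print("dataout_l:", dataout_l)        # even stream
print("dataout_h:", dataout_h)        # odd stream
```

In this model the even stream that entered on datain_h comes out on dataout_l, and the odd stream that entered on datain_l comes out on dataout_h, which is the crossing shown in the schematic.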


Figure 23: DDR DDIO_OUT Behavioral Simulation Waveforms

Figure 24: DDR DDIO_IN Behavioral Simulation Waveforms

STEP 2: Set up the receive clock to the data registers. This is the key to the input timing. Create a PLL instantiation using the ALTPLL MegaWizard with an output clock at the same frequency as the input clock, 400 MHz in this case, with a -90 degree offset, and set the feedback mode to Source Synchronous. Using this PLL feedback mode ensures that the clock-data relationship at the inputs is preserved at the registers. In order for the high and low data streams to line up on the input side, the data launched by the rising edge at the output must be captured by the falling edge on the input. You can see this relationship in Figure 24. This is an opposite-edge capture interface on the input side. Connect the clk_in input clock port to the input pin of the PLL instantiation and the output pin of the PLL instantiation to the clock pin on the receive data registers.

STEP 3: Set the input constraints for the design. Again we use an SDC file, source_sync_in.sdc, to set the input constraints in TimeQuest. Figure 25 shows an example of how to set up the SDC file. The first section sets up the input clock, clk_in. When constraining source synchronous inputs for DDR, it makes things easier if we use a virtual clock to specify the source clock because the IMD and ImD settings stay the same and the difference in phase between the clock and data is handled with the virtual clock. Since in this case the transmitting FPGA is sending the data edge-aligned, the virtual clock and the actual input clock have the same rise and fall edges. The usefulness of this technique becomes more apparent when looking at other clock-data relationships on the input side.


Figure 25: DDR Source Synchronous Input SDC File

In the next section, the PLL output is defined. There is a -90 degree offset on the PLL output since we are trying to center align the clock in the data valid window with the opposite edge18. We do not need to create a generated clock for the clk_out port since this SDC file is only constraining and checking the input side.

18 See Appendix D for using edge-aligned input or for using no PLL on the input clock.

The next section is used to specify the input clock jitter on the clk_in port. Since we are using a virtual clock, the clock uncertainty is actually applied between the virtual clock, vclk_in, and the clk0 PLL output clock. The value for the clock uncertainty set here was determined by executing the derive_clock_uncertainty SDC command. Since this uncertainty is between the input and output of a PLL, the uncertainty is larger and affects both setup and hold since this is an inter-clock domain transfer.

Next, set the input max delay (IMD) and input min delay (ImD) constraints. The clock offset is always set to 0 for edge-aligned data. Any offset at the Transmit FPGA is handled by the offset in the virtual clock setting. The output skew is entered from the Transmit FPGA's SDC file, as shown in Figure 25. The data trace skew of 83 ps represents about ½ inch of variance, a number that is easily obtainable for most board layout auto routers. The sum of these two numbers represents the data skew at the inputs. The setup (early margin) and hold (late margin) times are not balanced since the data arrives edge-aligned. From the setup and hold requirements, we can set the IMD and ImD constraints as shown in the file. These values are then reported to the console window. Figure 26 below shows the input timing and the relationship of the settings. It is interesting to note that the difference between the IMD and ImD is equal to the total skew on the data bus.
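The relationship between the skew values and the IMD and ImD settings can be sketched numerically. The Python below is a minimal illustration that assumes the skews are applied symmetrically around the clock edge, so that IMD is the clock offset plus the summed skew and ImD is the clock offset minus it. The 125 ps output skew is the requirement quoted for the transmit side and the 83 ps trace skew is the value quoted above; the function name and the exact formula are assumptions, since the SDC file itself is not reproduced here.

```python
def input_delays(clock_offset_ps, output_skew_ps, trace_skew_ps):
    """Return (IMD, ImD) in ps, assuming symmetric +/- skew about the clock edge."""
    data_skew_ps = output_skew_ps + trace_skew_ps   # skew seen at the inputs
    imd_ps = clock_offset_ps + data_skew_ps         # input max delay
    imd_min_ps = clock_offset_ps - data_skew_ps     # input min delay
    return imd_ps, imd_min_ps


# Edge-aligned data: the clock offset is 0.
imd, imd_min = input_delays(clock_offset_ps=0, output_skew_ps=125, trace_skew_ps=83)
total_skew = imd - imd_min

print("IMD =", imd, "ps, ImD =", imd_min, "ps, IMD - ImD =", total_skew, "ps")
```

Under these assumptions the difference between IMD and ImD is twice the one-sided skew, which is the total width of the skew band on the data bus.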
Figure 26: DDR Input Timing Relationships (waveforms showing the Clock Period, Clock Period / 2, Clock Offset, Input Max Delay (+), Input Min Delay (-), Early Margin, Late Margin, -/+ Trace Skew, and -/+ Output Skew around each Data Valid Window)

STEP 4: Set up Quartus II and compile the project. Open Quartus II Settings and select TimeQuest for timing analysis processing as shown in Figure 6. Next, add the SDC file to the project. Save these settings and compile the project.

STEP 5: Analyze the timing using TimeQuest. Open the TimeQuest GUI. A Tcl script was created to run the timing analysis as shown in Figure 14. Again, the first section of this script initializes the timing netlist in preparation for reporting. The script is designed to be run multiple times switching between slow and fast timing models if desired. The SDC file is read in, and then the constraints are applied with the update_timing_netlist command. The Early Margin, Late Margin, IMD, and ImD are reported to the console window. Next the setup and hold slack is determined for the inputs and reported to the console window. Lastly, since our goal is to balance the setup and hold slack, the difference between the slack values is printed to the console window. To run this script, type source source_sync_in.tcl in the console window.


Figure 27: DDR Input Timing Results Slow Timing Models

Figure 28: DDR Input Timing Results Fast Timing Models

The results are shown above in Figure 27 for the Slow timing models. There are 283 ps of setup slack and 474 ps of hold slack with our IMD and ImD constraints. The difference between the slack values is 191 ps. The Fast timing model results are obtained by changing the set model slow command in the script to set model fast and re-running the script. The results are shown above in Figure 28. There are 425 ps of setup slack and 536 ps of hold slack. The difference between the slack values is 111 ps. To consider the worst case scenario, use the worst-case setup slack using the Slow timing models (283 ps) and the worst-case hold slack using the Fast timing models (536 ps). The difference between these two is only 253 ps. The results for the input timing analysis show that the 144 data inputs have met the setup and hold requirements with almost 300 ps of margin, and that the slack values are balanced to within 253 ps.
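The slack balance figures quoted above follow from simple subtraction, which the short Python sketch below reproduces. The slack values are taken from the text; the variable names and the imbalance helper are ours, and the cross-model figure follows the convention used above of pairing the Slow model setup slack with the Fast model hold slack.

```python
# Slack values quoted in the text (ps).
slow = {"setup": 283, "hold": 474}
fast = {"setup": 425, "hold": 536}


def imbalance(slacks):
    """Difference between hold and setup slack within one timing model."""
    return abs(slacks["hold"] - slacks["setup"])


slow_imbalance = imbalance(slow)
fast_imbalance = imbalance(fast)

# Cross-model figure as used in the text: Slow setup slack vs. Fast hold slack.
cross_model_imbalance = abs(fast["hold"] - slow["setup"])

print(slow_imbalance, fast_imbalance, cross_model_imbalance)
```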

Summary

In this document, it has been demonstrated that through a relatively simple process, very high data rates can be achieved across wide busses for transferring data between two Altera FPGAs with adequate balanced setup and hold margin on the receive side using a source synchronous interface. This can be accomplished without any complex calculations for PLL clock phase shifting on both the transmit and receive side that was detailed in previous Tech Notes on this subject. This is true for both SDR and DDR cases. On the transmit side, the key is to use the ALTDDIO_OUT MegaFunction to create the data output registers and the forwarded clock output with low skew values. For the SDR case, creating a clock that has a rising edge centered in the data valid window only requires connecting the two inputs to the DDIO_OUT clock register to the opposite polarity voltage rails. For the DDR case, creating a clock that has a rising edge centered in the data valid window would require adding a second PLL output, so we use edge alignment for the clock and data and center it on the receive side instead.


On the receive side, the key is to place the receiving data registers in the I/O cells. Simple registers can be used for SDR, and we use the ALTDDIO_IN MegaFunction for DDR. A PLL in Source Synchronous feedback mode is used to generate the receive clock to these registers, which maintains the clock-data relationship at the pins to the registers in the device. For the SDR case, no shift on this clock is required since the data is already center-aligned at the input ports. For the DDR case, a -90 degree phase shift is used on the PLL output clock to center that clock in the data valid window since the data is received edge-aligned from the Transmit FPGA. When going between the ALTDDIO_OUT in one FPGA and the ALTDDIO_IN in another FPGA, this ensures that the data in the two received data streams is aligned properly.

In both the SDR and DDR cases, we took into account the output and input clock jitter. For the SDR case duty cycle doesn't matter, but for the DDR case we included a setting for duty cycle. In this particular example at these high clock speeds, varying the duty cycle will cause timing failures on the setup (hold times are unaffected by changes in duty cycle). At lower clock speeds, duty cycle can be varied and the SDC files shown are set up to handle this. Because of the method used on the output side of using the clock as data through the multiplexer in the DDIO_OUT instantiation, both rising edge and falling edge clock launch cases can be examined when varying the duty cycle.

This Tech Note demonstrates that using these methods, very high bandwidth interfaces can be created between two Altera FPGAs over long distances on a PCB. While the focus of this Tech Note was between two Altera FPGAs, these methods can be used in general even when one of the devices, either transmit or receive, is not an Altera FPGA.
In the Appendices that follow, different methods are examined that provide more flexibility in implementation to cover a wider variety of situations with source synchronous interfaces in an attempt to provide a document that can be used for almost all cases.

Revision History

The table below displays the revision history for this Tech Note.

Date          Document Version    Changes Made
March 2007    1.0                 Initial Revision
June 2007     2.0                 Major Revision:
                                  Added DDR input and output cases including clock uncertainty and duty cycle
                                  Added clock uncertainty to SDR cases to model jitter
                                  Added Appendices for alternate cases


APPENDIX A Alternate Methods for SDR Outputs

The method used in the Tech Note for the SDR case on the output side was to use one PLL clock output for generating both the data_out and clk_out, and to have the clock center-aligned in the data valid window. Two other cases will be considered in this appendix. First, we will examine the case where the clock is edge-aligned with respect to the data. This would be helpful in those cases where an Altera FPGA is connected to a non-Altera device that requires edge-aligned data on the input side. The second case considered is when the output clk_out is driven directly from a PLL output instead of using a DDIO_OUT register to generate the clock. This might be useful to some customers in situations where they are running at lower clock speeds and output skew between the clock and data is not that crucial.

CASE 1: Edge-Aligned Output

To create edge-aligned clock and data outputs, the only circuit schematic change required is to reverse the inputs on the DDIO_OUT that generates the clock as shown in Figure 29 below. Now the datain_h input is connected to the VCC voltage rail and the datain_l input is connected to the GND rail.

Figure 29: Circuit Change for SDR Edge-Aligned Clock Output

The SDC file must be modified to take this change into account. There are only 3 minor changes required to the SDC file. The first change is that the create_generated_clock command for the clk_out should have the -invert option removed since we are not inverting the clock now. Secondly, the clock offset value needs to be changed from half a period to 0 since we are now edge-aligned. Lastly, we need to add the destination multicycle setup of 0 similar to the DDR case for edge-alignment. These changes are shown below in Figure 30.


Figure 30: SDC File for Edge-Aligned SDR Output


Figure 31 below shows the output timing and the settings relationships.

Figure 31: SDR Edge-Aligned Output Timing Diagram (clk_out and data_out waveforms showing the Clock Period, Clock Period / 2, Output Max Delay (-), Output Min Delay (-), Early Margin, Late Margin, and the -/+ Output Skew around each Data Valid Window)

The same Tcl script shown in Figure 7 can be used to run in TimeQuest. The results are shown below for the Slow and Fast timing models in Figure 32 and Figure 33, and the slacks and skews are the same as with the center-aligned output case.

Figure 32: SDR Edge-Aligned Output Report Slow Models

Figure 33: SDR Edge-Aligned Output Report Fast Models

CASE 2: PLL Direct Output Clock

When driving the clock directly out from a PLL in Stratix II devices, the clock frequency is limited to 281 MHz. Another problem with this method is that the clock is not in phase with the data at the outputs. This is due to the delay difference between the clock being driven by an output buffer, and the clock traversing through the multiplexer select of the DDIO_OUT driving the data out. In order to compensate for this difference, another output from the PLL must be used to drive the clock out so that the offset can be corrected for. There are several ways to determine what the offset is between the clk_out and data_out outputs. One way to do this is to run the Report Datasheet report in TimeQuest and look at the difference in the tPD between the clk0 PLL clock output and the data_out bus and the new clk1 PLL clock output and the clk_out output. When doing this using the slow models on the SDR test design, the offset is 594 ps. When doing this using the fast models on the SDR test design, the offset is -69 ps. As you can see, there is a much larger variation in the offset between the two models with this method. In order to meet timing for both models, the output skew must be changed from 125 ps to 525 ps. Despite this much larger value, since the clock frequency is only half what it was before, we can still make timing on the input side with this larger output skew value. Due to the lower clock frequency limitation and the larger output skew, this is not the preferred method to use in most cases.19

19 Except in the case of HardCopy II.
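The size of the model-to-model variation in CASE 2 above, and the extra room provided by the lower clock frequency, can be checked with a short Python calculation. This is only a rough illustration: the 594 ps and -69 ps offsets and the 550 MHz and 281 MHz frequencies come from the text, while the helper name is ours. The sketch does not attempt to derive the 525 ps output skew value, which comes from the timing analysis itself.

```python
def period_ps(freq_mhz):
    """Clock period in picoseconds for a frequency given in MHz."""
    return 1e6 / freq_mhz


offset_slow_ps = 594    # clk_out to data_out offset, slow models
offset_fast_ps = -69    # clk_out to data_out offset, fast models

# Variation in the offset between the two timing models.
offset_spread_ps = offset_slow_ps - offset_fast_ps

period_550_ps = period_ps(550)   # original SDR clock
period_281_ps = period_ps(281)   # PLL direct output limit

print("offset spread:", offset_spread_ps, "ps")
print("period at 550 MHz: %.0f ps, at 281 MHz: %.0f ps" % (period_550_ps, period_281_ps))
```

The clock period roughly doubles at 281 MHz, which is why the larger output skew can still be tolerated on the input side.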

APPENDIX B Alternate Methods for SDR Inputs

The method used in the Tech Note for the SDR case on the input side was to use a PLL clock output for clocking the data_in bus that is center-aligned in the data valid window at the input ports. Two other cases will be considered in this appendix. First, we will examine the case where the data is edge-aligned at the input ports. This would be helpful in those cases where a non-Altera device that generates edge-aligned data on the output is connected to an Altera FPGA. The second case considered is when the input clk_in directly clocks the input registers without involving a PLL. This might be useful to some customers, for example in situations where they have run out of PLLs.

CASE 1: Edge-Aligned Input

With edge-aligned clock and data inputs, the only circuit schematic change required is to add a 180 degree phase shift on the PLL clk0 output. The SDC file must be modified to take this change into account. There are only 2 minor changes required to the SDC file. The first change is to add a -phase 180 option to the generated clock for the clk0 output from the PLL. Secondly, the clock offset value needs to be changed from half a period to 0 since we are now edge-aligned at the inputs. These changes are shown below in Figure 34.

Figure 34: SDC File for SDR Edge-Aligned Input


Figure 35: SDR Edge-Aligned Inputs Timing Diagram (clk_in and data_in waveforms showing the Clock Period, Clock Period / 2, Clock Offset, Input Min Delay (-), Input Max Delay (+), Early Margin, Late Margin, -/+ Trace Skew, and -/+ Output Skew around the Data Valid Window)


Figure 35 above shows the input timing and the settings relationships. The same Tcl script shown in Figure 14 can be used to run in TimeQuest. The results are shown below for the Slow and Fast timing models in Figure 36 and Figure 37, and the slacks and skews are the same as with the center-aligned input case.

Figure 36: SDR Edge-Aligned Input Report Slow Models

Figure 37: SDR Edge-Aligned Input Report Fast Models

CASE 2: Direct Input Clock without PLL

When source synchronous data arrives at the receiving FPGA center-aligned, using a PLL on the input clock can offer more flexibility, but is not required. Consider the circuit below for the input case.

Figure 38: SDR Test Circuit No PLL


The SDC file must be modified for the new circuit. There are only two simple changes. The first is to remove the create_generated_clock constraint, and the second is to remove the clock uncertainty constraints. Neither is needed once the PLL is removed: there is no PLL output clock to define, and running derive_clock_uncertainty shows that no additional uncertainty is needed with a direct clock input. The Tcl script used to report the timing must be modified as indicated below in Figure 39. The report_timing commands must be modified to report from the data_in ports since there is no unique clock-to-clock setting that will give the same results.

Figure 39: Tcl Script for SDR Input Timing Analysis No PLL

At 550 MHz, there may be some internal clock paths not meeting timing, but the input ports are still all meeting timing with plenty of margin to spare as shown in Figure 40 and Figure 41 below.


Figure 40: SDR Center-Aligned Input Report without PLL Slow Models

Figure 41: SDR Center-Aligned Input Report without PLL Fast Models

With both the Slow and Fast timing models, there are over 500 ps of margin and the slacks are balanced to within about 50 ps. The advantage of using this method for this case is that the clock uncertainty is not required, hence the margins are larger. The Quartus II Fitter does a good job of balancing the clock and data delays to meet these stringent requirements without the need for a PLL.


APPENDIX C Alternate Methods for DDR Outputs

The method used in the Tech Note for the DDR case on the output side was to use one PLL clock output for generating both the data_out and clk_out, and to have the clock edge-aligned with the data. Two other cases will be considered in this appendix. First, we will examine the case where the clock is center-aligned with respect to the data. This would be helpful in those cases where an Altera FPGA is connected to a non-Altera device that requires center-aligned data on the input side. The second case considered is when the output clk_out is driven directly from a PLL output instead of using a DDIO_OUT register to generate the clock. This might be useful to some customers in situations where they are running at lower clock speeds and output skew between the clock and data is not that crucial.

CASE 1: Center-Aligned Output

To create center-aligned clock and data outputs, a second output must be added to the PLL with a 90 degree phase shift. Since the data is clocked on both edges of the clock, the center of the data valid window is at 90 degrees with DDR as opposed to 180 degrees with SDR.

Figure 42: Simple DDR Test Circuit for Center-Aligned Output

The SDC file must be modified for the new test circuit. A new create_generated_clock setting must be made for the new PLL output clock. Set the clock offset to a quarter period, and use this as the phase shift on the PLL output. Also, we need to remove the multicycle settings since the correct edges will be analyzed now without them. The changes to the clock section are shown below in Figure 43. The -source and -master_clock options on the clk_out setting need to change to now use the clk1 PLL output instead of the clk0 output.
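The quarter-period clock offset mentioned above is a straightforward conversion from phase to time. The Python sketch below shows the arithmetic for the two example interfaces in this Tech Note, 400 MHz DDR and 550 MHz SDR. The function name phase_to_ps is hypothetical and the calculation assumes an ideal clock with no jitter.

```python
def phase_to_ps(freq_mhz, phase_deg):
    """Convert a phase shift in degrees to a time offset in picoseconds."""
    period_ps = 1e6 / freq_mhz
    return period_ps * phase_deg / 360.0


# DDR: the center of the data valid window is at 90 degrees (a quarter period).
ddr_offset_ps = phase_to_ps(400, 90)

# SDR: the center of the data valid window is at 180 degrees (half a period).
sdr_offset_ps = phase_to_ps(550, 180)

print("DDR 400 MHz, 90 degrees : %.1f ps" % ddr_offset_ps)
print("SDR 550 MHz, 180 degrees: %.1f ps" % sdr_offset_ps)
```

At 400 MHz the period is 2.5 ns, so the 90 degree shift corresponds to a 625 ps clock offset.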


Figure 43: SDC File Clock Changes for DDR Center-Aligned Output

Figure 44 below shows the output timing and the settings relationships.

Figure 44: DDR Center-Aligned Output Timing Diagram (clk_out and data_out waveforms showing the Clock Period, Clock Period / 2, Clock Period / 4, Clock Offset, Output Min Delay (-), Output Max Delay (+), Early Margin, Late Margin, and the -/+ Output Skew around each Data Valid Window)

It should be noted that if this center-aligned output were to be connected to an Altera FPGA set up for center-aligned input, an extra pipeline register stage would have to be added to the dataout_h output of the ALTDDIO_IN instantiation in order to align the two data streams as mentioned previously. In order to avoid this, the center-aligned output would have to use a -90 degree phase shift on the new PLL output. The clock offset would then be set to a negative quarter period instead. Several other changes would also be required of the output SDC file as shown in Figure 45. The early margin and late margin calculations change to add the clock offset instead of subtracting it. Also, with a -90 degree phase shift instead of a +90 degree phase shift with DDR outputs, we change from same-edge capture to opposite-edge capture, so the set_false_path settings all need to change to the opposite polarity.


Figure 45: SDC Additional Changes for -90 Phase Shift on DDR Output

Whether +90 degrees or -90 degrees is used for center-aligned output, the results from running the same Tcl script as in Figure 7 are the same.

Figure 46: DDR Center-Aligned Output Report Slow Models


Figure 47: DDR Center-Aligned Output Report Fast Models

CASE 2: PLL Direct Output Clock

See CASE 2 of Appendix A. The limitations and methodology of this technique are the same for DDR as they are for SDR. The offset between the clk_out and data_out must be determined, and then this phase shift must be applied to a second PLL output to drive the clk_out. As with the SDR case, due to the lower clock frequency limitation and the larger output skew, this is not the preferred method to use in most cases.


APPENDIX D Alternate Methods for DDR Inputs

The method used in the Tech Note for the DDR case on the input side was to use a PLL clock output with a -90 degree phase shift for clocking the data_in bus, which arrives edge-aligned at the input ports. Two other cases will be considered in this appendix. First, we will examine the case where the data is center-aligned at the input ports. This would be helpful in those cases where a non-Altera device that generates center-aligned data on the output is connected to an Altera FPGA. The second case considered is when the input clk_in directly clocks the input registers without involving a PLL. This might be useful to some customers, for example in situations where they have run out of PLLs.

CASE 1: Center-Aligned Input

With center-aligned clock and data inputs, the only circuit schematic change required is to change the phase shift on the PLL clk0 output to 0 degrees. The SDC file must be modified to take this change into account. There are only 2 minor changes required to the SDC file. The first change is that the -waveform option on the create_clock for the clk_in must be changed to add the quarter period offset. Because we are using the virtual clock to handle the change in clock offset, the equations for the IMD and ImD stay the same. Even though the equations remain the same to get the equivalent slack results, the effective IMD and ImD based on the 90 degree offset on the input are different. The effective values are calculated and output to the console. Secondly, because we are switching from opposite-edge capture to same-edge capture, the set_false_path settings must change to the opposite polarity. These changes are shown below in Figure 48.
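As a rough illustration of the quarter period offset in CASE 1 above, the Python sketch below computes the rise and fall times of a clock waveform for the 400 MHz DDR example, with and without the offset. The function name, the duty_cycle parameter, and the 50% default are assumptions made for the example; the actual values in the SDC file depend on the duty cycle setting used there.

```python
def waveform_edges_ns(freq_mhz, offset_fraction=0.0, duty_cycle=0.5):
    """Return (rise, fall) times in ns for a clock whose rising edge is
    delayed by offset_fraction of a period."""
    period_ns = 1000.0 / freq_mhz
    rise_ns = period_ns * offset_fraction
    fall_ns = rise_ns + period_ns * duty_cycle
    return rise_ns, fall_ns


# Edge-aligned input: the clock rises with the data.
edge_aligned = waveform_edges_ns(400)

# Center-aligned input: the clock rises a quarter period after the data.
center_aligned = waveform_edges_ns(400, offset_fraction=0.25)

print("edge-aligned   rise/fall (ns):", edge_aligned)
print("center-aligned rise/fall (ns):", center_aligned)
```

With a 2.5 ns period and a 50% duty cycle, the center-aligned clock rises at 0.625 ns and falls at 1.875 ns relative to the data launch edge.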


Figure 48: SDC File for DDR Center-Aligned Input

Figure 49 below shows the input timing and the settings relationships.

Figure 49: DDR Center-Aligned Input Timing Diagram (waveforms labeled clk_out and data_out showing the Clock Period, Clock Period / 2, Clock Period / 4, Clock Offset, Output Max Delay (+), Output Min Delay (-), Early Margin, Late Margin, -/+ Trace Skew, and -/+ Output Skew around each Data Valid Window)

The Tcl script used to run in TimeQuest is the same one as in Figure 14.


The results are shown below for the Slow and Fast timing models in Figure 50 and Figure 51.

Figure 50: DDR Center-Aligned Input Report Slow Models

Figure 51: DDR Center-Aligned Input Report Fast Models

The slack values are the same as those obtained in the edge-aligned input case.

CASE 2: Direct Input Clock without PLL

When source synchronous data arrives at the receiving FPGA center-aligned for DDR, using a PLL on the input clock is required in most cases. For this design, in order to meet timing using the fast models without a PLL, the output skew would need to be lowered to 20 ps. This is not possible with a 144-bit bus, but with smaller bus sizes it is obtainable if all data outputs are kept in a single bank. Also, the trace skew would need to be lowered to 0.2 inch or less (35 ps). This would require hand routing in most cases. Even with the removal of the clock uncertainty, the Quartus II Fitter is currently not capable of doing as good a job as with a PLL for DDR at 400 MHz.
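The trace skew figures in this Tech Note can be related to trace length with a simple proportional estimate. The Python sketch below assumes that the 0.2 figure above is a length in inches and that delay scales linearly with length; under those assumptions it derives the implied delay per inch and applies it to the 83 ps trace skew used in the DDR input example. The names and the linear model are ours, not values taken from a board stack-up.

```python
def delay_per_inch_ps(skew_ps, length_in):
    """Implied propagation delay per inch from one skew/length pair."""
    return skew_ps / length_in


# 35 ps of skew for 0.2 inch of trace-length variance (from the text above).
rate_ps_per_in = delay_per_inch_ps(35, 0.2)

# Length variance implied by the 83 ps trace skew used for the PLL-based design.
length_for_83ps_in = 83 / rate_ps_per_in

print("implied delay: %.0f ps per inch" % rate_ps_per_in)
print("83 ps corresponds to about %.2f inch" % length_for_83ps_in)
```

This puts the hand-routing requirement in perspective: the no-PLL case needs the length matching to be more than twice as tight as the 83 ps budget used with the PLL.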
