Fast Simulation Techniques in FPGA

Uma Rajchandani#1, Ashwani Kumar*2

#M.Tech VLSI, S.G.V.U. Jaipur, India

*M.Tech VLSI, S.G.V.U. Jaipur, India

ABSTRACT

Advances in technology have transformed large, complex circuit boards into small, simple integrated circuits (ICs). ICs have surpassed circuit boards in every respect: smaller size, lower cost, higher speed, and greater reliability. This paper describes a fast simulation methodology that can produce simulators that (i) are orders of magnitude faster than comparable simulators, (ii) are cycle-accurate, and (iii) use a functional model that simulates the functionality of the computer system on an FPGA. It also considers a method for automatically partitioning a multiple-output logic function into the smallest number of sub-functions for mapping to fixed-size blocks. Power consumption and delay play an important role in extending the architecture to complex designs, and implementing larger designs leads to the same difficulties as building them from discrete components. We describe a prototype FAST system: a full-system, RTL-level, cycle-accurate-capable computer system simulator.

KEYWORDS: Field-programmable gate array (FPGA), programmable logic array (PLA), register transfer level (RTL).

1. INTRODUCTION

One of the biggest challenges that FPGA design and simulation engineers face today is time and resource constraints. As FPGAs grow in speed, density, and complexity, completing a full timing simulation places a heavy burden not only on manpower but also on computer processors and available memory. Furthermore, design and verification engineers face an escalating challenge to test today's FPGA designs properly in shorter timeframes and with increased confidence of first-pass success. Simulation is the primary tool for verifying the logical correctness of a hardware design, and in many cases it is the first activity performed in the process of taking a hardware design from concept to realization. With increasing chip design size, however, simulation is not fast enough to accommodate performance evaluation with realistic benchmarks. This poses the following requirements on the design of the performance model: (1) the simulation speed of the original HDL model needs to be high enough to allow fast functional simulation, and (2) the speed of the design synthesized from this model should be high enough to support the simulation of large benchmarks.

This paper examines the specific problems of speeding up partial dynamic reconfiguration of a fine-grain FPGA. The time taken to perform reconfiguration depends on a number of factors: the number of resources to be configured, the off-chip configuration bandwidth, the granularity of the configuration memory, and the configuration memory organization. The importance of the first three factors to configuration time is obvious. The organization of the configuration memory is important because it can adversely affect the expected linear relationship between the number of resources being configured and the amount of data that must be loaded into the device. If configuration bits controlling unrelated resources are contained in the same memory location, then there is a high likelihood that a small change to one area of the fabric will require a disproportionately large number of memory locations to be written.

2. SIMULATION TECHNIQUES

FPGA designs need simulation to ensure that they work and continue to work (Fig 1). FPGA designs are growing in complexity, and the traditional verification methodologies are no longer sufficient. In the past, simulation was not an important stage in the FPGA design flow. Currently, however, it is becoming one of the most critical. Simulation is especially important when designing with the more advanced FPGAs. The simulation techniques for FPGAs are:






2.1 STATIC TIMING ANALYSIS / FORMAL VERIFICATION: Static timing analysis techniques were originally employed for gate-level, event-driven simulation to verify both the functionality and the timing of internal logic. With the growth in available gate count, these analysis techniques have become valuable for functional simulation of FPGA internal logic as well. Most engineers see static timing analysis as the only analysis needed to verify that a design meets timing, but relying on it as the sole timing methodology has significant drawbacks. Static analysis cannot find problems that appear only when a design runs dynamically. It can show only whether the design as a whole meets setup and hold requirements, and it is generally only as good as the timing constraints applied. In a real system, dynamic factors can cause timing violations on the FPGA.

2.2 FUNCTIONAL SIMULATION: Functional simulation is a very important part of the verification process, but it should not be the only part. A functional simulation tests only the functional capabilities of the RTL design. It does not include any timing information, nor does it take into account changes made to the original design during implementation and optimization.
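The difference between the two checks can be made concrete with a small sketch. The Python below is illustrative only and does not reproduce the flow of any particular tool: `ripple_carry_add`, `Path`, and all delay values are hypothetical, and clock skew is ignored. It verifies an 8-bit adder functionally against an arithmetic reference, then separately evaluates setup and hold slack for one register-to-register path through that adder.

```python
from dataclasses import dataclass


def ripple_carry_add(a: int, b: int, width: int = 8) -> int:
    """Bit-level model of a ripple-carry adder (zero delay, function only)."""
    carry = 0
    result = 0
    for i in range(width):
        bit_a = (a >> i) & 1
        bit_b = (b >> i) & 1
        result |= (bit_a ^ bit_b ^ carry) << i
        carry = (bit_a & bit_b) | (carry & (bit_a ^ bit_b))
    return result


def functional_check(width: int = 8) -> bool:
    """Functional simulation: compare the model with a reference for every input."""
    mask = (1 << width) - 1
    return all(
        ripple_carry_add(a, b, width) == ((a + b) & mask)
        for a in range(1 << width)
        for b in range(1 << width)
    )


@dataclass
class Path:
    """One register-to-register path; all delays are assumed values in ns."""
    clk_to_q: float     # launch register clock-to-output delay
    logic_delay: float  # combinational logic plus routing delay
    setup: float        # setup time of the capturing register
    hold: float         # hold time of the capturing register


def setup_slack(path: Path, clock_period: float) -> float:
    """Negative slack: data arrives too late for the next clock edge."""
    return clock_period - (path.clk_to_q + path.logic_delay + path.setup)


def hold_slack(path: Path) -> float:
    """Negative slack: data changes too soon after the capturing edge."""
    return (path.clk_to_q + path.logic_delay) - path.hold


# Hypothetical path through the adder's carry chain.
adder_path = Path(clk_to_q=0.5, logic_delay=9.2, setup=0.4, hold=0.2)
print("functionally correct:", functional_check())
print("setup slack at 10 ns clock:", setup_slack(adder_path, 10.0))
print("hold slack:", hold_slack(adder_path))
```

With these assumed delays the adder passes the functional check, yet its setup slack is negative at a 10 ns clock. This is the kind of problem that functional simulation alone cannot reveal.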

In integrated circuit design, register transfer level (RTL) is a level of abstraction used to describe the operation of a synchronous digital circuit (Fig 2). In RTL design, a circuit's behavior is defined in terms of the flow of signals (or transfer of data) between hardware registers and the logical operations performed on those signals. Register transfer level abstraction is used in hardware description languages (HDLs) such as Verilog and VHDL to create high-level representations of a circuit, from which lower-level representations and ultimately the actual wiring can be derived.

Fig 2. RTL schematic
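The register-transfer view described above can be sketched as a simple cycle-based simulation. This is a minimal illustration, not an HDL simulator: the two registers (`count` and `acc`), their widths, and the function names are assumptions chosen for the example. The combinational logic computes every register's next value from the current values, and all registers then update together on the clock edge.

```python
from typing import Dict

Registers = Dict[str, int]


def next_state(regs: Registers) -> Registers:
    """Combinational logic: derive each register's next value from current values."""
    return {
        "count": (regs["count"] + 1) & 0xF,           # hypothetical 4-bit counter
        "acc": (regs["acc"] + regs["count"]) & 0xFF,  # hypothetical 8-bit accumulator
    }


def simulate(cycles: int) -> Registers:
    """Advance the design by the given number of clock cycles from reset."""
    regs: Registers = {"count": 0, "acc": 0}  # reset values
    for _ in range(cycles):
        # Every register loads its next value at the same clock edge, so
        # "acc" sees the value "count" held before the edge, not after it.
        regs = next_state(regs)
    return regs


print(simulate(4))
```

Because the next values are computed from a snapshot of the current registers, the order in which the registers are listed does not affect the result, which mirrors how nonblocking register updates behave in an HDL.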

2.2.2 GATE LEVEL SIMULATION: Gate-level simulation is used late in the design cycle to increase confidence in a design implementation, and it can help verify dynamic circuit behavior that cannot be accurately verified with static methods.

Fig 1. FPGA simulation process

3. CHALLENGES AHEAD

We now come to the crux of the problem: simulators simply do not execute fast enough. A simulator that simulates a design at 1 cycle per second needs a million seconds, more than eleven days, to run a test of a million cycles. Using special-purpose hardware, such as hardware emulators, is very expensive and not very flexible: making changes and re-running takes a long time.

Simulators are wonderfully consistent: they produce the same results every time, and the stimulus is 100% repeatable. Not so in the real world, where behavior can drift with temperature or other physical factors, or where unpredictable delays can be inserted into some part of the process. So this kind of verification is unique to the rapid prototyping world. One way to observe what is happening inside the device would be to route a few interesting signals to its external pins, but there are two problems here. The first is that many designs are pin-constrained, and the number of interesting signals could be quite large. The second is that, if the board has not been designed carefully, deciding which pins can perform this debug function some of the time and still be accessible is not a trivial task. Of course, if pins are used for this purpose 100% of the time, the problem is simpler.

So instead we could think about on-chip instrumentation (Fig 3). On-chip instrumentation is special-purpose logic built into the device that can capture internal activity for replay at a later time. Just like a logic analyzer, it needs triggers to decide when to start capturing data, memory to hold the data, and a mechanism for accessing that data.
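The three ingredients named above (a trigger, capture memory, and a readout mechanism) can be modeled in a few lines. The sketch below is a behavioral illustration in Python, not a description of any vendor's debug core: the class `TraceCapture`, its methods, and the signal names `addr` and `valid` are all hypothetical.

```python
from typing import Callable, Dict, List

Sample = Dict[str, int]


class TraceCapture:
    """Behavioral model of an on-chip trace block with a fixed-depth memory."""

    def __init__(self, depth: int, trigger: Callable[[Sample], bool]) -> None:
        self.depth = depth            # number of samples the memory can hold
        self.trigger = trigger        # condition that starts the capture
        self.triggered = False
        self.memory: List[Sample] = []

    def clock(self, signals: Sample) -> None:
        """Call once per clock cycle with the current values of the probed signals."""
        if not self.triggered and self.trigger(signals):
            self.triggered = True
        if self.triggered and len(self.memory) < self.depth:
            self.memory.append(dict(signals))  # store a copy of this cycle's values

    def read_out(self) -> List[Sample]:
        """Return the captured samples for later replay."""
        return list(self.memory)


# Start capturing when the (hypothetical) address signal reaches 5.
trace = TraceCapture(depth=4, trigger=lambda s: s["addr"] == 5)
for cycle in range(20):
    trace.clock({"addr": cycle, "valid": cycle % 2})
print(trace.read_out())
```

The fixed `depth` stands in for the limited on-chip memory: once the buffer is full, later cycles are simply not recorded, so the choice of trigger determines which window of activity is visible.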

Almost all systems these days contain a JTAG port, and it is fairly simple to connect this debug logic to the JTAG system. Through this port, the debug instrumentation can be configured, controlled, and used to stream the data out. To gain access to the internal data, the system is stopped and the scan chains are fed out of the system along with the captured data from the debug system. The process is therefore very static in nature: set up the test, capture the data, stop the test, and then examine and analyze the data.

The next issue to look at is memory. While FPGAs have quite a lot of memory, many designs use a large percentage of it. With limited memory, the amount of debug data that can be captured is also limited, and there has to be a tradeoff between the number of signals captured and the depth of the trace. An external logic analyzer may be able to capture millions of vectors, but this will not be possible on chip. Using external memory slows the process down to the point that it may not be possible to capture data in real time. On-chip instrumentation blocks let you select which signals are of interest and configure the triggering and sampling logic, and they may also contain compression logic to help maximize the use of the available memory. While this takes extra logic, it does not need to be present in every system that is shipped, so a lot of companies put a larger-than-necessary FPGA on a few debug systems and build the production versions with smaller FPGAs.

Fig 3. Internal logic diagram of FPGA

4. SPEED STRATEGIES

To improve system performance, the following methods can be applied globally to the entire design.

Type of simulation                            VHDL design              Verilog design
                                              (runtime / memory)       (runtime / memory)
Full RTL simulation                           6.4 minutes / 28.8 MB    18.1 minutes / 26 MB
Full timing simulation                        176.9 minutes / 742 MB   186.2 minutes / 775 MB
Timing simulation of subsection               7.7 minutes / 35.8 MB    28.0 minutes / 112 MB
Full simulation, timing only on subsection    13.8 minutes / 56 MB     48.9 minutes / 134 MB

Table 1: Runtimes and memory usage for different styles of simulation of FPGA designs
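The figures in Table 1 can be compared directly to see how much each simulation style saves relative to a full timing simulation. The short Python sketch below only performs that arithmetic. The dictionary keys and function names are illustrative choices, and the numbers are transcribed from Table 1.

```python
from typing import Dict, Tuple

# (runtime in minutes, memory in MB) for each style, transcribed from Table 1.
TABLE_1: Dict[str, Dict[str, Tuple[float, float]]] = {
    "full_rtl": {"VHDL": (6.4, 28.8), "Verilog": (18.1, 26.0)},
    "full_timing": {"VHDL": (176.9, 742.0), "Verilog": (186.2, 775.0)},
    "timing_subsection": {"VHDL": (7.7, 35.8), "Verilog": (28.0, 112.0)},
    "full_with_timing_on_subsection": {"VHDL": (13.8, 56.0), "Verilog": (48.9, 134.0)},
}


def runtime_ratio(hdl: str, slow: str, fast: str) -> float:
    """How many times longer the 'slow' style runs than the 'fast' style."""
    return TABLE_1[slow][hdl][0] / TABLE_1[fast][hdl][0]


def memory_ratio(hdl: str, big: str, small: str) -> float:
    """How many times more memory the 'big' style uses than the 'small' style."""
    return TABLE_1[big][hdl][1] / TABLE_1[small][hdl][1]


for hdl in ("VHDL", "Verilog"):
    for style in ("timing_subsection", "full_with_timing_on_subsection"):
        print(
            f"{hdl}: full timing vs {style}: "
            f"{runtime_ratio(hdl, 'full_timing', style):.2f}x runtime, "
            f"{memory_ratio(hdl, 'full_timing', style):.2f}x memory"
        )
```

On these figures, restricting timing simulation to the subsection of interest is roughly 23 times faster than a full timing simulation for the VHDL design and between six and seven times faster for the Verilog design, with a correspondingly smaller memory footprint.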

4.1. QUICK METHODS

A) OPTIMIZATION EFFORT: By default, XST synthesizes designs using Normal optimization effort. Setting the effort to High may improve speed by up to five percent, at the cost of increased runtime. To apply the OPT_EFFORT constraint, set the Optimization Effort synthesis option.

B) REGISTER BALANCING: Use register balancing to improve speed at the cost of increased area. If your design no longer fits after register balancing, perform a precise timing analysis of the design and apply register balancing only to the most critical clocks or regions. To use the REGISTER_BALANCING constraint, set the Register Balancing option.

C) CONVERT TRISTATES TO LOGIC: If you target an architecture that supports internal tristates and your design has tristate inferences, convert the tristates to logic. Replacing internal tristates with logic usually increases both speed and area. However, the replacement can also lead to an area reduction, because the logic generated from tristates can be combined and optimized with surrounding logic.

D) RESOURCE SHARING: In most cases, resource sharing improves area and speed results. However, for some designs, disabling resource sharing can improve speed by up to 10 percent. To disable the RESOURCE_SHARING synthesis constraint, disable the Resource Sharing HDL option.

4.2. IN-DEPTH METHODS

A) CHECK HDL ADVISOR MESSAGES: These messages help you improve your design. For example, if you place a KEEP constraint on a net and this constraint prevents XST from improving design speed, the XST HDL Advisor points out this limitation. Monitor these HDL Advisor messages in the Console tab of the Project Navigator Transcript window, or double-click View Synthesis Report. For more information on the Console tab, see Using the Console, Errors, and Warnings Tabs.

B) CHECK THE USE OF FPGA-SPECIFIC RESOURCES: Check the use of resources, such as block versus distributed RAM and LUT-based versus hardware multipliers. For example, if your critical path goes through a multiplier that is implemented using a MULT18X18 primitive, you can increase the speed by changing the implementation to a LUT structure and pipelining it.

C) ADJUST SLICE UTILIZATION RATIO: The SLICE_UTILIZATION_RATIO constraint, which is set to 100 percent for the entire design by default, controls the amount of logic and register replication that takes place during timing optimization. For example, if you specify a ratio of 50 percent for one of the blocks in your design and XST detects that the actual ratio is 48 percent, XST performs timing optimization until the timing constraints are met or until the 50 percent limit is reached. If timing is not met but the ratio limit has been reached, increase the ratio limit to see whether it is possible to meet the timing constraints. If the timing constraints are met after increasing the ratio, find a way to reduce the area of less critical blocks to allow greater area for the critical block. To apply this constraint, use the Slice Utilization Ratio synthesis option.

D) ADJUST MAX FAN-OUT: The value set for the MAX_FANOUT synthesis constraint controls logic replication. If the critical path goes through a net with a high fan-out, XST replicates the logic or inserts a buffer. In general, this improves the speed of the design, but the logic replication for this net may be excessive or insufficient. You can apply a different maximum fan-out value to a particular net to force XST to replicate more or less logic for that net and so improve performance. To apply this constraint, set the MAX_FANOUT constraint on a specific signal in the HDL code.

E) REGISTER PARTITION BOUNDARIES WHEN USING INCREMENTAL SYNTHESIS: When you use incremental synthesis to divide your design into several partitions, XST cannot optimize efficiently across partition boundaries, which leads to less-than-optimal results. When using incremental synthesis, register the boundaries of each partition to minimize the impact on optimization. For example, if the critical path goes through two partitions, XST must preserve hierarchy and cannot optimize across the partitions. In this case, use registers to separate these blocks, or change your design partitions.

F) REDUCE AREA: If your target device is nearing capacity, the placer and router may have problems finding efficient routing and meeting timing objectives. A route delay that is significantly higher than the logic delay in the Timing Reports indicates such a problem. Reducing area may free more routing and logic placement resources to help meet speed requirements.

5. CONCLUSION

As designs get more complex, simulation speed will become the overriding consideration in selecting an HDL and a simulator. In this paper we have described fast simulation methodologies for advanced simulation using technology that is currently available. This is by no means a revolutionary methodology, but it is one that most designers are either not fully aware of or do not fully understand. These techniques have been used in the past for different types of simulation and verification, but may not have been used to their full potential.
Using simulation effectively can have an immense effect on how much time and effort it takes to completely verify a design. Hopefully, with the aid of this paper, it is possible to accomplish faster and more efficient simulation.

6. ACKNOWLEDGEMENT

We are heartily thankful to our coordinator, Assoc. Prof. Sujeet Gupta (S.G.V.U. Jaipur, India), whose encouragement, guidance, and support from the initial to the final level enabled us to develop an understanding of the subject.

