
Batch Performance

How can parallel processing be implemented without code changes?

Writer: Dean Kirby

Parallel processing can slice more than 85% off the runtime of long-running batch processes. The team at Batch Performance has overcome the standard limitations of performance tuning by taking a different approach. We have developed an innovative technique that allows a standard batch program to be processed in parallel. The order of execution remains unchanged. No code changes are required. This means that where a single instance of a program took one hour to process ten million rows, ten instances of the program can process the same ten million rows in approximately six minutes.

This is an incredibly powerful technique and an excellent tool to use in addition to performance tuning. Because performance tuning is not an exact science, it is not usually possible to predict the level of improvement that will result from making a change. Parallel processing, on the other hand, provides reliable, predictable and quick results. The solution can be implemented and tested in under a week.

Appendix 1 contains additional examples of performance improvements and related runtimes that can be expected from introducing parallel processing. Times to process are shown in hours:minutes.

Processes  Improvement  Rows per process  Example 1  Example 2  Example 3  Example 4
    1           0%         10,000,000        1:00      12:00      24:00      36:00
    2          50%          5,000,000        0:30       6:00      12:00      18:00
    3          67%          3,333,333        0:20       4:00       8:00      12:00
    4          75%          2,500,000        0:15       3:00       6:00       9:00
    5          80%          2,000,000        0:12       2:24       4:48       7:12
    6          83%          1,666,667        0:10       2:00       4:00       6:00
    7          86%          1,428,571        0:08       1:42       3:25       5:08
    8          88%          1,250,000        0:07       1:30       3:00       4:30
    9          89%          1,111,111        0:06       1:20       2:40       4:00
   10          90%          1,000,000        0:06       1:12       2:24       3:36
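The figures in Appendix 1 follow from simple division, and the short Python sketch below reproduces that arithmetic. It assumes the ideal case the table describes: rows divide evenly across processes and nothing else (hardware, network, database) becomes a bottleneck. The function name `parallel_estimate` is purely illustrative and is not part of the Batch Performance solution.

```python
def parallel_estimate(total_rows, single_runtime_minutes, processes):
    """Ideal-case estimate: work divides evenly and nothing else is a bottleneck."""
    rows_per_process = round(total_rows / processes)
    runtime_minutes = single_runtime_minutes / processes
    # Improvement is the share of the original runtime that is saved.
    improvement_pct = round((1 - 1 / processes) * 100)
    return rows_per_process, runtime_minutes, improvement_pct


# Example 1 from the table: 10,000,000 rows taking 60 minutes in a single process.
for n in (1, 2, 5, 10):
    rows, minutes, pct = parallel_estimate(10_000_000, 60, n)
    print(f"{n:>2} processes: {rows:>10,} rows each, {minutes:.0f} min, {pct}% improvement")
```

Real results will vary with the factors listed elsewhere in this paper, so the output is best read as an upper bound rather than a guarantee.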

Appendix 2 shows the number of rows processed by each instance.

Appendix 3 shows the total time to process the load as additional parallel processes are added.

The team at Batch Performance specializes in significantly reducing batch program runtimes without changing any of the code or business logic. This is achieved using a custom external schema. The external schema separates the data the program has access to from the underlying logical table. It is used to create virtual segments on specific database tables. The segmentation criteria need to be tailored to the program that uses them while still distributing the data uniformly. Each segment contains different rows. Once the segments are created, it is possible to run multiple instances of the batch program in parallel, each over a different segment. The batch program remains unchanged, as do the underlying schema and tables.

If no transaction control is built into the application, it will be necessary to use the Batch Performance transaction control module. This aids in resolving problems related to contention and provides high availability.

The percentage improvement is affected, positively or negatively, by the following:

1. Use of the transaction control module
2. The complexity of segmentation
3. Hardware performance
4. Network performance
5. Database performance
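The idea of virtual segments over an unchanged table can be illustrated with a small, self-contained Python sketch using the standard `sqlite3` module. This paper does not describe how the external schema is actually built, so everything here is an assumption made for illustration: the `accounts` table, the `accounts_seg0` to `accounts_seg3` view names, and the choice of a modulo on the primary key as the segmentation criterion are all hypothetical.

```python
import sqlite3

SEGMENTS = 4  # number of parallel instances planned (illustrative value)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance REAL)")
conn.executemany(
    "INSERT INTO accounts (id, balance) VALUES (?, ?)",
    [(i, 100.0) for i in range(1, 1001)],
)

# One view per segment. Each view exposes a disjoint slice of the same
# underlying table; the table itself is not altered.
for k in range(SEGMENTS):
    conn.execute(
        f"CREATE VIEW accounts_seg{k} AS "
        f"SELECT id, balance FROM accounts WHERE id % {SEGMENTS} = {k}"
    )

# Count the rows visible through each segment to check the distribution.
segment_counts = [
    conn.execute(f"SELECT COUNT(*) FROM accounts_seg{k}").fetchone()[0]
    for k in range(SEGMENTS)
]
print(segment_counts)
```

In this sketch the segments are evenly sized because the keys are sequential. With real data, the criterion would need to be chosen so that each segment carries a similar amount of work, which is the tailoring the text refers to.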

Appendix 4 Normal processing

Appendix 5 Parallel processing

Parallel processing leverages spare capacity (available RAM, CPU, virtual memory, etc.) on the server. Large amounts of spare capacity allow high numbers of parallel processes. This technique is specifically suited to environments that have spare capacity, but it is often not necessary to have large amounts of available resources to use this approach. Even running a minimum of two balanced parallel processes should approximately halve the overall runtime. Further processes can be added until performance reaches the desired level. The approach delivers efficiency gains where they were not previously possible. The batch job can then do more work in a smaller timeframe.

The parallel processing technique is equally effective where the system has already been tuned by the system administrators and software vendor. Collaborative performance tuning can often get the program to run within the vendor's performance benchmark. At that stage the program is running at or near terminal velocity. If there are no major constraints, it is unlikely that the program will run much faster without a rewrite. Where it is determined that the process is CPU bound, the vendor will usually recommend upgrading hardware to further improve performance. Either approach, a rewrite or a hardware upgrade, is expensive and time-consuming. In complex systems the exact performance gain from a hardware upgrade is often not quantifiable.

The parallel processing approach offers an immediate solution and a reliable alternative. Halving the load has the effect of halving the runtime. Multiple instances of the program operate over different segments and allow the work to be completed concurrently. More work is done in a shorter amount of time.

The Batch Performance approach to parallel processing introduces scalability into processes that aren't currently scalable. It not only corrects the immediate performance problem but also allows for future increases in load while maintaining or shrinking the batch window. This is particularly useful if load is expected to grow. It can also be pivotal in supporting a cost-cutting strategy based on consolidation of environments.

Appendix 6 demonstrates the principle: A single instance of the program processes 10,000,000 rows in 1 hour.

Running processes in parallel means that each one shares the work. So if 2 processes are used, 10,000,000 rows can be processed in 30 minutes.
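The division of work shown in Appendix 6 can be sketched in a few lines of Python. This is only an illustration of the principle under stated assumptions: `batch_program` is a hypothetical stand-in for the unchanged batch program, `split_into_segments` is an invented helper, and threads are used purely to keep the example self-contained. In practice each instance would be a separate operating system process running the real program, and the sketch demonstrates how the rows are shared out rather than any actual speed-up.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_segments(row_ids, n):
    """Deal row ids into n disjoint segments of near-equal size."""
    return [row_ids[k::n] for k in range(n)]


def batch_program(segment):
    """Stand-in for the unchanged batch program: walks its rows in order."""
    return sum(1 for _ in segment)  # pretend each row is one unit of work


row_ids = list(range(1, 10_001))
segments = split_into_segments(row_ids, 2)

# Run one instance per segment at the same time and collect the row counts.
with ThreadPoolExecutor(max_workers=len(segments)) as pool:
    processed = list(pool.map(batch_program, segments))

print(processed, sum(processed))
```

Each instance sees half of the rows, and together the two instances cover every row exactly once.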

Batch performance problems have plagued operational teams and software vendors for as long as batch processing has existed. Failures in end-of-day jobs often have a knock-on effect on the online day, thus impacting the business. Downtime is costly in terms of productivity and lost revenue, and can lead to financial penalties if service level agreements are breached. Reducing the processing time in the batch window provides a larger window of opportunity for recovering from batch failures. It also helps to ensure that the batch can process quickly once the problem is resolved. Quicker batch processes reduce the risk of prolonged downtime to the business.

The solution also comes with a fully configurable module that can be used in conjunction with the external schema to provide transaction-level control of the batch program. This means that transaction size can be regulated and tuned outside of the batch program. The main purpose of the module is to reduce locking contention across processes. Regular commits minimize the chances of row, page or block level contention.

The module also comes with some high availability features, including extensible failure detection and retry control, both of which are fully parameterized. Any known issue that causes the program to terminate can be detected and the associated recovery procedure executed automatically before the module triggers the batch program to reprocess the failed transaction. The module checks information written to standard output and standard error, as well as the program's exit condition. Failure detection with automated recovery and retry functionality is useful for almost all batch programs. It is possible to use these features without using the transaction control functionality. Benefits include high availability, lower risk to the business, fewer call-outs to recover the end-of-day jobs and a more stable system.

The technique is proven to work with Oracle, SQL Server, DB2, MySQL and Sybase. The approach can be evaluated against other databases as well. This solution is suitable both for companies that run batch processes and for software vendors that are interested in significantly improving the performance of their batch for future software releases.
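The failure detection and retry behaviour described above can be sketched as follows. This is a minimal illustration and not the Batch Performance module itself: the function `run_with_retry`, its parameters, the "lock timeout" message and the marker-file recovery step are all hypothetical. The sketch only shows the general pattern of checking standard output, standard error and the exit code, running a matching recovery procedure, and then retrying.

```python
import os
import subprocess
import sys
import tempfile


def run_with_retry(command, known_failures, max_retries=3):
    """Run command; on a recognised failure, run its recovery step and retry.

    known_failures maps a text pattern to a zero-argument recovery callable.
    Returns the number of attempts needed for a successful run.
    """
    attempts = 0
    while True:
        attempts += 1
        result = subprocess.run(command, capture_output=True, text=True)
        if result.returncode == 0:
            return attempts
        # Inspect both output streams for a known failure signature.
        output = result.stdout + result.stderr
        recovery = next(
            (fix for pattern, fix in known_failures.items() if pattern in output),
            None,
        )
        if recovery is None or attempts > max_retries:
            raise RuntimeError(f"unrecoverable failure: {output.strip()}")
        recovery()


# Demonstration: a tiny stand-in "batch program" that fails with a known
# message until a marker file exists; the recovery step creates that file.
marker = os.path.join(tempfile.mkdtemp(), "recovered")
script = (
    "import os, sys\n"
    f"if not os.path.exists({marker!r}):\n"
    "    sys.exit('lock timeout')\n"
)
attempts = run_with_retry(
    [sys.executable, "-c", script],
    {"lock timeout": lambda: open(marker, "w").close()},
)
print(attempts)
```

Keeping the failure patterns and recovery steps in a mapping passed in from outside mirrors the parameterized design the text describes: new known issues can be added without touching the batch program.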