
White Paper on INFORMATICA PERFORMANCE TUNING

PERFORMANCE TUNING IN INFORMATICA


Performance tuning in Informatica

The goal of performance tuning is to optimize session performance so that sessions run within the load window available to the Informatica Server. You can increase session performance in the following ways:

1) Performance of the Informatica Server depends on its network connections. Data generally moves across a network at less than 1 MB per second, whereas a local disk moves data five to twenty times faster, so network connections often limit session performance. Avoid unnecessary network hops wherever possible.
2) Flat files: If your flat files are stored on a machine other than the Informatica Server, move them to the machine that hosts the Informatica Server.
3) Relational data sources: Minimize the connections between sources, targets, and the Informatica Server to improve session performance. Moving the target database onto the server system may improve session performance.
4) Staging areas: If you use staging areas, you force the Informatica Server to perform multiple passes over the data. Removing staging areas may improve session performance.
5) You can run multiple Informatica Servers against the same repository. Distributing the session load across multiple Informatica Servers may improve session performance.
6) Running the Informatica Server in ASCII data movement mode improves session performance, because ASCII mode stores each character in one byte, whereas Unicode mode uses two bytes per character.
7) If a session joins multiple source tables in one Source Qualifier, optimizing the query may improve performance. Single-table SELECT statements with an ORDER BY or GROUP BY clause may also benefit from optimization, such as adding indexes.
8) You can improve session performance by configuring the network packet size, which determines how much data crosses the network at one time. To do this, go to the Server Manager and choose Server Configure, then Database Connections.
9) If your target has key constraints and indexes, they slow the loading of data. To improve session performance in this case, drop the constraints and indexes before you run the session and rebuild them after the session completes.
10) Running sessions in parallel by using concurrent batches also reduces the time needed to load data, so concurrent batches may also increase session performance.
11) Partitioning the session improves session performance by creating multiple connections to sources and targets and loading data in parallel pipelines.
12) In some cases, if a session contains an Aggregator transformation, you can use incremental aggregation to improve session performance.

13) Avoid transformation errors to improve session performance. If the session contains a Lookup transformation, you can improve session performance by enabling the lookup cache.
14) If your session contains a Filter transformation, place it as close to the sources as possible, or use a filter condition in the Source Qualifier instead.
15) Aggregator, Rank, and Joiner transformations can often decrease session performance, because they must group data before processing it. To improve session performance in this case, use the sorted input option where the transformation supports it.
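The effect of the lookup cache in item 13 can be illustrated with a small simulation. This is a minimal sketch, assuming a lookup source modelled as a Python dict that counts how often it is queried; `LookupTable`, `lookup_uncached`, and `lookup_cached` are hypothetical names, not part of any Informatica API. The uncached version issues one query per input row, while the cached version reads the lookup table once, when the first row arrives, and answers every later row from memory.

```python
# Minimal sketch (assumption): a lookup source modelled as a dict that counts
# its queries. Hypothetical names; not Informatica's Lookup transformation API.
from typing import Dict, Iterable, List, Optional


class LookupTable:
    """Hypothetical lookup source that records how often it is queried."""

    def __init__(self, rows: Dict[int, str]) -> None:
        self._rows = rows
        self.queries = 0

    def query_one(self, key: int) -> Optional[str]:
        # One round trip per call, like an uncached lookup querying per row.
        self.queries += 1
        return self._rows.get(key)

    def query_all(self) -> Dict[int, str]:
        # One round trip that reads the whole table, like building a cache.
        self.queries += 1
        return dict(self._rows)


def lookup_uncached(table: LookupTable, keys: Iterable[int]) -> List[Optional[str]]:
    return [table.query_one(k) for k in keys]


def lookup_cached(table: LookupTable, keys: Iterable[int]) -> List[Optional[str]]:
    cache: Optional[Dict[int, str]] = None
    results: List[Optional[str]] = []
    for k in keys:
        if cache is None:
            cache = table.query_all()  # built when the first row arrives
        results.append(cache.get(k))
    return results


if __name__ == "__main__":
    keys = [1, 2, 3, 4] * 2500  # 10,000 input rows; key 4 has no match
    uncached = LookupTable({1: "a", 2: "b", 3: "c"})
    cached = LookupTable({1: "a", 2: "b", 3: "c"})
    assert lookup_uncached(uncached, keys) == lookup_cached(cached, keys)
    print(uncached.queries, cached.queries)  # 10000 1
```

With 10,000 input rows, the uncached path issues 10,000 queries and the cached path issues one, and both return the same values. The price of the cache is storage proportional to the size of the lookup table.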

Improving Mapping Performance in Informatica

Mapping optimization: The best time in the development cycle to optimize mappings is after system testing. Focus on mapping-level optimization only after optimizing the target and source databases. Use the session log to identify whether the source, the target, or the transformations are the performance bottleneck.

Identifying Target Bottlenecks: The most common performance bottleneck occurs when the Informatica Server writes to a target database. You can identify target bottlenecks by configuring the session to write to a flat file target. If session performance increases significantly when you write to a flat file, you have a target bottleneck. Tasks to be performed to increase performance:
* Drop indexes and key constraints.
* Increase checkpoint intervals.
* Use bulk loading.
* Use external loading.
* Increase database network packet size.
* Optimize target databases.

Identifying Source Bottlenecks: If the session reads from a relational source, you can use a Filter transformation, a read test mapping, or a database query to identify source bottlenecks:
* Filter transformation: measure the time taken to process a given amount of data, then add an always-false Filter transformation after each Source Qualifier in the mapping so that no data is processed past the filter. You have a source bottleneck if the new session runs in about the same time.
* Read test session: compare the time taken to process a given set of data using the session with the time taken by a session based on a copy of the mapping in which all transformations after the Source Qualifier are removed and the Source Qualifiers are connected to file targets. You have a source bottleneck if the new session runs in about the same time.
* Database query: extract the query from the session log and run it in a query tool. Measure the time taken to return the first row and the time taken to return all rows. If there is a significant difference, you can use an optimizer hint to eliminate the source bottleneck.
Tasks to be performed to increase performance:
* Optimize the query.
* Use conditional filters.
* Increase database network packet size.
* Connect to Oracle databases using the IPC protocol.

Identifying Mapping Bottlenecks: If you determine that you do not have a source or target bottleneck, you may have a mapping bottleneck.
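The target and source bottleneck tests above reduce to simple comparisons of measured run times. The sketch below is one way to express them, assuming positive times in seconds; the function names and the thresholds `min_speedup`, `same_time_tolerance`, and `min_ratio` are hypothetical stand-ins for the paper's "significantly", "about the same time", and "significant difference", and would need tuning for a real environment.

```python
# Minimal sketch (assumption): run times are positive seconds, and the
# thresholds are hypothetical defaults, not values from Informatica.


def is_target_bottleneck(normal_s: float, flat_file_target_s: float,
                         min_speedup: float = 1.5) -> bool:
    # Target test: the same session writing to a flat file runs much faster.
    return normal_s >= min_speedup * flat_file_target_s


def is_source_bottleneck(normal_s: float, test_s: float,
                         same_time_tolerance: float = 0.10) -> bool:
    # Filter test or read test session: removing the work after the Source
    # Qualifier barely changes run time, so reading the source dominates.
    return abs(normal_s - test_s) <= same_time_tolerance * normal_s


def query_needs_tuning(first_row_s: float, all_rows_s: float,
                       min_ratio: float = 5.0) -> bool:
    # Query-tool test: a large gap between the first row and all rows
    # suggests the query is a candidate for an optimizer hint.
    return all_rows_s >= min_ratio * first_row_s


if __name__ == "__main__":
    print(is_target_bottleneck(600, 200))   # True: flat file run is 3x faster
    print(is_source_bottleneck(600, 580))   # True: within 10% of normal run
    print(query_needs_tuning(2.0, 120.0))   # True: all rows take 60x longer
```

Multiplying rather than dividing keeps the checks free of division-by-zero errors. The tolerance values are a judgment call; the paper itself gives no numeric thresholds.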

How to Increase Informatica Server Performance:


Many factors can affect session performance. Here are some points to check before doing tuning that is specific to Informatica:
1. Check hard disks on related machines. (Slow disk access on source and target databases, source and target file systems, and the Informatica Server and repository machines can slow session performance.)
2. Improve network speed. (Slow network connections can slow session performance.)
3. Check CPUs on related machines. (Make sure the Informatica Server and related machines run on high-performance CPUs.)
4. Configure physical memory for the Informatica Server to minimize disk I/O. (Configure the physical memory for the Informatica Server machine to minimize paging to disk.)
5. Optimize database configuration.
6. Staging areas. If you use a staging area, you force the Informatica Server to perform multiple passes over your data. Where possible, remove staging areas to improve performance.
7. You can run multiple Informatica Servers on separate systems against the same repository. Distributing the session load to separate Informatica Server systems can increase performance.

Informatica-specific tuning:
- Transformation tuning
- Using caches
- Avoiding lookups by using DECODE for smaller and frequently used tables
- Applying filters at the earliest point in the data flow, etc.
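The benefit of applying filters at the earliest point in the data flow can be seen by counting how many rows reach a costly transformation. In this minimal sketch, rows are plain dicts, and `run_pipeline` and `expensive_transform` are hypothetical names standing in for a mapping and any downstream transformation; the counter measures work done, not real timings.

```python
# Minimal sketch (assumption): rows are dicts and expensive_transform is a
# hypothetical stand-in for a costly downstream transformation.
from typing import Callable, Dict, Iterable, List, Tuple

Row = Dict[str, int]


def run_pipeline(rows: Iterable[Row], keep: Callable[[Row], bool],
                 filter_first: bool) -> Tuple[List[Row], int]:
    processed = 0  # how many rows the expensive step had to handle

    def expensive_transform(row: Row) -> Row:
        nonlocal processed
        processed += 1
        return {**row, "amount_x2": row["amount"] * 2}

    if filter_first:
        # Filter near the source: only surviving rows are transformed.
        out = [expensive_transform(r) for r in rows if keep(r)]
    else:
        # Filter late: every row is transformed, then most are discarded.
        out = [r for r in (expensive_transform(r) for r in rows) if keep(r)]
    return out, processed


if __name__ == "__main__":
    data = [{"id": i, "amount": i} for i in range(1000)]
    keep_tens = lambda r: r["amount"] % 10 == 0
    late, n_late = run_pipeline(data, keep_tens, filter_first=False)
    early, n_early = run_pipeline(data, keep_tens, filter_first=True)
    print(late == early, n_late, n_early)  # True 1000 100
```

Filtering 1,000 rows down to 100 before the transformation cuts its work tenfold while producing the same output. This holds only when the filter condition does not depend on values the transformation computes.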

Informatica PowerCenter Partitioning Option


Delivering High Performance for Processing Massive Data Volumes

The PowerCenter Partitioning Option increases the performance of PowerCenter through parallel data processing, and it has been instrumental in establishing PowerCenter's industry performance leadership. This option provides a thread-based architecture and automatic data partitioning that optimizes parallel processing on multiprocessor and grid-based hardware environments.
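The general idea of thread-based, partitioned processing can be sketched with Python's standard `concurrent.futures` module. This illustrates the structure only, assuming rows are integers split into contiguous chunks and each partition applies a simple row-by-row transformation; `split_into_chunks`, `transform_partition`, and `run_partitioned` are hypothetical names, the code does not reproduce PowerCenter's engine, and because of CPython's global interpreter lock, threads will not speed up pure-Python CPU-bound work like this.

```python
# Minimal sketch (assumption): integer rows, contiguous chunks, and a
# row-by-row transformation per partition. Not PowerCenter's engine.
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence


def split_into_chunks(rows: Sequence[int], n_partitions: int) -> List[List[int]]:
    """Split rows into at most n_partitions contiguous, near-equal chunks."""
    if not rows:
        return []
    size = -(-len(rows) // n_partitions)  # ceiling division
    return [list(rows[i:i + size]) for i in range(0, len(rows), size)]


def transform_partition(part: List[int]) -> List[int]:
    # Stand-in for the work one partition's pipeline performs on its rows.
    return [amount * 2 for amount in part]


def run_partitioned(rows: Sequence[int], n_partitions: int) -> List[int]:
    chunks = split_into_chunks(rows, n_partitions)
    with ThreadPoolExecutor(max_workers=n_partitions) as pool:
        # pool.map yields results in input order, so chunks stay in sequence.
        results = list(pool.map(transform_partition, chunks))
    return [row for part in results for row in part]


if __name__ == "__main__":
    data = list(range(10))
    print(split_into_chunks(data, 3))  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
    print(run_partitioned(data, 3) == [x * 2 for x in data])  # True
```

Row-by-row transformations combine trivially because each partition's output can simply be concatenated. Set-oriented transformations such as aggregators need the data realignment described under Data Smart Parallelism.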

Partitioning Option Key Features

Data Smart Parallelism:
* Automatically aligns PowerCenter partitions with database table partitions to improve performance.
* Automatically guarantees data integrity by leveraging the parallel engine of PowerCenter, which dynamically realigns data partitions for set-oriented transformations.

Session Design Tools:
* Create user-defined partitioning schemes quickly and easily.
* Provide a graphical partitioning map for determining the best partitioning points.
* Gather statistics on configurable session options, such as error handling, recovery strategy, memory allocation, and logging, to maximize performance.

Integrated Monitoring Console:
* Gathers session statistics, such as throughput, rows/second, error details, and performance optimizations, to identify potential bottlenecks and recognize trends.
* Shows all session execution and dependency details.

Multiple Partition Schemes:
* Support parallelization through multiple mechanisms, including key range, hash algorithm-based, round robin, or file partitions.
* Maximize data throughput via concurrent processing of specified partitions along the data transformation pipeline.

Partitioning Option Benefits:

Scale Cost-Effectively to Handle Large Data Volumes: With the Partitioning Option, you can execute optimal parallel sessions by dividing data processing into subsets that run in parallel and are spread among the available CPUs in a multiprocessor system. When different processors share the computational load, large data volumes can be processed faster. When sourcing and targeting relational databases, the Partitioning Option enables PowerCenter to automatically align its partitions with database table partitions to improve performance. Unlike approaches that require manual data partitioning, data integrity is automatically guaranteed because the parallel engine of PowerCenter dynamically realigns data partitions for set-oriented transformations (e.g., aggregators or sorters).

Enhance Developer Productivity: The Partitioning Option provides intuitive, GUI-based session design tools that reduce the time spent on initial and ongoing configuration and performance tuning tasks. You can easily create user-defined partitioning schemes. A graphical partitioning map helps you determine the best points of partitioning. Configurable session options, such as error handling, recovery strategy, memory allocation, and logging, make it easier to gather statistics used to maximize performance.

Optimize System Performance in Response to Changing Business Requirements: The Partitioning Option lets you easily gather in-depth session statistics such as throughput, rows/second, error details, and performance optimizations. These statistics help you identify potential bottlenecks and recognize trends. An integrated monitoring console lets you view all session execution and dependency details. With the metadata-driven architecture of PowerCenter, data transformation logic is abstracted from the physical execution plan. This feature enables rapid performance tuning without compromising the logic and design of the original data mappings. You can continually and easily optimize system performance in the face of increasing data loads and changing business requirements.
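Three of the partition schemes listed above (round robin, hash, and key range) can be sketched as generic algorithms. This minimal sketch assumes rows are `(key, value)` tuples with string keys and that range boundaries are supplied by the caller; the function names are hypothetical and this is not PowerCenter's implementation. It uses `zlib.crc32` for hashing because Python's built-in `hash()` on strings is salted per process and would not place keys consistently across runs.

```python
# Minimal sketch (assumption): rows are (key, value) tuples with string keys.
# Generic partitioning algorithms, not PowerCenter's implementation.
import bisect
import zlib
from typing import List, Sequence, Tuple

Row = Tuple[str, int]


def round_robin(rows: Sequence[Row], n: int) -> List[List[Row]]:
    parts: List[List[Row]] = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        parts[i % n].append(row)  # even spread, ignores key values
    return parts


def hash_partition(rows: Sequence[Row], n: int) -> List[List[Row]]:
    parts: List[List[Row]] = [[] for _ in range(n)]
    for row in rows:
        # crc32 is deterministic across runs, unlike Python's salted hash().
        idx = zlib.crc32(row[0].encode("utf-8")) % n
        parts[idx].append(row)  # all rows with the same key land together
    return parts


def key_range(rows: Sequence[Row], upper_bounds: Sequence[str]) -> List[List[Row]]:
    # upper_bounds must be sorted. Partition i holds keys below
    # upper_bounds[i]; the last partition holds keys >= upper_bounds[-1].
    parts: List[List[Row]] = [[] for _ in range(len(upper_bounds) + 1)]
    for row in rows:
        parts[bisect.bisect_right(upper_bounds, row[0])].append(row)
    return parts


if __name__ == "__main__":
    data = [("apple", 1), ("kiwi", 2), ("zebra", 3), ("apple", 4), ("mango", 5)]
    print(round_robin(data, 2))
    print(key_range(data, ["g", "p"]))
```

Round robin spreads rows evenly regardless of key. Hash partitioning sends every row with a given key to the same partition, which is what lets a set-oriented transformation such as an aggregator work on each partition independently. Key range partitioning keeps contiguous key ranges together, which can line up with range-partitioned database tables.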

Conclusion
The goal of performance tuning is to optimize session performance so that sessions run within the load window available to the Informatica Server. Informatica is a leading provider of enterprise data integration software and services. With Informatica, organizations can gain greater business value by integrating all their information assets from across the enterprise. Thousands of companies worldwide rely on Informatica to reduce the cost and expedite the time to address data integration needs of any complexity and scale.
