
Performance Tuning

Informatica PowerCenter (Version 8.6.1) Jishnu Pramanik

1.Performance Tuning Overview

1.1 Overview
The goal of performance tuning is to optimize session performance by eliminating performance bottlenecks. To tune session performance, first identify a performance bottleneck, eliminate it, and then identify the next performance bottleneck until you are satisfied with the session performance. You can use the test load option to run sessions when you tune session performance. If you tune all the bottlenecks, you can further optimize session performance by increasing the number of pipeline partitions in the session. Adding partitions can improve performance by utilizing more of the system hardware while processing the session.

1.2 Need for Performance Tuning
Performance is not just one job loading the maximum data in a particular time frame. Performance is more accurately a combination of several small jobs that together affect the overall performance of a system. Informatica is an ETL tool with high performance capability, and we need to make maximum use of its features to increase performance. With ever-increasing user requirements and exploding data volumes, we need to achieve more in less time. The goal of performance tuning is to optimize session performance. This document lists the techniques available to tune Informatica performance.

2.Identifying Bottlenecks

2.1 Overview
The performance of Informatica depends on the performance of its components, such as the database, network, transformations, mappings and sessions. To tune the performance of Informatica, we have to identify the bottleneck first. The bottleneck may be present in the source, target, transformations, mapping, session, database or network. It is best to identify performance issues in the components in the order source, target, transformations, mapping and session. After identifying the bottleneck, apply the tuning mechanisms in whichever way they are applicable to the project.

2.2 Identify bottleneck in Source
If the source is a relational table, put a Filter transformation in the mapping just after the Source Qualifier and set the filter condition to FALSE, so that all records are filtered off and none proceed to the other parts of the mapping. In the original case, without the test filter, the total time taken is: Total Time = time taken by (source + transformations + target load). With the test filter, Total Time = time taken by the source. So if the source is fine, the session with the test filter should take noticeably less time. If the session still takes nearly the same time as before, there is a source bottleneck.

2.3 Identify bottleneck in Target
The most common performance bottleneck occurs when the Integration Service writes to a target database. To identify a target bottleneck, configure a copy of the session to write to a flat file target. If the session performance increases significantly when you write to a flat file, you have a target bottleneck. If a session already writes to a flat file target, you probably do not have a target bottleneck.

2.4 Identify bottleneck in Transformation
Remove the transformation from the mapping and run the mapping; note the time taken. Then put the transformation back and run the mapping again. If the run now takes significantly more time than before, the transformation is the bottleneck. However, removing a transformation for testing can be a pain for the developer, since it might require further changes for the session to get back into a working state. Instead, put a Filter transformation with a FALSE condition just after the transformation and run the session. If the session takes nearly the same time with and without this test filter, the transformation is the bottleneck.

2.5 Identify bottleneck in sessions
We can use the session log to identify whether the source, target or transformations are the performance bottleneck. Once the ‘Collect Performance Data’ option (in the session ‘Properties’ tab) is enabled, all the performance related information appears in the log created by the session. Basically we have to rely on thread statistics to identify the cause of performance issues. Session logs contain thread summary records like the following:
MASTER> PETL_24018 Thread [READER_1_1_1] created for the read stage of partition point [SQ_test_all_text_data] has completed: Total Run Time = [11.703201] secs, Total Idle Time = [9.560945] secs, Busy Percentage = [18.304876].
MASTER> PETL_24019 Thread [TRANSF_1_1_1_1] created for the transformation stage of partition point [SQ_test_all_text_data] has completed: Total Run Time = [11.764368] secs, Total Idle Time = [0.000000] secs, Busy Percentage = [100.000000].
If the busy percentage of a thread is 100, then that part is the bottleneck.

2.6 Identifying System Bottlenecks on Windows
On Windows, you can view the Performance and Processes tab in the Task Manager. To access the Task Manager, press Ctrl+Alt+Del and click Task Manager. The Performance tab in the Task Manager provides an overview of CPU usage and total memory used. You can view more detailed performance information by using the Performance Monitor on Windows. To access the Performance Monitor, click Start > Programs > Administrative Tools and choose Performance Monitor.
Use the Windows Performance Monitor to create a chart that provides the following information:
♦Percent processor time : If you have more than one CPU, monitor each CPU for percent processor time. If the processors are utilized at more than 80%, you may consider adding more processors.
♦Pages/second : If pages/second is greater than five, you may have excessive memory pressure (thrashing). You may consider adding more physical memory.
♦Physical disks percent time : The percent of time that the physical disk is busy performing read or write requests. If the percent of time is high, tune the cache for PowerCenter to use in-memory cache instead of writing to disk. If you tune the cache, requests are still in queue, and the disk busy percentage is at least 50%, add another disk device or upgrade to a faster disk device. You can also use a separate disk for each partition in the session, and separate disks for the reader, writer, and transformation threads.
♦Physical disks queue length : The number of users waiting for access to the same disk device. If the physical disk queue length is greater than two, you may consider adding another disk device or upgrading the disk device.
♦Server total bytes per second : This is the number of bytes the server has sent to and received from the network. You can use this information to improve network bandwidth.

3.Optimizing the Target

3.1 Overview
You can optimize the following types of targets:
♦Flat file
♦Relational

3.2 Flat File Target
If you use a shared storage directory for flat file targets, you can optimize session performance by ensuring that the shared storage directory is on a machine that is dedicated to storing and managing files, instead of performing other tasks. If the Integration Service runs on a single node and the session writes to a flat file target, you can optimize session performance by writing to a flat file target that is local to the Integration Service process node.

3.3 Relational Target
If the session writes to a relational target, you can perform the following tasks to increase performance:
♦Drop indexes and key constraints : When you define key constraints or indexes in target tables, you slow the loading of data to those tables. To improve performance, drop indexes and key constraints before running the session. You can rebuild those indexes and key constraints after the session completes. If you decide to drop and rebuild indexes and key constraints on a regular basis, you can use the following methods to perform these operations each time you run the session:
-Use pre-load and post-load stored procedures.
-Use pre-session and post-session SQL commands.
♦Use bulk loading : You can use bulk loading to improve the performance of a session that inserts a large amount of data into a DB2, Sybase ASE, Oracle, or Microsoft SQL Server database. Configure bulk loading in the session properties. When bulk loading, the Integration Service bypasses the database log, which speeds performance. Without writing to the database log, however, the target database cannot perform rollback. As a result, you may not be able to perform recovery. When you use bulk loading, weigh the importance of improved session performance against the ability to recover an incomplete session. When bulk loading to Microsoft SQL Server or Oracle targets, define a large commit interval to increase performance. Microsoft SQL Server and Oracle start a new bulk load transaction after each commit. Increasing the commit interval reduces the number of bulk load transactions, which increases performance.
♦Increase checkpoint intervals : The Integration Service performance slows each time it waits for the database to perform a checkpoint. To increase performance, consider increasing the database checkpoint interval. When you increase the database checkpoint interval, you increase the likelihood that the database performs checkpoints as necessary, when the size of the database log file reaches its limit.
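For example, a minimal sketch of pre-session and post-session SQL for the drop-and-rebuild approach (the table name ORDERS_TGT, index name IDX_ORD_DATE and column ORDER_DATE are illustrative assumptions, not objects from this document; the exact DDL depends on the target database):

Pre-session SQL: DROP INDEX IDX_ORD_DATE;
Post-session SQL: CREATE INDEX IDX_ORD_DATE ON ORDERS_TGT (ORDER_DATE);

The index is absent while the session loads the table and is rebuilt once after the load, instead of being maintained row by row during the load.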

♦Use external loading : You can use an external loader to increase session performance. If you have a DB2 EE or DB2 EEE target database, you can use the DB2 EE or DB2 EEE external loaders to bulk load target files. The DB2 EE external loader uses the Integration Service db2load utility to load data. The DB2 EEE external loader uses the DB2 Autoloader utility. If you have a Teradata target database, you can use the Teradata external loader utility to bulk load target files. To use the Teradata external loader utility, set up the attributes, such as Error Limit, Tenacity, MaxSessions, and Sleep, to optimize performance. If the target database runs on Oracle, you can use the Oracle SQL*Loader utility to bulk load target files. When you load data to an Oracle database using a pipeline with multiple partitions, you can increase performance if you create the Oracle target table with the same number of partitions you use for the pipeline. If the target database runs on Sybase IQ, you can use the Sybase IQ external loader utility to bulk load target files. If the Sybase IQ database is local to the Integration Service process on the UNIX system, you can increase performance by loading data to target tables directly from named pipes. If you run the Integration Service on a grid, configure the Integration Service to check resources, make Sybase IQ a resource, make the resource available on all nodes of the grid, and then, in the Workflow Manager, assign the Sybase IQ resource to the applicable sessions.
♦Increase database network packet size : If you write to Oracle, Sybase ASE or Microsoft SQL Server targets, you can improve performance by increasing the network packet size. Increase the network packet size to allow larger packets of data to cross the network at one time. Increase the network packet size based on the database you write to:
-Oracle: You can increase the database server network packet size in listener.ora and tnsnames.ora. Consult your database documentation for additional information about increasing the packet size, if necessary.
-Sybase ASE and Microsoft SQL: Consult your database documentation for information about how to increase the packet size. For Sybase ASE or Microsoft SQL Server, you must also change the packet size in the relational connection object in the Workflow Manager to reflect the database server packet size.
♦Optimize Oracle target databases : If the target database is Oracle, you can optimize the target database by checking the storage clause, space allocation, and rollback or undo segments. When you write to an Oracle database, check the storage clause for database objects. Make sure that tables are using large initial and next values. The database should also store table and index data in separate tablespaces, preferably on different disks. When you write to Oracle databases, the database uses rollback or undo segments during loads. Ask the Oracle database administrator to ensure that the database stores rollback or undo segments in appropriate tablespaces, preferably on different disks. The rollback or undo segments should also have appropriate storage clauses. You can optimize the Oracle database by tuning the Oracle redo log. The Oracle database uses the redo log to log loading operations. Make sure the redo log size and buffer size are optimal. You can view redo log properties in the init.ora file.
♦Minimize deadlocks : If the Integration Service encounters a deadlock when it tries to write to a target, the deadlock only affects targets in the same target connection group. The Integration Service still writes to targets in other target connection groups. Encountering deadlocks can slow session performance. To improve session performance, you can increase the number of target connection groups the Integration Service uses to write to the targets in a session. To use a different target connection group for each target in a session, use a different database connection name for each target instance. You can specify the same connection information for each connection name.
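As a hedged sketch of creating an Oracle target table with the same number of partitions as the pipeline, assuming a four-partition session and a hypothetical SALES_FACT table (the table, columns and partition key are assumptions for illustration):

CREATE TABLE SALES_FACT (
  SALE_ID   NUMBER,
  SALE_DATE DATE,
  AMOUNT    NUMBER(12,2)
)
PARTITION BY HASH (SALE_ID) PARTITIONS 4;

Matching the table partition count to the number of pipeline partitions is the situation the guideline above describes for multi-partition loads into Oracle.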

If the Integration Service runs on a single node and the Oracle instance is local to the Integration Service process node, you can optimize performance by using IPC protocol to connect to the Oracle database. You can set up the Oracle database connection in listener.ora and tnsnames.ora.

3.4 Tips and Tricks
• If the target is a flat file, ensure that the flat file is local to the Informatica server.
• If the target is a relational table, try not to use synonyms or aliases.
• Drop constraints and indexes of the table before loading.
• Use bulk load whenever possible.
• Increase the commit level.
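A hedged sketch of the drop-constraints tip using pre-session and post-session SQL on Oracle (the table and constraint names are hypothetical and would need to match the actual target schema):

Pre-session SQL: ALTER TABLE CUSTOMER_TGT DISABLE CONSTRAINT FK_CUST_REGION;
Post-session SQL: ALTER TABLE CUSTOMER_TGT ENABLE CONSTRAINT FK_CUST_REGION;

Re-enabling the constraint after the load validates the data once, instead of checking it row by row during the load.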

4.Optimizing the Source

4.1 Overview
If the session reads from a relational source, review the following suggestions for improving performance:
♦Optimize the query.
♦Use conditional filters.
♦Increase database network packet size.
♦Connect to Oracle databases using IPC protocol.
♦Use the FastExport utility to extract Teradata data.
♦Create tempdb to join Sybase ASE or Microsoft SQL Server tables.

4.2 Optimizing the Query
If a session joins multiple source tables in one Source Qualifier, you might be able to improve performance by optimizing the query with optimizing hints. Also, single table select statements with an ORDER BY or GROUP BY clause may benefit from optimization such as adding indexes. Usually, the database optimizer determines the most efficient way to process the source data. However, you might know properties about the source tables that the database optimizer does not. The database administrator can create optimizer hints to tell the database how to execute the query for a particular set of source tables. Use optimizing hints if there is a long delay between when the query begins executing and when PowerCenter receives the first row of data. Configure optimizer hints to begin returning rows as quickly as possible, rather than returning all rows at once. This allows the Integration Service to process rows in parallel with the query execution. Queries that contain ORDER BY or GROUP BY clauses may benefit from creating an index on the ORDER BY or GROUP BY columns. Have the database administrator analyze the query, and then create optimizer hints and indexes for the source tables. Once you optimize the query, use the SQL override option to take full advantage of these modifications. The query that the Integration Service uses to read data appears in the session log. You can also find the query in the Source Qualifier transformation. You can also configure the source database to run parallel queries to improve performance. For more information about configuring parallel queries, see the database documentation.

4.3 Using Conditional Filters
A simple source filter on the source database can sometimes negatively impact performance because of the lack of indexes. You can use the PowerCenter conditional filter in the Source Qualifier to improve performance. Whether you should use the PowerCenter conditional filter to improve performance depends on the session. For example, if multiple sessions read from the same source simultaneously, the PowerCenter conditional filter may improve performance. However, some sessions may perform faster if you filter the source data on the source database. You can test the session with both the database filter and the PowerCenter filter to determine which method improves performance.
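As a hedged illustration of the suggestions above, an Oracle-style hint that asks the optimizer to return the first rows quickly, plus an index on the GROUP BY column (the SALES table, CUST_ID column and index name are assumptions for the example):

SELECT /*+ FIRST_ROWS */ CUST_ID, SUM(AMOUNT)
FROM SALES
GROUP BY CUST_ID;

CREATE INDEX IDX_SALES_CUST ON SALES (CUST_ID);

A hint like this would normally be placed in the Source Qualifier SQL override only after the DBA has analyzed the query, as described above.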

4.4 Increasing Database Network Packet Size
If you read from Oracle, Sybase ASE or Microsoft SQL Server sources, you can improve performance by increasing the network packet size. Increase the network packet size to allow larger packets of data to cross the network at one time. Increase the network packet size based on the database you read from:
-Oracle: You can increase the database server network packet size in listener.ora and tnsnames.ora. Consult your database documentation for additional information about increasing the packet size, if necessary.
-Sybase ASE and Microsoft SQL: Consult your database documentation for information about how to increase the packet size. For Sybase ASE or Microsoft SQL Server, you must also change the packet size in the relational connection object in the Workflow Manager to reflect the database server packet size.

4.5 Connecting to Oracle Database Sources
If you are running the Integration Service on a single node and the Oracle instance is local to the Integration Service process node, you can optimize performance by using IPC protocol to connect to the Oracle database. You can set up an Oracle database connection in listener.ora and tnsnames.ora.

4.6 Using Teradata FastExport
FastExport is a utility that uses multiple Teradata sessions to quickly export large amounts of data from a Teradata database. You can create a PowerCenter session that uses FastExport to read Teradata sources quickly. To use FastExport, create a mapping with a Teradata source database. In the session, use the FastExport reader instead of the Relational reader, and use a FastExport connection to the Teradata tables that you want to export.

4.7 Using tempdb to Join Sybase or Microsoft SQL Server Tables
When you join large tables on a Sybase or Microsoft SQL Server database, it is possible to improve performance by creating the tempdb as an in-memory database to allocate sufficient memory. For more information, see the Sybase or Microsoft SQL Server documentation.

4.8 Tips and Tricks
• If the source is a flat file, ensure that the flat file is local to the Informatica server.
• If the source is a flat file, reduce the number of bytes Informatica reads per line (by default it reads 1024 bytes per line). To do this, decrease the Line Sequential Buffer Length setting in the session properties.
• If the source is a relational table, try not to use synonyms or aliases.
• If possible, give a conditional query in the source qualifier so that the records are filtered off as soon as possible in the process.
• In the source qualifier, if the query has ORDER BY or GROUP BY, then create an index on the source table and order by the index field of the source table.
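A minimal sketch of the last two tips, assuming a source table TRANSACTIONS with an indexed column TXN_DATE (both names are hypothetical), placed in the Source Qualifier SQL override or source filter:

SELECT TXN_ID, TXN_DATE, AMOUNT
FROM TRANSACTIONS
WHERE TXN_DATE >= TO_DATE('2009-01-01', 'YYYY-MM-DD')
ORDER BY TXN_DATE;

The WHERE clause filters records off in the database before they enter the pipeline, and ordering by the indexed field keeps the ORDER BY inexpensive.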

5.Optimizing Mappings

5.1 Overview
Mapping-level optimization may take time to implement, but it can significantly boost session performance. Focus on mapping-level optimization after you optimize the targets and sources. Generally, you reduce the number of transformations in the mapping and delete unnecessary links between transformations to optimize the mapping. Configure the mapping with the least number of transformations and expressions to do the most amount of work possible. Delete unnecessary links between transformations to minimize the amount of data moved. You can also perform the following tasks to optimize the mapping:
♦Optimize the flat file sources.
♦Configure single-pass reading.
♦Optimize Simple Pass Through mappings.
♦Optimize filters.
♦Optimize datatype conversions.
♦Optimize expressions.
♦Optimize external procedures.

5.2 Optimizing Flat File Sources
Complete the following tasks to optimize flat file sources:
♦Optimize the line sequential buffer length.
♦Optimize delimited flat file sources.
♦Optimize XML and flat file sources.
-Optimizing the Line Sequential Buffer Length: If the session reads from a flat file source, you can improve session performance by setting the number of bytes the Integration Service reads per line. By default, the Integration Service reads 1024 bytes per line. If each line in the source file is less than the default setting, you can decrease the line sequential buffer length in the session properties.
-Optimizing Delimited Flat File Sources: If a source is a delimited flat file, you must specify the delimiter character to separate columns of data in the source file. You must also specify the escape character. The Integration Service reads the delimiter character as a regular character if you include the escape character before the delimiter character. You can improve session performance if the source flat file does not contain quotes or escape characters.
-Optimizing XML and Flat File Sources: XML files are usually larger than flat files because of the tag information. The size of an XML file depends on the level of tagging in the XML file. More tags result in a larger file size. As a result, the Integration Service may take longer to read and cache XML sources.

5.3 Configuring Single-Pass Reading
Single-pass reading allows you to populate multiple targets with one source qualifier. Consider using single-pass reading if you have multiple sessions that use the same sources. You can combine the transformation logic for each mapping in one mapping and use one source qualifier for each source. The Integration Service reads each source once and then sends the data into separate pipelines. A particular row can be used by all the pipelines, by any combination of pipelines, or by no pipelines.
For example, you have the Purchasing source table, and you use that source daily to perform an aggregation and a ranking. If you place the Aggregator and Rank transformations in separate mappings and sessions, you force the Integration Service to read the same source table twice. However, if you include the aggregation and ranking logic in one mapping with one source qualifier, the Integration Service reads the Purchasing source table once, and then sends the appropriate data to the two separate pipelines.
When changing mappings to take advantage of single-pass reading, you can optimize this feature by factoring out common functions from mappings. For example, if you need to subtract a percentage from the Price ports for both the Aggregator and Rank transformations, you can minimize work by subtracting the percentage before splitting the pipeline. You can use an Expression transformation to subtract the percentage, and then split the mapping after the transformation.

5.4 Optimizing Simple Pass Through Mappings
You can optimize performance for Simple Pass Through mappings. To pass directly from source to target without any other transformations, connect the Source Qualifier transformation directly to the target. If you use a mapping wizard to create a Simple Pass Through mapping, the wizard creates an Expression transformation between the Source Qualifier transformation and the target.

5.5 Optimizing Filters
Use one of the following methods to filter data:
♦Use a Source Qualifier transformation : The Source Qualifier transformation filters rows from relational sources. The Source Qualifier transformation limits the row set extracted from a relational source.
♦Use a Filter transformation : The Filter transformation filters data within a mapping. The Filter transformation filters rows from any type of source. The Filter transformation limits the row set sent to a target.
If you filter rows from the mapping, you can improve efficiency by filtering early in the data flow. Use a filter in the Source Qualifier transformation to remove the rows at the source. If you cannot use a filter in the Source Qualifier transformation, use a Filter transformation and move it as close to the Source Qualifier transformation as possible to remove unnecessary data early in the data flow. Avoid using complex expressions in filter conditions. You can optimize Filter transformations by using simple integer or true/false expressions in the filter condition.
Note: You can also use a Filter or Router transformation to drop rejected rows from an Update Strategy transformation if you do not need to keep rejected rows.

5.6 Optimizing Datatype Conversions
You can increase performance by eliminating unnecessary datatype conversions. For example, if a mapping moves data from an Integer column to a Decimal column, then back to an Integer column, the unnecessary datatype conversion slows performance. Where possible, eliminate unnecessary datatype conversions from mappings. Use the following datatype conversions to improve system performance:
♦Use integer values in place of other datatypes when performing comparisons using Lookup and Filter transformations. For example, many databases store U.S. zip code information as a Char or Varchar datatype. If you convert the zip code data to an Integer datatype, the lookup database stores the zip code 94303-1234 as 943031234. This helps increase the speed of the lookup comparisons based on zip code.
♦Convert the source dates to strings through port-to-port conversions to increase session performance. You can either leave the ports in targets as strings or change the ports to Date/Time ports.

5.7 Optimizing Expressions
You can also optimize the expressions used in the transformations. When possible, isolate slow expressions and simplify them. Complete the following tasks to isolate the slow expressions:
1. Remove the expressions one-by-one from the mapping.
2. Run the mapping to determine the time it takes to run the mapping without the transformation.
If there is a significant difference in session run time, look for ways to optimize the slow expression.

Factoring Out Common Logic
If the mapping performs the same task in multiple places, reduce the number of times the mapping performs the task by moving the task earlier in the mapping. For example, you have a mapping with five target tables. Each target requires a Social Security number lookup. Instead of performing the lookup five times, place the Lookup transformation in the mapping before the data flow splits. Next, pass the lookup results to all five targets.

Minimizing Aggregate Function Calls
When writing expressions, factor out as many aggregate function calls as possible. Each time you use an aggregate function call, the Integration Service must search and group the data. For example, in the following expression, the Integration Service reads COLUMN_A, finds the sum, then reads COLUMN_B, finds the sum, and finally finds the sum of the two sums:
SUM(COLUMN_A) + SUM(COLUMN_B)
If you factor out the aggregate function call, as below, the Integration Service adds COLUMN_A to COLUMN_B, then finds the sum of both:
SUM(COLUMN_A + COLUMN_B)

Replacing Common Expressions with Local Variables
If you use the same expression multiple times in one transformation, you can make that expression a local variable. You can use a local variable only within the transformation. By calculating the variable only once, you speed performance.

Choosing Numeric Versus String Operations
The Integration Service processes numeric operations faster than string operations. For example, if you look up large amounts of data on two columns, EMPLOYEE_NAME and EMPLOYEE_ID, configuring the lookup around EMPLOYEE_ID improves performance.

Optimizing Char-Char and Char-Varchar Comparisons
When the Integration Service performs comparisons between CHAR and VARCHAR columns, it slows each time it finds trailing blank spaces in the row. You can use the Treat CHAR as CHAR On Read option when you configure the Integration Service in the Administration Console so that the Integration Service does not trim trailing spaces from the end of Char source fields.

Choosing DECODE Versus LOOKUP
When you use a LOOKUP function, the Integration Service must look up a table in a database. When you use a DECODE function, you incorporate the lookup values into the expression so the Integration Service does not have to look up a separate table. Therefore, when you want to look up a small set of unchanging values, using DECODE may improve performance.

Using Operators Instead of Functions
The Integration Service reads expressions written with operators faster than expressions with functions. Where possible, use operators to write expressions. For example, you have the following expression that contains nested CONCAT functions:
CONCAT( CONCAT( CUSTOMERS.FIRST_NAME, ‘ ’ ), CUSTOMERS.LAST_NAME )
You can rewrite that expression with the || operator as follows:
CUSTOMERS.FIRST_NAME || ‘ ’ || CUSTOMERS.LAST_NAME

Optimizing IIF Expressions
IIF expressions can return a value and an action, which allows for more compact expressions. For example, you have a source with three Y/N flags: FLG_A, FLG_B, FLG_C. You want to return values based on the values of each flag. You use the following expression:
IIF( FLG_A = 'Y' and FLG_B = 'Y' AND FLG_C = 'Y', VAL_A + VAL_B + VAL_C,
IIF( FLG_A = 'Y' and FLG_B = 'Y' AND FLG_C = 'N', VAL_A + VAL_B,
IIF( FLG_A = 'Y' and FLG_B = 'N' AND FLG_C = 'Y', VAL_A + VAL_C,
IIF( FLG_A = 'Y' and FLG_B = 'N' AND FLG_C = 'N', VAL_A,
IIF( FLG_A = 'N' and FLG_B = 'Y' AND FLG_C = 'Y', VAL_B + VAL_C,
IIF( FLG_A = 'N' and FLG_B = 'Y' AND FLG_C = 'N', VAL_B,
IIF( FLG_A = 'N' and FLG_B = 'N' AND FLG_C = 'Y', VAL_C,
IIF( FLG_A = 'N' and FLG_B = 'N' AND FLG_C = 'N', 0.0,
))))))))
This expression requires 8 IIFs, 16 ANDs, and at least 24 comparisons. If you take advantage of the IIF function, you can rewrite that expression as:
IIF(FLG_A='Y', VAL_A, 0.0) + IIF(FLG_B='Y', VAL_B, 0.0) + IIF(FLG_C='Y', VAL_C, 0.0)
This results in three IIFs, three comparisons, two additions, and a faster session.

Evaluating Expressions
If you are not sure which expressions slow performance, evaluate the expression performance to isolate the problem. Complete the following steps to evaluate expression performance:
1. Time the session with the original expressions.
2. Copy the mapping and replace half of the complex expressions with a constant.
3. Run and time the edited session.
4. Make another copy of the mapping and replace the other half of the complex expressions with a constant.
5. Run and time the edited session.

5.8 Optimizing External Procedures
You might want to block input data if the external procedure needs to alternate reading from input groups. Without the blocking functionality, you would need to write the procedure code to buffer incoming data. You can block input data instead of buffering it, which usually increases session performance. For example, you need to create an external procedure with two input groups. The external procedure reads a row from the first input group and then reads a row from the second input group. Without blocking, you could write the external procedure to allocate a buffer and copy the data from one input group to the buffer until it is ready to process the data. Copying source data to a buffer decreases performance. If you use blocking, you can write the external procedure code to block the flow of data from one input group while it processes the data from the other input group. When you write the external procedure code to block data, you increase performance because the procedure does not need to copy the source data to a buffer.

5.9 Tips and Tricks
• Avoid executing major sql queries from mapplets or mappings. Use optimized queries when we are using them.
• Reduce the number of transformations in the mapping. Active transformations like rank, joiner, filter and aggregator should be used as little as possible.
• Remove all the unnecessary links between the transformations from the mapping.
• If a single mapping contains many targets, then dividing them into separate mappings can improve performance.
• If we need to use a single source more than once in a mapping, then keep only one source and source qualifier in the mapping, and create different data flows as required into different targets or the same target.
• If a session joins many source tables in one source qualifier, then an optimizing query will improve performance.
• In the sql query that Informatica generates, an ORDER BY will be present. Remove the ORDER BY clause if not needed, or at least reduce the number of column names in that list. For better performance it is best to order by the index field of that table.
• Combine the mappings that use the same set of source data.
• On a mapping, a field with the same information should be given the same type and length throughout the mapping. Otherwise time will be spent on field conversions.
• Unnecessary data type conversions should be avoided since data type conversions impact performance.
• Transformation errors result in performance degradation. Try running the mapping after removing all transformations. If it takes significantly less time than with the transformations, then we have to fine-tune the transformations.
• Keep database interactions as few as possible.
• Instead of doing complex calculations in the query, use an expression transformer and do the calculation in the mapping.
• Try to keep the stored procedures simple in the mappings. Stored procedures reduce performance.
• If data is passing through multiple staging areas, removing a staging area will increase performance.

6.Optimizing Transformations

6.1 Overview
You can further optimize mappings by optimizing the transformations contained in the mappings. You can optimize the following transformations in a mapping:
♦Aggregator transformations
♦Custom transformations
♦Joiner transformations
♦Lookup transformations
♦Sequence Generator transformations
♦Sorter transformations
♦Source Qualifier transformations
♦SQL transformations
♦Update transformation
♦Filter transformation
♦Expression transformation

6.2 Optimizing Aggregator Transformations
Aggregator transformations often slow performance because they must group data before processing it. Aggregator transformations need additional memory to hold intermediate group results. You can use the following guidelines to optimize the performance of an Aggregator transformation:
♦Group by simple columns.
♦Use sorted input.
♦Use incremental aggregation.
♦Filter data before you aggregate it.
♦Limit port connections.

Group By Simple Columns
You can optimize Aggregator transformations when you group by simple columns. When possible, use numbers instead of strings and dates in the columns used for the GROUP BY. Avoid complex expressions in the Aggregator expressions.

Use Sorted Input
You can increase session performance by sorting data for the Aggregator transformation. Use the Sorted Input option to sort data. The Sorted Input option decreases the use of aggregate caches. When you use the Sorted Input option, the Integration Service assumes all data is sorted by group. As the Integration Service reads rows for a group, it performs aggregate calculations. When necessary, it stores group information in memory. The Sorted Input option reduces the amount of data cached during the session and improves performance. Use this option with the Source Qualifier Number of Sorted Ports option or a Sorter transformation to pass sorted data to the Aggregator transformation.
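For example, a sketch of a Source Qualifier SQL override that delivers data already sorted on the group-by columns, so the Aggregator can use Sorted Input (the DAILY_SALES table and its columns are assumptions for illustration):

SELECT STORE_ID, PRODUCT_ID, QUANTITY, AMOUNT
FROM DAILY_SALES
ORDER BY STORE_ID, PRODUCT_ID;

The ORDER BY columns must match the Aggregator group-by ports; the Number of Sorted Ports option or a Sorter transformation can provide the same ordering, as noted above.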

You can benefit from better performance when you use the Sorted Input option in sessions with multiple partitions.

Use Incremental Aggregation
If you can capture changes from the source that affect less than half the target, you can use Incremental Aggregation to optimize the performance of Aggregator transformations. When using incremental aggregation, you apply captured changes in the source to aggregate calculations in a session. The Integration Service updates the target incrementally, rather than processing the entire source and recalculating the same calculations every time you run the session.

Filter Data before You Aggregate
Filter the data before you aggregate it. If you use a Filter transformation in the mapping, place the transformation before the Aggregator transformation to reduce unnecessary aggregation.

Limit Port Connections
Limit the number of connected input/output or output ports to reduce the amount of data the Aggregator transformation stores in the data cache.

Tips and Tricks
Aggregator transformation helps in performing aggregate calculations like SUM, AVERAGE etc. Aggregator, Rank and Joiner transformations decrease performance since they group data before processing. So, to improve performance here:
1. In the aggregator transformation, use sorted ports.
2. In the GROUP BY clause, use numbers instead of strings if possible.
3. Avoid complex expressions in aggregator conditions.
4. Limit the number of connected input or output ports; this reduces the cache size. You can also increase the index and data cache sizes to hold all data in memory without paging to disk.

6.3 Optimizing Custom Transformations
The Integration Service can pass a single row to a Custom transformation procedure or a block of rows in an array. You can write the procedure code to specify whether the procedure receives one row or a block of rows. You can increase performance when the procedure receives a block of rows:
♦You can decrease the number of function calls the Integration Service and procedure make. The Integration Service calls the input row notification function fewer times, and the procedure calls the output notification function fewer times.
♦You can increase the locality of memory access space for the data.
♦You can write the procedure code to perform an algorithm on a block of data instead of each row of data.

6.4 Optimizing Joiner Transformations

Joiner transformations can slow performance because they need additional space at run time to hold intermediary results. You can view Joiner performance counter information to determine whether you need to optimize the Joiner transformations. Use the following tips to improve session performance with the Joiner transformation:
♦Designate the master source as the source with fewer duplicate key values. When the Integration Service processes a sorted Joiner transformation, it caches rows for one hundred unique keys at a time. If the master source contains many rows with the same key value, the Integration Service must cache more rows, and performance can be slowed.
♦Designate the master source as the source with the fewer rows. For an unsorted Joiner transformation, designate the source with fewer rows as the master source. During a session, the Joiner transformation compares each row of the detail source against the master source. The fewer rows in the master, the fewer iterations of the join comparison occur, which speeds the join process.
♦Perform joins in a database when possible. Performing a join in a database is faster than performing a join in the session. In some cases, you cannot perform the join in the database, such as joining tables from two different databases or flat file systems. To perform a join in a database, use the following options:
-Create a pre-session stored procedure to join the tables in a database.
-Use the Source Qualifier transformation to perform the join.
♦Join sorted data when possible. You can improve session performance by configuring the Joiner transformation to use sorted input. When you configure the Joiner transformation to use sorted data, the Integration Service improves performance by minimizing disk input and output. You see the greatest performance improvement when you work with large data sets.
The type of database join you use can affect performance. Normal joins are faster than outer joins and result in fewer rows.

Tips and Tricks
Joiner transformation helps to perform joins of two source tables.
1. Use the source qualifier to perform joins instead of a joiner transformation wherever possible. Instead of a joiner transformation, perform joins in the database.
2. In joiner transformations, normal joins are faster than outer joins.
3. In the joiner transformation, the source with the lesser number of records should be the master source.
4. Join on as few columns as possible.
5. Sort the data before joining.
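A hedged example of pushing the join into the database through the Source Qualifier SQL override instead of using a Joiner transformation (the ORDERS and CUSTOMERS tables and their keys are illustrative assumptions):

SELECT O.ORDER_ID, O.ORDER_DATE, C.CUSTOMER_NAME
FROM ORDERS O
INNER JOIN CUSTOMERS C ON C.CUSTOMER_ID = O.CUSTOMER_ID;

This works only when both tables are in the same database; for heterogeneous sources or flat files, the Joiner transformation is still required, as noted above.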

6.5 Optimizing Lookup Transformations
If the lookup table is on the same database as the source table in your mapping and caching is not feasible, join the tables in the source database rather than using a Lookup transformation. If you use a Lookup transformation, perform the following tasks to increase performance:
♦Use the optimal database driver.
♦Cache lookup tables.
♦Optimize the lookup condition.
♦Index the lookup table.
♦Optimize multiple lookups.

Using Optimal Database Drivers
The Integration Service can connect to a lookup table using a native database driver or an ODBC driver. Native database drivers provide better session performance than ODBC drivers.

Caching Lookup Tables
If a mapping contains Lookup transformations, you might want to enable lookup caching. When you enable caching, the Integration Service caches the lookup table and queries the lookup cache during the session. When this option is not enabled, the Integration Service queries the lookup table on a row-by-row basis. The result of the Lookup query and processing is the same, whether or not you cache the lookup table. However, using a lookup cache can increase session performance for smaller lookup tables. In general, you want to cache lookup tables that need less than 300 MB. Complete the following tasks to further enhance performance for Lookup transformations:
♦Use the appropriate cache type.
♦Enable concurrent caches.
♦Optimize Lookup condition matching.
♦Reduce the number of cached rows.
♦Override the ORDER BY statement.
♦Use a machine with more memory.

Types of Caches
Use the following types of caches to increase performance:
♦Shared cache : You can share the lookup cache between multiple transformations. You can share an unnamed cache between transformations in the same mapping. You can share a named cache between transformations in the same or different mappings.
♦Persistent cache : If you want to save and reuse the cache files, you can configure the transformation to use a persistent cache. Use this feature when you know the lookup table does not change between session runs. Using a persistent cache can improve performance because the Integration Service builds the memory cache from the cache files instead of from the database.

OPTIMUM CACHE SIZE IN LOOKUPS
-Calculating Lookup Index Cache
The lookup index cache holds data for the columns used in the lookup condition. For best session performance, specify the maximum lookup index cache size. Use the following information to calculate the minimum and maximum lookup index cache for both connected and unconnected Lookup transformations.
To calculate the minimum lookup index cache size, use the formula:
Minimum lookup index cache size = 200 * [<column size> + 16]

To calculate the maximum lookup index cache size, use the formula:
Maximum lookup index cache size = <Number of rows in lookup table> * [<column size> + 16] * 2
Example: Suppose the lookup table has lookup values based on the field ITEM_ID. It uses the lookup condition ITEM_ID = IN_ITEM_ID1. This ITEM_ID has datatype ‘integer’ and size ‘16’, so the total column size is 16. The table contains 60000 rows.
Minimum lookup index cache size = 200 * [16 + 16] = 6400
Maximum lookup index cache size = 60000 * [16 + 16] * 2 = 3,840,000
So this lookup transformation needs an index cache size between 6,400 and 3,840,000 bytes. For best session performance, this lookup transformation needs an index cache size of 3,840,000 bytes.

-Calculating Lookup Data Cache
In a connected transformation, the data cache contains data for the connected output ports, not including ports used in the lookup condition. In an unconnected transformation, the data cache contains data from the return port.
To calculate the minimum lookup data cache size, use the formula:
Minimum lookup data cache size = <Number of rows in lookup table> * [<Column size of connected output ports not in lookup condition> + 8]
Example: Suppose the lookup table has columns PROMOTION_ID and DISCOUNT which are connected output ports not in the lookup condition. The column size of each is 16, so the total column size is 32. The table contains 60000 rows.
Minimum lookup data cache size = 60000 * [32 + 8] = 2,400,000
So this lookup transformation needs a data cache size of 2,400,000 bytes.

Enable Concurrent Caches
When the Integration Service processes sessions that contain Lookup transformations, the Integration Service builds a cache in memory when it processes the first row of data in a cached Lookup transformation. If there are multiple Lookup transformations in a mapping, the Integration Service creates the caches sequentially when the first row of data is processed by the Lookup transformation. This slows Lookup transformation processing.

You can enable concurrent caches to improve performance. When the number of additional concurrent pipelines is set to one or more, the Integration Service builds caches concurrently rather than sequentially. Performance improves greatly when the sessions contain a number of active transformations that may take time to complete, such as Aggregator, Joiner, or Sorter transformations. When you enable multiple concurrent pipelines, the Integration Service no longer waits for active sessions to complete before it builds the cache. Other Lookup transformations in the pipeline also build caches concurrently.

Optimize Lookup Condition Matching
When the Lookup transformation matches lookup cache data with the lookup condition, it sorts and orders the data to determine the first matching value and the last matching value. You can configure the transformation to return any value that matches the lookup condition. When you configure the Lookup transformation to return any matching value, the transformation returns the first value that matches the lookup condition. It does not index all ports as it does when you configure the transformation to return the first matching value or the last matching value. When you use any matching value, performance can improve because the transformation does not index on all ports.

Reducing the Number of Cached Rows
You can reduce the number of rows included in the cache to increase performance. Use the Lookup SQL Override option to add a WHERE clause to the default SQL statement.

Overriding the ORDER BY Statement
By default, the Integration Service generates an ORDER BY statement for a cached lookup. The ORDER BY statement contains all lookup ports. To increase performance, you can suppress the default ORDER BY statement and enter an override ORDER BY with fewer columns. The Integration Service always generates an ORDER BY statement, even if you enter one in the override. Place two dashes ‘--’ after the ORDER BY override to suppress the generated ORDER BY statement. For example, a Lookup transformation uses the following lookup condition:
ITEM_ID = IN_ITEM_ID
PRICE <= IN_PRICE
The Lookup transformation includes three lookup ports used in the mapping: ITEM_ID, ITEM_NAME, and PRICE. When you enter the ORDER BY statement, enter the columns in the same order as the ports in the lookup condition. You must also enclose all database reserved words in quotes. Enter the following lookup query in the lookup SQL override:
SELECT ITEMS_DIM.ITEM_NAME, ITEMS_DIM.PRICE, ITEMS_DIM.ITEM_ID FROM ITEMS_DIM ORDER BY ITEMS_DIM.ITEM_ID, ITEMS_DIM.PRICE --

Using a Machine with More Memory
You can also increase session performance by running the session on an Integration Service machine with a large amount of memory. Increase the index and data cache sizes as high as you can without straining the machine. If the Integration Service machine has enough memory, increase the cache so it can hold all data in memory without paging to disk.

Optimizing the Lookup Condition
If you include more than one lookup condition, place the conditions with an equal sign first to optimize lookup performance.

Indexing the Lookup Table
The Integration Service needs to query, sort, and compare values in the lookup condition columns. The index needs to include every column used in a lookup condition. You can improve performance for the following types of lookups:
♦Cached lookups : To improve performance, index the columns in the lookup ORDER BY statement. The session log contains the ORDER BY statement.
♦Uncached lookups : To improve performance, index the columns in the lookup condition. The Integration Service issues a SELECT statement for each row that passes into the Lookup transformation.

Optimizing Multiple Lookups
If a mapping contains multiple lookups, even with caching enabled and enough heap memory, the lookups can slow performance. Tune the Lookup transformations that query the largest amounts of data to improve overall performance. To determine which Lookup transformations process the most data, examine the Lookup_rowsinlookupcache counters for each Lookup transformation. The Lookup transformations that have a large number in this counter might benefit from tuning their lookup expressions. If those expressions can be optimized, session performance improves.
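As a hedged sketch of indexing the lookup table used in the earlier ORDER BY example, assuming the lookup condition columns are ITEM_ID and PRICE on ITEMS_DIM (the index name is hypothetical):

CREATE INDEX IDX_ITEMS_DIM_LKP ON ITEMS_DIM (ITEM_ID, PRICE);

For a cached lookup this index supports the columns in the generated ORDER BY statement, and for an uncached lookup it supports the per-row SELECT on the lookup condition columns.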

Tips and Tricks
Lookup transformations are used to look up a set of values in another table. Lookups slow down the performance.
1. To improve performance, cache the lookup tables. Informatica can cache all the lookup and reference tables; this makes operations run very fast. (The meaning of cache is given in point 2 of this section and the procedure for determining the optimum cache size is given in the OPTIMUM CACHE SIZE IN LOOKUPS section above.)
2. Even after caching, the performance can be further improved by minimizing the size of the lookup cache. Our concern is to make the size of the cache as small as possible. Reduce the number of cached rows by using a sql override with a restriction.
Cache: Cache stores data in memory so that Informatica does not have to read the table each time it is referenced. This reduces the time taken by the process to a large extent. Cache is automatically generated by Informatica depending on the marked lookup ports or by a user defined sql query.
Example for caching by a user defined query: Suppose we need to look up records where employee_id = eno. ‘employee_id’ is from the lookup table, EMPLOYEE_TABLE, and ‘eno’ is the input that comes from the source table. We put the following sql query override in the Lookup transformation:
select employee_id from EMPLOYEE_TABLE
If there are 50,000 employee_id values, then the size of the lookup cache will be 50,000. Instead of the above query, we put the following:
select e.employee_id from EMPLOYEE_TABLE e, SUPPORT_TABLE s where e.employee_id = s.eno
If there are 1,000 eno values, then the size of the lookup cache will be only 1,000. But here the performance gain will happen only if the number of records in SUPPORT_TABLE is not huge.
3. Cache the lookup table columns definitely in the following cases:
-If the lookup table is small and the source is large.
-If the lookup is done on the primary key of the lookup table.
4. Do not use caching in the following cases:
-The source is small and the lookup table is large.
-If the lookup table has a lot of data, it will take too long to cache or fit in memory. So move those fields to the source qualifier and then join with the main table.
5. If the lookup data is static, use persistent cache. Persistent caches help to save and reuse cache files. In case of static lookups, cache files will be built from memory cache instead of from the database, which will improve the performance.
6. If several sessions in the same job use the same lookup table, then using persistent cache will help the sessions to reuse cache files.
7. If the source is huge and the lookup table is also huge, then also use persistent cache.
8. If there are several lookups with the same data set, then share the caches.
9. If the target table is the lookup table, then use dynamic cache. The Informatica server updates the lookup cache as it passes rows to the target.
10. Use only the lookups you want in the mapping. Too many lookups inside a mapping will slow down the session.
11. If possible, replace lookups by a joiner transformation or a single source qualifier. Joiner transformation takes more time than source qualifier transformation.
12. If we are going to return only one row, then use an unconnected lookup.
13. If the lookup transformation specifies several conditions, then place conditions that use the equality operator ‘=’ first in the conditions that appear in the conditions tab.
14. In the sql override query of the lookup table, there will be an ORDER BY clause. Remove it if not needed, or put fewer column names in the ORDER BY list.
15. In lookup tables, delete all unused columns and keep only the fields that are used in the mapping.
16. If the table that we use for the lookup has an index (or if we have the privilege to add an index to the table in the database, do so), then the performance will increase both for cached and uncached lookups.
17. All data is read into cache in the order the fields are listed in lookup ports. If we have an index that is even partially in this order, the loading of these lookups can be speeded up.

6.6 Optimizing Sequence Generator Transformations
You can optimize Sequence Generator transformations by creating a reusable Sequence Generator and using it in multiple mappings simultaneously. You can also optimize Sequence Generator transformations by configuring the Number of Cached Values property. The Number of Cached Values property determines the number of values the Integration Service caches at one time. Make sure that the Number of Cached Values is not too small. You may consider configuring the Number of Cached Values to a value greater than 1,000. If you do not have to cache values, set the Number of Cached Values to 0. Sequence Generator transformations that do not use cache are faster than those that require cache.

Tips and Tricks
A sequence generator transformation is used to generate primary keys in Informatica.
1. If we need the sequence generator more than once in a job, then make it reusable and use it multiple times in the folder.

2. To generate primary keys, use a Sequence Generator transformation instead of using a stored procedure for generating sequence numbers.
3. We can also opt for sequencing in the source qualifier by adding a dummy field in the source definition and source qualifier, and then giving a sql query like ‘select seq_name.nextval, <other column names>... from <source table name> where <condition if any>’. Seq_name is the sequence that generates the primary key for our source table, and nextval is a sequence generator object in Oracle. This method of primary key generation is faster than using a sequence generator transformation.

6.7 Optimizing Sorter Transformations
Complete the following tasks to optimize a Sorter transformation:
♦Allocate enough memory to sort the data.
♦Specify a different work directory for each partition in the Sorter transformation.

Allocating Memory
If the Integration Service cannot allocate enough memory to sort data, it fails the session. For best performance, configure Sorter cache size with a value less than or equal to the amount of available physical RAM on the Integration Service machine. Informatica recommends allocating at least 8 MB (8,388,608 bytes) of physical memory to sort data using the Sorter transformation. Sorter cache size is set to 8,388,608 bytes by default. If the amount of incoming data is greater than the Sorter cache size, the Integration Service temporarily stores data in the Sorter transformation work directory. The Integration Service requires disk space of at least twice the amount of incoming data when storing data in the work directory. If the amount of incoming data is significantly greater than the Sorter cache size, the Integration Service may require much more than twice the amount of disk space available to the work directory. Use the following formula to determine the size of incoming data:
# input rows * ([Sum (column size)] + 16)

Work Directories for Partitions
The Integration Service creates temporary files when it sorts data. It stores them in a work directory. You can specify any directory on the Integration Service machine to use as a work directory. By default, the Integration Service uses the value specified for the $PMTempDir server variable. When you partition a session with a Sorter transformation, you can specify a different work directory for each partition in the pipeline. To increase session performance, specify work directories on physically separate disks on the Integration Service nodes.

Tips and Tricks
Sorter transformation is used to sort the input data.
1. While using the sorter transformation, configure the sorter cache size to be larger than the input data size.
2. At the sorter transformation, use hash auto keys partitioning or hash user keys partitioning.
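As a worked illustration of the incoming-data formula above (the row count and column sizes are assumed for the example): if a session sorts 1,000,000 input rows and the connected port sizes add up to 104 bytes per row, the incoming data is roughly 1,000,000 * (104 + 16) = 120,000,000 bytes, or about 115 MB. The Sorter cache size should then be configured above this value, and the work directory should have at least twice this much free disk space.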

6.8 Optimizing Source Qualifier Transformations
Use the Select Distinct option for the Source Qualifier transformation if you want the Integration Service to select unique values from a source. Use the Select Distinct option to filter unnecessary data earlier in the data flow. This can improve performance.

6.9 Optimizing SQL Transformations
When you create an SQL transformation, you configure the transformation to use external SQL queries or queries that you define in the transformation. When you configure an SQL transformation to run in script mode, the Integration Service processes an external SQL script for each input row. When the transformation runs in query mode, the Integration Service processes an SQL query that you define in the transformation.
Each time the Integration Service processes a new query in a session, it calls a function called SQLPrepare to create an SQL procedure and pass it to the database. When the query changes for each input row, it has a performance impact.
When the transformation runs in query mode, you can improve performance by constructing a static query in the transformation. A static query statement does not change, although the data in the query clause changes. To create a static query, use parameter binding instead of string substitution in the SQL Editor. When you use parameter binding, you set parameters in the query clause to values in the transformation input ports.
When an SQL query contains commit and rollback query statements, the Integration Service must recreate the SQL procedure after each commit or rollback. To optimize performance, do not use transaction statements in an SQL transformation query.
When you create the SQL transformation, you configure how the transformation connects to the database. You can choose a static connection or you can pass connection information to the transformation at run time. When you configure the transformation to use a static connection, you choose a connection from the Workflow Manager connections. The SQL transformation connects to the database once during the session. When you pass dynamic connection information, the SQL transformation connects to the database each time the transformation processes an input row.

6.10 Optimizing Update Transformation

6.11 Optimizing Filter Transformation
Filter transformation is used to filter off unwanted rows based on conditions we specify.
1. Use the filter transformation as close to the source as possible so that unwanted data gets eliminated sooner.
2. If elimination of unwanted data can be done by the source qualifier instead of the filter, then eliminate it with the source qualifier (see the example after this list).
3. Use conditional filters and keep the filter condition simple, involving TRUE/FALSE or 1/0.
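To make tip 2 concrete, here is a minimal sketch with an invented EMPLOYEES table: instead of reading every row and discarding inactive ones in a Filter transformation, the same condition can be applied in the Source Qualifier (as a source filter or in the SQL override), so the database returns only the rows the mapping needs.

    SELECT EMP_ID, EMP_NAME, DEPT_ID
    FROM EMPLOYEES
    WHERE STATUS = 'ACTIVE'

The rows that the filter would have dropped never leave the database, which reduces the amount of data moved through the pipeline.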

6.12 Optimizing Expression Transformation
Expression transformation is used to perform simple calculations and also to do source lookups.
1. Use operators instead of functions.
2. Minimize the usage of string functions.
3. If we use a complex expression multiple times in the expression transformer, then make that expression a variable. Then we need to use only this variable for all computations.
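For example (the port names are invented), a concatenation such as FIRST_NAME || ' ' || LAST_NAME that feeds several output ports can be computed once in a variable port, say v_FULL_NAME, and the output ports can then simply reference v_FULL_NAME. Using the '||' operator rather than the CONCAT() function also follows tip 1, since the operator is faster than the function.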

7. Optimizing Sessions

7.1 Overview
Once you optimize the source database, target database, and mapping, you can focus on optimizing the session. You can perform the following tasks to improve overall performance:
♦Use a grid. You can increase performance by using a grid to balance the Integration Service workload.
♦Use pushdown optimization. You can increase session performance by pushing transformation logic to the source or target database.
♦Run sessions and workflows concurrently. You can run independent sessions and workflows concurrently to improve session and workflow performance.
♦Allocate buffer memory. You can increase the buffer memory allocation for sources and targets that require additional memory blocks.
♦Optimize caches. You can improve session performance by setting the optimal location and size for the caches.
♦Increase the commit interval. You can increase session performance by increasing the interval at which the Integration Service commits changes.
♦Reduce errors tracing. To improve performance, you can reduce the error tracing level, which reduces the number of log events generated by the Integration Service.
♦Remove staging areas. You can eliminate staging areas to improve session performance.
♦Disable high precision. Performance slows when the Integration Service reads and manipulates data with the high precision datatype. You can disable high precision to improve session performance.

7.2 Using a Grid
You can use a grid to increase session and workflow performance. A grid is an alias assigned to a group of nodes that allows you to automate the distribution of workflows and sessions across nodes. When you use a grid, the Integration Service distributes workflow tasks and session threads across multiple nodes, which balances the Integration Service workload. Running workflows and sessions on the nodes of a grid provides the following performance gains:
♦Processes concurrent sessions faster.
♦Processes partitions faster.

7.3 Using Pushdown Optimization
You can increase session performance by pushing transformation logic to the source or target database. Based on the mapping and session configuration, the Integration Service executes SQL against the source or target database instead of processing the transformation logic within the Integration Service (an illustrative example appears below, after 7.4).

7.4 Run Concurrent Sessions and Workflows
If possible, run sessions and workflows concurrently to improve performance. For example, if you load data into an analytic schema where you have dimension and fact tables, load the dimensions concurrently.
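As an illustrative sketch of pushdown optimization (7.3), with invented table and column names: a mapping that filters customers on status and upper-cases their names might, with source-side pushdown enabled, run as a single statement in the source database instead of row by row in the Integration Service.

    SELECT CUST_ID, UPPER(CUST_NAME) AS CUST_NAME
    FROM CUSTOMERS
    WHERE STATUS = 'ACTIVE'

The exact SQL the Integration Service generates depends on the mapping, the database type, and the pushdown options selected in the session.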

7.5 Allocating Buffer Memory
When the Integration Service initializes a session, it allocates blocks of memory to hold source and target data. The Integration Service allocates at least two blocks for each source and target partition. Sessions that use a large number of sources and targets might require additional memory blocks. If the Integration Service cannot allocate enough memory blocks to hold the data, it fails the session.
You can configure the amount of buffer memory, or you can configure the Integration Service to automatically calculate buffer settings at run time. You can increase the number of available memory blocks by adjusting the following session parameters:
♦DTM Buffer Size. Increase the DTM buffer size on the Properties tab in the session properties.
♦Default Buffer Block Size. Decrease the buffer block size on the Config Object tab in the session properties.
To configure these settings, first determine the number of memory blocks the Integration Service requires to initialize the session. Then, calculate the buffer size and/or the buffer block size to create the required number of session blocks. If you have XML sources or targets in a mapping, use the number of groups in the XML source or target in the calculation for the total number of sources and targets.
For example, you create a session that contains a single partition using a mapping that contains 50 sources and 50 targets. Then you make the following calculations:
1. You determine that the session requires a minimum of 200 memory blocks:
[(total number of sources + total number of targets) * 2] = (session buffer blocks)
100 * 2 = 200
2. Based on default settings, you determine that you can change the DTM Buffer Size to 15,000,000, or you can change the Default Buffer Block Size to 54,000:
(session buffer blocks) = (0.9) * (DTM Buffer Size) / (Default Buffer Block Size) * (number of partitions)
200 = 0.9 * 14,222,222 / 64,000 * 1
or
200 = 0.9 * 12,000,000 / 54,000 * 1
Note: For a session that contains n partitions, set the DTM Buffer Size to at least n times the value for the session with one partition.

Increasing DTM Buffer Size
The DTM Buffer Size setting specifies the amount of memory the Integration Service uses as DTM buffer memory. The Integration Service uses DTM buffer memory to create the internal data structures and buffer blocks used to bring data into and out of the Integration Service. When you increase the DTM buffer memory, the Integration Service creates more buffer blocks, which improves performance during momentary slowdowns.
Increasing DTM buffer memory allocation generally causes performance to improve initially and then level off. When you increase the DTM buffer memory allocation, consider the total memory available on the Integration Service process system. If you do not see a significant increase in performance, DTM buffer memory allocation is not a factor in session performance.
Note: Reducing the DTM buffer allocation can cause the session to fail early in the process because the Integration Service is unable to allocate memory to the required processes.
To increase the DTM buffer size, open the session properties and click the Properties tab. Edit the DTM Buffer Size property in the Performance settings. If you modify the DTM Buffer Size, increase the property by multiples of the buffer block size.
The Log Manager writes a warning message in the session log if the number of memory blocks is so small that it causes performance degradation. The Log Manager writes this warning message even if the number of memory blocks is enough for the session to run successfully. The warning message also gives a suggestion for the proper value.
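Rearranging the formula above gives a quick way to solve for the setting directly (a sketch using only the numbers already shown in the example): required DTM Buffer Size = (session buffer blocks) * (Default Buffer Block Size) * (number of partitions) / 0.9 = 200 * 64,000 * 1 / 0.9, which is about 14,222,222 bytes and is rounded up to roughly 15,000,000. Solving instead for the block size with the default 12,000,000-byte DTM buffer gives 0.9 * 12,000,000 / 200 = 54,000 bytes.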

Optimizing the Buffer Block Size
Depending on the session source data, you might need to increase or decrease the buffer block size. If the machine has limited physical memory and the mapping in the session contains a large number of sources, targets, or partitions, you might need to decrease the buffer block size.
If you are manipulating unusually large rows of data, you can increase the buffer block size to improve performance. If you do not know the approximate size of the rows, you can determine the configured row size by completing the following steps.
To evaluate needed buffer block size:
1. In the Mapping Designer, open the mapping for the session.
2. Open the target instance.
3. Click the Ports tab.
4. Add the precision for all columns in the target.
5. If you have more than one target in the mapping, repeat steps 2 to 4 for each additional target to calculate the precision for each target.
6. Repeat steps 2 to 5 for each source definition in the mapping.
7. Choose the largest precision of all the source and target precisions for the total precision in the buffer block size calculation.
The total precision represents the total bytes needed to move the largest row of data. For example, if the total precision equals 33,000, the Integration Service requires 33,000 bytes in the buffers to move that row. If the buffer block size is 64,000 bytes, the Integration Service can move only one row at a time.
Ideally, a buffer accommodates at least 100 rows at a time. So if the total precision is greater than 32,000, increasing the buffer block size should improve performance.
To increase the buffer block size, open the session properties and click the Config Object tab. Edit the Default Buffer Block Size property in the Advanced settings. Increase the DTM buffer block setting in relation to the size of the rows, and then run and time the session after each increase. As with DTM buffer memory allocation, increasing the buffer block size should improve performance. If you do not see an increase, buffer block size is not a factor in session performance.
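Continuing the example with invented numbers: if the largest source or target row has a total precision of 5,000 bytes, then a block that holds the suggested 100 rows needs roughly 100 * 5,000 = 500,000 bytes, so raising the Default Buffer Block Size from the 64,000-byte default toward 500,000 (and adjusting the DTM Buffer Size accordingly) would be a reasonable starting point.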

7.6 Optimizing Caches
The Integration Service uses the index and data caches for XML targets and Aggregator, Rank, Lookup, and Joiner transformations. The Integration Service stores transformed data in the data cache before returning it to the pipeline. It stores group information in the index cache. Also, the Integration Service uses a cache to store data for Sorter transformations.
If the allocated cache is not large enough to store the data, the Integration Service stores the data in a temporary disk file as it processes the session data. Performance slows each time the Integration Service pages to a temporary file. Examine the performance details to determine how often the Integration Service pages to a file. You can configure the amount of cache memory using the cache calculator or by specifying the cache size. You can also configure the Integration Service to automatically calculate cache memory settings at run time.
Perform the following tasks to optimize caches:
♦Limit the number of connected input/output and output only ports.
♦Select the optimal cache directory location.
♦Increase the cache sizes.
♦Use the 64-bit version of PowerCenter to run large cache sessions.

Limiting the Number of Connected Ports
For transformations that use data cache, limit the number of connected input/output and output only ports. Limiting the number of connected input/output or output ports reduces the amount of data the transformations store in the data cache.

Cache Directory Location
If you run the Integration Service on a grid and only some Integration Service nodes have fast access to the shared cache file directory, configure each session with a large cache to run on the nodes with fast access to the directory. To configure a session to run on a node with fast access to the directory, complete the following steps:
1. Create a PowerCenter resource.
2. Make the resource available to the nodes with fast access to the directory.
3. Assign the resource to the session.
If all Integration Service processes in a grid have slow access to the cache files, set up a separate, local cache file directory for each Integration Service process. An Integration Service process may have faster access to the cache files if it runs on the same machine that contains the cache directory.
Note: You may encounter performance degradation when you cache large quantities of data on a mapped or mounted drive.

Increasing the Cache Sizes
If the allocated cache is not large enough to store the data, the Integration Service stores the data in a temporary disk file as it processes the session data. Each time the Integration Service pages to the temporary file, performance slows. You can examine the performance details to determine when the Integration Service pages to the temporary file. The Transformation_readfromdisk or Transformation_writetodisk counters for any Aggregator, Rank, Lookup, or Joiner transformation indicate the number of times the Integration Service must page to disk to process the transformation.
If the session contains a transformation that uses a cache and you run the session on a machine with ample memory, increase the cache sizes so all data can fit in memory. Since the data cache is typically larger than the index cache, increase the data cache more than the index cache.

Using the 64-bit Version of PowerCenter
If you process large volumes of data or perform memory-intensive transformations, you can use the 64-bit PowerCenter version to increase session performance. The 64-bit version provides a larger memory space that can significantly reduce or eliminate disk input/output. This can improve session performance in the following areas:
♦Caching. With a 64-bit platform, the Integration Service is not limited to the 2 GB cache limit of a 32-bit platform.

♦Data throughput. With a larger available memory space, the reader, writer, and DTM threads can process larger blocks of data.

7.7 Increasing the Commit Interval
The commit interval setting determines the point at which the Integration Service commits data to the targets. Each time the Integration Service commits, performance slows. Therefore, the smaller the commit interval, the more often the Integration Service writes to the target database and the slower the overall performance.
If you increase the commit interval, the number of times the Integration Service commits decreases and performance improves. When you increase the commit interval, consider the log file limits in the target database. If the commit interval is too high, the Integration Service may fill the database log file and cause the session to fail. Therefore, weigh the benefit of increasing the commit interval against the additional time you would spend recovering a failed session. Click the General Options settings in the session properties to review and adjust the commit interval (a rough numeric illustration appears below, after 7.10).

7.8 Disabling High Precision
If a session runs with high precision enabled, disabling high precision might improve session performance. The Decimal datatype is a numeric datatype with a maximum precision of 28. To use a high precision Decimal datatype in a session, configure the Integration Service to recognize this datatype by selecting Enable High Precision in the session properties. Click the Performance settings in the session properties to enable high precision.
However, since reading and manipulating the high precision datatype slows the Integration Service, you can improve session performance by disabling high precision. When you disable high precision, the Integration Service converts data to a double. For example, the Integration Service reads the Decimal row 3900058411382035317455530282 as 390005841138203 x 10^13.

7.9 Reducing Error Tracing
To improve performance, you can reduce the number of log events generated by the Integration Service when it runs the session. If a session contains a large number of transformation errors, and you do not need to correct them, set the session tracing level to Terse. At this tracing level, the Integration Service does not write error messages or row-level information for reject data. The session tracing level overrides any transformation-specific tracing levels within the mapping. This is not recommended as a long-term response to high levels of transformation errors.
If you need to debug the mapping and you set the tracing level to Verbose, you may experience significant performance degradation when you run the session. Do not use Verbose tracing when you tune performance.

7.10 Removing Staging Areas
When you use a staging area, the Integration Service performs multiple passes on the data. When possible, remove staging areas to improve performance. The Integration Service can read multiple sources with a single pass, which may alleviate the need for staging areas.
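Returning to the commit interval (7.7), a rough illustration with invented numbers: loading 1,000,000 rows with a commit interval of 10,000 means roughly 100 commits, while raising the interval to 50,000 cuts that to about 20 commits. The larger interval reduces commit overhead, but it also means more uncommitted rows are held in the target database log and more work is redone if the session fails and must be recovered.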

A session specifies the location from where the data is to be taken, where the transformations are done, and where the data is to be loaded. It has various properties that help us to schedule and run the job in the way we want.
1. Partition the session: this creates many connections to the source and target, and loads data in parallel pipelines. Each pipeline will be independent of the other. But the performance of the session will not improve if the number of records is less, and it will also not improve if the session does updates and deletes. So session partitioning should be used only if the volume of data is huge and the job is mainly insertion of data.
2. Run the sessions in parallel rather than serial to gain time, if they are independent of each other.
3. Drop constraints and indexes before we run the session, and rebuild them after the session run completes. Dropping can be done in a pre-session script and rebuilding in a post-session script (a sketch follows after this list). But if the data volume is too large, dropping and rebuilding indexes will not be practical. In such cases, stage all data, pre-create the index, use a transportable tablespace and then load into the database.
4. Use bulk loading, external loading etc., where possible. Bulk loading can be used only if the table does not have an index.
5. In a session we have options to treat rows as 'Data Driven', 'Insert', 'Update' and 'Delete'. If update strategies are used, then we have to keep it as 'Data Driven'. But when the session does only insertion of rows into the target table, it has to be kept as 'Insert' to improve performance.
6. Increase the database commit level (the point at which the Informatica server is set to commit data to the target table). For example, the commit level can be set at every 50,000 records.
7. By avoiding built-in functions as much as possible, we can improve the performance. The functions IS_SPACES(), IS_NUMBER(), IIF(), DECODE() etc. reduce the performance to a big extent in this order; preference should be in the opposite order. So use operators instead of functions. For example, for concatenation, the operator '||' is faster than the function CONCAT().
8. String functions like substring, ltrim, and rtrim reduce the performance. In the sources, use delimited strings in the case of source flat files, or use the varchar data type.
9. Manipulating high precision data types will slow down the Informatica server, so disable 'high precision'.
10. Localize all source and target tables, stored procedures, views, sequences etc. Try not to connect across synonyms; synonyms and aliases slow down the performance.
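A minimal sketch of the pre- and post-session SQL mentioned in the drop/rebuild tip; the table, index, and column names are invented, and the exact DDL depends on the target database.

    -- Pre-session SQL: drop the index before the load
    DROP INDEX IDX_SALES_FACT_DATE;

    -- Post-session SQL: rebuild it after the load completes
    CREATE INDEX IDX_SALES_FACT_DATE ON SALES_FACT (SALE_DATE);

In PowerCenter, statements like these can typically be placed in the target's Pre SQL and Post SQL attributes in the session, or run from pre- and post-session commands.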

8. Optimizing the System

8.1 Overview
Often performance slows because the session relies on inefficient connections or an overloaded Integration Service process system. System delays can also be caused by routers, switches, network protocols, and usage by many users. Slow disk access on source and target databases, source and target file systems, and nodes in the domain can slow session performance. Have the system administrator evaluate the hard disks on the machines.
After you determine from the system monitoring tools that you have a system bottleneck, make the following global changes to improve the performance of all sessions:
♦Improve network speed. Slow network connections can slow session performance. Have the system administrator determine if the network runs at an optimal speed. Decrease the number of network hops between the Integration Service process and databases.
♦Use multiple CPUs. You can use multiple CPUs to run multiple sessions in parallel and run multiple pipeline partitions in parallel.
♦Reduce paging. When an operating system runs out of physical memory, it starts paging to disk to free physical memory. Configure the physical memory for the Integration Service process machine to minimize paging to disk.
♦Use processor binding. In a multi-processor UNIX environment, the Integration Service may use a large amount of system resources. Use processor binding to control processor usage by the Integration Service process. Also, if the source and target database are on the same machine, use processor binding to limit the resources used by the database.

8.2 Improving Network Speed

A local disk can move data 5 to 20 times faster than a network. Consider the following options to minimize network activity and to improve Integration Service performance.
If you use a flat file as a source or target in a session and the Integration Service runs on a single node, store the files on the same machine as the Integration Service to improve performance. When you store flat files on a machine other than the Integration Service, session performance becomes dependent on the performance of the network connections. Moving the files onto the Integration Service process system and adding disk space might improve performance.
If you use relational source or target databases, try to minimize the number of network hops between the source and target databases and the Integration Service process. Moving the target database onto a server system might improve Integration Service performance.
When you run sessions that contain multiple partitions, have the network administrator analyze the network and make sure it has enough bandwidth to handle the data moving across the network from all partitions.

8.3 Using Multiple CPUs
Configure the system to use more CPUs to improve performance. Multiple CPUs allow the system to run multiple sessions in parallel as well as multiple pipeline partitions in parallel. However, additional CPUs might cause disk bottlenecks. To prevent disk bottlenecks, minimize the number of processes accessing the disk. Processes that access the disk include database functions and operating system functions. Parallel sessions or pipeline partitions also require disk access.

8.4 Reducing Paging
Paging occurs when the Integration Service process operating system runs out of memory for a particular operation and uses the local disk for memory. You can free up more memory or increase physical memory to reduce paging and the slow performance that results from paging. Monitor paging activity using system tools.
You might want to increase system memory in the following circumstances:
♦You run a session that uses large cached lookups.
♦You run a session with many partitions.
If you cannot free up memory, you might want to add memory to the system.

8.5 Using Processor Binding
In a multi-processor UNIX environment, the Integration Service may use a large amount of system resources if you run a large number of sessions. As a result, other applications on the machine may not have enough system resources available. You can use processor binding to control processor usage by the Integration Service process node. Also, if the source and target database are on the same machine, use processor binding to limit the resources used by the database.
In a Sun Solaris environment, the system administrator can create and manage a processor set using the psrset command. The system administrator can then use the pbind command to bind the Integration Service to a processor set so the processor set only runs the Integration Service. The Sun Solaris environment also provides the psrinfo command to display details about each configured processor and the psradm command to change the operational status of processors. For more information, see the system administrator and Sun Solaris documentation.

In an HP-UX environment, the system administrator can use the Process Resource Manager utility to control CPU usage in the system. The Process Resource Manager allocates minimum system resources and uses a maximum cap of resources.
In an AIX environment, system administrators can use the Workload Manager in AIX 5L to manage system resources during peak demands. The Workload Manager can allocate resources and manage CPU, memory, and disk I/O bandwidth.

9. Optimizing Database

9.1 Tips and Tricks
To gain the best Informatica performance, the database tables, stored procedures, and queries used in Informatica should be tuned well.
1. The performance of the Informatica server is related to network connections. Data generally moves across a network at less than 1 MB per second, whereas a local disk moves data five to twenty times faster. Thus network connections often affect session performance, so avoid network connections where possible.
2. If the source and target are flat files, then they should be present on the system on which the Informatica server is present.
3. Increase the network packet size.
4. Optimize target databases.

10. Optimizing the PowerCenter Components

10.1 Overview
You can optimize performance of the following PowerCenter components:
♦PowerCenter repository
♦Integration Service
If you run PowerCenter on multiple machines, run the Repository Service and Integration Service on different machines. To load large amounts of data, run the Integration Service on the higher processing machine. Also, run the Repository Service on the machine hosting the PowerCenter repository.

10.2 Optimizing PowerCenter Repository Performance
Complete the following tasks to improve PowerCenter repository performance:
♦Ensure the PowerCenter repository is on the same machine as the Repository Service process.
♦Order conditions in object queries.
♦Use a single-node tablespace for the PowerCenter repository if you install it on a DB2 database.

Location of the Repository Service Process and Repository
You can optimize the performance of a Repository Service that you configured without the high availability option. To optimize performance, ensure that the Repository Service process runs on the same machine where the repository database resides.

Ordering Conditions in Object Queries
When the Repository Service processes a parameter with multiple conditions, it processes them in the order you enter them. To receive expected results and improve performance, enter parameters in the order you want them to run.

Using a Single-Node DB2 Database Tablespace
You can optimize repository performance on IBM DB2 EEE databases when you store a PowerCenter repository in a single-node tablespace. When setting up an IBM DB2 EEE database, the database administrator can define the database on a single node.
When the tablespace contains one node, the PowerCenter Client and Integration Service access the repository faster than if the repository tables exist on different database nodes. If you do not specify the tablespace name when you create, copy, or restore a repository, the DB2 system specifies the default tablespace for each repository table. The DB2 system may or may not specify a single-node tablespace.
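A minimal sketch of what a dedicated single-node setup might look like on DB2; the partition group, tablespace, and path names are invented, and the exact DDL varies by DB2 version, so verify it against your DB2 documentation.

    -- Define a database partition group on a single partition, then a tablespace in it
    CREATE DATABASE PARTITION GROUP PG_PCREPO ON DBPARTITIONNUM (0);
    CREATE REGULAR TABLESPACE TS_PCREPO IN DATABASE PARTITION GROUP PG_PCREPO
      MANAGED BY SYSTEM USING ('/db2/data/ts_pcrepo');

The repository would then be created with this tablespace name specified, so that all repository tables stay on one node.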

10.3 Optimizing Integration Service Performance
Complete the following tasks to improve Integration Service performance:
♦Use native drivers instead of ODBC drivers for the Integration Service.
♦Run the Integration Service in ASCII data movement mode if character data is 7-bit ASCII or EBCDIC.
♦Cache PowerCenter metadata for the Repository Service.
♦Run the Integration Service with high availability.
Note: When you configure the Integration Service with high availability, the Integration Service recovers workflows and sessions that may fail because of temporary network or machine failures. To recover from a workflow or session, the Integration Service writes the states of each workflow and session to temporary files in a shared directory. This may decrease performance.

Using Native and ODBC Drivers
The Integration Service can use ODBC or native drivers to connect to databases. Use native drivers to improve performance.

Running the Integration Service in ASCII Data Movement Mode
When all character data processed by the Integration Service is 7-bit ASCII or EBCDIC, configure the Integration Service to run in the ASCII data movement mode. In ASCII mode, the Integration Service uses one byte to store each character. When you run the Integration Service in Unicode mode, it uses two bytes for each character, which can slow session performance.

Caching PowerCenter Metadata for the Repository Service
You can use repository agent caching to improve DTM process performance. When you enable repository agent caching, the Repository Service caches metadata requested by the Integration Service. When you cache metadata, the Integration Service reads the cache for subsequent runs of the task rather than fetching the metadata from the repository. Only metadata requested by the Integration Service is cached.
For example, you run a workflow with 1,000 sessions. The first time you run the workflow with caching enabled, the Integration Service fetches the session metadata from the repository. During subsequent runs of the workflow, the Repository Service fetches the session metadata from the cache. This increases DTM process performance.