The first step in performance tuning is to identify performance bottlenecks. Performance bottlenecks
can occur in the source and target, the mapping, the session, and the system.
When a PowerCenter session is triggered, the Integration Service starts the Data Transformation Manager (DTM),
which is responsible for starting the reader, transformation, and writer threads.
The reader thread reads data from the sources, the transformation threads process data
according to the transformation logic in the mapping, and the writer thread connects to the target and loads
the data. A processing delay in any of these threads leads to a performance issue.
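The reader → transformation → writer pipeline described above can be sketched as a producer/consumer chain. This is a minimal Python illustration, not PowerCenter code: the bounded queues stand in for DTM buffer blocks, and a slow stage stalls its neighbors exactly as the bottleneck discussion below describes.

```python
import queue
import threading

SENTINEL = object()  # marks end of data

def reader(source_rows, buffer_out):
    # Reader thread: fills DTM-style buffer blocks from the source.
    for row in source_rows:
        buffer_out.put(row)       # blocks when the buffer is full
    buffer_out.put(SENTINEL)

def transformer(buffer_in, buffer_out):
    # Transformation thread: applies the mapping logic to each row.
    while (row := buffer_in.get()) is not SENTINEL:
        buffer_out.put(row * 10)  # stand-in for transformation logic
    buffer_out.put(SENTINEL)

def writer(buffer_in, target):
    # Writer thread: loads transformed rows into the target.
    while (row := buffer_in.get()) is not SENTINEL:
        target.append(row)

target = []
buf1 = queue.Queue(maxsize=4)  # small buffers: a slow stage stalls the others
buf2 = queue.Queue(maxsize=4)
threads = [
    threading.Thread(target=reader, args=(range(10), buf1)),
    threading.Thread(target=transformer, args=(buf1, buf2)),
    threading.Thread(target=writer, args=(buf2, target)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(target)  # -> [0, 10, 20, 30, 40, 50, 60, 70, 80, 90]
```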
Source Bottlenecks
Performance bottlenecks can occur when the Integration Service reads from a source database.
Slow reads from the source delay filling the DTM buffer, so the transformation and writer threads wait for data. This delay causes the entire session to run slower.
An inefficient query or a small database network packet size can cause source bottlenecks.
Identifying a Source Bottleneck :
If the source is a relational table, add a Filter transformation in the mapping just after the Source
Qualifier and set the filter condition to FALSE, so that all records are filtered out and none proceed to the
rest of the mapping. Compare the run time of this test session with the run time of the original session
(without the test filter). If the source is fine, the test session should take noticeably less time; if it takes
nearly as long as the original, there is a source bottleneck.
Target Bottlenecks
When a target bottleneck occurs, the writer thread cannot free up buffer blocks for the reader and
transformation threads until the data is written to the target, so the reader and transformation threads
wait for free blocks. This causes the entire session to run slower.
Small database checkpoint intervals, small database network packet sizes, or problems during
heavy loading operations can cause target bottlenecks.
Identifying a Target Bottleneck :
If the target is a relational table, substitute it with a flat file and run the session. If the session now
takes significantly less time than it took to load the table, then the target table is the
bottleneck.
Mapping Bottlenecks
Complex or poorly written mapping logic can lead to a mapping bottleneck. With a
mapping bottleneck, the transformation thread runs slower, causing the reader thread to wait for free blocks
and the writer thread to wait for blocks to be filled for writing to the target.
Session Bottlenecks
If you do not have a source, target, or mapping bottleneck, you may have a session bottleneck.
A session bottleneck normally occurs when the session memory configuration is not tuned
correctly.
This in turn leads to a bottleneck on the reader, transformation or writer thread. Small cache size, low
buffer memory, and small commit intervals can cause session bottlenecks.
System Bottlenecks
After you tune the source, target, mapping, and session, consider tuning the system to prevent system
bottlenecks.
The Integration Service uses system resources to process transformations, run sessions, and read and
write data.
The Integration Service also uses system memory to create cache files for transformations, such as
Aggregator, Joiner, Lookup, Sorter, XML, and Rank.
Thread statistics give run-time information for all three threads: reader, transformation, and writer.
The session log provides enough run-time thread statistics to help you understand and pinpoint
the performance bottleneck.
When you run a session, the session log file lists run-time information and thread statistics with the
following details:
Run Time : Amount of time the thread runs.
Idle Time : Amount of time the thread is idle, including the time the thread waits for other threads'
processing.
Busy Time : Percentage of the run time, computed as (run time - idle time) / run time x 100.
Thread Work Time : The percentage of time taken to process each transformation in a thread.
Note : A session log file with normal tracing level is required to get the thread statistics.
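The Busy Time formula above can be applied directly to the per-thread numbers in a session log. A small sketch, with made-up thread names and run/idle times, shows how the busiest thread points at the bottleneck:

```python
def busy_pct(run_time, idle_time):
    # Busy Time = (run time - idle time) / run time x 100
    return (run_time - idle_time) / run_time * 100

# Hypothetical (run time, idle time) in seconds, as read from a session log.
threads = {
    "READER_1_1_1": (100.0, 70.0),
    "TRANSF_1_1_1": (100.0, 5.0),
    "WRITER_1_1_1": (100.0, 60.0),
}
stats = {name: busy_pct(run, idle) for name, (run, idle) in threads.items()}
bottleneck = max(stats, key=stats.get)
print(stats)       # transformation thread is 95% busy
print(bottleneck)  # -> TRANSF_1_1_1: the mapping is the likely bottleneck
```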
If a session joins multiple source tables in one Source Qualifier, you might be able to improve
performance by optimizing the query with optimizer hints.
The database administrator (DBA) can create optimizer hints to tell the database how to execute the
query for a particular set of source tables.
If you read from Oracle, Sybase ASE, or Microsoft SQL Server sources, you can improve the performance
by increasing the network packet size. Increase the network packet size to allow larger packets of data to
cross the network at one time.
When bulk loading, the Integration Service bypasses the database log, which speeds performance.
Without writing to the database log, however, the target database cannot perform rollback. As a result,
you may not be able to perform recovery.
To increase session performance, configure PowerCenter to use an external loader. External loaders
can be used for Oracle, DB2, Sybase, and Teradata target databases.
When you define key constraints or indexes in target tables, you slow the loading of data to those tables.
To improve performance, drop indexes and key constraints before you run the session.
You can rebuild those indexes and key constraints after the session completes.
If you write to Oracle, Sybase ASE, or Microsoft SQL Server targets, you can improve the performance by
increasing the network packet size.
Increase the network packet size to allow larger packets of data to cross the network at one time.
Optimizing the Mappings
Generally, you reduce the number of transformations in the mapping and delete unnecessary links
between transformations to optimize the mapping.
Configure the mapping with the least number of transformations and expressions to do the most
amount of work possible.
Delete unnecessary links between transformations to minimize the amount of data moved.
You can increase performance by eliminating unnecessary datatype conversions. For example, if a
mapping moves data from an Integer column to a Decimal column, then back to an Integer column, the
unnecessary datatype conversion slows performance. Where possible, eliminate unnecessary datatype
conversions from mappings.
2. Optimizing Expressions :
Minimizing Aggregate Function Calls : When writing expressions, factor out as many aggregate function
calls as possible. Each time you use an aggregate function call, the Integration Service must search and
group the data. For example, SUM(COL_A + COL_B) performs better than SUM(COL_A) + SUM(COL_B).
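The factoring rule can be checked numerically. This Python sketch (plain lists standing in for ports, `sum` standing in for SUM) shows that SUM(COL_A + COL_B) gives the same result as SUM(COL_A) + SUM(COL_B) while needing only one aggregation pass:

```python
col_a = [1, 2, 3, 4]
col_b = [10, 20, 30, 40]

# One aggregate call over a precomputed expression: SUM(COL_A + COL_B)
one_call = sum(a + b for a, b in zip(col_a, col_b))

# Two aggregate calls: SUM(COL_A) + SUM(COL_B)
two_calls = sum(col_a) + sum(col_b)

print(one_call, two_calls)  # -> 110 110: same answer, one grouping pass vs two
```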
Replacing Common Expressions with Local Variables : If you use the same expression multiple times in
one transformation, you can make that expression a local variable by using a variable port.
Choosing Numeric Versus String Operations : The Integration Service processes numeric operations
faster than string operations. For example, if you look up large amounts of data on two columns,
EMPLOYEE_NAME and EMPLOYEE_ID, configuring the lookup around EMPLOYEE_ID improves
performance.
Using Operators Instead of Functions : The Integration Service reads expressions written with operators
faster than expressions with functions. Where possible, use operators to write expressions.
If your logic contains multiple IIF (Immediate IF) functions, try replacing them with
DECODE. The DECODE function is faster than multiple nested IIFs.
The Integration Service processes operators faster than functions. For example, consider using the
|| string concatenation operator in place of the CONCAT function to concatenate data.
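The difference between nested IIFs and a single DECODE can be modeled outside PowerCenter. This Python sketch is purely illustrative (the `iif` and `decode` helpers mimic the Informatica expression functions, they are not an Informatica API): DECODE takes one flat list of search/result pairs instead of a conditional nested at every level.

```python
def iif(condition, true_val, false_val):
    # Mimics Informatica IIF(condition, value_if_true, value_if_false).
    return true_val if condition else false_val

def decode(value, *pairs, default=None):
    # Mimics DECODE(value, search1, result1, search2, result2, ..., default):
    # scans flat search/result pairs instead of nesting conditionals.
    for search, result in zip(pairs[::2], pairs[1::2]):
        if value == search:
            return result
    return default

grade = "B"
# Nested IIFs: every extra case nests another call.
nested = iif(grade == "A", 4, iif(grade == "B", 3, iif(grade == "C", 2, 0)))
# Single DECODE: one flat lookup.
flat = decode(grade, "A", 4, "B", 3, "C", 2, default=0)
print(nested, flat)  # -> 3 3
```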
Passing unnecessary data through the mapping will hamper the performance. It is recommended that if
you do not need certain data, disable the output port of the column so that you don't pass the data.
Optimizing Transformations :
1. Caching the lookup table : When caching is enabled, the Informatica server caches the lookup table
and queries the cache during the session. When this option is not enabled, the server queries the lookup
table on a row-by-row basis.
If your mapping contains multiple lookups that look up on the same lookup table, it is suggested you
share the cache in order to avoid performing caching multiple times.
2. Optimizing the lookup condition : When the lookup has multiple conditions, place the conditions
that use the equality operator (=) first.
3. Lookup Override : You can reduce the processing time if you use lookup SQL override properly in the
lookup transformation. If you are using lookup to look up on the database table, you can use the lookup
SQL override to reduce the amount of data that you look up. This also helps in saving the cache space.
4. Indexing the lookup table : For a cached lookup, index the lookup table on the columns in the
ORDER BY clause; the session log contains the ORDER BY statement.
For an uncached lookup, since the server issues a SELECT statement for each row passing into the
lookup transformation, index the lookup table on the columns used in the lookup condition.
5. Replace large lookup tables with joins in the Source Qualifier when possible (always use the table
with the smaller number of records as the lookup table).
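The effect of a lookup SQL override on cache size (point 3 above) can be sketched with an in-memory SQLite table standing in for the lookup table; the table and column names are made up for illustration:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE customers (id INTEGER, name TEXT, active INTEGER)")
con.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, "Ann", 1), (2, "Bob", 0), (3, "Cal", 1), (4, "Dee", 0)],
)

# Default lookup: every column of every row goes into the cache.
full_cache = con.execute("SELECT id, name, active FROM customers").fetchall()

# Lookup SQL override: fetch only the columns and rows the lookup needs.
override_cache = con.execute(
    "SELECT id, name FROM customers WHERE active = 1"
).fetchall()

print(len(full_cache), len(override_cache))  # -> 4 2: half the rows cached
```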
Optimize Source Qualifier transformation:
Use the Source Qualifier to filter the data. You can also use the Source Qualifier SQL override to filter
the records instead of using a Filter transformation.
Bring only the required columns from the source to the Source Qualifier. Most of the time, not all the
columns of the source table are required, so bring only the required fields by deleting the unnecessary
columns.
Avoid using an ORDER BY clause inside the Source Qualifier SQL override. The ORDER BY clause requires
additional processing, and performance can be improved by avoiding it.
Optimize Aggregator transformation:
1. Group by simple columns, preferably numeric columns.
2. Use sorted input. Sorted input decreases the use of aggregate caches: the server
assumes all input data are sorted, and performs the aggregate calculations as it reads.
3. Limit the number of ports used in the Aggregator transformation. This reduces the volume of data that
the Aggregator transformation stores inside the cache.
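Why sorted input shrinks the aggregate cache can be sketched with `itertools.groupby`: with sorted rows the aggregation streams group by group, holding only the current group, whereas unsorted input needs a cache entry for every group seen so far (a rough analogue of the Aggregator cache, with made-up data):

```python
from itertools import groupby
from operator import itemgetter

rows = [("east", 10), ("east", 5), ("west", 7), ("west", 3)]  # sorted by key

# Sorted input: stream one group at a time; only the current group is in memory.
sorted_totals = {
    key: sum(amount for _, amount in group)
    for key, group in groupby(rows, key=itemgetter(0))
}

# Unsorted input: must keep a running total cached for every group seen so far.
cache = {}
for key, amount in [("west", 7), ("east", 10), ("west", 3), ("east", 5)]:
    cache[key] = cache.get(key, 0) + amount

print(sorted_totals, cache)  # both give east=15, west=10
```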
Optimize Sequence Generator transformation:
1. Try creating a reusable Sequence Generator transformation and use it in multiple mappings.
2. The Number of Cached Values property determines the number of values the Informatica
server caches at one time.
Optimize Joiner transformation:
1. It is recommended to assign the table with the smaller number of records as the master while using
the Joiner transformation.
2. It is also recommended to perform the join in the Source Qualifier using a SQL override, as performing
the join in the database is sometimes faster than performing it in Informatica.
3. Additionally, pass sorted data to the Joiner transformation to enhance performance, as sorted input
uses less disk space than unsorted data.