PERFORMANCE TUNING IN INFORMATICA

The first step in performance tuning is to identify performance bottlenecks. Performance bottlenecks
can occur in the source and target, the mapping, the session, and the system.

Data Transformation Manager (DTM) :

When a PowerCenter session is triggered, the Integration Service starts the Data Transformation Manager (DTM), which is responsible for starting the reader, transformation, and writer threads.

The reader thread reads data from the sources, the transformation threads process data according to the transformation logic in the mapping, and the writer thread connects to the target and loads the data. A data processing delay in any of these threads leads to a performance issue.

Source Bottlenecks

Performance bottlenecks can occur when the Integration Service reads from a source database.

Slow reads from the source delay filling the DTM buffer with enough data, so the transformation and writer threads wait for data. This delay causes the entire session to run slower.

An inefficient query or a small database network packet size can cause source bottlenecks.

Identifying a Source Bottleneck

If the source is a relational table, put a Filter transformation in the mapping just after the Source Qualifier and set the filter condition to FALSE, so all records are filtered off and none proceed to the rest of the mapping. In the original case, without the test filter, the total time taken is as follows:

Total Time = time taken by (source + transformations + target load)

Now, because of the filter, Total Time = time taken by source

So if the source is fine, the session should take less time in the latter case. If the session still takes nearly the same time as in the former case, there is a source bottleneck.
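The test filter's condition is simply the constant FALSE, a one-line Informatica expression:

    -- Filter transformation condition for the bottleneck test: no row passes
    FALSE

With no rows passing the filter, the measured time approximates the time spent by the reader thread alone.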

Target Bottlenecks

When a target bottleneck occurs, the writer thread cannot free up buffer space for the reader and transformation threads until the data is written to the target, so the reader and transformation threads wait for free blocks. This causes the entire session to run slower.

Small database checkpoint intervals, small database network packet sizes, or problems during heavy loading operations can cause target bottlenecks.

Identifying a Target Bottleneck

If the target is a relational table, substitute it with a flat file and run the session. If the time taken now is significantly less than the time taken for the session to load the table, then the target table is the bottleneck.

Mapping Bottlenecks

Complex or poorly written mapping logic can lead to a mapping bottleneck. With a mapping bottleneck, the transformation thread runs slower, causing the reader thread to wait for free blocks and the writer thread to wait for blocks to be filled up for writing to the target.

Session Bottlenecks

If you do not have a source, target, or mapping bottleneck, you may have a session bottleneck.

A session bottleneck normally occurs when the session memory configuration is not tuned correctly.

This in turn leads to a bottleneck on the reader, transformation, or writer thread. Small cache sizes, low buffer memory, and small commit intervals can cause session bottlenecks.

System Bottlenecks

After you tune the source, target, mapping, and session, consider tuning the system to prevent system
bottlenecks.

The Integration Service uses system resources to process transformations, run sessions, and read and
write data.

The Integration Service also uses system memory to create cache files for transformations, such as
Aggregator, Joiner, Lookup, Sorter, XML, and Rank.

Source, Target & Mapping Bottlenecks Using Thread Statistics

Thread statistics give run time information for all three threads: reader, transformation, and writer. The session log provides enough run time thread statistics to help understand and pinpoint the performance bottleneck.

Gathering Thread Statistics

You can get thread statistics from the session log file. When you run a session, the session log file lists run time information and thread statistics with the details below.

Run Time : Amount of time the thread runs.

Idle Time : Amount of time the thread is idle, including time spent waiting for other threads to process data.

Busy Time : Percentage of the run time. It is (run time - idle time) / run time x 100.

Thread Work Time : The percentage of time taken to process each transformation in a thread.

Note : A session log with normal tracing level is required to get the thread statistics. The thread with the highest busy percentage is typically the bottleneck stage.

Optimizing the Source

1. Optimizing the Query

If a session joins multiple source tables in one Source Qualifier, you might be able to improve performance by optimizing the query with optimizer hints.

The database administrator (DBA) can create optimizer hints to tell the database how to execute the
query for a particular set of source tables.
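As an illustrative sketch only (Oracle hint syntax; the ORDERS table, alias, and hint choices are hypothetical and should come from the DBA's analysis of the actual query plan):

    SELECT /*+ FULL(o) PARALLEL(o, 4) */
           o.ORDER_ID, o.CUSTOMER_ID, o.AMOUNT
    FROM   ORDERS o
    WHERE  o.ORDER_DATE >= TO_DATE('2024-01-01', 'YYYY-MM-DD')

Here the hints ask the optimizer for a parallel full scan instead of an index path; whether that helps depends entirely on data volumes and available indexes.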

2. Increasing Database Network Packet Size

If you read from Oracle, Sybase ASE, or Microsoft SQL Server sources, you can improve the performance
by increasing the network packet size. Increase the network packet size to allow larger packets of data to
cross the network at one time.
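For Oracle, for example, the session data unit (SDU) controls the Oracle Net packet size. A sketch of the relevant sqlnet.ora entry, with an illustrative value (valid limits depend on the Oracle version, and the setting should match on the PowerCenter node and the database server):

    # sqlnet.ora -- raise the default SDU from 8 KB to 32 KB (illustrative)
    DEFAULT_SDU_SIZE=32767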

Optimizing the Target :

1. Using Bulk Loads :


You can use bulk loading to improve the performance of a session that inserts a large amount of data
into a DB2, Sybase ASE, Oracle, or Microsoft SQL Server database.

When bulk loading, the Integration Service bypasses the database log, which speeds performance.

Without writing to the database log, however, the target database cannot perform rollback. As a result,
you may not be able to perform recovery.

2. Using External Loaders

To increase session performance, configure PowerCenter to use an external loader. External loaders can be used for Oracle, DB2, Sybase ASE, and Teradata targets.

3. Dropping Indexes and Key Constraints

When you define key constraints or indexes in target tables, you slow the loading of data to those tables.

To improve performance, drop indexes and key constraints before you run the session.

You can rebuild those indexes and key constraints after the session completes.
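A sketch of the surrounding SQL, with hypothetical table, index, and constraint names:

    -- Before the session: drop the constraint and index that slow the load
    ALTER TABLE SALES_FACT DROP CONSTRAINT PK_SALES_FACT;
    DROP INDEX IDX_SALES_FACT_DATE;

    -- After the session completes: rebuild them
    ALTER TABLE SALES_FACT ADD CONSTRAINT PK_SALES_FACT PRIMARY KEY (SALE_ID);
    CREATE INDEX IDX_SALES_FACT_DATE ON SALES_FACT (SALE_DATE);

Statements like these can be run from the pre- and post-session SQL commands in the session properties.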

4. Increasing Database Network Packet Size

If you write to Oracle, Sybase ASE, or Microsoft SQL Server targets, you can improve the performance by
increasing the network packet size.

Increase the network packet size to allow larger packets of data to cross the network at one time.

Optimizing the Mappings

Generally, you reduce the number of transformations in the mapping and delete unnecessary links
between transformations to optimize the mapping.

Configure the mapping with the least number of transformations and expressions to do the most
amount of work possible.

Delete unnecessary links between transformations to minimize the amount of data moved.

1. Optimizing Datatype Conversions :

You can increase performance by eliminating unnecessary datatype conversions. For example, if a
mapping moves data from an Integer column to a Decimal column, then back to an Integer column, the
unnecessary datatype conversion slows performance. Where possible, eliminate unnecessary datatype
conversions from mappings.

2. Optimizing Expressions :

Minimizing Aggregate Function Calls : When writing expressions, factor out as many aggregate function calls as possible. Each time you use an aggregate function call, the Integration Service must search and group the data. For example, SUM(COL_A + COL_B) performs better than SUM(COL_A) + SUM(COL_B).

Replacing Common Expressions with Local Variables : If you use the same expression multiple times in one transformation, compute it once in a variable port and reuse the result, as in the sketch below.
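A minimal sketch, using hypothetical port names in the port-grid notation of an Expression transformation; the variable port v_NET_PRICE is evaluated once per row and reused by both output ports:

    v_NET_PRICE (variable port) = PRICE * QTY * (1 - DISCOUNT)
    o_NET_PRICE (output port)   = v_NET_PRICE
    o_TAX       (output port)   = v_NET_PRICE * 0.18

Without the variable port, the PRICE * QTY * (1 - DISCOUNT) expression would be evaluated twice per row.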

Choosing Numeric Versus String Operations : The Integration Service processes numeric operations
faster than string operations. For example, if you look up large amounts of data on two columns,
EMPLOYEE_NAME and EMPLOYEE_ID, configuring the lookup around EMPLOYEE_ID improves
performance.

Using Operators Instead of Functions : The Integration Service reads expressions written with operators
faster than expressions with functions. Where possible, use operators to write expressions.

If your logic contains multiple nested IIF() functions, try replacing them with DECODE(). The DECODE function is faster than multiple nested IIFs, as in the sketch below.
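For example, a status-code translation with hypothetical values:

    -- Nested IIFs: up to three function calls per row
    IIF(STATUS = 'A', 'Active',
        IIF(STATUS = 'I', 'Inactive',
            IIF(STATUS = 'P', 'Pending', 'Unknown')))

    -- Equivalent single DECODE
    DECODE(STATUS, 'A', 'Active',
                   'I', 'Inactive',
                   'P', 'Pending',
                        'Unknown')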

The Integration Service processes operators faster than functions. For example, consider using || (the concatenation operator) in place of the CONCAT function to concatenate data, as shown below.
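With hypothetical FIRST_NAME and LAST_NAME ports, both expressions below return the same result, but the operator form is processed faster:

    CONCAT(CONCAT(FIRST_NAME, ' '), LAST_NAME)    -- nested function calls
    FIRST_NAME || ' ' || LAST_NAME                -- operator form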

Passing unnecessary data through the mapping hampers performance. If certain data is not needed, disable the output port of that column so the data is not passed downstream.

Optimizing Transformations :

Optimize Lookup transformation :

1. Caching the lookup table : When caching is enabled, the Informatica server caches the lookup table and queries the cache during the session. When this option is not enabled, the server queries the lookup table on a row-by-row basis.

If your mapping contains multiple lookups on the same lookup table, it is suggested you share the cache in order to avoid building the cache multiple times.

2. Optimizing the lookup condition : When the lookup has multiple conditions, place the conditions that use the equality operator (=) first.

3. Lookup Override : You can reduce processing time by using the lookup SQL override properly in the Lookup transformation. If the lookup is on a database table, use the lookup SQL override to reduce the amount of data that you look up; this also helps in saving cache space.

Suppress the generated ORDER BY statement by writing your own ORDER BY and appending two dashes (--), which comment out the ORDER BY that the Integration Service appends, as in the sketch below.

Remove all ports not used downstream or in the SQL override.
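A sketch of a lookup SQL override with hypothetical table and column names; the trailing two dashes comment out the ORDER BY that the Integration Service appends, so only the hand-written ORDER BY on the condition column remains:

    SELECT CUST_ID, CUST_NAME
    FROM   CUSTOMER_DIM
    WHERE  ACTIVE_FLAG = 'Y'
    ORDER BY CUST_ID --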

4. Indexing the lookup table : For a cached lookup, index the lookup table on the ORDER BY columns; the session log contains the generated ORDER BY statement. For an uncached lookup, the server issues a SELECT statement for each row passing into the Lookup transformation, so index the lookup table on the columns used in the lookup condition.
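For example, if the lookup condition (and therefore the generated ORDER BY) is on CUST_ID, using the hypothetical names above:

    CREATE INDEX IDX_CUSTOMER_DIM_CUST_ID ON CUSTOMER_DIM (CUST_ID);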

5. Replace lookups on large tables with joins in the Source Qualifier when possible (always use the table with the smaller number of records as the lookup table).

Optimize Filter transformation:


You can improve efficiency by filtering early in the data flow.

Use the Source Qualifier to filter the data. You can also use the Source Qualifier SQL override to filter the records instead of using a Filter transformation, as in the sketch below.

Replace multiple Filter transformations with a single Router transformation.
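A sketch of filtering in the Source Qualifier SQL override, with hypothetical names, so unwanted rows never leave the database:

    SELECT EMP_ID, EMP_NAME, DEPT_ID
    FROM   EMPLOYEES
    WHERE  STATUS = 'ACTIVE'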

Source Qualifier Transformation:

Bring only the required columns from the source to the Source Qualifier. Most of the time, not all columns of the source table are required, so bring only the required fields and delete the unnecessary columns.

Avoid using an ORDER BY clause inside the Source Qualifier SQL override. The ORDER BY clause requires additional processing, and performance can be improved by avoiding it.

Optimize Aggregate transformation:
1. Group by simpler columns, preferably numeric columns.

2. Use sorted input. Sorted input decreases the use of aggregate caches; the server assumes all input data is sorted and performs aggregate calculations as it reads (see the sketch after this list).

3. Aggregate as early as possible in the data flow.

4. Filter data before aggregating.

5. Limit the number of ports used in the Aggregator transformation. This reduces the volume of data that the Aggregator transformation stores inside the cache.

6. Minimize aggregate function calls.
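For Sorted Input to work, the data must arrive sorted on the group-by ports. One way is an ORDER BY on exactly those columns in the Source Qualifier override (hypothetical names); this is the one case where an ORDER BY in the override pays for itself:

    SELECT DEPT_ID, SALARY
    FROM   EMPLOYEES
    ORDER BY DEPT_ID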

Optimize Seq. Generator transformation:

1. Try creating a reusable Sequence Generator transformation and use it in multiple mappings.
2. The Number of Cached Values property determines the number of values the Informatica server caches at one time; increase it for reusable Sequence Generators.

Optimize Expression transformation:

1. Minimize aggregate function calls.


2. Replace common sub-expressions with local variables (variable ports).
3. Use operators instead of functions.

Optimizing joiner transformation :

1. It is recommended to assign the table with the lesser number of records as the master while using a Joiner transformation.

2. It is also recommended to perform the join in the Source Qualifier using a SQL override, as performing joins on the database is often faster than performing them in Informatica (see the sketch after this list).

3. Additionally, pass sorted data to the Joiner transformation to enhance performance, as sorted input uses less disk space compared to unsorted data.

4. Perform normal joins when possible.
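A sketch of moving the join into the Source Qualifier SQL override when both tables live in the same database, with hypothetical names:

    SELECT o.ORDER_ID, o.AMOUNT, c.CUST_NAME
    FROM   ORDERS o
    INNER JOIN CUSTOMERS c
            ON c.CUST_ID = o.CUST_ID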

Common sources of problems

- too many transformations
- unused links between ports
- too many input/output or output ports connected out of Aggregator, Rank, or Lookup transformations
- unnecessary datatype conversions

Common solutions

- eliminate transformation errors
- if several mappings read from the same source, try single-pass reading
- optimize datatypes; use integers for comparisons
- don't convert back and forth between datatypes
- optimize lookups and lookup tables, using caches and indexed tables
- put your filters early in the data flow, and use a simple filter condition
- for aggregators, use sorted input and integer group-by columns, and simplify expressions
- if you use reusable sequence generators, increase the number of cached values
- if you use the same logic in different data streams, apply it before the streams branch off
- optimize expressions:
  - isolate slow and complex expressions
  - reduce or simplify aggregate functions
  - use local variables to encapsulate repeated computations
  - integer computations are faster than character computations
  - use operators rather than the equivalent function: '||' is faster than CONCAT()
