What are the best mapping development practices and what are the different mapping design tips for Informatica?

1. Lookup - Performance considerations

1.1. Unwanted columns

By default, when you create a lookup on a table, PowerCenter gives you all the columns in the table. If not all
the columns are required for the lookup condition or return, delete the unwanted columns from the
transformation. Leaving the unwanted columns in place increases the cache size.
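
As a rough illustration of why unused ports matter, the sketch below estimates cache size as the row count times the total width of the cached columns. The column names and byte widths are invented for the example, and PowerCenter's real cache sizing includes overhead that this simple model ignores.

```python
def estimate_cache_bytes(row_count, column_widths):
    """Rough lower bound: every cached row stores every listed column."""
    return row_count * sum(column_widths.values())

# Hypothetical lookup table; only CUSTOMER_ID and CUSTOMER_NAME are needed.
all_columns = {"CUSTOMER_ID": 8, "CUSTOMER_NAME": 50, "ADDRESS": 200, "NOTES": 500}
needed_columns = {name: all_columns[name] for name in ("CUSTOMER_ID", "CUSTOMER_NAME")}

full_size = estimate_cache_bytes(1_000_000, all_columns)
trimmed_size = estimate_cache_bytes(1_000_000, needed_columns)
print(full_size, trimmed_size)
```

Under these assumed widths, dropping the two unused columns shrinks the estimate by more than a factor of ten.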

1.2. Size of the source versus size of lookup

Let us say you have 10 rows in the source and one of the columns has to be checked against a big table (1
million rows). PowerCenter builds the cache for the lookup table and then checks the 10 source rows
against the cache. It takes more time to build a cache of 1 million rows than to go to the database 10 times
and look up against the table directly. Use an uncached lookup instead of building the static cache when the
number of source rows is far smaller than the number of lookup rows.
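
The trade-off above can be sketched as a simple cost comparison. The function name and the two cost constants are hypothetical placeholders, not PowerCenter settings; real costs depend on the database, the network and the row width.

```python
def choose_lookup_mode(source_rows, lookup_rows,
                       cost_per_cached_row=1.0, cost_per_query=1000.0):
    """Compare building a full cache with issuing one query per source row.

    Both cost constants are illustrative assumptions in arbitrary units.
    """
    cache_cost = lookup_rows * cost_per_cached_row
    uncached_cost = source_rows * cost_per_query
    return "uncached" if uncached_cost < cache_cost else "cached"

small_source = choose_lookup_mode(source_rows=10, lookup_rows=1_000_000)
large_source = choose_lookup_mode(source_rows=5_000_000, lookup_rows=1_000_000)
print(small_source, large_source)
```
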
1.3. JOIN instead of Lookup

In the same context as above, if the Lookup transformation comes right after the Source Qualifier and there is no
active transformation in between, you can instead use the SQL override of the Source Qualifier and join to
the lookup table with an ordinary database join, provided both tables are in the same database and schema.
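
A minimal sketch of such a join, run against an in-memory SQLite database so it is self-contained. The table and column names are made up for the example. A LEFT OUTER JOIN is used so that source rows with no match are kept with a NULL, which is how a lookup with no match behaves. Note that a join returns one row per match, so this assumes the lookup key is unique.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER, customer_id INTEGER);
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, customer_name TEXT);
    INSERT INTO orders VALUES (1, 10), (2, 20), (3, 99);
    INSERT INTO customers VALUES (10, 'Acme'), (20, 'Globex');
""")

# Hypothetical Source Qualifier override: the join replaces the Lookup.
override_sql = """
    SELECT o.order_id, o.customer_id, c.customer_name
    FROM orders o
    LEFT OUTER JOIN customers c ON c.customer_id = o.customer_id
    ORDER BY o.order_id
"""
rows = conn.execute(override_sql).fetchall()
for row in rows:
    print(row)
```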

1.4. Conditional call of lookup

Instead of using connected lookups with filters for a conditional lookup call, use an unconnected lookup. Is
the single-column return a problem? Change the SQL override to concatenate the required columns into one
big column, and split it back into individual columns on the calling side.
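
The concatenate-and-split idea can be sketched as follows. The delimiter and the helper names are assumptions for illustration; the delimiter must be a character that never occurs in the data.

```python
DELIMITER = "|"  # assumed never to occur in the column values

def pack_columns(values):
    """Mimic the SQL override: join several columns into one return value."""
    return DELIMITER.join(values)

def unpack_columns(packed):
    """Mimic the calling expression: split the value back into columns."""
    return packed.split(DELIMITER)

packed = pack_columns(["Acme", "London", "UK"])
name, city, country = unpack_columns(packed)
print(packed, name, city, country)
```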

1.5. SQL query

Find the execution plan of the Lookup SQL and see if you can add some indexes or hints to the query to make it
fetch data faster. You may need the help of a database developer to accomplish this if you are not a SQL
expert yourself.
1.6. Increase cache

If none of the above options improves performance, the problem may lie with the cache. The cache that you
assigned for the lookup may not be sufficient to hold the data or index of the lookup. Whatever data doesn't fit
into the cache is spilt into the cache files designated in $PMCacheDir. When PowerCenter doesn't find the data you
are looking for in the cache, it swaps data from the file into the cache and keeps doing this until the data is found. This
is quite expensive because this type of operation is very I/O intensive. To stop this from occurring, increase the
size of the cache so that the entire data set resides in memory. When increasing the cache, you also have to be aware of the
system constraints. If your cache size is greater than the resources available, the session will fail due to the lack of
resources.

1.7. Cache file file-system

In many cases, if the cache directory is on a different file system from that of the hosting server, writing the cache
files may take time and result in latency. Ask your system administrator to help you look into this aspect as well.

1.8. Useful cache utilities

If the same lookup SQL is being used by another lookup, then a shared cache or a reusable lookup should be used. Also,
if you have a table where the data does not change often, you can use the persistent cache option to build the cache once
and reuse it in consecutive flows.
2. Workflow performance – basic considerations

Though performance tuning has been the most feared part of development, it is the easiest once the intricacies are
known. With each new version of PowerCenter, there is added flexibility for the developer to build
better-performing workflows. The major obstacles to performance are the design of the mapping and, if
databases are involved, SQL tuning.

1. I would always suggest that you think twice before using an Update Strategy, though it adds a certain level of
flexibility in the mapping. If you have a straight-through mapping which takes data from the source and directly
inserts all the records into the target, you don't need an Update Strategy.

2. Use a pre-SQL delete statement if you wish to delete specific rows from the target before loading into it.
Use the truncate option in the session properties if you wish to clean the table before loading. I would avoid a
separate pipeline in the mapping that runs before the load with an Update Strategy transformation.

3. Say you have 3 sources and 3 targets with one-to-one mapping. If the loads are independent according to the business
requirement, I would create 3 different mappings and 3 different session instances, all running in parallel in
my workflow after my “Start” task. I’ve observed that the workflow runtime comes down between 30-60% of
serial processing.

4. PowerCenter is built to work on high volumes of data, so let the server be completely busy. Introduce
parallelism as far as possible into the mapping/workflow.

5. If using a transformation like a Joiner or Aggregator transformation, sort the data on the join keys or group by
columns prior to these transformations to decrease the processing time.

6. Filtering should be done at the database level instead of within the mapping. The database engine is much more
efficient at filtering than PowerCenter.
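
Item 5 can be illustrated with a small sketch of aggregation over sorted input. This shows the general idea only and is not PowerCenter's internal implementation. When rows arrive sorted on the group-by key, each group can be emitted as soon as the key changes, so only one group needs to be held in memory at a time. The sample rows are invented.

```python
from itertools import groupby
from operator import itemgetter

def aggregate_sorted(rows):
    """Sum amounts per key, holding only one group in memory at a time.

    The result is correct only if rows arrive sorted by key.
    """
    for key, group in groupby(rows, key=itemgetter(0)):
        yield key, sum(amount for _, amount in group)

rows = [("A", 10), ("B", 5), ("A", 7), ("C", 1), ("B", 2)]
sorted_rows = sorted(rows, key=itemgetter(0))  # stands in for a sort upstream
totals = list(aggregate_sorted(sorted_rows))
print(totals)
```
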
2.1. SQL tuning

SQL queries/actions occur in PowerCenter in one of the following ways:

Relational Source Qualifier

Lookup SQL Override

Stored Procedures

Relational Target

Using the execution plan to tune a query is the best way to understand how the database will process the data. Some
things to keep in mind when reading the execution plan: "Full table scans are not evil", "Indexes are not always fast", and
"Indexes can be slow too".

Analyse the table data to see whether picking up 20 records out of 20 million is best done using an index or a table scan.
Likewise, consider whether fetching 10 records out of 15 is faster using an index or simpler using a full table scan.
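
The two cases above differ in selectivity, the fraction of the table being fetched. The sketch below makes that explicit. The 5% threshold and the function name are arbitrary assumptions for illustration; a real optimizer decides from table statistics, and the execution plan remains the authority.

```python
def suggest_access_path(rows_fetched, rows_in_table, index_threshold=0.05):
    """Suggest an access path from the fraction of the table being fetched.

    index_threshold is an illustrative cut-off, not a database setting.
    """
    selectivity = rows_fetched / rows_in_table
    return "index" if selectivity <= index_threshold else "full table scan"

few_of_many = suggest_access_path(20, 20_000_000)  # a tiny fraction of the table
most_of_few = suggest_access_path(10, 15)          # two thirds of the table
print(few_of_many, most_of_few)
```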

Many times, the relational target indexes create performance problems when loading records into the relational target. If the
indexes are needed for other purposes, it is suggested to drop the indexes at the time of loading and then rebuild them in post-SQL.
When dropping indexes on a target you should consider integrity constraints and the time it takes to rebuild the index on post load
vs. actual load time.
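
The drop-load-rebuild sequence can be sketched with an in-memory SQLite database. The table and index names are hypothetical, and dropping the index in a pre-SQL step is an assumption of this sketch. The insert loop stands in for the session writing to the target.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_fact (sale_id INTEGER, amount REAL)")
conn.execute("CREATE INDEX idx_sales_amount ON sales_fact (amount)")

# Pre-SQL (assumed): drop the index so the load does not maintain it per row.
conn.execute("DROP INDEX idx_sales_amount")

# Load: stands in for the session writing to the relational target.
conn.executemany(
    "INSERT INTO sales_fact VALUES (?, ?)",
    [(i, i * 1.5) for i in range(1000)],
)

# Post-SQL: rebuild the index once, after all rows are in place.
conn.execute("CREATE INDEX idx_sales_amount ON sales_fact (amount)")
conn.commit()

row_count = conn.execute("SELECT COUNT(*) FROM sales_fact").fetchone()[0]
index_names = [r[1] for r in conn.execute("PRAGMA index_list('sales_fact')")]
print(row_count, index_names)
```
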
3. Pre/Post-Session command – Uses

• It is a very good practice to email the success or failure status of a task once it is done. In the same way, when
a business requirement calls for it, make use of the Post Session Success and Failure email for proper
communication.
• The built-in feature offers more flexibility with Session Logs as attachments and also provides other run-time
data like Workflow run instance ID, etc.

• Any archiving activities around the source and target flat files can be easily managed within the session using
the session properties for flat file command support, which is new in PowerCenter v8.6. For example, after writing
the flat file target, you can set up a command to zip the file to save space.

• If the data in the target flat files needs editing that your mapping couldn’t accommodate, write a
shell/batch command or script and call it in the post-session command task. I prefer to weigh the trade-offs between
PowerCenter capabilities and the OS capabilities in these scenarios.
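
As an illustration of the kind of script a post-session command could call, the sketch below compresses a flat file with gzip and removes the uncompressed copy. The function name and file name are hypothetical. In practice a single shell command such as gzip would do the same job.

```python
import gzip
import os
import shutil
import tempfile

def compress_flat_file(path):
    """Gzip a flat file next to the original and remove the uncompressed copy."""
    compressed_path = path + ".gz"
    with open(path, "rb") as src, gzip.open(compressed_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    os.remove(path)
    return compressed_path

# Demonstration with a throwaway file standing in for a target flat file.
work_dir = tempfile.mkdtemp()
target_file = os.path.join(work_dir, "customers_out.csv")
with open(target_file, "w") as handle:
    handle.write("id,name\n1,Acme\n")

archive = compress_flat_file(target_file)
print(archive)
```
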
4. Sequence generator – design considerations

In most cases, I would advise you to avoid the Sequence Generator transformation when populating
an ID column in the relational target table. I suggest that you instead create a sequence on the target database and
enable a trigger on that table to fetch the value from the database sequence.

There are many advantages to using a database sequence generator:

• Fewer PowerCenter objects are present in a mapping, which reduces development time and also maintenance effort.

• ID generation is independent of PowerCenter if a different application is used in future to populate the target.

• Migration between environments is simplified because there is no additional overhead of considering the
persistent values of the sequence generator from the repository database.

In all of the above cases, a sequence created in the target database makes life a lot easier for table data
maintenance and also for PowerCenter development. In fact, databases have specific mechanisms
dedicated to handling sequences, so you are in effect implementing manual pushdown optimization in your
PowerCenter mapping design.
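
A minimal sketch of database-side ID generation, assuming an in-memory SQLite database. SQLite has no sequence objects, so an AUTOINCREMENT key stands in here for the sequence-plus-trigger arrangement described above. The table and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The database assigns customer_key; the loader never supplies it.
conn.execute("""
    CREATE TABLE customer_dim (
        customer_key INTEGER PRIMARY KEY AUTOINCREMENT,
        customer_name TEXT NOT NULL
    )
""")

# Any loading tool can insert rows without knowing the next ID value.
conn.executemany(
    "INSERT INTO customer_dim (customer_name) VALUES (?)",
    [("Acme",), ("Globex",), ("Initech",)],
)
keys = [row[0] for row in conn.execute(
    "SELECT customer_key FROM customer_dim ORDER BY customer_key")]
print(keys)
```
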
5. FTP Connection object – platform independence

If you have any files to be read as a source from a Windows server when your
PowerCenter server is hosted on UNIX/Linux, then make use of FTP users on the
Windows server and use a File Reader with an FTP Connection object. This connection
object can be added like any other connection. This gives the flexibility of
platform independence and also removes the overhead of maintaining Samba
mounts on the Informatica boxes.
Best Practices in Informatica PowerCenter:

Below are some of the best practices for development in Informatica PowerCenter.

However, if we look more broadly at these recommendations, it is easy to conclude that these practices can be
used in other tools, even in our everyday developments.

1. Use of Shared Folder for common/shared objects through different integrations

In each repository, there is a DW_COMMON_OBJECTS folder. In this folder, it is possible to place all
the Sources, Targets and Transformations which are common or shared between different integrations.
To use these objects in other repository folders, we just need to create a shortcut based on the original.
Fig. 1 – Folder with common objects
Fig. 2 – Integration folder shortcuts
Fig. 3 – Use of shortcuts in mapping

Therefore, we can ensure that if we need to change one of these objects, we can do it in
DW_COMMON_OBJECTS, and the change is automatically reflected in all mappings where the shortcut for the object
exists.

2. Stop on Errors in sessions

In the session configuration, the parameter “Stop on Errors” must always have the value 1.

Fig. 4 – Parameter location in session properties

3. Source Qualifiers and Lookups without SQL Override.

Native PWC transformations should always be used instead of a Source Qualifier SQL override. The only exceptions
are occasional cases where hints need to be used. The reasons are:

I. Using native PWC transformations makes it possible to use PWC mapping partitioning.

II. Using native PWC transformations makes it possible to use PWC pushdown optimization.

The same logic must be applied when we use Lookups.

4. Data types and port precision

Linked ports must always have the same data type and precision. If a data type or precision
conversion is required, a PWC Expression transformation should be used.
Fig. 5 – Expression Transformation Object
Fig. 6 – Expression Transformation Location on the PWC Designer Toolbar
5. Complex Mappings

Mappings with more than 40 transformations must be broken up, either within the same mapping or into more than
one mapping.

6. Comments and Descriptions

In mappings, sessions, mapplets and workflows, comments or descriptions must be filled in, covering the
purpose of the object as well as any warnings or particulars. The same logic must apply to objects within a mapping.

7. Rename the transformations

All transformations in a mapping must be renamed rather than left with the default name assigned by the PWC.
8. Commit Interval of Sessions

The Commit Interval should not simply be left at the default of 10000. The developer should take some time to work out whether
this value should be increased or not. To do this, some mapping runs have to be done to see how it behaves (whether execution time
increases or decreases).
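
As a simplified model of what the commit interval changes, the sketch below counts commits for a given row volume, assuming exactly one commit per interval. The row counts are invented, and the model says nothing about run time, which still has to be measured with test runs.

```python
import math

def commit_count(total_rows, commit_interval):
    """Number of commits if one commit is issued every commit_interval rows."""
    return math.ceil(total_rows / commit_interval)

default_commits = commit_count(5_000_000, 10_000)
larger_interval_commits = commit_count(5_000_000, 100_000)
print(default_commits, larger_interval_commits)
```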

9. Session Memory Parameters

Session memory parameters should never have the maximum memory value set to 0. The following image shows the
default. An adjustment to the memory value should be made only when the mapping execution performance is not
satisfactory.
10. Lookup Ports and Expression Transformation

Only ports that are needed to proceed with the mapping should be marked as output. Ports marked as output when they
are not needed occupy cache, thus slowing the execution of the mapping.

11. Pushdown Optimization

This option should be used whenever possible. Its use implies best-practice mapping design, namely using PowerCenter objects instead
of a Source Qualifier SQL override (which should be used only in specific and very well justified cases), as well as using procedures.

When a considerable number of mappings already make use of this option, DW support should ask the DBAs to
monitor its use, so that DB priorities and the order of concurrent mapping executions can be redefined.
To enable this option, it is mandatory to perform the following steps in the session:

Fig. 7 – Required options to enable pushdown optimization.

Sequences are optional, depending on whether the mappings make use of them.

Fig. 8 – Check the execution plan
Fig. 9 – A well-implemented mapping is when the pushdown execution plan is full.
