Author:
D Radhakrishna Sarma
Singapore.
January 2, 2009
TABLE OF CONTENTS
ABOUT THE AUTHOR
ABSTRACT
CONTENT OVERVIEW
1. LOOKUP - PERFORMANCE CONSIDERATIONS
1.1. UNWANTED COLUMNS
1.2. SIZE OF THE SOURCE VERSUS SIZE OF LOOKUP
1.3. JOIN INSTEAD OF LOOKUP
1.4. CONDITIONAL CALL OF LOOKUP
1.5. SQL QUERY
1.6. INCREASE CACHE
1.7. CACHE FILE FILE-SYSTEM
1.8. USEFUL CACHE UTILITIES
2. WORKFLOW PERFORMANCE – BASIC CONSIDERATIONS
2.1. SQL TUNING
3. PRE/POST-SESSION COMMAND - USES
4. SEQUENCE GENERATOR – DESIGN CONSIDERATIONS
5. FTP CONNECTION OBJECT – PLATFORM INDEPENDENCE
About the author
Radhakrishna Sarma is an ETL specialist at Credit Suisse in Singapore.
He has been developing ETL interfaces and working on databases since
PowerCenter version 6.x. He is an active participant in technical forums
such as Informatica's TechNet and Oracle's OTN.
Abstract
This article explains a few important PowerCenter development best practices,
covering areas such as lookups and workflow performance.
Content overview
1. Lookup - performance considerations
Let us look at the different scenarios in which a Lookup transformation can
cause problems, and at how to tackle each of them.
1.1. Unwanted columns
By default, when you create a lookup on a table, PowerCenter gives you all the
columns in the table. If some columns are needed neither for the lookup
condition nor as return values, delete them from the transformation.
Unwanted columns are still read into the cache, so leaving them in increases
the cache size.
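To make the effect concrete, here is a rough back-of-the-envelope sketch in
Python. It assumes the cache grows roughly as row count times the total width
of the columns kept; the table, column names and byte widths are hypothetical,
and PowerCenter's real cache sizing includes index-cache and overhead bytes
that this ignores.

```python
# A rough, hypothetical sizing model: cache bytes ~ rows x sum of kept column widths.
# PowerCenter's real cache sizing adds index-cache and overhead bytes not modelled here.


def approx_cache_bytes(row_count: int, column_widths: dict, keep: list) -> int:
    """Estimate the data-cache size for the columns kept in the lookup."""
    return row_count * sum(column_widths[col] for col in keep)


# Hypothetical CUSTOMER lookup table: column name -> width in bytes.
widths = {"CUST_ID": 8, "CUST_NAME": 100, "ADDRESS": 200, "PHONE": 20, "STATUS": 1}
rows = 1_000_000

all_columns = approx_cache_bytes(rows, widths, list(widths))
needed_only = approx_cache_bytes(rows, widths, ["CUST_ID", "STATUS"])
print(all_columns, needed_only)  # 329000000 9000000
```

Keeping only the condition column and the single return column shrinks this
estimate from about 329 MB to about 9 MB.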
1.2. Size of the source versus size of lookup
Suppose you have 10 rows in the source, and one of the columns has to be
checked against a big table (1 million rows). PowerCenter first builds the
cache for the whole lookup table and only then checks the 10 source rows
against that cache. Building a cache of 1 million rows takes more time than
going to the database 10 times and looking up the table directly.
In such a case, use an uncached lookup instead of building a static cache,
because the number of source rows is much smaller than that of the lookup.
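The trade-off can be seen in a small Python sketch that uses an in-memory
SQLite database in place of the real lookup source. `cached_lookup` mimics a
static cache (load the whole table, then probe it in memory) and
`uncached_lookup` mimics an uncached lookup (one indexed query per source
row). Table and column names are hypothetical, and the timings vary by
machine, but with 10 source keys against a large table the uncached version
normally finishes first.

```python
import sqlite3
import time


def make_lookup_db(n_rows: int) -> sqlite3.Connection:
    """Build a 'big' in-memory lookup table keyed by an indexed primary key."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE lkp_customer (cust_id INTEGER PRIMARY KEY, status TEXT)")
    conn.executemany(
        "INSERT INTO lkp_customer VALUES (?, ?)",
        ((i, "ACTIVE" if i % 2 else "CLOSED") for i in range(n_rows)),
    )
    return conn


def cached_lookup(conn, source_keys):
    # Static-cache style: read the WHOLE lookup table first, then probe it in memory.
    cache = dict(conn.execute("SELECT cust_id, status FROM lkp_customer"))
    return [cache.get(key) for key in source_keys]


def uncached_lookup(conn, source_keys):
    # Uncached style: one indexed query per source row; nothing is pre-loaded.
    results = []
    for key in source_keys:
        row = conn.execute(
            "SELECT status FROM lkp_customer WHERE cust_id = ?", (key,)
        ).fetchone()
        results.append(row[0] if row else None)
    return results


conn = make_lookup_db(200_000)
source_keys = [1, 2, 3, 4, 5, 6, 7, 8, 9, 250_000]  # 10 source rows; the last has no match

start = time.perf_counter()
cached = cached_lookup(conn, source_keys)
cached_secs = time.perf_counter() - start

start = time.perf_counter()
uncached = uncached_lookup(conn, source_keys)
uncached_secs = time.perf_counter() - start

print(f"cached: {cached_secs:.4f}s  uncached: {uncached_secs:.4f}s")
```

Both functions return the same values; only the amount of work done up front
differs.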
1.3. Join instead of lookup
In the same scenario, if the Lookup transformation comes right after the
Source Qualifier with no active transformation in between, and both tables are
in the same database and schema, you can instead write a SQL override in the
Source Qualifier and join to the lookup table with a traditional database join.
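A minimal sketch of such an override, run against an in-memory SQLite
database with hypothetical `src_orders` and `lkp_customer` tables. A LEFT
OUTER JOIN keeps every source row and returns NULL where nothing matches,
which is what a lookup returns when it finds no row. One assumption to check
before switching: the lookup key must be unique, because a join returns one
row per match, whereas a lookup returns a single row according to its lookup
policy on multiple match.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE src_orders (order_id INTEGER, cust_id INTEGER);
    CREATE TABLE lkp_customer (cust_id INTEGER PRIMARY KEY, status TEXT);
    INSERT INTO src_orders VALUES (1, 10), (2, 20), (3, 99);
    INSERT INTO lkp_customer VALUES (10, 'ACTIVE'), (20, 'CLOSED');
""")

# Hypothetical Source Qualifier SQL override: the database does the "lookup"
# as part of the read, so no separate lookup cache is built.
SQ_OVERRIDE = """
    SELECT o.order_id, o.cust_id, c.status
    FROM src_orders o
    LEFT OUTER JOIN lkp_customer c ON c.cust_id = o.cust_id
    ORDER BY o.order_id
"""

rows = conn.execute(SQ_OVERRIDE).fetchall()
print(rows)  # [(1, 10, 'ACTIVE'), (2, 20, 'CLOSED'), (3, 99, None)]
```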
1.4. Conditional call of lookup
When a lookup needs to be called only for some rows, use an unconnected lookup
called conditionally (for example, from an IIF expression) instead of a
connected lookup combined with filters. Worried that an unconnected lookup
returns only one column? Change the lookup SQL override to concatenate the
required columns into one large column, and split it back into individual
columns at the calling side.
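The pack-and-split idea can be sketched in Python with SQLite. The function
`lkp_customer` is a hypothetical stand-in for an unconnected lookup (one input,
one return value), the `|` delimiter is an assumption that only holds if it
never appears in the data, and `COALESCE` is used so that one NULL column does
not turn the whole concatenated string into NULL. Note that the round trip
turns a NULL column into an empty string, so the caller has to handle that if
the distinction matters.

```python
import sqlite3

DELIM = "|"  # assumption: this delimiter never occurs inside the packed columns

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE lkp_customer (cust_id INTEGER PRIMARY KEY, name TEXT, city TEXT, status TEXT);
    INSERT INTO lkp_customer VALUES (10, 'Tan', 'Singapore', 'ACTIVE'),
                                    (20, 'Lee', NULL, 'CLOSED');
""")

# Lookup SQL override: pack three columns into ONE return column.
# COALESCE stops a single NULL column from turning the whole string into NULL.
LKP_OVERRIDE = (
    "SELECT COALESCE(name, '') || ? || COALESCE(city, '') || ? || COALESCE(status, '') "
    "FROM lkp_customer WHERE cust_id = ?"
)


def lkp_customer(cust_id):
    """Stand-in for an unconnected lookup: one input value, one packed return value."""
    row = conn.execute(LKP_OVERRIDE, (DELIM, DELIM, cust_id)).fetchone()
    return row[0] if row else None


def process(source_rows):
    out = []
    for cust_id, needs_lookup in source_rows:
        # Conditional call: the lookup runs only for rows that need it.
        packed = lkp_customer(cust_id) if needs_lookup else None
        # Break the packed value back into individual columns at the calling side.
        if packed is None:
            name, city, status = None, None, None
        else:
            name, city, status = packed.split(DELIM)
        out.append((cust_id, name, city, status))
    return out


result = process([(10, True), (20, True), (30, False)])
print(result)
```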
1.5. SQL query
Examine the execution plan of the lookup SQL and check whether adding indexes
or hints to the query would make it fetch data faster. If you are not
comfortable with SQL tuning yourself, you may need the help of a database
developer.
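The "check the plan, add an index, check again" loop looks roughly like this.
The sketch uses SQLite's `EXPLAIN QUERY PLAN` only because SQLite ships with
Python; on Oracle you would use `EXPLAIN PLAN` and `DBMS_XPLAN` instead.
SQLite has no Oracle-style optimizer hints, so only the index part is shown,
and the table, column and index names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE lkp_customer (cust_id INTEGER PRIMARY KEY, email TEXT, status TEXT)")

# Hypothetical lookup SQL whose condition column (email) is not indexed yet.
LKP_SQL = "SELECT status FROM lkp_customer WHERE email = ?"


def plan(sql):
    # Each EXPLAIN QUERY PLAN row has four columns; the last one is the readable detail.
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, ("x",))]


before = plan(LKP_SQL)  # expected: a full scan of lkp_customer
conn.execute("CREATE INDEX idx_lkp_customer_email ON lkp_customer (email)")
after = plan(LKP_SQL)   # expected: a search using idx_lkp_customer_email

print(before)
print(after)
```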
1.7. Cache file file-system
If the cache directory is on a different file system from that of the hosting
server, writing the cache files can take longer and add latency. With the help
of your system administrator, look into this aspect as well.
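One quick check you could run on the server before talking to the system
administrator is sketched below in Python: it compares the device IDs that
`os.stat` reports for the cache directory and for a local reference directory,
and finds the mount point the cache directory lives on. The paths are
hypothetical placeholders; substitute your own cache directory. This only
tells you whether the two paths are on different mounts, not how fast either
one is.

```python
import os


def same_filesystem(path_a: str, path_b: str) -> bool:
    """Return True if both paths are on the same mounted file system (same device ID)."""
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev


def mount_point(path: str) -> str:
    """Walk up from `path` until reaching the directory where its file system is mounted."""
    path = os.path.realpath(path)
    while not os.path.ismount(path):
        path = os.path.dirname(path)
    return path


if __name__ == "__main__":
    # Hypothetical paths: replace with the real cache directory and a local directory.
    cache_dir = "/opt/informatica/server/infa_shared/Cache"
    local_dir = "/opt/informatica"
    if os.path.exists(cache_dir) and os.path.exists(local_dir):
        print("cache mount point:", mount_point(cache_dir))
        print("same file system as local dir:", same_filesystem(cache_dir, local_dir))
```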
1.8. Useful cache utilities
If another lookup uses the same lookup SQL, use a shared cache or a reusable
lookup. Also, if you have a table whose data does not change often, use the
persistent cache option to build the cache once and reuse it across
consecutive runs.
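Both ideas can be sketched in a few lines of Python, again with SQLite as a
hypothetical lookup source. `shared_cache` hands every caller that issues the
same SQL the same in-memory cache object, and `persistent_cache` saves the
cache to a file and reloads it on later runs unless told to rebuild. This is
an illustration of the concepts, not how PowerCenter stores its cache files;
the file name is an assumption, and `pickle` files should only be loaded from
locations you trust.

```python
import os
import pickle
import sqlite3
import tempfile

LKP_SQL = "SELECT cust_id, status FROM lkp_customer"

# Shared cache: lookups that issue the same SQL reuse one in-memory cache object.
_shared_caches = {}


def shared_cache(conn, sql):
    if sql not in _shared_caches:
        _shared_caches[sql] = dict(conn.execute(sql))
    return _shared_caches[sql]


def persistent_cache(conn, sql, cache_file, rebuild=False):
    """Reuse a cache saved by an earlier run; query the database only when there is none.

    Returns (cache, built_now). Pass rebuild=True when the lookup table has changed.
    """
    if not rebuild and os.path.exists(cache_file):
        with open(cache_file, "rb") as f:
            return pickle.load(f), False
    cache = dict(conn.execute(sql))
    with open(cache_file, "wb") as f:
        pickle.dump(cache, f)
    return cache, True


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE lkp_customer (cust_id INTEGER PRIMARY KEY, status TEXT)")
conn.executemany("INSERT INTO lkp_customer VALUES (?, ?)", [(1, "ACTIVE"), (2, "CLOSED")])

first = shared_cache(conn, LKP_SQL)
second = shared_cache(conn, LKP_SQL)  # same object; the table is not queried again

cache_file = os.path.join(tempfile.mkdtemp(), "lkp_customer.cache")  # hypothetical name
run1, built1 = persistent_cache(conn, LKP_SQL, cache_file)  # first run builds and saves
run2, built2 = persistent_cache(conn, LKP_SQL, cache_file)  # later run reloads from disk
print(built1, built2)  # True False
```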
The above examples are just some things to consider when tuning a mapping.
2. Workflow performance – basic considerations
2.1. SQL tuning
Using the execution plan to tune a query is the best way to understand how
the database will process the data. Some things to keep in mind when reading
the execution plan are: "Full table scans are not evil", "Indexes are not
always fast" and "Indexes can be slow too".
Analyse the table data to decide whether an index or a full table scan is the
better access path. Picking up 20 records out of 20 million is typically best
done through an index, whereas fetching 10 records out of 15 is typically
cheaper with a full table scan.
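The reasoning behind those two examples can be shown with a deliberately
simplified cost model in Python. The assumed B-tree height of 3 and 100 rows
per block are hypothetical round numbers, and real optimizers also weigh
statistics, multi-block reads, caching and clustering, so treat this as an
illustration of selectivity rather than a formula any database uses.

```python
import math


def index_cost(rows_wanted: int, index_height: int = 3) -> int:
    # Simplified: descend the B-tree for each row, then read one table block per row.
    return rows_wanted * (index_height + 1)


def full_scan_cost(total_rows: int, rows_per_block: int = 100) -> int:
    # Simplified: read every table block exactly once.
    return math.ceil(total_rows / rows_per_block)


def better_access_path(rows_wanted: int, total_rows: int) -> str:
    return "index" if index_cost(rows_wanted) < full_scan_cost(total_rows) else "full scan"


print(better_access_path(20, 20_000_000))  # index      (80 vs 200000 block reads)
print(better_access_path(10, 15))          # full scan  (40 vs 1 block read)
```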
Indexes on a relational target often cause performance problems when loading
records into that target. If the indexes are needed for other purposes, drop
them for the duration of the load and rebuild them afterwards in post-SQL.
When dropping indexes on a target, consider the integrity constraints they
enforce, and compare the time it takes to rebuild the index after the load
with the time saved during the load itself.
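A minimal sketch of the drop-load-rebuild pattern, using SQLite and a
hypothetical `tgt_account` table, with `PRE_SQL` and `POST_SQL` standing in
for the target's pre-SQL and post-SQL. It also shows the integrity point:
while a unique index is dropped, nothing rejects duplicate keys, so they only
surface when post-SQL fails to rebuild the index.

```python
import sqlite3

PRE_SQL = "DROP INDEX IF EXISTS idx_tgt_account_no"
POST_SQL = "CREATE UNIQUE INDEX idx_tgt_account_no ON tgt_account (account_no)"


def load_with_index_rebuild(conn, rows):
    conn.execute(PRE_SQL)   # pre-SQL: drop the index so each insert skips index maintenance
    conn.executemany("INSERT INTO tgt_account (account_no, balance) VALUES (?, ?)", rows)
    conn.execute(POST_SQL)  # post-SQL: rebuild the index once, after all rows are loaded
    conn.commit()


def new_target():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tgt_account (account_no TEXT, balance REAL)")
    conn.execute(POST_SQL)  # the index exists before the load starts
    return conn


conn = new_target()
load_with_index_rebuild(conn, [("A1", 10.0), ("A2", 20.0)])
indexes = conn.execute("SELECT name FROM sqlite_master WHERE type = 'index'").fetchall()
print(indexes)  # [('idx_tgt_account_no',)]

# Duplicates slip in while the unique index is gone and only fail at rebuild time.
rebuild_error = None
try:
    load_with_index_rebuild(new_target(), [("B1", 1.0), ("B1", 2.0)])
except sqlite3.IntegrityError as exc:
    rebuild_error = exc
    print("index rebuild failed:", exc)
```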
4. Sequence generator – design considerations
In most cases, I would advise you to avoid using the Sequence Generator
transformation to populate an ID column in a relational target table. Instead,
create a sequence in the target database and enable a trigger on that table
that fetches the value from the database sequence.
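The sequence-trigger combination can be sketched with SQLite. Two caveats:
SQLite has no `CREATE SEQUENCE`, so a one-row table stands in for the
sequence, and SQLite triggers cannot assign to `NEW`, so an `AFTER INSERT`
trigger updates the freshly inserted row instead. On Oracle the usual form is
a real sequence plus a `BEFORE INSERT ... FOR EACH ROW` trigger that assigns
`seq.NEXTVAL` to `:NEW.id`. All table, sequence and trigger names here are
hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- SQLite has no CREATE SEQUENCE, so a one-row table stands in for it.
    CREATE TABLE seq_customer (next_val INTEGER NOT NULL);
    INSERT INTO seq_customer VALUES (1000);

    CREATE TABLE tgt_customer (customer_id INTEGER, name TEXT);

    -- Take the ID from the 'sequence', then advance it, so the loading
    -- mapping never has to supply customer_id itself.
    CREATE TRIGGER trg_customer_id AFTER INSERT ON tgt_customer
    WHEN NEW.customer_id IS NULL
    BEGIN
        UPDATE tgt_customer SET customer_id = (SELECT next_val FROM seq_customer)
        WHERE rowid = NEW.rowid;
        UPDATE seq_customer SET next_val = next_val + 1;
    END;
""")

# The "mapping" inserts only business columns; the database assigns the IDs.
conn.executemany("INSERT INTO tgt_customer (name) VALUES (?)", [("Tan",), ("Lee",), ("Lim",)])
ids = conn.execute("SELECT customer_id, name FROM tgt_customer ORDER BY rowid").fetchall()
print(ids)  # [(1000, 'Tan'), (1001, 'Lee'), (1002, 'Lim')]
```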
In all of the above cases, a sequence created in the target database makes
life a lot easier, both for maintaining the table data and for PowerCenter
development. In fact, databases have mechanisms designed specifically for
handling sequences, so by relying on them you are in effect implementing a
manual push-down optimization in your PowerCenter mapping design.
DBAs will always complain about triggers in the database, but I would still
insist on using the sequence-trigger combination, even for huge volumes of data.
5. FTP connection object – platform independence
The FTP connection object can be added like any other connection. It gives
you the flexibility of platform independence, and it further reduces the
overhead of maintaining Samba mounts on the Informatica servers.