TABLE OF CONTENTS

ABOUT THE AUTHOR
ABSTRACT
CONTENT OVERVIEW
1. LOOKUP - PERFORMANCE CONSIDERATIONS
   1.1. Unwanted columns
   1.2. Size of the source versus size of lookup
   1.3. JOIN instead of Lookup
   1.4. Conditional call of lookup
   1.5. SQL query
   1.6. Increase cache
   1.7. Cachefile file-system
   1.8. Useful cache utilities
2. WORKFLOW PERFORMANCE BASIC CONSIDERATIONS
   2.1. SQL tuning
3. PRE/POST-SESSION COMMAND - USES
4. SEQUENCE GENERATOR DESIGN CONSIDERATIONS
5. FTP CONNECTION OBJECT PLATFORM INDEPENDENCE
Abstract
This article explains several important PowerCenter development best practices, covering lookup performance, workflow performance, and related design considerations.
Content overview
Lookup - Performance considerations
Workflow performance basic considerations
Pre/Post-Session commands - Uses
Sequence generator design considerations
FTP Connection object platform independence
1. Lookup - Performance considerations

1.1. Unwanted columns

By default, when you create a lookup on a table, PowerCenter gives you all the columns of that table. If a column is not required for the lookup condition or the return value, delete it from the transformation: every unwanted column you keep increases the cache size.

1.2. Size of the source versus size of lookup
Let us say you have 10 rows in the source, and one of the columns has to be checked against a big table (1 million rows). PowerCenter first builds the cache for the lookup table and then checks the 10 source rows against that cache. It takes more time to build a cache of 1 million rows than to go to the database 10 times and look up against the table directly.
Use an uncached lookup instead of building a static cache, since the number of source rows is much smaller than the number of lookup rows.

1.3. JOIN instead of Lookup
In the same context as above, if the Lookup transformation comes right after the Source Qualifier with no active transformation in between, you can instead use the SQL override of the Source Qualifier and join to the lookup table with an ordinary database join, provided both tables are in the same database and schema.
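As a sketch of such an override (the table and column names here are hypothetical, not from any real mapping), a LEFT OUTER JOIN preserves lookup semantics, since source rows with no match still flow through with NULLs in the lookup columns:

```sql
-- Hypothetical Source Qualifier SQL override replacing a Lookup on CUSTOMERS.
-- LEFT OUTER JOIN mimics lookup behaviour: ORDERS rows without a matching
-- customer are kept, with NULLs in the lookup columns.
SELECT o.ORDER_ID,
       o.ORDER_DATE,
       o.CUSTOMER_ID,
       c.CUSTOMER_NAME
FROM   ORDERS o
       LEFT OUTER JOIN CUSTOMERS c
              ON c.CUSTOMER_ID = o.CUSTOMER_ID
```

Remember that the SELECT list of the override has to line up with the connected Source Qualifier ports.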
1.4. Conditional call of lookup

Instead of using a connected Lookup with filters for a conditional lookup call, use an unconnected Lookup. If the single-column return value is a limitation, change the SQL override to concatenate the required columns into one big column, and break it back into individual columns on the calling side.
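For illustration (hypothetical names, and assuming the '~' delimiter never occurs in the data), the unconnected lookup's SQL override can return several columns as one delimited string:

```sql
-- Oracle-style concatenation; use your own database's operator or function.
-- '~' is assumed not to appear anywhere in the concatenated columns.
SELECT c.CUSTOMER_NAME || '~' || c.CUSTOMER_CITY || '~' || c.CUSTOMER_TIER
           AS LKP_RETURN,
       c.CUSTOMER_ID
FROM   CUSTOMERS c
```

On the calling side, an Expression transformation can then split LKP_RETURN back into individual ports, for example with SUBSTR and INSTR.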
1.5. SQL query

Find the execution plan of the lookup SQL and see whether adding indexes or hints would make the query fetch data faster. You may need the help of a database developer to accomplish this if you are not comfortable with SQL yourself.
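On Oracle, for instance, the plan can be inspected as below; the query shown is a hypothetical stand-in for your actual lookup SQL, and other databases have their own equivalents:

```sql
-- Capture the execution plan of the lookup query, then display it (Oracle).
EXPLAIN PLAN FOR
  SELECT CUSTOMER_NAME
  FROM   CUSTOMERS
  WHERE  CUSTOMER_ID = :1;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```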
1.6. Increase cache

If none of the above options improves performance, the problem may lie with the cache: the cache you assigned for the lookup is not big enough to hold the lookup's data or index. Whatever does not fit into the cache is spilled into the cache files designated in $PMCacheDir. When PowerCenter does not find the data it needs in memory, it swaps data between the files and the cache, and keeps doing so until the data is found. This is quite expensive, because the operation is very I/O-intensive. To avoid it, increase the cache size so that the entire data set resides in memory. When increasing the cache, stay aware of the system constraints: if the cache size you request exceeds the resources available, the session will fail for lack of resources.
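A back-of-the-envelope check helps here: the data cache must hold roughly rows times average cached row width. The 200-byte row width below is purely an assumed figure; measure your actual row size from the session log or the database before sizing the cache:

```sql
-- Rough data-cache estimate for the lookup table, assuming ~200 bytes/row:
-- e.g. 1,000,000 rows x 200 bytes = ~200 MB of data cache, plus index cache.
SELECT COUNT(*) * 200 AS approx_data_cache_bytes
FROM   CUSTOMERS
```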
1.7. Cachefile file-system

In many cases, if the cache directory is on a different file-system from the one hosting the server, writing the cache files can take longer and add latency. With the help of your system administrator, look into this aspect as well.
1.8. Useful cache utilities
If the same lookup SQL is used by another lookup, use a shared cache or a reusable Lookup transformation. Also, if you have a table whose data changes rarely, you can use the persistent cache option to build the cache once and reuse it across subsequent flows.
2. Workflow performance basic considerations

2.1. SQL tuning
SQL can be tuned wherever it appears in a mapping:
- Relational Source Qualifier
- Lookup SQL override
- Stored Procedures
- Relational Target
Using the execution plan to tune a query is the best way to understand how the database will process the data. Keep a few things in mind when reading the plan: full table scans are not always evil, and indexes are not always fast; indexes can be slow too. Analyse the table data to see whether picking 20 records out of 20 million is faster with an index or with a table scan, and whether fetching 10 records out of 15 is faster with an index or with a full table scan.

Relational target indexes often create performance problems when loading records into the target. If the indexes are needed for other purposes, consider dropping them at load time and rebuilding them in the post-SQL. When dropping indexes on a target, weigh the integrity constraints and the time it takes to rebuild the indexes after the load against the actual load time saved.
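A minimal sketch of this pattern using the session's pre- and post-SQL (the index and table names are hypothetical):

```sql
-- Pre-SQL: drop the non-essential target index before the bulk load.
DROP INDEX IDX_TGT_CUST_NAME;

-- Post-SQL: rebuild the index once the load has completed.
CREATE INDEX IDX_TGT_CUST_NAME
    ON TGT_CUSTOMERS (CUSTOMER_NAME);
```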
4. Sequence generator design considerations

Generating surrogate keys with a database sequence instead of a PowerCenter Sequence Generator has several advantages:
- Fewer PowerCenter objects are present in the mapping, which reduces development time and maintenance effort.
- ID generation is independent of PowerCenter if a different application is used in the future to populate the target.
- Migration between environments is simplified, because there is no additional overhead of carrying the persistent sequence values over from the repository database.
In all of the above cases, a sequence created in the target database makes life a lot easier for table data maintenance and for PowerCenter development. Databases have mechanisms focused specifically on sequences, so you are effectively implementing a manual push-down optimization in your mapping design. DBAs will always complain about triggers, but I would still recommend the sequence-trigger combination, even for huge volumes of data.
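An Oracle-style sketch of the sequence-trigger combination (the object names are hypothetical, and other databases use different syntax, e.g. identity columns):

```sql
-- Database-side surrogate key generation.
CREATE SEQUENCE SEQ_CUSTOMER_ID
    START WITH 1 INCREMENT BY 1 CACHE 1000;

-- Fire before each insert and fill the key only when one is not supplied.
CREATE OR REPLACE TRIGGER TRG_CUSTOMER_ID
BEFORE INSERT ON TGT_CUSTOMERS
FOR EACH ROW
WHEN (NEW.CUSTOMER_ID IS NULL)
BEGIN
    :NEW.CUSTOMER_ID := SEQ_CUSTOMER_ID.NEXTVAL;
END;
/
```

With this in place, the mapping simply leaves the key column unconnected and lets the database assign it.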
Disclaimer: The information provided in this document is offered for general informational and educational purposes only. The views and opinions expressed in this article are the author's own and may not reflect the views and opinions of Credit Suisse.