Informatica PowerCenter Development Best Practices

TABLE OF CONTENTS
Abstract
Content overview
1. Lookup - Performance considerations
1.1. Unwanted columns
1.2. Size of the source versus size of lookup
1.3. JOIN instead of Lookup
1.4. Conditional call of lookup
1.5. SQL query
1.6. Increase cache
1.7. Cachefile file-system
1.8. Useful cache utilities
2. Workflow performance – basic considerations
2.1. SQL tuning
3. Pre/Post-Session command - Uses
4. Sequence generator – design considerations
5. FTP Connection object – platform independence

Abstract
This article explains a few important development best practices, covering lookups, workflow performance, and related topics.

Content overview
• Lookup - Performance considerations
• Workflow performance – basic considerations
• Pre/Post-Session commands - Uses
• Sequence generator – design considerations
• FTP Connection object – platform independence

1. Lookup - Performance considerations
What is a lookup transformation? It is not just another transformation that fetches data to look up against the source data. A Lookup is an important and useful transformation when used effectively. If used improperly, the performance of your mapping will be severely impaired. Let us look at the different scenarios where you can face problems with a Lookup, and how to tackle them.

1.1. Unwanted columns
By default, when you create a lookup on a table, PowerCenter gives you all the columns in the table. If not all the columns are required for the lookup condition or the return, delete the unwanted columns from the transformation. If you do not remove the unwanted columns, the cache size will increase.

1.2. Size of the source versus size of lookup
Let us say you have 10 rows in the source and one of the columns has to be checked against a big table (1 million rows). PowerCenter builds the cache for the lookup table and then checks the 10 source rows against the cache. It takes more time to build a cache of 1 million rows than to go to the database 10 times and look up against the table directly. Use an uncached lookup instead of building the static cache, as the number of source rows is far smaller than that of the lookup.
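The trade-off described under "Size of the source versus size of lookup" can be illustrated outside PowerCenter. The following is only a rough sketch in Python, using the standard-library sqlite3 module as a stand-in database. The table `customer`, its columns, and the function names are hypothetical, and the lookup table is kept at 100,000 rows rather than 1 million so that it runs quickly. It counts how many rows each approach has to read from the database.

```python
import sqlite3


def build_lookup_table(conn, n_rows):
    """Create a hypothetical lookup table customer(cust_id, cust_name)."""
    conn.execute(
        "CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, cust_name TEXT)"
    )
    conn.executemany(
        "INSERT INTO customer VALUES (?, ?)",
        ((i, f"name_{i}") for i in range(n_rows)),
    )


def cached_lookup(conn, source_keys):
    """Static-cache style: read the whole table once, then probe in memory."""
    cache = dict(conn.execute("SELECT cust_id, cust_name FROM customer"))
    rows_read = len(cache)
    results = [cache.get(key) for key in source_keys]
    return results, rows_read


def uncached_lookup(conn, source_keys):
    """Uncached style: one indexed query per source row."""
    results, rows_read = [], 0
    for key in source_keys:
        row = conn.execute(
            "SELECT cust_name FROM customer WHERE cust_id = ?", (key,)
        ).fetchone()
        if row is not None:
            rows_read += 1
        results.append(row[0] if row else None)
    return results, rows_read


conn = sqlite3.connect(":memory:")
build_lookup_table(conn, 100_000)
source_keys = list(range(0, 100, 10))  # 10 source rows

cached_results, cached_rows = cached_lookup(conn, source_keys)
uncached_results, uncached_rows = uncached_lookup(conn, source_keys)
print(cached_rows, uncached_rows)  # prints: 100000 10
```

Both approaches return the same values. The difference is only in how much of the lookup table has to be read to answer 10 probes, which is the reasoning behind preferring an uncached lookup for a very small source.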

1.3. JOIN instead of Lookup
In the same context as above, if the Lookup transformation comes right after the source qualifier with no active transformation in between, and both tables are in the same database and schema, you can use the SQL override of the source qualifier and join to the lookup table with an ordinary database join.

1.4. Conditional call of lookup
Instead of using connected lookups with filters for a conditional lookup call, use an unconnected lookup. Is the single return column a limitation here? Change the SQL override to concatenate the required columns into one big column, and break it into individual columns again at the calling side.

1.5. SQL query
Find the execution plan of the Lookup SQL and see whether you can add indexes or hints to the query to make it fetch data faster. You may need the help of a database developer to accomplish this if you are not a SQL expert yourself.

1.6. Increase cache
If none of the above options improves performance, the problem may lie with the cache: the cache that you assigned for the lookup is not sufficient to hold the data or index of the lookup. Whatever data does not fit into the cache is spilled into the cache files designated in $PMCacheDir. When PowerCenter does not find the data you are looking for in the cache, it swaps data from the file into the cache and keeps doing this until the data is found. This is quite expensive, because this type of operation is very I/O intensive. To stop this from occurring, increase the size of the cache so that the entire data set resides in memory. When increasing the cache you also have to be aware of the system constraints. If your cache size is greater than the resources available, the session will fail due to lack of resources.

1.7. Cachefile file-system
In many cases, if the cache directory is on a different file-system from that of the hosting server, writing the cache files can take time and result in latency. With the help of your system administrators, look into this aspect as well.
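The concatenation idea from "Conditional call of lookup" can be sketched as follows. This is only an illustration in Python with sqlite3 standing in for the lookup database, not PowerCenter expression syntax. The table `customer`, the delimiter, and the helper names `lookup_packed`, `unpack`, and `process` are all assumptions made for the example.

```python
import sqlite3

DELIM = "|"  # assumed delimiter; it must not occur inside the column values

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer "
    "(cust_id INTEGER PRIMARY KEY, cust_name TEXT, city TEXT, status TEXT)"
)
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?, ?)",
    [(1, "Asha", "Pune", "ACTIVE"), (2, "Ben", "Leeds", "CLOSED")],
)

# Stand-in for a lookup SQL override: three columns packed into one.
# Note that in SQL a NULL in any packed column makes the whole value NULL.
LOOKUP_OVERRIDE = f"""
SELECT cust_id,
       cust_name || '{DELIM}' || city || '{DELIM}' || status AS packed
FROM customer
WHERE cust_id = ?
"""


def lookup_packed(cust_id):
    """Return the single packed column, or None when there is no match."""
    row = conn.execute(LOOKUP_OVERRIDE, (cust_id,)).fetchone()
    return row[1] if row else None


def unpack(packed):
    """Break the packed value back into individual columns at the caller."""
    if packed is None:
        return (None, None, None)
    name, city, status = packed.split(DELIM)
    return (name, city, status)


def process(source_row):
    """Conditional call: look up only when the source row has no city."""
    if source_row.get("city"):
        return source_row
    name, city, status = unpack(lookup_packed(source_row["cust_id"]))
    return {**source_row, "cust_name": name, "city": city, "status": status}


print(process({"cust_id": 2}))
```

The lookup is invoked only for rows that need it, and the caller recovers all three columns from the one returned value.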

1.8. Useful cache utilities
If the same lookup SQL is being used by another lookup, then a shared cache or a reusable lookup should be used. Also, if you have a table where the data does not change often, you can use the persistent cache option to build the cache once and reuse it across consecutive flows.

2. Workflow performance – basic considerations
Though performance tuning has been the most feared part of development, it is the easiest once the intricacies are known. With each newer version of PowerCenter, there is added flexibility for the developer to build better-performing workflows. The major blocks for performance are the design of the mapping and, if databases are involved, SQL tuning. Regarding the design of the mapping, I have a few basic considerations. Please note that these are not rules of thumb, but they will help you act sensibly in different scenarios.

1. I would always suggest that you think twice before using an Update Strategy, though it adds a certain level of flexibility to the mapping. If you have a straight-through mapping that takes data from the source and directly inserts all the records into the target, you do not need an update strategy.
2. Use a pre-SQL delete statement if you wish to delete specific rows from the target before loading into it. Use the truncate option in the session properties if you wish to clean the table before loading. I would avoid a separate pipeline in the mapping that runs before the load with an update-strategy transformation.
3. Say you have 3 sources and 3 targets with one-to-one mapping. If the loads are independent according to the business requirement, I would create 3 different mappings and 3 different session instances, all running in parallel in my workflow after my "Start" task. I have observed the workflow runtime come down to between 30% and 60% of serial processing.
4. PowerCenter is built to work on high volumes of data, so let the server be completely busy. Induce parallelism as far as possible into the mapping/workflow.
5. If using a transformation like a Joiner or Aggregator, sort the data on the join keys or group-by columns before these transformations to decrease the processing time.
6. Filtering should be done at the database level instead of within the mapping. The database engine is much more efficient at filtering than PowerCenter.

The above examples are just some things to consider when tuning a mapping.
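The advice to sort data on the group-by columns before an Aggregator can be motivated with a general illustration. The sketch below is plain Python and makes no claim about PowerCenter internals. It only shows why an aggregation over sorted input can emit each group as soon as the key changes, whereas unsorted input forces every group to be held until the last row has been seen. The sample rows and function names are hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def aggregate_sorted(rows):
    """Streaming aggregation: assumes rows arrive sorted by key,
    so only the current group has to be held at any time."""
    for key, group in groupby(rows, key=itemgetter(0)):
        yield key, sum(amount for _, amount in group)


def aggregate_unsorted(rows):
    """Without sorted input, a running total for every key must be
    kept until the input is exhausted."""
    totals = {}
    for key, amount in rows:
        totals[key] = totals.get(key, 0) + amount
    return sorted(totals.items())


rows = [("east", 10), ("west", 5), ("east", 7), ("north", 1), ("west", 3)]
sorted_rows = sorted(rows, key=itemgetter(0))

sorted_result = list(aggregate_sorted(sorted_rows))
print(sorted_result)  # prints: [('east', 17), ('north', 1), ('west', 8)]
```

Both functions produce the same totals. The sorted version simply needs far less state, which is one plausible reason pre-sorting reduces processing time for these transformations.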

2.1. SQL tuning
SQL queries/actions occur in PowerCenter in one of the following ways:
• Relational Source Qualifier
• Lookup SQL Override
• Stored Procedures
• Relational Target

Using the execution plan to tune a query is the best way to gain an understanding of how the database will process the data. Some things to keep in mind when reading the execution plan include: "Full Table Scans are not evil", "Indexes are not always fast", and "Indexes can be slow too". Analyse the table data to see whether picking up 20 records out of 20 million is best done using an index or a table scan. Likewise, ask whether fetching 10 records out of 15 is faster through an index or simpler with a full table scan.

Many times the relational target indexes create performance problems when loading records into the relational target. If the indexes are needed for other purposes, it is suggested to drop the indexes at the time of loading and then rebuild them in the post-SQL. When dropping indexes on a target, you should consider integrity constraints and the time it takes to rebuild the index after the load versus the actual load time.
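The two ideas in this section, reading the execution plan and dropping a target index around the load, can be tried out on any database. The following is a minimal sketch using Python's sqlite3 module as a stand-in. The table `orders`, the index name `idx_orders_cust`, and the `PRE_SQL`/`POST_SQL` strings are hypothetical, and the plan text produced by `EXPLAIN QUERY PLAN` is specific to SQLite and varies between versions. Other databases have their own plan commands.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return SQLite's plan description for a query as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " / ".join(row[-1] for row in rows)  # last column is the detail text


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders "
    "(order_id INTEGER PRIMARY KEY, cust_id INTEGER, amount REAL)"
)

QUERY = "SELECT amount FROM orders WHERE cust_id = ?"

# With no index on cust_id, the database has to scan the table.
plan_without_index = query_plan(conn, QUERY, (42,))

# Stand-ins for session pre-SQL and post-SQL around the load.
PRE_SQL = "DROP INDEX IF EXISTS idx_orders_cust"
POST_SQL = "CREATE INDEX idx_orders_cust ON orders (cust_id)"

conn.execute(PRE_SQL)  # drop the index so the load does not maintain it
conn.executemany(
    "INSERT INTO orders (cust_id, amount) VALUES (?, ?)",
    ((i % 1000, float(i)) for i in range(50_000)),
)
conn.execute(POST_SQL)  # rebuild the index once, after the load

plan_with_index = query_plan(conn, QUERY, (42,))

print(plan_without_index)
print(plan_with_index)
```

Comparing the two printed plans shows the switch from a table scan to an index search. Whether the index actually pays off still depends on the data distribution, as the quoted reminders above point out.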

3. Pre/Post-Session command - Uses
• It is a very good practice to email the success or failure status of a task once it is done. In the same way, when a business requirement calls for it, make use of the Post Session Success and Failure email for proper communication.
• The built-in feature offers more flexibility, with session logs as attachments, and also provides other run-time data such as the workflow run instance ID.
• Any archiving activities around the source and target flat files can be easily managed within the session using the session properties for flat file command support, which is new in PowerCenter v8.6. For example, after writing the flat file target, you can set up a command to zip the file to save space.
• If you need any editing of data in the target flat files that your mapping could not accommodate, write a shell/batch command or script and call it in the Post-Session command task. I prefer making trade-offs between PowerCenter capabilities and the OS capabilities in these scenarios.
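As an illustration of the "zip the file to save space" example, a post-session command could call a small script such as the one below. This is only a sketch in Python. The function name `compress_flat_file` and the file name `orders_target.dat` are hypothetical, and how the script is wired into a session command is left to the environment. A one-line shell command would do the same job on most UNIX hosts.

```python
import gzip
import shutil
import tempfile
from pathlib import Path


def compress_flat_file(path, remove_original=True):
    """Gzip a flat file next to the original and return the archive path."""
    src = Path(path)
    dst = src.with_name(src.name + ".gz")
    with src.open("rb") as fin, gzip.open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)  # stream, so large files are not held in memory
    if remove_original:
        src.unlink()
    return dst


# Small self-contained demonstration using a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "orders_target.dat"  # hypothetical target flat file
    target.write_text("1|Asha|Pune\n2|Ben|Leeds\n")

    archived = compress_flat_file(target)
    original_removed = not target.exists()

    with gzip.open(archived, "rt") as fh:
        restored = fh.read()

print(archived.name, original_removed)
```

Streaming the copy keeps memory use flat regardless of file size, which matters when the target files are large.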

In fact. a sequence created in the target database would make life lot easier for the table data maintenance and also for the PowerCenter development.4. while populating an ID column in the relational target table. Sequence generator – design considerations In most of the cases. Migration between environments is simplified because there is no additional overhead of considering the persistent values of the sequence generator from the repository database. ID generation is PowerCenter independent if a different application is used in future to populate the target. . FTP Connection object – platform independence If you have any files to be read as source from Windows server when your PowerCenter server is hosted on UNIX/LINUX. There are many advantages to using a database sequence generator: • • • Fewer PowerCenter objects will be present in a mapping which reduces development time and also maintenance effort. databases will have specific mechanisms (focused) to deal with sequences and so you can implement manual Push-down optimization on your PowerCenter mapping design for yourself. 5. I would advice you to avoid the use of sequence generator transformation. In all of the above cases. This will further reduce the overhead of having SAMBA mounts on to the Informatica boxes. then make use of FTP users on the Windows server and use File Reader with FTP Connection object. I suggest you rather create a sequence on the target database and enable the trigger on that table to fetch the value from the database sequence. This connection object can be added as any other connection string. but I would still insist on using sequence-trigger combination for huge volumes of data as well. This gives the flexibility of platform independence. DBAs will always complain about triggers on the databases.
