Informatica Map/Session Tuning
Covers basic, intermediate, and advanced tuning practices.

(by: Dan Linstedt)
Table of Contents
• Basic Guidelines
• Intermediate Guidelines
• Advanced Guidelines

The following points are high-level issues on where to go to perform "tuning" in Informatica's products. These are NOT permanent instructions, nor are they the end-all solution - just some items which (if tuned first) might make a difference. The level of skill available for certain items will cause the results to vary.

To 'test' performance throughput, it is generally recommended that the source set of data produce about 200,000 rows to process. Without such a large set of results to deal with, your average timings will be skewed by other users on the database, processes on the server, or network traffic; this seems to be an ideal test size for producing mostly accurate averages. Beyond this, the performance problems / issues may lie in the database - partitioning tables, dropping / recreating indexes, striping raid arrays, etc...

Try tuning your maps with these steps first. Then move to tuning the session. Iterate this sequence until you are happy, or cannot achieve better performance by continued efforts. If the performance is still not acceptable, then the architecture must be tuned (which can mean changes to what maps are created). In this case, you can contact us - we tune the architecture and the whole system from top to bottom.

KEEP THIS IN MIND: In order to achieve optimal performance, it's always a good idea to strike a balance between the tools, the database, and the hardware resources. Allow each to do what they do best. Varying the architecture can make a huge difference in speed and optimization possibilities.

INFORMATICA BASIC TUNING GUIDELINES

1. Utilize a database (like Oracle / Sybase / Informix / DB2 etc...) for significant data handling operations (such as sorts, groups, aggregates). In other words, staging tables can be a huge benefit to parallelism of operations, and parallel design - simply by the mathematics - nearly always cuts your execution time. Staging tables have many benefits; please see the staging table discussion in the methodologies section for full details.
2. Localize. Localize all target tables on to the SAME instance of Oracle (same SID), or same instance of Sybase. Try not to use Synonyms (remote database links) for anything (including: lookups, stored procedures, target tables, sources, functions, privileges, etc...). Utilizing remote links will most certainly slow things

down. Synonyms (remote database tables) could potentially affect performance by as much as a factor of 3 times or more, and remote mounting of databases can definitely be a hindrance to performance.

3. If you can - localize all targets, stored procedures, functions, and views on the same machine as PMServer. Informatica plays well with RDBMS engines on the same machine, but does NOT get along (performance wise) with ANY other engine (reporting engine, java virtual machine, java engine, OLAP engine, etc...). In keeping with this, remember that Informatica suggests that each session takes roughly 1 to 1 1/2 CPU's.

4. Remove external registered modules. The Application Programmers Interface (API) which calls externals is inherently slow (as of: 1/1/2000); hopefully Informatica will speed this up in the future. The external module which exhibited speed problems is the regular expression module: it broke speed from 1500+ rows per second without the module down to 486 rows per second with the module (Unix: Sun Solaris, 4 CPU's, 2 GIGS RAM, Oracle 8i and Informatica; no other sessions were running). This was a SPECIFIC case with a SPECIFIC map - it's not like this for all maps. Perform pre-processing / post-processing utilizing PERL, SED, AWK, or GREP instead.

5. Remove any database based sequence generators (sequences in the SOURCE database). Using one requires a wrapper function / stored procedure call, and utilizing these stored procedures has caused performance to drop by a factor of 3 times. This slowness is not easily debugged - it can only be spotted in the Write Throughput column. Copy the map and replace the stored proc call with an internal sequence generator for a test run: this is how fast you COULD run your map. IF YOU MUST have a shared, database generated sequence number, then utilize a staging table: add a SEQUENCE ID column, and call a POST TARGET LOAD stored procedure to populate that column. Place the post target load procedure in to the flat file to staging table load map. A single call in to the database, followed by a batch operation to assign sequences, is the fastest method for utilizing shared sequence generators.

6. TURN OFF VERBOSE LOGGING. The session log has a tremendous impact on the overall performance of the map. Force over-ride in the session, setting it to NORMAL logging mode. Unfortunately the logging mechanism is not "parallel" in the internal core; it is embedded directly in to the operations.

7. Turn off 'collect performance statistics'. This also has an impact - although minimal at times - it writes a series of performance data to the performance log. However, it may be necessary to have this turned on DURING your tuning exercise: it can reveal a lot about the speed of the reader and writer threads.

8. If your source is a flat file, utilize a staging table (see the staging table slides in the presentations section of this web site). If you can, use SQL*Loader, BCP (for Sybase users), or some other database Bulk-Load utility to build the staging table from the flat file, then follow the instructions for staging table usage. Removing this operation reduces reliance on the flat file operations.

9. Place basic logic in views in the source database where possible, and utilize the views as sources; this should save you lots of hours tuning.
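Item 4 above suggests replacing slow external modules with PERL / SED / AWK / GREP pre-processing. As a minimal sketch of the same idea (Python stands in for PERL here, and the file layout and field names are made up for illustration), a small script can trim and filter a delimited flat file before Informatica ever reads it:

```python
import csv
import io

def preprocess(infile, outfile):
    """Trim every field and drop rows with a blank key column, so the map
    no longer needs LTRIM/RTRIM calls or IS_SPACES tests per row."""
    reader = csv.reader(infile, delimiter='|')
    writer = csv.writer(outfile, delimiter='|')
    kept = 0
    for row in reader:
        row = [field.strip() for field in row]  # pre-trim each field
        if not row or row[0] == '':             # drop rows missing the key
            continue
        writer.writerow(row)
        kept += 1
    return kept

# Usage: clean the feed once, before the session (or bulk loader) reads it.
src = io.StringIO("1001| Smith | NY \n|no key|row\n1002|Jones|CA\n")
out = io.StringIO()
kept = preprocess(src, out)
print(kept)  # 2
```

Doing this once in a pre-process step keeps the trimming and validation work out of the per-row expression objects in the map.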

10. TUNE the DATABASE. There is much the DBA's can do here: moving disks, or assigning the right table to the right disk space, could make all the difference. Help them assess which tables are expected to be high read/high write, which operations will sort (order by), etc. Utilize a PERL script to generate "fake" data for small, medium, large, and extra large source data set sizes (in terms of: numbers of rows, average number of bytes per row). Don't be afraid to estimate: small, medium, large, and extra large data sets, expected throughput for each, turnaround time for load, is it a trickle feed? Give this information to your DBA's and ask them to tune the database for "worst case". Run each of these through your mappings - in this manner, the DBA can watch or monitor throughput as a real load size occurs.

11. Try to eliminate the use of non-cached lookups. By issuing a non-cached lookup, your performance will be impacted significantly - particularly if the lookup table is also a "growing" or "updated" target table. This generally means the indexes are changing during operation, and the optimizer loses track of the index statistics. At this point, remove all potential lookups from the code and utilize staging tables if possible - this is a MUST if you are performance tuning for high-volume throughput. In utilizing staging tables, views in the database can be built which join the data together, or Informatica's joiner object can be used to join data together - either one will help dramatically increase speed. Please see the staging table discussion for further details.

12. Separate complex maps - try to break the maps out in to logical threaded sections of processing. Re-arrange the architecture if necessary to allow for parallel processing. There may be more, smaller components doing individual tasks, however the throughput will be proportionate to the degree of parallelism that is applied. A discussion on HOW to perform this task is posted on the methodologies page.

13. BALANCE. Balance between Informatica and the power of SQL and the database. Try to utilize the DBMS for what it was built for: reading/writing/sorting/grouping/filtering data en-masse. Use Informatica for the more complex logic: outside joins, data integration, multiple source feeds, etc. The balancing act is difficult without DBA knowledge. In order to achieve a balance, you must be able to recognize which operations are best in the database, and which ones are best in Informatica. This does not degrade from the use of the ETL tool - rather it enhances it.

14. Be sure there is enough SWAP, and TEMP space on your PMSERVER machine. Not having enough disk space could potentially slow down your entire server during processing (in an exponential fashion). Sometimes this means watching the disk space while your session runs; otherwise you may not get a good picture of the space available during operation. Also, try not to read a file across the network - a link will NOT work to increase speed; it must be the full file itself, stored locally. If your flat file reader is slow, then check two things: 1) whether you have an item in your registry or configuration file which sets the "ThrottleReader" to a specific maximum number of blocks - if so, it will limit your read throughput (this only needs to be set if the sessions have demonstrated problems with constraint based loads); 2) move the flat file to local internal disk (if at all possible), not a network drive or a RAID device. Most RAID array's are fast, but internal disk continues to be much faster.
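Item 10 above calls for a PERL script to generate "fake" test data at several estimated sizes. A rough Python equivalent (field count, widths, and the size buckets are estimates, as the guideline itself suggests):

```python
import random
import string

def fake_rows(n_rows, n_cols=5, avg_bytes=20, seed=42):
    """Yield n_rows of pipe-delimited fake data, roughly avg_bytes per
    field, for small / medium / large / extra large load tests."""
    rng = random.Random(seed)                    # deterministic test data
    for i in range(n_rows):
        fields = [str(i)]                        # a fake key column
        for _ in range(n_cols - 1):
            width = max(1, int(rng.gauss(avg_bytes, 4)))
            fields.append(''.join(rng.choice(string.ascii_uppercase)
                                  for _ in range(width)))
        yield '|'.join(fields)

# Estimated size buckets - don't be afraid to estimate.
SIZES = {'small': 10_000, 'medium': 200_000,
         'large': 1_000_000, 'extra_large': 10_000_000}
sample = list(fake_rows(5))
print(len(sample), sample[0].count('|'))  # 5 4
```

Writing each bucket to a flat file and running it through the mappings lets the DBA watch throughput as a realistic load size occurs.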

15. Place some good server load monitoring tools on your PMServer in development - watch it closely to understand how the resources are being utilized. Remove all other "applications" on the PMServer, except for the database / staging database or Data Warehouse itself. PMServer plays well with RDBMS (relational database management system) engines, but doesn't play well with application servers.

16. Look in to EMC's disk storage array. I've heard (but not verified) that it has improved performance in some cases by up to 50%.

17. Try to merge expression objects, and set your lookups to unconnected (for re-use if possible). Check your Index and Data cache settings if you have aggregation or lookups being performed, or lookups that flow to disk - and make sure your Cache directory is on internal disk.

SESSION SETTINGS. In the session, make note of the session settings, and check the "ThrottleReader" setting (which is not configured by default) - see what the settings are. Read the performance section carefully in the Informatica manuals. Basically what you should try to achieve is: OPTIMAL READ, OPTIMAL WRITE, OPTIMAL THROUGHPUT. Over-tuning one of these three pieces can result in ultimately slowing down your session. For example: your write throughput is governed by your read and transformation speed; likewise, your read throughput is governed by your transformation and write speed. Balancing the throughput is important - and while it may mean upgrading the hardware to achieve throughput, there is only so much tuning you can do.

The best method to tune a problematic map is to break it in to components for testing:

1) Read Throughput. Make copies of the original map, and break down the copies - you are attempting to tune the reader here. Change the targets to flat file(s), so you don't want the writer threads to slow you down; a reader writing to flat file usually appears to be extremely fast, climbing at first, then stabilizing (after a few thousand rows). By turning on "Collect Performance Statistics" you can get a good feel for what needs to be set in the session - or what needs to be changed in the database. If the reader's throughput continues to climb above where it stabilized against the database targets, slow targets are the cause - optimal performance was reached with the flat file(s). If the reader still stabilizes at a slow rate, then you have a slow source, slow lookups, or some basic map tuning to do. Try increasing the Shared Session Memory from 12MB to 24MB, or increase the Default Buffer Size by a factor of 64k each shot (ignore the warning above 128k), and make note of how much the reader speeds up or slows down - particularly if your maps contain aggregates or lookups. NOTE: if your reader session to flat file just doesn't ever "get fast", or your CACHE directory is not on internal disk, then re-check the ThrottleReader and file location items above.

2) Write Throughput. Check the Performance Statistics to make sure the writer throughput is NOT the bottleneck. If you have a slow writer, change the map to a single target table at a time - see which target is causing the "slowness" and tune it. Once the "slower" of the N targets is discovered, talk to your DBA about partitioning the table, updating statistics, removing indexes during load, etc.; there are many database things you can do here. Then change the map target back to the database targets and run the session again.

particularly JAVA Virtual Machines, Web Servers, Security Servers, application servers, and Report servers. All of these items should be broken out to other machines. This is critical to improving performance on the PMServer machine. Once this is done, you should see the maximum performance ability with this configuration.

Back To Top

INFORMATICA INTERMEDIATE TUNING GUIDELINES

The following numbered items are for intermediate level tuning. After going through all the pieces above and still having trouble, these are some things to look for. Start with the BASIC tuning list above, then work your way in to these suggestions. Make sure the client agrees with the approach, and that the data sets are large enough to warrant this type of tuning. To understand the intermediate section, you'll need to review the memory usage diagrams (also available on this web site). Also remember: this applies to PowerMart/PowerCenter (4.5x / 4.6x / 1.5x / 1.6x) - other versions have NOT been tested.

These are items within a map which make a difference in performance (we've done extensive performance testing of Informatica to be able to show these effects). ALL items are Informatica MAP items - none are outside the map. The order of these items is not relevant to speed; each one has its own impact on the overall performance. Keep in mind: throughput is also gauged by the number of objects constructed within a map/maplet.

BE AWARE: The following tuning tips range from "minor" cleanup to "last resort" types of things - only when data sets get very large should these items be addressed. Turns out, the performance isn't affected unless there are more than 1 Million rows (average size: 2.5 GIG of data). Sometimes it's better to sacrifice a little readability for a little speed. It's the old paradigm: weighing readability and maintainability (true modularity) against raw speed.

1. Filter Expressions - try to evaluate them in a port expression. Complex filter expressions slow down the mapping. Whenever possible, try to create the filter (true/false) answer inside a port expression upstream: place the actual expression (complex or not) in an EXPRESSION OBJECT upstream from the filter, and compute a single numerical flag - 1 for true, 0 for false - as an output port. Pump this flag in to the filter.

2. Remove all "DEFAULT" value expressions where possible. Having a default value - even the "ERROR(xxx)" command - slows down the session; it causes an unnecessary evaluation of values for every data element in the map. The only time you want to use a "DEFAULT" value is when you have to provide a default value for a specific port. There is another method: placing a variable with an IIF(xxxx, DEFAULT VALUE, xxxx) condition within an expression. This will always be faster (if assigned to an output port) than a default value.

3. Variable Ports are "slower" than Output Expressions. Whenever possible, use output expressions instead of variable ports. The variables are good for "static and state driven" situations, but do slow down the processing time.
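Informatica port expressions aren't directly runnable here, so this Python sketch only illustrates the shape of intermediate item 1 above: evaluate the complex condition once in an upstream step, emit a 1/0 flag, and keep the filter itself trivial (the field names and the condition are invented for the example):

```python
def upstream_expression(row):
    """EXPRESSION OBJECT stand-in: evaluate the complex condition once,
    and emit a single numerical flag - 1 for true, 0 for false."""
    keep = (row['amount'] > 0
            and row['status'] in ('A', 'P')
            and row['region'] != 'XX')
    row['keep_flag'] = 1 if keep else 0
    return row

def filter_object(row):
    """FILTER stand-in: a trivial test against the precomputed flag."""
    return row['keep_flag'] == 1

rows = [{'amount': 10, 'status': 'A', 'region': 'NY'},
        {'amount': -5, 'status': 'A', 'region': 'NY'},
        {'amount': 10, 'status': 'Z', 'region': 'NY'}]
passed = [r for r in map(upstream_expression, rows) if filter_object(r)]
print(len(passed))  # 1
```

The point of the split is that the filter condition itself stays a cheap equality test, no matter how complex the upstream expression becomes.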

They are allocated/reallocated each pass of a row through the expression object.

4. Datatype conversion - when possible, perform it in a port expression. Simply mapping a string to an integer, or an integer to a string, will perform the conversion; however it will be slower than creating an output port with an expression like: to_integer(xxxx) and mapping an integer to an integer. It's because PMServer is left to decide if the conversion can be done mid-stream, which seems to slow things down.

5. Unused Ports. Surprisingly, unused output ports have no effect on performance. However, in general it is good practice to remove any unused ports in the mapping, including variables. Unfortunately, there is no "quick" method for identifying unused ports.

6. String Functions. String functions definitely have an impact on performance, particularly those that change the length of a string (substring, ltrim, rtrim, etc.). These functions slow the map down considerably; the operations behind each string function are expensive (de-allocate, and re-allocate memory within a READER block in the session). String functions are a necessary and important part of ETL - we do not recommend removing their use completely, only try to limit them to necessary operations. One of the ways we advocate tuning these is to use "varchar/varchar2" data types in your database sources, or to use delimited strings in source flat files (as much as possible). This will help reduce the need for "trimming" input. If your sources are in a database, perform the LTRIM/RTRIM functions on the data coming in from a database SQL statement; this will be much faster than operationally performing it mid-stream.

7. IIF Conditionals are costly. When possible, avoid utilizing an IIF conditional and arrange the logic to minimize their use - the only alternative here might be (for example) an ORACLE DECODE function applied to a SQL source. An IIF introduces "decisions" within the tool; it also introduces multiple code paths across the logic (thus increasing complexity). This is not particular to Informatica - it is costly in ANY programming language.

8. Sequence Generators slow down mappings. Unfortunately there is no "fast" and easy way to create sequence generators. The cost is not that high for using a sequence generator inside of Informatica, particularly if you are caching values (a cache of around 2000 seems to be the sweet spot). However, if at all avoidable, try to avoid "reusable" sequence generators - they tend to slow the session down further, even with cached values. If you don't absolutely need the sequence number in the map for calculation reasons, and you are utilizing Oracle, then let SQL*Loader create the sequence for all Insert Rows. If you're using Sybase, don't specify the Identity column as a target - let the Sybase Server generate the column. Again, this is one "card" up a sleeve that can be played.

9. Test Expressions slow down sessions. Expressions such as IS_SPACES tend to slow down the mappings; this is a data validation expression which has to run through the entire string to determine if it is spaces, much the same as IS_NUMBER has to validate an entire string. These expressions (if at all avoidable) should be removed in cases where it is not necessary to "test" prior to conversion.

Be aware however, that direct conversion without testing (conversion of an invalid value) will kill the transformation. If you absolutely need a test expression for numerics, try this: IIF(<field> * 1 >= 0, <field>, NULL) - preferably when you don't care if it's zero. An alpha in this expression should return a NULL to the computation. Yes - the IIF condition is slightly faster than the IS_NUMBER, because IS_NUMBER parses the entire string, where the multiplication operator is the actual speed gain.

10. Reduce the Number of OBJECTS in a map. All too often, the idea of these tools is to make the "data translation map" as easy as possible - frequently that means creating "an" (1) expression for each throughput/translation (taking it to an extreme of course). Each object adds computational overhead to the session, and timings may suffer. Sometimes, if performance is an issue / goal, you can integrate several expressions in to one expression object, thus reducing the "object" overhead. In doing so, it could speed up the map - however it may cost a little readability, and the users support rework.

11. Update Expressions - Session set to Update Else Insert. If you have this switch turned on, it will definitely slow the session down - Informatica performs 2 operations for each row: an update (w/PK), then, if it returns ZERO rows updated, performs an insert. The way to speed this up is to "know" ahead of time if you need to issue a DD_UPDATE or DD_INSERT inside the mapping, then tell the update strategy what to do. After which you can change the session setting to: INSERT and UPDATE AS UPDATE, or UPDATE AS INSERT.

12. Multiple Targets are too slow. Frequently maps are generated with multiple targets, and sometimes multiple sources. This (despite first appearances) can really burn up time. If the architecture permits change, then try to change the architecture -> 1 map per target is the general rule of thumb. Once reaching one map per target, the tuning gets easier. Sometimes it helps to reduce it to 1 source and 1 target per map. Going further - if the architecture allows more modularization - you could break it up: 1 map per target per operation (such as insert vs update). In doing this, it will provide a few more cards to the deck with which you can "tune" the session, as well as the target table itself. Going this route also introduces parallel operations; this is covered in detail in the advanced section. For further info on this topic, see my architecture presentations on Staging Tables, and 3rd normal form architecture (Corporate Data Warehouse Slides).

13. Slow Sources - Flat Files. If you've got slow sources, and these sources are flat files, you can look at some of the following possibilities. If the sources reside on a different machine, and you've opened a named pipe to get them across the network, then you've opened (potentially) a can of worms: you've introduced the network speed as a variable on the speed of the flat file source. Try to compress the source file, FTP PUT it on the local machine (local to PMServer), decompress it, then utilize it as a source. If you're reaching across the network to a relational table, and the session is pulling many many rows (over 10,000), then the source system itself may be slow. You may be better off using a source system extract program to dump it to file first, then follow the above instructions. Beyond that, there is something your SA's and Network Ops folks could do (if necessary): they could
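The IIF(<field> * 1 >= 0, <field>, NULL) trick above works because the multiplication fails (yielding NULL) for alpha values, without parsing the whole string the way IS_NUMBER does. A rough Python analogue of that semantics, with NULL modeled as None (this is an illustration of the idea, not Informatica's engine):

```python
def iif_numeric(field):
    """Rough analogue of IIF(field * 1 >= 0, field, NULL): an alpha value
    fails the multiplication and returns NULL (None) to the computation."""
    try:
        product = float(field) * 1
    except (TypeError, ValueError):
        return None                  # alpha input -> NULL
    return field if product >= 0 else None

print(iif_numeric('42'))    # 42
print(iif_numeric('ABC'))   # None
print(iif_numeric('-1'))    # None (negatives fall out too, per the test)
```

As the text warns, converting without any test at all would kill the transformation on the first invalid value; this pattern routes the bad value to NULL instead.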

backbone the two servers together with a dedicated network line (no hubs, routers, or other items in between the two machines). At the very least, they could put the two machines on the same sub-net. Now, if your file is local to PMServer but the session is still slow, examine the location of the file (which device is it on). If it's not on an INTERNAL DISK then it will be slower than if it were on an internal disk (C drive for you folks on NT). This doesn't mean a unix file LINK exists locally; it means the actual file is local.

14. Too Many Aggregators. If your map has more than 1 aggregator, chances are the session will run very very slowly - unless the CACHE directory is extremely fast, and your drive seek/access times are very high. Placing aggregators end-to-end in mappings will slow the session down by factors of at least 2. This is because of all the I/O activity being a bottleneck in Informatica. What needs to be known here is that Informatica's products (PM / PC up through 4.7x) are NOT built for parallel processing. In other words, the internal core doesn't put the aggregators on threads, nor does it put the I/O on threads; therefore, being a single strung process, it becomes easy for a part of the session/map to become a "blocked" process by I/O factors. Reduce the number of aggregators in the entire mapping (included maplets) to 1 if possible. If necessary, split the map up in to several different maps, and use intermediate tables in the database if required to achieve processing goals.

15. Maplets containing Aggregators. Maplets are a good source for replicating data logic, and it appears maplets do not affect the speed of the mappings by themselves - they are treated as a part of the mapping once the session starts. But just because an aggregator is in a maplet doesn't mean it won't affect the mapping: if you have an aggregator in a maplet, followed by another aggregator in the mapping, you will still have the problem mentioned above.

16. Eliminate "too many lookups". What happens, and why? Well - with too many lookups, your cache is eaten in memory, particularly on the 1.6 / 4.6 products. The end result is there is no memory left for the sessions to run in: the DTM reader/writer/transformer threads are not left with enough memory to be able to run efficiently. PC 1.7 / PM 4.7 solve some of these problems by caching some of these lookups out to disk when the cache is full, but you still end up with contention - in this case, you're trading in Memory Contention for Disk Contention. The memory contention might be worse than the disk contention, because the system OS ends up thrashing (swapping in and out of TEMP/SWAP disk space) with small block sizes to try to "find" your lookup row; as the row goes from lookup to lookup, the swapping / thrashing gets worse.

17. Lookups & Aggregators Fight. The lookups and the aggregators fight for memory space as discussed above. Each requires an Index Cache and a Data Cache, and they "share" the same HEAP segments inside the core. Particularly in the 4.6 / 1.6 products and prior, these memory areas become critical, and when dealing with many many rows the session is almost certain to cause the server to "thrash" memory in and out of the OS Swap space. If possible, separate the maps: perform the lookups in the first section of the maps and position the data in an intermediate target table, then have a second map read the target table and perform the aggregation (this also provides the option for a group by to be done within the database). See the Memory Layout document for more information. For I/O contention and resource monitoring, please see the database/datawarehouse tuning guide.
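Item 17 above suggests landing data in an intermediate table and letting the database do the GROUP BY instead of a second aggregator. A toy sqlite3 sketch of that two-map split (the table and column names are made up; any RDBMS would do the same job):

```python
import sqlite3

conn = sqlite3.connect(':memory:')
cur = conn.cursor()

# Map 1 stand-in: lookups done, rows positioned in an intermediate table.
cur.execute('CREATE TABLE stage_sales (region TEXT, amount REAL)')
cur.executemany('INSERT INTO stage_sales VALUES (?, ?)',
                [('NY', 10.0), ('NY', 15.0), ('CA', 7.0)])

# Map 2 stand-in: reads the intermediate table; the GROUP BY runs in the
# database engine, not in an Informatica aggregator.
cur.execute('SELECT region, SUM(amount) FROM stage_sales '
            'GROUP BY region ORDER BY region')
result = cur.fetchall()
print(result)  # [('CA', 7.0), ('NY', 25.0)]
conn.close()
```

This is the "balance" theme again: the database sorts and groups en-masse, while the maps stay simple and stop fighting over cache memory.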

Back To Top

INFORMATICA ADVANCED TUNING GUIDELINES

The following numbered items are for advanced level tuning, and are pointed at suggestions for the system. Do not attempt to follow these guidelines if you haven't already made it through all the basic and intermediate guidelines first. These guidelines may require a level of expertise which involves System Administrators, Database Administrators, and Network Operations folks; you can refer to those folks for questions surrounding your hardware / software resources. Obviously, the most important aspect of advanced tuning is to be able to pinpoint specific bottlenecks, then have the funding to address them. There are other advanced tuning guidelines available for Data Warehousing Tuning. As usual, these advanced tuning guidelines come last. Please be patient, and please proceed cautiously, one step at a time.

1. Break the mappings out: 1 per target, and if necessary, 1 per source per target. Why does this work? Well - eliminating multiple targets in a single mapping can greatly increase speed. With multiple targets in the same mapping, you're telling a single database connection to handle multiply diverse database statements - sometimes hitting this target, other times hitting that target. In this situation it's extremely difficult for Informatica (or any other tool for that matter) to build BULK operations, even though "bulk" is specified in the session. Remember that "BULK" means this is your preference, and that the tool will revert to NORMAL load if it can't provide a BULK operation on a series of consecutive rows. A data driven session also forces the tool down several other layers of internal code before the data actually can reach the database. Basically it's like this: one session per map/target. Each session establishes its own database connection, so because of the unique database connections, the DBMS server can now handle the insert/update/delete requests in parallel against multiple targets. Each session can then be placed in to a batch marked "CONCURRENT" if preferences allow. It also helps to allow each session to be specified for its intended purpose (no longer mixing a data driven session with INSERTS only to a single target). Once this is done, parallelism of mappings and sessions becomes obvious. A study of parallel processing has shown again and again that operations can be completed sometimes in half the time of their original counterparts merely by streaming them at the same time.

2. Develop maplets for complex business logic. It appears as if maplets do NOT cause any performance hindrance by themselves. Extensive use of maplets means better, more manageable business logic - the maplets allow you to better break the mappings out.

3. Keep the mappings as simple as possible. Bury complex logic (if you must) in to a maplet. If you can avoid complex logic all together, then that would be the key. The old rule of thumb applies here (common sense): the straighter the path between the source and the target, the faster the data loads.

4. Change to an IPC Database Connection for a Local Oracle Database. If PMServer and Oracle are running on the same server, use an IPC connection instead of a TCP/IP connection. IPC is utilized by Oracle, but is defined as a Unix System 5 standard specification; you can find more information on IPC by reading about it in Unix System 5 manuals. Change the protocol in the TNSNames.ORA and Listener.ORA files, and restart the listener on the server. Be careful: this protocol can only be used locally; however, the speed increases from using Inter Process Communication can be between 2x and 6x.

5. Change the Unix User Priority. In order to gain speed, the Informatica Unix User must be given a higher priority. The Unix SA should understand what it takes to rank the Unix logins, and grant priorities to particular tasks. Or, simply have the pmserver executed under a super user (SU) command; this will take care of reprioritizing Informatica's core process. If BCP or SQL*Loader or some other bulk-load facility is utilized, these priorities must also be set.

6. Change Database Priorities for the PMServer Database User. Prioritizing the database login that any of the connections use (setup in Server Manager) can assist in changing the priority given to the Informatica executing tasks. These tasks, when logged in to the database, can then over-ride others. Sizing memory for these tasks (in shared global areas, and server settings) must be done if priorities are to be changed. Keep in mind that this should only be relegated to the production machines, and only in certain instances where the load cycle that Informatica is utilizing is NOT impeding other users. It should only be utilized when all other methods have been exhausted (tuned); it's only suggested as a last resort method, and doesn't substitute for tuning the database.

7. Change the Network Packet Size (for Sybase, MS-SQL Server & Oracle users). The maximum network packet size is a Database Wide Setting, which is usually defaulted at 512 bytes or 1024 bytes. Setting the maximum database packet size doesn't necessarily hurt any of the other users; it does however allow the Informatica database setting to make use of the larger packet sizes, and thus transfer more data in a single packet, faster. The typical 'best' settings are between 10k and 20k. In Oracle, you'll need to adjust the Listener.ORA and TNSNames.ORA files: include the parameters SDU and TDU. SDU = Service Layer Data Buffer Size (in bytes); TDU = Transport Layer Data Buffer Size (in bytes). The SDU and TDU should be set equally. See the Informatica FAQ page for more information on setting these up.

8. Remember the TIMING is affected by READER/TRANSFORMER/WRITER threads. With complex mappings, don't forget that each ELEMENT (field) must be weighed - in this light, a firm understanding of how to read the performance statistics generated by Informatica becomes important. If the reader is slow, the rest of the threads suffer; if the writer is slow, same effect. A pipe is only as big as its smallest diameter; a chain is only as strong as its weakest link. Sorry for the metaphors, but it should make sense. Translated as: the shorter the distance between the source qualifier and the target, the faster the data loads.
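For items 4 and 7 above, the connection changes live in the Oracle net-config files. A hypothetical TNSNames.ORA entry sketch follows - the alias, SID, and exact parameter placement vary by Oracle release, so verify against your own Listener.ORA / TNSNames.ORA before use:

```
# Hypothetical local alias using IPC and enlarged SDU/TDU buffers.
PMLOCAL.world =
  (DESCRIPTION =
    (SDU = 10240)    # Service Layer Data Buffer Size, in bytes
    (TDU = 10240)    # Transport Layer Data Buffer Size - set equal to SDU
    (ADDRESS = (PROTOCOL = IPC)(KEY = ORCL))   # IPC instead of TCP: local only
    (CONNECT_DATA = (SID = ORCL))
  )
```

A matching entry (same protocol, same SDU/TDU) would go in the Listener.ORA, followed by a listener restart on the server.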

9. Try not to load across the network. If at all possible, try to co-locate the PMServer executable with a local database. Not having the database local means: 1) the repository is across the network (slow); 2) the sources / targets are across the network, also potentially slow. This matters particularly if your sources contain 13 million+ rows, even if you have dedicated lines (DS3/T1) between servers. If you have to load across the network, at least try to localize the repository on a database instance on the same machine as the server. The other thing is: try to co-locate the two machines (PMServer and the target database server) on the same sub-net, even the same hub if possible - this eliminates unnecessary routing of packets all over the network. Having a localized database also allows you to setup a target table locally, which you can then "dump" following a load, ftp to the target server, and bulk-load in to the target table. This works extremely well for situations where an append or a complete refresh is taking place.

10. Set Session Shared Memory Settings between 12MB and 24MB. Typically I've seen folks attempt to assign a session large heaps of memory in hopes it will increase speed - all it tends to do is slow down the processing. The 12MB to 24MB range seems to be a "sweet spot" for handling blocks of rows inside the Informatica process. The more complex the mapping, the less likely you are to see a gain by increasing either buffer block size or shared memory settings, because Informatica potentially has to process cells (ports/fields/values) inside of a huge memory block - the result will be more memory "handling" going on in the background, thus resulting in a potential re-allocation of the whole block. This is why simply giving it more memory doesn't necessarily provide speed - something that's covered in the memory layout document.

11. Set Shared Buffer Block Size around 128k. Keep in mind that the Shared Buffer Block Size should be set in relative size to the Shared Memory setting: if you set Shared Memory to 124 MB, set the Buffer Block Size to 12MB. Keep them in relative sizes - this holds true for the simpler mappings as well. See the memory layout document for further information on how this affects Informatica and its memory handling.

MEMORY SETTINGS: The settings above are for an average configured machine; any machine with less than 10 GIG's of RAM should abide by the above settings. If you've got 12+ GIG's, and you're running only 1 to 3 sessions concurrently, go ahead and specify the Session Shared Memory size at 1 or 2 GIG's.

12. Also - have the pmserver executed under a super user (SU) command; this will take care of reprioritizing Informatica's core process. This should only be used as a last resort - once all other tuning avenues have been exhausted, or if you have a dedicated Unix machine on which Informatica is running.

13. Use SNAPSHOTS with your Database. Use a snapshot or Advanced Replication to get data out of the source systems and in to a staging table (a duplicate of the source). Then schedule the snapshot before running processes, and place the Informatica processes to read from the snapshot. At that point you can index any way you like. The RDBMS servers are built for this kind of data transfer - and have optimizations built in to the core to transfer data incrementally, or as a whole refresh - so less actual work will be done by Informatica.
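The snapshot approach above can be sketched in Oracle DDL of that vintage. A hypothetical sketch only - the table orders, the database link source_db, the schedule, and the index column are all illustrative, and a FAST refresh additionally assumes a snapshot log exists on the source table:

```sql
-- Hypothetical staging snapshot of a remote source table.
-- Requires a snapshot log on orders@source_db for FAST refresh.
CREATE SNAPSHOT stg_orders
  REFRESH FAST
  START WITH SYSDATE
  NEXT TRUNC(SYSDATE) + 1 + 2/24   -- nightly at 2:00 AM, ahead of the load window
AS SELECT * FROM orders@source_db;

-- The local copy can now be indexed any way you like,
-- without touching the source system.
CREATE INDEX stg_orders_cust_ix ON stg_orders (customer_id);
```

The Informatica session then reads from stg_orders instead of the remote source, letting the RDBMS handle the incremental transfer.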

Yes - snapshots only work if your sources are homogeneous to your targets (on the same type of system), but they increase the throughput speed without affecting the source systems.

14. INCREASE THE DISK SPEED. One of the most common fallacies is that a Data Warehouse RDBMS needs only 2 controllers. If you want Gigabytes per second throughput, or you want to create 10 indexes in 4 hours on 34 million rows, or your load window exceeds 5 hours, you'll need more: a minimum of 4 controllers and 13 disks to survive - and for the big systems, at least 50 disks spinning at 7200 RPM or better, set on a Raid 0+1 array. Measure the results in GIG's per hour / MB per hour.

15. Switch to Raid 0+1. Raid Level 5 is great for redundancy - horrible for Data Warehouse performance. Raid 0+1 is the preferred method for data warehouses out there. The software to manage this has improved greatly, and most folks find that the replication is just as safe as a Raid 5, particularly since the hardware is now nearly all hot-swappable.

16. Upgrade your Hardware. A box with 4 CPU's is great for development, or for smaller systems (totalling less than 5 Million rows in the warehouse); however, a 4 CPU machine just won't cut the mustard today for this size of operation. Again, this is for huge Data Warehousing systems. I recommend a minimum of 8 CPU's as a starter box, and increase to 12 as necessary; if it's necessary, then add CPU power. On your production box, plunk the money down and go get an EMC device - both Oracle and Sybase perform extremely well when given 6+ CPU's and 8 or 12 GIG's RAM, setup on an EMC device at 7200 RPM with a minimum of 4 controllers. You should see a significant increase in performance after installing or upgrading to such a configuration. That said, I've heard of a 4 CPU Dec-Alpha system outperforming a 6 CPU system - keep in mind that Bus Speed is also a huge factor here, along with the disk modifications discussed above. So what's the bottom line? Disk RPM's, RAM, Bus Speed, and # of CPU's - I'd say potentially in that order.

Sorting – performance issues

You can improve Aggregator transformation performance by using the Sorted Input option. When the Sorted Input option is selected, the Informatica Server assumes all data is sorted by group. As the Informatica Server reads rows for a group, it performs aggregate calculations as it reads; when necessary, it stores group information in memory. To use the Sorted Input option, you must pass sorted data to the Aggregator transformation. You can gain added performance with sorted ports when you partition the session. When Sorted Input is not selected, the Informatica Server performs aggregate calculations as it reads, but since data is not sorted, it stores data for each group until it reads the entire source, to ensure all aggregate calculations are accurate.

Sorted Input Conditions

Do not use the Sorted Input option if any of the following conditions are true:
• The aggregate expression uses nested aggregate functions.
• The session uses incremental aggregation.
• Input data is data-driven. You choose to treat source data as data driven in the session properties, or the Update Strategy transformation appears before the Aggregator transformation in the mapping.
• The mapping is upgraded from PowerMart 3.5.

If you use the Sorted Input option under these circumstances, the Informatica Server reverts to default aggregate behavior, reading all values before performing aggregate calculations.

Pre-Sorting Data

To use the Sorted Input option, you pass sorted data through the Aggregator. Data must be sorted as follows:
• By the Aggregator group by ports, in the order they appear in the Aggregator transformation.
• Using the same sort order configured for the session.

For example, one Aggregator has the STORE_ID and ITEM Group By ports, with the Sorted Input option selected. When you pass the following data through the Aggregator, the Informatica Server performs an aggregation for the three records in the 101/battery group as soon as it finds the new group, 201/battery:

STORE_ID  ITEM       QTY  PRICE
101       'battery'  3    2.99
101       'battery'  1    3.19
101       'battery'  2    2.59
201       'battery'  4    1.59
201       'battery'  1    1.99

If you use the Sorted Input option and do not presort data correctly, the session fails.
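The streaming behavior described above can be sketched in Python - a hypothetical stand-in for the server's aggregator, using the example data, to show why a group can be emitted as soon as the next group begins:

```python
# Sketch of why sorted input helps: with rows pre-sorted by the group-by
# ports (STORE_ID, ITEM), an aggregator can emit each group the moment the
# next group starts, instead of buffering every group until end-of-source.
from itertools import groupby

rows = [  # (STORE_ID, ITEM, QTY, PRICE) -- the example data above
    (101, 'battery', 3, 2.99),
    (101, 'battery', 1, 3.19),
    (101, 'battery', 2, 2.59),
    (201, 'battery', 4, 1.59),
    (201, 'battery', 1, 1.99),
]

def sorted_aggregate(rows):
    """Stream QTY totals over rows already sorted by the group key."""
    for key, group in groupby(rows, key=lambda r: (r[0], r[1])):
        qty = 0
        for r in group:
            qty += r[2]
        yield key + (qty,)  # 101/battery is emitted when 201/battery appears

print(list(sorted_aggregate(rows)))
# [(101, 'battery', 6), (201, 'battery', 5)]
```

Only one group's running total is held in memory at a time; without the sort guarantee, every group would have to be kept until the source is exhausted.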

If data is not in strict ascending or descending order based on the session sort order, the Informatica Server fails the session. For example, if you configure a session to use a French sort order, data passing into the Aggregator transformation must be sorted using the French sort order.

If the session uses file sources, you can use an external utility to sort file data before starting the session. If the session uses relational sources, you can use the Number of Sorted Ports option in the Source Qualifier transformation to sort group by columns in the source database. Group By columns must be in the exact same order in both the Aggregator and Source Qualifier transformations. For details on sorting data in the Source Qualifier, see Sorted Ports.

Indexes – Make sure indexes are in place and tables have been analyzed. You might be able to use index hints in the source qualifier.
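The strict-ordering rule above can be sketched as a pre-flight check. This is a hypothetical helper, not an Informatica API - it simply fails fast, the way the server fails the session, when input isn't ordered by the group-by ports:

```python
# Hypothetical check: verify rows are in strict ascending or descending
# order by the group-by ports (here the first two fields) before relying
# on Sorted Input; raise otherwise, mirroring a failed session.
def check_sorted(rows, key=lambda r: (r[0], r[1])):
    """Raise ValueError unless rows are ordered (asc or desc) by key."""
    keys = [key(r) for r in rows]
    ascending = all(a <= b for a, b in zip(keys, keys[1:]))
    descending = all(a >= b for a, b in zip(keys, keys[1:]))
    if not (ascending or descending):
        raise ValueError("input not sorted by group-by ports; session would fail")
    return True

check_sorted([(101, 'battery'), (101, 'battery'), (201, 'battery')])  # OK
```

In practice the same guarantee comes from the external sort utility (file sources) or the ORDER BY generated via Number of Sorted Ports (relational sources).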
