
Tuning Mappings for Better Performance



Challenge
In general, mapping-level optimization takes time to implement, but can significantly boost performance. Sometimes the mapping is the biggest bottleneck in the load process because business rules determine the number and complexity of transformations in a mapping. Before deciding on the best route to optimize the mapping architecture, you need to resolve some basic issues. Tuning mappings is a tiered process. The first tier can be of assistance almost universally, bringing about a performance increase in all scenarios. The second tier of tuning processes may yield only a small performance increase, or can be of significant value, depending on the situation. Some factors to consider when choosing tuning processes at the mapping level include the specific environment, software/hardware limitations, and the number of records going through a mapping. This Best Practice offers some guidelines for tuning mappings.

Description
Analyze mappings for tuning only after you have tuned the system, source, and target for peak performance. To optimize mappings, you generally reduce the number of transformations in the mapping and delete unnecessary links between transformations.

For transformations that use data cache (such as Aggregator, Joiner, Rank, and Lookup transformations), limit connected input/output or output ports. Doing so can reduce the amount of data the transformations store in the data cache.

Too many Lookups and Aggregators encumber performance because each requires index cache and data cache. Since both are fighting for memory space, decreasing the number of these transformations in a mapping can help improve speed. Splitting them up into different mappings is another option.

Limit the number of Aggregators in a mapping. A high number of Aggregators can increase I/O activity on the cache directory. Unless the seek/access time is fast on the directory itself, having too many Aggregators can cause a bottleneck. Similarly, too many Lookups in a mapping cause contention for disk and memory, which can lead to thrashing, leaving insufficient memory to run a mapping efficiently.

Consider Single-Pass Reading


If several mappings use the same data source, consider a single-pass reading. Consolidate separate mappings into one mapping with either a single Source Qualifier Transformation or one set of Source Qualifier Transformations as the data source for the separate data flows. Similarly, if a function is used in several mappings, a single-pass reading will reduce the number of times that function will be called in the session.



Optimize SQL Overrides
When SQL overrides are required in a Source Qualifier, Lookup Transformation, or in the update override of a target object, be sure the SQL statement is tuned. The extent to which and how the SQL can be tuned depends on the underlying source or target database system. See the section Tuning SQL Overrides and Environment for Better Performance for more information.
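As a simple illustration (the table and column names here are hypothetical, and Oracle-style syntax is assumed), a Source Qualifier override is often improved just by selecting only the ports the mapping actually uses and by writing the filter so the database can use an index, rather than pulling every column and row:

-- Untuned override: reads every column and every row
SELECT * FROM ORDERS;

-- Tuned override: project only the required columns and filter at the source
SELECT ORDER_ID,
       CUSTOMER_ID,
       ORDER_AMOUNT
FROM   ORDERS
WHERE  ORDER_DATE >= TO_DATE('2024-01-01', 'YYYY-MM-DD');

The same idea applies to Lookup SQL overrides and target update overrides: the closer the override is to what the database can execute efficiently, the less work remains in the mapping.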

Scrutinize Datatype Conversions


"o er.enter Server automatically makes conversions bet een compatible datatypes. 0hen these conversions are performed unnecessarily performance slo s. $or example, if a mapping moves data from an Integer port to a *ecimal port, then back to an Integer port, the conversion may be unnecessary. In some instances ho ever, datatype conversions can help improve performance. This is especially true hen integer values are used in place of other datatypes for performing comparisons using (ookup and $ilter transformations.

"liminate Transformation "rrors


Large numbers of evaluation errors significantly slow performance of the PowerCenter Server. During transformation errors, the PowerCenter Server engine pauses to determine the cause of the error, removes the row causing the error from the data flow, and logs the error in the session log. Transformation errors can be caused by many things, including conversion errors, conflicting mapping logic, and any condition that is specifically set up as an error. The session log can help point out the cause of these errors. If errors recur consistently for certain transformations, re-evaluate the constraints for those transformations. Any source of errors should be traced and eliminated.

Optimize Lookup Transformations


There are a number of ways to optimize Lookup transformations that are set up in a mapping.

When to Cache Lookups

When caching is enabled, the PowerCenter Server caches the lookup table and queries the lookup cache during the session. When this option is not enabled, the PowerCenter Server queries the lookup table on a row-by-row basis.

NOTE: All the tuning options mentioned in this Best Practice assume that memory and cache sizing for lookups are sufficient to ensure that caches will not page to disk. Practices regarding memory and cache sizing for Lookup transformations are covered in the Best Practice: Tuning Sessions for Better Performance.

In general, if the lookup table needs less than 300MB of memory, lookup caching should be enabled.



A better rule of thumb than memory size is to determine the size of the potential lookup cache with regard to the number of rows expected to be processed. Consider the following example. In Mapping X, the source and lookup contain the following number of records:

ITEMS (source): 5,000 records
MANUFACTURER: 200 records
DIM_ITEMS: 100,000 records

Number of Disk Reads

                             Cached Lookup    Un-cached Lookup
LKP_Manufacturer
  Build Cache                        200                    0
  Read Source Records              5,000                5,000
  Execute Lookup                       0                5,000
  Total # of Disk Reads            5,200               10,000

LKP_DIM_ITEMS
  Build Cache                    100,000                    0
  Read Source Records              5,000                5,000
  Execute Lookup                       0                5,000
  Total # of Disk Reads          105,000               10,000

Tuning Mappings for Better Performance

Consider the case where MANUFACTURER is the lookup table. If the lookup table is cached, it will take a total of 5,200 disk reads to build the cache and execute the lookup. If the lookup table is not cached, then it will take a total of 10,000 disk reads to execute the lookup. In this case, the number of records in the lookup table is small in comparison with the number of times the lookup is executed, so this lookup should be cached. This is the more likely scenario.

Consider the case where DIM_ITEMS is the lookup table. If the lookup table is cached, it will result in 105,000 total disk reads to build and execute the lookup. If the lookup table is not cached, then the disk reads would total 10,000. In this case, the number of records in the lookup table is not small in comparison with the number of times the lookup will be executed. Thus, the lookup should not be cached.

Use the following eight-step method to determine if a lookup should be cached:

1. Code the lookup into the mapping.
2. Select a standard set of data from the source. For example, add a WHERE clause on a relational source to load a sample of 10,000 rows.
3. Run the mapping with caching turned off and save the log.
4. Run the mapping with caching turned on and save the log to a different name than the log created in step 3.
5. Look in the cached lookup log and determine how long it takes to cache the lookup object. Note this time in seconds: LOOKUP TIME IN SECONDS = LS.
6. In the non-cached log, take the time from the last lookup cache to the end of the load in seconds and divide it into the number of rows being processed: NON-CACHED ROWS PER SECOND = NRS.
7. In the cached log, take the time from the last lookup cache to the end of the load in seconds and divide it into the number of rows being processed: CACHED ROWS PER SECOND = CRS.
8. Use the following formula to find the break-even row point:

   (LS x NRS x CRS) / (CRS - NRS) = X

   where X is the break-even point. If the number of expected source records is less than X, it is better not to cache the lookup. If the number of expected source records is more than X, it is better to cache the lookup.

For example, assume the lookup takes 166 seconds to cache (LS = 166), a cached lookup loads 232 rows per second (CRS = 232), and a non-cached lookup loads 147 rows per second (NRS = 147). The formula gives: (166 x 147 x 232) / (232 - 147) = 66,603.

Thus, if the source has fewer than 66,603 records, the lookup should not be cached. If it has more than 66,603 records, then the lookup should be cached.

Sharing Lookup Caches

There are a number of methods for sharing lookup caches.

Within a specific session run for a mapping, if the same lookup is used multiple times in the mapping, the PowerCenter Server will re-use the cache for the multiple instances of the lookup. Using the same lookup multiple times in the mapping will be more resource intensive with each successive instance. If multiple cached lookups are from the same table but are expected to return different columns of data, it may be better to set up the multiple lookups to bring back the same columns, even though not all return ports are used in all lookups. Bringing back a common set of columns may reduce the number of disk reads.

Across sessions of the same mapping, the use of an unnamed persistent cache allows multiple runs to use an existing cache file stored on the PowerCenter Server. If the option of creating a persistent cache is set in the lookup properties, the memory cache created for the lookup during the initial run is saved to the PowerCenter Server. This can improve performance because the Server builds the memory cache from cache files instead of from the database. This feature should only be used when the lookup table is not expected to change between session runs.

Across different mappings and sessions, the use of a named persistent cache allows sharing of an existing cache file.

Reducing the Number of Cached Rows

There is an option to use a SQL override in the creation of a lookup cache. Options can be added to the WHERE clause to reduce the set of records included in the resulting cache.

NOTE: If you use a SQL override in a lookup, the lookup must be cached.

Optimizing the Lookup Condition

In the case where a lookup uses more than one lookup condition, set the conditions with an equal sign first in order to optimize lookup performance.

Indexing the Lookup Table

The PowerCenter Server must query, sort, and compare values in the lookup condition columns. As a result, indexes on the database table should include every column used in a lookup condition. This can improve performance for both cached and un-cached lookups.

- In the case of a cached lookup, an ORDER BY condition is issued in the SQL statement used to create the cache. Columns used in the ORDER BY condition should be indexed. The session log will contain the ORDER BY statement.
- In the case of an un-cached lookup, since a SQL statement is created for each row passing into the Lookup transformation, performance can be helped by indexing columns in the lookup condition.
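As an illustration (the table, column, and index names here are hypothetical), both ideas can be applied to a lookup on a large reference table: restrict the cached rows in the lookup SQL override, and index the lookup condition column in the database:

-- Lookup SQL override: cache only the rows the session can actually match
SELECT PRODUCT_ID,
       PRODUCT_NAME,
       UNIT_PRICE
FROM   PRODUCTS
WHERE  DISCONTINUED_FLAG = 'N';

-- Index the column used in the lookup condition (also used by the generated ORDER BY)
CREATE INDEX IDX_PRODUCTS_LKP ON PRODUCTS (PRODUCT_ID);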

Optimize Filter and Router Transformations


Filtering data as early as possible in the data flow improves the efficiency of a mapping. Instead of using a Filter Transformation to remove a sizeable number of rows in the middle or end of a mapping, use a filter on the Source Qualifier or a Filter Transformation immediately after the Source Qualifier to improve performance.
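For example (hypothetical table and condition), a Source Filter entered on the Source Qualifier pushes the condition into the generated query, so rejected rows never enter the pipeline:

-- Query generated when the Source Qualifier's Source Filter is set to
-- ORDERS.ORDER_STATUS <> 'CANCELLED'
SELECT ORDER_ID, CUSTOMER_ID, ORDER_AMOUNT
FROM   ORDERS
WHERE  ORDERS.ORDER_STATUS <> 'CANCELLED';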



Avoid complex expressions when creating the filter condition. Filter transformations are most effective when a simple integer or TRUE/FALSE expression is used in the filter condition.

Filters or Routers should also be used to drop rejected rows from an Update Strategy transformation if rejected rows do not need to be saved.

Replace multiple Filter transformations with a Router transformation. This reduces the number of transformations in the mapping and makes the mapping easier to follow.

Optimize Aggregator Transformations


Aggregator Transformations often slow performance because they must group data before processing it.

Use simple columns in the group by condition to make the Aggregator Transformation more efficient. When possible, use numbers instead of strings or dates in the GROUP BY columns. Also avoid complex expressions in the Aggregator expressions, especially in GROUP BY ports.

Use the Sorted Input option in the Aggregator. This option requires that data sent to the Aggregator be sorted in the order in which the ports are used in the Aggregator's group by. The Sorted Input option decreases the use of aggregate caches. When it is used, the PowerCenter Server assumes all data is sorted by group and, as a group is passed through an Aggregator, calculations can be performed and information passed on to the next transformation. Without sorted input, the Server must wait for all rows of data before processing aggregate calculations. Use of the Sorted Input option is usually accompanied by a Source Qualifier which uses the Number of Sorted Ports option; a sketch of such a source query appears after these tips.

Use an Expression and Update Strategy instead of an Aggregator Transformation. This technique can only be used if the source data can be sorted. Further, using this option assumes that a mapping is using an Aggregator with the Sorted Input option. In the Expression Transformation, the use of variable ports is required to hold data from the previous row of data processed. The premise is to use the previous row of data to determine whether the current row is a part of the current group or is the beginning of a new group. If the row is a part of the current group, its data would be used to continue calculating the current group function. An Update Strategy Transformation would follow the Expression Transformation and set the first row of a new group to insert and the following rows to update.
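As a minimal sketch (table and column names are hypothetical), a Source Qualifier override feeding an Aggregator that uses Sorted Input might order the rows by the same columns used in the Aggregator's group by:

-- Rows arrive already ordered by the group-by ports, so the Aggregator
-- can use Sorted Input and release each group as soon as it is complete
SELECT CUSTOMER_ID,
       PRODUCT_ID,
       SALE_AMOUNT
FROM   SALES
ORDER  BY CUSTOMER_ID, PRODUCT_ID;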

Optimize Joiner Transformations


Joiner transformations can slow performance because they need additional space in memory at run time to hold intermediate results.

Define the rows from the smaller set of data in the Joiner as the Master rows. The Master rows are cached to memory and the detail records are then compared to rows in the cache of the Master rows. In order to minimize memory requirements, the smaller set of data should be cached and thus set as Master.

Use Normal joins whenever possible. Normal joins are faster than outer joins and the resulting set of data is also smaller.

Use the database to do the join when sourcing data from the same database schema. Database systems usually can perform the join more quickly than the Informatica Server, so a SQL override or a join condition should be used when joining multiple tables from the same database schema.
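For instance (hypothetical tables, assumed to live in the same schema), the join can be pushed into the Source Qualifier override so the database performs it instead of a Joiner transformation:

-- The database performs the join; the mapping receives a single joined stream
SELECT O.ORDER_ID,
       O.ORDER_AMOUNT,
       C.CUSTOMER_NAME
FROM   ORDERS O
JOIN   CUSTOMERS C
  ON   C.CUSTOMER_ID = O.CUSTOMER_ID;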

Optimize Sequence Generator Transformations


Sequence Generator transformations need to determine the next available sequence number; thus, increasing the Number of Cached Values property can increase performance. This property determines the number of values the Informatica Server caches at one time. If it is set to cache no values, then the Informatica Server must query the Informatica repository each time to determine what the next number is. Configuring the Number of Cached Values to a value greater than 1,000 should be considered. Note that any cached values not used in the course of a session are lost, since the next time the sequence generator is called, the repository value is set to give the next set of cached values.

+ oid "/ternal Procedure Transformations


For the most part, making calls to external procedures slows down a session. If possible, avoid the use of these transformations, which include Stored Procedures, External Procedures, and Advanced External Procedures.

Field Level Transformation Optimization


As a final step in the tuning process, expressions used in transformations can be tuned. When examining expressions, focus on complex expressions for possible simplification.

To help isolate slow expressions, do the following:

1. Time the session with the original expression.
2. Copy the mapping and replace half the complex expressions with a constant.
3. Run and time the edited session.
4. Make another copy of the mapping and replace the other half of the complex expressions with a constant.
5. Run and time the edited session.

Processing field level transformations takes time. If the transformation expressions are complex, then processing will be slower. It is often possible to get a 10-20% performance improvement by optimizing complex field level transformations. Use the target table mapping reports or the Metadata Reporter to examine the transformations. Likely candidates for optimization are the fields with the most complex expressions. Keep in mind that there may be more than one field causing performance problems.

Factoring out Common Logic


Factoring out common logic can reduce the number of times a mapping performs the same work. If a mapping performs the same logic multiple times, moving the task upstream in the mapping may allow the logic to be done just once. For example, a mapping has five target tables, and each target requires a Social Security Number lookup. Instead of performing the lookup right before each target, move the lookup to a position before the data flow splits.

Minimize Function Calls


Anytime a function is called, it takes resources to process. There are several common examples where function calls can be reduced or eliminated.

Aggregate function calls can sometimes be reduced. In the case of each aggregate function call, the Informatica Server must search and group the data. Thus, the following expression:

SUM(Column A) + SUM(Column B)

can be optimized to:

SUM(Column A + Column B)

In general, operators are faster than functions, so operators should be used whenever possible. For example, if you have an expression which involves a CONCAT function such as:

CONCAT(CONCAT(FIRST_NAME, ' '), LAST_NAME)

it can be optimized to:

FIRST_NAME || ' ' || LAST_NAME

Remember that IIF is a function that returns a value, not just a logical test. This allows many logical statements to be written in a more compact fashion. For example:

IIF(FLG_A='Y' and FLG_B='Y' and FLG_C='Y', VAL_A+VAL_B+VAL_C,
IIF(FLG_A='Y' and FLG_B='Y' and FLG_C='N', VAL_A+VAL_B,
IIF(FLG_A='Y' and FLG_B='N' and FLG_C='Y', VAL_A+VAL_C,
IIF(FLG_A='Y' and FLG_B='N' and FLG_C='N', VAL_A,
IIF(FLG_A='N' and FLG_B='Y' and FLG_C='Y', VAL_B+VAL_C,
IIF(FLG_A='N' and FLG_B='Y' and FLG_C='N', VAL_B,
IIF(FLG_A='N' and FLG_B='N' and FLG_C='Y', VAL_C,
IIF(FLG_A='N' and FLG_B='N' and FLG_C='N', 0.0))))))))

can be optimized to:

IIF(FLG_A='Y', VAL_A, 0.0) + IIF(FLG_B='Y', VAL_B, 0.0) + IIF(FLG_C='Y', VAL_C, 0.0)

The original expression had 8 IIFs, 16 ANDs, and 24 comparisons. The optimized expression results in 3 IIFs, 3 comparisons, and two additions.

Be creative in making expressions more efficient. The following is an example of a rework of an expression which reduces three comparisons to one:

IIF(X=1 OR X=5 OR X=9, 'yes', 'no')

can be optimized to:

IIF(MOD(X, 4) = 1, 'yes', 'no')

Calculate Once, Use Many Times

Avoid calculating or testing the same value multiple times. If the same sub-expression is used several times in a transformation, consider making the sub-expression a local variable. The local variable can be used only within the transformation, but by calculating the variable only once, it can speed performance.

Choose Numeric Versus String Operations

The Informatica Server processes numeric operations faster than string operations. For example, if a lookup is done on a large amount of data on two columns, EMPLOYEE_NAME and EMPLOYEE_ID, configuring the lookup around EMPLOYEE_ID improves performance.

Optimizing Char-Char and Char-Varchar Comparisons

When the Informatica Server performs comparisons between CHAR and VARCHAR columns, it slows each time it finds trailing blank spaces in the row. The Treat CHAR as CHAR On Read option can be set in the Informatica Server setup so that the Informatica Server does not trim trailing spaces from the end of CHAR source fields.

Use DECODE Instead of LOOKUP



When a LOOKUP function is used, the Informatica Server must look up a table in the database. When a DECODE function is used, the lookup values are incorporated into the expression itself, so the Informatica Server does not need to look up a separate table. Thus, when looking up a small set of unchanging values, using DECODE may improve performance.

Reduce the Number of Transformations in a Mapping

Whenever possible, the number of transformations should be reduced, as there is always overhead involved in moving data between transformations. Along the same lines, unnecessary links between transformations should be removed to minimize the amount of data moved. This is especially important with data being pulled from the Source Qualifier Transformation.

Use Pre- and Post-Session SQL Commands

You can specify pre- and post-session SQL commands in the Properties tab of the Source Qualifier transformation and in the Properties tab of the target instance in a mapping. To increase the load speed, use these commands to drop indexes on the target before the session runs, then recreate them when the session completes.
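For example (the index and table names here are hypothetical, with Oracle-style syntax), the pre- and post-session SQL on a target instance might be:

-- Pre-session SQL on the target: drop the index before the load
DROP INDEX IDX_ORDERS_TGT;

-- Post-session SQL on the target: recreate the index once the load completes
CREATE INDEX IDX_ORDERS_TGT ON ORDERS_TGT (ORDER_ID);

Loading into a table without its indexes and rebuilding them afterward is usually faster than maintaining the indexes row by row during the load.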

Apply the following guidelines when using these SQL statements:

- You can use any command that is valid for the database type. However, the PowerCenter Server does not allow nested comments, even though the database might.

- You can use mapping parameters and variables in SQL executed against the source, but not against the target.
- Use a semi-colon (;) to separate multiple statements. The PowerCenter Server ignores semi-colons within /* ... */, within single quotes, or within double quotes. If you need to use a semi-colon outside of quotes or comments, you can escape it with a backslash (\).
- The Workflow Manager does not validate the SQL.

Use Environmental SQL

For relational databases, you can execute SQL commands in the database environment when connecting to the database. You can use this for source, target, lookup, and stored procedure connections. For instance, you can set isolation levels on the source and target systems to avoid deadlocks. Follow the guidelines mentioned above when using these SQL statements.
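As a sketch (the exact statement depends on the database; SQL Server-style syntax is assumed here), the environment SQL for a connection might set the isolation level each time the connection is established:

-- Environment SQL executed when the PowerCenter Server connects
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;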

