specific environment, software and hardware limitations, and the number of records going through a mapping. This Best Practice offers some guidelines for tuning mappings.
Description
Analyze mappings for tuning only after you have tuned the system, source, and target for peak performance. To optimize mappings, you generally reduce the number of transformations in the mapping and delete unnecessary links between transformations.

For transformations that use a data cache (such as Aggregator, Joiner, Rank, and Lookup transformations), limit connected input/output or output ports. Doing so can reduce the amount of data the transformations store in the data cache.

Too many Lookups and Aggregators encumber performance because each requires an index cache and a data cache. Since both are fighting for memory space, decreasing the number of these transformations in a mapping can help improve speed. Splitting them up into different mappings is another option.

Limit the number of Aggregators in a mapping. A high number of Aggregators can increase I/O activity on the cache directory. Unless the seek/access time is fast on the directory itself, having too many Aggregators can cause a bottleneck. Similarly, too many Lookups in a mapping cause contention of disk and memory, which can lead to thrashing, leaving insufficient memory to run a mapping efficiently.
When to Cache Lookups

When caching is enabled, the PowerCenter Server caches the lookup table and queries the lookup cache during the session. When this option is not enabled, the PowerCenter Server queries the lookup table on a row-by-row basis.

NOTE: All the tuning options mentioned in this Best Practice assume that memory and cache sizing for lookups are sufficient to ensure that caches will not page to disk. Practices regarding memory and cache sizing for Lookup transformations are covered in Best Practice: Tuning Sessions for Better Performance.

In general, if the lookup table needs less than 300MB of memory, lookup caching should be enabled.
Consider the following example:

ITEMS (source): 5,000 records
MANUFACTURER: 200 records
DIM_ITEMS: 100,000 records

                          LKP_Manufacturer   LKP_DIM_ITEMS
Cached Lookup
  Build Cache                        200          100,000
  Read Source Records              5,000            5,000
  Execute Lookup                       0                0
  Total # of Disk Reads            5,200          105,000
Un-cached Lookup
  Build Cache                          0                0
  Read Source Records              5,000            5,000
  Execute Lookup                   5,000            5,000
  Total # of Disk Reads           10,000           10,000
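The totals above follow a simple model: a cached lookup reads each lookup-table row once to build the cache plus each source row, while an un-cached lookup reads each source row and then issues one query per source row. A minimal sketch of the arithmetic (Python here is purely illustrative; PowerCenter does not expose such an API):

```python
def disk_reads(lookup_rows, source_rows, cached):
    """Approximate total disk reads for one lookup, per the model above."""
    if cached:
        # Build the cache (one read per lookup-table row), then read the source;
        # each probe hits the in-memory cache, so no further disk reads.
        return lookup_rows + source_rows
    # No cache: read each source row, then query the lookup table once per row.
    return source_rows + source_rows

print(disk_reads(200, 5000, cached=True))       # LKP_Manufacturer, cached
print(disk_reads(100_000, 5000, cached=True))   # LKP_DIM_ITEMS, cached
print(disk_reads(100_000, 5000, cached=False))  # either lookup, un-cached
```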
Consider the case where MANUFACTURER is the lookup table. If the lookup table is cached, it will take a total of 5,200 disk reads to build the cache and execute the lookup. If the lookup table is not cached, it will take a total of 10,000 disk reads to execute the lookup. In this case, the number of records in the lookup table is small in comparison with the number of times the lookup is executed, so this lookup should be cached. This is the more likely scenario.

Consider the case where DIM_ITEMS is the lookup table. If the lookup table is cached, it will result in 105,000 total disk reads to build and execute the lookup. If the lookup table is not cached, the disk reads would total 10,000. In this case, the number of records in the lookup table is not small in comparison with the number of times the lookup will be executed, so the lookup should not be cached.

Use the following eight-step method to determine whether a lookup should be cached:

1. Code the lookup into the mapping.
2. Select a standard set of data from the source. For example, add a WHERE clause on a relational source to load a sample of 10,000 rows.
3. Run the mapping with caching turned off and save the log.
4. Run the mapping with caching turned on and save the log to a different name than the log created in step 3.
5. Look in the cached lookup log and determine how long it takes to cache the lookup object. Note this time in seconds: LOOKUP TIME IN SECONDS = LS.
6. In the non-cached log, take the time from the last lookup cache to the end of the load in seconds and divide it into the number of rows being processed: NON-CACHED ROWS PER SECOND = NRS.
7. In the cached log, take the time from the last lookup cache to the end of the load in seconds and divide it into the number of rows being processed: CACHED ROWS PER SECOND = CRS.
8. Use the following formula to find the breakeven row point, where X is the breakeven point:

(LS * NRS * CRS) / (CRS - NRS) = X
If the expected number of source records is less than X, it is better not to cache the lookup. If it is more than X, it is better to cache the lookup.

For example, assume the lookup takes 166 seconds to cache (LS = 166), a cached lookup loads 232 rows per second (CRS = 232), and a non-cached lookup loads 147 rows per second (NRS = 147). The formula would result in:

(166 * 147 * 232) / (232 - 147) = 66,603
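The breakeven formula is straightforward to script when comparing session logs. A small helper (hypothetical function name, assuming the LS/NRS/CRS measurements described above):

```python
def breakeven_rows(ls, nrs, crs):
    """Breakeven source-row count X = (LS * NRS * CRS) / (CRS - NRS).

    ls:  seconds spent building the lookup cache
    nrs: non-cached rows loaded per second
    crs: cached rows loaded per second
    """
    return (ls * nrs * crs) / (crs - nrs)

x = breakeven_rows(ls=166, nrs=147, crs=232)
print(round(x))  # cache the lookup only above this many source rows
```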
Thus, if the source has fewer than 66,603 records, the lookup should not be cached. If it has more than 66,603 records, the lookup should be cached.

Sharing Lookup Caches

There are a number of methods for sharing lookup caches:
• Within a specific session run for a mapping, if the same lookup is used multiple times in the mapping, the PowerCenter Server will re-use the cache for the multiple instances of the lookup. If multiple cached lookups are from the same table but are expected to return different columns of data, it may be better to set up the multiple lookups to bring back the same columns, even though not all return ports are used in all lookups. Bringing back a common set of columns may reduce the number of disk reads.
• Across sessions of the same mapping, the use of an unnamed persistent cache allows multiple runs to use an existing cache file stored on the PowerCenter Server. If the option of creating a persistent cache is set in the lookup properties, the memory cache created for the lookup during the initial run is saved to the PowerCenter Server. This can improve performance because the Server builds the memory cache from the cache files instead of from the database. This feature should only be used when the lookup table is not expected to change between session runs.
• Across different mappings and sessions, the use of a named persistent cache allows sharing of an existing cache file.
Reducing the Number of Cached Rows

There is an option to use a SQL override in the creation of a lookup cache. Options can be added to the WHERE clause to reduce the set of records included in the resulting cache.

NOTE: If you use a SQL override in a lookup, the lookup must be cached.

Optimizing the Lookup Condition

In the case where a lookup uses more than one lookup condition, set the conditions with an equal sign first in order to optimize lookup performance.

Indexing the Lookup Table

The PowerCenter Server must query, sort, and compare values in the lookup condition columns. As a result, indexes on the database table should include every column used in a lookup condition. This can improve performance for both cached and un-cached lookups.

• In the case of a cached lookup, an ORDER BY condition is issued in the SQL statement used to create the cache. Columns used in the ORDER BY condition should be indexed. The session log will contain the ORDER BY statement.
• In the case of an un-cached lookup, since a SQL statement is created for each row passing into the lookup transformation, performance can be helped by indexing columns in the lookup condition.
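The effect of indexing the ORDER BY columns can be observed in any relational database's query plan. A hedged illustration using SQLite as a stand-in for the actual lookup database (table and index names here are made up):

```python
import sqlite3

# Hypothetical demonstration: with an index on the lookup-condition column,
# the ORDER BY issued by the cache-build query can use an index scan
# instead of a separate sort step.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dim_items (item_id INTEGER, item_name TEXT)")
conn.execute("CREATE INDEX idx_dim_items_id ON dim_items (item_id)")

# The kind of statement issued to build a lookup cache, ordered by the
# lookup condition column.
plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT item_id, item_name FROM dim_items ORDER BY item_id"
).fetchall()
print(plan)  # the plan should reference the index rather than a sort
```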
Use simple columns in the group by condition to make the Aggregator Transformation more efficient. When possible, use numbers instead of strings or dates in the GROUP BY columns. Also avoid complex expressions in the Aggregator expressions, especially in GROUP BY ports.

Use the Sorted Input option in the Aggregator. This option requires that data sent to the Aggregator be sorted in the order in which the ports are used in the Aggregator's group by. The Sorted Input option decreases the use of aggregate caches. When it is used, the PowerCenter Server assumes all data is sorted by group and, as a group is passed through an Aggregator, calculations can be performed and information passed on to the next transformation. Without sorted input, the Server must wait for all rows of data before processing aggregate calculations. Use of the Sorted Input option is usually accompanied by a Source Qualifier which uses the Number of Sorted Ports option.

Use an Expression and Update Strategy instead of an Aggregator Transformation. This technique can only be used if the source data can be sorted. Further, using this option assumes that a mapping is using an Aggregator with the Sorted Input option. In the Expression Transformation, the use of variable ports is required to hold data from the previous row of data processed. The premise is to use the previous row of data to determine whether the current row is part of the current group or is the beginning of a new group. If the current row is part of the current group, its data would be used to continue calculating the current group function. An Update Strategy Transformation would follow the Expression Transformation and set the first row of a new group to insert and the following rows to update.
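The previous-row comparison logic can be sketched outside PowerCenter. The following Python sketch (hypothetical names; the real implementation uses variable ports in an Expression Transformation followed by an Update Strategy Transformation) walks sorted rows, flags the first row of each group for insert, and flags continuation rows, carrying the running aggregate, for update:

```python
def expression_update_strategy(rows):
    """rows: (group_key, amount) pairs, already sorted by group_key
    (the Sorted Input assumption above).

    Yields (strategy, group_key, running_total): 'insert' for the first
    row of a group, 'update' for subsequent rows of the same group."""
    prev_key = None          # plays the role of the variable port
    running = 0
    for key, amount in rows:
        if key != prev_key:  # beginning of a new group
            running = amount
            yield ("insert", key, running)
        else:                # continuation of the current group
            running += amount
            yield ("update", key, running)
        prev_key = key

rows = [("A", 10), ("A", 5), ("B", 7)]
print(list(expression_update_strategy(rows)))
# [('insert', 'A', 10), ('update', 'A', 15), ('insert', 'B', 7)]
```

After all inserts and updates are applied, the target holds one row per group with the final aggregate, without an Aggregator cache.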
1. Run and time the session with the original expressions.
2. Copy the mapping and replace half of the complex expressions with a constant.
3. Run and time the edited session.
4. Make another copy of the mapping and replace the other half of the complex expressions with a constant.
5. Run and time the edited session.

Processing field-level transformations takes time. If the transformation expressions are complex, processing will be slower. It is often possible to get a 10-20% performance improvement by optimizing complex field-level transformations. Use the target table mapping
CONCAT(CONCAT(FIRST_NAME, ' '), LAST_NAME)

can be optimized to:

FIRST_NAME || ' ' || LAST_NAME

Remember that IIF is a function that returns a value, not just a logical test. This allows many logical statements to be written in a more compact fashion. For example:

IIF(FLG_A = 'Y' and FLG_B = 'Y' and FLG_C = 'Y', VAL_A + VAL_B + VAL_C,
IIF(FLG_A = 'Y' and FLG_B = 'Y' and FLG_C = 'N', VAL_A + VAL_B,
IIF(FLG_A = 'Y' and FLG_B = 'N' and FLG_C = 'Y', VAL_A + VAL_C,
IIF(FLG_A = 'Y' and FLG_B = 'N' and FLG_C = 'N', VAL_A, ...
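To see why a compact form is attractive here, the nested branches can be modeled in Python (an illustration only, not Informatica expression syntax): each branch simply sums the VAL_* columns whose matching FLG_* column is 'Y', so a single pass over (flag, value) pairs replaces the branch explosion:

```python
def nested_iif(flg_a, flg_b, flg_c, val_a, val_b, val_c):
    # Mirrors the branch structure of the nested IIF expression above.
    if flg_a == 'Y' and flg_b == 'Y' and flg_c == 'Y':
        return val_a + val_b + val_c
    if flg_a == 'Y' and flg_b == 'Y' and flg_c == 'N':
        return val_a + val_b
    if flg_a == 'Y' and flg_b == 'N' and flg_c == 'Y':
        return val_a + val_c
    if flg_a == 'Y' and flg_b == 'N' and flg_c == 'N':
        return val_a
    # remaining branches elided, as in the truncated original example
    return None

def compact_sum(flg_a, flg_b, flg_c, val_a, val_b, val_c):
    # One pass over (flag, value) pairs replaces the explicit branches.
    pairs = [(flg_a, val_a), (flg_b, val_b), (flg_c, val_c)]
    return sum(v for f, v in pairs if f == 'Y')

print(nested_iif('Y', 'N', 'Y', 1, 2, 3),
      compact_sum('Y', 'N', 'Y', 1, 2, 3))  # both branches agree: 4 4
```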
Use Pre- and Post-Session SQL Commands

You can specify pre- and post-session SQL commands in the Properties tab of the Source Qualifier transformation and in the Properties tab of the target instance in a mapping. To increase the load speed, use these commands to drop indexes on the target before the session runs, then recreate them when the session completes. Keep the following guidelines in mind when using the SQL statements:
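The drop-then-recreate pattern looks like this in any relational database. A sketch using SQLite as a stand-in target (in PowerCenter, the two index statements would go into the pre- and post-session SQL properties, written in the target database's dialect; all names here are hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target (id INTEGER, name TEXT)")
conn.execute("CREATE INDEX idx_target_name ON target (name)")

# Pre-session SQL: drop the index so the bulk load does not
# maintain it row by row.
conn.execute("DROP INDEX idx_target_name")

# The session load itself (simulated).
conn.executemany("INSERT INTO target VALUES (?, ?)",
                 [(i, f"name_{i}") for i in range(1000)])

# Post-session SQL: rebuild the index once, after the load completes.
conn.execute("CREATE INDEX idx_target_name ON target (name)")
conn.commit()
```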
• You can use any command that is valid for the database type. However, the PowerCenter Server does not allow nested comments, even though the database might.
• You can use mapping parameters and variables in SQL executed against the source, but not against the target.
• Use a semi-colon (;) to separate multiple statements.
• The PowerCenter Server ignores semi-colons within single quotes, double quotes, or within /* ... */ comments. If you need to use a semi-colon outside of quotes or comments, you can escape it with a back slash (\).
• The Workflow Manager does not validate the SQL.
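As an illustration of the splitting rules above (an assumption-laden sketch, not PowerCenter's actual parser): semi-colons inside quotes or /* ... */ comments do not split, and a back-slash-escaped semi-colon is kept as a literal:

```python
def split_sql(text):
    """Split a pre/post-session SQL string into statements, honoring the
    quoting, comment, and \\; escape rules described above."""
    statements, buf = [], []
    in_squote = in_dquote = in_comment = False
    i = 0
    while i < len(text):
        ch = text[i]
        if in_comment:
            buf.append(ch)
            if text[i:i + 2] == "*/":      # end of /* ... */ comment
                buf.append("/")
                i += 1
                in_comment = False
        elif ch == "'" and not in_dquote:
            in_squote = not in_squote
            buf.append(ch)
        elif ch == '"' and not in_squote:
            in_dquote = not in_dquote
            buf.append(ch)
        elif text[i:i + 2] == "/*" and not (in_squote or in_dquote):
            in_comment = True
            buf.append("/*")
            i += 1
        elif text[i:i + 2] == "\\;" and not (in_squote or in_dquote):
            buf.append(";")                # escaped semi-colon stays literal
            i += 1
        elif ch == ";" and not (in_squote or in_dquote):
            statements.append("".join(buf).strip())
            buf = []
        else:
            buf.append(ch)
        i += 1
    if "".join(buf).strip():
        statements.append("".join(buf).strip())
    return statements

print(split_sql("DELETE FROM t WHERE c = 'a;b'; UPDATE t SET c = 1"))
```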
Use Environmental SQL

For relational databases, you can execute SQL commands in the database environment when connecting to the database. For instance, you can set isolation levels on the source and target systems to avoid deadlocks. Follow the guidelines mentioned above when using the SQL statements. You can use this for source, target, lookup, and stored procedure connections.