
Informatica Performance Tuning
Performance Tuning Methodology
• It is an iterative process – Take measurements → Analyze → Make one adjustment → Take measurements

• Quit after the point of diminishing returns

• Overall plan
Establish benchmark → Optimize memory → Isolate bottleneck → Eliminate bottleneck → Take advantage of underutilized CPU and memory
The Tuning Environment
• Hardware (CPU bandwidth, RAM, disk space,
etc.) should be similar to production
• Database configuration should be similar to
production
• Data volume and characteristics should be similar to
production
• Challenge: production data is constantly
changing
Optimal tuning may be data dependent
Estimate “average” behavior
Estimate “worst case” behavior
Preliminary steps
• Eliminate transformation errors & data rejects
• Override tracing level to terse or normal
• Source row logging requires the reader to hold onto buffers until the data is written to the target – EVEN IF THERE ARE NO ERRORS
Identify the bottleneck
• Target
• Source
• Transformations
• Mapping/Session
Thread Statistics in Session Log
Before Tuning
Thread [READER_1_2_1] created for [the read stage] of partition point [SQ_NON_SEARCH] has completed.
Total Run Time = [1467.493398] secs
Total Idle Time = [1367.666442] secs
Busy Percentage = [6.802549]
Thread [TRANSF_1_2_1] created for [the transformation stage] of partition point [SQ_NON_SEARCH] has completed.
Total Run Time = [1379.375692] secs
Total Idle Time = [464.380539] secs
Busy Percentage = [66.334006]
Thread work time breakdown:
Union: 0.145138 percent
EXP_NON_SEARCH: 99.854862 percent

After Tuning
Thread [READER_1_3_1] created for [the read stage] of partition point [SQ_NON_SEARCH] has completed.
Total Run Time = [461.219082] secs
Total Idle Time = [412.240083] secs
Busy Percentage = [10.619465]
Thread [TRANSF_1_3_1] created for [the transformation stage] of partition point [SQ_NON_SEARCH] has completed.
Total Run Time = [421.306325] secs
Total Idle Time = [212.582643] secs
Busy Percentage = [49.542024]
Thread work time breakdown:
Union: 44.549763 percent
AGG: 16.587678 percent
LKP_NON_SRCH: 38.862559 percent
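As a cross-check, the busy percentage is simply the share of run time a thread was not idle:
Busy% = (Total Run Time − Total Idle Time) / Total Run Time × 100
For the reader thread before tuning: (1467.49 − 1367.67) / 1467.49 × 100 ≈ 6.80, which matches the logged 6.802549. The breakdowns also show the work that was concentrated in EXP_NON_SEARCH before tuning is spread across the Union, Aggregator and Lookup afterwards.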
Collect performance data
Performance Counters in WF Monitor
Target Bottleneck
[Diagram: the four pipeline stages – Reader Thread (first stage), Transformation Thread (second stage), Transformation Thread (third stage), Writer Thread (fourth stage) – with their Busy%. The writer thread shows Busy% = 95 against Busy% = 15 for the preceding transformation stage, indicating a target bottleneck.]
Other Methods of Bottleneck Isolation
• Write to flat file
If significantly faster than the relational target – Target Bottleneck
• Place a FALSE Filter right after the Source Qualifier
If significantly faster – Transformation Bottleneck
• If target & transformation bottlenecks are ruled out – Source Bottleneck
• Use a read test mapping to confirm a Source Bottleneck
Remove the transformations and check whether session performance is the same.
Target Optimization
• Target Optimization often involves non-Informatica
components
• Drop Indexes and Constraints (see the sketch after this list)
Use pre/post SQL to drop and rebuild
Use pre/post-load stored procedures
• Use constraint-based loading only when necessary
• Use Bulk Loading
Informatica bypasses the database log
Target cannot perform rollback
Weigh importance of performance over recovery
• Use External Loader
Similar to bulk loader, but the DB reads from a flat file
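A minimal sketch of the pre-/post-session SQL approach, assuming an Oracle-style relational target; the table and index names (T_SALES, IDX_SALES_CUST) are hypothetical:

-- Pre-session SQL: drop the index so the load is not slowed by index maintenance
DROP INDEX IDX_SALES_CUST;
-- Post-session SQL: rebuild the index once the load completes
CREATE INDEX IDX_SALES_CUST ON T_SALES (CUSTOMER_ID);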
Source Bottlenecks
• Source optimization often involves non-Informatica
components
• Generated SQL is available in the session log (see the sketch after this list)
Execute it directly against the DB
Update statistics on the DB
Use the tuned SELECT as the SQL override
• Set the Line Sequential Buffer Length session
property to correspond with the record size
• Avoid reading same data more than once
• Filter at source if possible (reduce data set)
• Minimize connected outputs from the source qualifier
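A minimal sketch of a tuned SELECT used as the SQ override, with hypothetical table and column names (ORDERS); only the needed columns are read and the filter is applied at the source:

-- Select only the connected ports and push the filter to the database
SELECT ORDER_ID, CUSTOMER_ID, ORDER_AMT
FROM ORDERS
WHERE ORDER_STATUS = 'SHIPPED'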
Tuning Mapping Design
Basic guidelines
• Consistency - Naming conventions, Descriptions,
environments, documentation
• Modularity – Modular design, common error
handling, reprocessing
• Reusability – shortcuts, mapplets
• Scalability – caching, queries, partitioning, reduce
data set (reduce ports and rows).
• Simplicity – multiple simple processes, simple
queries, staging table.
Sources and Targets
• Use shortcuts
• Extract only what is necessary
• Limit reads on source
• Distinguish between similar sources and targets
• When updating, update only non-key columns on the target.
Source Qualifier
• Use the default query when possible.
• SQL override
Pros – utilizes the database optimizer, can accommodate complex queries
Cons – impacts database resources, cannot utilize partitioning, cannot utilize the pushdown optimization option, and the transformation logic is lost from metadata searches
• SQ can be used as a lookup (see the sketch below).
Tip – put a copy of the override query in the description to avoid losing it when pressing 'Generate SQL Query'
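A minimal sketch of an SQ override that does a lookup's work as a source-side join, assuming both tables sit in the same relational source; the names (ORDERS, CUSTOMERS) are hypothetical:

-- The join returns the "lookup" column directly, avoiding a Lookup transformation and its cache
SELECT O.ORDER_ID, O.ORDER_AMT, C.CUSTOMER_NAME
FROM ORDERS O
JOIN CUSTOMERS C ON C.CUSTOMER_ID = O.CUSTOMER_ID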
Transformations
• Calculate once, use many times
• Filter as early as possible
• Reduce data for transformations with caches.
• Avoid data type conversions – they are expensive
• Reduce coding outside Informatica
• Don't enable high precision for decimal data unless it is needed.
Expressions
• Functions are more expensive than operators
Use || instead of CONCAT()
• Use variable ports to factor out common logic
• Simplify nested functions when possible
Try DECODE instead of IIF (see the sketch below)
• Provide comments in the expression editor.
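As an illustration (Informatica expression syntax; the port name and values are hypothetical), a nested IIF and the flatter DECODE that replaces it:

Nested IIF: IIF(STATUS = 'A', 'Active', IIF(STATUS = 'I', 'Inactive', 'Unknown'))
DECODE:     DECODE(STATUS, 'A', 'Active', 'I', 'Inactive', 'Unknown')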
Filters
• Consider SQ as a filter to limit rows within relational
sources
• Filter close to source
• Replace multiple Filters with a Router
Aggregators
• Use sorted input
• Limit connected input/output ports
• Filter data before aggregating
• Use as early as possible
Joiners
• Perform joins in the SQ when possible (relational sources)
• Perform normal joins
• Join sorted input
• Designate the source with fewer rows as the master
Lookups
• Relational lookups should return only the ports that meet the condition
• Use an unconnected lookup when only one port needs to be returned
• Use a SQL override in the lookup (comment out the generated ORDER BY – see the sketch below)
• Replace large lookup tables with joins in the SQ when possible.
• Use the SQ as the lookup table.
• Use a persistent cache to save lookup cache files for re-use.
• Use the cache calculator in the session for large data volumes.
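A minimal sketch of a lookup SQL override, with a hypothetical CUSTOMERS table. The Integration Service appends its own ORDER BY on all lookup ports when it builds the cache; supplying a narrower ORDER BY and ending the override with '--' comments the generated clause out:

SELECT CUSTOMER_ID, CUSTOMER_NAME
FROM CUSTOMERS
ORDER BY CUSTOMER_ID --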
Anatomy of a session
Memory optimization
Reader Bottleneck
Transformer Bottleneck
Writer Bottleneck
Tuning the DTM Buffer
• Buffer block size
Recommendation: at least 100 rows / block
Compute based on largest source or target row size
Typically not a significant bottleneck unless below 10
rows/buffer
• Number of blocks
Minimum of 2 blocks required for each source, target
and XML group
(number of blocks) = 0.9 × (DTM buffer size) / (buffer block size)
Contd.
• Determine the minimum DTM buffer size (see the worked example below)
(DTM buffer size) = (buffer block size) × (minimum number of blocks) / 0.9
• Increase by a multiple of the block size
• If performance does not improve, return to previous
setting
• There is no “formula” for optimal DTM buffer size
• Auto setting may be adequate for some sessions
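A worked example of the sizing formula with hypothetical numbers: a 64 KB buffer block size and a mapping with two sources and one target (2 blocks each, so a minimum of 6 blocks):

(DTM buffer size) = 64 KB × 6 / 0.9 ≈ 427 KB

From that floor, increase the DTM buffer in multiples of the block size and re-measure.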
Transformation Caches
• Temporary storage area for certain transformations
• Except for Sorter, each is divided into a Data & Index
Cache
• The size of each transformation cache is tunable
• If runtime cache requirement > setting, overflow
written to disk
• The default setting for each cache is Auto
Tuning the Transformation Caches
• If a cache setting is too small, DTM writes overflow to
disk
• Determine if transformation caches are overflowing:
Watch the cache directory on the file
system while the session runs
Use the session performance counters
• Options to tune:
• Increase the maximum memory allowed for Auto
transformation cache sizes
• Set the cache sizes for individual transformations
manually
Performance Counters
Tuning the Transformation Caches
• Non-zero counts for the readfromdisk and writetodisk counters indicate sub-optimal settings for the transformation index or data caches
• This may indicate the need to tune transformation
caches manually
• Any manual setting allocates memory outside of
previously set maximum
• Cache Calculators provide guidance in manual tuning
of transformation caches
Aggregator Caches
• Unsorted Input
• Must read all input before releasing any output
rows
• Index cache contains group keys
• Data cache contains non-group-by ports
• Sorted Input
• Releases output rows as each input group is processed
• Does not require a data or index cache (both = 0)
• May run much faster than unsorted BUT you must consider the expense of sorting (see the sketch below)
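One way to provide the sorted input (a sketch; the SALES table and ports are hypothetical) is to push the sort into the SQ override, with the ORDER BY matching the Aggregator's group-by ports in order:

SELECT REGION, PRODUCT, SALE_AMT
FROM SALES
ORDER BY REGION, PRODUCT  -- must match the group-by ports of the Aggregator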
Joiner Caches: Unsorted Input
Lookup Caches
• To cache or not to cache?
• Large number of invocations – cache
• Large lookup table – don't cache
• Flat file lookup is always cached
• Data cache
• Only connected output ports included in data cache
• For unconnected lookup, only “return” port
included in data cache
• Index cache size
• Only lookup keys included in index cache
Lookup Caches
• Lookup Transformation – Fine-tuning the Cache
• SQL override
• Persistent cache (if the lookup data is static)
• Optimize the sort
• Default – lookup keys, then connected output ports, in port order
• Can be commented out or overridden in the SQL override
• Indexing strategy on the table may impact performance
• The 'Use Any Value' property suppresses the sort
• Can build lookup caches concurrently
• May improve session performance when there is significant activity
upstream from the lookup & the lookup cache is large
• This option applies to the individual session
Performance Tuning features
Pushdown optimization
• Push transformation logic to the source or target database.
• Executes SQL against the source or target database instead of processing the transformation logic within the Integration Service
Recommendations:
• Use when there is a large mismatch between the processing power of the Informatica server and the database server.
• Some transformations can never be 'pushed down' because they may have multiple connections (Joiner, Lookup, Union, target).
• Connection properties must be identical (connect string, code page, connection environment SQL, transaction environment SQL).
Performance Tuning features
Pipeline partitioning
• Improves session performance by creating threads to move
data down the pipeline.
• The data is moved in pipeline stages defined by partition
points; stages run in parallel.
• By default, there is a partition point at the SQ, Target,
Aggregator and Rank transformations.
• Cannot add a partition point at certain transformations – Sequence Generator, unconnected Lookup, and the source definition.
• Partition types – pass through, key range, round robin, hash
auto keys, hash user keys, database.
Partition Recommendations
• Make sure you have ample CPU bandwidth and memory.
• Make sure you have gone through other optimization
techniques.
• Add one partition at a time and monitor – if CPU usage is
closer to 100%, don’t add any more.
• Multiply the DTM buffer size by the number of partitions.
• Multiply the transformation cache sizes for Aggregators, Ranks, Joiners & Sorters by the number of partitions (see the example after this list).
• Partition the source data evenly.
• If you have more than one partition, add partition points where data needs to be redistributed – at an Aggregator, Rank, or Sorter where data must be grouped, or where data is distributed unevenly.
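For example (hypothetical numbers): with a 24 MB DTM buffer, a 30 MB Aggregator cache (index + data) and 4 partitions, budget roughly 24 × 4 = 96 MB for the DTM buffer and 30 × 4 = 120 MB for the Aggregator cache.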
64 bit vs. 32 bit OS
• Take advantage of large memory support in 64-bit
• Cache based transformations like Sorter, Lookup, Aggregator,
Joiner, and XML Target can address larger blocks of memory

How to increase Informatica Server performance:


• Check hard disks on related machines (slow disk access can
slow down session performance)
• Improve the network speed.
• Check CPUs on related machines.
• Configure physical memory for the Informatica Server to
minimize disk I/O.
• Optimize database configuration
Contd.
• Staging areas. If you use a staging area, you force the
Informatica Server to perform multiple passes on your data.
Where possible, remove staging areas to improve
performance
• You can run multiple Informatica Servers on separate systems
against the same repository. Distributing the session load to
separate Informatica Server systems increases performance
Maximum Memory Allocation Example
• Parameters
• 64 Bit OS
• Total system memory: 32 GB
• Maximum allowed for transformation caches: 5 GB
or 10%
• DTM Buffer: 24 MB
• One transformation manually configured
Index Cache: 10 MB
Data Cache: 20 MB
• All other transformations set to Auto
Maximum Memory Allocation Example
• Result
• 10% of 32 GB = 3.2 GB < 5 GB, so the maximum allowed for transformation caches = 3.2 GB = 3200 MB
• Manually configured transformation uses 30 MB
• DTM Buffer uses 24 MB
• 3200 + 30 + 24 = 3254 MB
• Note that 3254 MB represents an upper limit; cached transformations may use less than the 3200 MB maximum
Summary
This presentation showed you how to:
• Approach the performance tuning challenge
• Create a performance tuning test environment
• Identify Bottlenecks
• Test for CPU (thread) utilization
• Tune mappings and transformations
• Test and adjust memory and cache usage
