
1

Performance Tuning
Version 8.6

Bert Peters
Global Education Services, Principal Instructor

2
Objectives

After completing this course you will be able to:


• Control how PowerCenter uses memory
• Control how PowerCenter uses CPUs
• Understand the performance counters
• Isolate source, target and engine bottlenecks
• Tune different types of bottlenecks
• Configure Workflow and Session on Grid

3
Agenda

• Memory optimization
• Performance tuning methodology
• Tuning source, target, & mapping bottlenecks
• Pipeline partitioning
• Server Grid
• Q&A
• Course evaluation

4
Anatomy of a Session

[Diagram: the Integration Service runs the Data Transformation Manager (DTM). The READER thread moves source data into the DTM buffer, the TRANSFORMER thread processes it using the transformation caches, and the WRITER thread writes the data to the target.]

5
Memory Optimization

[Diagram: the two tunable memory areas: the DTM buffer, shared by the READER, TRANSFORMER, and WRITER threads, and the transformation caches.]

6
DTM Buffer

• Temporary storage area for data


• Buffer is divided into blocks
• Buffer size and block size are tunable
• Default setting for each is Auto

7
DTM Buffer Size – Session Property

• Default is Auto meaning DTM estimates optimal size


• Check session log for actual size allocation

8
DTM Buffer Block Size

• Default is Auto
• Check session log for actual size allocation

9
Reader Bottleneck

Transformer & writer threads wait for data

[Diagram: a slow reader leaves the DTM buffer empty; the TRANSFORMER and WRITER threads sit idle, waiting for data.]

10
Transformer Bottleneck

Reader waits for free blocks; writer waits for data

[Diagram: a slow transformer; the READER waits for free buffer blocks while the WRITER waits for data.]

11
Writer Bottleneck

Reader & transformer wait for free blocks

[Diagram: a slow writer; the READER and TRANSFORMER threads wait for free buffer blocks.]

12
Source Row Logging

[Diagram: with source row logging, the reader cannot release its buffer blocks.]
Source rows must remain in the buffers until the transformation/writer threads process the corresponding rows downstream.

13
Large Commit Interval

[Diagram: with a large commit interval, buffer blocks accumulate ahead of the writer.]
Target rows remain in the buffers until the DTM reaches the commit point.

14
Tuning the DTM Buffer

Extra buffers can keep threads busy


[Diagram: a DTM buffer with spare blocks between the READER, TRANSFORMER, and WRITER threads.]

15
Tuning the DTM Buffer

• Temporary slowdowns in reading, transforming, or writing may cause large fluctuations in throughput
• A “slow” thread typically provides data in spurts
• Extra memory blocks can act as a “cushion”, keeping other threads busy in case of a bottleneck

16
Tuning the DTM Buffer

• Buffer block size
• Recommendation: at least 100 rows per block
• Compute based on the largest source or target row size
• Typically not a significant bottleneck unless below 10 rows per block

• Number of blocks
• A minimum of 2 blocks is required for each source, target, and XML group
• (number of blocks) = 0.9 x (DTM buffer size) / (buffer block size)
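
As a quick illustration of the arithmetic above, a minimal Python sketch (the buffer, block, and row sizes here are hypothetical, not recommended values):

    dtm_buffer_size = 12_000_000     # bytes (12 MB), hypothetical
    buffer_block_size = 64_000       # bytes (64 KB), hypothetical

    # Roughly 90% of the DTM buffer is usable as data blocks
    num_blocks = int(0.9 * dtm_buffer_size / buffer_block_size)
    print(num_blocks)                # 168

    # Check the 100-rows-per-block recommendation against the widest row
    largest_row_size = 500           # bytes, hypothetical
    rows_per_block = buffer_block_size // largest_row_size
    print(rows_per_block)            # 128, so the guideline is met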

17
Tuning the DTM Buffer

• Determine the minimum DTM buffer size:
(DTM buffer size) = (buffer block size) x (minimum number of blocks) / 0.9
• Increase by a multiple of the block size
• If performance does not improve, return to the previous setting
• There is no “formula” for the optimal DTM buffer size
• The Auto setting may be adequate for some sessions
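
Continuing the sketch, the minimum DTM buffer size follows from the block size and the 2-blocks-per-source/target minimum (the mapping counts are hypothetical):

    buffer_block_size = 64_000                 # bytes, hypothetical
    sources, targets, xml_groups = 1, 2, 0     # hypothetical mapping
    min_blocks = 2 * (sources + targets + xml_groups)
    min_dtm_buffer = buffer_block_size * min_blocks / 0.9
    print(round(min_dtm_buffer))               # 426667 bytes, ~0.4 MB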

18
Transformation Caches

• Temporary storage area for certain transformations
• Except for the Sorter, each is divided into a Data & Index Cache
• The size of each transformation cache is tunable
• If the runtime cache requirement > setting, overflow is written to disk
• The default setting for each cache is Auto

19
Tuning the Transformation Caches

Default is Auto

20
Max Memory for Transformation Caches

Only applies to transformation caches set to Auto

21
Max Memory for Transformation Caches

• Two settings: fixed number & percentage
• The system uses the smaller of the two
• If either setting is 0, the DTM assigns a default size to each transformation cache that’s set to Auto

• Recommendation: use the fixed limit if this is the only session running; otherwise, use the percentage
• Use the percentage if running in a grid or HA environment

22
Tuning the Transformation Caches

• If a cache setting is too small, the DTM writes overflow to disk
• Determine if transformation caches are overflowing:
• Watch the cache directory on the file system while the session runs
• Use the session performance counters

• Options to tune:
• Increase the maximum memory allowed for Auto transformation cache sizes
• Set the cache sizes for individual transformations manually

23
Session Performance Counters

24
Performance Counters

25
Tuning the Transformation Caches

• Non-zero counts for readfromdisk and writetodisk indicate sub-optimal settings for transformation index or data caches
• This may indicate the need to tune transformation caches manually
• Any manual setting allocates memory outside of the previously set maximum
• Cache Calculators provide guidance in manual tuning of transformation caches

26
Aggregator Caches

• Unsorted Input
• Must read all input before releasing any output rows
• Index cache contains group keys
• Data cache contains non-group-by ports

• Sorted Input
• Releases output row as each input group is processed
• Does not require data or index cache
(both =0)
• May run much faster than unsorted BUT
must consider the expense of sorting

27
Aggregator Caches – Manual Tuning

28
Joiner Caches: Unsorted Input

[Diagram: MASTER and DETAIL pipelines feeding a Joiner.]
• Staging algorithm: all master data is loaded into the cache
• Specify the smaller data set as master
• Index cache contains join keys
• Data cache contains non-key connected outputs

29
Joiner Caches: Sorted Input

[Diagram: MASTER and DETAIL pipelines feeding a sorted-input Joiner.]
• Streaming algorithm: both inputs must be sorted on the join keys
• Selected master data is loaded into the cache
• Specify the data set with the fewest records under a single key as master
• Index cache contains up to 100 keys
• Data cache contains the non-key connected outputs associated with the 100 keys

30
Joiner Caches – Manual Tuning

Cache calculator detects the sorted input property

31
Lookup Caches

• To cache or not to cache?
• Large number of invocations – cache
• Large lookup table – don’t cache
• Flat file lookup is always cached

32
Lookup Caches

• Data cache
• Only connected output ports included in data cache
• For unconnected lookup, only “return” port included in
data cache

• Index cache size


• Only lookup keys included in index cache

33
Lookup Caches

• Lookup Transformation – fine-tuning the cache:
• SQL override
• Persistent cache (if the lookup data is static)
• Optimize the sort
• Default: lookup keys, then connected output ports in port order
• Can be commented out or overridden in the SQL override
• The indexing strategy on the table may impact performance
• The “Use Any Value” property suppresses the sort

34
Lookup Caches

• Can build lookup caches concurrently
• May improve session performance when there is significant activity upstream from the lookup & the lookup cache is large
• This option applies to the individual session

• The Integration Service builds lookup caches at the beginning of the session run, even if no row has entered a Lookup transformation

Session properties > Config Object tab > Advanced settings

35
Lookup Caches – Manual Tuning

36
Rank Caches

• Index cache contains group keys


• Data cache contains non-group-by ports
• Cache sizes related to the number of groups &
the number of ranks

37
Rank Caches – Manual Tuning

38
Sorter Cache

• Sorter Transformation
• May be faster than a DB sort or a 3rd-party sorter
• An index read from the RDB yields pre-sorted data
• SQL SELECT DISTINCT may reduce the volume of data across the network versus a Sorter with the “Distinct” property set

• Single cache (no separation of index & data)

39
Sorter Cache – Manual Tuning

40
64 bit vs. 32 bit OS

• Take advantage of large memory support in 64-bit
• Cache-based transformations like Sorter, Lookup, Aggregator, Joiner, and XML Target can address larger blocks of memory

41
Maximum Memory Allocation Example

• Parameters
• 64 Bit OS
• Total system memory: 32 GB
• Maximum allowed for transformation caches: 5 GB or 10%
• DTM Buffer: 24 MB
• One transformation manually configured
Index Cache: 10 MB
Data Cache: 20 MB
• All other transformations set to Auto

42
Maximum Memory Allocation Example

• Result
• 10% = 3.2 GB < 5 GB: max allowed for transformation caches = 3.2 GB = 3200 MB
• Manually configured transformation uses 30 MB
• DTM Buffer uses 24 MB
• 3200 + 30 + 24 = 3254 MB
• Note that 3254 MB represents an upper limit; cached transformations may use less than the 3200 MB max
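
A quick Python check of this arithmetic (values taken from the two slides; note the slide treats 1 GB as 1000 MB):

    total_memory_gb = 32
    fixed_limit_gb = 5
    pct_limit_gb = 0.10 * total_memory_gb                         # 3.2 GB
    auto_cache_max_mb = min(fixed_limit_gb, pct_limit_gb) * 1000  # smaller of the two -> 3200 MB
    manual_caches_mb = 10 + 20            # manually set index + data caches
    dtm_buffer_mb = 24
    print(auto_cache_max_mb + manual_caches_mb + dtm_buffer_mb)   # 3254.0 MB upper limit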

43
Performance Tuning Methodology

• It is an iterative process
• Establish benchmark
• Optimize memory
• Isolate bottleneck
• Tune bottleneck
• Take advantage of under-utilized CPU & memory

44
The Production Environment

• Multi-vendor, multi-system environment with many components:
• Operating systems, databases, networks and I/O
• Usually need to monitor performance in several places
• Usually need to monitor outside Informatica as well

[Diagram: PowerCenter at the center of a web of disks, LAN/WAN, DBMS, and OS components.]

45
The Production Environment

• Tuning involves an iterative approach:
1. Identify the biggest performance problem
2. Eliminate or reduce it
3. Return to step 1

[Diagram: the same multi-component environment as on the previous slide.]

46
Preliminary Steps

• Eliminate transformation errors & data rejects: “first make it work, then make it faster”
• Source row logging requires the reader to hold onto buffers until data is written to the target, EVEN IF THERE ARE NO ERRORS; this can significantly increase the DTM buffer requirement
• You may want to set stop on errors to 1

47
Preliminary Steps

• Override the tracing level to terse or normal
• Override at the session level to avoid having to examine each transformation in the mapping
• Only use verbose tracing during development & only with very small data sets
• If you expect row errors that you will not need to correct, avoid logging them by overriding the tracing level to terse (not recommended as a long-term error handling solution)

48
Benchmarking

• Hardware (CPU bandwidth, RAM, disk space, etc.) should be similar to production
• Database configuration should be similar to production
• Data volume should be similar to production
• Challenge: production data is constantly changing
• Optimal tuning may be data dependent
• Estimate “average” behavior
• Estimate “worst case” behavior

49
Benchmarking – Conditional Branching

Scenario: a high percentage of test data goes to TARGET1, but a high percentage of production data goes to TARGET2.
Tuning of the sorter & aggregator could be overlooked in test.

50
Benchmarking – Conditional Branching

Scenario: a high percentage of production data goes to TARGET1 on Monday’s load, but a high percentage of production data goes to TARGET2 on Tuesday’s load.
Performance of the 2 loads may differ significantly.

51
Benchmarking – Conditional Branching

• Conditional branching poses a challenge in performance tuning
• Volume & CHARACTERISTICS of data should be consistent between test & production
• May need to estimate average behavior
• May want to tune for the worst-case scenario

52
Identifying Bottlenecks

• The first challenge is to identify the bottleneck:
• Target
• Source
• Transformations
• Mapping/Session

• Tuning the most severe bottleneck may reveal another one
• This is an iterative process

53
Thread Statistics

• The DTM spawns multiple threads


• Each thread has busy time & idle time
• Goal – maximize the busy time & minimize the
idle time

54
Thread Statistics - Terminology

• A pipeline consists of:


• A source qualifier
• The sources that feed that source qualifier
• All transformations and targets that receive data from that
source qualifier

55
Thread Statistics - Terminology

[Diagram: PIPELINE 1 feeds the MASTER input of a Joiner; PIPELINE 2 feeds the DETAIL input.]
A pipeline on the master input of a joiner terminates at the joiner.

56
Thread Statistics - Terminology

• Stage
a portion of a pipeline; implemented at runtime
as a thread
• Partition Point
boundary between 2 stages; always associated
with a transformation

57
Using Thread Statistics

• By default, PowerCenter assigns a partition point at each Source Qualifier, Target, Aggregator, and Rank.

[Diagram: partition points divide the pipeline into four stages: Reader Thread (First Stage), Transformation Thread (Second Stage), Transform Thread (Third Stage), Writer Thread (Fourth Stage).]

58
Target Bottleneck

• The Aggregator transformation stage is waiting for target buffers

[Diagram: Reader Thread (First Stage), Transformation Thread (Second Stage), Transform Thread (Third Stage) Busy%=15, Writer Thread (Fourth Stage) Busy%=95.]

59
Transformation Bottleneck

• Both the reader & writer are waiting for buffers

[Diagram: Reader Thread (First Stage) Busy%=15, Transformation Thread (Second Stage) Busy%=60, Transform Thread (Third Stage) Busy%=95, Writer Thread (Fourth Stage) Busy%=10.]

60
Thread Statistics in Session Log

***** RUN INFO FOR TGT LOAD ORDER GROUP [1], CONCURRENT SET [1] *****
Thread [READER_1_1_1] created for [the read stage] of partition point
[SQ_SortMergeDataSize_Detail] has completed.
Total Run Time = [318.271977] secs
Total Idle Time = [176.488675] secs
Busy Percentage = [44.547843]
Thread [TRANSF_1_1_1] created for [the transformation stage] of partition point
[SQ_SortMergeDataSize_Detail] has completed.
Total Run Time = [707.803168] secs
Total Idle Time = [105.303059] secs
Busy Percentage = [85.122550]
Thread work time breakdown:
JNRTRANS: 10.869565 percent
SRTTRANS: 89.130435 percent
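
The busy percentage in the log can be re-derived from the run and idle times; a quick Python check using the reader thread’s numbers above:

    total_run_time = 318.271977    # secs, reader thread, from the log above
    total_idle_time = 176.488675   # secs
    busy_pct = (total_run_time - total_idle_time) / total_run_time * 100
    print(f"{busy_pct:.6f}")       # 44.547843, matching the log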

61
Performance Counters in WF Monitor

62
Integration Service Monitor in WF Monitor

63
Session Statistics in WF Monitor

64
Other Methods of Bottleneck Isolation

• Write to a flat file: if significantly faster than the relational target – Target Bottleneck
• Place a FALSE Filter right after the Source Qualifier: if significantly faster – Transformation Bottleneck
• If target & transformation bottlenecks are ruled out – Source Bottleneck

65
Target Optimization

• Target optimization often involves non-Informatica components
• Drop indexes and constraints
• Use pre/post SQL to drop and rebuild
• Use pre/post-load stored procedures

• Use constraint-based loading only when necessary

66
Target Optimization

• Use Bulk Loading
• Informatica bypasses the database log
• Target cannot perform rollback
• Weigh the importance of performance over recovery

• Use an External Loader
• Similar to the bulk loader, but the DB reads from a flat file

67
Target Optimization

Transaction Control
• Target commit type
• Best performance, least precise control
• System avoids writing partially-filled buffers
• Source commit type
• Last active source to feed a target becomes a transaction generator
• Commit interval provides precise control
• Slower than target commit type
• Avoid setting commit interval too low
• User Defined commit type
• Required when mapping contains transaction control transformation
• Provides precise data-driven control
• Slower than target and source commit types

68
Target Optimization

• “Update else insert” session property
• Works well if you rarely insert
• An index is required for the update key but slows down inserts
• PowerCenter must wait for the database to return an error before inserting

• Alternative – a lookup followed by an update strategy

69
Source Bottlenecks

• Source optimization often involves non-Informatica components
• The generated SQL is available in the session log
• Execute it directly against the DB
• Update statistics on the DB
• Use the tuned SELECT as a SQL override

• Set the Line Sequential Buffer Length session property to correspond with the record size

70
Source Bottlenecks

• Avoid transferring more than once from a remote machine
• Avoid reading the same data more than once
• Filter at the source if possible (reduce the data set)
• Minimize connected outputs from the source qualifier
• Only connect what you need
• The DTM only includes connected outputs when it generates the SQL SELECT statement (e.g. if only CUST_ID and CUST_NAME are connected, only those two columns appear in the SELECT)

71
Reduce Data Set

• Remove unnecessary ports
• Not all ports are needed
• Fewer ports = better performance & lower memory requirements

• Reduce rows in the pipeline
• Place a Filter transformation as far upstream as possible
• Filter before an aggregator, rank, or sorter if possible
• Filter in the source qualifier if possible

72
Avoid Unnecessary Sorting

[Diagram: a mapping in which an XML parser (XML_PARSER_PME_EQT_ENT_v1_2) feeds a long chain of Sorter (srt_*) and Joiner (jnr_*) transformations, re-sorting the data before every join.]

73
Expressions Language Tips

• Functions are more expensive than operators
• Use || instead of CONCAT(); for example, FIRST_NAME || ' ' || LAST_NAME instead of CONCAT(CONCAT(FIRST_NAME, ' '), LAST_NAME)

• Use variable ports to factor out common logic

74
Expressions Language Tips

• Simplify nested functions when possible

instead of:
IIF(condition1, result1, IIF(condition2, result2, IIF(…)))

try:
DECODE(TRUE,
       condition1, result1,
       …
       conditionN, resultN)

75
General Guidelines

• Data type conversions are expensive; avoid them if possible
• All-input transformations (such as Aggregator, Rank, etc.) are more expensive than pass-through transformations
• An all-input transformation must process multiple input rows before it can produce any output

76
General Guidelines

• High precision (session property) is expensive but only applies to the “decimal” data type
• UNICODE requires 2 bytes per character; ASCII requires 1 byte per character
• The performance difference depends on the number of string ports only

77
Transformation Specific

Reusable Sequence Generator – Number of Cached Values property
• Purpose: enables different sessions to share the same sequence without generating the same numbers
• >0: allocates the specified number of values & updates the current value in the repository at the end of each block (each session gets a different block of numbers)

78
Transformation Specific

Reusable Sequence Generator – Number of Cached Values property
• Setting it too low causes frequent repository access, which impacts performance
• Unused values in a block are lost; this leads to gaps in the sequence
• Consider alternatives
Example: two non-reusable sequence generators, one generating even numbers & the other generating odd numbers
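
A toy Python sketch of the block-allocation behavior described above (the class below merely stands in for the repository’s stored current value; it is not an Informatica API):

    class Repository:
        """Stands in for the repository's stored current value."""
        def __init__(self):
            self.current_value = 1

        def allocate_block(self, size):
            start = self.current_value
            self.current_value += size      # one repository update per block
            return list(range(start, start + size))

    repo = Repository()
    session_a = repo.allocate_block(5)      # [1, 2, 3, 4, 5]
    session_b = repo.allocate_block(5)      # [6, 7, 8, 9, 10] - never overlaps

    # If session A only uses 3 of its 5 values, 4 and 5 are never handed
    # out again: the sequence now has a gap.
    print(session_a[:3], session_b)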

79
Other Transformations

• Normalizer
• This transformation INCREASES the number of rows
• Place it as far downstream as possible

• XML Reader / Midstream XML Parser
• Remove groups that are not projected
• No memory is allocated for these groups, but PK/FK relationships still need to be maintained
• Don’t leave port lengths as infinite; use an appropriate length

80
Iterative Process

• After tuning your bottlenecks, revisit memory optimization
• Tuning often REDUCES memory requirements (you might even be able to change some settings back to Auto)

• Change one thing at a time & record your results

81
Partitioning

• Apply after optimizing source, target, & transformation bottlenecks
• Apply after optimizing memory usage
• Exploit under-utilized CPU & memory
• To customize partitioning settings, you need the partitioning license

82
Partitioning Terminology

• Partition
subset of the data
• Stage
a portion of a pipeline
• Partition Point
boundary between 2 stages
• Partition Type
algorithm for distributing data among partitions;
always associated with a partition point

83
Threads, Partition Points and Stages

• The DTM implements each stage as a thread; hence, stages run in parallel
• You may add or remove partition points

[Diagram: Reader Thread (First Stage), Transformation Thread (Second Stage), Transform Thread (Third Stage), Writer Thread (Fourth Stage).]

84
Rules for Adding Partition Points

• You cannot add a partition point to a Sequence Generator
• You cannot add a partition point to an unconnected transformation
• You cannot add a partition point on a source definition
• If a pipeline is split and then concatenated, you cannot add a partition point on any transformation between the split and the concatenation
• Adding or removing partition points requires the partitioning license

85
Guidelines for Adding Partition Points

• Make sure you have ample CPU bandwidth
• Make sure you have gone through other optimization techniques
• Add on complex transformations that could benefit from additional threads
• If you have >1 partitions, add where data needs to be re-distributed:
• Aggregator, Rank, or Sorter, where data must be grouped
• Where data is distributed unevenly
• On partitioned sources and targets

86
Partition Points & Partitions

• Partitions subdivide the data
• Each partition represents a thread within a stage
• Each partition point distributes the data among the partitions

[Diagram: three partitions, each with its own thread per stage: 3 reader threads (first stage), 3 transformation threads (second stage), 3 more transformation threads (third stage), and 3 writer threads (fourth stage).]

87
Session Partitioning GUI

• The number next to each flag shows the number of partitions


• The color of each flag indicates the partition type

88
Rules for Adding Partitions

• The master input of a joiner can only have 1 partition unless you add a partition point at the joiner
• A pipeline with an XML target can only have 1 partition
• If the pipeline has a relational source or target and you define n partitions, each database must support n parallel connections
• A pipeline containing a custom or external procedure transformation can only have 1 partition unless those transformations are configured to allow multiple partitions

89
Rules for Adding Partitions

• The number of partitions is constant on a given pipeline
• If you have a partition point on a Joiner, the number of partitions on both inputs will be the same

• At each partition point, you can specify how you want the data distributed among the partitions (this is known as the partition type)

90
Guidelines for Adding Partitions

• Make sure you have ample CPU bandwidth & memory
• Make sure you have gone through other optimization techniques
• Add 1 partition at a time & monitor the CPU
• When CPU usage approaches 100%, don’t add any more partitions

• Take advantage of database partitioning

91
Partition Types

• Each partition point is associated with a partition type
• The partition type defines how the DTM is to distribute the data among the partitions
• If the pipeline has only 1 partition, the partition point serves only to add a stage to the pipeline
• There are restrictions, enforced by the GUI, on which partition types are valid at which partition points

92
Partition Types – Pass Through

• Data is processed without redistributing the rows among partitions
• Serves only to add a stage to the pipeline
• Use when you want an additional thread for a complex transformation but you don’t need to redistribute the data (or you only have 1 partition)

93
Partition Types – Key Range

• The DTM passes data to each partition depending on user-specified ranges
• You may use several ports to form a compound partition key
• The DTM discards rows not falling in any specified range
• If 2 or more ranges overlap, a row can go down more than 1 partition, resulting in duplicate data
• Use key range partitioning when the sources or targets in the pipeline are partitioned by key range
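
A small Python sketch of the routing semantics described above (the ranges are hypothetical; note how a row in the overlap lands in two partitions and an out-of-range row is dropped):

    # Hypothetical half-open key ranges per partition on a numeric key
    ranges = [(0, 1000), (900, 2000)]   # partitions 0 and 1 overlap in [900, 1000)

    def route(key):
        """Return every partition whose range contains the key."""
        return [p for p, (lo, hi) in enumerate(ranges) if lo <= key < hi]

    print(route(500))    # [0]
    print(route(950))    # [0, 1] -> the row is duplicated
    print(route(5000))   # []     -> the row is discarded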

94
Partition Types – Round Robin

• The Integration Service distributes rows of data evenly to all partitions
• Use when there is no need to group data among partitions
• Use when reading flat file sources of different sizes
• Use when data has been partitioned unevenly upstream and requires significantly more processing before arriving at the target

95
Partition Types – Hash Auto Keys

• The DTM applies a hash function to a partition key to group data among partitions
• Use hash partitioning to ensure that groups of rows are processed in the same partition
• The DTM automatically determines the partition key based on:
• aggregator or rank group keys
• join keys
• sort keys
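
The grouping guarantee follows from the hash being deterministic: equal keys always map to the same partition. A Python illustration of the standard idea (the hash function the DTM actually uses is not documented here; crc32-modulo is just a stand-in):

    import zlib

    def hash_partition(key: str, n_partitions: int) -> int:
        # Deterministic hash: the same key always maps to the same partition
        return zlib.crc32(key.encode()) % n_partitions

    for k in ["ACME", "ACME", "GLOBEX"]:
        print(k, "->", hash_partition(k, 3))
    # Both ACME rows land in the same partition, so a downstream
    # Aggregator sees the whole group in a single thread.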

96
Partition Types – Hash User Keys

• This is similar to hash auto keys except the user specifies which ports make up the partition key
• An alternative to a hard-coded key range partition on a relational target (if the DB table is partitioned)

97
Partition Types – Database

• Only valid for DB2 and Oracle databases in a multi-node database
• Sources: Oracle and DB2
• Targets: DB2 only

• The number of partitions does not have to equal the number of database nodes
• Performance may be better if they are equal, however

98
Partitioning with Relational Sources

• PowerCenter creates a separate source database connection for each partition
• If you define n partitions, the source database must support n parallel connections
• The DTM generates a separate SQL query for each partition
• Each query can be overridden
• PowerCenter reads the data concurrently

99
Partitioning with Flat File Sources

• Multiple flat files
• Each partition reads a different file
• PowerCenter reads the files in parallel
• If the files are of unequal sizes, you may want to repartition the data round-robin

• Single flat file
• PowerCenter makes multiple parallel connections to the same file based on the number of partitions specified
• PowerCenter distributes the data randomly to the partitions
• Over a large volume of data, this random distribution tends to have an effect similar to round robin: partition sizes tend to be equal

100
Partitioning with Relational Targets

• The DTM creates a separate target database connection for each partition
• The DTM loads data concurrently
• If you define n partitions, the database must support n concurrent connections

101
Partitioning with Flat File Targets

• The DTM writes output for each partition to a separate file
• Connection settings and properties can be configured for each partition
• The DTM can merge the target files if all have connections local to the Integration Service machine
• The DTM writes the data concurrently

102
Partitioning—Memory Requirements

• Minimum number of buffer blocks:
(2 blocks per source, target, & XML group) x (number of partitions)

• Optimal number of buffer blocks:
(optimal number for 1 partition) x (number of partitions)
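
Extending the earlier buffer sketch to partitions (the counts and the tuned single-partition value are hypothetical):

    sources, targets, xml_groups = 1, 2, 0   # hypothetical mapping
    n_partitions = 3

    min_blocks = 2 * (sources + targets + xml_groups) * n_partitions
    optimal_blocks_one_partition = 20        # hypothetical tuned value
    optimal_blocks = optimal_blocks_one_partition * n_partitions
    print(min_blocks, optimal_blocks)        # 18 60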

103
Cache Partitioning

• The DTM may create separate caches for each partition for each cached transformation; this is called cache partitioning
• The DTM treats cache size settings as per partition; for example, if you configure an aggregator with 2 MB for the index cache and 3 MB for the data cache, & you create 2 partitions, the DTM will allocate up to 4 MB & 6 MB total

• The DTM does not partition lookup or joiner caches unless the lookup or joiner itself is a partition point
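
The per-partition cache arithmetic from this slide, as a quick Python check:

    index_cache_mb, data_cache_mb, partitions = 2, 3, 2
    print(index_cache_mb * partitions, data_cache_mb * partitions)   # 4 6 (MB totals)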

104
Cache Partitioning

[Diagram: with cache partitioning, each partition has its own cache(s): separate index & data caches per partition for a cached transformation, and separate sorter caches per partition for a Sorter.]

105
Cache Partitioning

[Diagram: with a partition point on the joiner, each partition has its own index & data caches.]

106
Cache Partitioning

[Diagram: with no partition point on the joiner, however, all partitions share 1 set of caches.]

107
Monitoring Partitions

• The Workflow Monitor provides runtime details for each partition
• Per partition, you can determine the following:
• Number of rows processed
• Memory usage
• CPU usage

• If one partition is doing more work than the others, you may want to redistribute the data

108
Pipeline Partitioning Example

• Scenario:
• Student record processing
• XML source and Oracle target
• XML source is split into 3 files

109
Pipeline Partitioning Example

[Diagram: partitions 1–3, one per source file.]
Solution: define a partition for each of the 3 source files

110
Pipeline Partitioning Example

[Diagram: a round-robin (RR) partition point at the filter in each of the 3 partitions.]
Problem: source files vary in size, resulting in unequal workloads for each partition
Solution: use round robin partitioning on the filter to balance the load

111
Pipeline Partitioning Example

[Diagram: round-robin (RR) at the filter plus hash auto-keys (H) at the rank in each partition.]
Problem: potential for splitting rank groups
Solution: use hash auto-keys partitioning on the rank to group rows appropriately

112
Pipeline Partitioning Example

[Diagram: round-robin (RR) at the filter, hash auto-keys (H) at the rank, and key range (K) at the target in each partition.]
Problem: target tables are partitioned on Oracle by key range
Solution: use target key range partitioning to optimize writing to the target tables

113
Dynamic Partitioning

• The Integration Service can automatically set the number of partitions at runtime
• Useful when the data volume increases or the number of CPUs available changes
• The basis for the number of partitions is specified as a session property

114
Concurrent Workflow Execution (8.5)

• Prior to 8.5:
• Only one instance of a workflow could run
• Users duplicated workflows – maintenance issues
• Concurrent sessions required duplicating the session

115
Concurrent Workflow Execution

• Allows workflow instances to be run concurrently
• Override parameters/variables across run instances
• Same scheduler across multiple instances
• Supports independent recovery/failover semantics

116
Concurrent Workflow Execution

117
Workflow on Grid (WonG)

• The Integration Service is deployed on a grid – an IS service process (pmserver) runs on each node in the grid
• Allows tasks of a workflow to be distributed across a grid – no user configuration is necessary if all nodes are homogeneous

118
Workflow on Grid (WonG)

• Different sessions in a workflow are dispatched on different nodes to balance load
• Use workflow on grid if:
• There are many concurrent sessions and workflows
• You want to leverage multiple machines in the environment

119
Load Balancer Modes

• Round Robin
• Honors Max Number of Processes per Node

• Metric-based
• Evaluates nodes in round-robin order
• Honors resource provision thresholds
• Uses stats from the last 3 runs; if no statistics have been collected yet, defaults are used (40 MB memory, 15% CPU)

120
Load Balancer Modes

• Adaptive
• Selects the node with the most available CPU
• Honors resource provision thresholds
• Uses statistics from the last 3 runs of a task to determine whether a task can run on a node
• Bypass in dispatch queue: skips tasks in the queue that are more resource intensive and can’t be dispatched to any currently available node
• CPU profile: ranks node CPU performance against a baseline system

• All modes take into account the service level assigned to workflows

121
Session on Grid (SonG)

• The session is partitioned and dispatched across multiple nodes
• Allows unlimited scalability
• Sources and targets may be on different nodes
• More suited to large sessions
• Smaller machines in a grid are a lower-cost option than large multi-CPU machines

122
Session on Grid (SonG)

• Session on Grid will scale if:
• Sessions are CPU/memory intensive and overcome the overhead of data movement over the network
• I/O is kept localized to each node running the partition
• There is fast shared storage (e.g. NAS, clustered FS)
• Partitions are independent
• Source and target have different connections that are only available on different machines
• E.g. source Excel files on Windows and a target only available on UNIX
• Supported on a homogeneous grid

123
Configuring Session on Grid

• Enable the Session on Grid attribute on the session configuration tab
• Assign the workflow to be executed by an Integration Service that has been assigned to a grid

124
Dynamic Partitioning

• Based on user specification (# partitions)
• Can be parameterized as $DynamicPartitionCount

• Based on # of nodes in the grid

• Based on source partitioning (database partitioning)

125
SonG Partitioning Guidelines

• Set # of partitions = # of nodes to get an even distribution
• Tip: use the dynamic partitioning feature to ease expansion of the grid

• In addition, continue to create partition points to achieve parallelism

126
SonG Partitioning Guidelines

• To minimize data traffic across nodes:
• Use the pass-through partition type, which will try to keep transformations on the same node
• Use a resource map to dispatch the source and target transformations to the node where the source or target is located
• Keep the target files unmerged whenever possible (e.g. if being used for staging)

• Resource requirements should be specified at the lowest granularity, e.g. transformation instead of session (as far as possible)
• This will ensure better distribution in SonG

127
File Placement Best Practices

• Files that should be placed on a high-bandwidth shared file system (CFS / NAS):
• Source files
• Lookup source files [sequential file access]
• Target files [sequential file access]
• Persistent cache files for lookup or incremental aggregation [random file access]

• Files that should be placed on a shared file system where the bandwidth requirement is low (NFS):
• Parameter files
• Other configuration files
• Indirect source or target files
• Log files

128
File Placement Best Practices

• Files that should be put on local storage:
• Non-persistent cache files (i.e. sorter temporary files)
• Intermediate target files for sequential merge
• Other temporary files created during a session execution
• $PmTempFileDir should point to a local file system

• For best performance, ensure sufficient bandwidth for the shared file system and local storage (possibly by using additional disk I/O controllers)

129
Data Integration Certification Path

Informatica Certified Administrator
Recommended Training: PowerCenter QuickStart (eLearning); PowerCenter 8.5+ Administrator (4 days)
Required Exams: Architecture & Administration; Advanced Administration

Informatica Certified Developer
Recommended Training: PowerCenter QuickStart (eLearning); PowerCenter 8.5+ Administrator (4 days); PowerCenter Developer 8.x Level I (4 days); PowerCenter Developer 8 Level II (4 days)
Required Exams: Architecture & Administration; Mapping Design; Advanced Mapping Design

Informatica Certified Consultant
Recommended Training: PowerCenter QuickStart (eLearning); PowerCenter 8.5+ Administrator (4 days); PowerCenter Developer 8.x Level I (4 days); PowerCenter Developer 8 Level II (4 days); PowerCenter 8 Data Migration (4 days); PowerCenter 8 High Availability (1 day)
Required Exams: Architecture & Administration; Advanced Administration; Mapping Design; Advanced Mapping Design; Enablement Technologies

Additional Training: PowerCenter 8.5 New Features; PowerCenter 8.6 New Features; PowerCenter 8 Upgrade; PowerCenter 8 Team-Based Development; PowerCenter 8.5 Unified Security

130
Q&A

Bert Peters
Global Education Services, Principal Instructor

131
Course Evaluation

Bert Peters
Global Education Services, Principal Instructor

132
Appendix
Informatica Services by Solution

133
B2B Data Exchange
Recommended Services

Professional Services
Strategy Engagements
• B2B Data Transformation Architectural Review
Baseline Engagements
• B2B Data Transformation Baseline Architecture
Implement Engagements
• B2B Full Project Lifecycle
• Transaction/Customer/Payment Hub

Education Services
Recommended Courses
• Informatica B2B Data Transformation (D)
• Informatica B2B Data Exchange (D)

Target Audience for Courses
D = Developer, M = Project Manager, A = Administrator
134
Data Governance
Recommended Services

Professional Services
Strategy Engagements
• Informatica Environment Assessment Service
• Metadata Strategy and Enablement
• Data Quality Audit
Baseline Engagements
• Data Governance Implementation
• Metadata Manager Quick Start
• Informatica Data Quality Baseline Deployment
Implement Engagements
• Metadata Manager Customization
• Data Quality Management Implementation

Education Services
Recommended Courses
• PowerCenter Level I Developer (D)
• Informatica Data Explorer (D)
• Informatica Data Quality (D)
Related Courses
• PowerCenter Administrator (A)
• Metadata Manager (D)
Certifications
• PowerCenter
• Data Quality

Target Audience for Courses
D = Developer, M = Project Manager, A = Administrator
135
Data Migration
Recommended Services

Professional Services
Strategy Engagements
• Data Migration Readiness Assessment
• Informatica Data Quality Audit
Baseline Engagements
• PowerCenter Baseline Deployment
• Informatica Data Quality (IDQ) and/or Informatica Data Explorer (IDE) Baseline Deployment
Implement Engagements
• Data Migration Jumpstart
• Data Migration End-to-End Implementation

Education Services
Recommended Courses
• Data Migration (M)
• Informatica Data Explorer (D)
• Informatica Data Quality (D)
• PowerCenter Level I Developer (D)
Related Courses
• PowerExchange Basics (D)
• PowerCenter Administrator (A)
Certifications
• PowerCenter
• Data Quality

Target Audience for Courses
D = Developer, M = Project Manager, A = Administrator
136
Data Quality
Recommended Services

Professional Services
Strategy Engagements
• Data Quality Management Strategy
• Informatica Data Quality Audit
Baseline Engagements
• Informatica Data Quality (IDQ) and/or Informatica Data Explorer (IDE) Baseline Deployment
• Informatica Data Quality Web Services Quick Start
Implement Engagements
• Data Quality Management Implementation

Education Services
Recommended Courses
• Informatica Data Explorer (D)
• Informatica Data Quality (D)
Related Courses
• Informatica Identity Resolution (D)
• PowerCenter Level I Developer (D)
Certifications
• Data Quality

Target Audience for Courses
D = Developer, M = Project Manager, A = Administrator
137
Data Synchronization
Recommended Services

Professional Services
Strategy Engagements
• Project Definition and Assessment
Baseline Engagements
• PowerExchange Baseline Architecture Deployment
• PowerCenter Baseline Architecture Deployment
Implement Engagements
• Data Synchronization Implementation

Education Services
Recommended Courses
• PowerCenter Level I Developer (D)
• PowerCenter Level II Developer (D)
• PowerCenter Administrator (A)
Related Courses
• PowerExchange Basics Oracle Real-Time CDC (D)
• PowerExchange SQL RT (D)
• PowerExchange for MVS DB2 (D)
Certifications
• PowerCenter

Target Audience for Courses
D = Developer, M = Project Manager, A = Administrator
138
Enterprise Data Warehousing
Recommended Services

Professional Services
Strategy Engagements
• Enterprise Data Warehousing (EDW) Strategy
• Informatica Environment Assessment Service
• Metadata Strategy & Enablement
Baseline Engagements
• PowerCenter Baseline Architecture Deployment
Implement Engagements
• EDW Implementation

Education Services
Recommended Courses
• PowerCenter Level I Developer (D)
• PowerCenter Level II Developer (D)
• PowerCenter Metadata Manager (D)
Related Courses
• Informatica Data Quality (D)
• Data Warehouse Development (D)
Certifications
• PowerCenter

Target Audience for Courses
D = Developer, M = Project Manager, A = Administrator
139
Integration Competency Centers
Recommended Services

Professional Services
Strategy Engagements
• ICC Assessment
Baseline Engagements
• ICC Master Class Series
• ICC Director
Implement Engagements
• ICC Launch
• ICC Implementation
• Informatica Production Support

Education Services
Recommended Courses
• ICC Overview (M)
• PowerCenter Level I Developer (D)
• PowerCenter Administrator (A)
Related Courses
• Metadata Manager (D)
• Informatica Data Explorer (D)
• Informatica Data Quality (D)
Certifications
• PowerCenter
• Data Quality

Target Audience for Courses
D = Developer, M = Project Manager, A = Administrator
140
Master Data Management
Recommended Services

Professional Services
Strategy Engagements
• Master Data Management (MDM) Strategy
• Informatica Data Quality Audit
Baseline Engagements
• Informatica Data Explorer (IDE) Baseline Deployment
• Informatica Data Quality (IDQ) Baseline Deployment
• PowerCenter Baseline Architecture Deployment
Implement Engagements
• MDM Implementation

Education Services
Recommended Courses
• Informatica Data Explorer (D)
• Informatica Data Quality (D)
• PowerCenter Level I Developer (D)
Related Courses
• Metadata Manager (D)
• Informatica Identity Resolution (D)
Certifications
• PowerCenter
• Data Quality

Target Audience for Courses
D = Developer, M = Project Manager, A = Administrator
141
Services Oriented Architecture
Recommended Services

Professional Services
Strategy Engagements
• Data Services (SOA) Strategy
Baseline Engagements
• Informatica Web Services Quick Start
• Informatica Data Quality Web Services Quick Start
Implement Engagements
• Data Services (SOA) Implementation

Education Services
Recommended Courses
• PowerCenter Level I Developer (D)
• Informatica Data Quality (D)
Certifications
• PowerCenter
• Data Quality

Target Audience for Courses
D = Developer, M = Project Manager, A = Administrator
142
Governance, Risk & Compliance (GRC)
Recommended Services

Professional Services
Strategy Engagements
• Informatica Environment Assessment Service
• Enterprise Data Warehouse Strategy
• Data Quality Audit
Baseline Engagements
• Informatica Data Quality Baseline Deployment
• Metadata Manager Quick Start
Implement Engagements
• Risk Management Enablement Kit
• Enterprise Data Warehouse Implementation

Education Services
Recommended Courses
• PowerCenter Level I Developer (D)
• Informatica Data Explorer (D)
• Informatica Data Quality (D)
Related Courses
• Data Warehouse Development (D)
• ICC Overview (M)
• Metadata Manager (D)
Certifications
• PowerCenter
• Data Quality

Target Audience for Courses
D = Developer, M = Project Manager, A = Administrator
143
Mergers & Acquisitions (M&A)
Recommended Services

Professional Services
Strategy Engagements
• Data Migration Readiness Assessment
• Informatica Data Quality Audit
Baseline Engagements
• PowerCenter Baseline Deployment
• Informatica Data Quality (IDQ), and/or Informatica Data Explorer (IDE) Baseline Deployment
Implement Engagements
• Data Migration Jumpstart
• Data Migration End-to-End Implementation

Education Services
Recommended Courses
• Data Migration (M)
• PowerCenter Level I Developer (D)
Related Courses
• Informatica Data Explorer (D)
• Informatica Data Quality (D)
• PowerExchange Basics (D)
Certifications
• PowerCenter
• Data Quality

Target Audience for Courses
D = Developer, M = Project Manager, A = Administrator
144
Deliver Your Project Right the First Time with Informatica Professional Services

145
Informatica Global Education Services

Joe Caputo, Director, Pfizer

“We launched an aggressive data migration project that was to be completed in one year. The complexity of the data schema along with the use of Informatica PowerCenter tools proved challenging to our top colleagues.

We believe that Informatica training led us to triple productivity, helping us to complete the project on its original 1-year schedule.”

146
Informatica Contact Information

Informatica Corporation Headquarters
100 Cardinal Way
Redwood City, CA 94063
Tel: 650-385-5000
Toll-free: 800-653-3871
Toll-free Sales: 888-635-0899
Fax: 650-385-5500

Informatica EMEA Headquarters
Informatica Nederland B.V.
Edisonbaan 14a
3439 MN Nieuwegein
Postbus 116
3430 AC Nieuwegein
Tel: +31 (0) 30-608-6700
Fax: +31 (0) 30-608-6777

Informatica Asia/Pacific Headquarters
Informatica Australia Pty Ltd
Level 5, 255 George Street
Sydney
N.S.W. 2000
Australia
Tel: +612-8907-4400
Fax: +612-8907-4499

Global Customer Support
support@informatica.com
Register at my.informatica.com to open a new service request or to check on the status of an existing SR.

http://www.informatica.com

147
