Performance Tuning
Version 8.6

Bert Peters
Global Education Services, Principal Instructor

Objectives
After completing this course, you will be able to:
Control how PowerCenter uses memory
Control how PowerCenter uses CPUs
Understand the performance counters
Isolate source, target and engine bottlenecks
Tune different types of bottlenecks
Configure Workflow and Session on Grid

Agenda
Memory optimization
Performance tuning methodology
Tuning source, target, & mapping bottlenecks
Pipeline partitioning
Server Grid
Q&A
Course evaluation

Anatomy of a Session

[Diagram: the Integration Service spawns the Data Transformation Manager (DTM). A READER thread moves source data into the DTM buffer, a TRANSFORMER thread processes it using the transformation caches, and a WRITER thread writes it to the target.]

5

Memory Optimization

[Diagram: the DTM buffer sits between the READER, TRANSFORMER, and WRITER threads; the TRANSFORMER also uses separate transformation caches.]

DTM Buffer
Temporary storage area for data
Buffer is divided into blocks
Buffer size and block size are tunable
Default setting for each is Auto

DTM Buffer Size Session Property

Default is Auto, meaning the DTM estimates the optimal size


Check session log for actual size allocation

DTM Buffer Block Size

Default is Auto
Check session log for actual size allocation

Reader Bottleneck
Transformer & writer threads wait for data

[Diagram: a slow READER leaves the DTM buffer empty; the TRANSFORMER and WRITER threads sit waiting.]

10

Transformer Bottleneck
Reader waits for free blocks; writer waits for data

[Diagram: a slow TRANSFORMER backs up the DTM buffer; the READER and WRITER threads sit waiting.]

11

Writer Bottleneck
Reader & transformer wait for free blocks

[Diagram: a slow WRITER backs up the DTM buffer; the READER and TRANSFORMER threads sit waiting.]

12

Source Row Logging

[Diagram: the READER waits for free blocks in the DTM buffer.]

Source rows must remain in the buffers until the transformation and writer threads process the corresponding rows downstream

13

Large Commit Interval

[Diagram: the READER waits for free blocks in the DTM buffer.]

Target rows remain in the buffers until the DTM reaches the commit point

14

Tuning the DTM Buffer

Extra buffers can keep threads busy

[Diagram: spare blocks in the DTM buffer cushion the READER, TRANSFORMER, and WRITER threads.]

15

Tuning the DTM Buffer

Temporary slowdowns in reading, transforming, or writing may cause large fluctuations in throughput
A slow thread typically provides data in spurts
Extra memory blocks can act as a cushion, keeping other threads busy in case of a bottleneck

16

Tuning the DTM Buffer


Buffer block size
Recommendation: at least 100 rows / block
Compute based on largest source or target row size
Typically not a significant bottleneck unless below 10 rows per block

Number of blocks
Minimum of 2 blocks required for each source, target, and XML group
(number of blocks) = 0.9 x ((DTM buffer size) / (buffer block size))

17
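As a quick sanity check of the 100-rows-per-block rule and the block-count formula above, a minimal Python sketch; the 2,000-byte row size and the 24 MB buffer are assumed figures for illustration:

row_size = 2000                      # assumed largest source/target row, in bytes
block_size = 100 * row_size          # rule of thumb: at least 100 rows per block
dtm_buffer_size = 24 * 1024 * 1024   # assumed 24 MB DTM buffer
num_blocks = int(0.9 * dtm_buffer_size / block_size)
print(block_size, num_blocks)        # 200000 bytes/block, 113 usable blocks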

Tuning the DTM Buffer


Determine the minimum DTM buffer size:
(DTM buffer size) = (buffer block size) x (minimum number of blocks) / 0.9

Increase by a multiple of the block size

If performance does not improve, return to the previous setting
There is no formula for optimal DTM buffer size
The Auto setting may be adequate for some sessions

18
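The minimum-size calculation can be checked the same way; the session shape (2 sources, 2 targets, no XML groups) is an assumed example:

block_size = 200_000                     # from the previous sketch
min_blocks = 2 * (2 + 2 + 0)             # 2 blocks per source, target & XML group
min_dtm_buffer = block_size * min_blocks / 0.9
print(round(min_dtm_buffer))             # 1777778 bytes, roughly 1.7 MB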

Transformation Caches
Temporary storage area for certain transformations
Except for Sorter, each is divided into a Data & Index Cache
The size of each transformation cache is tunable
If the runtime cache requirement > setting, the overflow is written to disk
The default setting for each cache is Auto

19

Tuning the Transformation Caches

Default is Auto
20

Max Memory for Transformation Caches

Only applies to transformation caches set to Auto

21

Max Memory for Transformation Caches


Two settings: fixed number & percentage
System uses the smaller of the two
If either setting is 0, the DTM assigns a default size to each transformation cache that's set to Auto

Recommendation: use the fixed limit if this is the only session running; otherwise, use the percentage
Use the percentage if running in a grid or HA environment

22

Tuning the Transformation Caches


If a cache setting is too small, DTM writes
overflow to disk
Determine if transformation caches are
overflowing:
Watch the cache directory on the file system while the
session runs
Use the session performance counters

Options to tune:
Increase the maximum memory allowed for Auto
transformation cache sizes
Set the cache sizes for individual transformations manually

23

Session Performance Counters

24

Performance Counters

25

Tuning the Transformation Caches


Non-zero counts for readfromdisk and writetodisk indicate sub-optimal settings for transformation index or data caches
This may indicate the need to tune transformation caches manually
Any manual setting allocates memory outside of the previously set maximum
Cache Calculators provide guidance in manual tuning of transformation caches

26

Aggregator Caches
Unsorted Input
Must read all input before releasing any output rows
Index cache contains group keys
Data cache contains non-group-by ports

Sorted Input
Releases an output row as each input group is processed
Does not require data or index cache (both = 0)
May run much faster than unsorted, BUT must consider the expense of sorting

27

Aggregator Caches Manual Tuning

28

Joiner Caches: Unsorted Input

[Diagram: MASTER and DETAIL pipelines feeding the joiner.]

Staging algorithm:
All master data loaded into cache
Specify the smaller data set as master

Index cache contains join keys
Data cache contains non-key connected outputs

29

Joiner Caches: Sorted Input

[Diagram: MASTER and DETAIL pipelines feeding the joiner.]

Streaming algorithm:
Both inputs must be sorted on join keys
Selected master data loaded into cache
Specify the data set with fewest records under a single key as master

Index cache contains up to 100 keys
Data cache contains non-key connected outputs associated with the 100 keys

30

Joiner Caches Manual Tuning

Cache calculator detects the sorted input property

31

Lookup Caches
To cache or not to cache?
Large number of invocations: cache
Large lookup table: don't cache
Flat file lookup is always cached

32

Lookup Caches
Data cache
Only connected output ports are included in the data cache
For an unconnected lookup, only the return port is included in the data cache

Index cache
Only lookup keys are included in the index cache

33

Lookup Caches
Fine-tuning the Lookup transformation cache:
SQL override
Persistent cache (if the lookup data is static)
Optimize the sort
Default: lookup keys, then connected output ports, in port order
Can be commented out or overridden in the SQL override
Indexing strategy on the table may impact performance
The Use Any Value property suppresses the sort

34

Lookup Caches
Can build lookup caches concurrently
May improve session performance when there is significant activity upstream from the lookup & the lookup cache is large
This option applies to the individual session

The Integration Service builds lookup caches at the beginning of the session run, even if no row has entered a Lookup transformation

Session properties > Config Object tab > Advanced settings

35

Lookup Caches Manual Tuning

36

Rank Caches
Index cache contains group keys
Data cache contains non-group-by ports
Cache sizes related to the number of groups &
the number of ranks

37

Rank Caches Manual Tuning

38

Sorter Cache
Sorter Transformation
May be faster than a DB sort or a 3rd-party sorter
Index read from the RDBMS = pre-sorted data
SQL SELECT DISTINCT may reduce the volume of data across the network versus a Sorter with the Distinct property set

Single cache (no separation of index & data)

39

Sorter Cache Manual Tuning

40

64-bit vs. 32-bit OS

Take advantage of large memory support in 64-bit
Cache-based transformations like Sorter, Lookup, Aggregator, Joiner, and XML Target can address larger blocks of memory

41

Maximum Memory Allocation Example


Parameters

64-bit OS
Total system memory: 32 GB
Maximum allowed for transformation caches: 5 GB or 10%
DTM Buffer: 24 MB
One transformation manually configured
Index Cache: 10 MB
Data Cache: 20 MB
All other transformations set to Auto

42

Maximum Memory Allocation Example


Result
10% of 32 GB = 3.2 GB < 5 GB: max allowed for transformation caches = 3.2 GB = 3200 MB
Manually configured transformation uses 30 MB
DTM Buffer uses 24 MB
3200 + 30 + 24 = 3254 MB
Note that 3254 MB represents an upper limit; cached transformations may use less than the 3200 MB max

43
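The arithmetic above, restated as a small Python sketch (note the slide treats 3.2 GB as 3200 MB, so the sketch does the same):

total_memory_gb = 32
fixed_limit_gb = 5
pct_limit_gb = 0.10 * total_memory_gb                        # 3.2 GB
cache_limit_mb = min(fixed_limit_gb, pct_limit_gb) * 1000    # smaller limit wins: 3200 MB
manual_caches_mb = 10 + 20                                   # manually configured index + data caches
dtm_buffer_mb = 24
print(int(cache_limit_mb + manual_caches_mb + dtm_buffer_mb))  # 3254 MB upper limit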

Performance Tuning Methodology


It is an iterative process

Establish benchmark
Optimize memory
Isolate bottleneck
Tune bottleneck
Take advantage of under-utilized CPU & memory

44

The Production Environment

Multi-vendor, multi-system environment with many components:
Operating systems, databases, networks and I/O
Usually need to monitor performance in several places
Usually need to monitor outside Informatica as well

[Diagram: PowerCenter, OS, and DBMS hosts, each with many disks, connected over a LAN/WAN.]

45

The Production Environment

Tuning involves an iterative approach:
1. Identify the biggest performance problem
2. Eliminate or reduce it
3. Return to step 1

[Diagram: the same multi-vendor environment as above.]

46

Preliminary Steps
Eliminate transformation errors & data rejects
"First make it work, then make it faster"

Source row logging requires the reader to hold onto buffers until data is written to the target, EVEN IF THERE ARE NO ERRORS; this can significantly increase the DTM buffer requirement
You may want to set "Stop on errors" to 1

47

Preliminary Steps
Override tracing level to terse or normal
Override at session level to avoid having to examine each
transformation in the mapping
Only use verbose tracing during development & only with
very small data sets
If you expect row errors that you will not need to correct,
avoid logging them by overriding the tracing level to terse
(not recommended as a long-term error handling solution)

48

Benchmarking
Hardware (CPU bandwidth, RAM, disk space,
etc.) should be similar to production
Database configuration should be similar to
production
Data volume should be similar to production
Challenge: production data is constantly
changing
Optimal tuning may be data dependent
Estimate average behavior
Estimate worst case behavior

49

Benchmarking Conditional Branching

Scenario: a high percentage of test data goes to TARGET1, but a high percentage of production data goes to TARGET2
Tuning of the sorter & aggregator could be overlooked in test

50

Benchmarking Conditional Branching

Scenario: a high percentage of production data goes to TARGET1 on Monday's load, but a high percentage of production data goes to TARGET2 on Tuesday's load
Performance of the 2 loads may differ significantly
51

Benchmarking Conditional Branching


Conditional branching poses a challenge in
performance tuning
Volume & CHARACTERISTICS of data should be
consistent between test & production
May need to estimate average behavior
May want to tune for worst-case scenario

52

Identifying Bottlenecks
The first challenge is to identify the bottleneck

Target
Source
Transformations
Mapping/Session

Tuning the most severe bottleneck may reveal another one
This is an iterative process

53

Thread Statistics
The DTM spawns multiple threads
Each thread has busy time & idle time
Goal: maximize the busy time & minimize the idle time

54

Thread Statistics - Terminology


A pipeline consists of:
A source qualifier
The sources that feed that source qualifier
All transformations and targets that receive data from that
source qualifier

55

Thread Statistics - Terminology

[Diagram: PIPELINE 1 feeds the MASTER input of a joiner; PIPELINE 2 feeds the DETAIL input.]

A pipeline on the master input of a joiner terminates at the joiner
56

Thread Statistics - Terminology


Stage
a portion of a pipeline; implemented at runtime
as a thread
Partition Point
boundary between 2 stages; always associated
with a transformation

57

Using Thread Statistics

By default, PowerCenter assigns a partition point at each Source Qualifier, Target, Aggregator, and Rank.

[Diagram: partition points divide the pipeline into a Reader Thread (First Stage), a Transformation Thread (Second Stage), a Transform Thread (Third Stage), and a Writer Thread (Fourth Stage).]
58

Target Bottleneck
The Aggregator transformation stage is waiting for target buffers

[Diagram: Reader Thread (First Stage) and Transformation Thread (Second Stage) upstream; Transform Thread (Third Stage) Busy%=15; Writer Thread (Fourth Stage) Busy%=95.]
59

Transformation Bottleneck
Both the reader & writer are waiting for buffers

[Diagram: Reader Thread (First Stage) Busy%=15; Transformation Thread (Second Stage) Busy%=60; Transform Thread (Third Stage) Busy%=95; Writer Thread (Fourth Stage) Busy%=10.]
60

Thread Statistics in Session Log


***** RUN INFO FOR TGT LOAD ORDER GROUP [1], CONCURRENT SET [1] *****
Thread [READER_1_1_1] created for [the read stage] of partition point
[SQ_SortMergeDataSize_Detail] has completed.
Total Run Time = [318.271977] secs
Total Idle Time = [176.488675] secs
Busy Percentage = [44.547843]
Thread [TRANSF_1_1_1] created for [the transformation stage] of partition point
[SQ_SortMergeDataSize_Detail] has completed.
Total Run Time = [707.803168] secs
Total Idle Time = [105.303059] secs
Busy Percentage = [85.122550]
Thread work time breakdown:
JNRTRANS: 10.869565 percent
SRTTRANS: 89.130435 percent

61
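The busy percentage reported in the log is derived from the run and idle times; a minimal Python sketch using the reader thread's figures above:

run_time = 318.271977        # Total Run Time (secs) for READER_1_1_1
idle_time = 176.488675       # Total Idle Time (secs)
busy_pct = (run_time - idle_time) / run_time * 100
print(f"{busy_pct:.6f}")     # 44.547843, matching the log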

Performance Counters in WF Monitor

62

Integration Service Monitor in WF Monitor

63

Session Statistics in WF Monitor

64

Other Methods of Bottleneck Isolation


Write to flat file
If significantly faster than relational target: Target Bottleneck
Place a FALSE Filter right after the Source Qualifier
If significantly faster: Transformation Bottleneck
If target & transformation bottlenecks are ruled out: Source Bottleneck

65

Target Optimization
Target optimization often involves non-Informatica components
Drop indexes and constraints
Use pre/post SQL to drop and rebuild
Use pre/post-load stored procedures

Use constraint-based loading only when necessary

66

Target Optimization
Use Bulk Loading
Informatica bypasses the database log
Target cannot perform rollback
Weigh importance of performance over recovery

Use External Loader
Similar to bulk loader, but the DB reads from a flat file

67

Target Optimization
Transaction Control

Target commit type


Best performance, least precise control
System avoids writing partially-filled buffers

Source commit type

Last active source to feed a target becomes a transaction generator


Commit interval provides precise control
Slower than target commit type
Avoid setting commit interval too low

User Defined commit type


Required when mapping contains transaction control transformation
Provides precise data-driven control
Slower than target and source commit types

68

Target Optimization
"Update else insert" session property
Works well if you rarely insert
Index required for the update key, but it slows down inserts
PowerCenter must wait for the database to return an error before inserting

Alternative: lookup followed by an Update Strategy

69

Source Bottlenecks
Source optimization often involves non-Informatica components
Generated SQL available in the session log
Execute directly against the DB
Update statistics on the DB
Use the tuned SELECT as a SQL override

Set the Line Sequential Buffer Length session property to correspond with the record size

70

Source Bottlenecks
Avoid transferring more than once from remote
machine
Avoid reading same data more than once
Filter at source if possible (reduce data set)
Minimize connected outputs from the source
qualifier
Only connect what you need
The DTM only includes connected outputs when it
generates the SQL SELECT statement

71

Reduce Data Set


Remove Unnecessary Ports
Not all ports are needed
Fewer ports = better performance & lower memory requirements

Reduce Rows in Pipeline


Place Filter Transformation as far upstream as possible
Filter before aggregator, rank, or sorter if possible
Filter in source qualifier if possible

72

Avoid Unnecessary Sorting

[Diagram: an example mapping in which an XML parser (XML_PARSER_PME_EQT_ENT_v1_2) feeds a long chain of Sorter (srt_*) and Joiner (jnr_*) transformations; many of the sorts are candidates for elimination.]

73

Expression Language Tips


Functions are more expensive than operators
Use || instead of CONCAT()

Use variable ports to factor out common logic

74

Expression Language Tips

Simplify nested functions when possible
instead of:
IIF(condition1, result1, IIF(condition2, result2, IIF( ... )))
try:
DECODE(TRUE,
condition1, result1,
:
conditionN, resultN)

75

General Guidelines
Data type conversions are expensive; avoid them if possible
All-input transformations (such as Aggregator, Rank, etc.) are more expensive than pass-through transformations
An all-input transformation must process multiple input rows before it can produce any output

76

General Guidelines
High precision (session property) is expensive, but only applies to the decimal data type
UNICODE requires 2 bytes per character; ASCII requires 1 byte per character
The performance difference depends on the number of string ports only

77

Transformation Specific
Reusable Sequence Generator
Number of Cached Values Property
Purpose: enables different sessions to share the
same sequence without generating the same
numbers
>0: allocates the specified number of values &
updates the current value in the repository at the
end of each block
(each session gets a different block of numbers)

78

Transformation Specific
Reusable Sequence Generator
Number of Cached Values Property
Setting too low causes frequent repository
access, which impacts performance
Unused values in a block are lost; this leads to
gaps in the sequence
Consider alternatives
example: non-reusable sequence generators,
one generates even numbers, & the other
generates odd numbers

79
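A minimal Python sketch of the block-allocation behavior described above; the block size of 1,000 and the two sessions are assumed values for illustration:

current_value = 1                        # sequence state stored in the repository

def reserve_block(cached_values):
    """Each session reserves a block; the repository's current value moves past it."""
    global current_value
    start = current_value
    current_value += cached_values
    return range(start, start + cached_values)

session_a = reserve_block(1000)          # values 1..1000
session_b = reserve_block(1000)          # values 1001..2000, no duplicates
# If session A only consumes 600 values, 601..1000 are never reused: a gap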

Other Transformations
Normalizer
This transformation INCREASES the number of rows
Place as far downstream as possible

XML Reader / Midstream XML Parser
Remove groups that are not projected
We do not allocate memory for these groups, but still need to maintain PK/FK relationships
Don't leave port size lengths as infinite; use an appropriate length

80

Iterative Process
After tuning your bottlenecks, revisit memory
optimization
Tuning often REDUCES memory requirements
(you might even be able to change some settings back to Auto)

Change one thing at a time & record your results

81

Partitioning
Apply after optimizing source, target, &
transformation bottlenecks
Apply after optimizing memory usage
Exploit under-utilized CPU & memory
To customize partitioning settings, you need the
partitioning license

82

Partitioning Terminology
Partition
subset of the data
Stage
a portion of a pipeline
Partition Point
boundary between 2 stages
Partition Type
algorithm for distributing data among partitions;
always associated with a partition point

83

Threads, Partition Points and Stages


The DTM implements each stage as a thread;
hence, stages run in parallel
You may add or remove partition points

[Diagram: Reader Thread (First Stage), Transformation Thread (Second Stage), Transform Thread (Third Stage), Writer Thread (Fourth Stage).]

84

Rules for Adding Partition Points


You cannot add a partition point to a Sequence
Generator
You cannot add a partition point to an unconnected
transformation
You cannot add a partition point on a source
definition
If a pipeline is split and then concatenated, you
cannot add a partition point on any transformation
between the split and the concatenation
Adding or removing partition points requires the
partitioning license
85

Guidelines for Adding Partition Points


Make sure you have ample CPU bandwidth
Make sure you have gone through other optimization
techniques
Add on complex transformations that could benefit
from additional threads
If you have >1 partition, add where data needs to be re-distributed
Aggregator, Rank, or Sorter, where data must be grouped
Where data is distributed unevenly
On partitioned sources and targets

86

Partition Points & Partitions

Partitions subdivide the data

Each partition represents a thread within a stage

Each partition point distributes the data among the partitions

[Diagram: at each stage, one thread per partition: 3 Reader Threads (First Stage), 3 Transformation Threads (Second Stage), 3 more transformation threads (Third Stage), and 3 Writer Threads (Fourth Stage).]

87

Session Partitioning GUI

The number next to each flag shows the number of partitions

The color of each flag indicates the partition type

88

Rules for Adding Partitions


The master input of a joiner can only have 1 partition
unless you add a partition point at the joiner
A pipeline with an XML target can only have 1
partition
If the pipeline has a relational source or target and
you define n partitions, each database must support
n parallel connections
A pipeline containing a custom or external
procedure transformation can only have 1 partition
unless those transformations are configured to allow
multiple partitions
89

Rules for Adding Partitions


The number of partitions is constant on a given
pipeline
If you have a partition point on a Joiner, the number of
partitions on both inputs will be the same

At each partition point, you can specify how you want the data distributed among the partitions (this is known as the partition type)

90

Guidelines for Adding Partitions


Make sure you have ample CPU bandwidth &
memory
Make sure you have gone through other
optimization techniques
Add 1 partition at a time & monitor the CPU
When CPU usage approaches 100%, don't add any more partitions

Take advantage of database partitioning

91

Partition Types
Each partition point is associated with a partition
type
The partition type defines how the DTM is to
distribute the data among the partitions
If the pipeline has only 1 partition, the partition
point serves only to add a stage to the pipeline
There are restrictions, enforced by the GUI, on
which partition types are valid at which partition
points

92

Partition Types Pass Through


Data is processed without redistributing the rows
among partitions
Serves only to add a stage to the pipeline
Use when you want an additional thread for a complex transformation but you don't need to redistribute the data (or you only have 1 partition)

93

Partition Types Key Range


The DTM passes data to each partition depending on
user-specified ranges
You may use several ports to form a compound
partition key
The DTM discards rows not falling in any specified
range
If 2 or more ranges overlap, a row can go down more than 1 partition, resulting in duplicate data
Use key range partitioning when the sources or
targets in the pipeline are partitioned by key range
94

Partition Types Round Robin


The Integration Service distributes rows of data
evenly to all partitions
Use when there is no need to group data among
partitions
Use when reading flat file sources of different
sizes
Use when data has been partitioned unevenly
upstream and requires significantly more
processing before arriving at the target

95

Partition Types Hash Auto Keys


The DTM applies a hash function to a partition
key to group data among partitions
Use hash partitioning to ensure that groups of
rows are processed in the same partition
The DTM automatically determines the partition
key based on:
aggregator or rank group keys
join keys
sort keys

96

Partition Types Hash User Keys


This is similar to hash auto keys except the user
specifies which ports make up the partition key
Alternative to hard-coded key range partition on
relational target (if DB table is partitioned)

97

Partition Types Database


Only valid for DB2 and Oracle databases in a
multi-node database
Sources: Oracle and DB2
Targets: DB2 only

The number of partitions does not have to equal the number of database nodes
Performance may be better if they are equal, however

98

Partitioning with Relational Sources


PowerCenter creates a separate source
database connection for each partition
If you define n partitions, the source database
must support n parallel connections
The DTM generates a separate SQL Query for
each partition
Each query can be overridden
PowerCenter reads the data concurrently

99

Partitioning with Flat File Sources


Multiple flat files
Each partition reads a different file
PowerCenter reads the files in parallel
If the files are of unequal sizes, you may want to repartition
the data round-robin

Single flat file

PowerCenter makes multiple parallel connections to the same file based on the number of partitions specified
PowerCenter distributes the data randomly to the partitions
Over a large volume of data, this random distribution tends to have an effect similar to round robin: partition sizes tend to be equal

100

Partitioning with Relational Targets


The DTM creates a separate target database
connection for each partition
The DTM loads data concurrently
If you define n partitions, database must support
n concurrent connections

101

Partitioning with Flat File Targets


The DTM writes output for each partition to a
separate file
Connection settings and properties can be
configured for each partition
The DTM can merge the target files if all have
connections local to the Integration Service
machine
The DTM writes the data concurrently

102

Partitioning - Memory Requirements
Minimum number of buffer blocks = (2 blocks per source, target, & XML group) x (number of partitions)

Optimal number of buffer blocks = (optimal number for 1 partition) x (number of partitions)

103
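Both formulas, applied to an assumed session shape (1 source, 1 target, no XML groups, 3 partitions) as a minimal Python sketch:

partitions = 3
sources, targets, xml_groups = 1, 1, 0

min_blocks = 2 * (sources + targets + xml_groups) * partitions   # minimum: 12 blocks
optimal_single = 100             # assumed tuned block count for 1 partition
optimal_blocks = optimal_single * partitions                     # optimal: 300 blocks
print(min_blocks, optimal_blocks)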

Cache Partitioning
DTM may create separate caches for each
partition for each cached transformation; this is
called cache partitioning
DTM treats cache size settings as per partition
for example, if you configure an aggregator with:
2 MB for the index cache,
3 MB for the data cache,
& you create 2 partitions
DTM will allocate up to 4 MB & 6 MB total

DTM does not partition lookup or joiner caches unless the lookup or joiner itself is a partition point

104
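The aggregator example above, restated as a tiny sketch: the configured cache sizes are per partition, so the totals scale with the partition count.

partitions = 2
index_cache_mb, data_cache_mb = 2, 3           # configured per-partition sizes

total_index_mb = index_cache_mb * partitions   # up to 4 MB across partitions
total_data_mb = data_cache_mb * partitions     # up to 6 MB across partitions
print(total_index_mb, total_data_mb)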

Cache Partitioning

[Diagram: each partition has its own cache(s): separate index, data, and sorter caches per partition.]

105

Cache Partitioning

[Diagram: with a partition point on the joiner, each partition has its own index and data cache(s).]

106

Cache Partitioning

[Diagram: with no partition point on the joiner, however, all partitions share 1 set of caches.]

107

Monitoring Partitions
The Workflow Monitor provides runtime details
for each partition
Per partition, you can determine the following:
Number of rows processed
Memory usage
CPU usage

If one partition is doing more work than the others, you may want to redistribute the data

108

Pipeline Partitioning Example


Scenario:
Student record processing
XML source and Oracle target
XML source is split into 3 files

109

Pipeline Partitioning Example


[Diagram: Partitions 1, 2, and 3, each reading one of the 3 XML source files.]

Solution: Define a partition for each of the 3 source files

110

Pipeline Partitioning Example


[Diagram: round-robin (RR) partition point added at the filter.]

Problem: Source files vary in size, resulting in unequal workloads for each partition
Solution: Use Round Robin partitioning on the filter to balance the load
111

Pipeline Partitioning Example


[Diagram: hash auto-keys partition point added at the rank.]

Problem: Potential for splitting rank groups
Solution: Use hash auto-keys partitioning on the rank to group rows appropriately
112

Pipeline Partitioning Example


[Diagram: key range partition point added at the target.]

Problem: Target tables are partitioned on Oracle by key range
Solution: Use target Key Range partitioning to optimize writing to target tables
113

Dynamic Partitioning
Integration Service can automatically set the
number of partitions at runtime.
Useful when the data volume increases or the
number of CPUs available changes.
Basis for the number of partitions is specified as
a session property

114

Concurrent Workflow Execution (8.5)


Prior to 8.5
Only one instance of a workflow can run
Users duplicate workflows: maintenance issues
Concurrent sessions required duplicating the session

115

Concurrent Workflow Execution


Allow workflow instances to be run
concurrently
Override parameters/ variables across run
instances
Same scheduler across multiple instances
Supports independent recovery/ failover
semantics

116

Concurrent Workflow Execution

117

Workflow on Grid (WonG)


The Integration Service is deployed on a grid: an IS service process (pmserver) runs on each node in the grid
Allows tasks of a workflow to be distributed across the grid; no user configuration necessary if all nodes are homogeneous

118

Workflow on Grid (WonG)


Different sessions in a workflow are dispatched on different nodes to balance load
Use workflow on grid if:
There are many concurrent sessions and workflows
You want to leverage multiple machines in the environment

119

Load Balancer Modes


Round Robin
Honors Max Number of Processes per Node

Metric-based
Evaluates nodes in round-robin
Honors resource provision thresholds
Uses stats from the last 3 runs; if no statistics have been collected yet, defaults are used (40 MB memory, 15% CPU)

120

Load Balancer Modes


Adaptive
Selects the node with the most available CPU
Honors resource provision thresholds
Uses statistics from the last 3 runs of a task to determine whether a task can run on a node
Bypass in dispatch queue: skip tasks in the queue that are more resource intensive and can't be dispatched to any currently available nodes
CPU Profile: ranks node CPU performance against a baseline system

All modes take into account the service level assigned to workflows

121

Session on Grid (SonG)


Session partitioned and dispatched across multiple nodes
Allows unlimited scalability
Sources and targets may be on different nodes
More suited to large sessions
Smaller machines in a grid are a lower-cost option than large multi-CPU machines

122

Session on Grid (SonG)


Session on Grid will scale if:
Sessions are CPU/memory intensive enough to overcome the overhead of data movement over the network
I/O is kept localized to each node running the partition
There is fast shared storage (e.g. NAS, clustered FS)
Partitions are independent

Source and target have different connections that are only available on different machines
E.g. source Excel files on Windows, and the target is only available on UNIX

Supported on a homogeneous grid

123

Configuring Session on Grid


Enable the Session on Grid attribute in the session configuration tab
Assign the workflow to be executed by an Integration Service that has been assigned to a grid

124

Dynamic Partitioning
Based on user specification (# partitions)
Can parameterize as $DynamicPartitionCount

Based on # of nodes in grid


Based on source partitioning (Database partitioning)

125

SonG Partitioning Guidelines


Set # of partitions = # of nodes to get an even distribution
Tip: use the dynamic partitioning feature to ease expansion of the grid

In addition, continue to create partition points to achieve parallelism

126

SonG Partitioning Guidelines


To minimize data traffic across nodes:
Use the pass-through partition type, which will try to keep transformations on the same node
Use the resource map to dispatch the source and target transformations to the node where the source or target is located
Keep the target files unmerged whenever possible (e.g. if being used for staging)

Resource requirements should be specified at the lowest granularity, e.g. transformation instead of session (as far as possible)
This will ensure better distribution in SonG

127

File Placement Best Practices


Files that should be placed on a high-bandwidth shared file system (CFS / NAS):
Source files
Lookup source files [sequential file access]
Target files [sequential file access]
Persistent cache files for lookup or incremental aggregation [random file access]

Files that should be placed on a shared file system where the bandwidth requirement is low (NFS):
Parameter files
Other configuration files
Indirect source or target files
Log files

128

File Placement Best Practices


Files that should be put on local storage:
Non-persistent cache files (i.e. sorter temporary files)
Intermediate target files for sequential merge
Other temporary files created during a session execution
$PmTempFileDir should point to a local file system

For best performance, ensure sufficient bandwidth for the shared file system and local storage (possibly by using additional disk I/O controllers)

129

Data Integration Certification Path

Informatica Certified Administrator
Recommended Training: PowerCenter QuickStart (eLearning); PowerCenter 8.5+ Administrator (4 days)
Required Exams: Architecture & Administration; Advanced Administration

Informatica Certified Developer
Recommended Training: PowerCenter QuickStart (eLearning); PowerCenter 8.5+ Administrator (4 days); PowerCenter Developer 8.x Level I (4 days); PowerCenter Developer 8 Level II (4 days)
Required Exams: Architecture & Administration; Mapping Design; Advanced Mapping Design

Informatica Certified Consultant
Recommended Training: PowerCenter QuickStart (eLearning); PowerCenter 8.5+ Administrator (4 days); PowerCenter Developer 8.x Level I (4 days); PowerCenter Developer 8 Level II (4 days); PowerCenter 8 Data Migration (4 days); PowerCenter 8 High Availability (1 day)
Required Exams: Architecture & Administration; Advanced Administration; Mapping Design; Advanced Mapping Design; Enablement Technologies

Additional Training: PowerCenter 8.5 New Features; PowerCenter 8.6 New Features; PowerCenter 8 Upgrade; PowerCenter 8 Team-Based Development; PowerCenter 8.5 Unified Security

130

Q&A

Bert Peters
Global Education Services, Principal Instructor

131

Course Evaluation

Bert Peters
Global Education Services, Principal Instructor

132

Appendix
Informatica Services by
Solution

133

B2B Data Exchange


Recommended Services
B2B

Professional Services
Strategy Engagements
B2B Data Transformation Architectural Review
Baseline Engagements
B2B Data Transformation Baseline Architecture
Implement Engagements
B2B Full Project Lifecycle
Transaction/Customer/Payment Hub

Education Services
Recommended Courses
Informatica B2B Data Transformation (D)
Informatica B2B Data Exchange (D)

Target Audience for Courses


D = Developer
M = Project Manager
A = Administrator

134

Data Governance
Recommended Services
Professional Services
Strategy Engagements
Informatica Environment
Assessment Service
Metadata Strategy and Enablement
Data Quality Audit
Baseline Engagements
Data Governance Implementation
Metadata Manager Quick Start
Informatica Data Quality Baseline
Deployment
Implement Engagements
Metadata Manager Customization
Data Quality Management
Implementation

Education Services
Recommended Courses
PowerCenter Level I Developer (D)
Informatica Data Explorer (D)
Informatica Data Quality (D)
Related Courses
PowerCenter Administrator (A)
Metadata Manager (D)
Certifications:
PowerCenter
Data Quality

Target Audience for Courses


D = Developer
M = Project Manager
A = Administrator

135

Data Migration
Recommended Services
Data Migration

Professional Services
Strategy Engagements
Data Migration Readiness
Assessment
Informatica Data Quality Audit
Baseline Engagements
PowerCenter Baseline Deployment
Informatica Data Quality (IDQ),
and/or Informatica Data Explorer
(IDE) Baseline Deployment
Implement Engagements
Data Migration Jumpstart
Data Migration End-to-End
Implementation

Education Services
Recommended Courses
Data Migration (M)
Informatica Data Explorer (D)
Informatica Data Quality (D)
PowerCenter Level I Developer (D)
Related Courses
PowerExchange Basics (D)
PowerCenter Administrator (A)
Certifications
PowerCenter
Data Quality

Target Audience for Courses


D = Developer
M = Project Manager
A = Administrator

136

Data Quality
Recommended Services
Data Quality

Professional Services
Strategy Engagements
Data Quality Management Strategy
Informatica Data Quality Audit
Baseline Engagements
Informatica Data Quality (IDQ), and/or Informatica Data Explorer (IDE) Baseline Deployment
Informatica Data Quality Web Services Quick Start
Implement Engagements
Data Quality Management Implementation

Education Services
Recommended Courses
Informatica Data Explorer (D)
Informatica Data Quality (D)
Related Courses
Informatica Identity Resolution (D)
PowerCenter Level I Developer (D)
Certifications
Data Quality

Target Audience for Courses


D = Developer
M = Project Manager
A = Administrator

137

Data Synchronization
Recommended Services
Data
Synchronization

Professional Services
Strategy Engagements
Project Definition and Assessment
Baseline Engagements
PowerExchange Baseline
Architecture Deployment
PowerCenter Baseline Architecture
Deployment
Implement Engagements
Data Synchronization
Implementation

Education Services
Recommended Courses
PowerCenter Level I Developer (D)
PowerCenter Level II Developer (D)
PowerCenter Administrator (A)
Related Courses
PowerExchange Basics Oracle RealTime CDC (D)
PowerExchange SQL RT (D)
PowerExchange for MVS DB2 (D)
Certifications
PowerCenter

Target Audience for Courses


D = Developer
M = Project Manager
A = Administrator

138

Enterprise Data Warehousing


Recommended Services
Data Warehouse

Professional Services
Strategy Engagements
Enterprise Data Warehousing (EDW)
Strategy
Informatica Environment
Assessment Service
Metadata Strategy & Enablement
Baseline Engagements
PowerCenter Baseline Architecture
Deployment
Implement Engagements
EDW Implementation

Education Services
Recommended Courses
PowerCenter Level I Developer (D)
PowerCenter Level II Developer (D)
PowerCenter Metadata Manager (D)
Related Courses
Informatica Data Quality (D)
Data Warehouse Development (D)
Certifications
PowerCenter

Target Audience for Courses


D = Developer
M = Project Manager
A = Administrator

139

Integration Competency Centers


Recommended Services
ICC

Professional Services
Strategy Engagements
ICC Assessment
Baseline Engagements
ICC Master Class Series
ICC Director
Implement Engagements
ICC Launch
ICC Implementation
Informatica Production Support

Education Services
Recommended Courses
ICC Overview (M)
PowerCenter Level I Developer (D)
PowerCenter Administrator (A)
Related Courses
Metadata Manager (D)
Informatica Data Explorer (D)
Informatica Data Quality (D)
Certifications
PowerCenter
Data Quality

Target Audience for Courses


D = Developer
M = Project Manager
A = Administrator

140

Master Data Management


Recommended Services
Master Data
Management

Professional Services
Strategy Engagements
Master Data Management (MDM) Strategy
Informatica Data Quality Audit
Baseline Engagements
Informatica Data Explorer (IDE) Baseline Deployment
Informatica Data Quality (IDQ) Baseline Deployment
PowerCenter Baseline Architecture Deployment
Implement Engagements
MDM Implementation

Education Services
Recommended Courses
Informatica Data Explorer (D)
Informatica Data Quality (D)
PowerCenter Level I Developer (D)
Related Courses
Metadata Manager (D)
Informatica Identity Resolution (D)
Certifications
PowerCenter
Data Quality

Target Audience for Courses
D = Developer
M = Project Manager
A = Administrator

141

Services Oriented Architecture


Recommended Services
Data Services

Professional Services
Strategy Engagements
Data Services (SOA) Strategy
Baseline Engagements
Informatica Web Services Quick Start
Informatica Data Quality Web Services Quick Start
Implement Engagements
Data Services (SOA) Implementation

Education Services
Recommended Courses
PowerCenter Level I Developer (D)
Informatica Data Quality (D)
Certifications
PowerCenter
Data Quality

Target Audience for Courses


D = Developer
M = Project Manager
A = Administrator

142

Governance, Risk & Compliance (GRC)


Recommended Services
Professional Services
Strategy Engagements
Informatica Environment
Assessment Service
Enterprise Data Warehouse Strategy
Data Quality Audit
Baseline Engagements
Informatica Data Quality Baseline
Deployment
Metadata Manager Quick Start
Implement Engagements
Risk Management Enablement Kit
Enterprise Data Warehouse
Implementation

Education Services
Recommended Courses
PowerCenter Level I Developer (D)
Informatica Data Explorer (D)
Informatica Data Quality (D)
Related Courses
Data Warehouse Development (D)
ICC Overview (M)
Metadata Manager (D)
Certifications
PowerCenter
Data Quality
Target Audience for Courses
D = Developer
M = Project Manager
A = Administrator

143

Mergers & Acquisitions (M&A)


Recommended Services
Professional Services
Strategy Engagements
Data Migration Readiness
Assessment
Informatica Data Quality Audit
Baseline Engagements
PowerCenter Baseline Deployment
Informatica Data Quality (IDQ),
and/or Informatica Data Explorer
(IDE) Baseline Deployment
Implement Engagements
Data Migration Jumpstart
Data Migration End-to-End
Implementation

Education Services
Recommended Courses
Data Migration (M)
PowerCenter Level I Developer (D)
Related Courses
Informatica Data Explorer (D)
Informatica Data Quality (D)
PowerExchange Basics (D)
Certifications
PowerCenter
Data Quality

Target Audience for Courses


D = Developer
M = Project Manager
A = Administrator

144

Deliver Your
Project Right
the First Time
with
Informatica
Professional
Services

145

Informatica Global Education Services


Joe Caputo, Director, Pfizer

"We launched an aggressive data migration project


that was to be completed in one year. The
complexity of the data schema along with the use of
Informatica PowerCenter tools proved challenging
to our top colleagues.
We believe that Informatica training led us to triple
productivity, helping us to complete the project on
its original 1-year schedule.

146

Informatica Contact Information


Informatica Corporation Headquarters
100 Cardinal Way
Redwood City, CA 94063
Tel: 650-385-5000
Toll-free: 800-653-3871
Toll-free Sales: 888-635-0899
Fax: 650-385-5500

Informatica EMEA Headquarters
Informatica Nederland B.V.
Edisonbaan 14a
3439 MN Nieuwegein
Postbus 116
3430 AC Nieuwegein
Tel: +31 (0) 30-608-6700
Fax: +31 (0) 30-608-6777

Informatica Asia/Pacific Headquarters
Informatica Australia Pty Ltd
Level 5, 255 George Street
Sydney
N.S.W. 2000
Australia
Tel: +612-8907-4400
Fax: +612-8907-4499

Global Customer Support


support@informatica.com
Register at my.informatica.com to open a new service request
or to check on the status of an existing SR.

http://www.informatica.com

147
