Performance Tuning
Version 8.6

Bert Peters
Global Education Services, Principal Instructor

Objectives
After completing this course, you will be able to:
Control how PowerCenter uses memory
Control how PowerCenter uses CPUs
Understand the performance counters
Isolate source, target and engine bottlenecks
Tune different types of bottlenecks
Configure Workflow and Session on Grid

Agenda
Memory optimization
Performance tuning methodology
Tuning source, target, & mapping bottlenecks
Pipeline partitioning
Server Grid
Q&A
Course evaluation

Anatomy of a Session

[Diagram: the Integration Service spawns the Data Transformation Manager (DTM). A READER thread moves source data into the DTM buffer, a TRANSFORMER thread processes it using the transformation caches, and a WRITER thread writes it to the target.]

5

Memory Optimization

[Diagram: the DTM buffer sits between the READER, TRANSFORMER, and WRITER threads; the TRANSFORMER also uses separate transformation caches.]

DTM Buffer
Temporary storage area for data
Buffer is divided into blocks
Buffer size and block size are tunable
Default setting for each is Auto

DTM Buffer Size Session Property

Default is Auto, meaning the DTM estimates the optimal size


Check session log for actual size allocation

DTM Buffer Block Size

Default is Auto
Check session log for actual size allocation

Reader Bottleneck
Transformer & writer threads wait for data

[Diagram: a slow READER leaves the DTM buffer empty; the TRANSFORMER and WRITER threads sit waiting.]

10

Transformer Bottleneck
Reader waits for free blocks; writer waits for data

[Diagram: a slow TRANSFORMER backs up the DTM buffer; the READER and WRITER threads sit waiting.]

11

Writer Bottleneck
Reader & transformer wait for free blocks

[Diagram: a slow WRITER backs up the DTM buffer; the READER and TRANSFORMER threads sit waiting.]

12

Source Row Logging

[Diagram: the READER waits for free blocks in the DTM buffer.]

Source rows must remain in the buffers until the transformation and writer threads process the corresponding rows downstream

13

Large Commit Interval

[Diagram: the READER waits for free blocks in the DTM buffer.]

Target rows remain in the buffers until the DTM reaches the commit point

14

Tuning the DTM Buffer

Extra buffers can keep threads busy

[Diagram: spare blocks in the DTM buffer cushion the READER, TRANSFORMER, and WRITER threads.]

15

Tuning the DTM Buffer

Temporary slowdowns in reading, transforming, or writing may cause large fluctuations in throughput
A slow thread typically provides data in spurts
Extra memory blocks can act as a cushion, keeping other threads busy in case of a bottleneck

16

Tuning the DTM Buffer


Buffer block size
Recommendation: at least 100 rows / block
Compute based on largest source or target row size
Typically not a significant bottleneck unless below 10 rows per block

Number of blocks
Minimum of 2 blocks required for each source, target, and XML group
(number of blocks) = 0.9 x ((DTM buffer size) / (buffer block size))

17
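As a quick sanity check of the 100-rows-per-block rule and the block-count formula above, a minimal Python sketch; the 2,000-byte row size and the 24 MB buffer are assumed figures for illustration:

row_size = 2000                      # assumed largest source/target row, in bytes
block_size = 100 * row_size          # rule of thumb: at least 100 rows per block
dtm_buffer_size = 24 * 1024 * 1024   # assumed 24 MB DTM buffer
num_blocks = int(0.9 * dtm_buffer_size / block_size)
print(block_size, num_blocks)        # 200000 bytes/block, 113 usable blocks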

Tuning the DTM Buffer


Determine the minimum DTM buffer size:
(DTM buffer size) = (buffer block size) x (minimum number of blocks) / 0.9

Increase by a multiple of the block size

If performance does not improve, return to the previous setting
There is no formula for optimal DTM buffer size
The Auto setting may be adequate for some sessions

18
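The minimum-size calculation can be checked the same way; the session shape (2 sources, 2 targets, no XML groups) is an assumed example:

block_size = 200_000                     # from the previous sketch
min_blocks = 2 * (2 + 2 + 0)             # 2 blocks per source, target & XML group
min_dtm_buffer = block_size * min_blocks / 0.9
print(round(min_dtm_buffer))             # 1777778 bytes, roughly 1.7 MB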

Transformation Caches
Temporary storage area for certain transformations
Except for Sorter, each is divided into a Data & Index Cache
The size of each transformation cache is tunable
If the runtime cache requirement > setting, the overflow is written to disk
The default setting for each cache is Auto

19

Tuning the Transformation Caches

Default is Auto
20

Max Memory for Transformation Caches

Only applies to transformation caches set to Auto

21

Max Memory for Transformation Caches


Two settings: fixed number & percentage
System uses the smaller of the two
If either setting is 0, the DTM assigns a default size to each transformation cache that's set to Auto

Recommendation: use the fixed limit if this is the only session running; otherwise, use the percentage
Use the percentage if running in a grid or HA environment

22

Tuning the Transformation Caches


If a cache setting is too small, DTM writes
overflow to disk
Determine if transformation caches are
overflowing:
Watch the cache directory on the file system while the
session runs
Use the session performance counters

Options to tune:
Increase the maximum memory allowed for Auto
transformation cache sizes
Set the cache sizes for individual transformations manually

23

Session Performance Counters

24

Performance Counters

25

Tuning the Transformation Caches


Non-zero counts for readfromdisk and writetodisk indicate sub-optimal settings for transformation index or data caches
This may indicate the need to tune transformation caches manually
Any manual setting allocates memory outside of the previously set maximum
Cache Calculators provide guidance in manual tuning of transformation caches

26

Aggregator Caches
Unsorted Input
Must read all input before releasing any output rows
Index cache contains group keys
Data cache contains non-group-by ports

Sorted Input
Releases an output row as each input group is processed
Does not require data or index cache (both = 0)
May run much faster than unsorted, BUT must consider the expense of sorting

27

Aggregator Caches Manual Tuning

28

Joiner Caches: Unsorted Input

[Diagram: MASTER and DETAIL pipelines feeding the joiner.]

Staging algorithm:
All master data loaded into cache
Specify the smaller data set as master

Index cache contains join keys
Data cache contains non-key connected outputs

29

Joiner Caches: Sorted Input

[Diagram: MASTER and DETAIL pipelines feeding the joiner.]

Streaming algorithm:
Both inputs must be sorted on join keys
Selected master data loaded into cache
Specify the data set with fewest records under a single key as master

Index cache contains up to 100 keys
Data cache contains non-key connected outputs associated with the 100 keys

30

Joiner Caches Manual Tuning

Cache calculator detects the sorted input property

31

Lookup Caches
To cache or not to cache?
Large number of invocations: cache
Large lookup table: don't cache
Flat file lookup is always cached

32

Lookup Caches
Data cache
Only connected output ports are included in the data cache
For an unconnected lookup, only the return port is included in the data cache

Index cache
Only lookup keys are included in the index cache

33

Lookup Caches
Fine-tuning the Lookup transformation cache:
SQL override
Persistent cache (if the lookup data is static)
Optimize the sort
Default: lookup keys, then connected output ports, in port order
Can be commented out or overridden in the SQL override
Indexing strategy on the table may impact performance
The Use Any Value property suppresses the sort

34

Lookup Caches
Can build lookup caches concurrently
May improve session performance when there is significant activity upstream from the lookup & the lookup cache is large
This option applies to the individual session

The Integration Service builds lookup caches at the beginning of the session run, even if no row has entered a Lookup transformation

Session properties > Config Object tab > Advanced settings

35

Lookup Caches Manual Tuning

36

Rank Caches
Index cache contains group keys
Data cache contains non-group-by ports
Cache sizes related to the number of groups &
the number of ranks

37

Rank Caches Manual Tuning

38

Sorter Cache
Sorter Transformation
May be faster than a DB sort or a 3rd-party sorter
Index read from the RDBMS = pre-sorted data
SQL SELECT DISTINCT may reduce the volume of data across the network versus a Sorter with the Distinct property set

Single cache (no separation of index & data)

39

Sorter Cache Manual Tuning

40

64-bit vs. 32-bit OS

Take advantage of large memory support in 64-bit
Cache-based transformations like Sorter, Lookup, Aggregator, Joiner, and XML Target can address larger blocks of memory

41

Maximum Memory Allocation Example


Parameters

64-bit OS
Total system memory: 32 GB
Maximum allowed for transformation caches: 5 GB or 10%
DTM Buffer: 24 MB
One transformation manually configured
Index Cache: 10 MB
Data Cache: 20 MB
All other transformations set to Auto

42

Maximum Memory Allocation Example


Result
10% of 32 GB = 3.2 GB < 5 GB: max allowed for transformation caches = 3.2 GB = 3200 MB
Manually configured transformation uses 30 MB
DTM Buffer uses 24 MB
3200 + 30 + 24 = 3254 MB
Note that 3254 MB represents an upper limit; cached transformations may use less than the 3200 MB max

43
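The arithmetic above, restated as a small Python sketch (note the slide treats 3.2 GB as 3200 MB, so the sketch does the same):

total_memory_gb = 32
fixed_limit_gb = 5
pct_limit_gb = 0.10 * total_memory_gb                        # 3.2 GB
cache_limit_mb = min(fixed_limit_gb, pct_limit_gb) * 1000    # smaller limit wins: 3200 MB
manual_caches_mb = 10 + 20                                   # manually configured index + data caches
dtm_buffer_mb = 24
print(int(cache_limit_mb + manual_caches_mb + dtm_buffer_mb))  # 3254 MB upper limit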

Performance Tuning Methodology


It is an iterative process

Establish benchmark
Optimize memory
Isolate bottleneck
Tune bottleneck
Take advantage of under-utilized CPU & memory

44

The Production Environment

Multi-vendor, multi-system environment with many components:
Operating systems, databases, networks and I/O
Usually need to monitor performance in several places
Usually need to monitor outside Informatica as well

[Diagram: PowerCenter, OS, and DBMS hosts, each with many disks, connected over a LAN/WAN.]

45

The Production Environment

Tuning involves an iterative approach:
1. Identify the biggest performance problem
2. Eliminate or reduce it
3. Return to step 1

[Diagram: the same multi-vendor environment as above.]

46

Preliminary Steps
Eliminate transformation errors & data rejects
"First make it work, then make it faster"

Source row logging requires the reader to hold onto buffers until data is written to the target, EVEN IF THERE ARE NO ERRORS; this can significantly increase the DTM buffer requirement
You may want to set "Stop on errors" to 1

47

Preliminary Steps
Override tracing level to terse or normal
Override at session level to avoid having to examine each
transformation in the mapping
Only use verbose tracing during development & only with
very small data sets
If you expect row errors that you will not need to correct,
avoid logging them by overriding the tracing level to terse
(not recommended as a long-term error handling solution)

48

Benchmarking
Hardware (CPU bandwidth, RAM, disk space,
etc.) should be similar to production
Database configuration should be similar to
production
Data volume should be similar to production
Challenge: production data is constantly
changing
Optimal tuning may be data dependent
Estimate average behavior
Estimate worst case behavior

49

Benchmarking Conditional Branching

Scenario: a high percentage of test data goes to TARGET1, but a high percentage of production data goes to TARGET2
Tuning of the sorter & aggregator could be overlooked in test

50

Benchmarking Conditional Branching

Scenario: a high percentage of production data goes to TARGET1 on Monday's load, but a high percentage of production data goes to TARGET2 on Tuesday's load
Performance of the 2 loads may differ significantly
51

Benchmarking Conditional Branching


Conditional branching poses a challenge in
performance tuning
Volume & CHARACTERISTICS of data should be
consistent between test & production
May need to estimate average behavior
May want to tune for worst-case scenario

52

Identifying Bottlenecks
The first challenge is to identify the bottleneck

Target
Source
Transformations
Mapping/Session

Tuning the most severe bottleneck may reveal another one
This is an iterative process

53

Thread Statistics
The DTM spawns multiple threads
Each thread has busy time & idle time
Goal: maximize the busy time & minimize the idle time

54

Thread Statistics - Terminology


A pipeline consists of:
A source qualifier
The sources that feed that source qualifier
All transformations and targets that receive data from that
source qualifier

55

Thread Statistics - Terminology

[Diagram: PIPELINE 1 feeds the MASTER input of a joiner; PIPELINE 2 feeds the DETAIL input.]

A pipeline on the master input of a joiner terminates at the joiner
56

Thread Statistics - Terminology


Stage
a portion of a pipeline; implemented at runtime
as a thread
Partition Point
boundary between 2 stages; always associated
with a transformation

57

Using Thread Statistics

By default, PowerCenter assigns a partition point at each Source Qualifier, Target, Aggregator, and Rank.

[Diagram: partition points divide the pipeline into a Reader Thread (First Stage), a Transformation Thread (Second Stage), a Transform Thread (Third Stage), and a Writer Thread (Fourth Stage).]
58

Target Bottleneck
The Aggregator transformation stage is waiting for target buffers

[Diagram: Reader Thread (First Stage) and Transformation Thread (Second Stage) upstream; Transform Thread (Third Stage) Busy%=15; Writer Thread (Fourth Stage) Busy%=95.]
59

Transformation Bottleneck
Both the reader & writer are waiting for buffers

[Diagram: Reader Thread (First Stage) Busy%=15; Transformation Thread (Second Stage) Busy%=60; Transform Thread (Third Stage) Busy%=95; Writer Thread (Fourth Stage) Busy%=10.]
60

Thread Statistics in Session Log


***** RUN INFO FOR TGT LOAD ORDER GROUP [1], CONCURRENT SET [1] *****
Thread [READER_1_1_1] created for [the read stage] of partition point
[SQ_SortMergeDataSize_Detail] has completed.
Total Run Time = [318.271977] secs
Total Idle Time = [176.488675] secs
Busy Percentage = [44.547843]
Thread [TRANSF_1_1_1] created for [the transformation stage] of partition point
[SQ_SortMergeDataSize_Detail] has completed.
Total Run Time = [707.803168] secs
Total Idle Time = [105.303059] secs
Busy Percentage = [85.122550]
Thread work time breakdown:
JNRTRANS: 10.869565 percent
SRTTRANS: 89.130435 percent

61
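The busy percentage reported in the log is derived from the run and idle times; a minimal Python sketch using the reader thread's figures above:

run_time = 318.271977        # Total Run Time (secs) for READER_1_1_1
idle_time = 176.488675       # Total Idle Time (secs)
busy_pct = (run_time - idle_time) / run_time * 100
print(f"{busy_pct:.6f}")     # 44.547843, matching the log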

Performance Counters in WF Monitor

62

Integration Service Monitor in WF Monitor

63

Session Statistics in WF Monitor

64

Other Methods of Bottleneck Isolation


Write to flat file
If significantly faster than relational target: Target Bottleneck
Place a FALSE Filter right after the Source Qualifier
If significantly faster: Transformation Bottleneck
If target & transformation bottlenecks are ruled out: Source Bottleneck

65

Target Optimization
Target optimization often involves non-Informatica components
Drop indexes and constraints
Use pre/post SQL to drop and rebuild
Use pre/post-load stored procedures

Use constraint-based loading only when necessary

66

Target Optimization
Use Bulk Loading
Informatica bypasses the database log
Target cannot perform rollback
Weigh importance of performance over recovery

Use External Loader
Similar to bulk loader, but the DB reads from a flat file

67

Target Optimization
Transaction Control

Target commit type


Best performance, least precise control
System avoids writing partially-filled buffers

Source commit type

Last active source to feed a target becomes a transaction generator


Commit interval provides precise control
Slower than target commit type
Avoid setting commit interval too low

User Defined commit type


Required when mapping contains transaction control transformation
Provides precise data-driven control
Slower than target and source commit types

68

Target Optimization
"Update else insert" session property
Works well if you rarely insert
Index required for the update key, but it slows down inserts
PowerCenter must wait for the database to return an error before inserting

Alternative: lookup followed by an Update Strategy

69

Source Bottlenecks
Source optimization often involves non-Informatica components
Generated SQL available in the session log
Execute directly against the DB
Update statistics on the DB
Use the tuned SELECT as a SQL override

Set the Line Sequential Buffer Length session property to correspond with the record size

70

Source Bottlenecks
Avoid transferring more than once from remote
machine
Avoid reading same data more than once
Filter at source if possible (reduce data set)
Minimize connected outputs from the source
qualifier
Only connect what you need
The DTM only includes connected outputs when it
generates the SQL SELECT statement

71

Reduce Data Set


Remove Unnecessary Ports
Not all ports are needed
Fewer ports = better performance & lower memory requirements

Reduce Rows in Pipeline


Place Filter Transformation as far upstream as possible
Filter before aggregator, rank, or sorter if possible
Filter in source qualifier if possible

72

Avoid Unnecessary Sorting

[Diagram: an example mapping in which an XML parser (XML_PARSER_PME_EQT_ENT_v1_2) feeds a long chain of Sorter (srt_*) and Joiner (jnr_*) transformations; many of the sorts are candidates for elimination.]

73

Expression Language Tips


Functions are more expensive than operators
Use || instead of CONCAT()

Use variable ports to factor out common logic

74

Expression Language Tips

Simplify nested functions when possible
instead of:
IIF(condition1, result1, IIF(condition2, result2, IIF( ... )))
try:
DECODE(TRUE,
condition1, result1,
:
conditionN, resultN)

75

General Guidelines
Data type conversions are expensive; avoid them if possible
All-input transformations (such as Aggregator, Rank, etc.) are more expensive than pass-through transformations
An all-input transformation must process multiple input rows before it can produce any output

76

General Guidelines
High precision (session property) is expensive, but only applies to the decimal data type
UNICODE requires 2 bytes per character; ASCII requires 1 byte per character
The performance difference depends on the number of string ports only

77

Transformation Specific
Reusable Sequence Generator
Number of Cached Values Property
Purpose: enables different sessions to share the
same sequence without generating the same
numbers
>0: allocates the specified number of values &
updates the current value in the repository at the
end of each block
(each session gets a different block of numbers)

78

Transformation Specific
Reusable Sequence Generator
Number of Cached Values Property
Setting too low causes frequent repository
access, which impacts performance
Unused values in a block are lost; this leads to
gaps in the sequence
Consider alternatives
example: non-reusable sequence generators,
one generates even numbers, & the other
generates odd numbers

79
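A minimal Python sketch of the block-allocation behavior described above; the block size of 1,000 and the two sessions are assumed values for illustration:

current_value = 1                        # sequence state stored in the repository

def reserve_block(cached_values):
    """Each session reserves a block; the repository's current value moves past it."""
    global current_value
    start = current_value
    current_value += cached_values
    return range(start, start + cached_values)

session_a = reserve_block(1000)          # values 1..1000
session_b = reserve_block(1000)          # values 1001..2000, no duplicates
# If session A only consumes 600 values, 601..1000 are never reused: a gap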

Other Transformations
Normalizer
This transformation INCREASES the number of rows
Place as far downstream as possible

XML Reader / Midstream XML Parser
Remove groups that are not projected
We do not allocate memory for these groups, but still need to maintain PK/FK relationships
Don't leave port size lengths as infinite; use an appropriate length

80

Iterative Process
After tuning your bottlenecks, revisit memory
optimization
Tuning often REDUCES memory requirements
(you might even be able to change some settings back to Auto)

Change one thing at a time & record your results

81

Partitioning
Apply after optimizing source, target, &
transformation bottlenecks
Apply after optimizing memory usage
Exploit under-utilized CPU & memory
To customize partitioning settings, you need the
partitioning license

82

Partitioning Terminology
Partition
subset of the data
Stage
a portion of a pipeline
Partition Point
boundary between 2 stages
Partition Type
algorithm for distributing data among partitions;
always associated with a partition point

83

Threads, Partition Points and Stages


The DTM implements each stage as a thread;
hence, stages run in parallel
You may add or remove partition points

[Diagram: Reader Thread (First Stage), Transformation Thread (Second Stage), Transform Thread (Third Stage), Writer Thread (Fourth Stage).]

84

Rules for Adding Partition Points


You cannot add a partition point to a Sequence
Generator
You cannot add a partition point to an unconnected
transformation
You cannot add a partition point on a source
definition
If a pipeline is split and then concatenated, you
cannot add a partition point on any transformation
between the split and the concatenation
Adding or removing partition points requires the
partitioning license
85

Guidelines for Adding Partition Points


Make sure you have ample CPU bandwidth
Make sure you have gone through other optimization
techniques
Add on complex transformations that could benefit
from additional threads
If you have >1 partition, add where data needs to be re-distributed
Aggregator, Rank, or Sorter, where data must be grouped
Where data is distributed unevenly
On partitioned sources and targets

86

Partition Points & Partitions

Partitions subdivide the data

Each partition represents a thread within a stage

Each partition point distributes the data among the partitions

[Diagram: at each stage, one thread per partition: 3 Reader Threads (First Stage), 3 Transformation Threads (Second Stage), 3 more transformation threads (Third Stage), and 3 Writer Threads (Fourth Stage).]

87

Session Partitioning GUI

The number next to each flag shows the number of partitions

The color of each flag indicates the partition type

88

Rules for Adding Partitions


The master input of a joiner can only have 1 partition
unless you add a partition point at the joiner
A pipeline with an XML target can only have 1
partition
If the pipeline has a relational source or target and
you define n partitions, each database must support
n parallel connections
A pipeline containing a custom or external
procedure transformation can only have 1 partition
unless those transformations are configured to allow
multiple partitions
89

Rules for Adding Partitions


The number of partitions is constant on a given
pipeline
If you have a partition point on a Joiner, the number of
partitions on both inputs will be the same

At each partition point, you can specify how you want the data distributed among the partitions (this is known as the partition type)

90

Guidelines for Adding Partitions


Make sure you have ample CPU bandwidth &
memory
Make sure you have gone through other
optimization techniques
Add 1 partition at a time & monitor the CPU
When CPU usage approaches 100%, don't add any more partitions

Take advantage of database partitioning

91

Partition Types
Each partition point is associated with a partition
type
The partition type defines how the DTM is to
distribute the data among the partitions
If the pipeline has only 1 partition, the partition
point serves only to add a stage to the pipeline
There are restrictions, enforced by the GUI, on
which partition types are valid at which partition
points

92

Partition Types Pass Through


Data is processed without redistributing the rows
among partitions
Serves only to add a stage to the pipeline
Use when you want an additional thread for a complex transformation but you don't need to redistribute the data (or you only have 1 partition)

93

Partition Types Key Range


The DTM passes data to each partition depending on
user-specified ranges
You may use several ports to form a compound
partition key
The DTM discards rows not falling in any specified
range
If 2 or more ranges overlap, a row can go down more than 1 partition, resulting in duplicate data
Use key range partitioning when the sources or
targets in the pipeline are partitioned by key range
94

Partition Types Round Robin


The Integration Service distributes rows of data
evenly to all partitions
Use when there is no need to group data among
partitions
Use when reading flat file sources of different
sizes
Use when data has been partitioned unevenly
upstream and requires significantly more
processing before arriving at the target

95

Partition Types Hash Auto Keys


The DTM applies a hash function to a partition
key to group data among partitions
Use hash partitioning to ensure that groups of
rows are processed in the same partition
The DTM automatically determines the partition
key based on:
aggregator or rank group keys
join keys
sort keys

96

Partition Types Hash User Keys


This is similar to hash auto keys except the user
specifies which ports make up the partition key
Alternative to hard-coded key range partition on
relational target (if DB table is partitioned)

97

Partition Types Database


Only valid for DB2 and Oracle databases in a
multi-node database
Sources: Oracle and DB2
Targets: DB2 only

The number of partitions does not have to equal the number of database nodes
Performance may be better if they are equal, however

98

Partitioning with Relational Sources


PowerCenter creates a separate source
database connection for each partition
If you define n partitions, the source database
must support n parallel connections
The DTM generates a separate SQL Query for
each partition
Each query can be overridden
PowerCenter reads the data concurrently

99

Partitioning with Flat File Sources


Multiple flat files
Each partition reads a different file
PowerCenter reads the files in parallel
If the files are of unequal sizes, you may want to repartition
the data round-robin

Single flat file

PowerCenter makes multiple parallel connections to the same file based on the number of partitions specified
PowerCenter distributes the data randomly to the partitions
Over a large volume of data, this random distribution tends to have an effect similar to round robin: partition sizes tend to be equal

100

Partitioning with Relational Targets


The DTM creates a separate target database
connection for each partition
The DTM loads data concurrently
If you define n partitions, database must support
n concurrent connections

101

Partitioning with Flat File Targets


The DTM writes output for each partition to a
separate file
Connection settings and properties can be
configured for each partition
The DTM can merge the target files if all have
connections local to the Integration Service
machine
The DTM writes the data concurrently

102

Partitioning - Memory Requirements
Minimum number of buffer blocks = (2 blocks per source, target, & XML group) x (number of partitions)

Optimal number of buffer blocks = (optimal number for 1 partition) x (number of partitions)

103
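Both formulas, applied to an assumed session shape (1 source, 1 target, no XML groups, 3 partitions) as a minimal Python sketch:

partitions = 3
sources, targets, xml_groups = 1, 1, 0

min_blocks = 2 * (sources + targets + xml_groups) * partitions   # minimum: 12 blocks
optimal_single = 100             # assumed tuned block count for 1 partition
optimal_blocks = optimal_single * partitions                     # optimal: 300 blocks
print(min_blocks, optimal_blocks)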

Cache Partitioning
DTM may create separate caches for each
partition for each cached transformation; this is
called cache partitioning
DTM treats cache size settings as per partition
for example, if you configure an aggregator with:
2 MB for the index cache,
3 MB for the data cache,
& you create 2 partitions
DTM will allocate up to 4 MB & 6 MB total

DTM does not partition lookup or joiner caches unless the lookup or joiner itself is a partition point

104
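The aggregator example above, restated as a tiny sketch: the configured cache sizes are per partition, so the totals scale with the partition count.

partitions = 2
index_cache_mb, data_cache_mb = 2, 3           # configured per-partition sizes

total_index_mb = index_cache_mb * partitions   # up to 4 MB across partitions
total_data_mb = data_cache_mb * partitions     # up to 6 MB across partitions
print(total_index_mb, total_data_mb)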

Cache Partitioning

[Diagram: each partition has its own cache(s): separate index, data, and sorter caches per partition.]

105

Cache Partitioning

[Diagram: with a partition point on the joiner, each partition has its own index and data cache(s).]

106

Cache Partitioning

[Diagram: with no partition point on the joiner, however, all partitions share 1 set of caches.]

107

Monitoring Partitions
The Workflow Monitor provides runtime details
for each partition
Per partition, you can determine the following:
Number of rows processed
Memory usage
CPU usage

If one partition is doing more work than the others, you may want to redistribute the data

108

Pipeline Partitioning Example


Scenario:
Student record processing
XML source and Oracle target
XML source is split into 3 files

109

Pipeline Partitioning Example


[Diagram: Partitions 1, 2, and 3, each reading one of the 3 XML source files.]

Solution: Define a partition for each of the 3 source files

110

Pipeline Partitioning Example


[Diagram: round-robin (RR) partition point added at the filter.]

Problem: Source files vary in size, resulting in unequal workloads for each partition
Solution: Use Round Robin partitioning on the filter to balance the load
111

Pipeline Partitioning Example


[Diagram: hash auto-keys partition point added at the rank.]

Problem: Potential for splitting rank groups
Solution: Use hash auto-keys partitioning on the rank to group rows appropriately
112

Pipeline Partitioning Example


[Diagram: key range partition point added at the target.]

Problem: Target tables are partitioned on Oracle by key range
Solution: Use target Key Range partitioning to optimize writing to target tables
113

Dynamic Partitioning
Integration Service can automatically set the
number of partitions at runtime.
Useful when the data volume increases or the
number of CPUs available changes.
Basis for the number of partitions is specified as
a session property

114

Concurrent Workflow Execution (8.5)


Prior to 8.5
Only one instance of a workflow can run
Users duplicate workflows: maintenance issues
Concurrent sessions required duplicating the session

115

Concurrent Workflow Execution


Allow workflow instances to be run
concurrently
Override parameters/ variables across run
instances
Same scheduler across multiple instances
Supports independent recovery/ failover
semantics

116

Concurrent Workflow Execution

117

Workflow on Grid (WonG)


The Integration Service is deployed on a grid: an IS service process (pmserver) runs on each node in the grid
Allows tasks of a workflow to be distributed across the grid; no user configuration necessary if all nodes are homogeneous

118

Workflow on Grid (WonG)


Different sessions in a workflow are dispatched on different nodes to balance load
Use workflow on grid if:
There are many concurrent sessions and workflows
You want to leverage multiple machines in the environment

119

Load Balancer Modes


Round Robin
Honors Max Number of Processes per Node

Metric-based
Evaluates nodes in round-robin
Honors resource provision thresholds
Uses stats from the last 3 runs; if no statistics have been collected yet, defaults are used (40 MB memory, 15% CPU)

120

Load Balancer Modes


Adaptive
Selects the node with the most available CPU
Honors resource provision thresholds
Uses statistics from the last 3 runs of a task to determine whether a task can run on a node
Bypass in dispatch queue: skip tasks in the queue that are more resource intensive and can't be dispatched to any currently available nodes
CPU Profile: ranks node CPU performance against a baseline system

All modes take into account the service level assigned to workflows

121

Session on Grid (SonG)


Session partitioned and dispatched across multiple nodes
Allows unlimited scalability
Sources and targets may be on different nodes
More suited to large sessions
Smaller machines in a grid are a lower-cost option than large multi-CPU machines

122

Session on Grid (SonG)


Session on Grid will scale if:
Sessions are CPU/memory intensive enough to overcome the overhead of data movement over the network
I/O is kept localized to each node running the partition
There is fast shared storage (e.g. NAS, clustered FS)
Partitions are independent

Source and target have different connections that are only available on different machines
E.g. source Excel files on Windows, and the target is only available on UNIX

Supported on a homogeneous grid

123

Configuring Session on Grid


Enable the Session on Grid attribute in the session configuration tab
Assign the workflow to be executed by an Integration Service that has been assigned to a grid

124

Dynamic Partitioning
Based on user specification (# partitions)
Can parameterize as $DynamicPartitionCount

Based on # of nodes in grid


Based on source partitioning (Database partitioning)

125

SonG Partitioning Guidelines


Set # of partitions = # of nodes to get an even distribution
Tip: use the dynamic partitioning feature to ease expansion of the grid

In addition, continue to create partition points to achieve parallelism

126

SonG Partitioning Guidelines


To minimize data traffic across nodes:
Use the pass-through partition type, which will try to keep transformations on the same node
Use the resource map to dispatch the source and target transformations to the node where the source or target is located
Keep the target files unmerged whenever possible (e.g. if being used for staging)

Resource requirements should be specified at the lowest granularity, e.g. transformation instead of session (as far as possible)
This will ensure better distribution in SonG

127

File Placement Best Practices


Files that should be placed on a high-bandwidth shared file system (CFS / NAS):
Source files
Lookup source files [sequential file access]
Target files [sequential file access]
Persistent cache files for lookup or incremental aggregation [random file access]

Files that should be placed on a shared file system where the bandwidth requirement is low (NFS):
Parameter files
Other configuration files
Indirect source or target files
Log files

128

File Placement Best Practices


Files that should be put on local storage:
Non-persistent cache files (i.e. sorter temporary files)
Intermediate target files for sequential merge
Other temporary files created during a session execution
$PmTempFileDir should point to a local file system

For best performance, ensure sufficient bandwidth for the shared file system and local storage (possibly by using additional disk I/O controllers)

129

Data Integration Certification Path

Informatica Certified Administrator
Recommended Training: PowerCenter QuickStart (eLearning); PowerCenter 8.5+ Administrator (4 days)
Required Exams: Architecture & Administration; Advanced Administration

Informatica Certified Developer
Recommended Training: PowerCenter QuickStart (eLearning); PowerCenter 8.5+ Administrator (4 days); PowerCenter Developer 8.x Level I (4 days); PowerCenter Developer 8 Level II (4 days)
Required Exams: Architecture & Administration; Mapping Design; Advanced Mapping Design

Informatica Certified Consultant
Recommended Training: PowerCenter QuickStart (eLearning); PowerCenter 8.5+ Administrator (4 days); PowerCenter Developer 8.x Level I (4 days); PowerCenter Developer 8 Level II (4 days); PowerCenter 8 Data Migration (4 days); PowerCenter 8 High Availability (1 day)
Required Exams: Architecture & Administration; Advanced Administration; Mapping Design; Advanced Mapping Design; Enablement Technologies

Additional Training: PowerCenter 8.5 New Features; PowerCenter 8.6 New Features; PowerCenter 8 Upgrade; PowerCenter 8 Team-Based Development; PowerCenter 8.5 Unified Security

130

Q&A

Bert Peters
Global Education Services, Principal Instructor

131

Course Evaluation

Bert Peters
Global Education Services, Principal Instructor

132

Appendix
Informatica Services by
Solution

133

B2B Data Exchange


Recommended Services
B2B

Professional Services
Strategy Engagements
B2B Data Transformation Architectural Review
Baseline Engagements
B2B Data Transformation Baseline Architecture
Implement Engagements
B2B Full Project Lifecycle
Transaction/Customer/Payment Hub

Education Services
Recommended Courses
Informatica B2B Data Transformation (D)
Informatica B2B Data Exchange (D)

Target Audience for Courses


D = Developer
M = Project Manager
A = Administrator

134

Data Governance
Recommended Services
Professional Services
Strategy Engagements
Informatica Environment
Assessment Service
Metadata Strategy and Enablement
Data Quality Audit
Baseline Engagements
Data Governance Implementation
Metadata Manager Quick Start
Informatica Data Quality Baseline
Deployment
Implement Engagements
Metadata Manager Customization
Data Quality Management
Implementation

Education Services
Recommended Courses
PowerCenter Level I Developer (D)
Informatica Data Explorer (D)
Informatica Data Quality (D)
Related Courses
PowerCenter Administrator (A)
Metadata Manager (D)
Certifications:
PowerCenter
Data Quality

Target Audience for Courses


D = Developer
M = Project Manager
A = Administrator

135

Data Migration
Recommended Services
Data Migration

Professional Services
Strategy Engagements
Data Migration Readiness
Assessment
Informatica Data Quality Audit
Baseline Engagements
PowerCenter Baseline Deployment
Informatica Data Quality (IDQ),
and/or Informatica Data Explorer
(IDE) Baseline Deployment
Implement Engagements
Data Migration Jumpstart
Data Migration End-to-End
Implementation

Education Services
Recommended Courses
Data Migration (M)
Informatica Data Explorer (D)
Informatica Data Quality (D)
PowerCenter Level I Developer (D)
Related Courses
PowerExchange Basics (D)
PowerCenter Administrator (A)
Certifications
PowerCenter
Data Quality

Target Audience for Courses


D = Developer
M = Project Manager
A = Administrator

136

Data Quality
Recommended Services
Data Quality

Professional Services
Strategy Engagements
Data Quality Management Strategy
Informatica Data Quality Audit
Baseline Engagements
Informatica Data Quality (IDQ), and/or Informatica Data Explorer (IDE) Baseline Deployment
Informatica Data Quality Web Services Quick Start
Implement Engagements
Data Quality Management Implementation

Education Services
Recommended Courses
Informatica Data Explorer (D)
Informatica Data Quality (D)
Related Courses
Informatica Identity Resolution (D)
PowerCenter Level I Developer (D)
Certifications
Data Quality

Target Audience for Courses


D = Developer
M = Project Manager
A = Administrator

137

Data Synchronization
Recommended Services
Data
Synchronization

Professional Services
Strategy Engagements
Project Definition and Assessment
Baseline Engagements
PowerExchange Baseline
Architecture Deployment
PowerCenter Baseline Architecture
Deployment
Implement Engagements
Data Synchronization
Implementation

Education Services
Recommended Courses
PowerCenter Level I Developer (D)
PowerCenter Level II Developer (D)
PowerCenter Administrator (A)
Related Courses
PowerExchange Basics Oracle RealTime CDC (D)
PowerExchange SQL RT (D)
PowerExchange for MVS DB2 (D)
Certifications
PowerCenter

Target Audience for Courses


D = Developer
M = Project Manager
A = Administrator

138

Enterprise Data Warehousing


Recommended Services
Data Warehouse

Professional Services
Strategy Engagements
Enterprise Data Warehousing (EDW)
Strategy
Informatica Environment
Assessment Service
Metadata Strategy & Enablement
Baseline Engagements
PowerCenter Baseline Architecture
Deployment
Implement Engagements
EDW Implementation

Education Services
Recommended Courses
PowerCenter Level I Developer (D)
PowerCenter Level II Developer (D)
PowerCenter Metadata Manager (D)
Related Courses
Informatica Data Quality (D)
Data Warehouse Development (D)
Certifications
PowerCenter

Target Audience for Courses


D = Developer
M = Project Manager
A = Administrator

139

Integration Competency Centers


Recommended Services
ICC

Professional Services
Strategy Engagements
ICC Assessment
Baseline Engagements
ICC Master Class Series
ICC Director
Implement Engagements
ICC Launch
ICC Implementation
Informatica Production Support

Education Services
Recommended Courses
ICC Overview (M)
PowerCenter Level I Developer (D)
PowerCenter Administrator (A)
Related Courses
Metadata Manager (D)
Informatica Data Explorer (D)
Informatica Data Quality (D)
Certifications
PowerCenter
Data Quality

Target Audience for Courses


D = Developer
M = Project Manager
A = Administrator

140

Master Data Management


Recommended Services
Master Data
Management

Professional Services
Strategy Engagements
Master Data Management (MDM) Strategy
Informatica Data Quality Audit
Baseline Engagements
Informatica Data Explorer (IDE) Baseline Deployment
Informatica Data Quality (IDQ) Baseline Deployment
PowerCenter Baseline Architecture Deployment
Implement Engagements
MDM Implementation

Education Services
Recommended Courses
Informatica Data Explorer (D)
Informatica Data Quality (D)
PowerCenter Level I Developer (D)
Related Courses
Metadata Manager (D)
Informatica Identity Resolution (D)
Certifications
PowerCenter
Data Quality

Target Audience for Courses
D = Developer
M = Project Manager
A = Administrator

141

Services Oriented Architecture


Recommended Services
Data Services

Professional Services
Strategy Engagements
Data Services (SOA) Strategy
Baseline Engagements
Informatica Web Services Quick Start
Informatica Data Quality Web Services Quick Start
Implement Engagements
Data Services (SOA) Implementation

Education Services
Recommended Courses
PowerCenter Level I Developer (D)
Informatica Data Quality (D)
Certifications
PowerCenter
Data Quality

Target Audience for Courses


D = Developer
M = Project Manager
A = Administrator

142

Governance, Risk & Compliance (GRC)


Recommended Services
Professional Services
Strategy Engagements
Informatica Environment
Assessment Service
Enterprise Data Warehouse Strategy
Data Quality Audit
Baseline Engagements
Informatica Data Quality Baseline
Deployment
Metadata Manager Quick Start
Implement Engagements
Risk Management Enablement Kit
Enterprise Data Warehouse
Implementation

Education Services
Recommended Courses
PowerCenter Level I Developer (D)
Informatica Data Explorer (D)
Informatica Data Quality (D)
Related Courses
Data Warehouse Development (D)
ICC Overview (M)
Metadata Manager (D)
Certifications
PowerCenter
Data Quality
Target Audience for Courses
D = Developer
M = Project Manager
A = Administrator

143

Mergers & Acquisitions (M&A)


Recommended Services
Professional Services
Strategy Engagements
Data Migration Readiness
Assessment
Informatica Data Quality Audit
Baseline Engagements
PowerCenter Baseline Deployment
Informatica Data Quality (IDQ),
and/or Informatica Data Explorer
(IDE) Baseline Deployment
Implement Engagements
Data Migration Jumpstart
Data Migration End-to-End
Implementation

Education Services
Recommended Courses
Data Migration (M)
PowerCenter Level I Developer (D)
Related Courses
Informatica Data Explorer (D)
Informatica Data Quality (D)
PowerExchange Basics (D)
Certifications
PowerCenter
Data Quality

Target Audience for Courses


D = Developer
M = Project Manager
A = Administrator

144

Deliver Your
Project Right
the First Time
with
Informatica
Professional
Services

145

Informatica Global Education Services


Joe Caputo, Director, Pfizer

"We launched an aggressive data migration project


that was to be completed in one year. The
complexity of the data schema along with the use of
Informatica PowerCenter tools proved challenging
to our top colleagues.
We believe that Informatica training led us to triple
productivity, helping us to complete the project on
its original 1-year schedule.

146

Informatica Contact Information


Informatica Corporation Headquarters
100 Cardinal Way
Redwood City, CA 94063
Tel: 650-385-5000
Toll-free: 800-653-3871
Toll-free Sales: 888-635-0899
Fax: 650-385-5500

Informatica EMEA Headquarters
Informatica Nederland B.V.
Edisonbaan 14a
3439 MN Nieuwegein
Postbus 116
3430 AC Nieuwegein
Tel: +31 (0) 30-608-6700
Fax: +31 (0) 30-608-6777

Informatica Asia/Pacific Headquarters
Informatica Australia Pty Ltd
Level 5, 255 George Street
Sydney
N.S.W. 2000
Australia
Tel: +612-8907-4400
Fax: +612-8907-4499

Global Customer Support


support@informatica.com
Register at my.informatica.com to open a new service request
or to check on the status of an existing SR.

http://www.informatica.com

147
