
1

Performance Tuning
Version 8.6

Bert Peters
Global Education Services, Principal Instructor

2
Objectives

After completing this course you will be able to:


• Control how PowerCenter uses memory
• Control how PowerCenter uses CPUs
• Understand the performance counters
• Isolate source, target and engine bottlenecks
• Tune different types of bottlenecks
• Configure Workflow and Session on Grid

3
Agenda

• Memory optimization
• Performance tuning methodology
• Tuning source, target, & mapping bottlenecks
• Pipeline partitioning
• Server Grid
• Q&A
• Course evaluation

4
Anatomy of a Session

[Diagram: the Integration Service runs the Data Transformation Manager (DTM). The READER thread moves source data into the DTM buffer, the TRANSFORMER thread processes it using the transformation caches, and the WRITER thread writes the data to the target.]

5
Memory Optimization

[Diagram: the two tunable memory areas: the DTM buffer, shared by the READER, TRANSFORMER, and WRITER threads, and the transformation caches.]

6
DTM Buffer

• Temporary storage area for data


• Buffer is divided into blocks
• Buffer size and block size are tunable
• Default setting for each is Auto

7
DTM Buffer Size – Session Property

• Default is Auto meaning DTM estimates optimal size


• Check session log for actual size allocation

8
DTM Buffer Block Size

• Default is Auto
• Check session log for actual size allocation

9
Reader Bottleneck

Transformer & writer threads wait for data

[Diagram: a slow reader leaves the DTM buffer empty; the TRANSFORMER and WRITER threads sit idle, waiting for data.]

10
Transformer Bottleneck

Reader waits for free blocks; writer waits for data

[Diagram: a slow transformer; the READER waits for free buffer blocks while the WRITER waits for data.]

11
Writer Bottleneck

Reader & transformer wait for free blocks

[Diagram: a slow writer; the READER and TRANSFORMER threads wait for free buffer blocks.]

12
Source Row Logging

[Diagram: with source row logging, the reader cannot release its buffer blocks.]
Source rows must remain in the buffers until the transformation/writer threads process the corresponding rows downstream.

13
Large Commit Interval

[Diagram: with a large commit interval, buffer blocks accumulate ahead of the writer.]
Target rows remain in the buffers until the DTM reaches the commit point.

14
Tuning the DTM Buffer

Extra buffers can keep threads busy


[Diagram: a DTM buffer with spare blocks between the READER, TRANSFORMER, and WRITER threads.]

15
Tuning the DTM Buffer

• Temporary slowdowns in reading, transforming, or writing may cause large fluctuations in throughput
• A “slow” thread typically provides data in spurts
• Extra memory blocks can act as a “cushion”, keeping other threads busy in case of a bottleneck

16
Tuning the DTM Buffer

• Buffer block size
• Recommendation: at least 100 rows per block
• Compute based on the largest source or target row size
• Typically not a significant bottleneck unless below 10 rows per block

• Number of blocks
• A minimum of 2 blocks is required for each source, target, and XML group
• (number of blocks) = 0.9 x (DTM buffer size) / (buffer block size)
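
As a quick illustration of the arithmetic above, a minimal Python sketch (the buffer, block, and row sizes here are hypothetical, not recommended values):

    dtm_buffer_size = 12_000_000     # bytes (12 MB), hypothetical
    buffer_block_size = 64_000       # bytes (64 KB), hypothetical

    # Roughly 90% of the DTM buffer is usable as data blocks
    num_blocks = int(0.9 * dtm_buffer_size / buffer_block_size)
    print(num_blocks)                # 168

    # Check the 100-rows-per-block recommendation against the widest row
    largest_row_size = 500           # bytes, hypothetical
    rows_per_block = buffer_block_size // largest_row_size
    print(rows_per_block)            # 128, so the guideline is met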

17
Tuning the DTM Buffer

• Determine the minimum DTM buffer size:
(DTM buffer size) = (buffer block size) x (minimum number of blocks) / 0.9
• Increase by a multiple of the block size
• If performance does not improve, return to the previous setting
• There is no “formula” for the optimal DTM buffer size
• The Auto setting may be adequate for some sessions
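
Continuing the sketch, the minimum DTM buffer size follows from the block size and the 2-blocks-per-source/target minimum (the mapping counts are hypothetical):

    buffer_block_size = 64_000                 # bytes, hypothetical
    sources, targets, xml_groups = 1, 2, 0     # hypothetical mapping
    min_blocks = 2 * (sources + targets + xml_groups)
    min_dtm_buffer = buffer_block_size * min_blocks / 0.9
    print(round(min_dtm_buffer))               # 426667 bytes, ~0.4 MB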

18
Transformation Caches

• Temporary storage area for certain transformations
• Except for the Sorter, each is divided into a Data & Index Cache
• The size of each transformation cache is tunable
• If the runtime cache requirement > setting, overflow is written to disk
• The default setting for each cache is Auto

19
Tuning the Transformation Caches

Default is Auto

20
Max Memory for Transformation Caches

Only applies to transformation caches set to Auto

21
Max Memory for Transformation Caches

• Two settings: fixed number & percentage
• The system uses the smaller of the two
• If either setting is 0, the DTM assigns a default size to each transformation cache that’s set to Auto

• Recommendation: use the fixed limit if this is the only session running; otherwise, use the percentage
• Use the percentage if running in a grid or HA environment

22
Tuning the Transformation Caches

• If a cache setting is too small, the DTM writes overflow to disk
• Determine if transformation caches are overflowing:
• Watch the cache directory on the file system while the session runs
• Use the session performance counters

• Options to tune:
• Increase the maximum memory allowed for Auto transformation cache sizes
• Set the cache sizes for individual transformations manually

23
Session Performance Counters

24
Performance Counters

25
Tuning the Transformation Caches

• Non-zero counts for readfromdisk and writetodisk indicate sub-optimal settings for transformation index or data caches
• This may indicate the need to tune transformation caches manually
• Any manual setting allocates memory outside of the previously set maximum
• Cache Calculators provide guidance in manual tuning of transformation caches

26
Aggregator Caches

• Unsorted Input
• Must read all input before releasing any output rows
• Index cache contains group keys
• Data cache contains non-group-by ports

• Sorted Input
• Releases output row as each input group is processed
• Does not require data or index cache
(both =0)
• May run much faster than unsorted BUT
must consider the expense of sorting

27
Aggregator Caches – Manual Tuning

28
Joiner Caches: Unsorted Input

[Diagram: MASTER and DETAIL pipelines feeding a Joiner.]
• Staging algorithm: all master data is loaded into the cache
• Specify the smaller data set as master
• Index cache contains join keys
• Data cache contains non-key connected outputs

29
Joiner Caches: Sorted Input

[Diagram: MASTER and DETAIL pipelines feeding a sorted-input Joiner.]
• Streaming algorithm: both inputs must be sorted on the join keys
• Selected master data is loaded into the cache
• Specify the data set with the fewest records under a single key as master
• Index cache contains up to 100 keys
• Data cache contains the non-key connected outputs associated with the 100 keys

30
Joiner Caches – Manual Tuning

Cache calculator detects the sorted input property

31
Lookup Caches

• To cache or not to cache?
• Large number of invocations – cache
• Large lookup table – don’t cache
• Flat file lookup is always cached

32
Lookup Caches

• Data cache
• Only connected output ports included in data cache
• For unconnected lookup, only “return” port included in
data cache

• Index cache size


• Only lookup keys included in index cache

33
Lookup Caches

• Lookup Transformation – fine-tuning the cache:
• SQL override
• Persistent cache (if the lookup data is static)
• Optimize the sort
• Default: lookup keys, then connected output ports in port order
• Can be commented out or overridden in the SQL override
• The indexing strategy on the table may impact performance
• The “Use Any Value” property suppresses the sort

34
Lookup Caches

• Can build lookup caches concurrently
• May improve session performance when there is significant activity upstream from the lookup & the lookup cache is large
• This option applies to the individual session

• The Integration Service builds lookup caches at the beginning of the session run, even if no row has entered a Lookup transformation

Session properties > Config Object tab > Advanced settings

35
Lookup Caches – Manual Tuning

36
Rank Caches

• Index cache contains group keys


• Data cache contains non-group-by ports
• Cache sizes related to the number of groups &
the number of ranks

37
Rank Caches – Manual Tuning

38
Sorter Cache

• Sorter Transformation
• May be faster than a DB sort or a 3rd-party sorter
• An index read from the RDB yields pre-sorted data
• SQL SELECT DISTINCT may reduce the volume of data across the network versus a Sorter with the “Distinct” property set

• Single cache (no separation of index & data)

39
Sorter Cache – Manual Tuning

40
64 bit vs. 32 bit OS

• Take advantage of large memory support in 64-bit
• Cache-based transformations like Sorter, Lookup, Aggregator, Joiner, and XML Target can address larger blocks of memory

41
Maximum Memory Allocation Example

• Parameters
• 64 Bit OS
• Total system memory: 32 GB
• Maximum allowed for transformation caches: 5 GB or 10%
• DTM Buffer: 24 MB
• One transformation manually configured
Index Cache: 10 MB
Data Cache: 20 MB
• All other transformations set to Auto

42
Maximum Memory Allocation Example

• Result
• 10% = 3.2 GB < 5 GB: max allowed for transformation caches = 3.2 GB = 3200 MB
• Manually configured transformation uses 30 MB
• DTM Buffer uses 24 MB
• 3200 + 30 + 24 = 3254 MB
• Note that 3254 MB represents an upper limit; cached transformations may use less than the 3200 MB max
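
A quick Python check of this arithmetic (values taken from the two slides; note the slide treats 1 GB as 1000 MB):

    total_memory_gb = 32
    fixed_limit_gb = 5
    pct_limit_gb = 0.10 * total_memory_gb                         # 3.2 GB
    auto_cache_max_mb = min(fixed_limit_gb, pct_limit_gb) * 1000  # smaller of the two -> 3200 MB
    manual_caches_mb = 10 + 20            # manually set index + data caches
    dtm_buffer_mb = 24
    print(auto_cache_max_mb + manual_caches_mb + dtm_buffer_mb)   # 3254.0 MB upper limit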

43
Performance Tuning Methodology

• It is an iterative process
• Establish benchmark
• Optimize memory
• Isolate bottleneck
• Tune bottleneck
• Take advantage of under-utilized CPU & memory

44
The Production Environment

• Multi-vendor, multi-system environment with many components:
• Operating systems, databases, networks and I/O
• Usually need to monitor performance in several places
• Usually need to monitor outside Informatica as well

[Diagram: PowerCenter at the center of a web of disks, LAN/WAN, DBMS, and OS components.]

45
The Production Environment

• Tuning involves an iterative approach:
1. Identify the biggest performance problem
2. Eliminate or reduce it
3. Return to step 1

[Diagram: the same multi-component environment as on the previous slide.]

46
Preliminary Steps

• Eliminate transformation errors & data rejects: “first make it work, then make it faster”
• Source row logging requires the reader to hold onto buffers until data is written to the target, EVEN IF THERE ARE NO ERRORS; this can significantly increase the DTM buffer requirement
• You may want to set stop on errors to 1

47
Preliminary Steps

• Override the tracing level to terse or normal
• Override at the session level to avoid having to examine each transformation in the mapping
• Only use verbose tracing during development & only with very small data sets
• If you expect row errors that you will not need to correct, avoid logging them by overriding the tracing level to terse (not recommended as a long-term error handling solution)

48
Benchmarking

• Hardware (CPU bandwidth, RAM, disk space, etc.) should be similar to production
• Database configuration should be similar to production
• Data volume should be similar to production
• Challenge: production data is constantly changing
• Optimal tuning may be data dependent
• Estimate “average” behavior
• Estimate “worst case” behavior

49
Benchmarking – Conditional Branching

Scenario: a high percentage of test data goes to TARGET1, but a high percentage of production data goes to TARGET2.
Tuning of the sorter & aggregator could be overlooked in test.

50
Benchmarking – Conditional Branching

Scenario: a high percentage of production data goes to TARGET1 on Monday’s load, but a high percentage of production data goes to TARGET2 on Tuesday’s load.
Performance of the 2 loads may differ significantly.

51
Benchmarking – Conditional Branching

• Conditional branching poses a challenge in performance tuning
• Volume & CHARACTERISTICS of data should be consistent between test & production
• May need to estimate average behavior
• May want to tune for the worst-case scenario

52
Identifying Bottlenecks

• The first challenge is to identify the bottleneck:
• Target
• Source
• Transformations
• Mapping/Session

• Tuning the most severe bottleneck may reveal another one
• This is an iterative process

53
Thread Statistics

• The DTM spawns multiple threads


• Each thread has busy time & idle time
• Goal – maximize the busy time & minimize the
idle time

54
Thread Statistics - Terminology

• A pipeline consists of:


• A source qualifier
• The sources that feed that source qualifier
• All transformations and targets that receive data from that
source qualifier

55
Thread Statistics - Terminology

[Diagram: PIPELINE 1 feeds the MASTER input of a Joiner; PIPELINE 2 feeds the DETAIL input.]
A pipeline on the master input of a joiner terminates at the joiner.

56
Thread Statistics - Terminology

• Stage
a portion of a pipeline; implemented at runtime
as a thread
• Partition Point
boundary between 2 stages; always associated
with a transformation

57
Using Thread Statistics

• By default, PowerCenter assigns a partition point at each Source Qualifier, Target, Aggregator, and Rank.

[Diagram: partition points divide the pipeline into four stages: Reader Thread (First Stage), Transformation Thread (Second Stage), Transform Thread (Third Stage), Writer Thread (Fourth Stage).]

58
Target Bottleneck

• The Aggregator transformation stage is waiting for target buffers

[Diagram: Reader Thread (First Stage), Transformation Thread (Second Stage), Transform Thread (Third Stage) Busy%=15, Writer Thread (Fourth Stage) Busy%=95.]

59
Transformation Bottleneck

• Both the reader & writer are waiting for buffers

[Diagram: Reader Thread (First Stage) Busy%=15, Transformation Thread (Second Stage) Busy%=60, Transform Thread (Third Stage) Busy%=95, Writer Thread (Fourth Stage) Busy%=10.]

60
Thread Statistics in Session Log

***** RUN INFO FOR TGT LOAD ORDER GROUP [1], CONCURRENT SET [1] *****
Thread [READER_1_1_1] created for [the read stage] of partition point
[SQ_SortMergeDataSize_Detail] has completed.
Total Run Time = [318.271977] secs
Total Idle Time = [176.488675] secs
Busy Percentage = [44.547843]
Thread [TRANSF_1_1_1] created for [the transformation stage] of partition point
[SQ_SortMergeDataSize_Detail] has completed.
Total Run Time = [707.803168] secs
Total Idle Time = [105.303059] secs
Busy Percentage = [85.122550]
Thread work time breakdown:
JNRTRANS: 10.869565 percent
SRTTRANS: 89.130435 percent
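
The busy percentage in the log can be re-derived from the run and idle times; a quick Python check using the reader thread’s numbers above:

    total_run_time = 318.271977    # secs, reader thread, from the log above
    total_idle_time = 176.488675   # secs
    busy_pct = (total_run_time - total_idle_time) / total_run_time * 100
    print(f"{busy_pct:.6f}")       # 44.547843, matching the log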

61
Performance Counters in WF Monitor

62
Integration Service Monitor in WF Monitor

63
Session Statistics in WF Monitor

64
Other Methods of Bottleneck Isolation

• Write to a flat file: if significantly faster than the relational target – Target Bottleneck
• Place a FALSE Filter right after the Source Qualifier: if significantly faster – Transformation Bottleneck
• If target & transformation bottlenecks are ruled out – Source Bottleneck

65
Target Optimization

• Target optimization often involves non-Informatica components
• Drop indexes and constraints
• Use pre/post SQL to drop and rebuild
• Use pre/post-load stored procedures

• Use constraint-based loading only when necessary

66
Target Optimization

• Use Bulk Loading
• Informatica bypasses the database log
• Target cannot perform rollback
• Weigh the importance of performance over recovery

• Use an External Loader
• Similar to the bulk loader, but the DB reads from a flat file

67
Target Optimization

Transaction Control
• Target commit type
• Best performance, least precise control
• System avoids writing partially-filled buffers
• Source commit type
• Last active source to feed a target becomes a transaction generator
• Commit interval provides precise control
• Slower than target commit type
• Avoid setting commit interval too low
• User Defined commit type
• Required when mapping contains transaction control transformation
• Provides precise data-driven control
• Slower than target and source commit types

68
Target Optimization

• “Update else insert” session property
• Works well if you rarely insert
• An index is required for the update key but slows down inserts
• PowerCenter must wait for the database to return an error before inserting

• Alternative – a lookup followed by an update strategy

69
Source Bottlenecks

• Source optimization often involves non-Informatica components
• The generated SQL is available in the session log
• Execute it directly against the DB
• Update statistics on the DB
• Use the tuned SELECT as a SQL override

• Set the Line Sequential Buffer Length session property to correspond with the record size

70
Source Bottlenecks

• Avoid transferring more than once from a remote machine
• Avoid reading the same data more than once
• Filter at the source if possible (reduce the data set)
• Minimize connected outputs from the source qualifier
• Only connect what you need
• The DTM only includes connected outputs when it generates the SQL SELECT statement (e.g. if only CUST_ID and CUST_NAME are connected, only those two columns appear in the SELECT)

71
Reduce Data Set

• Remove unnecessary ports
• Not all ports are needed
• Fewer ports = better performance & lower memory requirements

• Reduce rows in the pipeline
• Place a Filter transformation as far upstream as possible
• Filter before an aggregator, rank, or sorter if possible
• Filter in the source qualifier if possible

72
Avoid Unnecessary Sorting

[Diagram: a mapping in which an XML parser (XML_PARSER_PME_EQT_ENT_v1_2) feeds a long chain of Sorter (srt_*) and Joiner (jnr_*) transformations, re-sorting the data before every join.]

73
Expressions Language Tips

• Functions are more expensive than operators
• Use || instead of CONCAT(); for example, FIRST_NAME || ' ' || LAST_NAME instead of CONCAT(CONCAT(FIRST_NAME, ' '), LAST_NAME)

• Use variable ports to factor out common logic

74
Expressions Language Tips

• Simplify nested functions when possible

instead of:
IIF(condition1, result1, IIF(condition2, result2, IIF(…)))

try:
DECODE(TRUE,
       condition1, result1,
       …
       conditionN, resultN)

75
General Guidelines

• Data type conversions are expensive; avoid them if possible
• All-input transformations (such as Aggregator, Rank, etc.) are more expensive than pass-through transformations
• An all-input transformation must process multiple input rows before it can produce any output

76
General Guidelines

• High precision (session property) is expensive but only applies to the “decimal” data type
• UNICODE requires 2 bytes per character; ASCII requires 1 byte per character
• The performance difference depends on the number of string ports only

77
Transformation Specific

Reusable Sequence Generator – Number of Cached Values property
• Purpose: enables different sessions to share the same sequence without generating the same numbers
• >0: allocates the specified number of values & updates the current value in the repository at the end of each block (each session gets a different block of numbers)

78
Transformation Specific

Reusable Sequence Generator – Number of Cached Values property
• Setting it too low causes frequent repository access, which impacts performance
• Unused values in a block are lost; this leads to gaps in the sequence
• Consider alternatives
Example: two non-reusable sequence generators, one generating even numbers & the other generating odd numbers
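
A toy Python sketch of the block-allocation behavior described above (the class below merely stands in for the repository’s stored current value; it is not an Informatica API):

    class Repository:
        """Stands in for the repository's stored current value."""
        def __init__(self):
            self.current_value = 1

        def allocate_block(self, size):
            start = self.current_value
            self.current_value += size      # one repository update per block
            return list(range(start, start + size))

    repo = Repository()
    session_a = repo.allocate_block(5)      # [1, 2, 3, 4, 5]
    session_b = repo.allocate_block(5)      # [6, 7, 8, 9, 10] - never overlaps

    # If session A only uses 3 of its 5 values, 4 and 5 are never handed
    # out again: the sequence now has a gap.
    print(session_a[:3], session_b)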

79
Other Transformations

• Normalizer
• This transformation INCREASES the number of rows
• Place it as far downstream as possible

• XML Reader / Midstream XML Parser
• Remove groups that are not projected
• No memory is allocated for these groups, but PK/FK relationships still need to be maintained
• Don’t leave port lengths as infinite; use an appropriate length

80
Iterative Process

• After tuning your bottlenecks, revisit memory optimization
• Tuning often REDUCES memory requirements (you might even be able to change some settings back to Auto)

• Change one thing at a time & record your results

81
Partitioning

• Apply after optimizing source, target, & transformation bottlenecks
• Apply after optimizing memory usage
• Exploit under-utilized CPU & memory
• To customize partitioning settings, you need the partitioning license

82
Partitioning Terminology

• Partition
subset of the data
• Stage
a portion of a pipeline
• Partition Point
boundary between 2 stages
• Partition Type
algorithm for distributing data among partitions;
always associated with a partition point

83
Threads, Partition Points and Stages

• The DTM implements each stage as a thread; hence, stages run in parallel
• You may add or remove partition points

[Diagram: Reader Thread (First Stage), Transformation Thread (Second Stage), Transform Thread (Third Stage), Writer Thread (Fourth Stage).]

84
Rules for Adding Partition Points

• You cannot add a partition point to a Sequence Generator
• You cannot add a partition point to an unconnected transformation
• You cannot add a partition point on a source definition
• If a pipeline is split and then concatenated, you cannot add a partition point on any transformation between the split and the concatenation
• Adding or removing partition points requires the partitioning license

85
Guidelines for Adding Partition Points

• Make sure you have ample CPU bandwidth
• Make sure you have gone through other optimization techniques
• Add on complex transformations that could benefit from additional threads
• If you have >1 partitions, add where data needs to be re-distributed:
• Aggregator, Rank, or Sorter, where data must be grouped
• Where data is distributed unevenly
• On partitioned sources and targets

86
Partition Points & Partitions

• Partitions subdivide the data
• Each partition represents a thread within a stage
• Each partition point distributes the data among the partitions

[Diagram: three partitions, each with its own thread per stage: 3 reader threads (first stage), 3 transformation threads (second stage), 3 more transformation threads (third stage), and 3 writer threads (fourth stage).]

87
Session Partitioning GUI

• The number next to each flag shows the number of partitions


• The color of each flag indicates the partition type

88
Rules for Adding Partitions

• The master input of a joiner can only have 1 partition unless you add a partition point at the joiner
• A pipeline with an XML target can only have 1 partition
• If the pipeline has a relational source or target and you define n partitions, each database must support n parallel connections
• A pipeline containing a custom or external procedure transformation can only have 1 partition unless those transformations are configured to allow multiple partitions

89
Rules for Adding Partitions

• The number of partitions is constant on a given pipeline
• If you have a partition point on a Joiner, the number of partitions on both inputs will be the same

• At each partition point, you can specify how you want the data distributed among the partitions (this is known as the partition type)

90
Guidelines for Adding Partitions

• Make sure you have ample CPU bandwidth & memory
• Make sure you have gone through other optimization techniques
• Add 1 partition at a time & monitor the CPU
• When CPU usage approaches 100%, don’t add any more partitions

• Take advantage of database partitioning

91
Partition Types

• Each partition point is associated with a partition type
• The partition type defines how the DTM is to distribute the data among the partitions
• If the pipeline has only 1 partition, the partition point serves only to add a stage to the pipeline
• There are restrictions, enforced by the GUI, on which partition types are valid at which partition points

92
Partition Types – Pass Through

• Data is processed without redistributing the rows among partitions
• Serves only to add a stage to the pipeline
• Use when you want an additional thread for a complex transformation but you don’t need to redistribute the data (or you only have 1 partition)

93
Partition Types – Key Range

• The DTM passes data to each partition depending on user-specified ranges
• You may use several ports to form a compound partition key
• The DTM discards rows not falling in any specified range
• If 2 or more ranges overlap, a row can go down more than 1 partition, resulting in duplicate data
• Use key range partitioning when the sources or targets in the pipeline are partitioned by key range
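
A small Python sketch of the routing semantics described above (the ranges are hypothetical; note how a row in the overlap lands in two partitions and an out-of-range row is dropped):

    # Hypothetical half-open key ranges per partition on a numeric key
    ranges = [(0, 1000), (900, 2000)]   # partitions 0 and 1 overlap in [900, 1000)

    def route(key):
        """Return every partition whose range contains the key."""
        return [p for p, (lo, hi) in enumerate(ranges) if lo <= key < hi]

    print(route(500))    # [0]
    print(route(950))    # [0, 1] -> the row is duplicated
    print(route(5000))   # []     -> the row is discarded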

94
Partition Types – Round Robin

• The Integration Service distributes rows of data evenly to all partitions
• Use when there is no need to group data among partitions
• Use when reading flat file sources of different sizes
• Use when data has been partitioned unevenly upstream and requires significantly more processing before arriving at the target

95
Partition Types – Hash Auto Keys

• The DTM applies a hash function to a partition key to group data among partitions
• Use hash partitioning to ensure that groups of rows are processed in the same partition
• The DTM automatically determines the partition key based on:
• aggregator or rank group keys
• join keys
• sort keys
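
The grouping guarantee follows from the hash being deterministic: equal keys always map to the same partition. A Python illustration of the standard idea (the hash function the DTM actually uses is not documented here; crc32-modulo is just a stand-in):

    import zlib

    def hash_partition(key: str, n_partitions: int) -> int:
        # Deterministic hash: the same key always maps to the same partition
        return zlib.crc32(key.encode()) % n_partitions

    for k in ["ACME", "ACME", "GLOBEX"]:
        print(k, "->", hash_partition(k, 3))
    # Both ACME rows land in the same partition, so a downstream
    # Aggregator sees the whole group in a single thread.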

96
Partition Types – Hash User Keys

• This is similar to hash auto keys except the user specifies which ports make up the partition key
• An alternative to a hard-coded key range partition on a relational target (if the DB table is partitioned)

97
Partition Types – Database

• Only valid for DB2 and Oracle databases in a multi-node database
• Sources: Oracle and DB2
• Targets: DB2 only

• The number of partitions does not have to equal the number of database nodes
• Performance may be better if they are equal, however

98
Partitioning with Relational Sources

• PowerCenter creates a separate source database connection for each partition
• If you define n partitions, the source database must support n parallel connections
• The DTM generates a separate SQL query for each partition
• Each query can be overridden
• PowerCenter reads the data concurrently

99
Partitioning with Flat File Sources

• Multiple flat files
• Each partition reads a different file
• PowerCenter reads the files in parallel
• If the files are of unequal sizes, you may want to repartition the data round-robin

• Single flat file
• PowerCenter makes multiple parallel connections to the same file based on the number of partitions specified
• PowerCenter distributes the data randomly to the partitions
• Over a large volume of data, this random distribution tends to have an effect similar to round robin: partition sizes tend to be equal

100
Partitioning with Relational Targets

• The DTM creates a separate target database connection for each partition
• The DTM loads data concurrently
• If you define n partitions, the database must support n concurrent connections

101
Partitioning with Flat File Targets

• The DTM writes output for each partition to a separate file
• Connection settings and properties can be configured for each partition
• The DTM can merge the target files if all have connections local to the Integration Service machine
• The DTM writes the data concurrently

102
Partitioning—Memory Requirements

• Minimum number of buffer blocks:
(2 blocks per source, target, & XML group) x (number of partitions)

• Optimal number of buffer blocks:
(optimal number for 1 partition) x (number of partitions)
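
Extending the earlier buffer sketch to partitions (the counts and the tuned single-partition value are hypothetical):

    sources, targets, xml_groups = 1, 2, 0   # hypothetical mapping
    n_partitions = 3

    min_blocks = 2 * (sources + targets + xml_groups) * n_partitions
    optimal_blocks_one_partition = 20        # hypothetical tuned value
    optimal_blocks = optimal_blocks_one_partition * n_partitions
    print(min_blocks, optimal_blocks)        # 18 60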

103
Cache Partitioning

• The DTM may create separate caches for each partition for each cached transformation; this is called cache partitioning
• The DTM treats cache size settings as per partition; for example, if you configure an aggregator with 2 MB for the index cache and 3 MB for the data cache, & you create 2 partitions, the DTM will allocate up to 4 MB & 6 MB total

• The DTM does not partition lookup or joiner caches unless the lookup or joiner itself is a partition point
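
The per-partition cache arithmetic from this slide, as a quick Python check:

    index_cache_mb, data_cache_mb, partitions = 2, 3, 2
    print(index_cache_mb * partitions, data_cache_mb * partitions)   # 4 6 (MB totals)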

104
Cache Partitioning

[Diagram: with cache partitioning, each partition has its own cache(s): separate index & data caches per partition for a cached transformation, and separate sorter caches per partition for a Sorter.]

105
Cache Partitioning

[Diagram: with a partition point on the joiner, each partition has its own index & data caches.]

106
Cache Partitioning

[Diagram: with no partition point on the joiner, however, all partitions share 1 set of caches.]

107
Monitoring Partitions

• The Workflow Monitor provides runtime details for each partition
• Per partition, you can determine the following:
• Number of rows processed
• Memory usage
• CPU usage

• If one partition is doing more work than the others, you may want to redistribute the data

108
Pipeline Partitioning Example

• Scenario:
• Student record processing
• XML source and Oracle target
• XML source is split into 3 files

109
Pipeline Partitioning Example

[Diagram: partitions 1–3, one per source file.]
Solution: define a partition for each of the 3 source files

110
Pipeline Partitioning Example

[Diagram: a round-robin (RR) partition point at the filter in each of the 3 partitions.]
Problem: source files vary in size, resulting in unequal workloads for each partition
Solution: use round robin partitioning on the filter to balance the load

111
Pipeline Partitioning Example

[Diagram: round-robin (RR) at the filter plus hash auto-keys (H) at the rank in each partition.]
Problem: potential for splitting rank groups
Solution: use hash auto-keys partitioning on the rank to group rows appropriately

112
Pipeline Partitioning Example

[Diagram: round-robin (RR) at the filter, hash auto-keys (H) at the rank, and key range (K) at the target in each partition.]
Problem: target tables are partitioned on Oracle by key range
Solution: use target key range partitioning to optimize writing to the target tables

113
Dynamic Partitioning

• The Integration Service can automatically set the number of partitions at runtime
• Useful when the data volume increases or the number of CPUs available changes
• The basis for the number of partitions is specified as a session property

114
Concurrent Workflow Execution (8.5)

• Prior to 8.5:
• Only one instance of a workflow could run
• Users duplicated workflows – maintenance issues
• Concurrent sessions required duplicating the session

115
Concurrent Workflow Execution

• Allows workflow instances to be run concurrently
• Override parameters/variables across run instances
• Same scheduler across multiple instances
• Supports independent recovery/failover semantics

116
Concurrent Workflow Execution

117
Workflow on Grid (WonG)

• The Integration Service is deployed on a grid – an IS service process (pmserver) runs on each node in the grid
• Allows tasks of a workflow to be distributed across a grid – no user configuration is necessary if all nodes are homogeneous

118
Workflow on Grid (WonG)

• Different sessions in a workflow are dispatched on different nodes to balance load
• Use workflow on grid if:
• There are many concurrent sessions and workflows
• You want to leverage multiple machines in the environment

119
Load Balancer Modes

• Round Robin
• Honors Max Number of Processes per Node

• Metric-based
• Evaluates nodes in round-robin order
• Honors resource provision thresholds
• Uses stats from the last 3 runs; if no statistics have been collected yet, defaults are used (40 MB memory, 15% CPU)

120
Load Balancer Modes

• Adaptive
• Selects the node with the most available CPU
• Honors resource provision thresholds
• Uses statistics from the last 3 runs of a task to determine whether a task can run on a node
• Bypass in dispatch queue: skips tasks in the queue that are more resource intensive and can’t be dispatched to any currently available node
• CPU profile: ranks node CPU performance against a baseline system

• All modes take into account the service level assigned to workflows

121
Session on Grid (SonG)

• The session is partitioned and dispatched across multiple nodes
• Allows unlimited scalability
• Sources and targets may be on different nodes
• More suited to large sessions
• Smaller machines in a grid are a lower-cost option than large multi-CPU machines

122
Session on Grid (SonG)

• Session on Grid will scale if:
• Sessions are CPU/memory intensive and overcome the overhead of data movement over the network
• I/O is kept localized to each node running the partition
• There is fast shared storage (e.g. NAS, clustered FS)
• Partitions are independent
• Source and target have different connections that are only available on different machines
• E.g. source Excel files on Windows and a target only available on UNIX
• Supported on a homogeneous grid

123
Configuring Session on Grid

• Enable the Session on Grid attribute on the session configuration tab
• Assign the workflow to be executed by an Integration Service that has been assigned to a grid

124
Dynamic Partitioning

• Based on user specification (# partitions)
• Can be parameterized as $DynamicPartitionCount

• Based on # of nodes in the grid

• Based on source partitioning (database partitioning)

125
SonG Partitioning Guidelines

• Set # of partitions = # of nodes to get an even distribution
• Tip: use the dynamic partitioning feature to ease expansion of the grid

• In addition, continue to create partition points to achieve parallelism

126
SonG Partitioning Guidelines

• To minimize data traffic across nodes:
• Use the pass-through partition type, which will try to keep transformations on the same node
• Use a resource map to dispatch the source and target transformations to the node where the source or target is located
• Keep the target files unmerged whenever possible (e.g. if being used for staging)

• Resource requirements should be specified at the lowest granularity, e.g. transformation instead of session (as far as possible)
• This will ensure better distribution in SonG

127
File Placement Best Practices

• Files that should be placed on a high-bandwidth shared file system (CFS / NAS):
• Source files
• Lookup source files [sequential file access]
• Target files [sequential file access]
• Persistent cache files for lookup or incremental aggregation [random file access]

• Files that should be placed on a shared file system where the bandwidth requirement is low (NFS):
• Parameter files
• Other configuration files
• Indirect source or target files
• Log files

128
File Placement Best Practices

• Files that should be put on local storage:
• Non-persistent cache files (i.e. sorter temporary files)
• Intermediate target files for sequential merge
• Other temporary files created during a session execution
• $PmTempFileDir should point to a local file system

• For best performance, ensure sufficient bandwidth for the shared file system and local storage (possibly by using additional disk I/O controllers)

129
Data Integration Certification Path

Informatica Certified Administrator
Recommended Training: PowerCenter QuickStart (eLearning); PowerCenter 8.5+ Administrator (4 days)
Required Exams: Architecture & Administration; Advanced Administration

Informatica Certified Developer
Recommended Training: PowerCenter QuickStart (eLearning); PowerCenter 8.5+ Administrator (4 days); PowerCenter Developer 8.x Level I (4 days); PowerCenter Developer 8 Level II (4 days)
Required Exams: Architecture & Administration; Mapping Design; Advanced Mapping Design

Informatica Certified Consultant
Recommended Training: PowerCenter QuickStart (eLearning); PowerCenter 8.5+ Administrator (4 days); PowerCenter Developer 8.x Level I (4 days); PowerCenter Developer 8 Level II (4 days); PowerCenter 8 Data Migration (4 days); PowerCenter 8 High Availability (1 day)
Required Exams: Architecture & Administration; Advanced Administration; Mapping Design; Advanced Mapping Design; Enablement Technologies

Additional Training: PowerCenter 8.5 New Features; PowerCenter 8.6 New Features; PowerCenter 8 Upgrade; PowerCenter 8 Team-Based Development; PowerCenter 8.5 Unified Security

130
Q&A

Bert Peters
Global Education Services, Principal Instructor

131
Course Evaluation

Bert Peters
Global Education Services, Principal Instructor

132
Appendix
Informatica Services by Solution

133
B2B Data Exchange
Recommended Services

Professional Services
Strategy Engagements
• B2B Data Transformation Architectural Review
Baseline Engagements
• B2B Data Transformation Baseline Architecture
Implement Engagements
• B2B Full Project Lifecycle
• Transaction/Customer/Payment Hub

Education Services
Recommended Courses
• Informatica B2B Data Transformation (D)
• Informatica B2B Data Exchange (D)

Target Audience for Courses
D = Developer, M = Project Manager, A = Administrator
134
Data Governance
Recommended Services

Professional Services
Strategy Engagements
• Informatica Environment Assessment Service
• Metadata Strategy and Enablement
• Data Quality Audit
Baseline Engagements
• Data Governance Implementation
• Metadata Manager Quick Start
• Informatica Data Quality Baseline Deployment
Implement Engagements
• Metadata Manager Customization
• Data Quality Management Implementation

Education Services
Recommended Courses
• PowerCenter Level I Developer (D)
• Informatica Data Explorer (D)
• Informatica Data Quality (D)
Related Courses
• PowerCenter Administrator (A)
• Metadata Manager (D)
Certifications
• PowerCenter
• Data Quality

Target Audience for Courses
D = Developer, M = Project Manager, A = Administrator
135
Data Migration
Recommended Services

Professional Services
Strategy Engagements
• Data Migration Readiness Assessment
• Informatica Data Quality Audit
Baseline Engagements
• PowerCenter Baseline Deployment
• Informatica Data Quality (IDQ) and/or Informatica Data Explorer (IDE) Baseline Deployment
Implement Engagements
• Data Migration Jumpstart
• Data Migration End-to-End Implementation

Education Services
Recommended Courses
• Data Migration (M)
• Informatica Data Explorer (D)
• Informatica Data Quality (D)
• PowerCenter Level I Developer (D)
Related Courses
• PowerExchange Basics (D)
• PowerCenter Administrator (A)
Certifications
• PowerCenter
• Data Quality

Target Audience for Courses
D = Developer, M = Project Manager, A = Administrator
136
Data Quality
Recommended Services

Professional Services
Strategy Engagements
• Data Quality Management Strategy
• Informatica Data Quality Audit
Baseline Engagements
• Informatica Data Quality (IDQ) and/or Informatica Data Explorer (IDE) Baseline Deployment
• Informatica Data Quality Web Services Quick Start
Implement Engagements
• Data Quality Management Implementation

Education Services
Recommended Courses
• Informatica Data Explorer (D)
• Informatica Data Quality (D)
Related Courses
• Informatica Identity Resolution (D)
• PowerCenter Level I Developer (D)
Certifications
• Data Quality

Target Audience for Courses
D = Developer, M = Project Manager, A = Administrator
137
Data Synchronization
Recommended Services

Professional Services
Strategy Engagements
• Project Definition and Assessment
Baseline Engagements
• PowerExchange Baseline Architecture Deployment
• PowerCenter Baseline Architecture Deployment
Implement Engagements
• Data Synchronization Implementation

Education Services
Recommended Courses
• PowerCenter Level I Developer (D)
• PowerCenter Level II Developer (D)
• PowerCenter Administrator (A)
Related Courses
• PowerExchange Basics Oracle Real-Time CDC (D)
• PowerExchange SQL RT (D)
• PowerExchange for MVS DB2 (D)
Certifications
• PowerCenter

Target Audience for Courses
D = Developer, M = Project Manager, A = Administrator
138
Enterprise Data Warehousing
Recommended Services

Professional Services
Strategy Engagements
• Enterprise Data Warehousing (EDW) Strategy
• Informatica Environment Assessment Service
• Metadata Strategy & Enablement
Baseline Engagements
• PowerCenter Baseline Architecture Deployment
Implement Engagements
• EDW Implementation

Education Services
Recommended Courses
• PowerCenter Level I Developer (D)
• PowerCenter Level II Developer (D)
• PowerCenter Metadata Manager (D)
Related Courses
• Informatica Data Quality (D)
• Data Warehouse Development (D)
Certifications
• PowerCenter

Target Audience for Courses
D = Developer, M = Project Manager, A = Administrator
139
Integration Competency Centers
Recommended Services

Professional Services
Strategy Engagements
• ICC Assessment
Baseline Engagements
• ICC Master Class Series
• ICC Director
Implement Engagements
• ICC Launch
• ICC Implementation
• Informatica Production Support

Education Services
Recommended Courses
• ICC Overview (M)
• PowerCenter Level I Developer (D)
• PowerCenter Administrator (A)
Related Courses
• Metadata Manager (D)
• Informatica Data Explorer (D)
• Informatica Data Quality (D)
Certifications
• PowerCenter
• Data Quality

Target Audience for Courses
D = Developer, M = Project Manager, A = Administrator
140
Master Data Management
Recommended Services

Professional Services
Strategy Engagements
• Master Data Management (MDM) Strategy
• Informatica Data Quality Audit
Baseline Engagements
• Informatica Data Explorer (IDE) Baseline Deployment
• Informatica Data Quality (IDQ) Baseline Deployment
• PowerCenter Baseline Architecture Deployment
Implement Engagements
• MDM Implementation

Education Services
Recommended Courses
• Informatica Data Explorer (D)
• Informatica Data Quality (D)
• PowerCenter Level I Developer (D)
Related Courses
• Metadata Manager (D)
• Informatica Identity Resolution (D)
Certifications
• PowerCenter
• Data Quality

Target Audience for Courses
D = Developer, M = Project Manager, A = Administrator
141
Services Oriented Architecture
Recommended Services

Professional Services
Strategy Engagements
• Data Services (SOA) Strategy
Baseline Engagements
• Informatica Web Services Quick Start
• Informatica Data Quality Web Services Quick Start
Implement Engagements
• Data Services (SOA) Implementation

Education Services
Recommended Courses
• PowerCenter Level I Developer (D)
• Informatica Data Quality (D)
Certifications
• PowerCenter
• Data Quality

Target Audience for Courses
D = Developer, M = Project Manager, A = Administrator
142
Governance, Risk & Compliance (GRC)
Recommended Services

Professional Services
Strategy Engagements
• Informatica Environment Assessment Service
• Enterprise Data Warehouse Strategy
• Data Quality Audit
Baseline Engagements
• Informatica Data Quality Baseline Deployment
• Metadata Manager Quick Start
Implement Engagements
• Risk Management Enablement Kit
• Enterprise Data Warehouse Implementation

Education Services
Recommended Courses
• PowerCenter Level I Developer (D)
• Informatica Data Explorer (D)
• Informatica Data Quality (D)
Related Courses
• Data Warehouse Development (D)
• ICC Overview (M)
• Metadata Manager (D)
Certifications
• PowerCenter
• Data Quality

Target Audience for Courses
D = Developer, M = Project Manager, A = Administrator
143
Mergers & Acquisitions (M&A)
Recommended Services

Professional Services
Strategy Engagements
• Data Migration Readiness Assessment
• Informatica Data Quality Audit
Baseline Engagements
• PowerCenter Baseline Deployment
• Informatica Data Quality (IDQ), and/or Informatica Data Explorer (IDE) Baseline Deployment
Implement Engagements
• Data Migration Jumpstart
• Data Migration End-to-End Implementation

Education Services
Recommended Courses
• Data Migration (M)
• PowerCenter Level I Developer (D)
Related Courses
• Informatica Data Explorer (D)
• Informatica Data Quality (D)
• PowerExchange Basics (D)
Certifications
• PowerCenter
• Data Quality

Target Audience for Courses
D = Developer, M = Project Manager, A = Administrator
144
Deliver Your Project Right the First Time with Informatica Professional Services

145
Informatica Global Education Services

Joe Caputo, Director, Pfizer

“We launched an aggressive data migration project that was to be completed in one year. The complexity of the data schema along with the use of Informatica PowerCenter tools proved challenging to our top colleagues.

We believe that Informatica training led us to triple productivity, helping us to complete the project on its original 1-year schedule.”

146
Informatica Contact Information

Informatica Corporation Headquarters
100 Cardinal Way
Redwood City, CA 94063
Tel: 650-385-5000
Toll-free: 800-653-3871
Toll-free Sales: 888-635-0899
Fax: 650-385-5500

Informatica EMEA Headquarters
Informatica Nederland B.V.
Edisonbaan 14a
3439 MN Nieuwegein
Postbus 116
3430 AC Nieuwegein
Tel: +31 (0) 30-608-6700
Fax: +31 (0) 30-608-6777

Informatica Asia/Pacific Headquarters
Informatica Australia Pty Ltd
Level 5, 255 George Street
Sydney
N.S.W. 2000
Australia
Tel: +612-8907-4400
Fax: +612-8907-4499

Global Customer Support
support@informatica.com
Register at my.informatica.com to open a new service request or to check on the status of an existing SR.

http://www.informatica.com

147
