Performance Tuning
Version 8.6
Bert Peters
Global Education Services, Principal Instructor
Objectives
After completing this course you will be able to:
Control how PowerCenter uses memory
Control how PowerCenter uses CPUs
Understand the performance counters
Isolate source, target and engine bottlenecks
Tune different types of bottlenecks
Configure Workflow and Session on Grid
Agenda
Memory optimization
Performance tuning methodology
Tuning source, target, & mapping bottlenecks
Pipeline partitioning
Server Grid
Q&A
Course evaluation
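Anatomy of a Session
[Figure: the Integration Service runs the Data Transformation Manager (DTM). The READER moves source data into the DTM buffer, the TRANSFORMER processes it (using transformation caches where needed), and the WRITER moves it from the buffer to the target.]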
Memory Optimization
[Figure: the memory areas to tune are the DTM buffer, shared by the READER, TRANSFORMER, and WRITER threads, and the transformation caches]
DTM Buffer
Temporary storage area for data
Buffer is divided into blocks
Buffer size and block size are tunable
Default setting for each is Auto
Check the session log for the actual sizes allocated
Reader Bottleneck
Transformer & writer threads wait for data
[Figure: DTM buffer; a slow READER leaves the TRANSFORMER and WRITER waiting]
Transformer Bottleneck
Reader waits for free blocks; writer waits for data
[Figure: DTM buffer; a slow TRANSFORMER leaves the READER and WRITER waiting]
Writer Bottleneck
Reader & transformer wait for free blocks
[Figure: DTM buffer; a slow WRITER leaves the READER and TRANSFORMER waiting]
Source rows must remain in the buffers until the transformation and writer threads process the corresponding rows downstream
Target rows remain in the buffers until the DTM reaches the commit point
Number of blocks
Minimum of 2 blocks required for each source, target, and XML group
(number of blocks) = 0.9 × ((DTM buffer size) / (buffer block size))
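A minimal Python sketch of the block-count rule above, assuming both sizes are given in bytes and that the result is rounded down to whole blocks (the rounding is our assumption). It also checks the stated minimum of 2 blocks per source, target, and XML group. The function names and the 64 KB block size are hypothetical.

```python
def number_of_blocks(dtm_buffer_size: int, buffer_block_size: int) -> int:
    """0.9 x (DTM buffer size / buffer block size), rounded down (assumption)."""
    return int(0.9 * dtm_buffer_size / buffer_block_size)


def minimum_blocks(sources: int, targets: int, xml_groups: int = 0) -> int:
    """At least 2 blocks for each source, target, and XML group."""
    return 2 * (sources + targets + xml_groups)


def buffer_is_large_enough(dtm_buffer_size: int, buffer_block_size: int,
                           sources: int, targets: int, xml_groups: int = 0) -> bool:
    return number_of_blocks(dtm_buffer_size, buffer_block_size) >= minimum_blocks(
        sources, targets, xml_groups)


KB = 1024
MB = 1024 * KB

# A 24 MB DTM buffer split into (hypothetical) 64 KB blocks
print(number_of_blocks(24 * MB, 64 * KB))  # 345
print(buffer_is_large_enough(24 * MB, 64 * KB, sources=2, targets=1))  # True
```

Shrinking the buffer (or enlarging the block size) reduces the block count, which is why a session with many sources and targets can fail the minimum even when the total buffer looks generous.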
Transformation Caches
Temporary storage area for certain transformations
Except for the Sorter, each is divided into a Data Cache and an Index Cache
The size of each transformation cache is tunable
If the runtime cache requirement exceeds the setting, the overflow is written to disk
The default setting for each cache is Auto
Options to tune:
Increase the maximum memory allowed for Auto transformation cache sizes
Set the cache sizes for individual transformations manually
Performance Counters
Aggregator Caches
Unsorted Input
Must read all input before releasing any output rows
Index cache contains group keys
Data cache contains non-group-by ports
Sorted Input
Releases output row as each input group is processed
Does not require a data or index cache (both = 0)
May run much faster than unsorted input, but you must consider the expense of sorting
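To illustrate why sorted input needs so little cache, here is a minimal Python sketch (a generic illustration, not PowerCenter's implementation) of a SUM aggregation done both ways. The unsorted version must hold every group in a dictionary until all input is read; the sorted version releases each group as soon as the key changes. The function names and the (key, value) row layout are hypothetical.

```python
from itertools import groupby
from operator import itemgetter
from typing import Dict, Iterable, Iterator, Tuple

Row = Tuple[str, int]  # (group key, value): a hypothetical row layout


def aggregate_unsorted(rows: Iterable[Row]) -> Iterator[Row]:
    # Must read ALL input before releasing any output: every group key
    # and its running total stay in memory until the end.
    totals: Dict[str, int] = {}
    for key, value in rows:
        totals[key] = totals.get(key, 0) + value
    yield from totals.items()


def aggregate_sorted(rows: Iterable[Row]) -> Iterator[Row]:
    # Input must already be sorted by key; each group is released as soon
    # as the next key appears, so only one group is held at a time.
    for key, group in groupby(rows, key=itemgetter(0)):
        yield key, sum(value for _, value in group)


rows = [("A", 1), ("A", 2), ("B", 5), ("C", 1), ("C", 1)]
print(list(aggregate_sorted(rows)))      # [('A', 3), ('B', 5), ('C', 2)]
print(sorted(aggregate_unsorted(rows)))  # [('A', 3), ('B', 5), ('C', 2)]
```

Both produce the same totals; the difference is when the first row comes out and how much state is kept, which is the trade-off against the cost of sorting.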
Joiner Caches
Staging algorithm:
All master data is loaded into the cache
Specify the smaller data set as the master
[Figure: MASTER and DETAIL pipelines feeding a Joiner]
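A minimal Python sketch of the staging idea, assuming an equality join on a single key: all master rows are cached in a dictionary, and detail rows are then streamed past it one at a time. Because only the master side is held in memory, choosing the smaller data set as master keeps the cache small. This is a generic hash join used for illustration, not the Joiner's actual code; all names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def cached_join(master: Iterable[Tuple[str, str]],
                detail: Iterable[Tuple[str, str]]) -> Iterator[Tuple[str, str, str]]:
    # Stage 1: load ALL master rows into the cache, keyed on the join column.
    cache: Dict[str, List[str]] = defaultdict(list)
    for key, master_value in master:
        cache[key].append(master_value)
    # Stage 2: stream detail rows; each one probes the cache (normal join).
    for key, detail_value in detail:
        for master_value in cache.get(key, []):
            yield key, master_value, detail_value


master = [("1", "Books"), ("2", "Music")]  # smaller set, so it is the master
detail = [("1", "order-a"), ("2", "order-b"), ("1", "order-c"), ("3", "order-d")]
print(list(cached_join(master, detail)))
```

Detail rows never accumulate in memory here, so their volume affects run time but not cache size.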
Lookup Caches
To cache or not to cache?
Large number of invocations: cache
Large lookup table: don't cache
Flat file lookups are always cached
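A minimal sketch of the cache-or-not trade-off, using Python's built-in sqlite3 module as a stand-in for the lookup database (an assumption for illustration only). The uncached lookup issues one query per input row; the cached lookup reads the lookup table once and then probes an in-memory dictionary. With many invocations the single read wins; with a very large lookup table and few input rows, building the cache can cost more than it saves. Table and column names are hypothetical.

```python
import sqlite3
from typing import Dict, List, Optional


def make_lookup_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO customers VALUES (?, ?)",
                     [(1, "Ann"), (2, "Bob"), (3, "Cy")])
    return conn


def uncached_lookup(conn: sqlite3.Connection, keys: List[int]) -> List[Optional[str]]:
    # One query against the database per input row.
    results: List[Optional[str]] = []
    for key in keys:
        row = conn.execute("SELECT name FROM customers WHERE id = ?", (key,)).fetchone()
        results.append(row[0] if row else None)
    return results


def cached_lookup(conn: sqlite3.Connection, keys: List[int]) -> List[Optional[str]]:
    # Read the lookup table once to build the cache, then probe it in memory.
    cache: Dict[int, str] = dict(conn.execute("SELECT id, name FROM customers"))
    return [cache.get(key) for key in keys]


conn = make_lookup_db()
keys = [1, 3, 3, 9, 2]
print(uncached_lookup(conn, keys))  # ['Ann', 'Cy', 'Cy', None, 'Bob']
print(cached_lookup(conn, keys))    # ['Ann', 'Cy', 'Cy', None, 'Bob']
```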
Lookup Caches
Data cache
Only connected output ports included in data cache
For unconnected lookup, only return port included in
data cache
Lookup Caches
Lookup Transformation: Fine-tuning the Cache
SQL override
Persistent cache (if the lookup data is static)
Optimize sort
Lookup Caches
Can build lookup caches concurrently
May improve session performance when there is significant
activity upstream from the lookup & the lookup cache is large
This option applies to the individual session
Rank Caches
Index cache contains group keys
Data cache contains non-group-by ports
Cache sizes are related to the number of groups and the number of ranks
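A minimal Python sketch of what the Rank caches have to hold, assuming a "top N by value" rank: one entry per group key and, for each group, at most N retained values. That is why cache size grows with both the number of groups and the number of ranks. It uses heapq from the standard library; the function name is hypothetical and this is not the Rank transformation's actual code.

```python
import heapq
from typing import Dict, Iterable, List, Tuple


def top_n_per_group(rows: Iterable[Tuple[str, int]], n: int) -> Dict[str, List[int]]:
    # group key -> min-heap holding the n largest values seen so far
    heaps: Dict[str, List[int]] = {}
    for key, value in rows:
        heap = heaps.setdefault(key, [])
        if len(heap) < n:
            heapq.heappush(heap, value)
        elif value > heap[0]:
            heapq.heapreplace(heap, value)  # drop the current smallest
    # Largest first within each group
    return {key: sorted(heap, reverse=True) for key, heap in heaps.items()}


rows = [("east", 5), ("west", 7), ("east", 9), ("east", 1), ("west", 3), ("east", 6)]
print(top_n_per_group(rows, 2))  # {'east': [9, 6], 'west': [7, 3]}
```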
Sorter Cache
Sorter Transformation
May be faster than a DB sort or a third-party sorter
Data read from a relational database through an index is already sorted
SQL SELECT DISTINCT may reduce the volume of data sent across the network, compared with a Sorter that has the Distinct property set
Single cache (no separation of index and data)
64-bit OS
Total system memory: 32 GB
Maximum allowed for transformation caches: 5 GB or 10%
DTM Buffer: 24 MB
One transformation manually configured
Index Cache: 10 MB
Data Cache: 20 MB
All other transformations set to Auto
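A back-of-the-envelope Python sketch for this example. It assumes, as we read the session configuration options, that the Auto cache limit is the lesser of the fixed maximum (5 GB) and the percentage of total memory (10%), and that the manually configured caches are not counted against that limit; verify both assumptions against the documentation for your version. The function name is hypothetical.

```python
def auto_cache_limit_mb(total_memory_mb: float, max_auto_mb: float,
                        max_auto_pct: float) -> float:
    # Assumption: the smaller of the two limits caps all Auto caches together.
    return min(max_auto_mb, total_memory_mb * max_auto_pct / 100)


GB = 1024  # megabytes per gigabyte

total_mb = 32 * GB
auto_limit = auto_cache_limit_mb(total_mb, max_auto_mb=5 * GB, max_auto_pct=10)
dtm_buffer_mb = 24
manual_caches_mb = 10 + 20  # index + data cache of the manually configured transformation

print(f"Auto caches share up to {auto_limit:.1f} MB")  # 3276.8 MB
print(f"Rough upper bound for this session: "
      f"{auto_limit + dtm_buffer_mb + manual_caches_mb:.1f} MB")  # 3330.8 MB
```

Under these assumptions the 10% rule, not the 5 GB cap, is the binding limit on a 32 GB machine.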
Performance Tuning Methodology
Establish a benchmark
Optimize memory
Isolate the bottleneck
Tune the bottleneck
Take advantage of under-utilized CPU and memory
[Figure: components that affect performance: disks, LAN/WAN, DBMS, OS, and PowerCenter]
Preliminary Steps
Eliminate transformation errors & data rejects
Preliminary Steps
Override the tracing level to Terse or Normal
Override at the session level to avoid having to examine each transformation in the mapping
Only use Verbose tracing during development, and only with very small data sets
If you expect row errors that you will not need to correct, avoid logging them by overriding the tracing level to Terse
(not recommended as a long-term error handling solution)
Benchmarking
Hardware (CPU bandwidth, RAM, disk space, etc.) should be similar to production
Database configuration should be similar to production
Data volume should be similar to production
Challenge: production data is constantly changing
Optimal tuning may be data dependent
Estimate average behavior
Estimate worst-case behavior
Identifying Bottlenecks
The first challenge is to identify the bottleneck
Target
Source
Transformations
Mapping/Session
Thread Statistics
The DTM spawns multiple threads
Each thread has busy time & idle time
Goal: maximize the busy time and minimize the idle time
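As we understand the thread statistics written to the session log, busy percentage is the share of a thread's run time that was not idle: (run time - idle time) / run time × 100. A minimal Python sketch of that formula, with hypothetical run and idle times in seconds:

```python
def busy_percentage(run_time: float, idle_time: float) -> float:
    """Share of the thread's run time that was not spent idle, in percent."""
    if run_time <= 0:
        raise ValueError("run_time must be positive")
    return (run_time - idle_time) / run_time * 100


# Hypothetical thread: ran for 200 s, of which 170 s were spent waiting on buffers
print(f"{busy_percentage(200.0, 170.0):.1f}%")  # 15.0%
```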
Reader Thread (First Stage)
Transformation Thread (Second Stage)
Transformation Thread (Third Stage)
Writer Thread (Fourth Stage)
Target Bottleneck
The Aggregator transformation stage is waiting for target buffers
Reader Thread (First Stage)
Transformation Thread (Second Stage)
Transformation Thread (Third Stage): Busy% = 15
Writer Thread (Fourth Stage): Busy% = 95
Transformation Bottleneck
Both the reader and the writer are waiting for buffers
Reader Thread (First Stage): Busy% = 15
Transformation Thread (Second Stage): Busy% = 60
Transformation Thread (Third Stage): Busy% = 95
Writer Thread (Fourth Stage): Busy% = 10
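A minimal sketch of the reading rule these two examples apply: the thread with the highest busy percentage is the likely bottleneck, while the threads around it sit idle waiting for buffers. The busy figures come from the two slides; the function is hypothetical and ignores ties and multi-partition sessions.

```python
from typing import Dict


def likely_bottleneck(busy_pct: Dict[str, float]) -> str:
    # The busiest thread is the one the others are waiting on.
    return max(busy_pct, key=busy_pct.get)


target_case = {"transformation (third stage)": 15, "writer": 95}
transformation_case = {"reader": 15, "transformation (second stage)": 60,
                       "transformation (third stage)": 95, "writer": 10}
print(likely_bottleneck(target_case))          # writer
print(likely_bottleneck(transformation_case))  # transformation (third stage)
```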
Target Optimization
Target optimization often involves non-Informatica components
Drop indexes and constraints
Use pre- and post-session SQL to drop and rebuild them
Use pre- and post-load stored procedures
Target Optimization
Use Bulk Loading
Informatica bypasses the database log
The target cannot perform a rollback
Weigh the importance of performance against recovery
Target Optimization
Transaction Control
Target Optimization
"Update else Insert" session property
Works well if you rarely insert
An index is required on the update key, but it slows down inserts
PowerCenter must wait for the database to return an error before inserting
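A minimal sketch of the update-else-insert pattern, using Python's sqlite3 module as a stand-in target (an assumption; PowerCenter issues its own SQL). Each row costs an UPDATE first, and only rows that matched nothing pay for a second statement, so the pattern is cheapest when inserts are rare. Unlike the slide's description, this sketch detects a miss from the UPDATE's affected-row count rather than from a database error. Table and column names are hypothetical.

```python
import sqlite3
from typing import Iterable, Tuple


def update_else_insert(conn: sqlite3.Connection,
                       rows: Iterable[Tuple[int, str]]) -> Tuple[int, int]:
    updates = inserts = 0
    for key, value in rows:
        # Try the UPDATE first; the index on the key column (here, the
        # primary key) is what makes this probe fast.
        cur = conn.execute("UPDATE dim_product SET name = ? WHERE id = ?", (value, key))
        if cur.rowcount == 0:
            # No existing row: a second statement is needed for the INSERT.
            conn.execute("INSERT INTO dim_product (id, name) VALUES (?, ?)", (key, value))
            inserts += 1
        else:
            updates += 1
    conn.commit()
    return updates, inserts


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dim_product (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO dim_product VALUES (?, ?)", [(1, "old-a"), (2, "old-b")])
counts = update_else_insert(conn, [(1, "new-a"), (2, "new-b"), (3, "c")])
print(counts)  # (2, 1)
print(conn.execute("SELECT id, name FROM dim_product ORDER BY id").fetchall())
```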
Source Bottlenecks
Source optimization often involves non-Informatica components
The generated SQL is available in the session log
Execute it directly against the DB
Update statistics on the DB
Use a tuned SELECT as the SQL override
Source Bottlenecks
Avoid transferring data from a remote machine more than once
Avoid reading the same data more than once
Filter at the source if possible (reduce the data set)
Minimize connected outputs from the Source Qualifier
Only connect what you need
The DTM includes only connected outputs when it generates the SQL SELECT statement
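To make the last two points concrete, a minimal Python sketch (hypothetical, not the DTM's actual SQL generator) that builds a default SELECT only from Source Qualifier ports connected downstream, with an optional source filter. Table and port names are illustrative.

```python
from typing import Dict, Optional


def build_select(table: str, ports: Dict[str, bool],
                 source_filter: Optional[str] = None) -> str:
    # ports maps column name -> True if the port is connected downstream
    connected = [name for name, is_connected in ports.items() if is_connected]
    if not connected:
        raise ValueError("at least one port must be connected")
    sql = f"SELECT {', '.join(connected)} FROM {table}"
    if source_filter:
        sql += f" WHERE {source_filter}"  # filtering at the source shrinks the data set
    return sql


ports = {"ORDER_ID": True, "CUSTOMER_ID": True, "COMMENTS": False, "STATUS": True}
print(build_select("ORDERS", ports, "STATUS = 'OPEN'"))
# SELECT ORDER_ID, CUSTOMER_ID, STATUS FROM ORDERS WHERE STATUS = 'OPEN'
```

Leaving a wide column such as COMMENTS unconnected keeps it out of the query, so it is never read or sent across the network.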
[Figure: example mapping with a long chain of Sorter (srt_ENT_...) and Joiner (jnr_ENT_...) transformations]
General Guidelines
Data type conversions are expensive; avoid them if possible
All-input transformations (such as Aggregator, Rank, etc.) are more expensive than pass-through transformations
An all-input transformation must process multiple input rows before it can produce any output
General Guidelines
High precision (session property) is expensive, but it applies only to the Decimal data type
Unicode requires 2 bytes per character; ASCII requires 1 byte per character
The performance difference depends only on the number of string ports
Transformation Specific
Reusable Sequence Generator
Number of Cached Values Property
Purpose: enables different sessions to share the same sequence without generating the same numbers
> 0: allocates the specified number of values and updates the current value in the repository at the end of each block
(each session gets a different block of numbers)
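A minimal simulation of the Number of Cached Values behavior described above, assuming a single shared repository counter (modeled as a plain Python object) and ignoring concurrency control. Each session reserves a block, the repository value advances by the block size, and the session hands out numbers from its own block. The class names are hypothetical.

```python
from typing import List


class Repository:
    """Stand-in for the repository row holding the sequence's current value."""

    def __init__(self, start: int = 1) -> None:
        self.current_value = start
        self.accesses = 0

    def reserve_block(self, size: int) -> range:
        self.accesses += 1
        block = range(self.current_value, self.current_value + size)
        self.current_value += size  # the next session starts after this block
        return block


class SessionSequence:
    def __init__(self, repo: Repository, cached_values: int) -> None:
        self.repo = repo
        self.cached_values = cached_values
        self.block: List[int] = []

    def next_value(self) -> int:
        if not self.block:  # block exhausted: go back to the repository
            self.block = list(self.repo.reserve_block(self.cached_values))
        return self.block.pop(0)


repo = Repository()
s1, s2 = SessionSequence(repo, 1000), SessionSequence(repo, 1000)
a = [s1.next_value() for _ in range(3)]
b = [s2.next_value() for _ in range(3)]
print(a, b)  # [1, 2, 3] [1001, 1002, 1003]
```

If session 1 ends now, values 4 to 1000 are never used, which is the gap described on the next slide; a smaller block shrinks the gap but raises the number of repository accesses.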
Transformation Specific
Reusable Sequence Generator
Number of Cached Values Property
Setting it too low causes frequent repository access, which impacts performance
Unused values in a block are lost; this leads to gaps in the sequence
Consider alternatives
Example: two non-reusable Sequence Generators, one generating even numbers and the other generating odd numbers
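A minimal sketch of the even/odd alternative, assuming each non-reusable generator is configured with its own start value and an increment of 2 (our assumed configuration). The two streams can never produce the same number and need no shared block reservation. The generator function is hypothetical and only stands in for a Sequence Generator's NEXTVAL port.

```python
from itertools import count, islice
from typing import Iterator


def sequence(start: int, increment: int) -> Iterator[int]:
    # Stand-in for one Sequence Generator's stream of values
    return count(start, increment)


odd = sequence(start=1, increment=2)   # used by session A
even = sequence(start=2, increment=2)  # used by session B
a = list(islice(odd, 5))
b = list(islice(even, 5))
print(a, b)  # [1, 3, 5, 7, 9] [2, 4, 6, 8, 10]
```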
Other Transformations
Normalizer
This transformation INCREASES the number of rows
Place it as far downstream as possible
Iterative Process
After tuning your bottlenecks, revisit memory optimization
Tuning often REDUCES memory requirements
(you might even be able to change some settings back to Auto)
Partitioning
Apply after optimizing source, target, and transformation bottlenecks
Apply after optimizing memory usage
Exploit under-utilized CPU and memory
To customize partitioning settings, you need the partitioning license
Partitioning Terminology
Partition
subset of the data
Stage
a portion of a pipeline
Partition Point
boundary between 2 stages
Partition Type
algorithm for distributing data among partitions;
always associated with a partition point
Reader Thread (First Stage)
Transformation Thread (Second Stage)
Transformation Thread (Third Stage)
Writer Thread (Fourth Stage)
[Figure: threads for partitions 1, 2, and 3]
3 Reader Threads (First Stage)
3 Transformation Threads (Second Stage)
3 Writer Threads (Fourth Stage)
Partition Types
Each partition point is associated with a partition
type
The partition type defines how the DTM is to
distribute the data among the partitions
If the pipeline has only 1 partition, the partition
point serves only to add a stage to the pipeline
There are restrictions, enforced by the GUI, on
which partition types are valid at which partition
points
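A minimal sketch of two distribution algorithms a partition type can use: round-robin, which deals rows out evenly regardless of content, and hashing on a key, which always sends rows with the same key value to the same partition (important for grouped transformations such as the Aggregator). These are generic illustrations, not PowerCenter's internal hashing; the function names are hypothetical, and zlib.crc32 is used only so the hash is stable across runs.

```python
import zlib
from typing import Dict, List, Sequence

Row = Dict[str, str]


def round_robin(rows: Sequence[Row], partitions: int) -> List[List[Row]]:
    out: List[List[Row]] = [[] for _ in range(partitions)]
    for i, row in enumerate(rows):
        out[i % partitions].append(row)
    return out


def hash_keys(rows: Sequence[Row], key: str, partitions: int) -> List[List[Row]]:
    out: List[List[Row]] = [[] for _ in range(partitions)]
    for row in rows:
        # Same key value -> same partition
        out[zlib.crc32(row[key].encode()) % partitions].append(row)
    return out


rows = [{"region": r} for r in ["east", "west", "east", "north", "west", "east"]]
print([len(p) for p in round_robin(rows, 3)])  # [2, 2, 2]
by_region = hash_keys(rows, "region", 3)
print([sorted({r["region"] for r in p}) for p in by_region])
```

Round-robin balances the load but scatters each key across partitions; hashing keeps keys together at the cost of possible skew.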
Partitioning Memory Requirements
Minimum number of buffer blocks multiplied by the number of partitions
(2 blocks per source, target, and XML group) × (number of partitions)
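A minimal sketch of the rule above, assuming sizes in bytes and reusing the 0.9 factor from the Number of blocks slide (our assumption) to turn the minimum block count into a rough minimum DTM buffer size. Integer arithmetic avoids floating-point rounding. Function names and the 64 KB block size are hypothetical.

```python
def min_blocks_partitioned(sources: int, targets: int, xml_groups: int,
                           partitions: int) -> int:
    # (2 blocks per source, target, and XML group) x (number of partitions)
    return 2 * (sources + targets + xml_groups) * partitions


def min_dtm_buffer_bytes(blocks: int, block_size: int) -> int:
    # Invert blocks = 0.9 x (buffer / block size), i.e. buffer = blocks x block size / 0.9,
    # computed as a ceiling division by 9/10 in integers.
    return -(-(blocks * block_size * 10) // 9)


blocks = min_blocks_partitioned(sources=2, targets=1, xml_groups=0, partitions=3)
print(blocks)  # 18
print(min_dtm_buffer_bytes(blocks, 64 * 1024))  # 1310720 (1.25 MB)
```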
Cache Partitioning
The DTM may create separate caches for each partition for each cached transformation; this is called cache partitioning
The DTM treats cache size settings as per-partition values
For example, if you configure an Aggregator with:
2 MB for the index cache,
3 MB for the data cache,
and you create 2 partitions,
the DTM will allocate up to 4 MB and 6 MB in total
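The arithmetic in the example is per-partition size × number of partitions. A trivial Python sketch with a hypothetical helper name; the has_partition_point flag models the case, covered on the Joiner slides, where there is no partition point at the transformation and all partitions share one set of caches (we assume that shared set uses the configured size once).

```python
from typing import Dict


def total_cache_mb(per_partition_mb: Dict[str, int], partitions: int,
                   has_partition_point: bool = True) -> Dict[str, int]:
    # Partitioned caches: one copy per partition. Shared caches: one copy in total.
    factor = partitions if has_partition_point else 1
    return {cache: size * factor for cache, size in per_partition_mb.items()}


aggregator = {"index": 2, "data": 3}
print(total_cache_mb(aggregator, 2))  # {'index': 4, 'data': 6}
```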
Cache Partitioning
[Figure: each partition has its own cache(s): an index cache and a data cache, plus a sorter cache, per partition]
Cache Partitioning
With a partition point on the Joiner, each partition has its own cache(s) (index cache and data cache)
Cache Partitioning
With no partition point on the Joiner, however, all partitions share 1 set of caches (index cache and data cache)
Monitoring Partitions
The Workflow Monitor provides runtime details
for each partition
Per partition, you can determine the following:
Number of rows processed
Memory usage
CPU usage
Dynamic Partitioning
The Integration Service can automatically set the number of partitions at runtime
Useful when the data volume increases or the number of available CPUs changes
The basis for the number of partitions is specified as a session property
Metric-based
Evaluates nodes in round-robin order
Honors resource provision thresholds
Uses statistics from the last 3 runs; if no statistics have been collected yet, defaults are used (40 MB memory, 15% CPU)
Dynamic Partitioning
Based on user specification (# partitions)
Can parameterize as $DynamicPartitionCount
Source files
Lookup source files [sequential file access]
Target files [sequential file access]
Persistent cache files for lookup or incremental aggregation [random file access]
Parameter files
Other configuration files
Indirect source or target files
Log files
Certification Title
Informatica Certified Administrator
Informatica Certified Developer
Informatica Certified Consultant
Q&A
Course Evaluation
Appendix
Informatica Services by Solution
Professional Services
Strategy Engagements
B2B Data Transformation
Architectural Review
Baseline Engagements
B2B Data Transformation
Baseline Architecture
Education Services
Recommended Courses
Informatica B2B Data
Transformation (D)
Informatica B2B Data Exchange
(D)
Implement Engagements
B2B Full Project Lifecycle
Transaction/Customer/
Payment Hub
Data Governance
Recommended Services
Professional Services
Strategy Engagements
Informatica Environment
Assessment Service
Metadata Strategy and Enablement
Data Quality Audit
Baseline Engagements
Data Governance Implementation
Metadata Manager Quick Start
Informatica Data Quality Baseline
Deployment
Implement Engagements
Metadata Manager Customization
Data Quality Management
Implementation
Education Services
Recommended Courses
PowerCenter Level I Developer (D)
Informatica Data Explorer (D)
Informatica Data Quality (D)
Related Courses
PowerCenter Administrator (A)
Metadata Manager (D)
Certifications:
PowerCenter
Data Quality
Data Migration
Recommended Services
Data Migration
Professional Services
Strategy Engagements
Data Migration Readiness
Assessment
Informatica Data Quality Audit
Baseline Engagements
PowerCenter Baseline Deployment
Informatica Data Quality (IDQ),
and/or Informatica Data Explorer
(IDE) Baseline Deployment
Implement Engagements
Data Migration Jumpstart
Data Migration End-to-End
Implementation
Education Services
Recommended Courses
Data Migration (M)
Informatica Data Explorer (D)
Informatica Data Quality (D)
PowerCenter Level I Developer (D)
Related Courses
PowerExchange Basics (D)
PowerCenter Administrator (A)
Certifications
PowerCenter
Data Quality
Data Quality
Recommended Services
Data Quality
Professional Services
Education Services
Strategy Engagements
Data Quality Management Strategy
Informatica Data Quality Audit
Recommended Courses
Informatica Data Explorer (D)
Informatica Data Quality (D)
Baseline Engagements
Informatica Data Quality (IDQ),
and/or Informatica Data Explorer
(IDE) Baseline Deployment
Informatica Data Quality Web
Services Quick Start
Related Courses
Informatica Identity Resolution (D)
PowerCenter Level I Developer (D)
Certifications
Data Quality
Implement Engagements
Data Quality Management
Implementation
Data Synchronization
Recommended Services
Data
Synchronization
Professional Services
Strategy Engagements
Project Definition and Assessment
Baseline Engagements
PowerExchange Baseline
Architecture Deployment
PowerCenter Baseline Architecture
Deployment
Implement Engagements
Data Synchronization
Implementation
Education Services
Recommended Courses
PowerCenter Level I Developer (D)
PowerCenter Level II Developer (D)
PowerCenter Administrator (A)
Related Courses
PowerExchange Basics Oracle RealTime CDC (D)
PowerExchange SQL RT (D)
PowerExchange for MVS DB2 (D)
Certifications
PowerCenter
Professional Services
Strategy Engagements
Enterprise Data Warehousing (EDW)
Strategy
Informatica Environment
Assessment Service
Metadata Strategy & Enablement
Baseline Engagements
PowerCenter Baseline Architecture
Deployment
Implement Engagements
EDW Implementation
Education Services
Recommended Courses
PowerCenter Level I Developer (D)
PowerCenter Level II Developer (D)
PowerCenter Metadata Manager (D)
Related Courses
Informatica Data Quality (D)
Data Warehouse Development (D)
Certifications
PowerCenter
Professional Services
Strategy Engagements
ICC Assessment
Baseline Engagements
ICC Master Class Series
ICC Director
Implement Engagements
ICC Launch
ICC Implementation
Informatica Production Support
Education Services
Recommended Courses
ICC Overview (M)
PowerCenter Level I Developer (D)
PowerCenter Administrator (A)
Related Courses
Metadata Manager (D)
Informatica Data Explorer (D)
Informatica Data Quality (D)
Certifications
PowerCenter
Data Quality
Professional Services
Education Services
Strategy Engagements
Master Data Management (MDM)
Strategy
Informatica Data Quality Audit
Recommended Courses
Informatica Data Explorer (D)
Informatica Data Quality (D)
PowerCenter Level I Developer (D)
Baseline Engagements
Informatica Data Explorer (IDE)
Baseline Deployment
Informatica Data Quality (IDQ)
Baseline Deployment
PowerCenter Baseline Architecture
Deployment
Related Courses
Metadata Manager (D)
Informatica Identity Resolution (D)
Certifications
PowerCenter
Data Quality
Implementation
MDM Implementation
Target Audience for Courses
D = Developer
M = Project Manager
A = Administrator
Professional Services
Strategy Engagements
Data Services (SOA) Strategy
Baseline Engagements
Informatica Web Services Quick
Start
Informatica Data Quality Web
Services Quick Start
Education Services
Recommended Courses
PowerCenter Level I Developer (D)
Informatica Data Quality (D)
Certifications
PowerCenter
Data Quality
Implement Engagements
Data Services (SOA) Implementation
Education Services
Recommended Courses
PowerCenter Level I Developer (D)
Informatica Data Explorer (D)
Informatica Data Quality (D)
Related Courses
Data Warehouse Development (D)
ICC Overview (M)
Metadata Manager (D)
Certifications
PowerCenter
Data Quality
Education Services
Recommended Courses
Data Migration (M)
PowerCenter Level I Developer (D)
Related Courses
Informatica Data Explorer (D)
Informatica Data Quality (D)
PowerExchange Basics (D)
Certifications
PowerCenter
Data Quality
Deliver Your Project Right the First Time with Informatica Professional Services
http://www.informatica.com