Performance Tuning
Version 8.6
Bert Peters
Global Education Services, Principal Instructor
2
Objectives
3
Agenda
• Memory optimization
• Performance tuning methodology
• Tuning source, target, & mapping bottlenecks
• Pipeline partitioning
• Server Grid
• Q&A
• Course evaluation
4
Anatomy of a Session
[Diagram: inside the Integration Service, data flows Source → READER → DTM Buffer → TRANSFORMER → DTM Buffer → WRITER → Target; the TRANSFORMER also uses the transformation caches]
5
Memory Optimization
[Diagram: the DTM Buffer is shared by the READER, TRANSFORMER, and WRITER threads; the TRANSFORMER also uses the Transformation Caches]
6
DTM Buffer
7
DTM Buffer Size – Session Property
8
DTM Buffer Block Size
• Default is Auto
• Check session log for actual size allocation
9
Reader Bottleneck
[Diagram: READER, TRANSFORMER, and WRITER threads around the DTM Buffer; the reader is slow, so the other threads are shown waiting]
10
Transformer Bottleneck
[Diagram: READER, TRANSFORMER, and WRITER threads around the DTM Buffer; the transformer is slow, so the other threads are shown waiting]
11
Writer Bottleneck
[Diagram: READER, TRANSFORMER, and WRITER threads around the DTM Buffer; the writer is slow, so the other threads are shown waiting]
12
Source Row Logging
[Diagram: READER, TRANSFORMER, and WRITER threads around the DTM Buffer, with the reader waiting]
Source rows must remain in the buffers until the transformation and writer threads have processed the corresponding rows downstream
13
Large Commit Interval
[Diagram: READER, TRANSFORMER, and WRITER threads around the DTM Buffer, with a thread waiting]
Target rows remain in the buffers until the DTM reaches the commit point
14
Tuning the DTM Buffer
[Diagram: READER, TRANSFORMER, and WRITER threads sharing the DTM Buffer]
15
Tuning the DTM Buffer
16
Tuning the DTM Buffer
• Number of blocks
• Minimum of 2 blocks required for each source, target, and XML group
• (number of blocks) = 0.9 x ((DTM buffer size) / (buffer block size))
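As a rough illustration of the block-count rule above, the following Python sketch applies the formula and the minimum of 2 blocks per source, target, and XML group. The function names, the 64 KB block size, and rounding down to whole blocks are assumptions made for this example, not values taken from the product; check the session log for the sizes actually allocated.

```python
def dtm_block_count(buffer_size_bytes: int, block_size_bytes: int) -> int:
    """Approximate block count: 0.9 x (DTM buffer size / buffer block size).

    Integer arithmetic is used so the result is rounded down to a whole
    block (the rounding direction is an assumption of this sketch).
    """
    return (9 * buffer_size_bytes) // (10 * block_size_bytes)


def min_blocks_required(sources: int, targets: int, xml_groups: int = 0) -> int:
    """At least 2 blocks for each source, target, and XML group."""
    return 2 * (sources + targets + xml_groups)


# Hypothetical session: 24 MB DTM buffer, 64 KB blocks, 1 source, 1 target.
available = dtm_block_count(24 * 1024 * 1024, 64 * 1024)
required = min_blocks_required(sources=1, targets=1)
print(f"available blocks: {available}, minimum required: {required}")
```

If the available count ever falls below the required minimum, either the DTM buffer size has to grow or the block size has to shrink.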
17
Tuning the DTM Buffer
18
Transformation Caches
19
Tuning the Transformation Caches
Default is Auto
20
Max Memory for Transformation Caches
21
Max Memory for Transformation Caches
22
Tuning the Transformation Caches
• Options to tune:
• Increase the maximum memory allowed for Auto transformation cache sizes
• Set the cache sizes for individual transformations manually
23
Session Performance Counters
24
Performance Counters
25
Tuning the Transformation Caches
26
Aggregator Caches
• Unsorted Input
• Must read all input before releasing any output rows
• Index cache contains group keys
• Data cache contains non-group-by ports
• Sorted Input
• Releases output row as each input group is processed
• Does not require data or index cache (both = 0)
• May run much faster than unsorted, BUT must consider the expense of sorting
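The difference between the two input modes can be sketched conceptually in Python. This is not how the Integration Service implements the Aggregator; the function names and the sample rows are invented for the illustration. With unsorted input, a dictionary stands in for the index and data caches and nothing can be released until all rows are read. With sorted input, each group's result can be released as soon as the key changes, so no cache is needed.

```python
from itertools import groupby
from operator import itemgetter


def aggregate_unsorted(rows):
    """Sum values per key; every row must be read before any output."""
    totals = {}  # plays the role of the index (keys) and data (sums) caches
    for key, value in rows:
        totals[key] = totals.get(key, 0) + value
    return list(totals.items())


def aggregate_sorted(rows):
    """Sum values per key; rows must already be sorted by key.

    Each group's total is yielded as soon as the group ends.
    """
    for key, group in groupby(rows, key=itemgetter(0)):
        yield key, sum(value for _, value in group)


rows = [("a", 1), ("b", 2), ("a", 3)]
unsorted_result = aggregate_unsorted(rows)
sorted_result = list(aggregate_sorted(sorted(rows)))
```

The `sorted(rows)` call is where the expense of sorting shows up: the streaming version is only cheaper overall if the data arrives sorted or can be sorted cheaply.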
27
Aggregator Caches – Manual Tuning
28
Joiner Caches: Unsorted Input
[Diagram: MASTER and DETAIL inputs to the Joiner]
Staging algorithm: all master data is loaded into cache
29
Joiner Caches: Sorted Input
30
Joiner Caches – Manual Tuning
31
Lookup Caches
32
Lookup Caches
• Data cache
• Only connected output ports included in data cache
• For unconnected lookup, only “return” port included in data cache
33
Lookup Caches
34
Lookup Caches
35
Lookup Caches – Manual Tuning
36
Rank Caches
37
Rank Caches – Manual Tuning
38
Sorter Cache
• Sorter Transformation
• May be faster than a DB sort or 3rd party sorter
• Reading via an index from a relational database (RDB) returns pre-sorted data
• SQL SELECT DISTINCT may reduce the volume of data across the network versus a Sorter with the “Distinct” property set
• Single cache (no separation of index & data)
39
Sorter Cache – Manual Tuning
40
64 bit vs. 32 bit OS
41
Maximum Memory Allocation Example
• Parameters
• 64 Bit OS
• Total system memory: 32 GB
• Maximum allowed for transformation caches: 5 GB or 10% of total system memory, whichever is lower
• DTM Buffer: 24 MB
• One transformation manually configured
Index Cache: 10 MB
Data Cache: 20 MB
• All other transformations set to Auto
42
Maximum Memory Allocation Example
• Result
• 10% = 3.2 GB < 5 GB: max allowed for transformation caches = 3.2 GB = 3200 MB
• Manually configured transformation uses 30 MB
• DTM Buffer uses 24 MB
• 3200 + 30 + 24 = 3254 MB
• Note that 3254 MB represents an upper limit; cached transformations may use less than the 3200 MB max
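The arithmetic in this example can be reproduced with a short Python sketch. The function and parameter names are made up for the illustration, and the sketch follows the slide in treating 1 GB as 1000 MB.

```python
def max_session_memory_mb(total_system_mb, cache_cap_mb, cache_cap_pct,
                          dtm_buffer_mb, manual_cache_mb):
    """Upper limit on session memory, following the worked example.

    The limit for Auto transformation caches is the lower of the absolute
    cap and the percentage of total system memory; manually sized caches
    and the DTM buffer are added on top.
    """
    auto_cache_limit = min(cache_cap_mb, total_system_mb * cache_cap_pct // 100)
    return auto_cache_limit + sum(manual_cache_mb) + dtm_buffer_mb


upper_limit = max_session_memory_mb(
    total_system_mb=32_000,    # 32 GB total system memory
    cache_cap_mb=5_000,        # 5 GB absolute cap
    cache_cap_pct=10,          # 10% of total system memory
    dtm_buffer_mb=24,
    manual_cache_mb=[10, 20],  # index cache + data cache
)
print(upper_limit)  # 3254
```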
43
Performance Tuning Methodology
• It is an iterative process
• Establish benchmark
• Optimize memory
• Isolate bottleneck
• Tune bottleneck
• Take advantage of under-utilized CPU & memory
44
The Production Environment
45
The Production Environment
46
Preliminary Steps
47
Preliminary Steps
48
Benchmarking
49
Benchmarking – Conditional Branching
50
Benchmarking – Conditional Branching
51
Benchmarking – Conditional Branching
52
Identifying Bottlenecks
53
Thread Statistics
54
Thread Statistics - Terminology
55
Thread Statistics - Terminology
[Diagram labels: DETAIL, PIPELINE 2]
56
Thread Statistics - Terminology
• Stage
a portion of a pipeline; implemented at runtime as a thread
• Partition Point
boundary between 2 stages; always associated with a transformation
57
Using Thread Statistics
58
Target Bottleneck
59
Transformation Bottleneck
60
Thread Statistics in Session Log
***** RUN INFO FOR TGT LOAD ORDER GROUP [1], CONCURRENT SET [1] *****
Thread [READER_1_1_1] created for [the read stage] of partition point
[SQ_SortMergeDataSize_Detail] has completed.
Total Run Time = [318.271977] secs
Total Idle Time = [176.488675] secs
Busy Percentage = [44.547843]
Thread [TRANSF_1_1_1] created for [the transformation stage] of partition point
[SQ_SortMergeDataSize_Detail] has completed.
Total Run Time = [707.803168] secs
Total Idle Time = [105.303059] secs
Busy Percentage = [85.122550]
Thread work time breakdown:
JNRTRANS: 10.869565 percent
SRTTRANS: 89.130435 percent
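A log excerpt like the one above can be scanned automatically to find the busiest thread. The Python sketch below is only an illustration: the regular expressions are written against the excerpt on this slide and may need adjusting for other log layouts, and the function names are invented. In this excerpt the busy percentage is consistent with (run time - idle time) / run time x 100.

```python
import re

SAMPLE_LOG = """\
Thread [READER_1_1_1] created for [the read stage] of partition point
[SQ_SortMergeDataSize_Detail] has completed.
Total Run Time = [318.271977] secs
Total Idle Time = [176.488675] secs
Busy Percentage = [44.547843]
Thread [TRANSF_1_1_1] created for [the transformation stage] of partition point
[SQ_SortMergeDataSize_Detail] has completed.
Total Run Time = [707.803168] secs
Total Idle Time = [105.303059] secs
Busy Percentage = [85.122550]
"""

THREAD_RE = re.compile(r"Thread \[([^\]]+)\] created")
BUSY_RE = re.compile(r"Busy Percentage = \[([0-9.]+)\]")


def busy_by_thread(log_text):
    """Map each thread name to the busy percentage reported after it."""
    stats = {}
    current = None
    for line in log_text.splitlines():
        thread = THREAD_RE.search(line)
        if thread:
            current = thread.group(1)
            continue
        busy = BUSY_RE.search(line)
        if busy and current is not None:
            stats[current] = float(busy.group(1))
    return stats


def busy_percentage(run_secs, idle_secs):
    """Recompute the busy percentage from run and idle time."""
    return (run_secs - idle_secs) / run_secs * 100


stats = busy_by_thread(SAMPLE_LOG)
busiest = max(stats, key=stats.get)
print(busiest, stats[busiest])
```

Here the transformation thread is the busiest, and the work time breakdown in the log points at the Sorter (SRTTRANS) as the place to look first.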
61
Performance Counters in WF Monitor
62
Integration Service Monitor in WFMonitor
63
Session Statistics in WFMonitor
64
Other Methods of Bottleneck Isolation
65
Target Optimization
66
Target Optimization
67
Target Optimization
Transaction Control
• Target commit type
• Best performance, least precise control
• System avoids writing partially-filled buffers
• Source commit type
• Last active source to feed a target becomes a transaction generator
• Commit interval provides precise control
• Slower than target commit type
• Avoid setting commit interval too low
• User Defined commit type
• Required when mapping contains transaction control transformation
• Provides precise data-driven control
• Slower than target and source commit types
68
Target Optimization
69
Source Bottlenecks
70
Source Bottlenecks
71
Reduce Data Set
72
Avoid Unnecessary Sorting
[Diagram: mapping in which an XML Parser transformation feeds a long chain of Sorter (srt_ENT_...) and Joiner (jnr_ENT_...) transformations]
73
Expressions Language Tips
74
Expressions Language Tips
instead of:
IIF(condition1,result1,IIF(condition2,
result2,IIF… ))))))))))))
try:
DECODE (TRUE,
condition1, result1,
:
conditionn, resultn)
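To show that the flat DECODE(TRUE, ...) form returns the same result as a chain of nested IIF calls, here is a small Python model of the two shapes. It is not expression-language code: `decode_true`, the grading thresholds, and the default value are invented for the illustration, and unlike the expression language Python evaluates every condition before the call.

```python
def nested_iif(score):
    """Nested form: each conditional wraps the next one."""
    return "A" if score >= 90 else ("B" if score >= 80 else ("C" if score >= 70 else "F"))


def decode_true(*pairs, default=None):
    """Return the result paired with the first condition that is true."""
    for condition, result in pairs:
        if condition:
            return result
    return default


def flat_decode(score):
    """Flat form: one condition/result pair per line, first match wins."""
    return decode_true(
        (score >= 90, "A"),
        (score >= 80, "B"),
        (score >= 70, "C"),
        default="F",
    )


same_everywhere = all(nested_iif(s) == flat_decode(s) for s in range(101))
```

The flat form is easier to read and extend because adding a case means adding one line rather than another level of nesting and another closing parenthesis.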
75
General Guidelines
76
General Guidelines
77
Transformation Specific
78
Transformation Specific
79
Other Transformations
• Normalizer
• This transformation INCREASES the number of rows
• Place as far downstream as possible
80
Iterative Process
81
Partitioning
82
Partitioning Terminology
• Partition
subset of the data
• Stage
a portion of a pipeline
• Partition Point
boundary between 2 stages
• Partition Type
algorithm for distributing data among partitions; always associated with a partition point
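To make "algorithm for distributing data among partitions" concrete, the Python sketch below models two of the partition types named later in this deck, round robin and hash keys. It is a conceptual model only: the function names and the student rows are invented, and CRC32 is a stand-in, not the hash function the Integration Service uses.

```python
import zlib
from collections import defaultdict


def round_robin(rows, n_partitions):
    """Deal rows out in turn so every partition gets an even share."""
    partitions = defaultdict(list)
    for i, row in enumerate(rows):
        partitions[i % n_partitions].append(row)
    return dict(partitions)


def hash_by_key(rows, key, n_partitions):
    """Send rows with the same key value to the same partition."""
    partitions = defaultdict(list)
    for row in rows:
        bucket = zlib.crc32(str(row[key]).encode("utf-8")) % n_partitions
        partitions[bucket].append(row)
    return dict(partitions)


rows = [{"student_id": i % 4, "score": i} for i in range(12)]
evenly_spread = round_robin(rows, 3)
grouped_by_student = hash_by_key(rows, "student_id", 3)
```

Round robin balances row counts but scatters each key across partitions; hashing keeps a key's rows together, which matters when a downstream transformation groups or joins on that key.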
83
Threads, Partition Points and Stages
84
Rules for Adding Partition Points
85
Guidelines for Adding Partition Points
86
Partition Points & Partitions
Threads – partition 1
Threads – partition 2
Threads – partition 3
87
Session Partitioning GUI
88
Rules for Adding Partitions
89
Rules for Adding Partitions
90
Guidelines for Adding Partitions
91
Partition Types
92
Partition Types – Pass Through
93
Partition Types – Key Range
94
Partition Types – Round Robin
95
Partition Types – Hash Auto Keys
96
Partition Types – Hash User Keys
97
Partition Types – Database
98
Partitioning with Relational Sources
99
Partitioning with Flat File Sources
100
Partitioning with Relational Targets
101
Partitioning with Flat File Targets
102
Partitioning – Memory Requirements
103
Cache Partitioning
104
Cache Partitioning
Index cache
Data cache
Sorter cache
Sorter cache
105
Cache Partitioning
Data cache
106
Cache Partitioning
With no partition point on the joiner, however, all partitions share 1 set of caches
[Diagram labels: Index cache, Data cache]
107
Monitoring Partitions
108
Pipeline Partitioning Example
• Scenario:
• Student record processing
• XML source and Oracle target
• XML source is split into 3 files
109
Pipeline Partitioning Example
Partition 1
Partition 2
Partition 3
110
Pipeline Partitioning Example
[Diagram: three partitions, each with a partition point marked RR]
111
Pipeline Partitioning Example
[Diagram: three partitions, each with partition points marked RR and H]
112
Pipeline Partitioning Example
[Diagram: three partitions, each with partition points marked RR, H, and K]
113
Dynamic Partitioning
114
Concurrent Workflow Execution (8.5)
• Prior to 8.5
115
Concurrent Workflow Execution
116
Concurrent Workflow Execution
117
Workflow on Grid (WonG)
118
Workflow on Grid (WonG)
119
Load Balancer Modes
• Round Robin
• Honors Max Number of Processes per Node
• Metric-based
• Evaluates nodes in round-robin
• Honors resource provision thresholds
• Uses statistics from the last 3 runs; if no statistics have been collected yet, defaults are used (40 MB memory, 15% CPU)
120
Load Balancer Modes
• Adaptive
• Selects the node with the most available CPU
• Honors resource provision thresholds
• Uses statistics from the last 3 runs of a task to determine whether the task can run on a node
• Bypass in dispatch queue: skips tasks in the queue that are more resource intensive and cannot be dispatched to any currently available node
• CPU Profile: ranks node CPU performance against a baseline system
121
Session on Grid (SonG)
122
Session on Grid (SonG)
123
Configuring Session on Grid
124
Dynamic Partitioning
125
SonG Partitioning Guidelines
126
SonG Partitioning Guidelines
127
File Placement Best Practices
128
File Placement Best Practices
129
Data Integration Certification Path
Level | Certification Title | Recommended Training | Required Exams
Additional Training:
» PowerCenter 8.5 New Features
» PowerCenter 8.6 New Features
» PowerCenter 8 Upgrade
» PowerCenter 8 Team-Based Development
» PowerCenter 8.5 Unified Security
130
Q&A
131
Course Evaluation
132
Appendix
Informatica Services by Solution
133
B2B Data Exchange
Recommended Services
B2B
145
Informatica Global Education Services
146
Informatica Contact Information
http://www.informatica.com
147