° PowerCenter Architecture
° Performance tuning step-by-step
° Eliminating common bottlenecks
PowerCenter Architecture: Engine-based & Metadata-driven
[architecture diagram]
° Metadata Exchange with modeling tools: Erwin, Designer 2000, PowerDesigner, CWM (via ODBC)
° Heterogeneous sources (native or via PowerConnect): Oracle, MS SQL Server, Sybase, Informix, DB2 UDB, ODBC, flat file, XML, mainframe VSAM/COBOL copybook, ERP, SAS, real-time, remote files
° PowerCenter Server engine with buffers, connected over TCP/IP and JDBC
° Heterogeneous targets with bulk-loader support: Oracle (API, SQL*Loader), MS SQL Server (BCP), Sybase (IQ Load), Informix, DB2 UDB (Autoloader), Teradata (fload, tpump), mainframe, ERP, ODBC, SAS, flat file, real-time, XML, remote files
° Repository Server and Repository Agent (native connection) manage the Metadata Repository
° Key: data flows vs. metadata flows
Introducing PowerExchange
° On-Demand Data Access through Changed Data Capture
° Platforms: Mainframe, AS/400, HP3000
° Sources: relational databases and file formats
° Access modes: real-time (EAI), changed data, and bulk/batch
PowerCenter Environment
° This is a multi-vendor, multi-system environment
° There are many components involved − operating systems, databases, networks, disks, I/O
° PowerCenter performance is determined by THE SLOWEST COMPONENT (the bottleneck)
− Usually need to monitor performance in several places
− Usually need to monitor outside PowerCenter
[diagram: PowerCenter, DBMS, and OS hosts with many disks, connected over LAN/WAN]
Server Architecture - Memory
° The PowerCenter Server utilizes two main processes
− Load Manager process (pmserver)
− Session process (DTM)
° The Load Manager process is a continuous listener process that handles tasks such as session start, scheduling, error reporting, and email
− Configured using the Load Manager Shared Memory parameter
− Set the value to approximately 200K bytes per session, multiplied by the maximum number of concurrent sessions
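The sizing rule above is simple arithmetic. As a quick illustration, here is a plain-Python sketch (the function name is made up for this example; it is not a PowerCenter API):

```python
# Illustrative sizing helper for the Load Manager Shared Memory parameter.
# Rule of thumb from the slide: ~200 KB per session, times the maximum
# number of concurrent sessions.

def load_manager_shared_memory_bytes(max_concurrent_sessions: int,
                                     per_session_bytes: int = 200_000) -> int:
    """Return a suggested Load Manager shared memory setting in bytes."""
    return per_session_bytes * max_concurrent_sessions

# e.g. 10 concurrent sessions -> 2,000,000 bytes (about 2 MB)
print(load_manager_shared_memory_bytes(10))  # 2000000
```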
Server Architecture - Memory
° The DTM process uses shared memory to handle tasks such as reading, data transformation, and writing
° Two session parameters control the DTM memory allocation
− DTM Buffer Pool Size
− Buffer Block Size
° DTM pipeline threads overlap when possible (Reader, Transformation Engine, Writer)
Server Architecture - Memory
° Server memory at runtime: example [screenshot]
Server Architecture - Memory
° DTM Buffer Pool Size controls the total amount of memory used to buffer rows internally by the reader and writer
− This sets the total number of blocks available
− The optimal value is about 25MB
− If the block size is 64K, then you get 25M/64K ≈ 390 blocks
° Buffer Block Size controls the size of the blocks that move in the pipeline
− Optimum size depends on the row size being processed
− 64KB ≈ 64 rows of 1KB
− 128KB ≈ 128 rows of 1KB
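The buffer arithmetic on this slide can be sketched as follows (function names are illustrative only, not PowerCenter settings or APIs):

```python
# Sketch of the slide's buffer arithmetic: how many blocks a given DTM
# buffer pool provides, and roughly how many rows fit in one block.

def buffer_blocks(buffer_pool_bytes: int, block_size_bytes: int) -> int:
    """Total pipeline blocks available in the DTM buffer pool."""
    return buffer_pool_bytes // block_size_bytes

def rows_per_block(block_size_bytes: int, row_size_bytes: int) -> int:
    """Approximate number of rows that fit in one buffer block."""
    return block_size_bytes // row_size_bytes

# A 25 MB pool with 64 KB blocks gives roughly 390 blocks, as on the slide
print(buffer_blocks(25_000_000, 64_000))  # 390
print(rows_per_block(64_000, 1_000))      # 64
```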
Server Architecture – DTM Parameters
° The Session Task parameters that control the processing pipeline are found on the Properties and Config Object tabs
Server Architecture - Threads
° Assume a mapping with an Aggregator, a Rank, and other transformations, in a session with two partitions
° Pre- and post-session commands would add one thread each
[diagram: Load Manager spawns the DTM Master Thread and Mapping Thread; each partition gets a Reader Thread, Transformation Threads (Rank, Aggregator), and a Writer Thread in process memory]
Performance tuning step-by-step
1. Determine batch window
2. Make ONE change
3. Run sessions
4. Measure
5. Determine bottleneck
Repeat until elapsed time < batch window
HINTS:
• Write down a log of every step
• If all resources are used 100%, buy more
• If the change doesn't help, UNDO
Measuring Performance - Internal to Informatica
Measuring Performance - Internal
° Several types of bottlenecks can affect session performance
− Network
− System
− Database
− Informatica mapping and session
° There are several ways of measuring performance, such as the total amount of data (volume) per unit of time
− Volume can be measured as: number of bytes, or number of rows
− Time can be measured as: CPU (process) time, or elapsed time
Measuring Performance - Internal
° For the purpose of identifying bottlenecks, use:
− elapsed time as a relative time measurement
− number of rows loaded over that period of time (rows per second)
° Rows per second allows performance measurement of a session over a period of time and across changes in the environment
° Rows per second can have a very wide range depending on the size of the row (number of bytes), the type of source/target (flat file or relational), and the underlying hardware
Measuring Performance - Internal
° Establishing the baseline using the Workflow Manager
− Run the session task to be measured
− View the session task Transformation Statistics detail window at the end of the session and record the number of rows loaded
− View the Session Task Properties window and record the start and end times of the session
− Subtract the start time from the end time of the session and convert to seconds to get the total session run time
− Divide the number of rows loaded by the number of seconds
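The baseline steps above amount to one division; a worked sketch in plain Python, using the figures from the read-throughput baseline later in this deck:

```python
# Illustrative baseline calculation following the steps above:
# elapsed seconds from the session start/end times, then rows per second.
from datetime import datetime

def rows_per_second(rows_loaded: int, start: str, end: str) -> float:
    """Compute rows/sec from timestamps in 'MM/DD/YYYY HH:MM:SS' form."""
    fmt = "%m/%d/%Y %H:%M:%S"
    elapsed = (datetime.strptime(end, fmt)
               - datetime.strptime(start, fmt)).total_seconds()
    return rows_loaded / elapsed

# Figures from the s_m_RDB_TO_FF_TEST baseline: 249995 rows in 19 seconds
rps = rows_per_second(249995, "10/18/2002 11:00:58", "10/18/2002 11:01:17")
print(round(rps))  # 13158
```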
Measuring Performance - Internal
° Example [screenshot showing Session Name, Start/End Times, Applied Rows]
Measuring Performance - Internal
° Tips:
− Calculated rows per second is not the same as "Write Throughput"
− For multiple targets, use the sum of rows loaded for targets that are similar in row size
− For multiple partitions, use the sum of rows loaded for all partitions
− Monitor background processes external to Informatica that can affect results between test runs
Establishing Baselines - Internal to Informatica
Establishing Baselines - Internal
° Each component in a production environment contributes to the overall session performance
° Performance is limited by the slowest component
° Knowing the physical data limits establishes the maximum data throughput
° A baseline measurement can be used for future comparisons
[diagram: LAN/WAN, PowerCenter, DBMS, OS]
Establishing Baselines – Read Throughput Mapping
° Read Throughput Mapping – use a database-table-to-flat-file mapping to establish a typical read rate

Session Name    s_m_RDB_TO_FF_TEST
Rows Loaded     249995
Rows Failed     0
Start Time      10/18/2002 11:00:58 AM
End Time        10/18/2002 11:01:17 AM
Elapsed Time    19 sec
Rows Per Sec    13158
Establishing Baselines - Historical
° Each Informatica repository contains a history of each session run
° Use the MX view "REP_SESS_LOG" to extract session information
− SUBJECT_AREA (Folder)
− SESSION_NAME (Session)
− SUCCESSFUL_ROWS (Rows Loaded)
− FAILED_ROWS (Rows Not Loaded)
− ACTUAL_START (Start Time)
− SESSION_TIMESTAMP (End Time)
Note: simple query – select * from rep_sess_log
2. Measure Performance
° Use repository views to establish performance
− Session elapsed time (in seconds):
  Oracle: (REP_SESS_LOG.SESSION_TIMESTAMP - REP_SESS_LOG.ACTUAL_START) * 86400
  DB2: TIMESTAMPDIFF(2, CHAR(REP_SESS_LOG.SESSION_TIMESTAMP - REP_SESS_LOG.ACTUAL_START))
− Target rows per second = SUCCESSFUL_ROWS / session elapsed time
° OR: use the Metadata Reporter!
3. Determine bottleneck
° Identifying Target Bottlenecks
° Identifying Source Bottlenecks
° Identifying Mapping Bottlenecks
− session parameters
− system resource allocation
− mapping/transformation design
3. Determine Target Bottlenecks
° Writing to a flat file usually does not cause a bottleneck
° Configure a session task to write to a flat file target (/dev/null)
− If write throughput increases significantly, then you have a target database bottleneck
3. Determine Source or Mapping Bottlenecks
° Add a FILTER transformation behind each source qualifier and set the filter condition to false
− If the modified session is no faster: source bottleneck
− If the modified session is faster: mapping bottleneck
Make ONE change
° Very case-specific; here are some common bottlenecks
− Target
− Source
− Mapping
− Session
− System
° Only keep the changes that improve performance (maintaining changes is confusing and costly)
Eliminate Target Bottlenecks
° Database indexes and constraints
− Disable indexes and constraints before the load, and re-enable them afterward (connection/target pre- and post-SQL)
− Check the database space allocation for indexes: indexes should be on a different disk if possible
° Use a loader connection
° Check the commit interval
− Very small commit intervals cause excessive overhead
− Make sure you have allocated plenty of rollback space (PC6: connection Rollback Segment)
− A good commit interval is 50,000
Eliminate Target Bottlenecks
° PowerCenter updates and deletes
− Updates and deletes can be extremely slow without an index or key
− Bitmap indexes on columns you are updating cause very slow performance (usually less than 100 rows/sec)
− Do NOT use an Update Strategy transformation if all rows are treated the same (DD_INSERT, DD_UPDATE): the writer cannot do block inserts or block updates
Eliminate Source Bottlenecks
° Discuss with your DBA how to optimize your Source Qualifier SQL (found in the session log file)
− standard DBMS tuning: explain plan, add indexes, estimate statistics (regularly), alter database parameters, etc.
° Optimize the query to begin returning rows early
− the total query time may be longer, but PowerCenter processing can overlap with the query execution
Eliminate Mapping Bottlenecks
° Reduce I/O times
− Cache in memory
− Use fast disks for Cache, BadFiles, SessionLogs, etc.
− Check your Sequence Generator
° Reduce the amount of data to transform
− Filter early
° Aggregator or Joiner: prefix with a Sorter
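Why does a Sorter in front of an Aggregator help? With input sorted on the group key, each group can be aggregated and emitted as soon as the key changes, instead of caching every group until end-of-data. A plain-Python sketch of the idea (not PowerCenter code):

```python
# Streaming aggregation over input that is already sorted on the group
# key: each group is summed and emitted when its key run ends, so only
# one group is held in memory at a time.
from itertools import groupby

def sorted_aggregate(rows):
    """Sum values per key for rows sorted on the key."""
    for key, group in groupby(rows, key=lambda r: r[0]):
        yield key, sum(v for _, v in group)

rows = [("A", 1), ("A", 2), ("B", 5), ("B", 3)]  # sorted on key
print(list(sorted_aggregate(rows)))  # [('A', 3), ('B', 8)]
```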
Optimize expression performance
° Use numeric ports instead of string ports
° Reduce (hidden) data type conversions
° Simplify expressions
− Factor out common logic into transformation variables, or even mapping variables or parameters
° Simplify nested IIFs when possible, or use DECODE statements
Optimize Lookup Performance
° Reduce the number of lookup rows
− 'where' clause in the lookup SQL
° Use persistent lookup caches
− when a nightly batch has several sessions that use the same lookup
− build the persistent cache file in a separate session
° Lookup with a date range: use a lookup/filter combination
° Lookup against a large dimension with few changes:
− PowerExchange Changed Data Capture
− checksum AEP plus lookup (devnet.informatica.com)
° Remove the lookup: use 'update else insert'
Session Optimizing
° Set the DTM Buffer Pool Size and Buffer Block Size
− Large row sizes may require a larger buffer block size
− Default buffer pool is 12,000,000 bytes = 12 MB; recommended is 24 MB
° Buffer Block Size controls the size of the blocks that move in the pipeline
− Buffer block size should hold about 100 rows
− 64K (64,000) ≈ 64 rows of 1KB
− 128K (128,000) ≈ 128 rows of 1KB
° An extremely large DTM buffer may SLOW DOWN the session!
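The "about 100 rows per block" rule of thumb can be sketched in one line (the function name is illustrative, not a PowerCenter setting):

```python
# Rule-of-thumb sketch: pick a buffer block size that holds roughly
# 100 rows of the session's widest row.

def suggested_block_size(row_size_bytes: int, rows_per_block: int = 100) -> int:
    """Buffer block size sized to hold about rows_per_block rows."""
    return row_size_bytes * rows_per_block

# A 1 KB row suggests a ~100 KB buffer block
print(suggested_block_size(1_000))  # 100000
```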
Session Memory Settings
° Set cache memory larger than the size of the cache file on disk
° Set the server variable directories (BadFiles, SessLogs, Cache, etc.) to point to high-performance disk arrays
° Reduce transformation errors (and error logging)
For those that are still on PowerCenter 5 …
PowerCenter 6 Performance highlights
° More efficient server
° New Sorter transformation
° 'Sorted Input' switch for Aggregator and Joiner
° More bulk loaders
° Pipeline Partitioning (PowerCenter only)
Upgrade!
For those that are still on PowerCenter 6 …
PowerCenter 7 Performance highlights
° Block DTM
− Enables moving/transforming a block of rows at a time at each transformation
− Accelerates ALL sessions with a mapping bottleneck AND (lots of transformations OR lots of string ports)
° Superior XML reading and writing
° Easy GUI for partitioning
° Max 64 partitions per partition point
° 64-bit version
° Server Grid (workflow load balancing across several servers)
° Change Data Capture (MVS, Oracle 9i, and MS SQL Server)
Upgrade!
Performance tuning step-by-step (recap)
1. Determine batch window
2. Make ONE change
3. Run sessions
4. Measure
5. Determine bottleneck
Repeat until elapsed time < batch window
HINTS:
• Write down a log of every step
• If all resources are used 100%, buy more
• If the change doesn't help, UNDO