DataStage Best Practices & Performance Tuning

Contents

1 Environment Variable Settings
   1.1 Environment Variable Settings for All Jobs
   1.2 Additional Environment Variable Settings
2 Configuration Files
   2.1 Logical Processing Nodes
   2.2 Optimizing Parallelism
   2.3 Configuration File Examples
      2.3.1 Example for Any Number of CPUs and Any Number of Disks
      2.3.2 Example that Reduces Contention
      2.3.3 Smaller Configuration Example
   2.4 Sequential File Stages (Import and Export)
      2.4.1 Improving Sequential File Performance
      2.4.2 Partitioning Sequential File Reads
      2.4.3 Sequential File (Export) Buffering
      2.4.4 Reading from and Writing to Fixed-Length Files
      2.4.5 Reading Bounded-Length VARCHAR Columns
   2.5 Transformer Usage Guidelines
      2.5.1 Choosing Appropriate Stages
      2.5.2 Transformer NULL Handling and Reject Link
      2.5.3 Transformer Derivation Evaluation
      2.5.4 Conditionally Aborting Jobs
   2.6 Lookup vs. Join Stages
   2.7 Capturing Unmatched Records from a Join
   2.8 The Aggregator Stage
   2.9 Appropriate Use of SQL and DataStage Stages
   2.10 Optimizing Select Lists
   2.11 Designing for Restart
   2.12 Database OPEN and CLOSE Commands
   2.13 Database Sparse Lookup vs. Join
   2.14 Oracle Database Guidelines
      2.14.1 Proper Import of Oracle Column Definitions (Schema)
      2.14.2 Reading from Oracle in Parallel
      2.14.3 Oracle Load Options
3 Tips for Debugging Enterprise Edition Jobs
   3.1 Reading a Score Dump
   3.2 Partitioner and Sort Insertion
4 Performance Tips for Job Design
5 Performance Monitoring and Tuning
   5.1 The Job Monitor
   5.2 OS/RDBMS-Specific Tools
   5.3 Obtaining Operator Run-Time Information
   5.4 Selectively Rewriting the Flow
   5.5 Eliminating Repartitions
   5.6 Ensuring Data is Evenly Partitioned
   5.7 Buffering for All Versions
   5.8 Resolving Bottlenecks
      5.8.1 Variable Length Data
      5.8.2 Combinable Operators
      5.8.3 Disk I/O
      5.8.4 Buffering

1 Environment Variable Settings
DataStage EE provides a number of environment variables to control how jobs operate on a UNIX system. In addition to providing required information, environment variables can be used to enable or disable various DataStage features, and to tune performance settings.
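As a concrete illustration, project-wide defaults are typically set per-project in DataStage Administrator or exported in the administrator's dsenv file. The sketch below shows the shell form such settings would take; the configuration file path is an invented example, not a real install location.

```shell
# Illustrative only: project-wide defaults as they might appear in dsenv.
# The /opt/ds path is an assumption -- substitute your own install location.
export APT_CONFIG_FILE=/opt/ds/configs/prod_4node.apt
export APT_DUMP_SCORE=1       # log the EE score for every job run
export APT_PM_SHOW_PIDS=1     # log the UNIX PID of each job process
echo "Using configuration file: $APT_CONFIG_FILE"
```

Settings made this way apply to every job launched from that environment; per-job overrides can still be made in the job properties.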

1.1 Environment Variable Settings for All Jobs

Ascential recommends the following environment variable settings for all Enterprise Edition jobs. These settings can be made at the project level, or may be set on an individual basis within the properties for each job.

$APT_CONFIG_FILE = [filepath]
    Specifies the full pathname to the EE configuration file.

$APT_DUMP_SCORE = 1
    Outputs the EE score dump to the DataStage job log, providing detailed information about the actual job flow, including operators, processes, and datasets. Extremely useful for understanding how a job actually ran in the environment. (See section 3.1, Reading a Score Dump.)

$OSH_ECHO = 1
    Includes a copy of the generated osh in the job's DataStage log. Starting with v7, this option is enabled when the "Generated OSH visible for Parallel jobs in ALL projects" option is enabled in DataStage Administrator.

$APT_RECORD_COUNTS = 1
    Outputs record counts to the DataStage job log as each operator completes processing. The count is per operator per partition.

$APT_PM_SHOW_PIDS = 1
    Places entries in the DataStage job log showing the UNIX process ID (PID) for each process started by a job. Does not report PIDs of DataStage "phantom" processes started by Server shared containers.

$APT_BUFFER_MAXIMUM_TIMEOUT = 1
    Maximum buffer delay in seconds.

$APT_THIN_SCORE = 1 (DataStage 7.0 and earlier)
    Only needed for DataStage v7.0 and earlier. Setting this environment variable significantly reduces memory usage for very large (>100 operator) jobs.
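Before troubleshooting a job's runtime behavior, it can help to confirm which of these variables are actually present in the launching environment. The helper below is not a DataStage tool, just a hedged shell sketch for that sanity check.

```shell
# Hypothetical helper: report which of the recommended APT_* variables
# are set in the environment that will launch DataStage jobs.
check_vars() {
  for v in "$@"; do
    if printenv "$v" >/dev/null; then
      echo "$v is set"
    else
      echo "$v is NOT set"
    fi
  done
}
check_vars APT_CONFIG_FILE APT_DUMP_SCORE APT_RECORD_COUNTS APT_PM_SHOW_PIDS
```

Running this from the same shell that invokes dsjob (or from a before-job subroutine) shows at a glance whether the project-level defaults took effect.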

1.2 Additional Environment Variable Settings

Ascential recommends setting the following environment variables on an as-needed basis. These variables can be used to tune the performance of a particular job flow, to assist in debugging, and to change the default behavior of specific EE stages.


NOTE: The environment variable settings in this section are only examples. Set values that are optimal to your environment.

Sequential File Stage Environment Variables

$APT_EXPORT_FLUSH_COUNT = [nrows]
    Specifies how frequently (in rows) the Sequential File stage (export operator) flushes its internal buffer to disk. Setting this value to a low number (such as 1) is useful for realtime applications, but there is a small performance penalty from increased I/O.

$APT_IMPORT_BUFFER_SIZE / $APT_EXPORT_BUFFER_SIZE = [Kbytes]
    Defines the size of the I/O buffer for Sequential File reads (imports) and writes (exports) respectively. The default is 128 (128K), with a minimum of 8. Increasing these values on heavily-loaded file servers may improve performance.

$APT_CONSISTENT_BUFFERIO_SIZE = [bytes]
    In some disk array configurations, setting this variable to a value equal to the read/write size in bytes can improve performance of Sequential File import/export operations.

$APT_DELIMITED_READ_SIZE = [bytes]
    Specifies the number of bytes the Sequential File (import) stage reads ahead to get the next delimiter. By default, Sequential File (import) will read ahead 500 bytes to get the next delimiter; if it is not found, the importer looks ahead 4*500=2000 bytes (1500 more), and so on (4x) up to 100,000 bytes. The default is 500 bytes, but this can be set as low as 2 bytes. This setting should be set to a lower value when reading from streaming inputs (eg. socket, FIFO) to avoid blocking.

$APT_MAX_DELIMITED_READ_SIZE = [bytes]
    When more than 500 bytes of read-ahead is desired, use this variable instead of $APT_DELIMITED_READ_SIZE. It controls the upper bound, which is by default 100,000 bytes.

Oracle Environment Variables

$ORACLE_HOME = [path]
    Specifies the installation directory for the current Oracle instance. Normally set in a user's environment by Oracle scripts.

$ORACLE_SID = [sid]
    Specifies the Oracle service name, corresponding to a TNSNAMES entry.

$APT_ORAUPSERT_COMMIT_ROW_INTERVAL = [num]
$APT_ORAUPSERT_COMMIT_TIME_INTERVAL = [seconds]
    These two environment variables work together to specify how often target rows are committed for target Oracle stages with the Upsert method. Commits are made whenever the time interval period has passed or the row interval is reached, whichever comes first. By default, commits are made every 2 seconds or 5000 rows.

$APT_ORACLE_LOAD_OPTIONS = [SQL*Loader options]
    Specifies Oracle SQL*Loader options used in a target Oracle stage with the Load method. By default, this is set to OPTIONS(DIRECT=TRUE, PARALLEL=TRUE).

$APT_ORA_IGNORE_CONFIG_FILE_PARALLELISM = 1
    When set, a target Oracle stage with the Load method will limit the number of players to the number of datafiles in the table's tablespace.

$APT_ORA_WRITE_FILES = [filepath]
    When set, the output of a target Oracle stage with the Load method is written to files instead of invoking the Oracle SQL*Loader. The filepath specifies the file with the SQL*Loader commands. Useful in debugging Oracle SQL*Loader issues.

$DS_ENABLE_RESERVED_CHAR_CONVERT = 1
    Allows DataStage to handle Oracle databases which use the special characters # and $ in column names.

Job Monitoring Environment Variables

$APT_MONITOR_TIME = [seconds]
    In v7 and later, specifies the time interval (in seconds) for generating job monitor information at runtime. To enable size-based job monitoring, unset this environment variable and set $APT_MONITOR_SIZE below.

$APT_MONITOR_SIZE = [rows]
    Determines the minimum number of records the job monitor reports. The default of 5000 records is usually too small; to minimize the number of messages during large job runs, set this to a higher value (eg. 1000000).

$APT_NO_JOBMON = 1
    Disables job monitoring completely. In rare instances, this may improve performance. In general, this should only be set on a per-job basis when attempting to resolve performance bottlenecks.

$APT_RECORD_COUNTS = 1
    Prints record counts in the job log as each operator completes processing. The count is per operator per partition.
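The 4x read-ahead growth described for $APT_DELIMITED_READ_SIZE can be pictured with a small arithmetic sketch. This is one plausible reading of the documented behavior (quadruple the window until the 100,000-byte bound), not DataStage code.

```shell
# Sketch of the documented delimiter read-ahead growth: start at 500
# bytes and quadruple the look-ahead window, capped at 100,000 bytes.
readahead_steps() {
  n=500; cap=100000
  while :; do
    echo "$n"
    [ "$n" -ge "$cap" ] && break
    n=$((n * 4))
    [ "$n" -gt "$cap" ] && n=$cap
  done
}
readahead_steps
```

Each printed value is a window the importer would try before giving up on finding the delimiter, which is why streaming inputs call for a much smaller setting.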

2 Configuration Files

The configuration file tells DataStage Enterprise Edition how to exploit underlying system resources (processing, temporary storage, and dataset storage). In more advanced environments, the configuration file can also define other resources such as databases and buffer storage. At runtime, EE first reads the configuration file to determine what system resources are allocated to it, and then distributes the job flow across these resources.

When you modify the system, by adding or removing nodes or disks, you must modify the DataStage EE configuration file accordingly. Since EE reads the configuration file every time it runs a job, it automatically scales the application to fit the system without having to alter the job design.

2.1 Logical Processing Nodes

The configuration file, specified through the environment variable $APT_CONFIG_FILE, defines one or more EE processing nodes on which parallel jobs will run. Within a configuration file, the number of processing nodes defines the degree of parallelism and resources that a particular job will use to run.

It is important to note that the number of processing nodes does not necessarily correspond to the actual number of CPUs in your system. EE processing nodes are a logical rather than a physical construct; it is up to the UNIX operating system to actually schedule and run the processes that make up a DataStage job across physical processors.

There is not necessarily one ideal configuration file for a given system, because of the high variability between the way different jobs work. For this reason, multiple configuration files should be used to optimize overall throughput and to match job characteristics to available hardware resources. Note that a configuration file with a larger number of nodes generates a larger number of processes that use more memory (and perhaps more disk activity) than a configuration file with a smaller number of nodes.
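Since multiple configuration files can coexist, a job-control script can point $APT_CONFIG_FILE at the one matched to the job's profile before launching it. The sketch below is hypothetical wrapper logic; the profile names and paths are invented for illustration.

```shell
# Hypothetical wrapper: choose a configuration file sized to the job
# profile before invoking it. Paths and profile names are examples only.
pick_config() {
  case "$1" in
    dev)   echo "/opt/ds/configs/2node.apt" ;;   # small development runs
    batch) echo "/opt/ds/configs/6node.apt" ;;   # heavy production batch
    *)     echo "/opt/ds/configs/default.apt" ;;
  esac
}
export APT_CONFIG_FILE="$(pick_config batch)"
echo "APT_CONFIG_FILE=$APT_CONFIG_FILE"
```

Because EE re-reads the configuration file on every run, switching files this way changes the degree of parallelism without touching the job design.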

For example. resource availability. The default configuration file has the following characteristics: . you must weigh the gains of added parallelism against the potential losses in processing efficiency. where do you begin? For starters. 2-4 nodes). a good starting point is to set the number of nodes equal to the number of CPUs. it may appropriate to have more nodes than physical CPUs.disk and scratchdisk storage use subdirectories within the DataStage install filesystem You should create and use a new configuration file that is optimized to your hardware and file systems. For example. Because different job flows have different needs (CPU-intensive? Memory-intensive? Disk-Intensive? Database-Intensive? Sorts? need to share resources Page 7 of 30 . this is a conservative starting point that is highly dependent on system configuration. 2.2 Optimizing Parallelism The degree of parallelism of a DataStage EE application is determined by the number of nodes you define in the configuration file. For development environments. This will minimize the effect of data skew and significantly improve overall job performance. Increasing parallelism may better distribute your work load. When business requirements dictate a partitioning strategy that is excessively skewed. For typical production environments. job design. Note that even in the smallest development environments. the default configuration file (default. create smaller configuration files (eg. This is referred to as minimizing skew.3 Configuration File Examples Given the large number of considerations for building a configuration file. While the DataStage documentation suggests creating half the number of nodes as physical CPUs. memory. 2. when hash partitioning. and other applications sharing the server hardware.apt) created when DataStage is installed is appropriate for only the most basic environments. if a job is highly I/O dependent or dependent on external (eg. 
a 2-node configuration file should be used to verify that job logic and partitioning will work in parallel (as long as the test data can sufficiently identify data discrepancies).number of nodes = ½ number of physical CPUs . Parallelism should be optimized rather than maximized. database) sources or targets. disk controllers and disk configuration that make up your system influence the degree of parallelism you can sustain. Therefore. try to ensure that the resulting partitions are evenly populated. The CPUs. but it also adds to your overhead because the number of processes increases. Keep in mind that the closest equal partitioning of data contributes to the best overall performance of an application running in parallel. which are typically smaller and more resource-constrained. remember to change the partition strategy to a more balanced one as soon as possible in the job flow.Data Stage Best Practices & Performance Tuning generates a larger number of processes that use more memory (and perhaps more disk activity) than a configuration file with a smaller number of nodes.

software topology (local vs. and job design. With the synergistic relationship between hardware (number of CPUs. /fs1. /fs2. speed. number and speed of I/O controllers. */ fastname "fastone" resource scratchdisk "/fs0/ds/scratch" {} /* start with fs0 */ resource scratchdisk "/fs1/ds/scratch" {} resource scratchdisk "/fs2/ds/scratch" {} resource scratchdisk "/fs3/ds/scratch" {} resource disk "/fs0/ds/disk" {} /* start with fs0 */ resource disk "/fs1/ds/disk" {} resource disk "/fs2/ds/disk" {} resource disk "/fs3/ds/disk" {} } node "n1" { pools "" fastname "fastone" resource scratchdisk "/fs1/ds/scratch" {} /* start with fs1 */ resource scratchdisk "/fs2/ds/scratch" {} resource scratchdisk "/fs3/ds/scratch" {} resource scratchdisk "/fs0/ds/scratch" {} resource disk "/fs1/ds/disk" {} /* start with fs1 */ resource disk "/fs2/ds/disk" {} resource disk "/fs3/ds/disk" {} Page 8 of 30 . The configuration file you would use as a starting point would look like the one below. Assuming that the system load from processing outside of DataStage is minimal. cache. remote database access. Clustered processing). it is often appropriate to have multiple configuration files optimized for particular types of processing.1 Example for Any Number of CPUs and Any Number of Disks Assume you are running on a shared-memory multi-processor system. shared disk. SMP vs. available system memory. it may be appropriate to create one node per CPU as a starting point.Data Stage Best Practices & Performance Tuning with other jobs/databases/ applications? etc). network configuration and availability). there is no definitive science for formulating a configuration file. 2. This section attempts to provide some guidelines based on experience with actual production applications. local vs.3. an SMP server. RAID configurations. which is the most common platform today. { /* config files allow C-style comments. /fs3 You can adjust the sample to match your precise environment. 
Keep all the sub-items of the individual node specifications in the order shown here. IMPORTANT: It is important to follow the order of all sub-items within individual node specifications in the example configuration files given in this section. */ node "n0" { pools "" /* on an SMP node pools aren’t used often. Let’s assume these properties: computer host name “fastone” 6 CPUs 4 separate file systems on 4 drives named /fs0. In the following example. the way disk and scratchdisk resources are handled is the important. disk size and speed. */ /* Configuration do not have flexible syntax.

So what do we do now? * The answer: something that is not perfect. Rather.e. We’re going to repeat the sequence.. This configuration method works well when the job flow is complex enough that it is difficult to determine and precisely plan for good I/O utilization. Page 9 of 30 . */ resource scratchdisk “/fs0/ds/scratch” {} /* start with fs0 again */ resource scratchdisk “/fs1/ds/scratch” {} resource scratchdisk “/fs2/ds/scratch” {} resource scratchdisk “/fs3/ds/scratch” {} resource disk “/fs0/ds/disk” {} /* start with fs0 again */ resource disk “/fs1/ds/disk” {} resource disk “/fs2/ds/disk” {} resource disk “/fs3/ds/disk” {} } node “n5” { pools “” fastname “fastone” resource scratchdisk “/fs1/ds/scratch” {} /* start with fs1 */ resource scratchdisk “/fs2/ds/scratch” {} resource scratchdisk “/fs3/ds/scratch” {} resource scratchdisk “/fs0/ds/scratch” {} resource disk “/fs1/ds/disk” {} /* start with fs1 */ resource disk “/fs2/ds/disk” {} resource disk “/fs3/ds/disk” {} resource disk “/fs0/ds/disk” {} } } /* end of entire config */ The file pattern of the configuration file above is a “give every node all the disk” example.Data Stage Best Practices & Performance Tuning resource disk "/fs0/ds/disk" {} } node "n2" { pools "" fastname "fastone" resource scratchdisk "/fs2/ds/scratch" {} /* start with fs2 */ resource scratchdisk "/fs3/ds/scratch" {} resource scratchdisk "/fs0/ds/scratch" {} resource scratchdisk "/fs1/ds/scratch" {} resource disk "/fs2/ds/disk" {} /* start with fs2 */ resource disk "/fs3/ds/disk" {} resource disk "/fs0/ds/disk" {} resource disk "/fs1/ds/disk" {} } node "n3" { pools "" fastname "fastone" resource scratchdisk "/fs3/ds/scratch" {} /* start with fs3 */ resource scratchdisk "/fs0/ds/scratch" {} resource scratchdisk "/fs1/ds/scratch" {} resource scratchdisk "/fs2/ds/scratch" {} resource disk "/fs3/ds/disk" {} /* start with fs3 */ resource disk "/fs0/ds/disk" {} resource disk "/fs1/ds/disk" {} resource disk "/fs2/ds/disk" {} } node "n4" { pools "" 
fastname "fastone" /* Now we have rotated through starting with a different disk. EE does not “stripe” the data across multiple filesystems. it fills the disk and scratchdisk filesystems in the order specified in the configuration file. in an attempt to minimize I/O contention. the order of the disks is purposely shifted for each node. albeit in different orders to minimize I/O contention. use /fs0 /fs2 /fs1 /fs3 as an order. but the fundamental problem * in this scenario is that there are more nodes than disks. In the 4-node example above. You could * shuffle differently i. but that most likely won’t * matter. Within each node.

This configuration style works for any number of CPUs and any number of disks since it doesn't require any particular correspondence between them. at least go for achieving balance.3. We could give every CPU two disks and rotate them around.” 2. You can imagine this could be hard given our hypothetical 6-way SMP with 4 disks because setting up the obvious one-to-one correspondence doesn't work.Data Stage Best Practices & Performance Tuning Even in this example. /fs5 Now a configuration file for this environment might look like this: { node "n0" { pools "" fastname "fastone" resource disk "/fs0/ds/data" {pools ""} resource scratchdisk "/fs0/ds/scratch" {pools ""} } node "node2" { fastname "fastone" pools "" resource disk "/fs1/ds/data" {pools ""} resource scratchdisk "/fs1/ds/scratch" {pools ""} } node "node3" { fastname "fastone" pools "" resource disk "/fs2/ds/data" {pools ""} resource scratchdisk "/fs2/ds/scratch" {pools ""} } node "node4" { fastname "fastone" pools "" resource disk "/fs3/ds/data" {pools ""} resource scratchdisk "/fs3/ds/scratch" {pools ""} } node "node5" { fastname "fastone" pools "" resource disk "/fs4/ds/data" {pools ""} Page 10 of 30 . giving every partition (node) access to all the I/O resources can cause contention. /fs3. but EE attempts to minimize this by using fairly large I/O blocks. /fs4.2 Example that Reduces Contention The alternative to the first configuration method is more careful planning of the I/O behavior to reduce contention. /fs2. but that would be little different than the previous strategy. Doubling up some nodes on the same disk is unlikely to be good for overall performance since we create a hotspot. let’s imagine a less constrained environment with two additional disks: computer host name “fastone” 6 CPUs 6 separate file systems on 4 drives named /fs0. So. /fs1. The heuristic here is: “When it’s too difficult to figure out precisely.

/fs2. stage. /fs5 { node "node1" { fastname "fastone" pools "" resource disk "/fs0/ds/data" {pools ""} /* start with fs0 */ resource disk "/fs4/ds/data" {pools ""} resource scratchdisk "/fs4/ds/scratch" {pools ""} /* start with fs4 */ resource scratchdisk "/fs0/ds/scratch" {pools ""} } node "node2" { fastname "fastone" pools "" resource disk "/fs1/ds/data" {pools ""} resource disk "/fs5/ds/data" {pools ""} resource scratchdisk "/fs5/ds/scratch" {pools ""} resource scratchdisk "/fs1/ds/scratch" {pools ""} } node "node3" { fastname "fastone" pools "" resource disk "/fs2/ds/data" {pools ""} resource disk "/fs6/ds/data" {pools ""} resource scratchdisk "/fs6/ds/scratch" {pools ""} resource scratchdisk "/fs2/ds/scratch" {pools ""} } node "node4" { fastname "fastone" pools "" resource disk "/fs3/ds/data" {pools ""} resource disk "/fs7/ds/data" {pools ""} resource scratchdisk "/fs7/ds/scratch" {pools ""} resource scratchdisk "/fs3/ds/scratch" {pools ""} } } /* end of entire config */ The 4-node example above illustrates another concept in configuration file setup – you can assign multiple disk and scratch disk resources for each node. but a special one that you would specifically assign to stage / operator instances.3. it may be necessary to distribute file systems across nodes in smaller environments (fewer available CPUs/memory). depending on the total disk space required to process large jobs. /fs3. Using the above server example. this time with 4-nodes: computer host name “fastone” 4 CPUs 6 separate file systems on 4 drives named /fs0. /fs1. 2.Data Stage Best Practices & Performance Tuning resource scratchdisk "/fs4/ds/scratch" {pools ""} } node "node6" { fastname "fastone" pools "" resource disk "/fs5/ds/data" {pools ""} resource scratchdisk "/fs5/ds/scratch" {pools ""} } } /* end of entire config */ While this is the simplest scenario. or operator instance on any one partition can go faster than the single disk it has access to. Page 11 of 30 . 
You could combine strategies by adding in a node pool where disks have a one-to-one association with nodes. /fs4.3 Smaller Configuration Example Because disk and scratchdisk resources are assigned per node. These nodes would then not be in the default node pool. it is important to realize that no single player.

Do not trust high-level RAID/SAN monitoring tools. If the job is large and complex this is less of an issue since the input part is proportionally less of the total work. Know what's real and what's NFS: Real disks are directly attached. not NFS. often SAN) filesystem space for disk resources. It is better to setup a "final" disk pool. • Ensure that the different file systems mentioned as the disk and scratchdisk resources hit disjoint sets of spindles even if they're located on a RAID system. • • Page 12 of 30 . physical limitations of available hardware and disk configuration don’t always lend themselves to “clean” configurations illustrated above. as their “cache hit ratios” are often misleading. but let intermediate storage go to local or SAN resources. Often those disks will be hotspots until the input phase is over. your final result files may need to be written out onto the NFS disk area. Never use NFS file systems for scratchdisk resources. Beware if you use NFS (and. just for storage. Other configuration file tips: • Consider avoiding the disk(s) that your input files reside on. For example. and constrain the result sequential file or data set to reside there.dedicated. or are reachable over a SAN (storage-area network . low-level protocols). Proper configuration of scratch and resource disk (and the underlying filesystem and physical hardware architecture) can significantly affect overall job performance.Data Stage Best Practices & Performance Tuning Unfortunately. but that doesn't mean the intermediate data sets created and used temporarily in a multi-job sequence should use this NFS disk area.
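The disk-order rotation used in the first configuration example above (node i starts its resource list at disk i modulo the number of disks) can be sketched mechanically. This is a hypothetical generator for planning purposes, not a DataStage utility; it only prints the per-node disk order, not a full configuration file.

```shell
# Hypothetical sketch of the "rotate the disk order per node" pattern:
# node i lists the disks starting at disk (i mod ndisks), wrapping around.
gen_order() {
  # $1 = node index, $2 = number of disks
  j=0; order=""
  while [ "$j" -lt "$2" ]; do
    order="$order /fs$(( ($1 + j) % $2 ))"
    j=$((j + 1))
  done
  echo $order   # unquoted echo trims the leading space
}
n=0
while [ "$n" -lt 6 ]; do
  echo "node n$n: $(gen_order "$n" 4)"
  n=$((n + 1))
done
```

With 6 nodes and 4 disks, nodes n4 and n5 necessarily repeat the orders of n0 and n1, which is exactly the "more nodes than disks" compromise discussed in the example's comments.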

2.4 Sequential File Stages (Import and Export)

2.4.1 Improving Sequential File Performance

If the source file is fixed/delimited, the Readers Per Node option can be used to read a single input file in parallel at evenly-spaced offsets. Note that in this manner, input row order is not maintained.

If the input sequential file cannot be read in parallel, performance can still be improved by separating the file I/O from the column parsing operation. To accomplish this, define a single large string column for the non-parallel Sequential File read, and then pass this to a Column Import stage to parse the file in parallel. The formatting and column properties of the Column Import stage match those of the Sequential File stage.

On heavily-loaded file servers or some RAID/SAN array configurations, the environment variables $APT_IMPORT_BUFFER_SIZE and $APT_EXPORT_BUFFER_SIZE can be used to improve I/O performance. These settings specify the size of the read (import) and write (export) buffers in Kbytes, with a default of 128 (128K). Increasing these values may improve performance.

Finally, in some disk array configurations, setting the environment variable $APT_CONSISTENT_BUFFERIO_SIZE to a value equal to the read/write size in bytes can significantly improve performance of Sequential File operations.

2.4.2 Partitioning Sequential File Reads

Care must be taken to choose the appropriate partitioning method for a Sequential File read:

- Don't read from a Sequential File using SAME partitioning! Unless more than one source file is specified, SAME will read the entire file into a single partition, making the entire downstream flow run sequentially (unless it is later repartitioned).

- When multiple files are read by a single Sequential File stage (using multiple files, or by using a File Pattern), each file's data is read into a separate partition. It is important to use ROUND-ROBIN partitioning (or other partitioning appropriate to downstream components) to evenly distribute the data in the flow.

2.4.3 Sequential File (Export) Buffering

By default, the Sequential File (export operator) stage buffers its writes to optimize performance. When a job completes successfully, the buffers are always flushed to disk. The environment variable $APT_EXPORT_FLUSH_COUNT allows the job developer to specify how frequently (in number of rows) the Sequential File stage flushes its internal buffer on writes. Setting this value to a low number (such as 1) is useful for realtime applications, but there is a small performance penalty associated with increased I/O.

2.4.4 Reading from and Writing to Fixed-Length Files

Particular attention must be paid when processing fixed-length fields with the Sequential File stage:

• If the incoming columns are variable-length data types (eg. Integer, Decimal, Varchar), the field width column property must be set to match the fixed width of the input column. Double-click on the column number in the grid dialog to set this column property.

• When writing fixed-length files from variable-length fields (eg. Integer, Decimal, Varchar), the field width and pad string column properties must be set to match the fixed width of the output column. Double-click on the column number in the grid dialog to set these properties.

• If a field is nullable, you must define the null field value and length in the Nullable section of the column property. Double-click on the column number in the grid dialog to set these properties.

• To display each field value, use the print_field import property. All import and export properties are listed in chapter 25, "Import/Export Properties", of the Orchestrate 7.0 Operators Reference.

2.4.5 Reading Bounded-Length VARCHAR Columns

Care must be taken when reading delimited, bounded-length Varchar columns (Varchars with the length option set). By default, if the source file has fields with values longer than the maximum Varchar length, the extra characters will be silently truncated. Starting with v7.01, the environment variable $APT_IMPORT_REJECT_STRING_FIELD_OVERRUNS directs DataStage to reject records with strings longer than their declared maximum column length.

2.5 Transformer Usage Guidelines

2.5.1 Choosing Appropriate Stages

The parallel Transformer stage always generates "C" code which is then compiled into a parallel component. For this reason, it is important to minimize the number of Transformers, and to use other stages (Copy, Filter, Switch, etc) when derivations are not needed.

• The Copy stage should be used instead of a Transformer for simple operations, including:
  - Job design placeholder between stages (unless the Force option = true, EE will optimize a Copy out at runtime)
  - Renaming columns
  - Dropping columns
  - Default type conversions

  Note that rename, drop (if runtime column propagation is disabled), and default type conversion can also be performed by the output mapping tab of any stage.

• NEVER use the "BASIC Transformer" stage in large-volume job flows. Instead, user-defined functions and routines can expand parallel Transformer capabilities.

• In v7 and later, the Filter and/or Switch stages can be used to separate rows into multiple output links based on SQL-like link constraint expressions.

• In v7 and later, the Modify stage can be used for non-default type conversions, null handling, and character string trimming. See section 7.5 for more information.

• Buildops should be used instead of Transformers in the handful of scenarios where complex reusable logic is required, or where existing Transformer-based job flows do not meet performance requirements.

• Optimize the overall job flow design to combine derivations from multiple Transformers into a single Transformer stage when possible.

• Consider implementing complex derivation expressions that follow regular patterns with Lookup tables instead of a Transformer with nested derivations. For example, the derivation expression:

  If A=0..3 Then B="X"
  If A=4..7 Then B="C"

  could be implemented with a lookup table containing the values of column A and the corresponding values of column B.
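Outside DataStage, the same table-driven idea can be sketched with awk standing in for the Lookup stage; the file names and data values here are invented for illustration:

```shell
# Replace a nested If A=0..3 / A=4..7 derivation with a lookup table.
printf '0 X\n1 X\n2 X\n3 X\n4 C\n5 C\n6 C\n7 C\n' > ab_lookup.txt
printf '2\n5\n7\n' > input_a.txt

# First pass (NR==FNR) loads the table; second pass maps each value of A to B.
awk 'NR==FNR { map[$1] = $2; next } { print $1, map[$1] }' ab_lookup.txt input_a.txt
# prints:
# 2 X
# 5 C
# 7 C
rm -f ab_lookup.txt input_a.txt
```

Extending the mapping then means editing the table file, not re-nesting conditional logic.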

2.5.2 Transformer NULL Handling and Reject Link

When evaluating expressions for output derivations or link constraints, the Transformer will reject (through the reject link, indicated by a dashed line) any row that has a NULL value used in the expression. To create a Transformer reject link in DataStage Designer, right-click on an output link and choose "Convert to Reject".

The Transformer rejects NULL derivation results because the rules for arithmetic and string handling of NULL values are, by definition, undefined. For this reason, always test for null values before using a column in an expression, for example:

If ISNULL(link.col) Then ... Else ...

Note that if an incoming column is only used in a pass-through derivation, the Transformer will allow the row to be output. DataStage release 7 enhances this behavior by placing warnings in the log file when discards occur.

2.5.3 Transformer Derivation Evaluation

Output derivations are evaluated BEFORE any type conversions on the assignment. For example, the PadString function uses the length of the source type, not the target. Therefore, it is important to make sure the type conversion is done before a row reaches the Transformer. For example, TrimLeadingTrailing(string) works only if string is a VarChar field; thus, the incoming column must be type VarChar before it is evaluated in the Transformer.

2.5.4 Conditionally Aborting Jobs

The Transformer can be used to conditionally abort a job when incoming data matches a specific rule. Create a new output link that will handle rows matching the abort rule. Within the link constraints dialog box, apply the abort rule to this output link, and set the "Abort After Rows" count to the number of rows allowed before the job should be aborted (eg. 1).

Since the Transformer aborts the entire job flow immediately, it is possible that valid rows will not have been flushed from Sequential File (export) buffers, or committed to database tables. It is important to set the Sequential File buffer flush (see section 7.3) or database commit parameters accordingly.

2.6 Lookup vs. Join Stages

The Lookup stage is most appropriate when the reference data for all Lookup stages in a job is small enough to fit into available physical memory. Each lookup reference requires a contiguous block of physical memory. If the datasets are larger than available resources, the JOIN or MERGE stage should be used.

If the reference to a Lookup is directly from an Oracle table, and the number of input rows is significantly smaller (eg. 1:100 or more) than the number of reference rows, a Sparse Lookup may be appropriate.
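A back-of-the-envelope sizing check along these lines can guide the Lookup-vs-Join choice; every number below is a made-up assumption to be replaced with real row counts and widths:

```shell
# Rough heuristic: does the reference data fit in physical memory?
ref_rows=50000000          # reference row count (assumed)
ref_row_bytes=120          # average row width in bytes (assumed)
mem_avail_mb=4096          # memory available for lookups (assumed)

ref_mb=$(( ref_rows * ref_row_bytes / 1024 / 1024 ))
if [ "$ref_mb" -lt "$mem_avail_mb" ]; then
  echo "Lookup: reference ~${ref_mb}MB fits in memory"
else
  echo "Join/Merge: reference ~${ref_mb}MB exceeds ${mem_avail_mb}MB"
fi
# prints: Join/Merge: reference ~5722MB exceeds 4096MB
```

This is only a screening test; per-partition memory and the contiguous-block requirement above argue for leaving comfortable headroom.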

2.7 Capturing Unmatched Records from a Join

The Join stage does not provide reject handling for unmatched records (such as in an Inner Join scenario). If unmatched rows must be captured or logged, an OUTER join operation must be performed instead. In an OUTER join scenario, all rows on an outer link (eg. the Left Outer, Right Outer, or both links in the case of Full Outer) are output regardless of whether they match on the key values.

During an Outer Join, when a match does not occur, the Join stage inserts NULL values into the unmatched columns. Care must be taken to change the column properties to allow NULL values before the Join. This is most easily done by inserting a Copy stage and mapping a column from NON-NULLABLE to NULLABLE. A Filter stage can then be used to test for NULL values in the unmatched columns.

In some cases, it is simpler to use a Column Generator to add an 'indicator' column, with a constant value, to each of the outer links, and to test that column for the constant after you have performed the join. This approach is also handy with Lookups that have multiple reference links.

2.8 The Aggregator Stage

By default, the output data type of a parallel Aggregator stage calculation or recalculation column is Double. Starting with v7.01 of DataStage EE, the new optional property "Aggregations/Default to Decimal Output" specifies that all calculations or recalculations produce decimal output of the specified precision and scale. You can also specify that the result of an individual calculation or recalculation is decimal by using the optional "Decimal Output" subproperty.

2.9 Appropriate Use of SQL and DataStage Stages

When using relational database sources, there is often a functional overlap between SQL and DataStage stages. Although it is possible to use either SQL or DataStage to solve a given business problem, the optimal implementation leverages the strengths of each technology to provide maximum throughput and developer productivity. While there are extreme scenarios where the appropriate technology choice is clearly understood, there may be "gray areas" where the decision should be based on factors such as developer productivity, metadata capture and re-use, and ongoing application maintenance costs.

The following guidelines can assist with the appropriate use of SQL and DataStage technologies in a given job flow:

a) When possible, use a SQL filter (WHERE clause) to limit the number of rows sent to the DataStage job. This minimizes the impact on network and memory resources, and leverages the database capabilities.

b) Use a SQL Join to combine data from tables with a small number of rows in the same database instance, especially when the join columns are indexed.

c) When combining data from very large tables, or when the source includes a large number of database tables, the efficiency of the DataStage EE Sort and Join stages can be significantly faster than an equivalent SQL query. In this scenario, it can still be beneficial to use database filters (WHERE clause) if appropriate.

d) Avoid the use of database stored procedures (eg. Oracle PL/SQL) on a per-row basis within a high-volume data flow. For maximum scalability and parallel performance, it is best to implement business rules natively using DataStage components.
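Guideline (a) can be made concrete as a sketch of the user-defined SQL a source stage might carry; the table and column names are invented for illustration:

```shell
# Push the filter into the source SQL so only qualifying rows leave the
# database, rather than reading the full table and filtering in the job.
src_sql="SELECT customer_id, name, region FROM customers WHERE status = 'ACTIVE'"
echo "$src_sql"
```

The explicit column list also satisfies the select-list advice in section 2.10: only the needed columns cross the network.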

2.10 Optimizing Select Lists

For best performance and optimal memory usage, it is best to explicitly specify column names on all source database stages, instead of using an unqualified "Table" or SQL "SELECT *" read. For the "Table" read method, always specify the "Select List" subproperty. For "Auto-Generated" SQL, the DataStage Designer will automatically populate the select list based on the stage's output column definitions. The only exception to this rule is when building dynamic database jobs that use runtime column propagation to process all rows in a source table.

2.11 Designing for Restart

To enable restart of high-volume jobs, it is important to separate the transformation process from the database write (Load or Upsert) operation. After transformation, the results should be landed to a parallel data set. Subsequent job(s) should read this data set and populate the target table using the appropriate database stage and write method. As a further optimization, a Lookup stage (or Join stage, depending on data volume) can be used to identify existing rows before they are inserted into the target table.

2.12 Database OPEN and CLOSE Commands

The native parallel database stages provide options for specifying OPEN and CLOSE commands. These options allow commands (including SQL) to be sent to the database before (OPEN) or after (CLOSE) all rows are read/written/loaded to the database. OPEN and CLOSE are not offered by plug-in database stages.

For example, the OPEN command can be used to create a target table, including database-specific options (tablespace, logging, constraints, etc) not possible with the "Create" option. (There are few options for specifying Create table options, and using them may violate data-management (DBA) policies. In general, don't let EE generate target tables unless they are used for temporary storage.) As another example, the OPEN command could be used to create a temporary table, and the CLOSE command could be used to select all rows from the temporary table and insert them into a final target table.

It is important to understand the implications of specifying a user-defined OPEN or CLOSE command. For example, when reading from DB2, a default OPEN statement places a shared lock on the source. When a user-defined OPEN command is specified, this lock is not sent – it should be specified explicitly if appropriate.
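The temporary-table pattern just described can be sketched as an OPEN/CLOSE command pair; the SQL and table names are invented for illustration:

```shell
# Hypothetical OPEN/CLOSE commands for a native parallel database stage.
# OPEN runs before any rows flow; CLOSE runs after the last row is written.
open_cmd="CREATE TABLE tmp_load AS SELECT * FROM final_target WHERE 1=0"
close_cmd="INSERT INTO final_target SELECT * FROM tmp_load"
printf 'OPEN:  %s\nCLOSE: %s\n' "$open_cmd" "$close_cmd"
```

In a real job these strings would be pasted into the stage's OPEN and CLOSE properties, not run from a shell.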

Further details are outlined in the respective database sections of the Orchestrate Operators Reference, which is part of the Orchestrate OEM documentation.

2.13 Database Sparse Lookup vs. Join

Data read by any database stage can serve as the reference input to a Lookup operation. In most cases, this reference data is loaded into memory like any other reference link (a "Normal" Lookup).

When directly connected as the reference link to a Lookup stage, both the DB2/UDB Enterprise and Oracle Enterprise stages allow the lookup type to be changed to "Sparse", sending individual SQL statements to the reference database for each incoming Lookup row. Sparse Lookup is only available when the database stage is directly connected to the reference link, with no intermediate stages.

IMPORTANT: The individual SQL statements required by a "Sparse" Lookup are an expensive operation from a performance perspective. In most cases, it is faster to use a DataStage JOIN stage between the input and DB2 reference data than it is to perform a "Sparse" Lookup. For scenarios where the number of input rows is significantly smaller (eg. 1:100 or more) than the number of reference rows in a DB2 or Oracle table, a Sparse Lookup may be appropriate.

2.14 Oracle Database Guidelines

2.14.1 Proper Import of Oracle Column Definitions (Schema)

IMPORTANT: To avoid unexpected default type conversions, always import Oracle table definitions using the "orchdbutil" option (in v6.0.1 or later) of DataStage Designer. DataStage EE always uses the Oracle table definition, regardless of explicit job design metadata (Data Type, Nullability, etc).

2.14.2 Reading from Oracle in Parallel

By default, the Oracle Enterprise stage reads sequentially from its source table or query. Setting the partition table option to the specified table will enable parallel extracts from an Oracle source. The underlying Oracle table does not have to be partitioned for parallel reads within DataStage EE.

It is important to note that certain types of queries cannot run in parallel. Examples include:
- queries containing a GROUP BY clause that are also hash partitioned on the same field
- queries performing a non-collocated join (a SQL JOIN between two tables that are not stored in the same partitions with the same partitioning strategy)

2.14.3 Oracle Load Options

When writing to an Oracle table (using Write Method = Load), Parallel Extender uses the Parallel Direct Path Load method. When using this method, the Oracle stage cannot write to a table that has indexes on it (including indexes automatically generated by Primary Key constraints) unless you specify the Index Mode option (maintenance, rebuild).

Setting the environment variable $APT_ORACLE_LOAD_OPTIONS to "OPTIONS (DIRECT=TRUE, PARALLEL=FALSE)" also allows loading of indexed tables without index maintenance. In this instance, however, the Oracle load will be done sequentially.

The Upsert Write Method can be used to insert rows into a target Oracle table without bypassing indexes or constraints. In order to automatically generate the SQL required by the Upsert method, the key column(s) must be identified using the check boxes in the column grid.
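Set as an environment variable before the job runs, the load option above looks like the following sketch (the job invocation itself is omitted):

```shell
# Direct-path load without index maintenance, per the text; note that the
# load will then run sequentially rather than in parallel.
export APT_ORACLE_LOAD_OPTIONS='OPTIONS (DIRECT=TRUE, PARALLEL=FALSE)'
echo "$APT_ORACLE_LOAD_OPTIONS"
```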

3 Tips for Debugging Enterprise Edition Jobs

There are a number of tools available to debug DataStage EE jobs. The general process for debugging a job is:

• Check the DataStage job log for warnings. These may indicate an underlying logic problem or an unexpected data type conversion. When a fatal error occurs, the log entry is sometimes preceded by a warning condition.

• Enable the Job Monitoring environment variables detailed in section 5.2.

• Examine the score dump (placed in the DataStage log when $APT_DUMP_SCORE is enabled).

• Use $OSH_PRINT_SCHEMAS to verify that the job's runtime schemas match what the job developer expected in the design-time column definitions.

• Use the "Data Set Management" tool (available in the Tools menu of DataStage Designer or DataStage Manager) to examine the schema, look at row counts, and manage source or target Parallel Data Sets.

• For flat (sequential) sources and targets:
  o To display the actual contents of any file (including embedded control characters or ASCII NULLs), use the UNIX command od -xc.
  o To display the number of lines and characters in a specified ASCII text file, use the UNIX command wc -lc [filename]. Dividing the total number of characters by the number of lines provides an audit to ensure all rows are the same length. NOTE: wc counts UNIX line delimiters, so if the file has any binary columns, this count may be incorrect.
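The wc audit above can be scripted; this sketch fabricates a small fixed-width sample (3 records of 10 characters plus a newline each) and derives the record length:

```shell
# Audit a fixed-width file: the character count should divide evenly
# by the line count when every row has the same length.
printf 'AAAAAAAAAA\nBBBBBBBBBB\nCCCCCCCCCC\n' > fixed.txt
lines=$(( $(wc -l < fixed.txt) ))   # arithmetic strips wc's padding
chars=$(( $(wc -c < fixed.txt) ))
echo "rows=$lines record_length_incl_newline=$(( chars / lines ))"
# prints: rows=3 record_length_incl_newline=11
rm -f fixed.txt
```

A non-zero remainder from `chars % lines` would indicate at least one ragged row.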

3.1 Reading a Score Dump

When attempting to understand an EE flow, the first task is to examine the score dump, which is generated when you set APT_DUMP_SCORE=1 in your environment. A score dump includes a variety of information about a flow, including how composite operators and shared containers break down; where data is repartitioned, and how it is repartitioned; which operators, if any, have been inserted by EE; what degree of parallelism each operator runs with; and exactly which nodes each operator runs on. Also available is some information about where data may be buffered.

The following score dump shows a flow with a single dataset, which has a hash partitioner that partitions on key field a. All stages in this flow are running on one node. The job runs 3 processes on 2 nodes.

##I TFSC 004000 14:51:50(000) <main_program> This step has 1 dataset:
ds0: {op0[1p] (sequential generator)
      eOther(APT_HashPartitioner { key={ value=a } })->eCollectAny
      op1[2p] (parallel APT_CombinedOperatorController:tsort)}
It has 2 operators:
op0[1p] {(sequential generator)
    on nodes (
      lemond.torrent.com[op0,p0]
    )}
op1[2p] {(parallel APT_CombinedOperatorController:
      (tsort)
      (peek)
    ) on nodes (
      lemond.torrent.com[op1,p0]
      lemond.torrent.com[op1,p1]
    )}

The score shows three stages: Generator, Sort (tsort) and Peek. The Peek and Sort stages are combined – that is, they have been optimized into the same process. In a score dump, there are three areas to investigate:

• Are there sequential stages?
• Is needless repartitioning occurring?
• In a cluster, are the computation-intensive stages shared evenly across all nodes?

3.2 Partitioner and Sort Insertion

Partitioner insertion and sort insertion are two processes by which EE automatically inserts additional components into the work flow to optimize performance. This makes it possible for users to write correct data flows without having to deal directly with issues of parallelism. Because these processes, especially sort insertion, can be computationally expensive, understanding the score dump can help a user detect any superfluous sorts or partitioners.

However, there are some situations where these features can be a hindrance. Partitioner insertion may be disabled on a per-link basis by specifying SAME partitioning on the appropriate link. The same mechanism can be used to override sort insertion on any specific link, using the "Don't Sort, Already Sorted" key property in the Sort stage. Presorted data coming from a source other than a dataset must be explicitly marked as sorted in this way (Orchestrate users accomplish this by inserting same partitioners).

In some cases, setting $APT_SORT_INSERTION_CHECK_ONLY=1 may improve performance if the data is pre-partitioned or pre-sorted but EE does not know this. With this setting, EE still inserts sort stages, but instead of actually sorting the data, they verify that the incoming data is sorted correctly. If the data is not correctly sorted, the job will abort.

As a last resort, $APT_NO_PART_INSERTION=1 and $APT_NO_SORT_INSERTION=1 can be used to disable the two features on a flow-wide basis. It is generally advised that both partitioner insertion and sort insertion be left alone by the average user, and that more experienced users carefully analyze the score to determine whether sorts or partitioners are being inserted sub-optimally.

4 Performance Tips for Job Design

- Remove unneeded columns as early as possible within the job flow – every additional unused column requires additional buffer memory, which can impact performance, and makes each transfer of a record from one stage to the next more expensive.
  o When reading from database sources, use a select list to read just the needed columns instead of the entire table (if possible).
  o To ensure that columns are actually removed using a stage's Output Mapping, disable runtime column propagation for those columns.

- Always specify a maximum length for Varchar columns. Unbounded strings (Varchars without a maximum length) can have a significant negative performance impact on a job flow. There are limited scenarios when the memory overhead of handling large Varchar columns would dictate the use of unbounded strings instead. For example:
  o Varchar columns of a large (eg. 32K) maximum length that are rarely populated
  o Varchar columns of a large maximum length with highly varying data sizes
  Placing unbounded columns at the end of the schema definition may improve performance. DataStage v7.01 and later implement internal performance optimizations for variable-length columns that specify a maximum length.

- Where appropriate, limit the use of variable-length records within a flow. In DataStage v7.0 and earlier, depending on the number of variable-length columns, it may be beneficial to convert incoming records to fixed-length types at the start of a job flow, and to trim back to variable-length at the end of the flow before writing to a target database or flat file (using fixed-length records can dramatically improve performance).

- Avoid type conversions if possible.
  o Be careful to use the proper data type from the source (especially Oracle) in the EE job design. Enable $OSH_PRINT_SCHEMAS to verify that the runtime schema matches the job design column definitions.
  o Verify that the data type of defined Transformer stage variables matches the expected result type.

- Minimize the number of Transformers. Use other stages (eg. Copy, Filter, Switch, Modify) instead of the Transformer where derivations are not needed.

- NEVER use the BASIC Transformer in large-volume data flows. Instead, user-defined functions and routines (DataStage v7 and later) can expand the capabilities of the parallel Transformer.

- Buildops should be used instead of Transformers in the handful of scenarios where complex reusable logic is required, or where existing Transformer-based job flows do not meet performance requirements.

- Minimize and combine the use of Sorts where possible.
  o It is sometimes possible to re-arrange the order of business logic within a job flow to leverage the same sort order, partitioning, and groupings.
  o If data has already been partitioned and sorted on a set of key columns, specifying the "don't sort, previously sorted" option for those key columns in the Sort stage will reduce the cost of sorting and take greater advantage of pipeline parallelism.
  o When writing to parallel datasets, sort order and partitioning are preserved. When reading from these datasets, try to maintain this sorting, if possible, by using SAME partitioning.
  o The stable sort option is much more expensive than non-stable sorts, and should only be used if there is a need to maintain row order other than as needed to perform the sort.
  o The performance of individual sorts can be improved by increasing the memory usage per partition with the "Restrict Memory Usage (MB)" option of the standalone Sort stage. The default setting is 20MB per partition. Note that sort memory usage can only be specified for standalone Sort stages; it cannot be changed for inline (on a link) sorts.

5 Performance Monitoring and Tuning

5.1 The Job Monitor

The Job Monitor provides a useful snapshot of a job's performance at a moment of execution, but it does not provide thorough performance metrics. That is, a Job Monitor snapshot should not be used in place of a full run of the job, or a run with a sample set of data. Due to buffering and to some job semantics, a snapshot image of the flow may not be a representative sample of the performance over the course of the entire job.

The CPU summary information provided by the Job Monitor is useful as a first approximation of where time is being spent in the flow. However, it does not include operators that are inserted by EE, such as sorts that were not explicitly included and the sub-operators of composite operators. For these components, the score dump can be of assistance. See "Reading a Score Dump" (section 3.1).

The Job Monitor also does not monitor sorts on links. A worst-case scenario occurs when a job flow reads from a dataset and passes immediately to a sort on a link. The job will appear to hang, when, in fact, rows are being read from the dataset and passed to the sort.

5.2 OS/RDBMS-Specific Tools

Each OS and RDBMS has its own set of tools which may be useful in performance monitoring. Talking to the system administrator or DBA may provide some useful monitoring strategies.

5.3 Obtaining Operator Run-Time Information

Setting $APT_PM_PLAYER_TIMING=1 provides timing information for each stage in the DataStage job log. For example:

##I TFPM 000324 08:59:32(004) <generator,0> Calling runLocally: step=1, node=rh73dev04, op=0, ptn=0
##I TFPM 000325 08:59:32(005) <generator,0> Operator completed. status: APT_StatusOk elapsed: 0.04 user: 0.00 sys: 0.00 suser: 0.09 ssys: 0.02 (total CPU: 0.11)
##I TFPM 000324 08:59:32(006) <peek,0> Calling runLocally: step=1, node=rh73dev04, op=1, ptn=0
##I TFPM 000325 08:59:32(012) <peek,0> Operator completed. status: APT_StatusOk elapsed: 0.01 user: 0.00 sys: 0.00 suser: 0.09 ssys: 0.02 (total CPU: 0.11)
##I TFPM 000324 08:59:32(013) <peek,1> Calling runLocally: step=1, node=rh73dev04a, op=1, ptn=1
##I TFPM 000325 08:59:32(019) <peek,1> Operator completed. status: APT_StatusOk elapsed: 0.02 user: 0.00 sys: 0.00 suser: 0.09 ssys: 0.02 (total CPU: 0.11)

This output shows that each partition of each operator has consumed about one tenth of a second of CPU time during its runtime portion. In a real-world flow, we'd see many more operators and partitions.

Unlike the Job Monitor CPU percentages, setting $APT_PM_PLAYER_TIMING provides timings for every operator within the flow. It can often be very useful to see how much CPU each operator, and each partition of each component, is using. If one partition of an operator is using significantly more CPU than others, it may mean the data is partitioned in an unbalanced way, and repartitioning, or choosing different partitioning keys, might be a useful strategy.

If one operator is using a much larger portion of the CPU than others, it may be an indication that there is a problem in your flow. Common sense is generally required here; a sort, for example, is going to use dramatically more CPU time than a copy. This, however, gives you a sense of which operators are using more of the CPU, and when combined with other metrics presented in this document, the information can be very enlightening.

Setting $APT_DISABLE_COMBINATION=1, which globally disables stage combination, may be useful in some situations to get finer-grained information about which operators are using up CPU cycles. Be aware, however, that setting this flag changes the performance behavior of your flow, so this should be done with care.
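When scanning these messages by eye gets tedious, the total CPU per operator can be extracted from the log; a rough awk sketch, assuming the message format shown above (the log sample here is fabricated):

```shell
# Summarize total CPU per operator from $APT_PM_PLAYER_TIMING messages.
cat > timings.log <<'EOF'
##I TFPM 000325 08:59:32(005) <generator,0> Operator completed. status: APT_StatusOk elapsed: 0.04 user: 0.00 sys: 0.00 suser: 0.09 ssys: 0.02 (total CPU: 0.11)
##I TFPM 000325 08:59:32(012) <peek,0> Operator completed. status: APT_StatusOk elapsed: 0.01 user: 0.00 sys: 0.00 suser: 0.09 ssys: 0.02 (total CPU: 0.11)
##I TFPM 000325 08:59:32(019) <peek,1> Operator completed. status: APT_StatusOk elapsed: 0.02 user: 0.00 sys: 0.00 suser: 0.09 ssys: 0.02 (total CPU: 0.11)
EOF
# Field split on '<' and ',' so $2 is the operator name; then trim the
# line down to the number after "total CPU: " and accumulate it.
awk -F'[<,]' '/total CPU/ {
  op = $2
  sub(/.*total CPU: /, ""); sub(/\).*/, "")
  cpu[op] += $0
}
END { for (o in cpu) printf "%s %.2f\n", o, cpu[o] }' timings.log | sort
# prints:
# generator 0.11
# peek 0.22
rm -f timings.log
```

An operator that dominates this summary is the first candidate for closer inspection.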

5.4 Selectively Rewriting the Flow

One of the most useful mechanisms for determining what is causing bottlenecks in your flow is to isolate sections of the flow by rewriting portions of it to exclude stages from the set of possible causes. The goal of modifying the flow is to see whether the modified flow runs noticeably faster than the original. If the flow is running at roughly an identical speed, change more of the flow. Removing any custom-created operators or sequence operators should be at the top of the list. This pattern should be followed, removing any potentially suspicious stages while trying to keep the rest of the flow intact.

Reading and writing data are two obvious places to be aware of potential performance bottlenecks. Changing a job to write into a Copy stage with no outputs discards the data, eliminating the target as a variable; keep the degree of parallelism the same, with a nodemap if necessary. Similarly, landing any read data to a dataset can be helpful if the point of origin of the data is a flat file or RDBMS.

While editing a flow for testing, it is important to keep in mind that removing one operator might have unexpected effects on the flow, and that modifying the flow can introduce new performance problems. For example, adding a persistent dataset to a flow introduces disk contention with any other datasets being read. This is rarely a problem, but it might be significant in some cases. Comparing the score dump between runs is useful before concluding what has made the performance difference. (Much work has gone into the latest 7.0 release to improve Transformer performance.)

5.5 Eliminating Repartitions

Superfluous repartitioning should be eliminated. Due to operator or license limitations (import, export, RDBMS operators, SAS operators, and so on), some operators run with a degree of parallelism that is different from the default degree of parallelism. Some of this repartitioning cannot be eliminated, but understanding where, when, and why these repartitions occur is important for understanding the flow. Repartitions are especially expensive when the data is being repartitioned on an MPP, where significant network traffic is generated.

Sometimes a repartition can be moved further upstream in order to eliminate a previous, implicit repartition. Imagine an Oracle read, which does some processing and is then hashed and joined with another dataset. There might be a repartition after the Oracle read stage and then the hash, when only one repartitioning is ever necessary.

Similarly, a nodemap on a stage may prove useful for eliminating repartitions. For example, a transform between a DB2 read and a DB2 write might need to have a nodemap placed on it to force it to run with the same degree of parallelism as the two DB2 stages, in order to avoid two repartitions.

5.6 Ensuring Data is Evenly Partitioned

Due to the nature of EE, the entire flow runs only as fast as its slowest component. If data is not evenly partitioned, the slowest component is often a result of data skew. If one partition has ten records and another has ten million, EE simply cannot make ideal use of the resources.
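The "roughly equal counts" check described in the next section can be automated once per-partition record counts are in hand; the counts below are invented for illustration:

```shell
# Flag partitions whose record count deviates more than 10% from the mean.
# In practice the counts come from $APT_RECORD_COUNTS output.
printf '250000\n249100\n251300\n180000\n' | awk '
  { n[NR] = $1; total += $1 }
  END {
    mean = total / NR
    for (i = 1; i <= NR; i++) {
      dev = (n[i] - mean) / mean * 100
      if (dev < 0) dev = -dev
      if (dev > 10)
        printf "partition %d skewed: %d (%.1f%% off mean)\n", i - 1, n[i], dev
    }
  }'
# prints: partition 3 skewed: 180000 (22.6% off mean)
```

A flagged partition suggests trying alternate partitioning keys or an alternate partitioning strategy.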

5.6 Ensuring Data Is Evenly Partitioned

In any flow, the entire flow runs as slow as its slowest component, so if data is not evenly partitioned, EE can simply not make ideal use of the resources. If one partition has ten records and another has ten million, performance degrades badly. Setting $APT_RECORD_COUNTS=1 displays the number of records per partition for each component. Ideally, counts across all partitions should be roughly equal. Differences in data volumes between keys often skew these counts slightly, but any significant (over 5 or 10%) difference in volume should be a warning sign that alternate keys or an alternate partitioning strategy might be required.

5.7 Buffering for All Versions

Buffer operators are introduced in a flow anywhere that a directed cycle exists or anywhere that the user or operator requests them using the C++ API or osh. A typical case is a target stage that has two inputs and waits until it has exhausted one of those inputs before reading from the next. Identifying these spots in the flow requires an understanding of how each stage involved reads its records.

The default goal of the buffer operator on a specific link is to make the source stage output rate match the consumption rate of the target stage. A buffer operator tuning issue is likely when a flow runs slowly as one massive flow, but when broken up, each component runs quickly. For example, replacing an Oracle write with a Copy stage vastly improves performance, and writing that same data to a dataset, then loading using Oracle write, also goes quickly; yet when the two are put together, performance grinds to a crawl. Cases such as this, where there is incorrect behavior for the buffer operator, are discussed under 5.8, Resolving Bottlenecks. For more information on buffering, see Appendix A, "Data Set Buffering," in the Orchestrate 7.0 User Guide, which details specific common buffer operator configurations in the context of resolving various bottlenecks.

5.8 Resolving Bottlenecks

5.8.1 Variable Length Data

In releases prior to v7.01, using fixed-length records can dramatically improve performance; where possible, limit the use of variable-length records within a flow. This is no longer an issue in 7.01 and later releases.

5.8.2 Combinable Operators

Combined operators generally improve performance at least slightly, and in some cases the performance improvement may be dramatic. However, there may be situations where combining operators actually hurts performance. The most common situation arises when multiple operators that are performing disk I/O, such as Sequential File (import and export) and Sort, are combined. In I/O-bound situations, turning off combination for these specific operators may result in a performance increase. Identifying such operators can be difficult without trial and error; the culprit is often only found by empirical observation.
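The "over 5 or 10%" rule of thumb is easy to mechanize. Below is a small plain-Python sketch (not DataStage code); the counts would come from the per-partition record counts that $APT_RECORD_COUNTS=1 writes to the job log, and the 10% default threshold is just the rule of thumb from the text.

```python
# Sketch: apply the "over 5 or 10%" rule of thumb to per-partition record
# counts (the numbers that $APT_RECORD_COUNTS=1 writes to the job log).
def skew_percent(counts):
    """Percent shortfall of the smallest partition versus the largest."""
    return 100.0 * (max(counts) - min(counts)) / max(counts)

def needs_repartitioning(counts, threshold=10.0):
    """True when the skew exceeds the threshold, suggesting alternate keys
    or an alternate partitioning strategy."""
    return skew_percent(counts) > threshold
```

For example, needs_repartitioning([1000, 980, 1010, 995]) is False (about 3% skew), while the ten-records-versus-ten-million case trips the check immediately.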

$APT_DISABLE_COMBINATION=1 globally disables operator combining. To experiment with this, try disabling the combination of any stages that perform I/O and any sort stages. Keep in mind that combinable operators often provide a dramatic performance increase when a large number of variable-length fields are used in a flow.

5.8.3 Disk I/O

Total disk throughput is often a fixed quantity that EE has no control over. There are, however, some settings and rules of thumb that are often beneficial:

• If data is going to be read back in, in parallel, it should never be written as a sequential file. A dataset or fileset is a much more appropriate format.

• When importing fixed-length data, the Number of Readers Per Node option on the Sequential File stage can often provide a noticeable performance boost as compared with a single process reading the data. This is a new option in the Advanced stage properties of DataStage Designer version 7.x, and it can only be used for fixed-length sequential files. However, if there is a need to assign a number in source file row order, the -readers option cannot be used, because it opens multiple streams at evenly-spaced offsets in the source file.

• $APT_CONSISTENT_BUFFERIO_SIZE=n forces import to read data in chunks which are size n or a multiple of n. Some disk arrays have read-ahead caches that are only effective when data is read repeatedly in like-sized chunks.

• Memory-mapped IO is, in many cases, a big performance win; however, in certain situations, such as a remote disk mounted via NFS, it may cause significant performance problems. APT_IO_NOMAP=1 and APT_BUFFERIO_NOMAP=1 turn off this feature and can improve performance in such situations. AIX and HP-UX default to NOMAP; APT_IO_MAP=1 and APT_BUFFERIO_MAP=1 can be used to turn memory-mapped IO on for these platforms.

5.8.4 Buffering

Buffer operators are intended to slow down their input to match the consumption rate of the output. When the target stage reads very slowly, or not at all, for a length of time, upstream stages begin to slow down. This can cause a noticeable performance loss if the optimal behavior of the buffer operator is something other than rate matching. By default, the buffer operator has a 3MB in-memory buffer. Once that buffer reaches two-thirds full, the stage begins to push back on the rate of the upstream stage. Once the 3MB buffer is filled, data is written to disk in 1MB chunks.

In the following discussion, settings in all caps are environment variables and affect all buffer operators; settings in all lowercase are buffer-operator options and can be set per buffer operator.
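To illustrate what "chunks which are size n or a multiple of n" means for $APT_CONSISTENT_BUFFERIO_SIZE: each read can be thought of as rounded up to the next multiple of n, so a read-ahead cache always sees like-sized requests. A sketch of that arithmetic (illustrative only, not the actual import code):

```python
# Illustration of reading "in chunks which are size n or a multiple of n":
# round each requested read up to the next multiple of n bytes, so a
# read-ahead cache always sees like-sized requests.
def consistent_read_size(requested_bytes, n):
    """Round a requested read up to the next multiple of n bytes."""
    return ((requested_bytes + n - 1) // n) * n
```

With n set to, say, 512KB, a 700KB request becomes a single 1MB read rather than an odd-sized one.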
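The default numbers above (a 3MB buffer, push back at two-thirds full, 1MB disk increments) can be put together in a toy model. This is only an illustration of the behavior as described, not DataStage internals:

```python
# Toy model of the defaults described above: a 3MB in-memory buffer that
# pushes back on the producer at two-thirds full and, once full, spills
# to disk in 1MB increments. Illustrative only, not DataStage internals.
MAX_MEMORY = 3_000_000             # default in-memory buffer (~3MB)
PUSHBACK_AT = MAX_MEMORY * 2 // 3  # upstream is slowed from this point
DISK_INCREMENT = 1_000_000         # default write-to-disk chunk (~1MB)

def buffer_state(bytes_queued):
    """Return (pushing_back, full_disk_chunks_written) for a given backlog."""
    pushing_back = bytes_queued >= PUSHBACK_AT
    spilled = max(0, bytes_queued - MAX_MEMORY)
    return pushing_back, spilled // DISK_INCREMENT
```

Note that with only a 2MB backlog the producer is already being slowed even though nothing has spilled to disk yet; that early push back is what $APT_BUFFER_FREE_RUN eliminates.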

In most cases, the easiest way to tune the buffer operator is to eliminate the push back and allow it to buffer the data to disk as necessary. $APT_BUFFER_FREE_RUN=n or bufferfreerun does this: the buffer operator reads N * max_memory (3MB by default) bytes before beginning to push back on the upstream. If there is enough disk space to buffer large amounts of data, this usually fixes any egregious slow-down issues caused by the buffer operator.

If there is a significant amount of memory available on the machine, increasing the maximum in-memory buffer size is likely to be very useful if the buffer operator is causing any disk IO. $APT_BUFFER_MAXIMUM_MEMORY or maximummemorybuffersize is used to do this; it defaults to roughly 3000000 (3MB).

For systems where small to medium bursts of IO are not desirable, the 1MB write-to-disk chunk size may be too small. $APT_BUFFER_DISK_WRITE_INCREMENT or diskwriteincrement controls this and defaults to roughly 1000000 (1MB). This setting may not exceed max_memory * 2/3.

Finally, in a situation where a large, fixed buffer is needed within the flow, queueupperbound can be set equal to max_memory to force a buffer of exactly max_memory bytes. No environment variable is available for this flag, so it can only be set at the osh level. Such a buffer blocks an upstream stage (until data is read by the downstream stage) once its buffer has been filled, so this setting should be used with extreme caution. It is rarely necessary to achieve good performance, but is useful when there is a large variability in the response time of the data source or data target.

For releases 7.0.1 and beyond, per-link buffer settings are available in EE. They appear on the Advanced tab of the Input & Output tabs, like Columns. The settings saved on an Output tab are shared with the Input tab of the next stage, and vice versa.
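Because $APT_BUFFER_FREE_RUN is a multiplier on the maximum memory buffer, the two settings interact. A small helper makes the arithmetic explicit; the 3MB default is taken from the text above, and this is an illustration rather than any DataStage API:

```python
# Sketch of how the buffer tuning knobs above combine. The 3MB default
# for maximummemorybuffersize follows the text; illustration only.
def bytes_before_pushback(free_run, max_memory=3_000_000):
    """With $APT_BUFFER_FREE_RUN=n, the buffer absorbs roughly
    n * max_memory bytes (spilling to disk) before slowing the producer."""
    return free_run * max_memory
```

With the default 3MB buffer, APT_BUFFER_FREE_RUN=1000 therefore allows roughly 3GB of backlog to accumulate (mostly on disk) before any push back on the upstream stage, so the disk holding the buffer scratch space must be sized accordingly.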
