Database
- Try not to use a Sort stage when you can use an ORDER BY clause in the database.
- Using a constraint to filter a record set is much slower than performing a SELECT ... WHERE in the database (see the sketch below).
- Make every attempt to use the bulk loader for your particular database; bulk loaders are generally faster than ODBC or OLE.
- Tune the OCI stage 'Array Size' and 'Rows per Transaction' values for faster inserts, updates and selects.
- Sort the data as much as possible in the database and reduce the use of DataStage sorts for better job performance.
- Work with the database administrator to create appropriate indexes on tables for better performance of DataStage queries.
- Use the native database stages.
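As a rough illustration of pushing the work into the database, the sketch below shows the kind of source SQL that replaces a downstream constraint and Sort stage. It assumes an Oracle source reached through sqlplus; the connection details, table and column names are all invented:

    #!/bin/sh
    # Hypothetical example: test the pushed-down SQL from the command line
    # before pasting it into the source stage's user-defined SQL.
    sqlplus -s etl_user/etl_pass@DWH <<'EOF'
    SELECT cust_id, cust_name, region
    FROM   dw_customers
    WHERE  region = 'EMEA'   -- SELECT ... WHERE instead of a constraint
    ORDER  BY cust_id;       -- ORDER BY instead of a Sort stage
    EOF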
Sequential File
- If the source file is fixed width, the Readers Per Node option can be used to read a single input file in parallel at evenly-spaced offsets.
- The environment variables $APT_IMPORT_BUFFER_SIZE and $APT_EXPORT_BUFFER_SIZE can be used to improve I/O performance (see the sketch below).
- Use a DataSet for any intermediate persistent storage instead of the Sequential File stage, since DataSets read and write data in a native format and are easily administered through DataStage itself. DataSets should not be edited or deleted directly from Unix.
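A minimal sketch of the buffer settings and of administering a DataSet from the command line follows. The sizes, paths and DataSet name are examples only, and it assumes the orchadmin utility is on the PATH:

    #!/bin/sh
    # Buffer sizes are in kilobytes; the default for both variables is 128.
    export APT_IMPORT_BUFFER_SIZE=1024   # read buffer for sequential files
    export APT_EXPORT_BUFFER_SIZE=1024   # write buffer for sequential files

    # Manage DataSets with DataStage tooling instead of deleting the .ds
    # control file or its data segments directly from Unix.
    orchadmin describe /data/ds/customers.ds
    orchadmin rm /data/ds/customers.ds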
Configuration File
- The degree of parallelism of a DataStage job is determined by the number of nodes defined in the configuration file. A configuration file with a larger number of nodes generates a larger number of processes, and therefore adds processing overhead compared to a configuration file with a smaller number of nodes. While choosing the configuration file, one must therefore weigh the benefits of increased parallelism against the losses in processing efficiency (increased overhead and slower start-up time). Ideally, if the amount of data to be processed is small, a configuration file with fewer nodes should be used, while for larger data volumes a configuration file with more nodes should be used (see the sketch below).
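For illustration, a minimal two-node configuration file is sketched below; the host name and disk paths are invented. Jobs are pointed at the file that matches their data volume through $APT_CONFIG_FILE:

    #!/bin/sh
    # Create a small configuration file for low-volume jobs.
    cat > /opt/ds/configs/2node.apt <<'EOF'
    {
        node "node1" {
            fastname "etlserver"
            pools ""
            resource disk "/data/ds/node1" {pools ""}
            resource scratchdisk "/scratch/ds/node1" {pools ""}
        }
        node "node2" {
            fastname "etlserver"
            pools ""
            resource disk "/data/ds/node2" {pools ""}
            resource scratchdisk "/scratch/ds/node2" {pools ""}
        }
    }
    EOF
    # Low-volume jobs run with the 2-node file; high-volume jobs would use
    # a configuration file defining more nodes.
    export APT_CONFIG_FILE=/opt/ds/configs/2node.apt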
Partitioning
- Proper partitioning of data is another aspect of DataStage job design which significantly improves overall job performance. Partitioning should be set so that the data flow is balanced, i.e. data is divided nearly equally across partitions and data skew is minimized. Avoid unnecessary re-partitioning of data, which can introduce skew (see the sketch below).
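One way to check for skew after a run is to compare per-link row counts in the job report. A sketch using the dsjob client follows; the project and job names are invented:

    #!/bin/sh
    . $DSHOME/dsenv
    # The DETAIL report includes stage and link statistics; uneven row
    # counts across partitioned links suggest a poor partitioning key.
    $DSHOME/bin/dsjob -report DWProject LoadCustomers DETAIL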
y Take precautions when using expressions or derivations on nullable columns within the parallel Transformer:
previously sorted option for the key columns in the Sort stage The stable sort option is much more expensive than non-stable sorts. type conversions. are the primary operations which should be implemented using Modify instead of Transformer derivations. the non-key columns must be defined as nullable in the Join stage input in order to identify unmatched records. Always place a reject link on a parallel Transformer to capture / audit possible rejects. Sequence Job y y Always use ³Reset if required. specify the don't sort. use the Join or Merge stage. or when exception processing. Aggregator y y Use Hash method Aggregators only when the number of distinct key column values is small.y y Always convert nullable columns to in-band values before using them in an expression or derivation. Join y Be particularly careful to observe the nullability properties for input links to any form of Outer Join. Lookup y y It is most appropriate when reference data is small enough to fit into available shared memory. Note that sort memory usage can only be specified for standalone Sort stages. A Sort method Aggregator should be used when the number of distinct key values is large or unknown. Limit the use of database Sparse Lookups to scenarios where the number of input rows is significantly smaller (for example 1:100 or more) than the number of reference rows. it cannot be changed for inline (on a link) sorts. The default setting is 20 MB per partition. such as keep/drop. . some string manipulations. then run´ option in Job Activity stages. Sort y y y If data has already been partitioned and sorted on a set of key columns. Transformations that touch a single field. and should only be used if there is a need to maintain row order other than as needed to perform the sort The performance of individual sorts can be improved by increasing the memory usage per partition using the Restrict Memory Usage (MB) option of the Sort stage. If the Data Sets are larger than available memory resources. Even if the source data is not nullable. Any transformation which can be implemented in Modify is more efficient. Please enable checkpoint in the jobs properties. Modify y y The Modify stage is the most efficient stage available. and null handling.
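As an illustration of the single-field operations above, a Modify stage specification might look like the sketch below. The column names are invented, and the specification text is entered in the stage's properties rather than in a script:

    KEEP cust_id, cust_name, order_amt_dec
    order_amt:int32 = int32_from_decimal(order_amt_dec)
    cust_name = handle_null(cust_name, 'UNKNOWN')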
Job Design
- Minimize the usage of Transformer stages; use Copy, Modify, Filter or Row Generator instead where possible.
- Use RCP (Runtime Column Propagation) wherever possible.
- Remove unneeded columns as early as possible within the job flow.
- Reduce the number of Lookups in a job design.
- Minimize the warnings produced by a job.

Architecture
- DataStage supports grid architecture from version 8.1. Since a grid makes proper utilization of the hardware and speeds up processing, it is better to implement grid architecture where possible.

8.5 Performance Enhancements
- Connector stages (such as the Oracle Connector and DB2 Connector) are the upgraded versions of the Enterprise stages, with better error handling; prefer the Connector stages.
- Because the Connector stage supports multiple input links, it can load multiple tables at a time.
- Use the new XML Input stage and XML Transformer stage, as they can handle large and complex XML and their processing time is much faster.
- Use the Transformer for looping and aggregation.
- Instead of using stage variables in the Transformer, use the Pivot Enterprise stage for vertical pivoting.

Environment Variable
- DataStage EE provides a number of environment variables to control how jobs operate on a UNIX system. In addition to providing required information, environment variables can be used to enable or disable various DataStage features and to tune performance settings.

Server Settings
- There should be enough space in the scratch disk; otherwise jobs using the Sort stage with large volumes of data might fail.
- Use the proper NLS settings.
- Take care of environment variables when moving from one environment to another. Migrating the entire DSParams file can result in development environment settings being moved into production, while trying to migrate just the user-defined section can result in a corrupt DSParams file (see the sketch below).
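A short sketch of the per-environment housekeeping this implies follows; the scratch path is invented, and $DSHOME is assumed to point at the DataStage engine installation:

    #!/bin/sh
    # Check free space on the scratch disk before high-volume sorted runs.
    df -k /scratch/ds

    # Keep environment-specific settings (NLS, APT_* tuning variables) in
    # the dsenv file rather than copying a whole DSParams file between
    # development and production.
    grep -E 'APT_|NLS' $DSHOME/dsenv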
Performance Monitoring

Job Monitor
- The IBM InfoSphere DataStage job monitor can be accessed through the IBM InfoSphere DataStage Director.

Resource Estimation
- With new sophisticated analytical information and a deep understanding of the parallel framework, IBM has added Resource Estimation to DataStage (and QualityStage) 8.x. It can be used to determine the needed system requirements, or to analyze whether the current infrastructure can support the jobs that have been created.

Performance Analysis
- IBM InfoSphere DataStage 8.x adds a new capability called Performance Analysis. It is enabled through a job property on the Execution tab which collects data at job execution time.

Score Dump
- In order to resolve any performance issue it is essential to understand the data flow within a job. To help understand the job flow, a score dump should be taken by setting the APT_DUMP_SCORE environment variable to True prior to running the job (see the sketch below).
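A sketch of taking a score dump from the command line follows. The project and job names are invented, and it assumes $APT_DUMP_SCORE has been added to the job's parameter list so it can be set at run time:

    #!/bin/sh
    . $DSHOME/dsenv
    # Run the job with the score dump enabled; the score (operators, node
    # maps, and any inserted sorts or buffers) is written to the job log.
    $DSHOME/bin/dsjob -run -jobstatus -param '$APT_DUMP_SCORE=True' \
        DWProject LoadCustomers
    # Summarise the log entries to locate the score.
    $DSHOME/bin/dsjob -logsum DWProject LoadCustomers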