DataStage Parallel Job Advanced Developer's Guide

Published by: princeanilb on Apr 23, 2009
Copyright: Attribution Non-commercial


Superfluous repartitioning should be identified. Due to operator or
license limitations (import, export, RDBMS operators, SAS, etc.), some
stages run with a degree of parallelism different from the default.
Some of these repartitions can't be eliminated, but understanding
where, when, and why they occur is important for flow analysis.
Repartitions are especially expensive on an MPP system, where moving
data between nodes generates significant network traffic.
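To make the where, when, and why concrete, the following minimal Python sketch walks a linear flow and reports each link where data must be redistributed, together with the reason. It is a toy model under stated assumptions, not DataStage code: `Stage`, `find_repartitions`, and the partitioning labels are hypothetical, and a repartition is assumed to occur whenever adjacent stages differ in degree of parallelism or the downstream stage requires a different partitioning.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Stage:
    name: str
    degree: int                  # degree of parallelism (number of partitions)
    partitioning: Optional[str]  # input partitioning the stage requires; None = keep upstream's


def find_repartitions(flow: List[Stage]) -> List[Tuple[str, str, str]]:
    """Walk a linear flow and report every link where data must be redistributed."""
    found = []
    current = flow[0].partitioning  # partitioning of the data leaving the first stage
    for up, down in zip(flow, flow[1:]):
        wanted = down.partitioning if down.partitioning is not None else current
        if up.degree != down.degree:
            found.append((up.name, down.name,
                          f"degree of parallelism {up.degree} -> {down.degree}"))
        elif wanted != current:
            found.append((up.name, down.name,
                          f"partitioning {current} -> {wanted}"))
        current = wanted
    return found


# Hypothetical flow: a sequential import feeding 4-way parallel stages.
flow = [
    Stage("import", 1, "sequential"),
    Stage("transform", 4, "round_robin"),
    Stage("join", 4, "hash(key)"),
]
for up, down, reason in find_repartitions(flow):
    print(f"{up} -> {down}: {reason}")
```

Listing the reason next to each link separates the repartitions forced by a stage's fixed degree of parallelism (often unavoidable) from those caused by a change in partitioning (often removable).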

Sometimes you can move a repartition upstream to eliminate a
previous, implicit repartition. Imagine an Oracle stage performing a
read (using the oraread operator). Some processing is done on the
data, which is then hashed and joined with another data set. There
might be one repartition after the oraread operator and another at the
hash, when only one repartition is really necessary.
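The oraread example can be counted with a similar toy model. This sketch assumes the processing step does not modify the join key, so hash partitioning applied before it is still valid at the join; `count_repartitions` and labels such as `oracle_native` are hypothetical.

```python
from typing import List, Optional


def count_repartitions(source_partitioning: str,
                       required: List[Optional[str]]) -> int:
    """Count repartitions along a linear flow.

    source_partitioning: how the source operator's output is partitioned.
    required: for each downstream stage, the input partitioning it demands,
              or None if it accepts whatever arrives.
    """
    current = source_partitioning
    count = 0
    for wanted in required:
        if wanted is not None and wanted != current:
            count += 1  # data must be redistributed across partitions
            current = wanted
    return count


# Before: oraread output -> processing (default round robin) -> join (hash on key)
before = count_repartitions("oracle_native", ["round_robin", "hash(key)"])
# After: hash on key is requested at the processing stage's input, so the
# join receives data already hash-partitioned on key.
after = count_repartitions("oracle_native", ["hash(key)", "hash(key)"])
print(before, after)  # 2 1
```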

Similarly, specifying a nodemap for an operator may help eliminate
repartitions. For example, a transform stage sandwiched between a DB2
stage that reads (db2read) and another that writes (db2write) might
benefit from a nodemap that forces it to run with the same degree of
parallelism as the two DB2 operators, avoiding two repartitions.
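The nodemap case comes down to placement: a repartition is needed wherever adjacent operators run on different sets of nodes. A minimal sketch, assuming hypothetical node names, a two-node placement for the DB2 operators, and a four-node default configuration for the transform stage (`node_repartitions` is also hypothetical):

```python
from typing import List, Tuple

Placement = Tuple[str, ...]  # the nodes an operator runs on, one partition per node


def node_repartitions(placements: List[Placement]) -> int:
    """Count links where adjacent operators run on different node sets."""
    return sum(1 for a, b in zip(placements, placements[1:]) if a != b)


db2_nodes = ("node1", "node2")                        # hypothetical DB2 placement
default_nodes = ("node1", "node2", "node3", "node4")  # hypothetical default configuration

# db2read -> transform -> db2write
without_nodemap = node_repartitions([db2_nodes, default_nodes, db2_nodes])
with_nodemap = node_repartitions([db2_nodes, db2_nodes, db2_nodes])
print(without_nodemap, with_nodemap)  # 2 0
```

Comparing node sets rather than just partition counts reflects that the transform must run on the same nodes as the DB2 operators, not merely on the same number of nodes, for the data to stay in place.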

Identifying Buffering Issues

Buffering is one of the more complex aspects of parallel job
performance tuning. Buffering is described in detail in Chapter 4, "Link

The goal of buffering on a specific link is to make the producing
operator's output rate match the downstream operator's consumption
rate. In any flow where this behavior is wrong for the flow (for
example, when the downstream operator has two inputs and waits until
it has exhausted one of them before reading from the other),
performance degrades. Identifying these spots in the flow requires an
understanding of how each operator involved reads its records, and
they are often found only by empirical observation.
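The two-input case can be reproduced in a toy simulation, assuming a producer that writes records alternately to links A and B, a consumer that reads A to exhaustion before touching B, and buffers with a hard cap. `simulate` and its parameters are hypothetical and do not model the DataStage buffer operator. With a hard cap, the mismatch shows up as a deadlock once link B fills; a buffer that can grow avoids the deadlock at the cost of holding more data.

```python
from collections import deque


def simulate(n: int, capacity: int) -> str:
    """Toy fork-join model: n records per link, producer alternates A, B, A, B, ...
    The consumer reads link A until it is exhausted, then reads link B.
    Each link buffers at most `capacity` records (assumed >= 1).
    Returns "completed" or "deadlock"."""
    plan = [link for _ in range(n) for link in ("A", "B")]
    buffers = {"A": deque(), "B": deque()}
    sent = {"A": 0, "B": 0}
    received = 0
    reading = "A"
    pos = 0
    while received < 2 * n:
        progressed = False
        # Producer: push the next record only if its link has room.
        if pos < len(plan) and len(buffers[plan[pos]]) < capacity:
            link = plan[pos]
            buffers[link].append(sent[link])
            sent[link] += 1
            pos += 1
            progressed = True
        # Consumer: read the current link; switch to B once A is exhausted.
        if buffers[reading]:
            buffers[reading].popleft()
            received += 1
            progressed = True
        elif reading == "A" and sent["A"] == n:
            reading = "B"
            progressed = True
        if not progressed:
            return "deadlock"  # producer blocked on full B, consumer waiting on A
    return "completed"


# Smallest buffer on link B that lets 100 records per link complete.
n = 100
print(next(c for c in range(1, n + 1) if simulate(n, c) == "completed"))  # 99
```

Link B needs room for n - 1 records here because every B record produced before the last A record has to wait in the buffer until the consumer switches inputs.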

You can diagnose a buffering tuning issue when a flow runs slowly as
one massive flow, but each component runs quickly when the flow is
broken up. For example, replacing an Oracle write stage with a copy
stage vastly improves performance, and writing that same data to a
data set and then loading it via an Oracle stage also goes quickly.
When the two are put together, however, performance is poor.

"Buffering" on page 3-10 details specific, common buffering
configurations aimed at resolving various bottlenecks.
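The diagnosis for a flow that is slow only when run as one piece can be written as a simple timing check. This is a hypothetical heuristic, not a DataStage feature: it assumes that running the pieces back to back (flow to data set, then data set to Oracle) does at least as much work as the pipelined flow, so a combined run that is slower than their sum points to a stall inside the pipeline. The function name, the `slack` tolerance, and the timings are all illustrative.

```python
from typing import Sequence


def suggests_buffering_issue(combined_seconds: float,
                             segment_seconds: Sequence[float],
                             slack: float = 1.0) -> bool:
    """Hypothetical heuristic: flag a likely buffering problem when the
    combined, pipelined flow is slower than running its segments one after
    another. Running the segments separately adds an extra write and read of
    the intermediate data set, so the combined flow should not lose to them.
    `slack` is an assumed tolerance for measurement noise."""
    return combined_seconds > slack * sum(segment_seconds)


# Illustrative timings in seconds: flow -> data set = 120, data set -> Oracle = 300.
print(suggests_buffering_issue(1800.0, [120.0, 300.0]))  # True
print(suggests_buffering_issue(400.0, [120.0, 300.0]))   # False
```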
