
DataStage Advanced Bootcamp -- Labs

IBM® InfoSphere™ DataStage
Advanced Workshop
Lab Workbook

© Copyright IBM Corporation 2010.

Table of Contents

Lab 1: DataStage Parallel Engine Architecture ....................................... 5


The default configuration file ......................................................................................... 5
Job using data partitioning and collecting ...................................................................... 5
Examine job log and target files ..................................................................................... 6
Experiment with a different partitioning......................................................................... 8
Lab 2: Reject Links...................................................................................... 9
Sequential File Stage with Reject Link........................................................................... 9
Lookup Stage with Reject Link .................................................................................... 12
Transformer Stage with Reject Link............................................................................. 16
Merge Stage with Reject Link ...................................................................................... 18
Lab 3: Generate Mock Data ..................................................................... 22
Design a job that generates a mock data file................................................................. 22
Examine the job log ...................................................................................................... 29
Lab 4: Read Data in Bulk ......................................................................... 31
Build the job to read data in as single column records ................................................. 31
Lab 5: Read Multiple Format Data in a Sequential File Stage ............. 33
Generate the Header Detail data file............................................................................. 33
Build a job that processes the Header Detail file.......................................................... 37
Lab 6: Complex Flat File Stage ................................................................. 41
Import a COBOL file definition ................................................................................... 41
Using the Complex Flat File stage................................................................................ 43
Lab 7: Optimize a Fork-Join Job ............................................................. 51
Generate the Source Dataset for the Fork-Join Job ...................................................... 51
Build the fork-join job .................................................................................................. 52
Optimize the job............................................................................................................ 53
More optimization......................................................................................................... 54
Lab 8: Sort Stages to Identify Last Row in Group................................. 57
Create the job ................................................................................................................ 57
Lab 9: Globally Sum All Input Rows Using an Aggregator Stage ....... 62
Write to a database table using INSERT ...................................................................... 62
Lab 10: Slowly Changing Dimension Stage ............................................ 67
Create the surrogate key source files ............................................................................ 67
Build an SCD job with two dimensions........................................................................ 70
Build an SCD job to process the first dimension.......................................................... 70
Build an SCD job to process the second dimension ..................................................... 79
Lab 11: Reject Links – DB2 Connector.................................................... 87
DB2 Connector stage with a Reject Link ..................................................................... 87


Lab 12: Dual Inputs to a Connector Stage ............................................... 91


Insert both parents and children records with a single Connector ................................ 91
Lab 13: Metadata in the Parallel Framework ......................................... 96
Create a parameter set ................................................................................................... 96
Create a job with a Transformer stage .......................................................................... 97
Use a schema file in a Sequential File stage ............................................................... 101
Define a derivation in the Transformer....................................................................... 103
Create a Shared Container .......................................................................................... 105
Lab 14: Create an External Function Routine ..................................... 109
Use an External Function Routine in a Transformer stage ......................................... 111
Lab 15: Create a Wrapped Stage ........................................................... 113
Create a simple Wrapped stage................................................................................... 113
Lab 16: Working with a Build Stage ..................................................... 116
Create a simple Build stage......................................................................................... 116
Create a job that uses your Build stage....................................................................... 119
Lab 17: Performance Tuning ................................................................. 122
Use Job Monitor.......................................................................................................... 122
Use Performance Analysis tool................................................................................... 124
Analyze the Performance of another Job .................................................................... 127
Lab 18: Process Header / Detail records in a Transformer................. 131
Build a job that processes the Header Detail file........................................................ 131
Lab 19: Exploring the Optimization Capabilities ................................. 134
Creating an optimized version of a parallel job .......................................................... 134
Comparing the performances between root and optimized job .................................. 138
Managing the root versus optimized jobs ................................................................... 145
Pushing the processing to the source and target databases ......................................... 147
Pushing data reduction to database target ................................................................... 155
Optimizing a complex job........................................................................................... 157
Scenario 1: common database for source and target tables ................................... 158
Scenario 2: different databases for source and target tables .................................. 167
Deciding where to stop the optimization process ....................................................... 172
Balancing between Database and DataStage engines ................................................. 173
Lab 20: Repository Functions................................................................. 178
Execute a Quick Find.................................................................................................. 178
Execute an Advanced Find ......................................................................................... 178
Generate a report......................................................................................................... 179
Perform an impact analysis......................................................................................... 181
Find the differences between two jobs........................................................................ 183
Find the differences between two Table Definitions .................................................. 186


List of userids and passwords used in the labs:

ENVIRONMENT          USER       PASSWORD
SLES user            root       inf0sphere
IS admin (1)         isadmin    inf0server
DataStage user       dsuser     inf0server
WAS admin (2)        wasdmin    inf0server
DB2 admin            db2admin   inf0server
DataStage admin      dsadm      inf0server

Note: the passwords contain a zero, not the letter o.

All the required data files are located at: /DS_Advanced. You will be using the DataStage project called “dstage1”. Optionally, you may put all your DataStage objects (e.g. jobs, parameter sets, etc.) in the project folder /dstage1/Jobs/DS-Advanced/.

Please start both the DataStage Designer and Director to do the following
exercises.

(1) IS admin: InfoSphere Information Server administrator
(2) WAS admin: WebSphere Application Server administrator


Lab 1: DataStage Parallel Engine Architecture

The default configuration file


The configuration file that DataStage uses by default is named default.apt. It is located in:

/opt/IBM/InformationServer/Server/Configurations

This configuration file has been modified to define two nodes, which will allow us to exercise the capabilities of the parallel engine.
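
For reference, a two-node configuration file generally has the following shape. This is only a sketch; the host name and resource paths are placeholders, not necessarily the values on the lab machine:

    {
        node "node1"
        {
            fastname "dshost"
            pools ""
            resource disk "/opt/IBM/InformationServer/Server/Datasets" {pools ""}
            resource scratchdisk "/opt/IBM/InformationServer/Server/Scratch" {pools ""}
        }
        node "node2"
        {
            fastname "dshost"
            pools ""
            resource disk "/opt/IBM/InformationServer/Server/Datasets" {pools ""}
            resource scratchdisk "/opt/IBM/InformationServer/Server/Scratch" {pools ""}
        }
    }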

Job using data partitioning and collecting


1. Create a job ‘CreateSeqJobPartition’ and save it in the Jobs folder.

2. Rename the stages and links as shown. This is good practice, since meaningful names serve as brief documentation.

3. Set up the source Sequential File stage to read the file Selling_Group_Mapping.txt
and don’t forget to import the table definition first.

4. Simply map all the columns across in the Copy stage.

5. Set up the target Sequential File stage to write to two different target files,
TargetFile1.txt and TargetFile2.txt

6. Notice that the partitioning icon is ‘Auto’. (Note: If you do not see this, refresh the
canvas by turning “Show link markings” off and on using the toolbar button.)

7. Compile and run your job.


Examine job log and target files


1. View the job log. Notice how the data is exported to the two different partitions (0
and 1).

2. Log in to the Information Server system using ID “dsadm”. Change directory to where all the files are. Open the source file, Selling_Group_Mapping.txt, and each of the two target files, TargetFile1.txt and TargetFile2.txt, with gedit or vi.
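
For example, after logging in as dsadm, and assuming the files are in /DS_Advanced (adjust the directory if your job writes the target files somewhere else):

    cd /DS_Advanced
    vi Selling_Group_Mapping.txt
    vi TargetFile1.txt
    vi TargetFile2.txt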

Source file:


Target file 1:

Target file 2:

3. Notice how the data is partitioned. Here, we see that the 1st, 3rd, 5th, etc. records go into one file and the 2nd, 4th, 6th, etc. go into the other file. This is because the default partitioning algorithm is Round Robin.


Experiment with a different partitioning


1. Open the target sequential file stage. Go to the ‘Partitioning’ tab. Change the
partitioning algorithm to e.g. ENTIRE.

2. Compile and run the job again. Open the target files and examine. Notice how the
data gets distributed. Experiment with different partitioning algorithms!

3. The following table shows the results for several partitioning algorithms with one
particular system (yours may not match exactly):

Partitioning Algorithm                      Records in File1   Records in File2   Comments
Round-Robin (Auto)                          23                 24                 Every other record
Entire                                      47                 47                 Each file contains all the records
Random                                      22                 25                 Random distribution
Hash on column "Special_Handling_Code"      20                 27                 File 1 with Handling_code 6; File 2 with the other Handling_codes


Lab 2: Reject Links

Sequential File Stage with Reject Link


1. Create a job as shown here and save it as ‘RejectLinkSeqFile’ in the Jobs folder of
the project.

2. Set up the Sequential File stage to read the file Selling_Group_Mapping.txt.

3. Compile and run the job to ensure successful execution.


4. Use either gedit or vi to change the Selling_Group_Mapping.txt file: put a letter into the 1st column of three records.

5. Run the job again. Check the log messages with the Director and notice that the Sequential File stage throws the bad records away with a warning message.


6. Now add a reject link and a Peek stage as shown. Don’t forget to change the “Reject
Mode” property of the Sequential File stage to “Output”.

7. Compile and run the job again. Check the log messages to see the records with
incorrect data were sent down the reject link and captured by the Peek stage.


Note that the one from Peek_Reject,0 has (…) at the end. Open it up and you will see the following:

Lookup Stage with Reject Link


1. Create a job as shown here and save it as ‘RejectLinkLU’ in the project.

2. Set up the Sequential File stages to read Warehouse.txt as source records and
Items.txt as reference records. Don’t forget to import the table definitions.

3. Column Item is the lookup key.


4. Map all columns from the source (Warehouse.txt) plus the column Description to
the output.

5. The lookup failure action property can be set to any choice except Reject. Click the
yellow constraint icon and set the lookup failure action.

6. Set up the target Sequential File stage to write the records to Warehouse_Items.txt
file.

7. Compile and run the job.

8. If you set the lookup failure action to FAIL, then you should see your job aborted.

If you set the lookup failure action to Drop, then you will not see any log message from the Lookup stage. However, you can see that the number of records read from Warehouse.txt and the number of records written to Warehouse_Items.txt differ by 9 records.

If you set the lookup failure action to Continue, then all records will be passed to the
output.


9. Now, let’s change the lookup failure action to REJECT. Add a reject link with a Peek
stage as shown.

10. Compile and run the job.

11. This time you should see log messages from the Peek_Reject stage.


And if you open each log message, you should find a total of 9 records logged by the Peek stage.


Transformer Stage with Reject Link


1. Open the Lookup job again and save it as ‘RejectLinkXformer’.

2. Add a Transformer stage between the Lookup stage and the target Sequential File stage. Add a reject link from the Transformer stage to a Peek stage. To do so, select the link and then right-click to choose “Convert to Reject”.


And your final job should look similar to the picture shown.

3. Change the Lookup stage:


Lookup failure action = Continue

Nullable attribute of Description column of Items record = Yes

Nullable attribute of Description column of Warehouse_Items output record = Yes


4. In the Transformer stage, map all columns of the input record to the output. Change the derivation for Description to "[" : Warehouse_Items.Description : "]".

5. Don’t forget to handle the NULL in the target Sequential File stage (just in case).

6. Compile and run your job.

7. With DataStage prior to version 8.5, you should find log messages by the
Peek_X_Reject stage containing those records with a NULL in the Description
column. However, DataStage 8.5 now will handle (allow) NULL in derivation so you
won’t see any rejected records.

Merge Stage with Reject Link


1. Open the Lookup job and save it as ‘RejectLinkMerge’.


2. Add a Remove Duplicate stage. Replace the Lookup stage with a Merge stage. Add
a reject link to a Peek stage from the Merge stage.

3. Set up the Remove Duplicate stage with the following properties: Key = Item, Retain
= Last. Map all columns to output.


4. Set up the Merge stage with the following properties:

5. On the Link Ordering tab, make sure all the internal links and external links are
correctly aligned.

6. Compile and run the job.

7. Find the log message from the Peek_Reject stage.


8. You can see that the update records that do not have corresponding master records are rejected. Viewing the detail of the Peek_Reject stage log message will show these rejected update records.


Lab 3: Generate Mock Data


In this task, you create a job to generate a mock data file to be used in later exercises.

Design a job that generates a mock data file


1. Create a new parallel job named ‘archGenData’ and save it. Add the stages and
links and name them as shown.

2. Open up the Row Generator stage. On the Properties pages specify that 1000 rows
are to be generated.


3. On the Output>Columns tab, specify the column definitions as shown.

4. Open up the Extended Properties window for the CustID column. (Double-click on
the number to the left of the column.) Specify that the type of algorithm is cycle with
an initial value of 10000.

5. For the Int1 column cycle from 0 to 29.

6. For the Int2 column cycle from 1 to 29. (It’s important that this not start at 0, so that
these cycles won’t repeat.)

7. For the Int3 column cycle from 2 to 29.

8. For the Int4 column cycle from 3 to 29.

9. For the MiddleInit column, use the alphabet algorithm over a string of characters that
might be middle name initials. (That is, remove the numerals from the list.)


10. For the Zip column, cycle as shown.

11. For the CustDate column, generate random dates with a limit of e.g. 20000, so that
the dates don’t get too large.

12. For the InsertUpdateFlagInt, select random integers with a limit of 2. This will ensure
that values are either 0 (meaning update) or 1 (meaning insert).

13. Close the stage and then open it again and click the View Data button to examine a
sampling of the data that will be generated.

14. Edit the Sequential File stages that are used as lookup tables. The sequential files
are FName.txt, LName.txt, Street1.txt, and Street2.txt. Examine these files to get an
idea of the data they contain. Import the metadata for these files and load it into the
stages.


15. Edit the Lookup stage. Map the Int1 to Int4 columns, respectively, to the Num
columns of each of the lookup files. Define the output columns in the order shown at
the far right.

16. Click the Constraints button. Specify that rows that fail to find matches are to be
rejected.


17. Edit the Transformer. Define the target columns as shown.


18. Define the following derivations (in addition to the straight mappings); possible expressions are sketched after this list:
• Middle initial should be uppercase.
• Address should consist of the street name followed by the street modifier.
• Rows with customer dates later than the current date should get the current date.
Otherwise, they retain the date in the source row.
• The added column DateEntered should get the date of the job run.
• The InsertUpdateFlag column, which is now Char(1), should replace 0 with “U” and
1 with “I”.
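
As a sketch only (the input link name and the street column names below are assumptions; use the names you defined in your own job), the Transformer derivations might look like this:

    MiddleInit:        UpCase(lnkIn.MiddleInit)
    Address:           lnkIn.StreetName : " " : lnkIn.StreetModifier
    CustDate:          If lnkIn.CustDate > CurrentDate() Then CurrentDate() Else lnkIn.CustDate
    DateEntered:       CurrentDate()
    InsertUpdateFlag:  If lnkIn.InsertUpdateFlagInt = 0 Then "U" Else "I"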


19. Open up the Job Properties window and define a new job parameter named
TargetPath. Provide it with a default that creates a file named CustomersOut.txt in
the /DS_Advanced directory.

20. Edit the target Sequential File stage. Insert your TargetPath job parameter in the
File property to create a new comma-delimited file named CustomersOut.txt. Don’t
surround the column values with quotes.
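
A job parameter is referenced in a stage property using the # syntax. For example, assuming the parameter’s default value already includes the file name, the File property would simply be:

    #TargetPath#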

21. Edit the rejects Sequential File stage. Write rejects to a file named RejectsOut.txt.

22. Compile and run your job. Examine the job log in Director. Fix any errors. Try to
eliminate all warnings.

23. View the target data in DataStage Designer.

24. In addition to viewing the data in DataStage, view the data file in your directory.
Verify that quotes don’t surround the values and that the data is delimited by
commas.


Examine the job log


1. In Director, open up the job log. Locate and examine the message that lists the
values of the job parameters.

2. Locate and examine the message that lists the values of the environment variables
that are in effect at the time the job is run.

3. Locate and examine the message that displays the OSH (Orchestrate Script) that is
generated for the job.


4. Locate and examine the message that displays the configuration file used when the
job was run. How many nodes are defined in the file? (Note: Your job will be using a
different configuration file than the one shown here. This is just an example.)

5. Locate and examine the message that lists the job’s datasets, operators, and
number of processes. This is known as the Score. (Note that you won’t see the
word ‘Score’. The first line is how you can identify it.)

6. Locate the message that says how many rows were successfully written to the target Sequential File stage and how many were rejected.


Lab 4: Read Data in Bulk


In this task, you create a job that reads data from a file in bulk, that is, in a single
column. The Column Import stage is used to parse the columns.

Build the job to read data in as single column records


1. Create a new parallel Job named dataReadBulkData.

2. Enable RCP for the job and all existing links in the job properties.

3. Edit the source Sequential File stage to read data from the Customers.txt file. On
the Columns tab, specify a single column named RecIn, VarChar(1000).

4. On the Formats tab, specify that there is no quote character. Verify that you can
view the data.

5. Edit the Column Import stage. On the Properties tab, specify that the Import Input
Column is RecIn. As the Column Method, specify Schema File and then reference
the Customers.schema schema file.
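
The actual Customers.schema file is supplied with the lab files. For orientation only, a schema file uses the parallel framework’s record schema syntax and looks roughly like this (the column names here are invented):

    record {final_delim=end, delim=',', quote=none}
    (
        CustID: int32;
        FirstName: string[max=20];
        LastName: string[max=30];
        CustDate: date;
    )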

6. That’s it for the Column Import stage. It uses RCP to send the columns of data
through to the target.

7. In the target Sequential Stage specify the file to write to.


8. Compile and run. View your data file.


Lab 5: Read Multiple Format Data in a Sequential File Stage


In this lab, you will create the header and detail records and then build a job to process these records so that header information is available on all associated detail records.

Generate the Header Detail data file


1. Create a new parallel Job and name it partGenHeaderDetailFile.

2. Edit the GenerateHeader Row Generator stage. On the Properties tab specify that 9
records are to be generated.

3. Define the columns, as shown. Save the table definition for the next lab to use.

4. For the OrderNum column, generate numbers from 1 to 9. Also, for this and all
columns add the optional Quote property from the Field Level category and set it to
NONE.

5. For RecType column, generate ‘A’.

6. Choose your own algorithms for the remaining fields. In what follows, I’ve chosen
random for OrdDate with a limit of 20000.


7. Click View Data.

8. Edit the GenerateDetail Row Generator stage. On the Properties tab specify that 81
records are to be generated.

9. Define the columns, as shown. Save the table definition for the next lab to use.

10. For the OrderNum column, generate numbers from 1 to 9. Also, for this and all
columns add the optional Quote property and set it to NONE.

11. For RecType column, generate ‘B’.

12. Choose Random with a limit of 9999 for the remaining fields.


13. Click View Data.

14. In the Column Export stage for the header, in the Input folder of the Properties tab,
specify the input columns that are to be exported to a single output column
(OrderNum, RecType, etc., in the order shown). In the Output folder specify the
name of the single column (Header) and its type (VarChar) that the input columns
are to be combined into.


15. On the Output > Columns tab, create a column named RecOut and map the input field to it on the Mappings tab.

16. Define the Column Export stage for the detail records in a similar way. Be sure to
use the same name (RecOut) for your output column name.

17. In the Funnel stage, map the single RecOut column across to the target.

18. In the Sort stage, specify that the records are to be sorted in ascending order. The
key is the single RecOut column.

19. Write to a file named Orders.txt. In the Sequential File stage Format tab, set the
quote property to NONE.


20. Compile and run and view your data. It should look something like this, with the
header records at the front of each group of records, grouped by order number.

Build a job that processes the Header Detail file


1. Create a new parallel Job and name it partCombineHeaderDetail. From left to right it
has a Sequential File stage, a Transformer stage, a Join stage, and a DataSet stage.

2. Your job reads data from the Orders.txt file. It reads this data in as one column of
data.

3. In the Transformer, define constraints to send the Header rows down the Header link
and the Detail rows down the Detail link. Also parse out the fields for each of the
record types using the Field function.
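
For example, if the single input column is named RecIn and the record type is the second comma-delimited field (as generated in the previous section), the constraints and one derivation might be sketched as follows; adjust the link and column names to match your job:

    Header link constraint:  Field(lnkIn.RecIn, ",", 2) = "A"
    Detail link constraint:  Field(lnkIn.RecIn, ",", 2) = "B"
    OrderNum derivation:     Field(lnkIn.RecIn, ",", 1)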


4. To create the output column definitions for the two links, load the table definitions
saved from the previous job. The RecType fields won’t be needed downstream, so
delete them from the output. Add a column to the Detail link named RecordNum.
Define an expression that generates a unique integer for each Detail row regardless
of its partition.
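
One common way to generate a number that is unique across partitions uses the Transformer system variables, for example:

    RecordNum:  (@INROWNUM - 1) * @NUMPARTITIONS + @PARTITIONNUM + 1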

5. In the Join stage, specify an inner join by OrderNum.

6. On the Input>Partitioning tab of the Join stage specify that the records are to be
hashed and sort partitioned by OrderNum for both the Header and Detail links.

7. Map all columns through the Join stage.


8. In the Data Set target stage, name the file OrdersCombined.ds. Note the partitioning
icons in front of the Join stage.

9. Compile and run your job.

10. View the data using the Data Set Management tool available in the Designer Tools
menu. You should see 9 records in each group.

11. Next view the data in each partition. Notice that all the records in a group are in a single partition; a group is not spread across multiple partitions.

12. Save your job as partCombineHeaderDetail2. Now change how the partitioning is
done in the Join stage. Choose Entire for the Header link and SAME for the Detail
link. Turn off the sorts.


13. In the Job Properties, add the environment variable $APT_NO_SORT_INSERTION to the job and set its default value to TRUE.

14. Recompile and run your job. View the data using the Data Set management tool.
Notice that the groups of data are now spread across multiple partitions. This should
yield improved performance.


Lab 6: Complex Flat File Stage

Import a COBOL file definition


1. Open the ORDERS.cfd file in the DS_Adv_Student_Files directory with Notepad or Wordpad. Examine the file. Note the location of the level 01 items, the total number of level 01 items, and the names of the level 01 items.
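
The contents of ORDERS.cfd are exactly what you examine in this step. Purely as an illustration of the shape to look for (the field names here are invented, not the ones in the lab file), a CFD with two level 01 items looks something like:

    01  HEADER.
        05  ORDNUM      PIC 9(4).
        05  RECTYPE     PIC X(1).
        05  ORDDATE     PIC X(10).
    01  DETAIL.
        05  ORDNUM      PIC 9(4).
        05  RECTYPE     PIC X(1).
        05  ITEMNUM     PIC 9(4).
        05  QTY         PIC 9(4).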

2. In DataStage Designer, click Import > Table Definitions and then select Cobol File
Definitions.

3. Specify a Start position of 2.

4. Select the path to the ORDERS.cfd file.


5. Take the default location in the repository. Select both DETAIL and HEADER
definitions. Click Import.

6. Find the newly imported table definition in the repository. Open one of them. Go to
the layout tab and explore the different settings of Parallel, COBOL, and Standard
views. Here is what the COBOL view looks like.


7. Open up the HEADER table definition. Click on the Columns tab. Open the
extended properties (Edit Column Meta Data) window for the column ORDDATE
(double click on the column number). Set the Date Format field to CCYY-MM-DD.
This is to allow dates to be displayed correctly using this mask.

8. Remove the Level number and then click Apply and close the table definition.

Using the Complex Flat File stage


1. Create a new parallel job named cffOrders. The source stage is a Complex Flat File
stage. The target is a Sequential File stage. Name the links and stages as shown.


2. Open the Orders CFF stage. On the File options tab, select the file to be read,
OrdersCD.txt.

3. Click the arrow at the bottom to move to the next Fast Path page, that is, the
Records tab. Remove the check from the Single record box.

4. Change the name of the default record type to HEADER.

5. Click Load. Select all the columns from the HEADER Table Definition.


6. Click OK to load the columns.

7. Click the icon at the bottom left of the Records tab to add a new record type.
Complete the process to define and load the Table Definition for the DETAIL record
type.


8. Select the HEADER record type and then click the Master Button (rightmost icon at
the bottom of the records tab). This will make the HEADER record type the master.

9. Click the arrow at the bottom to move to the next Fast Path page, that is, the
Records ID tab. Define the Record ID constraint for the HEADER record type.

10. Define the Record ID constraint for the DETAIL record type.
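
Assuming the record type column in the CFD is named RECTYPE (an assumption; use whatever the imported definitions call it) and the record type values are ‘A’ for header and ‘B’ for detail, the Record ID constraints are along these lines:

    HEADER:  RECTYPE = 'A'
    DETAIL:  RECTYPE = 'B'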


11. Move to the next Fast path page, that is, the Output > Selection tab. Select all
columns from both record types.

12. Move to the last Fast Path page, that is, the Output > Constraint tab. Click the Default button to add the default output constraint. This will ensure that only records of these two record types go out the output link.


13. Click the Stage tab and then the Record options tab. Specify Text for Data format. Select the ASCII character set. Type in the vertical bar (|) for the record delimiter. If you open the OrdersCD.txt file on the DataStage server, you will notice that all the records are bunched up one after another with a vertical bar separating them. There is no CR or LF character. This is the usual format of COBOL data from a mainframe, and the CFF stage is designed to handle it.

14. Click the Layout tab. Select the COBOL layout option. View the COBOL layouts for
each of the record types. Shown below is the HEADER COBOL format.

15. Shown below is the DETAIL COBOL format.


16. Move to the Output tab and click View Data. Notice that all the columns from all the record types are displayed with data in them. However, the data in the columns that are mapped from the DETAIL record are invalid when viewing a record with record type ‘A’ (HEADER record). But in the case of record type ‘B’ (DETAIL record), all columns contain valid data. In effect, we have just populated the HEADER record information to all its associated DETAIL records.

17. Set up the target Sequential File stage to output all the records to a file named CFFOrdersCombined.txt, comma-delimited and without quotes. On the Columns tab, change the SQL type to Char for the column ORDDATE. Otherwise, you will get a conversion error during execution.

18. Compile and run your job. You will get some warnings about reaching EOF (End-of-File) without a record delimiter. This is normal and is caused by the last record in the file. These warnings do not affect the correct processing of the data.


19. To verify the result, view data on the target Sequential File stage.


Lab 7: Optimize a Fork-Join Job

Generate the Source Dataset for the Fork-Join Job


1. Open up your archGenData job and save it as sortGenData.

2. Replace the target Sequential File stage by a Data Set stage.

3. The dataset accessed by the target dataset stage should be named Customers.ds.

4. Compile and run your job.

5. Verify that you can view the data in Customers.ds.

6. Save the metadata of Customers.ds to a table definition for use in the next section.


Build the fork-join job


1. Create a new parallel job named sortForkJoin.

2. Edit the source stage to read from the Customers.ds dataset. Don’t forget to load
the table definition saved from last job.

3. In the Copy stage map all columns to both output links.

4. Edit the Aggregator stage. Group by Zip. Count the rows in each group of zip
codes. You will add this value to each Customer record. Change Grouping Method
to SORT.

5. Output the Zip column and the new ZipCount column from the Aggregator.

6. On the Partitioning tab, hash and sort by Zip.

7. Edit the Join stage. Specify an inner join on the Zip column.

8. On the Partitioning tab, hash and sort by Zip for both input links to the Join.

9. Write all the rows of the customer record with the added ZipCount column to an
output sequential file named CustomersCount.txt.


10. Your job now looks like this. The hash and sort on each of the three links going to the Aggregator and Join stages are what would have been done implicitly by DataStage if Auto had been selected.

11. Compile and run your job. Verify the data.

12. Examine the score. Are there any inserted tsort operators? What operators are
combined? In addition to the operators corresponding to the Aggregator, Join, and
Copy stages, what other operators are there in the score?

13. Save your job as sortForkJoin2.

14. Remove the Hash partitioning and in-stage sorts, going back to Auto.

15. Compile and run.

16. Examine the score. Compare with the other score in terms of number of operators,
number of processes, number of sorts, hash partitioners, etc.

Optimize the job


1. Save your job as sortForkJoin3.

2. Optimize your job by moving the hash and sort to the Copy stage. Specify SAME
partitioning for the links going into the Aggregator and Join.


3. Recompile and run. Verify the data.

4. View the score. Compare with the scores from the previous jobs. Has the number of sorts been reduced? Have the numbers of operators and processes been reduced?

More optimization
In this task, we will push the partitioning and sorting back even further. We will partition
and sort when the dataset is generated and loaded.

1. Open up your sortGenData job and save it as sortGenData2.

2. On the Partitioning tab of the target Customers dataset, hash partition and sort by Zip. Compile and run to generate a new Customers.ds.

3. Save your sortForkJoin3 job as sortForkJoin4.


4. Change the partitioning in the Copy stage to SAME and remove the sort, since the
data is already sorted coming out of the dataset.

5. Compile and run and view the score. Notice here the inserted tsort operators.
Although the data in the dataset is sorted, DataStage doesn’t know this and still
inserts the tsort operators.


6. Open up the job parameters window, and add the environment variable named
$APT_NO_SORT_INSERTION (Disable sort insertion) as a job parameter. When
set, this will cause the Framework to just check that the data is sorted as it is
supposed to be. It will not add tsort operators.

7. Recompile and run. Run it with the $APT_NO_SORT_INSERTION parameter set to true. View the score. Are there any inserted sorts? How many operators and processes are there?

8. Compare when running the job with the $APT_NO_SORT_INSERTION turned off.

9. There is another environment variable called $APT_SORT_INSERTION_CHECK_ONLY that is similar. “tsort” operators are inserted, but they do not perform sorts. They just check whether the data is sorted. Add this environment variable as a job parameter and compare the score when this is turned on and turned off.


Lab 8: Sort Stages to Identify Last Row in Group


In this exercise, you produce a state count and a list of zip codes from the
CityStateZip.txt file. Since the Aggregator stage can’t produce the list, you will use a
Transformer to produce the count and the list. The main difficulty here is that you need
to know when you have reached the end of each group of state records. To accomplish
this, you will use the Sort stage to add a key change column.

Create the job


1. Create a new parallel job named sortLastRow.

2. In the source Sequential stage read data from the CityStateZip.txt file. The
CityStateZip.txt file contains customer address information. In this job, you will
generate a report that lists each state followed by a count of the addresses in the
state and a list of the zip codes. Here’s a sampling of the source data and the
column names used.


3. Here’s the report to be generated:

4. In the first Sort stage, set Hashing by State as the Partitioning method. We need to
have all the rows of a given state in the same partition in order to get a single count
for the state. The hash should be case insensitive.

5. In the first Sort stage, sort by State in ascending order. The sort should be case
insensitive. Turn off Stable Sort since we don’t need it.


6. In the second Sort stage, set the “Create Cluster Key Change Column” option.
Specify that the data is already sorted as you specified in the previous Sort stage.

7. In the third Sort stage, specify that you want to sort by the cluster key change column
within the State groups. This will place the row with the cluster key change column
of 1 at the end of each State group.

8. Open up the Transformer. Define the stage variables in the order shown.

NewState: Char(1) flag, initialized with “Y”, that indicates a new state group is being processed.


Counter: Integer(3) counter initialized with 0 to track the number of members in a group. Set to 0 when a new state is to be processed.

AddZip: VarChar(255) list of zip codes. Initialize it with an empty string. Lists the zip codes processed in each group. Set it to the currently read zip code when a new state is being processed.

PrevClusterKey: Integer(1), initialized to 1. Map the current clusterKeyChange input value to it.

9. The first (NewState) is a flag. We set it to ‘Y’ when PrevClusterKey is 1. PrevClusterKey is an integer column. Map the clusterKeyChange input column to it and initialize it to 1. At the time the derivation for NewState is calculated, it will contain the cluster key change value from the previous row read, or 1 for the first row. Counter is an integer field initialized to 0. AddZip is a varchar field initialized when a new state group is read in. For each row read in a state group, Counter adds 1 to the state count and AddZip adds the current Zip to the list.
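
Putting the description above into expressions, the stage variable derivations might be sketched like this (the input link name and the list separator are assumptions):

    NewState:        If PrevClusterKey = 1 Then "Y" Else "N"
    Counter:         If NewState = "Y" Then 1 Else Counter + 1
    AddZip:          If NewState = "Y" Then lnkIn.Zip Else AddZip : ", " : lnkIn.Zip
    PrevClusterKey:  lnkIn.clusterKeyChange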

10. For the StateCount link, there are three target columns. The State value comes from the input. The other two target columns get their values from the Counter and AddZip stage variables. Define a Constraint for the StateCount link. It should only write out one record per State group. Records should be written out when the state count and Zip list are complete for the group, i.e., when the clusterKeyChange column equals 1.

11. Set the target Sequential File stage to write to a file without quotes.


12. Compile and run. Verify the results. (Your ordering may be different.)


Lab 9: Globally Sum All Input Rows Using an Aggregator Stage


You will create this job by generating a constant column with each record so that the
aggregator stage will add up the total number of records sequentially.

Write to a database table using INSERT


1. Create a new parallel job named tipsSumAll.

2. Read records from the Customers.txt file. Since the Customers.ds table definition is the same as the file, you can use it.

3. In the Copy stage, pass all columns through to the CUSTS stage. Pass just the
CustID column through to the Column Generator.


4. Edit the CUSTS DB2 Connector stage. Connect to the DB2 instance and the
SAMPLE database. Click the “Test” button to make sure you can connect.


5. Write mode is INSERT. The table your job will create is named CUSTS. Select
REPLACE as the Table action with the statement generation and error handling as
shown.

6. Edit the Column Generator stage. Generate a new column named GroupByCol,
Char(1). Set the generation algorithm to just create a single letter ‘A’ using the
extended column properties window.


7. Edit the Aggregator stage. Group by the GroupByCol. Count the number of rows in
the group and send the results to the NumRecs column. Specify Hash as the
Aggregation method, since the data doesn’t need sorting.

8. Specify that the Aggregator stage is to run sequentially.

9. Edit the target Sequential stage. Write to a file named CUSTS_Log.txt.

10. Compile and run your job.


11. Check the results. The log file should contain the number of records read from the
source file and written to the target table, unless the database rejects some rows.


Lab 10: Slowly Changing Dimension Stage

Create the surrogate key source files


1. In DataStage Designer, create a new parallel job named
scdCreateSurrogateKeyStateFiles.

2. From the Processing folder add two Surrogate Key Generator stages to the canvas.
Name them as shown. Also add the two DB2 Connector stages with links to the
Surrogate Key Generator stages.


3. Open up the PRODDIM Connector stage. Specify the Connection and Usage
properties. Choose to have the stage generate SQL.

4. Click the Columns tab. Load the column definitions into the stage. The table
definition is stored in the repository at “Table Definitions > DB2 > sample”.

5. Open up the STOREDIM Connector stage. Specify the Connection and Usage
properties. Choose to have the stage generate SQL. Load the column definitions
into the stage.


6. Open the ProdDim_SKG_Create stage properties. The Key Source Update Action is Create and Update. Select PRODSK for the input column name. Specify a path to a surrogate key source file named proddim as shown.

7. Open the StoreDim_SKG_Update stage properties. The Key Source Update Action is Create and Update. Select STORESK for the input column name. Specify a path to a surrogate key source file named storedim as shown.

8. Compile and run your job. Check the job log for errors.

9. Verify that the files have been created and that they are not empty. If you encounter an error and need to run the job again, delete the state files first.


Build an SCD job with two dimensions


In this section you will update a star schema with two dimensions. The completed job will look like the following. However, to ease development and debugging, two separate jobs will be built. The first will process the PRODDIM dimension table and write its output to a dataset. The second will read the data from the dataset, process the STOREDIM dimension table, and write the results to the fact table. We will not build the complete job shown here, since outside of a classroom there are typically many dimensions to process. The standard practice is to process one dimension per job.

Build an SCD job to process the first dimension


1. Create a new parallel job named scdLoadFactTable_1.

2. ***Important*** Open the Job Properties window and make sure that Runtime Column Propagation is not enabled. Otherwise, you will get runtime errors when source columns such as StoreID are written to the PRODDIM_upd link.


3. Add the stages and links as shown. Notice that the link from the PRODDIM
Connector stage to the Slowly Changing Dimension stage in the middle is a lookup
reference link.

4. Edit the SaleDetail stage. Read data from the SaleDetail.txt file. Import the table
definition. The column definitions are shown below. Correct them if necessary.


5. Verify that you can view the data.

6. Edit the PRODDIM reference link stage. Set the Generate SQL property to Yes.
Click View Data.

7. On the Columns tab, load the column definitions. Select SKU, which is the business
key, as the lookup key field.


8. Open the PROD_SCD stage. On the Stage > General tab, select SaleDetailOut as
the output link.

9. Move to the next Fast Path page (using the arrow key at the bottom left), that is, the
Input>Lookup tab. Specify the column matching to use to lookup a matching
dimension row. Here we want to retrieve the row with the matching PRODDIM
business (natural) key. Also select the purpose codes for each of the dimension
table columns, as shown.


10. Move to the next Fast Path page, that is, the Input>Surrogate Key tab. Select the
surrogate key source file (proddim). Specify the surrogate key initial value, 1. Also
specify how many surrogate key values to retrieve from the state file in a single block
read. Specifying a block size of 1 ensures that there will be no gaps in the key
usage.

11. Move to the next Fast Path page, that is, the Output>Dim Update tab. Here specify
how to create a new dimension record and how to expire a dimension record that
has Type 2 columns in it. Be sure Output name is PRODDIM_Upd, that is, the name
of the dimension table update link. Use the Expression Editor to specify values and
functions.
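
The exact expressions depend on the PRODDIM table layout used in your environment. Purely as a sketch, assuming a current-record indicator column (the verification step later refers to CURR=N) and effective/expiration date columns, typical Type 2 settings look like this:

    Surrogate key (PRODSK):     derivation: NextSurrogateKey()
    Business key (SKU):         derivation: SaleDetail.SKU
    Current indicator (CURR):   derivation: "Y"            expire: "N"
    Effective date:             derivation: CurrentDate()
    Expiration date:            derivation: (high date)    expire: CurrentDate()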


12. Move to the next Fast Path page, namely Output>Output Map tab. Here the
PRODDIM surrogate key field (PRODSK) replaces the business key field in the
source file.


13. Click OK to close the SCD stage.

14. Open up the PRODDIM_Upd stage. Use Update then Insert to write to the target
SUPER.PRODDIM table. Let the stage generate the SQL.


15. In the columns tab, make sure the PRODSK is the only column set as the key.

16. Edit the target DataSet stage.


17. Compile. Before you run the job, view the data from the SaleDetail.txt file and the
dimension table. This way you can see clearly what happens when you execute the
job.


18. Run the job. Check the job log for errors. View the data in PRODDIM to see if the table was updated properly. SKU 3 doesn’t change. SKUs 1 and 2 are new inserts. SKUs 4 and 5 are Type 2 updates; the original records (PRODSK=2 and 10) are preserved as historical records (CURR=N).

19. View the data in the target dataset. A1111 and A1112 are assigned new surrogate
key values since they are inserts. A1113 was not changed, so it has the same
surrogate key value. A1114 and A1115 are new Type 2 updates. They received
new surrogate key values and are inserted into the target.


20. If you want to rerun your job, drop the three star schema tables and then re-run the SQL file that creates the tables. Delete the surrogate key source files and then re-run the job that creates and updates them.

Build an SCD job to process the second dimension


1. Create a new parallel job named scdLoadFactTable_2. Add the stages and links as
shown. Turn off RCP in the Job Properties window.


2. Edit the SaleDetailOut DataSet stage. Extract data from the SaleDetailOut.ds file
that you created in the previous job. To get the Table Definition go to the Columns
tab of the target DataSet stage in your previous job. Click the Save button to save
the columns as a new Table Definition.

3. After you finish editing the stage, verify that you can view the data.


4. Edit the STOREDIM stage. Load column definitions. Select the ID column as the
lookup key. Verify that you can view the data.

5. Open the STORE_SCD stage.

6. Specify the output link, SaleDetailOut2, on the first Fast Path page.


7. Move to the next Fast path page, that is, the Input > Lookup tab. Specify the lookup
condition and purposes.

8. Move to the next Fast Path page, that is the Input > Surrogate Key tab. Select
storedim as the source key file to be used. Specify the other information as shown.


9. Move to the next Fast Path page, that is the Output > Dim Update tab. Specify the
mappings and derivations as shown.

10. Move to the next Fast Path page, that is the Output > Output Map tab. Here the
STORE surrogate key replaces the Store business key from the source file.


11. Edit the STOREDIM_upd stage. Be sure to qualify the table name by the schema
name, as shown.

12. Make sure STORESK is the only column set as the key.

13. Edit the FACTTBL stage. Be sure to qualify the table name by the schema name.


14. Compile. Before you run the job, view the data from the SaleDetailOut.ds file and
the STOREDIM dimension table. This way you can see clearly what happens when
you execute the job.



15. Run the job. Check the job log for errors. View the data in the updated STOREDIM
table and in the FACTTBL.

-------------------------- STOREDIM

-------------------------- FACTTBL


Lab 11: Reject Links – DB2 Connector

DB2 Connector stage with a Reject Link


1. Create a job as shown here and save it as “RejectLinkDB2Connector” in the Jobs
folder of the project.

2. Set up the Sequential File stage to read the file Employees.txt. Load the Columns
from the table definition of DB2 table EMPLOYEE in the repository.


3. Set up the DB2 Connector stage to write (insert) to database SAMPLE and table
DB2INST1.EMPLOYEE.

4. In the DB2 Connector properties, click on the reject link on the graph and edit the
Reject tab properties.

5. On the Columns tab, enable Runtime Column Propagation.

6. Set up the Reject Sequential File stage to write to file SQL_Error.txt.

7. Compile and run the job.


8. Your job execution should abort, since Employees.txt contains duplicate rows and the DB2 Connector options do not tell the job to reject these rows.

9. Go to the DB2 Connector properties > Reject tab and select the SQL Error check-
box.

10. Compile and run your job.

11. You should see the job finish successfully. This means your good records are passed to the output, and the rows that generate SQL errors will be in the reject file.


12. Open the SQL_Error.txt and verify that it contains the rows that already existed in the
Employees table.

13. Open the DB2 Connector properties again, click on the reject link and edit the Reject
tab options as below (Abort after property = 3):

14. Compile and run your job.

15. You should see your job aborted since Employees.txt contains more than 3 duplicate
rows.


Lab 12: Dual Inputs to a Connector Stage


In this simple lab, we will process an input file that contains two kinds of records: PROJECT (project) records and PROJACT (project activity) records. The project activity records have a foreign key relating back to the project record. With referential integrity set in the database tables, we must insert all the project records before the project activity records can be inserted. Here are the CREATE statements for both tables; note that the column PROJNO is the relationship:

CREATE TABLE "DB2INST1"."PROJECT" (
    "PROJNO"   CHAR(6) NOT NULL ,
    "PROJNAME" VARCHAR(24) NOT NULL WITH DEFAULT '' ,
    "DEPTNO"   CHAR(3) NOT NULL ,
    "RESPEMP"  CHAR(6) NOT NULL ,
    "PRSTAFF"  DECIMAL(5,2) ,
    "PRSTDATE" DATE ,
    "PRENDATE" DATE ,
    "MAJPROJ"  CHAR(6) )
    IN "USERSPACE1" ;

CREATE TABLE "DB2INST1"."PROJACT" (
    "PROJNO"   CHAR(6) NOT NULL ,
    "ACTNO"    SMALLINT NOT NULL ,
    "ACSTAFF"  DECIMAL(5,2) ,
    "ACSTDATE" DATE NOT NULL ,
    "ACENDATE" DATE )
    IN "USERSPACE1" ;

Insert both parent and child records with a single Connector


1. Create a job as shown here and save it as “ParentChildRecords” in the Jobs folder of
the project.

2. Set up the Sequential File stage to read the file Parent_Child_Records.txt and read
each record in as a single column.

3. Use the Transformer stage to split the records. Hint: use constraints that examine the record type indicator, so that each output link receives only its own record type, and parse each output record accordingly. As you have done in an earlier exercise, use the Field function to parse the record. Also, load the table definitions for both the Child and Parent output links from the Table Definition folder in the repository.


4. To set up the DB2 Connector properties, open the stage and then click on the connector icon. Set up the credentials as shown, and select "All records" for record ordering.


5. Click on the Parent link and set up the stage to insert records into the “project” table.
Let the stage generate the SQL. Be sure to set the Table action to Append.


6. Repeat for the Child link to insert into table “db2inst1.projact”.

7. On the Link Ordering tab, make sure the Parent link is the first link, because the records from the first link are processed first.

8. One other thing: because the job runs in parallel (partitioned) mode, it is important to set the partitioning of each input link to hash on PROJNO. This ensures that all records with the same project number end up in the same partition, so each child record is processed in the same partition as its parent record.

9. Compile and run the job.


10. Your job execution should contain no errors, and you should see a total of 4 records inserted into table PROJECT and 4 records inserted into table PROJACT. The log messages read: "[Input link n] Number of rows inserted: 2".


Lab 13: Metadata in the Parallel Framework

Create a parameter set


1. Click the New button on the Designer toolbar and then open the “Other” folder.

2. Double-click on the Parameter Set icon.

3. On the General tab, name your parameter set SourceTargetData.


4. On the Parameters tab, define the parameters as shown.

5. On the Values tab, specify a name for the Value File that holds all the job parameters
within this Parameter Set.

6. Save your new parameter set.

Create a job with a Transformer stage


1. Create a parallel job TransSellingGroup as shown then save the job.

2. Open up your Job Properties and select the Parameters tab. Click Add Parameter
Set. Select your SourceTargetData parameter set and click OK.


3. Import the Selling_Group_Mapping.txt Table Definition.

4. Configure the source Sequential File stage properties using the parameters included in the SourceTargetData parameter set. Also, set the option "First Line is Column Names" to True.

5. On the Format tab, set Quote to none under Field defaults.


6. On the Columns tab, load the Table Definition you imported previously.

7. Open the Transformer stage. Edit the constraints by clicking on the constraint (chain) icon and create a constraint that selects only records with Special_Handling_Code = 1. Close the stage editor.


8. In the Transformer stage, map all the columns from the source link to the target link by selecting all the source columns and dragging them to the output link. The Transformer editor should appear as shown below:

9. Configure the properties for the target Sequential File stage. Use the TargetFile parameter included in the SourceTargetData parameter set to define the File property as shown. Also, set the option First Line is Column Names to True.

10. Compile and run your job.

11. View the data in the target and verify that it contains only records with Special_Handling_Code = 1.


Use a schema file in a Sequential File stage


1. Log on to Administrator. On the Projects tab, select your project and then click Properties. Enable RCP (Runtime Column Propagation) for your project or verify that it is enabled. If you have to enable it, you need to restart the Designer in order to pick up the change.

2. Open your TransSellingGroup job and save it as Metadata_job.

3. Open up the Job Properties window and enable RCP for all links of your job. When closing the Job Properties, answer YES to let Designer turn on RCP for all the links already in the job.

4. In the Repository window, locate the Selling_Group_Mapping.txt Table Definition that was loaded into the source. Double-click to open the Table Definition.


5. On the Layout tab, select the Parallel button to display the OSH schema. Click the
right mouse button to save this as a file called Selling_Group_Mapping.osh. Note
that this file is saved on the client machine. Normally, you would have to transfer this
file to the DataStage server. We have already done this step for you.

6. Open the schema file to view its contents. Any "{prefix=2}" properties must be removed; the version on the server does not contain them.

7. Open up your Source Sequential stage to the Properties tab. Add the Schema file
option. Then select the Selling_Group_Mapping.osh schema file.

8. On the Columns tab, remove all the columns.


9. In the Transformer, clear all column derivations going into the target columns (don't delete the output columns!). Also remove any constraints that are defined. If you don't remove the constraints, the job won't compile, because the constraint references an unknown input column.

10. Compile and run your job. Verify that the data is written correctly. That is, all records are now written, because there is no longer a constraint.

11. If you need the constraint, then try defining the constraint in the Transformer stage
again. In addition, go to the Columns tab of the source Sequential File stage and
import just the Special_Handling_Code column from the Table Definition. Compile
and run your job. This time you should only have records that meet the constraint.

Define a derivation in the Transformer


1. Save your job as Metadata_job_02.
2. Open the target Sequential File stage. Remove all the columns. Add the
optional Schema File property and select the same schema file for it since the
metadata will be the same.


3. Add a Copy stage just before the Transformer.

4. If you loaded the Special_Handling_Code column in the source Sequential File stage in the last exercise, remove it.
5. On the Columns tab of the Copy stage, load just the
Distribution_Channel_Description field from the Selling_Group_Mapping.txt
Table Definition. Verify that RCP is enabled.

6. Open the Transformer. If you have a constraint left from the last exercise,
remove it. Map the Distribution_Channel_Description column across the
Transformer. Define a derivation for the output column that turns the data to
uppercase.

7. Compile and run your job.


8. View the data in the file (not using DataStage View Data). Notice that the
Distribution_Channel_Description column data has been turned to uppercase.
All other columns were just passed through untouched.

Create a Shared Container


1. Highlight the Copy and Transformer stages of your job. Click Edit>Construct
Container>Shared. Save your container named UpcaseField.
2. Close your job without saving it. ***NOTE: Don’t save your job! It was just used
to create the container. ***
3. Create a new parallel job named Metadata_Shared_Container. Check the Job
Properties and make sure that RCP is turned on for this job.
4. Drag your shared container to the canvas. This creates a reference to the
shared container, meaning that changes to the shared container will
automatically apply to any job that uses it.


5. Click the right mouse button over the container and click Open.

6. Open up the Transformer and note that it applies the Upcase function to a
column named Distribution_Channel_Description. Close the Transformer and
the Container without saving it.
7. Add a source Sequential File stage, Copy stage, and a target Peek stage as
shown. Name the stages and links as shown.


8. Edit the Items Sequential stage to read from the Items.txt sequential file. You
should already have a Table Definition, but if you don’t you can always import it.
9. Verify that you can view the data.
10. In the Copy stage, move all columns through. On the Columns tab, change the
name of the second column to Distribution_Channel_Description so that it
matches the column in the Shared Container Transformer that the Upcase
function is applied to.

11. Double-click on the Shared Container. On the Inputs tab, map the input link to
the Selling_Group_Mapping container link.


12. On the Outputs tab, map the output link to the Selling_Group_Mapping_Copy
container link.

13. Compile and run your job.


14. Open up the Director log and find the Peek messages. Verify that the second
column of data has been changed to uppercase.


Lab 14: Create an External Function Routine

In this task, you will create a function that checks for key words in a string that is passed
to it. It returns “Y” if it finds a key word, else it returns “N”.

1. In gedit or vi, open the file named keyWords.cpp in the /DS_Advanced directory. This function returns "Y" if it finds any of a list of key words.
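A minimal sketch of what such a function might look like is shown here for reference; the key word list and the exact contents of the lab's keyWords.cpp file are assumptions and may differ from what you find on your system.

#include <string.h>

/* Hypothetical sketch only -- the lab's keyWords.cpp may differ.     */
/* Scans the input string for any of a short list of key words and    */
/* returns "Y" if one is found, otherwise "N".                        */
char *keyWords(char *inStr)
{
    static const char *words[] = { "urgent", "error", "overdue" };  /* assumed key words */
    for (int i = 0; i < 3; i++)
    {
        if (strstr(inStr, words[i]) != 0)
            return (char *)"Y";
    }
    return (char *)"N";
}

Note that the function returns char * and takes a single char * argument; these are the return type and argument you specify when you define the External Function routine in a later step.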

2. Compile your keyWords.cpp file into an object file: log in to the Information Server system as "dsadm", change to the /DS_Advanced directory, and run:

g++ -c keyWords.cpp


3. Verify that your directory contains the object file.

4. In DataStage, right-click over the Jobs folder, click New>Parallel Routine, and create a new External Function routine named keyWords. Save it in the Jobs folder.

Create an Object type External function. Specify the return type (char *). Specify the
path to the object file.

5. On the Arguments tab, specify the input argument to your function. It should match
the type expected by the function you defined.

6. Save and close your External Function Routine.


Use an External Function Routine in a Transformer stage


In this task, you create a simple job to test the use of your new function.

1. Create a new job named buildop_KeyWords.

2. Create a job parameter named inField that can be used to pass in a string value that
you can apply your routine to.

3. In the Row Generator, define a single column. (It can be anything you want. It won’t
be used.) On the Properties tab, specify that you want to generate a single row.

4. Define a VarChar output field named Result in your Transformer stage. Define a
derivation that returns “Key word found” or “Key word not found” in the Result field
depending on whether the key word was found in the input string. Also define a field
to store the input string from the job parameter.


5. Run and test your job.


Lab 15: Create a Wrapped Stage


In this exercise you create a simple Wrapped stage that wraps the UNIX ls command, and then use it in a job to list the /DS_Advanced directory.

Create a simple Wrapped stage


1. Manually create a Table Definition. Define one VarChar(5000) column. This
definition will be used to define the output interface from the Wrapped stage.

2. Click the right mouse button over a folder in the Repository and click
New>Other>Parallel Stage Type (Wrapped). On the General tab, enter the name
and command (the UNIX list files command): ls


3. On the Wrapped>Interfaces>Output tab, select the Table Definition you created in an earlier step, and select Stream=Standard Output.

4. On the Properties tab, define an optional property named “Dir” that is to be passed
the path to the directory to be listed. The Conversion type must be set to “Value
Only”, because we only want the value to be passed to the wrapper, not the property
name followed by the value.

5. Click Generate and then OK.

6. Create a new job named wrapperGenFileList. Add your new Wrapped stage with an
output link to a Sequential File stage.


7. Open the Wrapped Stage. On the Output>Columns tab, load your Table Definition
that defines the output if it is not already there.

8. On the Stage>Properties tab, add the Dir property and then specify the directory
/DS_Advanced to be listed.

9. Edit the target Sequential File stage.

10. Compile and run. Examine the job log.

11. View the data in the output.


Lab 16: Working with a Build Stage

Create a simple Build stage


Create a new Build stage named Total_Items_Amount that takes three input values (Qty,
Price, Tax_Rate) and calculates the total amount (Amount). This stage should satisfy
the following requirements:
• One input; one output. Create Table Definitions to define the input and output
columns.
• One property named Exchange that is used to multiply the Amount before it is written
out. Its default is 1. The Exchange rate can be used to calculate the results for
different currencies.
• All reads, writes, and transfers are done automatically.
• If the input dataset contains additional column values (beyond Qty, Price, Tax_Rate),
these should be passed through unchanged.

1. Create and save a Table Definition named InRec_TIA defining the input.

2. Create and save a Table Definition named OutRec_TIA defining the output.

3. Create a new Build stage named Total_Items_Amount.

4. On the Properties page, define a required property named Exchange. Its default is 1 and its Conversion type is the -Name Value type.


5. On the Build>Interfaces Input tab, define the input. Call the port InRec. Specify Auto
Read. Select the input interface Table Definition you defined earlier.

6. On the Build>Interfaces Output tab, define the output. Call the port OutRec. Specify
Auto Write. Select the output interface Table Definition you defined earlier.

7. On the Transfer tab, define an auto transfer with no separate transfer (false).

8. On the Logic Definitions tab, define a variable named beforeTaxAmount. You will
use this to define the base amount before tax is added. Also define a variable
named tax to store the calculated tax.


9. On the Per-Record tab, define the code that calculates the Amount. Be sure to
multiply the final result by the Exchange Property.
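As an illustration, here is a minimal sketch of the per-record logic. It assumes that the input and output ports are named InRec and OutRec (as defined on the Interfaces tabs) and that the Exchange property can be referenced by name; adapt the column and property references to your own definitions.

// Sketch only -- adjust the port, column, and property references
// to match your own Interfaces and Properties definitions.
beforeTaxAmount = InRec.Qty * InRec.Price;            // base amount before tax
tax = beforeTaxAmount * InRec.Tax_Rate;               // tax on the base amount
OutRec.Amount = (beforeTaxAmount + tax) * Exchange;   // apply the exchange rate

Here beforeTaxAmount and tax are the variables you defined on the Logic Definitions tab, and Amount is the additional output column that carries the calculated total.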

10. Click Generate. If the generation fails, fix any errors and then regenerate.


Create a job that uses your Build stage


Create a new job named buildop_Total_Items_Amount that reads rows from the order_items.txt file and totals the amount for each row. The amount should be put into a column named Amount, which is an additional column on each row.

1. Import a Table Definition for the source file order_items.txt. The column names
should be as follows: OrderID, ItemNumber, Quantity, Price, TaxRate. Use float
type for Price and TaxRate.

2. Create a new job that reads the source file, passes the rows to your new Build stage, and then writes the rows to a Sequential File stage.


3. Use the Copy stage to modify the input column names and types to match the input
columns expected by the Build stage.

4. In the Build stage, the output link should include all columns that are in the source
stage plus the Amount column.

5. Edit your target Sequential File stage.


6. Run and test your job. Be sure to test your Exchange Property by trying out different
exchange rates.


Lab 17: Performance Tuning

Use Job Monitor


1. Open up the sortForkJoin job. Save it as perfForkJoin.

2. Compile and run it.

3. In Director, click Tools > New Monitor to open a Monitor on the job.

4. Click the right mouse button over the window. Set or verify that the Monitor is
showing instances and percentage of CPU.


5. Note these are the results when this job was run on a particular virtual machine.
Your results may differ significantly.

6. Expand all the folders. Notice the following:

• Correlate each stage in the job with the stages listed in the first column.

• Identify the different instances of each stage.

• Correlate the links listed with the stage links.

• Identify where the slowest processing (rows/sec) occurs.

7. Save the job sortForkJoin3 as perfForkJoin3. Compile and run it.


8. Open a Job monitor and compare the performance results. Clearly, in this example
the performance has improved.

Use Performance Analysis tool


1. In Designer, open the sortForkJoin job and save it as perfForkJoin2.


2. Open up the Job Properties window. Click the Execution tab. Select “Record job
performance data.”

3. Recompile and run your job.

4. Click the Performance Analysis icon in the toolbar.

5. Click Charts and then select Record Throughput.


6. Click Stages and then de-select everything. One-by-one, select a stage and
examine its throughput. Shown here is the chart for the Aggregator Sort.

7. In a similar manner, select and examine other charts.

8. Now set up the job properties of perfForkJoin3 the same way, then recompile and run.

9. Open the Performance Analysis tool and view the results. Compare the results with the unoptimized version.


Analyze the Performance of another Job


1. Open the runPerf job in Designer.

2. Open up Job Properties and click on the Execution tab. Select Record job
performance data (if it hasn’t been selected already).

3. Change the two Data Set target stages’ file property to write to the correct directory.

4. Compile and run your job. Verify in Director that it runs to successful completion.

5. Click on the Performance Analyzer icon in the toolbar.


6. Open the Charts folder and select Job Timeline (the default chart).

7. Open the Partitions folder. Deselect one of the Partitions. Notice that the
corresponding tab disappears on the chart. Reselect the partition.

8. Open the Stages folder. Select just the first Generator, the Sort, and the RemDup
stages.


9. Click on the black bars to the right of the stages to display the phases of each
process.

10. Open the Phases folder. Select just the runLocally phase.


11. Open the Filters tab. Deselect each box one at a time and examine the effect on the
chart. Shown below is the effect of deselecting the Hide Startup Phases box.

12. Open up the Charts folder. Examine each chart in the Job Timing, Record
Throughput, CPU Utilization, Memory Utilization, and Machine Utilization folders.


Lab 18: Process Header / Detail records in a Transformer

Build a job that processes the Header Detail file


In this task, you redesign your partCombineHeaderDetail2 job to add the header information to the detail records in a Transformer stage using stage variables. This avoids the buffering of the records in each group that the Join stage would require.

1. Open your partCombineHeaderDetail2 job and save it as buffCombineHeaderDetail.

2. Modify the job as shown below. Move the output end of the Orders link to the added Column Import stage. Remove the Join stage and its two input links, and drag the input side of the OrdersCombined Data Set stage link to the Transformer stage. Draw a link from the Column Import stage to the Transformer.

3. Edit the Column Import stage. On the Stage Advanced tab, set the stage to run
sequentially. This is necessary to preserve the ordering and groups of records going
into the Transformer stage.

4. On the Stage Properties tab, import the OrderNum and RecType columns. Set the
“Keep Import Column” property to True, so that the total record is also passed
through.


5. On the Output Columns tab, specify the metadata for the imported columns. Make
sure the RecType field is VarChar(1) rather than Char(1); otherwise, it won’t import
correctly.

6. Edit the Transformer. On the Partitioning tab, specify Hash by OrderNum.

7. In the main window of the Transformer, define two stage variables to store the Name and OrderDate from the Header records. To simplify the derivation of the OrderDate field, define it as a VarChar(10) instead of a Date type. Define the derivations for these stage variables, using the Field function to parse the columns from the Header record. Write an output record only when the input is a Detail record. Also drag over the RecIn column to help verify the results when you run the job.

8. Make sure the Transformer stage is set to Hash partitioning on OrderNum.


9. Compile and run your job and verify the results.


Lab 19: Exploring the Optimization Capabilities


In this lab you will learn to optimize a job using InfoSphere DataStage Balanced Optimization. You will also understand how the optimizer operates on your root job during the optimization process, how to analyze and compare the performance of the root and optimized jobs, and what the relationship between them is.

Creating an optimized version of a parallel job


In this task you will become familiar with the Optimizer interface by optimizing a job that performs a conditional join between two source tables.

1. Browse the folder Jobs -> DataStage Advanced which contains the jobs you will use
for this lab.

2. Open the job Populate_Orders and edit the Row Generator stage "Orders_gen": set Number of Records = 2,000,000 to generate that many rows into the table db2inst1.orders in the SAMPLE database. The ORDERS table will be used as a source table in the following exercises.

3. Compile and run the job. Verify that the execution has completed successfully. You
should have now populated the ORDERS table.


4. Open and explore the job JoinOrdEmp. This job performs a join between the orders
with AMOUNT>100 (filtered by a Transformer stage) and the employee who
managed each order. The result is then stored in a Data Set.

Note: both source tables belong to the same database (SAMPLE).

5. Compile and run the job. In the Director client verify that the execution has
completed successfully.

6. Select the Optimize button in the bar as shown to open the Optimizer interface.


7. Select the option Push processing to database sources and press the Optimize
button. In this way the optimizer will attempt to push the processing of the
Transformer and Join stages into the source DB2 Connector, if possible.

8. Explore the Compare tab to see a comparison between the root job and the
optimized job. Notice that the two source DB2 Connector stages, the Join stage, and
the Transformer stage in the root job have been replaced by a single DB2 Connector
stage in the optimized job.


9. Explore the Logs tab, which contains the details of the changes made by the optimizer in defining the optimized job. Looking at the messages you can understand exactly what the optimizer has accomplished: the identification of the patterns of stages suitable for optimization, the impact on partitioning, and the query definitions that push the processing, in this case, to the source database.

10. Save the optimized job by accepting the default proposed job name
Optimized1OfJoinOrdEmp. Close the Optimizer.


11. Open Optimized1OfJoinOrdEmp (if it is not already open) and expand the DB2 Connector stage properties. Notice the SELECT statement that the optimizer has built to implement the same logic previously defined by the two DB2 Connector, Transformer, and Join stages. If you are curious about the SQL, simply copy and paste the statement into Notepad for a more detailed examination.

12. Compile and run the optimized job.

Comparing the performance of the root and optimized jobs


In this task you will explore an approach to compare the optimized and root versions of a
job, from the performance and resource usage perspectives.

Note: the figures that appear in the following analysis, such as timing measures and throughput, might differ from the ones you get during the exercise. In any case, follow the procedure and adapt the comparison of results to your own case.

1. Use the Director to compare the execution times of the root and optimized jobs. Notice that pushing the operations implemented by the Join and Transformer stages in the root job to the source database has improved performance.


2. Looking at the job monitors for the two jobs (you can open both of them), you can see that the optimized job processed fewer records than the root job. This is because in the root job the ORDERS DB2 Connector retrieved 2 million rows from the database, which were filtered afterwards by the Transformer stage's constraint. In the optimized job the source DB2 Connector retrieved only the records satisfying the SQL query, which implements the root job's Transformer constraint on AMOUNT.


3. Refer also to the job logs in the Director to understand the different execution steps performed by the two jobs. Compare the startup and production run times, which help you roughly understand how the elapsed time is composed and the benefit you can achieve.

JoinOrdEmp

Optimized1OfJoinOrdEmp

4. To understand the behaviors of the root and optimized jobs in more detail, open the Performance Analysis tool to compare their resource usage and record throughput.


5. For the JoinOrdEmp job you can filter the stages to consider during the analysis by selecting only the ORDERS and EMPLOYEES DB2 Connector stages and the output Data Set stage.

6. Now examine the Record Throughput Outputs for all the partitions. Cross-referencing this chart with the Director's logs, notice that the output stage begins to receive records some seconds after the job starts. Hover your mouse over these lines to see exactly when records start arriving.


7. Repeating the same analysis on the Optimized1OfJoinOrdEmp job, you can see that the output stage starts receiving records at around the same time after the job starts, which is comparable with the root job.

8. The significant difference between them is not how quickly the jobs have records available for the target loading, but their record throughputs. You can estimate the approximate slopes of the output stages' throughput curves in both charts, considering all the partitions, to get comparable figures. Then try to justify the comparison results.

- For JoinOrdEmp:

( 20,000 [rows/sec in Part 1] + 30,000 [rows/sec in Part 2] ) / 26 [sec] ≈ 1,920 [rows/sec²]

- For Optimized1OfJoinOrdEmp:

( 53,000 [rows/sec in Part 1] + 53,000 [rows/sec in Part 2] ) / 12 [sec] ≈ 8,800 [rows/sec²]

In Optimized1OfJoinOrdEmp the data coming from the source DB2 Connector stage has already been processed by the source database engine, while in JoinOrdEmp it must still go through the Transformer and Join stages; hence the resulting record throughputs cannot be similar.


9. Open the Memory Usage Density Page Ins charts for the two jobs and notice that the root job is more memory intensive than its optimized version. Notice that in the JoinOrdEmp job the maximum memory usage (9000 pages) is mainly due to the processing of the ORDERS records.

JoinOrdEmp


Optimized1OfJoinOrdEmp

10. Optional: perform a similar analysis considering the CPU and Disk utilizations.

Note: To perform a more detailed comparison between the root and optimized jobs,
or even to decide the best optimized version for a job, there are also other
parameters to consider: the degree of source/target database concurrency, the
amount of system resources available for DataStage and the source/target
databases, the number of records to process, the database tuning level, etc.


Managing the root versus optimized jobs


In this lab, your goal is to find the optimized versions of a root job and also to perform the reverse operation, retrieving the root job for a given optimized job. You can find this information by leveraging the automatically maintained relationship between the optimized and root versions.

1. Locate the job JoinOrdEmp in the Repository Window, right-click and select Find
where used -> Jobs.

2. The Repository Advanced Find window appears and shows Optimized1OfJoinOrdEmp as a dependent job.


3. To perform the reverse operation, finding the root job for the Optimized1OfJoinOrdEmp job, locate it in the Repository window and select Find dependencies -> Jobs.

4. The Repository Advanced Find window appears and displays the jobs dependent on
Optimized1OfJoinOrdEmp, in this case the root job JoinOrdEmp.


5. To remove the dependency between the optimized and root job, open the
Optimized1OfJoinOrdEmp in the Designer and select Edit -> Job Properties.

6. On the Dependencies tab, right-click on the JoinOrdEmp entry and select Delete row. In this way Optimized1OfJoinOrdEmp loses its relationship with the root job JoinOrdEmp.

Pushing the processing to the source and target databases


In this lab you will optimize a job by pushing the processing to the source and target
databases. This is a very common situation you may face when a parallel job reads and
loads one or more tables. You will need to consider the scenarios in which Source and
Target tables are in the same database or in different databases.

1. Create a copy of the JoinOrdEmp and save it as JoinOrdEmpTrg.


2. Replace the target Data Set stage with a DB2 Connector stage.

3. Configure the ORDERSTRG stage properties as shown below.

4. Save and compile the job; you will execute it later.


5. You can now optimize the job by distributing the processing to the source and target DB2 Connector stages, and then verify whether that is a convenient choice. Open the Optimizer for the JoinOrdEmpTrg job, select the Push processing to database sources and Push processing to database targets options, and press the Optimize button.

Note: the source and target tables are all in the same SAMPLE database.

6. Save the job as Optimized1OfJoinOrdEmpTrg.

7. Another way in which this job can be optimized is to push the entire processing to the target database. This is possible because all the tables you are using in the root job are in the same database. You now want to understand whether this optimized version performs better than Optimized1OfJoinOrdEmpTrg.

8. Select the optimization options as follows and save the job as Optimized2OfJoinOrdEmpTrg.


9. Open and compare the two optimized jobs: Optimized1OfJoinOrdEmpTrg and Optimized2OfJoinOrdEmpTrg.

10. Notice that in the job Optimized1OfJoinOrdEmpTrg, part of the root job's logic (the Transformer constraint and the ORDERS DB2 Connector) has been implemented within the source DB2 Connector.


11. In Optimized1OfJoinOrdEmpTrg, the optimizer has implemented the Join stage's logic inside the target DB2 Connector, as you can see from the produced query.

12. Optimized2OfJoinOrdEmpTrg is based on a single DB2 Connector stage fed by a Row Generator stage. The latter is not a real source of data, but a dummy stage inserted by the optimizer so that the job does not consist of a single Connector stage, which is not allowed in a parallel job. Explore the target DB2 Connector stage and notice that it implements the entire root job's logic with a single query.

13. Execute the root job JoinOrdEmpTrg, Optimized1OfJoinOrdEmpTrg, and Optimized2OfJoinOrdEmpTrg, one at a time.


14. Use the Director to compare their Elapsed Times and notice that the job with the
shortest execution time is Optimized2OfJoinOrdEmpTrg.

15. Following the same approach as seen for Lab1, you can use the Performance
Analysis tool to explain the differences between the performances of these three
jobs.

16. Notice that for the job Optimized2OfJoinOrdEmpTrg, no records were actually processed by DataStage: all the operations were performed by the target database in response to the SQL statement pushed down by the job. The job Optimized1OfJoinOrdEmpTrg processes all the rows (1,327,629 rows) selected by the SQL query in the source DB2 Connector stage, and then passes them to the target DB2 Connector stage as shown below.


Optimized1OfJoinOrdEmpTrg

Optimized2OfJoinOrdEmpTrg


17. Open the JoinOrdEmpTrg job and modify the target DB2 Connector stage properties,
setting QS as a target database, then save the job as JoinOrdEmpTrg2 and compile
it.

18. Open the Optimizer and notice that the Push all processing into the (target) database option is no longer available. This is because the source and target tables reside in different databases, so the job cannot be built using a single DB2 Connector stage as happened for Optimized2OfJoinOrdEmpTrg.

19. Optional: optimize the JoinOrdEmpTrg2 job and analyze the performance, using the Push processing to database sources and Push processing to database targets optimization options.


Pushing data reduction to database target


When a parallel job performs data reduction operations, such as aggregations or filtering, that reduce the record volume moved from source to target, another optimization possibility, besides the ones you used in the previous labs, is to push the data reduction processing to the target database. This can be particularly convenient when the reduction is performed on data that is already located in the target database.

1. Open the Populate_Orders job and edit the Row Generator stage, setting the
Number of Records = 100,000 as the number of records to be generated into the
target table “ORDERS”. Then compile and run the job.

2. Open and explore the job SalesReport. This job calculates the total order Amount for the records in the ORDERS table and loads the result into the TOTORD table.

Note: the source and target tables are in the same database (SAMPLE).

3. Compile and run the job.


4. Considering that your job performs a data reduction on the input records from the
ORDERS table (100,000 rows) generating a single output row, and also considering
that both the tables are in the same database, you might try to push the data
reduction processing to the target database. To do that select the optimization
options Push processing to database targets and Push data reduction
processing to database targets and click on Optimize.

5. Select the Compare tab and notice that the two Transformer stages and the Aggregator stage have been pushed to the target DB2 Connector stage, while the source DB2 Connector appears to be the same as before the optimization.

6. Save the optimized job as Optimized1OfSalesReport.


7. Open the target DB2 Connector stage and look at the insert SQL statement
generated by the optimizer.

8. Now you can compile and run the optimized job.

9. Compare the execution times, performance, and system resource usage of the root and optimized jobs using the Director and the Performance Analysis tools, as you did in the previous labs.

Optimizing a complex job


In this lab you will practice the optimization process on a more complex job, built with
multiple stages and performing the parallel jobs’ typical operations: data transformations,
sorting, aggregations, and horizontal combinations. You will also experience the case in
which some of the stages cannot be considered by the optimizer.

You will apply the optimization to two main cases:

• Same database for source and target tables

• Different databases for source and target tables

Although the job design will be the same in both of these scenarios, you will see their
differences in terms of optimization options you can use and performance improvements
you can achieve.

You will also learn a way to explicitly condition the optimization process, excluding one
or more stages from the optimization patterns.


Scenario 1: common database for source and target tables

1. Open Populate_Orders and verify that Number of Records = 100,000 for the rows to be generated into the target table "ORDERS". Then compile and run the job if the ORDERS table does not currently contain that number of records.

2. Open the PopulateEmployees job and set Number of Records = 1,000,000 to be generated into the target table "EMPLOYEES". Then compile and run the job.

Note: if at any moment you need to reload the original 10 records into the
“EMPLOYEES” table, you can simply compile and run the RestoreEmployees job.


3. Open the job OrdersReport and analyze the logic implemented by each stage. This job calculates, for each order in the table ORDERS, the total amount of orders summarized by employee and year. The aggregated values are then inserted into the target table ORDER_REPORT, in which the Employee ID code is replaced by the employee's first name and last name through a lookup operation.

4. Compile and run the job.

5. Analyzing the job, you can see that the first two stages following the source DB2 Connector stage meet the Balanced Optimization requirements (the Copy stage's multiple outputs, on the other hand, are not supported), so as a first attempt at optimization you can consider pushing the processing toward the source database. Open the Optimizer and check only the Push processing to database sources option. Then press the Optimize button.


6. Open the Compare tab and notice that only the Transformer and Sort_1 stages have been pushed to the source database. The processing logic implemented by the fork join structure (that is, the Copy, Aggregator, and Join stages) could not be pushed to the source and has not been changed.


7. Explore the Logs tab and notice the WARNING messages; the second and third messages explain why the stages composing the fork join structure have not been optimized.

8. Save the job as Optimized1OfOrdersReportSrc and open the source DB2 Connector
stage to see how the optimizer has converted the logic originally defined by the
Transformer and Sort_1 stages into a single SQL query.

9. Compile and run the job.


10. As a second attempt at optimization, you may choose to push the processing toward the target database. Open the Optimizer again for the OrdersReport job. This time select the Push processing to database targets option, then press the Optimize button.

11. Browse the Compare tab and notice that only the target side stages (the Lookup
stage and the last Transformer stage) have been pushed to the target database.
Save the job as Optimized1OfOrdersReportTrg. Close the optimizer window.


12. Open the target DB2 Connector stage to analyze the SQL query defined by the
optimizer, which implements the Lookup and Transformer2 root stages’ logic.

13. Compile and run the job.

14. Open again the optimizer and select both the Push processing to database
sources and the Push processing to database targets options.


15. Compare the original and optimized versions and notice that the only part not pushed to the database is the fork join; this version is a composition of the two previous optimizations.


16. Save the job as Optimized1OfOrdersReport and analyze the SQL generated in the source and target DB2 Connectors. Notice also that the fork join structure could not be optimized, for the same reason you encountered previously.

17. Compile and run the job.


18. As you learned during Lab2, if the source and target tables are in the same database, the best optimization can be achieved by pushing all the processing to the target database. You can try to apply the same approach to the OrdersReport job as shown below.

19. Although you tried to push all the processing to the target database, the optimizer has ignored that option. In fact you do not see a single DB2 Connector stage fed by a Row Generator as in Lab2; the optimized job is exactly the same as Optimized1OfOrdersReport. This is again due to the fork join structure, which prevents full optimization.

20. Now you can compare the execution times, performance, and system resource usage of the root and optimized jobs using the Director and Performance Analysis tools, as you did for Lab1.


Scenario 2: different databases for source and target tables

1. Open and explore the OrdersReportTargetDB job. This job is similar to the OrdersReport job, but the source and target tables are in two different databases, as you can see by exploring the source and target DB2 Connector stages.

2. Compile and run the job.

3. You can now optimize the job using the same approach you followed for the OrdersReport job: generate different versions of the root job based on different optimization options, then compare their performance and resource usage to determine which optimization option best matches your requirements. Open the Optimizer and select Push processing to database sources, then save the optimized job as Optimized1OrdersReportTargetDB.

4. Compile and run the job.

5. Open again the optimizer for the OrdersReportTargetDB and select Push
processing to database targets, then save the optimized job as
Optimized2OrdersReportTargetDB.

6. In the Logs tab, notice that the tables EMPLOYEES and ORDER_REPORT cannot be part of the same optimization pattern, as happened for the OrdersReport job, because now they reside in different databases.


7. In job Optimized2OrdersReportTargetDB, for the reason just explained, the lookup operation cannot be pushed to the target database as happened for the Optimized1OfOrdersReportTrg job.

8. Open the Optimizer and select both the Push processing to database sources and Push processing to database targets options.

9. Save the optimized job as Optimized3OfOrdersReportTargetDB and analyze it.

10. Compile and run the job.

11. When the source and target tables are in different databases, another possibility you may want to consider is the Bulk Loading optimization option. In this case the target DB2 Connector first bulk loads a temporary staging table created in the target database during job execution. Then SQL statements load the actual target table by reading from the temporary staging table, so any transformations occur directly in the target database after the high-performance bulk loading process.


12. Open the optimizer and select the Push processing to database sources, Push
processing to database targets, and Use bulk loading of target tables options.

13. Save the optimized job as Optimized4OfOrdersReportTargetDB and analyze it. Notice, in the target DB2 Connector stage, the Bulk load write mode and the staging table defined by the optimizer.


14. Notice also the Before/After SQL statement that will be used to load the actual target
table by using the bulk loaded staging table as a source.

15. Enable the Auto commit mode option for the target DB2 Connector stage to allow the
database to commit the transactions automatically.

16. Compile and run the job.

17. Now you can compare the execution times, performance, and system resource usage of the root and optimized jobs using the Director and the Performance Analysis tools, as you did for Lab1.

18. Notice that in this scenario the Optimized4OfOrdersReportTargetDB, which uses the
bulk load option for the target database, does not perform better than the other
optimized versions. In fact the Optimized3OfOrdersReportTargetDB is the fastest
optimization.


19. Using the Performance Analysis tool, compare the performance of the Optimized3OfOrdersReportTargetDB job versus the Optimized1OfOrdersReport job, which were generated using the same optimization options. Try to understand the reasons for their elapsed time differences.

Tip. Look at the Record Throughput and compare the lookup stage elapsed time for
OrdersReportTargetDB and the target DB2 Connector stage for OrdersReport.

Optimized3OfOrdersReportTargetDB

Optimized1OfOrdersReport


Deciding where to stop the optimization process


1. Open the job OrdersReport and open the Optimizer.

2. Select both the Push processing to database sources and Push processing to database targets options.

3. You may now optimize the job, forcing the sort operation to be executed by DataStage instead of pushing it into the database. To explicitly exclude the Sort stage from the optimization, select the "Advanced Options" tab, set the value Sort_1 for the property Name of a stage where optimization should stop, and press the Optimize button.

4. Notice that the optimizer has not considered the Sort_1 stage.


Balancing between Database and DataStage engines


In the exercises you have done so far, pushing the processing logic to the source and/or target databases achieved a performance improvement. However, depending on the type and amount of processing, optimizing a job often means a trade-off between DataStage processing and database processing, in order to leverage the best of both.

In this lab you will see a job that performs better when the processing is done entirely by
the DataStage engine rather than by the database engine.

1. Open the job Populate_Orders and edit the Row Generator stage to set the Number
of Records = 2,000,000.

2. Compile and run the job to populate the table ORDERS in the SAMPLE database.


3. Open the LoadProcessing job and analyze it. Notice that the Transformer stage
implements conversion functions and decision logic for some of the output
derivations.

4. Compile and run the job.

5. Open the Optimizer and check the Push processing to database sources option.
Then press the Optimize button and save the optimized job as
Optimized1OfLoadProcessing.


6. Open the Optimized1OfLoadProcessing job and notice how the logic originally
implemented by the Transformer stage has been converted into a single SQL
statement in the source DB2 Connector stage.

7. Compile and run the optimized job.

8. Compare the execution times, performance, and system resource usage of the root and optimized jobs using the Director and the Performance Analysis tools, as you did in the previous labs. Notice that the optimized job is slower than the root job.

9. Notice the Percent CPU Utilization charts. The LoadProcessing requires significant
CPU activity when the Transformer stage starts processing the records after they are
made available by the source DB2 Connector stage (refer to the Percent of time In
CPU chart), while the Optimized1OfLoadProcessing starts processing the records
when the source DB2 Connector connects to the database. The top levels of CPU
usage by the two jobs are comparable; however, looking at the Throughput charts
you can see that the LoadProcessing job performs faster.

Note: in some of the following pictures the data shown is for one partition only. When you do these analyses you should consider all the partitions.


LoadProcessing


Optimized1OfLoadProcessing


Lab 20: Repository Functions

Execute a Quick Find


1. Open Quick Find by clicking the link at the top of the Repository window.

2. In the Name to find box type sort* and in the Types to find list select Parallel Jobs.

3. Click the Find button.

4. The first found item will be highlighted.

5. Click Next to highlight the next item.

Execute an Advanced Find


1. Click on the link that displays the number of matches. This opens the Advanced Find
window and displays the items found so far in the right pane.

2. Open the Last modification folder. Specify objects modified within the last week.


3. Open up the Where Used folder. Add the SUPER_PRODDIM Table Definition.
Change Name to find to an asterisk (*). Click Find. This reduces the list of found
items to those that use this Table Definition.

4. Close the Advanced Find window.

Generate a report
1. Click the number of matches to get the search result window again. Click File >
Generate Report to open a window from which you can generate a report describing
the results of your Find.


2. Click on the top link to view the report. This report is saved in the Repository where
it can be viewed by logging onto the Reporting Console.

3. After closing this window, click on the Reporting Console link. On the Reporting tab,
expand the Reports folder as shown. Click View Reports.


4. Select your report and then click View Report Result. This displays the report you
viewed earlier from Designer. By default, a Suite user only has permission to view
the report. A Suite administrator can give additional administrative functions to a
Suite user, including the ability to alter report properties, such as format.

5. Close all windows and then close the Quick Find.

Perform an impact analysis


1. In the Repository window, select your SUPER_STOREDIM Table Definition. Click
the right mouse button and then select Find Where Used > All Types.


2. Click the right mouse button over the ForkJoin job listed and then click “Show
dependency path to…”


3. Use the Zoom button to adjust the size of the dependency path so that it fits into the
window.

4. Hold the right mouse button down over a graphical object and move the path around.

5. Notice the "bird's-eye" view box in the lower right-hand corner. This shows how the path is situated on the canvas. You can move the path around by clicking to one side of the image in the bird's-eye view window, or by holding the right mouse button down over the image and moving the image around.

6. Close the window.

Find the differences between two jobs


1. Open your CreateSeqJobPartiton job. Save it as CreateSeqJobPartitonComp.

2. Make the following changes to the CreateSeqJobPartitonComp job.

3. Open up the Selling_Group_Mapping Sequential File stage. On the Columns tab, change the length of the first column (Selling_Group_Code) to 111. On the Properties tab, change First Line is Column Names to False.

4. Change the name of the output link from the Copy stage to TF (from TargetFile).


5. Save the changes to your job.

6. Open up both the CreateSeqJobPartiton and the CreateSeqJobPartitonComp jobs. Click Tile from the Window menu to display both jobs in a tiled manner.


7. Right-click over your CreateSeqJobPartitonComp job name in the Repository window and then select Compare against.

8. In the Compare window select your CreateSeqJobPartiton job on the Item Selection
window.


9. Click OK to display the Comparison Results window.

10. Click on firstLineColumnNames in the report. Notice that the stage is opened to the Properties tab, where the change was made.

Find the differences between two Table Definitions


1. Create a copy of your Warehouse.txt Table Definition.

2. Make the following changes to the copy.

3. On the General tab, change the short description to your name.

4. On the Columns tab, change the name of the Item column to ITEM_ZZZ, and change its type and length to Char(33).

5. Click OK.

6. Right-click over your Table Definition copy and then select Compare Against.

7. In the Comparison window, select your Warehouse.txt Table Definition.


8. Click OK to display the Comparison Results window.
