DataStage Essentials v8.1


DataStage Essentials v8.1 Core Modules

© Copyright IBM Corporation 2009 Course materials may not be reproduced in whole or in part without the prior written permission of IBM.





Copyright, Disclaimer of Warranties and Limitation of Liability
© Copyright IBM Corporation 2009 IBM Software Group One Rogers Street Cambridge, MA 02142 All rights reserved. Printed in the United States. IBM and the IBM logo are registered trademarks of International Business Machines Corporation. The following are trademarks or registered trademarks of International Business Machines Corporation in the United States, other countries, or both:

AnswersOnLine AIX APPN AS/400 BookMaster C-ISAM Client SDK Cloudscape Connection Services Database Architecture DataBlade DataJoiner DataPropagator DB2 DB2 Connect DB2 Extenders DB2 Universal Database Distributed Database Distributed Relational DPI DRDA DynamicScalableArchitecture DynamicServer DynamicServer.2000 DynamicServer with Advanced DecisionSupportOption DynamicServer with Extended ParallelOption DynamicServer with UniversalDataOption DynamicServer with WebIntegrationOption

DynamicServer, WorkgroupEdition RedBrick Decision Server Enterprise Storage Server RedBrickMineBuilder FFST/2 RedBrickDecisionscape Foundation.2000 RedBrickReady Illustra RedBrickSystems Informix RelyonRedBrick Informix4GL S/390 InformixExtendedParallelServer Sequent InformixInternet Foundation.2000 SP Informix RedBrick Decision Server System View J/Foundation Tivoli MaxConnect TME MVS UniData MVS/ESA UniData&Design Net.Data UniversalDataWarehouseBlueprint NUMA-Q UniversalDatabaseComponents ON-Bar UniversalWebConnect OnLineDynamicServer UniVerse OS/2 VirtualTableInterface OS/2 WARP Visionary OS/390 VisualAge OS/400 WebIntegrationSuite PTX WebSphere QBIC QMF RAMAC RedBrickDesign RedBrickDataMine

Microsoft, Windows, Windows NT, and the Windows logo are trademarks of Microsoft Corporation in the United States, other countries, or both. Java, JDBC, and all Java-based trademarks are trademarks or registered trademarks of Sun Microsystems, Inc. in the United States, other countries, or both. UNIX is a registered trademark of The Open Group in the United States and other countries. All other product or brand names may be trademarks of their respective companies. All information contained in this document has not been submitted to any formal IBM test and is distributed on an “as is” basis without any warranty either express or implied. The use of this information or the implementation of any of these techniques is a customer responsibility and depends on the customer’s ability to evaluate and integrate them into the customer’s operational environment. While each item may have been reviewed by IBM for accuracy in a specific situation, there is no guarantee that the same or similar results will result elsewhere. Customers attempting to adapt these techniques to their own environments do so at their own risk. The original repository material for this course has been certified as being Year 2000 compliant. This document may not be reproduced in whole or in part without the prior written permission of IBM. Note to U.S. Government Users – Documentation related to restricted rights – Use, duplication, or disclosure is subject to restrictions set forth in GSA ADP Schedule Contract with IBM Corp.




Table of Contents
Table of Contents .......................................................... 3
Lab 01: Lab Introduction .................................................. 4
Lab 02: Installation and Deployment ....................................... 5
Lab 03: Administering DataStage ........................................... 7
Lab 04: DataStage Designer ................................................ 17
Lab 05: Creating Parallel Jobs ............................................ 24
Lab 06: Accessing Sequential Data ......................................... 33
    Working with Data Sets ................................................ 39
    Reading and Writing NULLs ............................................. 42
Lab 07: Platform Architecture ............................................. 48
Lab 08: Combining data .................................................... 51
    Range lookups ......................................................... 60
    Combining data using the Funnel Stage ................................. 66
Lab 09: Sorting and Aggregating Data ...................................... 68
Lab 10: Transforming Data ................................................. 74
Lab 11: Repository Functions .............................................. 83
    Job and Table Difference Reports ...................................... 90
Lab 13: Metadata in the Parallel Framework ................................ 117
Lab 14: Job Control ....................................................... 125



Lab 01: Lab Introduction

There are no tasks associated with this lab. Each module has one or more exercises associated with it. Each exercise consists of a series of tasks. To perform a task, follow the steps listed below it.

04/01/2009

Lab 02: Installation and Deployment

Assumptions
• IBM Information Server is installed and running.
• You have the user IDs and passwords listed below. Ask your instructor and record them here. (Default: Windows XP uses Administrator/inf0server; Information Server Suite Administrator uses isadmin/inf0server.)

Windows XP: ID: ______________________ Password: ________________
Information Server Suite Administrator: ID: ______________________ Password: ________________

Task: Log onto the Information Server Administrative console
1. From the IBM Information Server menu select InfoSphere Information Server Web Console. Then enter a Suite Administrator user ID and password.

2. Click Login. If you see the following window, Information Server is up and running.

Lab 03: Administering DataStage

Assumptions
• You have obtained the Suite Administrator user ID and password.

Task: Open the Administration Console
1. If you have not done so already, log onto the IBM Information Server Web Console. From the IBM Information Server menu select InfoSphere Information Server Web Console. Then enter a Suite Administrator user ID and password.

Task: Create a DataStage administrator and user
1. Click the Administration tab. Expand the Domain Management folder. Click DataStage Credentials.

2. Select the DataStage Server and then click Open Configuration.
3. In the Default DataStage and QualityStage Credentials boxes, type a valid user ID and password. This user ID must exist on the DataStage Server machine and have Administrator privilege on that machine. Otherwise, no user ID will be able to log in to any DataStage client tool. Below, Administrator / inf0server has been entered. *** Important: If you are using the VMWare image that comes with this course, this is already filled in. DO NOT CHANGE THIS. ***
4. Expand the Users and Groups folder and then click Users. Here, the Information Server Suite Administrator user ID, isadmin, is displayed. Also the InfoSphere Application Server administrator user ID, wasadmin, is displayed. Select the isadmin user and then click Open.

5. Note the first and last names of this user. Note what Suite Roles and Product Roles have been assigned to this user.
6. Return to the Users main window.
7. Click New to create a new user with user ID dsadmin. Use dsadmin for the first and last names and password as well.
8. Expand the Suite Component folder. Assign the Suite User role and the DataStage Administrator Suite Component role to this user. Scroll down to click Save and Close.

9. Following the same procedure, create an additional user named dsuser. Assign the Suite User and DataStage User roles to dsuser (password is dsuser).
10. Click Save and Close.

Task: Log onto DataStage Administrator
1. Open the DataStage Administrator Client icon on your Windows desktop.

2. Specify your domain name, followed by a colon, followed by the port number (9080) you are using to connect to the domain application server. Type dsadmin in the User name and Password boxes. Select your DataStage server.

3. Click OK.

TASK: Specify property values in DataStage Administrator
1. Click the Projects tab. Select your project (ask your instructor if there is more than one project) and then click the Properties button.



2. On the General tab, check Auto-purge of the job log, up to the previous two job runs.

3. Click the Environment button to open up the Environment variables window. In the Parallel folder, examine the APT_CONFIG_FILE parameter and its default. (The configuration file is discussed in a later module.)



4. In the Reporting folder, set the variables shown below to true. These are APT_DUMP_SCORE, APT_MSG_FILELINE, APT_RECORD_COUNTS, OSH_DUMP, OSH_ECHO, OSH_EXPLAIN, and OSH_PRINT_SCHEMAS.

5. Click OK.
6. On the Parallel tab, check the box to make the generated OSH visible. Note the default date and time formats. For example, the default date format is “YYYY-MM-DD”, which is expressed by the format string shown.
7. On the Sequence tab, check the box to add checkpoints so a job sequence is restartable on failure.



Task: Set DataStage permissions and defaults
1. Click the Permissions tab. Notice that isadmin and dsadmin already exist as DataStage Administrators. DataStage administrators have full developer and administrator permissions in all DataStage projects. On the other hand, dsuser does not receive permission to develop within a specified DataStage project unless a DataStage Administrator explicitly gives permission.
2. Click Add User or Group. Notice that dsuser is available to be added. Select dsuser and then click Add.

3. Select dsuser. In the User Role box, select the DataStage Developer role.
4. Click OK to return to the Permissions tab.
5. Close DataStage Administrator.
6. Log back onto DataStage Administrator using the dsuser ID. Select your project and then click Properties.
7. Notice that the Permissions tab is disabled. This is because dsuser has not been assigned the DataStage Administrator role and therefore does not have the authority to set DataStage permissions. Close down DataStage Administrator.

Lab 04: DataStage Designer

Assumptions
• You have created a user ID named dsuser for logging onto DataStage.
• Ask the instructor for the complete path to the ISFiles folder (default is C:\Class_files\DX444\lab\ISFiles).

Task: Log onto DataStage Designer
1. Open the DataStage Designer Client icon on the Windows desktop. Type information to log into your DataStage project using the dsuser ID. Click OK.

Task: Create a Repository folder
1. Select your project folder in the Repository, click your right mouse button, and then click New>Folder.
2. Create a folder named “_Training”.
3. Under it, create two folders: Jobs and Metadata.

4. Click Repository>Refresh to refresh the Repository (which moves the folder you created to the top).

Task: Import DataStage component files
1. Click Import>DataStage Components.
2. In the Import from file box, select the TableDefs.dsx file in your ISFiles>DsxFiles directory on your client machine.
3. Select the Import selected button.
4. Click OK.

5. Open up the table definition you’ve imported and examine it. You will find it in the _Training>Metadata folder. Select the table definition and then click OK.
6. On the General tab, note the Data source type, Data source name, and Table/file name. Note the column definitions and their types on the Columns tab.

Task: Backup your project
In this task, you backup (export) your _Training folder into a file named Training.dsx.
1. Select your _Training folder, click your right mouse button, and then click Export.
2. In the Export to file box, select your ISFiles>DsxFiles folder and specify a file named Training.dsx.

3. Click Export.

Task: Import a table definition from a sequential file
1. In a text editor, open up the Selling_Group_Mapping.txt file in your ISFiles directory and examine its format and contents. Some questions to consider:
• Is the first row column names?
• Are the columns delimited or fixed-width?
• How many columns? What types are they?
2. In Designer, click Import>Table Definitions>Sequential File Definitions.

3. Select your ISFiles directory.
4. Select your Selling_Group_Mapping.txt file.
5. Specify \_Training\Metadata as the “To category”.
6. Click Import.

7. Specify the general format on the Format tab. Be sure to specify that the first line is column names. Then DataStage can use these names in the column definitions.
8. Click Preview to view the data in your file in the specified format. This is a check of whether you have defined the format correctly. If it looks like a mess, you haven’t correctly specified the format. (In this case, since you didn’t change anything, there should be no such effect.)

9. Click the Define tab to examine the column definitions.
10. Click OK to import your table definition.
11. After closing the import window, locate and examine your new table definition in the Repository window.

Lab 05: Creating Parallel Jobs

Assumptions
• None

Task: Create a two-node configuration file
The lab exercises that follow in this and later modules are more instructive when the jobs are run using a two-node (or more) configuration file. Configuration files are discussed in more detail in a later module.
1. Click Tools>Configurations.
2. In the Configurations box, select the default configuration.
3. If only one node is listed, make a copy of the node definition through the curly braces and change the name of the node to “node2”. Be careful that you have only a total of 3 pairs of curly brackets: one encloses all the nodes, one encloses the node1 definitions, and one encloses the node2 definitions.
4. When done, your file should now look like this. Save and close.
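For reference, a two-node configuration file generally has the shape sketched below. The hostname and resource paths here are placeholders — keep whatever your default configuration file already contains, and only add the second node entry:

```
{
  node "node1"
  {
    fastname "dshost"                        /* placeholder hostname */
    pools ""
    resource disk "/opt/IBM/InformationServer/Server/Datasets" {pools ""}
    resource scratchdisk "/tmp" {pools ""}
  }
  node "node2"
  {
    fastname "dshost"
    pools ""
    resource disk "/opt/IBM/InformationServer/Server/Datasets" {pools ""}
    resource scratchdisk "/tmp" {pools ""}
  }
}
```

Note the three pairs of curly brackets described in step 3: the outer pair around all the nodes, and one pair around each node's definitions.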

DataStage Essentials v8. Open up the Row Generator stage to the Columns tab. Change the names of the stages and links as shown. Click the Load button to load the column definitions from the Employees Table definition you imported in an earlier lab. Open a new Parallel job and save it under the name GenDataJob. Draw a link from the Row Generator stage to the Peek stage. Save it your _Training>Jobs folder.1 Task: Create a parallel job 1. 4. Add a Row Generator stage and a Peek stage. 04/01/2009 25 . 3. 2.

5. Verify your column definitions with the following.

6. On the Properties tab specify that 100 rows are to be generated.
7. Click View Data to view the data that will be generated.

Task: Compile, run, and monitor the job
1. Compile your job.

2. Click your right mouse button over an empty part of the canvas. Select or verify that “Show performance statistics” is enabled.
3. Run your job.
4. Move to Director from within Designer by clicking Tools>Run Director.
5. In the Director Status window select your job. Move to the job log.
6. Scroll through the messages in the log. There should be no warnings (yellow) or errors (red). If there are, double-click on the messages to examine their contents. Fix the problem and then recompile and run.
7. Notice that there are one or more log messages starting with “PeekEmployees,” the label on your Peek stage.
8. Double-click on one of these to open the message window.

TASK: Specify Extended Properties
1. Save your job as GenDataJobAlgor in your _Training>Jobs folder.
2. Open up the Row Generator stage to the Columns tab.
3. Double-click on the row number to the left of the first column name. Specify the Extended Properties as shown.
4. For the Name column specify that you want to cycle through 3 names, your choice.
5. For the HireDate column, specify that you want the dates generated randomly.

6. Click View data to see the data that will be generated.

TASK: Document your job
1. Add an Annotation stage to your job diagram that describes what your job does. Open up the Annotation stage and choose another background color.

2. Open up the Job Properties window. In the short description box, briefly describe what your job does.
3. Compile and run your job. Fix any warnings or errors.
4. View the messages in the Director log. Verify the data by examining the Peek stage messages in the log.

Task: Add a job parameter
1. Save your job as GenDataJobParam in your _Training>Jobs folder.
2. Open up the Job Properties window. Click on the Parameters tab.
3. Define a new parameter named NumRows with a default value of 10. Its type is Integer.

4. Open up the Properties tab of the Row Generator stage in your job. Use your NumRows job parameter to specify the number of rows to generate.
5. Compile and run your job. View the data.
6. Verify the results.

Task: Backup your project
1. Select your _Training folder.
2. Export the contents of your folder to your Training.dsx file.
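A job parameter such as NumRows is referenced in stage properties using #ParamName# notation, and DataStage substitutes the value supplied for each run. A rough Python sketch of that substitution (the property strings below are illustrative, not taken from the lab):

```python
import re

def resolve(prop, params):
    """Replace each #Name# job-parameter reference with its run-time value."""
    return re.sub(r"#(\w+)#", lambda m: str(params[m.group(1)]), prop)

# Hypothetical property strings using the NumRows parameter from this lab.
num_rows = resolve("#NumRows#", {"NumRows": 10})
target = resolve("C:/temp/out_#NumRows#.txt", {"NumRows": 10})
```

Because substitution happens at run time, the same compiled job can generate 10 rows in one run and 1000 in the next, with no recompile.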

Lab 06: Accessing Sequential Data

Assumptions:
• No new assumptions.

Task: Read and write to a sequential file
In this task, you design a job that reads data from the Selling_Group_Mapping.txt file, copies it through a Copy stage, and then writes the data to a new file named Selling_Group_Mapping_Copy.txt.
1. Open a new Parallel job and save it under the name CreateSeqJob. Save it into your _Training>Jobs folder.
2. Add a Sequential stage, a Copy stage, and a second Sequential stage. Draw links. Name the stages and links as shown.
3. In the Sequential source stage Columns and Format tabs, load the format and column definitions from the Selling_Group_Mapping.txt table definition you imported in a previous exercise. Here be sure to set First Line is Column Names to True. If you don’t, your job will have trouble reading the first row and issue a warning message in the Director log.
4. On the Properties tab specify the file to read and other relevant properties.

5. Click View data to view the data to verify that the metadata has been specified properly.
6. In the Copy stage Output>Mapping tab, drag the columns across from the source to the target.
7. In the target Sequential stage, create a delimited file with the comma as the delimiter. Create it with a first line of column names. Name the file Selling_Group_Mapping_Copy.txt and write it to your ISFiles>Temp directory. It should overwrite any existing file with the same name.
8. Compile and run your job.

9. In Director, view the job log. Fix any errors.

Task: Create and use a job parameter
1. Save your CreateSeqJob job as CreateSeqJobParam.
2. Open up the job properties window. On the Parameters tab, define a job parameter named TargetFile of type string. Create an appropriate default filename, e.g., TargetFile.txt.
3. Rename the last link and Sequential File stage to “TargetFile”.

4. Open up your target stage to the Properties tab. Select the File property. In the File text box, replace the name of your file with your job parameter. Retain the rest of your file path.

Task: Add Reject links
1. Add a second link (which will automatically become a reject link) from the source Sequential File stage to a Peek stage. Also add an output link from the target Sequential File stage to a Peek stage. Give appropriate names to these new stages and links.

2. On the Properties tab of each Sequential File stage, change the Reject Mode property value to “Output”.

3. Compile and run. Verify that it’s running correctly. You shouldn’t have any rejects or errors or warnings.
4. To test the rejects link, temporarily change the property First Line is Column Names to False in the Source stage and then recompile and run. This will cause the first row to be rejected because the values in the first row, which are all strings, won’t fit the column definitions, some of which are integers.
5. Examine the DataStage log. What row shows up in the Peek message? Examine the warning messages before the Peek. Note the number of rows that are successfully imported and how many are rejected.
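The reject behavior just tested can be mimicked in a few lines: each incoming row is imported against the column types, and any row whose fields don’t convert is diverted to the reject output. A hedged sketch (the column names and types below are simplified, not the lab’s exact table definition):

```python
def import_rows(rows, types):
    """Split raw rows into (good, rejected) based on per-column conversions."""
    good, rejects = [], []
    for row in rows:
        try:
            good.append([conv(field) for conv, field in zip(types, row)])
        except ValueError:
            rejects.append(row)  # e.g. a header row of strings hitting int columns
    return good, rejects

# With First Line is Column Names set to False, the header row is imported
# like data: its strings won't convert to int, so it is rejected.
types = [int, str, int]
rows = [["Selling_Group_Code", "Desc", "Special_Handling_Code"],
        ["10", "widgets", "1"]]
good, rejects = import_rows(rows, types)
```

This is why the Peek on the reject link shows exactly the column-names row in step 5.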

Task: Read a file using multiple readers
1. Save your job as CreateSeqJobMultiRead. 2. Click the Properties tab of your source stage.



3. Click the Options folder and add the “Number of Readers Per Node” property. Set this property to 2.

4. Compile and run your job.
5. View the job log. (Note: You will receive some warning messages related to the first column-names row, and this row will be rejected. You can ignore these.) In the job log, you will find log messages from Selling_Group_Mapping,0 and Selling_Group_Mapping,1.

Task: Create job reading multiple files
1. Save your job as CreateSeqJobPattern.
2. Run your job twice, specifying the following file names in the job parameter for the target file: OutFileA.txt, OutFileB.txt.
3. Edit the source Sequential stage:
a. Change the read method to File Pattern.



b. Place a wildcard (?) in the last portion of the file name: OutFile?.txt

c. Change Record Delimiter to Record Delimiter String in the Record Level category on the Format tab of the Selling_Group_Mapping stage.
4. Click View Data to verify that you can read the files.
5. Compile and run the job. View the job log.
6. Verify the results. There should be two copies of each row, since you are now reading two identical files.
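The ? wildcard in a file pattern matches exactly one character, so OutFile?.txt picks up both files written earlier. Python’s standard-library fnmatch uses the same convention, which makes the matching rule easy to try out:

```python
from fnmatch import fnmatch

names = ["OutFileA.txt", "OutFileB.txt", "OutFileAB.txt", "Other.txt"]
matched = [n for n in names if fnmatch(n, "OutFile?.txt")]
# OutFileAB.txt is skipped: '?' matches a single character only.
```

A * wildcard, by contrast, would match any number of characters, including OutFileAB.txt.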

Working with Data Sets

Task: Create a data set
1. Open up your CreateSeqJob job and save it as CreateDataSetJob. 2. Delete the target sequential stage leaving a dangling link. 3. Add a DataSet stage and connect it to the dangling link. Change the name of the target stage as shown.



4. Edit the Data Set stage properties. Write to a file named Selling_Group_Mapping.ds in your ISFiles>Temp directory.
5. Open the source stage and add the optional property to read the file using multiple readers. Change the value of the property to 2. (This will ensure that data is written to more than one partition.)
6. Check, and if necessary complete, the Copy stage columns mapping.
7. Compile your job. Run your job.
8. Check the Director log for errors.

Task: View a dataset
1. In Designer click Tools > Data Set Management. Select your dataset.
2. Click the Show Schema Window icon at the top of the window to view the dataset schema.
3. Click the Show Data Window icon at the top of the window to view the dataset data.

Reading and Writing NULLs

Task: Read values meaning NULL from a sequential file
1. Save your CreateSeqJobParam job as CreateSeqJobNULL.
2. Open up the Selling_Group_Mapping_Nulls.txt file in your ISFiles directory. Notice in the data that the Special_Handling_Code column contains some integer values of 1. Notice also that the last column (Distr_Chann_Desc) is missing some values. To test how to read NULLs, let’s assume that 1 in the third column means NULL and that nothing in the last column means NULL. In the next step we will specify this.
3. Open up the source Sequential stage to the Columns tab.
4. Double-click to the left of the Special_Handling_Code column to open up the Edit Column Meta Data window.

5. Change the field to nullable. Notice that the Nullable folder shows up in the Properties window.
6. Select this folder and then add the Null field value property. Specify a value of 1 for it. Click Apply.

7. Move to the Distr_Chann_Desc column. Set this field to nullable. Add the Null field value property. Here, we will treat the empty string as meaning NULL. To do this specify “” (back-to-back quotes).
8. On the Properties tab, select the Selling_Group_Mapping_Nulls.txt file to read. Make sure the Record Level delimiter is DOS Format string. Otherwise you won’t see the NULL for the empty string.
9. Click the View Data button. Notice that values interpreted by DataStage as NULL show up as the word “NULL”.
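What the two Null field value settings do can be sketched as a per-column translation applied during import: the text 1 for Special_Handling_Code and the empty string for Distr_Chann_Desc each become NULL (None in this Python sketch):

```python
def import_field(raw, null_value):
    """Return None (NULL) when the raw text equals the column's null field value."""
    return None if raw == null_value else raw

# Null field values chosen in this task.
null_values = {"Special_Handling_Code": "1", "Distr_Chann_Desc": ""}

# A raw row in which both columns carry their NULL representation.
row = {"Special_Handling_Code": "1", "Distr_Chann_Desc": ""}
parsed = {col: import_field(val, null_values[col]) for col, val in row.items()}
```

Any other text, such as a Special_Handling_Code of 2, passes through unchanged.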

Task: Write values meaning NULL to a sequential file
1. Save your job as CreateSeqJobHandleNULL.
2. Compile and run your job. What happens? It should abort, since NULL values will be written to non-nullable columns on your target.
3. Open up your target stage to the Columns tab. Specify that the Special_Handling_Code column and the Distribution_Channel_Description column are nullable.

4. Compile and run your job again. In this case, the job doesn’t abort, since NULL values aren’t being written to a non-nullable column. But the rows with NULL values get rejected, because the NULL values aren’t being handled. Notice that they are written to the Peek stage, where you can view them.
5. Now, let’s handle the NULL values. That is, we will specify values to be written to the target file that represent NULLs. For the Special_Handling_Code column we will specify a value of -99999. For the Distribution_Channel_Description column we will specify a value of “UNKNOWN”.

6. Open up the target stage and specify these values. The procedure is the same as when the Sequential stage is used as a source.
7. Compile and run your job. View the job log. You should get no errors or warnings or rejects.
8. Verify the results by viewing your target file on the DataStage Server. (Note: if you view the data in DataStage, all you will see is the word “NULL”, not the actual value that means NULL.)
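The export direction mirrors the import: each NULL is replaced by the column’s null field value (-99999 and “UNKNOWN” here) before the record is written. A minimal sketch of that substitution:

```python
def export_field(value, null_value):
    """Substitute the column's null field value when the field holds NULL."""
    return null_value if value is None else str(value)

# Null field values chosen in this task.
null_values = {"Special_Handling_Code": "-99999", "Distr_Chann_Desc": "UNKNOWN"}

# A row in which both columns are NULL, written as one delimited record.
row = {"Special_Handling_Code": None, "Distr_Chann_Desc": None}
line = ",".join(export_field(row[col], null_values[col]) for col in row)
```

This is why the flat file on the server shows -99999 and UNKNOWN, while DataStage's own data viewer still displays the word NULL.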

Lab 07: Platform Architecture

Assumptions:
• No new assumptions.

Task: Partitioning and collecting
1. Save your CreateSeqJobParam job as CreateSeqJobPartition.
2. Compile and run your job.
3. View the data.
4. Note the icon on the input link to the target stage (fan-in). It indicates that the stage is collecting the data.
5. Open up the target Sequential stage to the Input>Partitioning tab.
6. Note the collecting algorithm (Auto) that’s selected.

7. Close the stage. Open up the target Sequential stage to the Properties tab. Instead of writing to one file, write to two files. Make sure they have different names. Create the files in your ISFiles>Temp directory.
8. Go to the Partitioning tab. Notice that it is no longer collecting, but now is partitioning. You can see this by noting the words on top of the “Partitioning / Collecting” box. If it says “Partition type:”, then the stage is partitioning. If it says “Collector type:”, it is collecting.
9. Notice that the partitioning icon has changed. It no longer indicates collecting. (Note: If you don’t see this, refresh the canvas by turning “Show link markings” off and on using the toolbar button.) The icon you see now means Auto partitioning.
10. Compile and run your job.
11. View the job log. Notice how the data is exported to the two different partitions (0 and 1).
12. Open each of the two target files in a text file viewer outside of DataStage.

13. Notice how the data is partitioned. Here, we see that the 1st, 3rd, 5th, etc. rows go into one file and the 2nd, 4th, 6th, etc. rows go into the other file. This is because the default partitioning algorithm is Round Robin.
14. Source file:
15. TargetFile.txt1:
16. TargetFile.txt2:
17. Change the partitioning algorithm to, e.g., Entire. Notice how the data gets distributed. Experiment with different partitioning algorithms!
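The distributions you just observed can be reproduced in a few lines. Round Robin deals rows to the partitions in turn (with two nodes, odd-numbered rows to one, even-numbered rows to the other), while Entire gives every partition a full copy of the data:

```python
def round_robin(rows, nodes):
    """Deal rows to partitions in turn, like the default partitioner."""
    parts = [[] for _ in range(nodes)]
    for i, row in enumerate(rows):
        parts[i % nodes].append(row)
    return parts

def entire(rows, nodes):
    """Give every partition a complete copy of the data."""
    return [list(rows) for _ in range(nodes)]

rr = round_robin([1, 2, 3, 4, 5, 6], 2)   # rows 1,3,5 vs. rows 2,4,6
```

Round Robin balances row counts evenly; Entire multiplies the data, which is mainly useful for reference data that every partition needs.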

Lab 08: Combining data

Assumptions:
• No new assumptions

Task: Lookup Address
1. Open a new Parallel job and save it under the name LookupWarehouseItem.
2. Add the stages and links and name them as shown.

3. Import the Table Definition for the Warehouse.txt sequential file into your _Training>Metadata folder. Use the types shown below, viz. as VarChar(50).
4. In the Warehouse Sequential File stage extract data from the Warehouse.txt file. On the Format tab, change the Quote character to the single quote (‘). This is because some of the data contains double quotes as part of the data. (Otherwise, you will get errors when your job reads the file.) Be sure you can view the data.
5. In the Items Sequential File stage, import the Table Definition. The Item field should be defined like it is defined in the Warehouse stage. In the Items Sequential File stage extract data from the Items.txt file.
6. Open the Lookup stage. Map the Item field in the top left pane to the lookup Item key field in the bottom left pane by dragging the one to the other.
7. Drag all the Warehouse columns to the target link.

DataStage Essentials v8. (You will fix things in the next task. 10. Compile and run. Try to determine why it failed and think what you might do about it. Drag the Description field in the lower left pane to target link placing it just below the Item field.1 8. Examine the job log. Change its name to ItemDescription.) Task: Handle lookup failures 1. 04/01/2009 53 . Edit your target Sequential stage as needed. Save your job as LookupWarehouseItemHandleNoMatch. 9. Your job probably failed.

2. Open up the Lookup stage. Click the constraints icon (top, second from left). Specify that the job is to continue when there is a lookup failure.
3. Compile and run. Examine the log. By default, when the lookup fails, DataStage outputs NULLs to the lookup columns. Since these columns are not nullable, these rows get rejected.
4. Open up the Lookup stage. Make both the Description column on the left side and the ItemDescription column on the right side nullable.
5. Compile and run. You should not get any fatal errors this time.
6. View the data in the target Sequential File stage. Notice the NULL values. (The number of rows to view has to be more than 100.)
7. Open up the target sequential stage. Replace NULLs by the string “NOMATCH”.
8. Compile and run. View the data.
9. Do you find any rows in the target file in which the lookup failed? (These would be rows with missing item descriptions.) What happened to these rows? This difficult question will be answered in the next task.

Task: Add a Reject link
1. Save your job as LookupWarehouseItemReject.

2. Add a Rejects link going to a Peek stage to capture the lookup failures.
3. Open up the Lookup stage and specify that Lookup Failures are to be rejected.
4. Compile and run. Examine the Peeks in the job log to see what rows were lookup failures.

5. Examine the job log. Notice in the Peek messages that a number of rows were rejected.

Task: Use Join stage
1. Open your LookupWarehouseItemNoMatch job. Save it as LookupWarehouseItemJoin.
2. Replace the Lookup stage by the Join stage.
3. Perform an inner join. Join by Item in a case-sensitive manner.

4. Click the Link Ordering tab. Make Items the Left link.
5. Check the Output>Mappings tab to make sure everything is mapped correctly.
6. Compile and run. Verify the results, and verify that the number of records written to the target sequential file is the same as for the Lookup job. You can verify this by examining the performance statistics or by seeing how many records are exported by the Warehouse_Items stage in the job log.
7. Recompile and run your job after selecting Right Outer Join as the join type. View the data. You should see NULLs in the eight no-matches.
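The difference between the two join types used above can be sketched in a few lines of Python (the data values are invented for illustration; this models the semantics only, not the stage itself):

```python
# Sketch of Join-stage semantics: an inner join drops unmatched rows,
# while a right outer join keeps every row from the right (Warehouse)
# link, filling the missing Description with None (NULL).
items = [("I1", "Hammer")]                # left link: (Item, Description)
warehouse = [("I1", 5), ("I2", 7)]        # right link: (Item, Qty); "I2" has no match

lookup = dict(items)
inner = [(i, q, lookup[i]) for (i, q) in warehouse if i in lookup]
right_outer = [(i, q, lookup.get(i)) for (i, q) in warehouse]

print(inner)
print(right_outer)                        # keeps the no-match row with None
```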

Task: Use Merge stage
In this task, we will see if the Merge stage can be used in place of the Join stage. We will see that it cannot be successfully used.
1. Save your job as LookupWarehouseItemMerge.
2. Replace the Join stage by the Merge stage.
3. In the Merge stage, specify that the data is to be merged by the key (Item), with case sensitivity. Assume that the data is sorted in ascending order. Also specify that unmatched records from Warehouse (the Master) are to be dropped.

4. Ensure that the Warehouse link is the Master link.
5. Map input columns to the appropriate output columns.
6. Compile and run. View the data.
7. Examine the performance statistics and/or job log to determine how many records were written to the target sequential file.

8. Notice that over 80 fewer rows are written to the target sequential file. Why? Examine the job log. Notice that a number of records from the Warehouse.txt file have been dropped because they have duplicate key values.

9. The moral here is that you cannot use the Merge stage if your Master source has duplicates: none of the duplicate records will match with update records. Recall that another requirement of the Merge stage (and Join stage) is that the data is hashed and sorted by the key. We did not do this explicitly, so why didn't our job fail? Let's examine the job log for clues. Open up the Score message. (You need to set the message filter to a value of 300, or "select all entries" in the Director via the View pulldown menu, after you are in any job's log view.)
10. Notice that sorts (tsort operators, to be precise) have been added by DataStage.
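The duplicate-master problem above can be sketched as follows. This is an illustrative Python model of the behavior (keys and payloads are invented), assuming sorted input and the "drop unmatched master records" setting used in this task:

```python
# Sketch of Merge-stage behavior when the master has duplicate keys:
# only one master record per key can match an update record; the
# duplicate master records find no update and, with "drop unmatched
# masters", are discarded.
master = [("I1", "rowA"), ("I1", "rowB"), ("I2", "rowC")]   # sorted; I1 duplicated
updates = {"I1": "descX", "I2": "descY"}                    # one update record per key

merged, seen = [], set()
for key, payload in master:
    if key in updates and key not in seen:
        merged.append((key, payload, updates[key]))         # first occurrence matches
        seen.add(key)
    # duplicate master records do not match and are dropped

print(merged)   # fewer output rows than master rows
```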

Range lookups

Task: Design a job with a reference link range lookup
This job reads Warehouse item records from the source file. The lookup file contains start and end item numbers with descriptions that apply to items within the specified range. The appropriate description is added to each record, which is then written out to a sequential file.
1. Open a new Parallel job and save it under the name LookupItemsRangeRef in the _Training>Jobs folder. Add the stages and links and name them as shown.

2. Edit the Warehouse sequential stage to read from the Warehouse.txt file. Verify that you can view the data.



3. Import the Table Definition for the Range_Descriptions.txt sequential file. The StartItem and EndItem fields should be defined like the Item field is defined in the Warehouse stage, viz., as VarChar(50).

4. Edit the Range_Description sequential stage to read from the Range_Descriptions.txt file. Verify that you can view the data.



5. Open the Lookup stage. Drag all the Warehouse columns across. Then drag the Description column from the Range_Description link across. **Important!!!** Set the Description column to nullable. (The Description column on the left and the Description column on the right should both be nullable.)
6. Select the Range checkbox to the left of the Item field in the Warehouse table.
7. Double-click on the Key Expression cell for the Item column to open the Range Expression editor. Specify that the Warehouse.Item column value is to be greater than or equal to the StartItem column value and less than the EndItem column value.
8. Open the constraints window and specify that the job is to continue if a lookup failure occurs.
9. Edit the target Sequential stage. Write to a file named WarehouseItems.txt. Create the file with column headings.
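The range condition built in the Range Expression editor can be sketched in Python. The range boundaries and descriptions below are invented for illustration; only the comparison logic mirrors the lab:

```python
# Sketch of the range-lookup condition: a Warehouse.Item value matches a
# reference row when StartItem <= Item < EndItem.
ranges = [("A000", "B000", "Group A"),
          ("B000", "C000", "Group B")]

def range_lookup(item):
    for start, end, desc in ranges:
        if start <= item < end:      # inclusive start, exclusive end
            return desc
    return None                      # lookup failure -> NULL with Continue

print(range_lookup("A500"), range_lookup("B000"), range_lookup("Z999"))
```

Making the end bound exclusive means adjacent ranges (here ending and starting at "B000") never both match the same item.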

10. The Description column in the Sequential File stage is nullable. Replace NULL values by NO_DESCRIPTION. Go to the extended properties window for this column.
11. Compile and run your job.
12. Open the target file in a text editor to view and verify the data.

Task: Design a job with a stream range lookup
This job reads from the RangeDescription.txt file. For each row read, it does a lookup into the Warehouse.txt file: it selects all the records from the Warehouse.txt file with items within the range. The appropriate description is added to each record, which is then written out to the DB2 table.
1. Save your job as LookupItemsRangeStream in your _Training>Jobs folder.
2. Reverse the source and lookup links. First make the source link a reference link: click the right mouse button and click Convert to reference. Then make the lookup link a stream link.
3. Open up your Lookup stage. Select Item in the Warehouse link as the key. Specify the Key type as Range.

4. Double-click on the Key Expression cell. Specify the range expression.
5. Specify that multiple rows are to be returned from the Warehouse link.
6. Click the Constraints icon. Also specify that the job is to continue if there is a lookup failure.
7. Compile and run your job. View the data.

Combining data using the Funnel Stage

Task: Build a job with a Funnel stage
In this task, you'll combine data from two of the Warehouse.txt files into a single file.
1. Open a new Parallel job and save it under the name FunnelWarehouse.
2. Add links and stages and name them as shown.
3. Edit the two source Sequential stages to extract data from the two Warehouse files, Warehouse_031005_01.txt and Warehouse_031005_02.txt. They have the same metadata as the Warehouse.txt file.
4. Edit the Funnel stage to combine data from the two files in Continuous mode. On the Output>Mapping tab, map all columns through the stage.
5. Write to a file named Warehouse_031005.txt.

6. Compile and run. Verify that the number of rows going into the target is the sum of the number of rows coming from the sources. And verify the data.
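The row-count check in the step above follows directly from what a funnel does. A minimal Python sketch of the effect (row values invented; Continuous mode does not guarantee any particular interleaving order, so only the count is asserted):

```python
# Sketch of the Funnel stage's effect on row counts: rows from the two
# inputs with identical metadata are combined into one output, so the
# output row count is the sum of the input counts.
warehouse_01 = [{"Item": "I1"}, {"Item": "I2"}]
warehouse_02 = [{"Item": "I3"}]

funneled = warehouse_01 + warehouse_02    # order shown is illustrative only

print(len(funneled))
```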

Lab 09: Sorting and Aggregating Data
Assumptions:
• No new assumptions
Task: Create the job design
1. Open a new parallel job and save it as ForkJoin. Add stages and links and name them as shown.
2. Edit the Selling_Group_Mapping_Dups Sequential stage to read from the Selling_Group_Mapping_Dups.txt file. This file has the same format as the Selling_Group_Mapping.txt file.

3. Edit the Sort_By_Code Sort stage. Perform an ascending sort by Selling_Group_Code. The sort should not be a stable sort.
4. Send all columns through the stage.
5. In the Copy stage, specify that all columns move through the stage to the output link going to the Join stage. Specify that only the Selling_Group_Code column moves through the Copy stage to the Aggregator stage.
6. Edit the Aggregator stage. Specify that records are to be grouped by Selling_Group_Code.
7. Specify that the type of aggregation is to count the rows.
8. Specify that the aggregation amount is to go into a column named CountGroup. Define this column on the Outputs>Columns tab as an integer, length 10.

9. Select Sort as the aggregation method, because the data has been sorted by the grouping key column.
10. Edit the Join stage. The join key is Selling_Group_Code. The Join Type is Left Outer. (Verify on the Link Ordering tab that the CopyToJoin link is the left link.)
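The fork-join pattern built in this lab — count rows per group on one branch, then join the count back onto every detail row — can be sketched in Python (group codes invented for illustration):

```python
# Sketch of the fork-join design: one branch counts rows per
# Selling_Group_Code (the Aggregator), and a left outer join attaches
# that CountGroup value back onto every detail row.
rows = [{"Selling_Group_Code": "SG1"},
        {"Selling_Group_Code": "SG1"},
        {"Selling_Group_Code": "SG2"}]

counts = {}
for r in rows:                                   # Aggregator: count rows per group
    code = r["Selling_Group_Code"]
    counts[code] = counts.get(code, 0) + 1

# Join: every detail row gets its group's count
joined = [dict(r, CountGroup=counts[r["Selling_Group_Code"]]) for r in rows]
print(joined)
```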

11. Edit the Sort_By_Handling_Code stage. At print time of this document, the Sort stage requires all key columns to be sorted. Therefore, even though the key column Selling_Group_Code has already been sorted, we still need to specify Sort. In the future, you can specify that the data has already been sorted by Selling_Group_Code, so that the stage doesn't repeat this sort. Turn off Stable Sort.
12. On the Outputs>Mapping tab, map all columns across. Move all columns through the stage.

13. Edit the Remove Duplicates stage. Group by Selling_Group_Code. Retain the last record in each group.
14. Edit the target Sequential stage. Write to a file named Selling_Group_Code_Deduped.txt. On the Partitioning tab, collect the data using Sort Merge based on the two columns the data has been sorted by.
15. Compile and run. View the job log.
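The "retain last" setting in the Remove Duplicates step can be sketched as follows (keys and payloads invented; this models the semantics, assuming the input is already sorted by the key as in this job):

```python
# Sketch of Remove Duplicates with "retain last": within each group of
# rows sharing a key, only the final row survives.
rows = [("SG1", "first"), ("SG1", "last"), ("SG2", "only")]  # sorted by key

deduped = {}
for key, payload in rows:
    deduped[key] = payload        # later rows overwrite earlier ones

print(sorted(deduped.items()))
```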

16. View the data.

Lab 10: Transforming Data
Assumptions:
• No new assumptions
Task: Create a parameter set
1. Click the New button on the Designer toolbar and then open the "Other" folder.
2. Double-click on the Parameter Set icon.

3. On the General tab, name your parameter set SourceTargetData.
4. On the Parameters tab, define the parameters shown.
5. On the Values tab, specify the file and values as shown.
6. Save your new parameter set in a folder in _Training>Metadata.

Task: Add a Transformer stage to a job
1. Open up your CreateSeqJobParam job and save it as TransSellingGroup.
2. Replace the Copy stage by a Transformer stage.
3. Open up the Transformer and map the columns across, just as was done in the Copy stage.
4. Compile and run.

Task: Define a constraint
1. Save your job as TransSellingGroupConstraint.
2. Open up your Job Properties to the Parameters window. Delete any existing parameters.
3. Click Add Parameter Set. Select your SourceTargetData parameter set and click OK.
4. Add a new string job parameter named Channel_Description with a default of "Other" (without the quotes).
5. Open up the Transformer and create a constraint that selects just records with a channel description equal to that in the job parameter at run time.
6. Open up your source Sequential stage. Use the Dir and SourceFile parameters from your parameter set for the directory and name of the source file in the File property.
7. Open up your target Sequential stage. Use the TargetFile parameter from your parameter set for the name of the target file in the File property. Hard-code the directory path to your ISFiles>Temp directory.
8. Compile and run your job. Check the job log.
9. View the data in the target file to verify that it correctly selects the right rows.
10. Modify your constraint in the Transformer so that descriptions can be entered in upper, lower, or mixed case. Hint: Use the Upcase() function.
11. Compile, run, and test your job.

Task: Define an Otherwise Link
1. Save your job as TransSellingGroupOtherwise.
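The case-insensitive constraint built with Upcase() can be sketched in Python; `upper()` here stands in for DataStage's Upcase function, and the sample values are invented:

```python
# Sketch of the case-insensitive constraint: uppercase both the column
# value and the job parameter before comparing them.
def constraint(channel_description, channel_param):
    return channel_description.upper() == channel_param.upper()

print(constraint("Retail", "retail"))      # matches regardless of case
print(constraint("Retail", "Wholesale"))   # different descriptions do not match
```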

2. Add an additional link to a Sequential File stage and name them as shown.
3. In the Transformer, map all input columns across to the new target link.
4. In the Transformer, reorder the links so that the Selling_Group_Mapping_Other link is last in output link ordering. Use the icon at the top right of the Transformer to accomplish this. (Note: Depending on how you drew your links, this link may already be last.)

5. Open the Constraints window. Select the Otherwise box on your Selling_Group_Mapping_Other link. The rows going down the Selling_Group_Mapping_Other link should be all the rows that do not satisfy the constraint on the first link.
6. Edit the Selling_Group_Mapping_Other Sequential File stage as needed.
7. Compile, run, and test your job.

Task: Define derivations
In this task, you define two derivations. The first derivation constructs addresses from several input columns. The second replaces one special handling code by another.
1. Save your job as TransSellingGroupDerivations.
2. Open the Transformer. If you do not see the Stage Variables window at the top right corner, click the Show/Hide Stage Variables icon in the toolbar at the top of the Transformer.

3. Click the right mouse button over the Stage Variables window and click Stage Variable Properties.
4. Create a varchar stage variable named HCDesc.
5. Close the Stage Variable Properties window.
6. Double-click in the cell to the left of the HCDesc stage variable. Define a derivation that for each row's Special_Handling_Code produces a string of the following form: "Handling code is: [xxx]". Here "xxx" is the value in the Special_Handling_Code column.
7. Create a new column named Handling_Code_Description for each of the two output links.

Pass the value of the HCDesc stage variable to each of these link columns.
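The HCDesc stage-variable derivation amounts to simple string concatenation. A Python sketch of the same logic (the sample code value is invented; DataStage would express this with its own concatenation operator rather than Python):

```python
# Sketch of the HCDesc derivation: for each row, build the string
# "Handling code is: [xxx]" from the Special_Handling_Code value.
def hc_desc(special_handling_code):
    return "Handling code is: [" + str(special_handling_code) + "]"

print(hc_desc(7))
```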

8. Write a derivation for the target Selling_Group_Desc columns that replaces "SG055" by "SH055", leaving the rest of the description as it is. In other words, "SG055 Live Swine", for example, becomes "SH055 Live Swine". Hint: Use the IF THEN ELSE operator. Also, you may need to use the substring operator and the Len function.
9. Compile, run, and test your job. Here is some of the output, which shows the replacement of SG055 with SH055 in the second column. Notice specifically the highlighted row (550000). Also notice the format of the data in the last column.
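The IF THEN ELSE derivation in step 8 can be sketched in Python, with slicing standing in for the substring operator (the second sample description is invented):

```python
# Sketch of the derivation: if the description starts with "SG055",
# replace that prefix with "SH055" and keep the rest of the string.
def fix_code(desc):
    return "SH055" + desc[5:] if desc[:5] == "SG055" else desc

print(fix_code("SG055 Live Swine"))   # prefix replaced
print(fix_code("SG001 Cattle"))       # left unchanged
```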

Lab 11: Repository Functions
Assumptions:
• No new assumptions
Task: Execute a Quick Find
1. Open Quick Find by clicking the link at the top of the Repository window.
2. In the Name to find box, type Lookup* and then click the Find button.
3. In the Types to find list, select Parallel Jobs.
4. Click Find. The first found item will be highlighted.
5. Click Next to highlight the next item.

Task: Execute an Advanced Find
1. Click on the link that displays the number of matches. This opens the Advanced Find window and displays the items found so far in the right pane.
2. Change the folder to search to _Training.
3. Open the Last modification folder. Specify objects modified within the last week. Click Find.
4. Open up the Options folder. Specify that the search is to be case sensitive. Click Find.
5. Open up the Where Used folder. Add the Range_Descriptions.txt Table Definition. Click Find. This reduces the list of found items to those that use this Table Definition.

6. Select the found items and then click the right mouse button over them. Export these jobs to a file named LookupJobsUsingRangeDescriptionsFile.dsx in your _Training>Temp folder.
7. Close the Export window.

Task: Generate a report
1. Click File>Generate Report to open a window from which you can generate a report describing the results of your Find. This report is saved in the Repository, where it can be viewed by logging onto the Reporting Console.
2. Click on the top link to view the report.

3. After closing this window, click on the Reporting Console link. On the Reporting tab, expand the Reports folder as shown.
4. Select your report and then click View Report Result. This displays the report you viewed earlier from Designer. By default, a Suite user only has permission to view the report. A Suite administrator can give additional administrative functions to a Suite user, including the ability to alter report properties, such as format.

Impact Analysis
Task: Perform an impact analysis
1. In the Repository window, select your Selling_Group_Mapping.txt Table Definition. Click the right mouse button and then select Find Where Used>All Types.

2. Click the right mouse button over the ForkJoin job listed and then click "Show dependency path to…". This shows how the path is situated on the canvas.
3. Notice the "birds-eye" view box in the lower right hand corner.
4. Use the Zoom button to adjust the size of the dependency path so that it fits into the window.
5. You can move the path around by clicking to one side of the image in the birds-eye view window, or by holding the right mouse button down over a graphical object and moving the image around.

Job and Table Difference Reports
Assumptions:
• You have created the LookupItemsRangeRef job in a previous exercise
• You have created the Warehouse.txt Table Definition in a previous exercise
Task: Find the differences between two jobs
1. Open your LookupItemsRangeRef job. Save it as LookupItemsRangeRefComp into your _Training>Jobs folder.
2. Open up both the LookupItemsRangeRef and the LookupItemsRangeRefComp jobs. Click Tile from the Window menu to display both jobs in a tiled manner.
3. Make the following changes to the LookupItemsRangeRefComp job.
4. Change the name of the link going to the Warehouse_Items target Sequential File stage to WAREHOUSE_ITEMS.
5. Open the Lookup stage. In the constraints window, change the Lookup Failure condition to "Drop".
6. Open up the RangeDescription Sequential File stage on the reference link. On the Properties tab, change First Line is Column Names to False. On the Columns tab, change the length of the first column (StartItem) to 111.
7. Save the changes to your job.

8. Right-click over your LookupItemsRangeRefComp job name in the Repository window and then select Compare Against.
9. In the Compare window, select your LookupItemsRangeRef job on the Item Selection window.

10. Click OK to display the Comparison Results window. Click on a stage or link in the report. Notice that the stage is highlighted in both of the jobs.
11. Click on one of the underlined words, e.g., Range_Description. Notice that the editor is opened for the referenced item.
12. With the Comparison Results window the active window, click File>Save as. This saves your report as an html file.
13. Open up the html file in a Browser to see what it looks like.

Task: Find the differences between two Table Definitions
1. Create a copy of your Warehouse.txt Table Definition.
2. Make the following changes to the copy.
3. On the General tab, change the short description to your name.
4. On the Columns tab, change the name of the Item column to ITEM_ZZZ.
5. Change its type and length to Char(33).
6. Right-click over your Table Definition copy and then select Compare Against.

7. In the Comparison window, select your Warehouse.txt Table Definition.
8. Click OK to display the Comparison Results window.

Lab 12: Work with Relational Data
Assumptions:
• You have access to DB2
• You have a working LookupWarehouseItem job
Task: DB2 Setup
***NOTE: The steps in this task only need to be performed if DataStage and DB2 have not been set up to work with each other. Your instructor will give you this information.
1. Click on your DB2 icon in the icon tray and select Control Center.
2. Make a note of your DB2 instance name under the Instances folder. Here it is "DB2".
3. Click on the All Databases folder. If this is the DB2 instance with the Information Server Repository, you will see the Repository database, XMETA.
4. Click Create New Database and name it ISUITE. Click the Finish button to create the database. Keep the Control Center open.

5. Open up Administrator, select your project, click the Properties button, and then click the Environment button.
6. Click the User-Defined folder. Create a variable named DB2INSTANCE. Set it to the name of the DB2 instance you identified in the DB2 Control Center. Here it is "DB2".
7. Click the Operator Specific folder. Set the APT_DBNAME variable to a database (ISUITE) that you want to be the default. Set the APT_DB2INSTANCE_HOME variable to where the file db2nodes.cfg is located. Close the Administrator client after this setting.
8. Open up the DB2 Control Center. Click the Command Editor button in the toolbar.
9. Type the commands shown (or copy them from the DB2 Bind.txt file in your ISFiles folder). The first command connects to the database you created. The second binds DataStage components to this database. (Note: You may need to alter the directory path to the db2esql80.bnd file. This is different depending on the version of DB2 installed. Shown below is for DB2 UDB v9.5.) Click the Execute button. There should be no warning and no error reported.

10. Close the Command Editor and Control Center.

Task: Create and load a DB2 table using the DB2 Connector stage
1. Open up your LookupWarehouseItem job. ***Important: Compile and run your job and verify that it is working before you proceed.
2. Save your job as LookupWarehouseItemDB2.
3. Open up the Lookup stage and click the Constraints icon. Specify that the Lookup Failure action is Drop.
4. Replace your target Sequential File stage by a DB2 Connector stage. Name the target stage as shown.

5. Edit the DB2 Connector stage as shown. Set the Instance property to DB2. Set the Database to ISUITE. Fill in the operating system ID and password that has the privilege to access DB2. Set the Write mode property to Insert. Set Generate SQL to Yes. The Table name is ITEMS.

6. Scroll down and set the Table action property to "Replace".
7. Compile and run. Check the job log for errors.

8. Open up the DB2 Control Center. Double-click on the name of your ITEMS table in the ISUITE database and view the data.
9. Make a note of the Schema name of the ITEMS table you created. (Here, the schema name is ADMINISTRATOR.)

Task: Import a table definition using orchdbutil
1. In Designer, click the Import menu and then select Table Definitions. Then select Orchestrate schema definitions.
2. In the Import Orchestrate Schema window, select the "Database table (via orchdbutil)" button.

DataStage Essentials v8. Fill in the information needed to import a table definition for the ITEMS table in your DB2 database named ISUITE.) 04/01/2009 100 . (Note: Your schema name may be different than what’s shown here.1 3.

4. Click Next. You should see the schema definition of ITEMS displayed in the window.

5. Click Next until you move to the Table Definition window. Change the table name to ITEMS.
6. Select your _Training>Metadata folder to import to. Complete the import process.
7. Open up your Table Definition in the Repository and examine its column definitions.

8. Click on the Locator tab and examine its contents. This information will be used in the SQL Builder. Be sure all the information is correct, especially DATABASE, SCHEMA, and TABLE.
9. Close the Table Definition.

Task: Create an ODBC Data Source
This step is necessary only if a DB2 ODBC data source named ISUITE hasn't been set up for you already. Check with your instructor.
1. Open up your operating system ODBC Data Source Administrator.
2. Click the System DSN tab.
3. Click Add.
4. Select the IBM DB2 ODBC driver.
5. The data source name is ISUITE. The Database alias is ISUITE.
6. Click OK.

Task: Create a job that reads from a DB2 table using the ODBC Connector stage
1. Create a new job named relReadTable_odbc.
2. Create two job parameters. The first is named WarehouseLow with a default value of 0. The second is named WarehouseHigh with a default value of 999999.

3. Open up the ITEMS stage to the Properties tab. Click the Data Source cell and then click the Data Source button. Select ISUITE. Click Test to test the connection.
4. Click the output link in the Navigator panel. Click the Columns tab. Load your ITEMS table definition.

5. Click the Properties tab. In the Usage folder, set the Generate SQL property to Yes. Type in the name of the table, i.e., ITEMS.
6. In the Transformer stage, map all columns across.
7. In the target Sequential stage, write to a file named ITEMS.txt.
8. Compile and run your job.

Task: Build an SQL SELECT statement using SQL Builder
1. Save your job as relReadTable_odbc_sqlBuild.
2. Open up the Connector source stage. In the Usage folder, set the Generate SQL property to No.
3. Click the Select statement cell and then click the Build button. Select the extended syntax. This opens the SQL Builder window.

4. Drag your ITEMS Table Definition onto the canvas.
5. Select all the columns except ALLOCATED and HARDALLOCATED and drag them to the Select columns window.

6. Sort by ITEM and WAREHOUSE, in that order, ascending. To accomplish this, select Ascending in the Sort column. Specify the sort order in the last column.
7. Click the Sql tab at the bottom of the window to view the generated SQL.
8. Click OK to save and close your SQL statement. Accept the SQL as generated and allow DataStage to merge the SQL Builder selected columns with the columns on the Columns tab.
9. In the Transformer stage, remove the ALLOCATED and HARDALLOCATED columns from the output, or verify that they have been removed.
10. Compile and run. View the job log. You may get some warning messages.
11. View the data in the target stage.

Task: Use the SQL Builder expression editor
1. Save your job as relReadTable_ODBC_expr.
2. Open up your source ODBC Connector stage. Then open SQL Builder for the SELECT statement.
3. Click in the empty Column Expression cell after the last listed column. Select Expression editor from the drop-down list. This opens the Expression Editor Dialog window.
4. In the Expression Editor box, select the Functions predicate and then select the LCASE function.
5. In the Parameters string box, select Expression Editor again to open a second Expression Editor Dialog.
6. In this, use the SUBSTRING function to select the first 15 characters of the ITEMDESCRIPTION column. Click OK.
7. Click OK to return to the first Expression Editor window. Notice the expression that has been built in the string box.

8. Specify a Column Alias named SHORT_DESCRIPTION. Close down the Expression Editor windows.
9. Click the SQL tab at the bottom of the SQL Builder to view the constructed SQL. Verify that it is correct.
10. Click OK to return to the Properties tab. A message is displayed informing you that the columns in the stage don't match the columns in the SQL statement. Click Yes so that DataStage will add the SHORT_DESCRIPTION column to your metadata.
11. On the Columns tab, specify the type correctly for the SHORT_DESCRIPTION column: Varchar(15).
12. Open the Transformer and map the new SHORT_DESCRIPTION column across.
13. In the Construct filter expression window, specify the following WHERE clause: warehouses with numbers between WarehouseLow and WarehouseHigh, where these are job parameters. Click the Add button to add it to the Selection window.
14. Compile and run. View the log.
15. View the data.
16. Try out different parameter values.
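The expression built in the SQL Builder combines LCASE with SUBSTRING. Its effect on a value can be sketched in Python (the sample description is invented; Python slicing and `lower()` stand in for the SQL functions):

```python
# Sketch of the built column expression: SHORT_DESCRIPTION is the
# lowercased first 15 characters of ITEMDESCRIPTION.
def short_description(item_description):
    return item_description[:15].lower()

print(short_description("HEAVY DUTY INDUSTRIAL HAMMER"))
```

Note that the result is at most 15 characters long, which is why step 11 types the new column as Varchar(15).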

Task: Create and write to a DB2 table
1. Save your job as relWriteTable_odbc. Replace your target Sequential File stage with an ODBC Connector stage. Name the links and stages as shown.

2. Open the target Connector stage to the Properties tab. Click the input link in the Navigator panel.
3. Set the properties as shown. The Data source is ISUITE. The Table name is ITEMSUPD. The Write mode is Insert. The Generate SQL property is set to Yes. Generate drop statement is Yes. Fail on error (for the drop statement) is No, and the Table action is Replace.
4. Click OK.
5. Compile and run. Check the job log for errors. View the data.

Task: Update a DB2 table
1. Save your job as relWriteTable_odbc_Update.
2. Import the table definition for your ITEMSUPD table into your _Training>Metadata folder.
3. Open your table definition to the Locator tab. Make sure you correct the table name to be ITEMSUPD, and specify the correct table name and schema.

4. Open the target Connector to the Properties tab. Change the Write mode to Update. Set Generate SQL to No.
5. Select the Update statement cell and then click Build to open SQL Builder.

6. Drag your ITEMSUPD table definition to the canvas. Select and drag all the columns down to the Update columns pane.

7. Define a WHERE clause that updates rows with the same WAREHOUSE and ITEM values. Here, you do equality comparisons between columns in the table and input columns, which you select from the drop-down lists. Click the Add button after each expression is defined.
8. View the generated SQL.
9. Close the SQL Builder and the stage.
10. Compile and run.
11. Open up Director and view the job log.

Lab 13: Metadata in the Parallel Framework
Assumptions:
• You have a working TransSellingGroup job.
Task: Use a schema in a Sequential stage
1. Log on to Administrator. On the Projects tab, select your project and then click Properties. Enable RCP (Runtime Column Propagation) for your project, or verify that it is enabled.
2. Open your TransSellingGroup job and save it as Metadata_job.
3. Open up the Job Properties window and enable RCP for all links of your job. Enable RCP for all existing stages as well. Open the two Sequential File stages and change the value of the Reject mode property to Continue. Remove the two reject links.
4. In the Repository window, locate the Selling_Group_Mapping.txt Table Definition that was loaded into the source. Double-click to open the Table Definition.

5. On the Layout tab, select the Parallel button to display the OSH schema. Click the right mouse button to save this as a file in your ISFiles directory. Name the file Selling_Group_Mapping.schema.
6. Open up the schema file in WordPad to view its contents.
7. Important: The "{prefix=2}" extended properties you see in the schema file are definitely something we don't want. Remove them. On the Parallel tab, delete the Record Delimiter property and add the Record Delimiter String property and set it to DOS String.
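For orientation, an OSH schema file for a comma-delimited record of this general kind might look roughly like the following. The field names, lengths, and properties here are illustrative only, not the exact contents of the lab's Selling_Group_Mapping.schema file:

```text
// Hypothetical OSH schema sketch -- names and types are illustrative
record
  {final_delim=end, record_delim_string='\r\n', delim=',', quote=double}
(
  Selling_Group_Code: string[max=10];
  Distr_Chann_Desc: nullable string[max=30];
)
```

The record-level properties in braces correspond to the Format settings of the Sequential File stage; this is also where unwanted extended properties such as "{prefix=2}" would appear.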

8. Open up your Source Sequential stage to the Properties tab. Add the Schema file option. Then select the schema file you copied to the ISFiles directory in the previous step.
9. On the Columns tab, remove all the columns.
10. In the Transformer, clear all column derivations (don't delete the output columns!) going into the target columns, and verify that RCP is enabled. Also remove any constraints, if any are defined. If you don't remove the constraints, the job won't compile, because the constraint references an unknown input column.
11. Compile and run your job. Verify that the data is written correctly.

Task: Define a derivation in the Transformer

1. Save your job as Metadata_job_02.
2. Add a Copy stage just before the Transformer. Verify that RCP is enabled.
3. On the Columns tab of the Copy stage, load just the Distr_Chann_Desc field from the Selling_Group_Mapping Table Definition.
4. Open the target Sequential File stage. Remove all the columns. Add the optional Schema File property and select your schema file for it.

5. Open the Transformer. Map the Distr_Chann_Desc column across the Transformer.
6. Define a derivation for the output column that turns the input column to uppercase.
7. Compile and run your job. View the data in the file (not using DataStage View Data). Notice that the Distr_Chann_Desc column data has been turned to uppercase. All other columns were just passed through untouched.
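The uppercase derivation uses the built-in Upcase function. A minimal sketch, assuming the Transformer's input link is named lnk_ToTrans (a hypothetical name; substitute your own link name), entered as the derivation of the Distr_Chann_Desc output column:

```
Upcase(lnk_ToTrans.Distr_Chann_Desc)
```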

Task: Create a Shared Container

1. Highlight the Copy and Transformer stages of your job. Click Edit>Construct Container>Shared.
2. Save your container, named UpcaseFieldC1, into your _Training>Jobs folder. Close your job without saving it. ***NOTE: Don't save your job! It was just used to create the container.***
3. Create a new parallel job named Metadata_Shared_Container.
4. Drag your shared container to the canvas. This creates a reference to the shared container, meaning that changes to the shared container will automatically apply to any job that uses it.
5. Click the right mouse button over the container and click Open.
6. Open up the Transformer and note that it applies the Upcase function to a column named Distr_Chann_Desc. Close the Transformer and Container job without saving it.

7. Add a source Sequential File stage, Copy stage, and a target Peek stage as shown. Name the stages and links as shown.
8. Edit the Items Sequential stage to read from the Items.txt sequential file. You should already have a Table Definition, but if you don't you can always create one. Verify that you can view the data.
9. In the Copy stage, move all columns through. On the Columns tab, change the name of the second column to Distr_Chann_Desc so that it matches the column in the Shared Container Transformer that the Upcase function is applied to.
10. Double-click on the Shared Container.
11. On the Inputs tab, map the input link to the Selling_Group_Mapping container link.

12. On the Outputs tab, map the output link to the TargetFile container link.
13. Compile and run your job.
14. Open up the Director log and find the Peek messages. Verify that the second column of data has been changed to uppercase.

Lab 14: Job Control

In this exercise, you create a single job sequence that executes three jobs.

Assumptions:
• Your ISFiles directory contains a file named seqJobs.dsx.

Task: Build a Job Sequence

1. Import the seqJobs.dsx file in your ISFiles>Dsx directory. This file contains the jobs you will execute in your job sequence: seqJob1, seqJob2, and seqJob3.
2. Open up seqJob1. Open up the Job Parameters tab and note the parameters defined.
3. Open the Transformer. Notice that the job parameter PeekHeading prefixes the one and only column of data that will be written to the job log using the Peek stage. The other two jobs are similar.
4. Open a new Job Sequence and save it as seq_Jobs.

5. Open the Job Properties to the General tab. Read and check all the compilation options.
6. Drag 3 Job Activity stages to the canvas, link them, and name the stages and links as shown.
7. Add job parameters to the job sequence to supply values to the job parameters in the jobs.

8. Open up each of the Job Activity stages and set the job parameters in the Activity stages to the corresponding job parameters of the Job Sequence. For the PeekHeading value use a string with a single space.
9. In each of the first two Job Activity stages, set the job triggers so that later jobs only run if earlier jobs run without errors, although possibly with warnings. Note: This means that the job's status is either DSJS.RUNOK or DSJS.RUNWARN.
10. Create a custom trigger such that the previous job's status is equal to one of the above two values. Click the right mouse button in the expression window to insert the Activity variable.
11. Compile and run your job sequence.
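A sketch of such a custom trigger expression, assuming the first Job Activity stage is named RunSeqJob1 (a hypothetical name; use your own stage name). $JobStatus is the Activity variable you insert from the right-click menu, and the expression tests it against the two acceptable status constants:

```
RunSeqJob1.$JobStatus = DSJS.RUNOK Or RunSeqJob1.$JobStatus = DSJS.RUNWARN
```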

12. Open the job log for the sequence. Verify that each job ran successfully and examine the job sequence summary.
13. Examine what happens if the second job aborts. To see this, pass -10 as the record count (RecCount2). Notice from the log and job summary that the first job runs OK but the second job aborts. This aborts the job sequence.
14. Re-run the job using valid parameter values. Notice that the job restarts with seqJob2 because the Restart option was selected on the Job Properties window and seqJob1 ran successfully in the previous run.

Task: Add a user variable

1. Save your job as seq_Jobs_2. Add a User Variables stage as shown.
2. Open the User Variables stage to the User Variables tab. Create a user variable named varMessagePrefix. Open the expression editor. Concatenate the DSJobName DSMacro, the DSJobStartDate DSMacro, a single space, and an equal sign.
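One plausible expression for varMessagePrefix, using the BASIC concatenation operator (:). The four terms follow the order described above (job name, start date, a space, an equal sign); the exact spacing is a stylistic choice:

```
DSJobName : DSJobStartDate : " " : "="
```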

3. Open each Job Activity stage. For each PeekHeading parameter insert the varMessagePrefix for its value expression.
4. Compile and run. View the job log for the seqJob1 job. Verify that the PeekHeading is inserted before the column values in the Peek messages in the log.

Task: Add a Wait for File stage

In this task, you modify your design so that the job isn't executed until the StartRun file disappears from your ISFiles directory.

1. Save your job as seq_Jobs_3.

2. Add a Wait for File stage as shown.
3. Edit the Wait for File stage. Select the StartRun file. Specify that the job is to wait forever until this file disappears.
4. Define an unconditional trigger.

5. Compile and run your job. Test the Wait for File stage by first starting your job. After you view the log, rename the StartRun file (so that the StartRun file disappears).

Task: Add exception handling

1. Save your job as seq_Jobs_4.
2. Add the Exception Handler and Terminator stages as shown.

3. Edit the TerminateJobs stage so that any running jobs are stopped when an exception occurs.
4. Compile and run your job. To test that it handles exceptions, make an Activity fail. For example, set one of the run count parameters to -10.
