
DataStage Enterprise Edition

01/07/09

Sayrite Inc.


Introduction to DataStage EE Part 1


Ascential Platform


What is DataStage?
o Design jobs for extraction, transformation, and loading (ETL)
o Ideal tool for data integration projects such as data warehouses, data marts, and system migrations
o Import, export, create, and manage metadata for use within jobs
o Schedule, run, and monitor jobs, all within DataStage
o Administer your DataStage development and execution environments

DataStage Server and Clients

DataStage Administrator
o Use the Administrator to specify general server defaults, add and delete projects, and to set project properties.
o Use the Administrator Project Properties window to:
  - Set job monitoring limits and other Director defaults on the General tab.
  - Set user group privileges on the Permissions tab.
  - Enable or disable server-side tracing on the Tracing tab.
  - Specify a user name and password for scheduling jobs on the Schedule tab.
  - Specify hashed file stage read and write cache sizes on the Tunables tab.

Client Logon

DataStage Manager
o Use the Manager to store and manage reusable metadata for the jobs you define in the Designer. This metadata includes table and file layouts and routines for transforming extracted data.
o Manager is also the primary interface to the DataStage repository. In addition to table and file layouts, it displays the routines, transforms, and jobs that are defined in the project. Custom routines and transforms can also be created in Manager.

DataStage Designer
o The DataStage Designer allows you to use familiar graphical point-and-click techniques to develop processes for extracting, cleansing, transforming, integrating, and loading data into warehouse tables.
o The Designer provides a "visual data flow" method to easily interconnect and configure reusable components.

DataStage Director
Use the Director to validate, run, schedule, and monitor your DataStage jobs. You can also gather statistics as the job runs.

Developing in DataStage
o Define your project's properties: Administrator
o Open (attach to) your project
o Import metadata that defines the format of data stores your jobs will read from or write to: Manager
o Design the job: Designer
  - Define data extractions (reads)
  - Define data flows
  - Define data integration
  - Define data transformations
  - Define data constraints
  - Define data aggregations
  - Define data loads (writes)
o Compile and debug the job: Designer
o Run and monitor the job: Director

DataStage Projects

Review
o DataStage Designer is used to build and compile your ETL jobs.
o Manager is used to store and manage the metadata and other objects your jobs use.
o Director is used to execute your jobs after you build them.
o Administrator is used to set global and project properties.

Intro Part 2: Configuring Projects

Module Objectives
After this module you will be able to:
- Explain how to create and delete projects
- Set project properties in Administrator
- Set EE global properties in Administrator

Project Properties
o Projects can be created and deleted in Administrator.
o Project properties and defaults are set in Administrator.

Recall from module 1: In DataStage all development work is done within a project. Projects are created during installation and after installation using Administrator. Each project is associated with a directory. The directory stores the objects (jobs, metadata, custom routines, etc.) created in the project. Before you can work in a project you must attach to it (open it). You can set the default properties of a project using DataStage Administrator.

Setting Project Properties

Licensing Tab

Projects General Tab
Click Properties on the DataStage Administration window to open the Project Properties window. There are nine tabs. (The Mainframe tab is only enabled if your license supports mainframe jobs.) The default is the General tab.

If you select the Enable job administration in Director box, you can perform some administrative functions in Director without opening Administrator.

When a job is run in Director, events are logged describing the progress of the job. For example, events are logged when a job starts, when it stops, and when it aborts. The number of logged events can grow very large. The Auto-purge of job log box allows you to specify conditions for purging these events. You can limit the logged events either by number of days or number of job runs.
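The two auto-purge conditions (by age in days, or by number of job runs) can be illustrated with a small Python sketch. This is not DataStage code: the LogEvent class, the auto_purge function, and their fields are hypothetical names invented for illustration, and the real purge logic inside DataStage may differ.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class LogEvent:
    run_id: int          # hypothetical: sequence number of the job run
    timestamp: datetime  # when the event was logged
    message: str


def auto_purge(events: List[LogEvent], now: datetime,
               max_days: Optional[int] = None,
               max_runs: Optional[int] = None) -> List[LogEvent]:
    """Return the events that survive a purge by age OR by number of runs."""
    if max_days is not None and max_runs is not None:
        raise ValueError("choose either max_days or max_runs, not both")
    if max_days is not None:
        # Keep only events logged within the last max_days days.
        cutoff = now - timedelta(days=max_days)
        return [e for e in events if e.timestamp >= cutoff]
    if max_runs is not None:
        # Keep only events that belong to the newest max_runs job runs.
        newest_runs = set(sorted({e.run_id for e in events}, reverse=True)[:max_runs])
        return [e for e in events if e.run_id in newest_runs]
    return list(events)  # auto-purge disabled: keep everything
```

Calling auto_purge(events, now, max_runs=5) would keep only the events of the five most recent runs, while max_days=7 would keep one week of history.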

Environment Variables

Permissions Tab
Use this page to set user group permissions for accessing and using DataStage. All DataStage users must belong to a recognized user role before they can log on to DataStage. This helps to prevent unauthorized access to DataStage projects.

There are three roles of DataStage user:
o DataStage Developer, who has full access to all areas of a DataStage project.
o DataStage Operator, who can run and manage released DataStage jobs.
o <None>, who does not have permission to log on to DataStage.

UNIX note: In UNIX, the groups displayed are defined in /etc/group.

Tracing Tab
o This tab is used to enable and disable server-side tracing.
o The default is for server-side tracing to be disabled. When you enable it, information about server activity is recorded for any clients that subsequently attach to the project. This information is written to trace files. Users with in-depth knowledge of the system software can use it to help identify the cause of a client problem. If tracing is enabled, users receive a warning message whenever they invoke a DataStage client.
o Warning: Tracing causes a lot of server system overhead. This should only be used to diagnose serious problems.

Tunables Tab
On the Tunables tab, you can specify the sizes of the memory caches used when reading rows in hashed files and when writing rows to hashed files. Hashed files are mainly used for lookups and are discussed in a later module.

Parallel Tab
You should enable OSH for viewing. OSH is generated when you compile a job.

Intro Part 3: Managing Meta Data

Module Objectives
After this module you will be able to:
- Describe the DataStage Manager components and functionality
- Import and export DataStage objects
- Import metadata for a sequential file

What Is Metadata?
Metadata is "data about data" that describes the formats of sources and targets. This includes general format information such as whether the record columns are delimited and, if so, the delimiting character. It also includes the specific column definitions.
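To make "data about data" concrete, here is a minimal Python sketch of what such metadata might hold and how it lets a program interpret a raw record. The ColumnDef and TableDef classes, the parse_record function, and the type names are illustrative assumptions, not DataStage structures.

```python
import csv
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ColumnDef:
    name: str
    sql_type: str    # illustrative type label, e.g. "Integer" or "VarChar"
    length: int = 0  # used only for fixed-width records


@dataclass
class TableDef:
    delimited: bool          # general format: delimited or fixed width
    delimiter: str           # the delimiting character, if delimited
    columns: List[ColumnDef]  # the specific column definitions


def parse_record(line: str, table: TableDef) -> Dict[str, str]:
    """Split one raw record into named fields, guided only by the metadata."""
    if table.delimited:
        values = next(csv.reader([line], delimiter=table.delimiter))
    else:
        values, pos = [], 0
        for col in table.columns:
            values.append(line[pos:pos + col.length])
            pos += col.length
    if len(values) != len(table.columns):
        raise ValueError("record does not match the table definition")
    return {col.name: value for col, value in zip(table.columns, values)}
```

The same record text means nothing without the definition: "1,Smith" only becomes an id and a name once the metadata says the record is comma-delimited with those two columns.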

DataStage Manager
DataStage Manager is a graphical tool for managing the contents of your DataStage project repository, which contains metadata and other DataStage components such as jobs and routines. The left pane contains the project tree. There are seven main branches, but you can create subfolders under each. Select a folder in the project tree to display its contents.

Manager Contents
o Metadata describing sources and targets: Table definitions
o DataStage objects: jobs, routines, table definitions, etc.

Import and Export
o Any object in Manager can be exported to a file
o Can export whole projects
o Use for backup
o Sometimes used for version control
o Can be used to move DataStage objects from one project to another
o Use to share DataStage jobs and projects with other developers

Export Procedure
o In Manager, click "Export>DataStage Components"
o Select DataStage objects for export
o Specify type of export: DSX, XML
o Specify file path on client machine

Review Q

o You can export DataStage objects such as jobs, but you can't export metadata, such as field definitions of a sequential file. (T/F)
o The directory to which you export is on the DataStage client machine, not on the DataStage server machine. (T/F)

Exporting DataStage Objects


Import Procedure
o In Manager, click "Import>DataStage Components"
o Select DataStage objects for import

Importing DataStage Objects

Import Options

Metadata Import
o Import format and column definitions from sequential files
o Import relational table column definitions
o Imported as "Table Definitions"
o Table definitions can be loaded into job stages

Sequential File Import Procedure
o In Manager, click Import>Table Definitions>Sequential File Definitions
o Select directory containing sequential file and then the file
o Select Manager category
o Examine format and column definitions and edit if necessary
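The last step of the procedure above (examine the proposed definitions and edit them if necessary) exists because an importer can only guess column definitions from sample data. The Python sketch below shows one way such a guess could be made. The functions guess_type and infer_columns, the type labels, and the generated column names are all hypothetical; this is not how the Manager import wizard is implemented.

```python
import csv
from typing import Iterable, List, Tuple


def guess_type(value: str) -> str:
    """Guess an illustrative type label from one sample value."""
    try:
        int(value)
        return "Integer"
    except ValueError:
        pass
    try:
        float(value)
        return "Float"
    except ValueError:
        return "VarChar"


def infer_columns(lines: Iterable[str], delimiter: str = ",",
                  first_line_is_header: bool = True) -> List[Tuple[str, str]]:
    """Propose (column name, type) pairs from the first rows of a delimited file."""
    rows = list(csv.reader(lines, delimiter=delimiter))
    if not rows:
        raise ValueError("the file is empty")
    if first_line_is_header:
        names, data = rows[0], rows[1:]
    else:
        # Invented placeholder names; a person would rename these afterwards.
        names, data = [f"col{i + 1}" for i in range(len(rows[0]))], rows
    sample = data[0] if data else [""] * len(names)
    return [(name, guess_type(value)) for name, value in zip(names, sample)]
```

A guess based on one sample row is easily wrong (a postal code looks like an integer, for example), which is why reviewing and editing the imported definition remains a manual step.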

Manager Table Definition
In Manager, select the category (folder) that contains the table definition. Double-click the table definition to open the Table Definition window. Click the Columns tab to view and modify any column definitions. Select the Format tab to edit the file format specification.

Importing Sequential Metadata

Intro Part 4: Designing and Documenting Jobs

Module Objectives
After this module you will be able to:
- Describe what a DataStage job is
- List the steps involved in creating a job
- Describe links and stages
- Identify the different types of stages
- Design a simple extraction and load job
- Compile your job
- Create parameters to make your job flexible
- Document your job

What Is a Job?
o Executable DataStage program
o Created in DataStage Designer, but can use components from Manager
o Built using a graphical user interface
o Compiles into Orchestrate shell language (OSH)

Job Development Overview
o In Manager, import metadata defining sources and targets
o In Designer, add stages defining data extractions and loads
o Add Transformers and other stages to define data transformations
o Add links defining the flow of data from sources to targets
o Compile the job
o In Director, validate, run, and monitor your job

Designer Work Area

Designer Toolbar

Tools Palette

Adding Stages and Links
o Stages can be dragged from the tools palette or from the stage type branch of the repository view
o Links can be drawn from the tools palette or by right-clicking and dragging from one stage to another

Sequential File Stage
o Used to extract data from, or load data to, a sequential file
o Specify full path to the file
o Specify a file format: fixed width or delimited
o Specify column definitions
o Specify write action

Designer - Create New Job

Drag Stages and Links Using Palette

Assign Meta Data
Meta data may be dragged from the repository and dropped on a link.

Editing a Sequential Source Stage
o Any required properties that are not completed will appear in red.
o You are defining the format of the data flowing out of the stage, that is, to the output link. Define the output link listed in the Output name box.
o You are defining the file from which the job will read. If the file doesn't exist, you will get an error at run time.
o On the Format tab, you specify a format for the source file.
o You will be able to view its data using the View data button.
o Think of a link as like a pipe. What flows in one end flows out the other end (at the transformer stage).

Editing a Sequential Target
o Defining a sequential target stage is similar to defining a sequential source stage.
o You are defining the format of the data flowing into the stage, that is, from the input links. Define each input link listed in the Input name box.
o You are defining the file the job will write to. If the file doesn't exist, it will be created. Specify whether to overwrite or append the data in the Update action set of buttons.
o On the Format tab, you can specify a different format for the target file than you specified for the source file.
o If the target file doesn't exist, you will not (of course!) be able to view its data until after the job runs. If you click the View data button, DataStage will return a "Failed to open ..." error.
o The column definitions you defined in the source stage for a given (output) link will appear already defined in the target stage for the corresponding (input) link.
o Think of a link as like a pipe. What flows in one end flows out the other end. The format going in is the same as the format going out.
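The difference between the overwrite and append update actions, and the contrast between a missing target (created on write) and a missing source (error at run time), can be mimicked with ordinary file handling. The Python sketch below is only an analogy: write_target, read_source, and the update_action values are hypothetical names, not DataStage APIs.

```python
import csv
import os
import tempfile
from typing import Iterable, List


def write_target(path: str, rows: Iterable[List[str]],
                 update_action: str = "overwrite", delimiter: str = ",") -> None:
    """Write rows to a delimited file; the file is created if it does not exist."""
    modes = {"overwrite": "w", "append": "a"}
    if update_action not in modes:
        raise ValueError("update_action must be 'overwrite' or 'append'")
    with open(path, modes[update_action], newline="") as handle:
        csv.writer(handle, delimiter=delimiter).writerows(rows)


def read_source(path: str, delimiter: str = ",") -> List[List[str]]:
    """Read all rows; a missing file raises FileNotFoundError when this runs."""
    with open(path, newline="") as handle:
        return list(csv.reader(handle, delimiter=delimiter))


# Small demonstration in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "target.txt")
    write_target(target, [["1", "a"]])                           # file is created
    write_target(target, [["2", "b"]], update_action="append")   # rows accumulate
    rows_after_append = read_source(target)
    write_target(target, [["3", "c"]], update_action="overwrite")  # rows replaced
    rows_after_overwrite = read_source(target)
```

As with the View data button on a target that has not been written yet, calling read_source on a path that does not exist fails until some run has created the file.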

Transformer Stage
o Used to define constraints, derivations, and column mappings
o A column mapping maps an input column to an output column
o In this module we will just define column mappings (no derivations)
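A column mapping without derivations simply copies each input column's value to a named output column. A minimal Python sketch of the idea follows; apply_column_mapping and the column names are hypothetical, and a real Transformer is configured graphically rather than in code.

```python
from typing import Dict


def apply_column_mapping(row: Dict[str, str], mapping: Dict[str, str]) -> Dict[str, str]:
    """Build an output row; mapping is {output column name: input column name}."""
    unresolved = [src for src in mapping.values() if src not in row]
    if unresolved:
        # Comparable to a mapping that cannot be resolved against the input link.
        raise KeyError(f"input columns not found: {unresolved}")
    return {out_col: row[in_col] for out_col, in_col in mapping.items()}
```

For example, mapping {"id": "cust_id", "name": "cust_name"} turns the input row {"cust_id": "7", "cust_name": "Ann"} into the output row {"id": "7", "name": "Ann"}.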

Transformer Stage Elements
There are two transformer stages: Transformer and Basic Transformer. Both look the same but access different routines and functions.

Notice the following elements of the transformer:
o The top, left pane displays the columns of the input links.
o The top, right pane displays the contents of the stage variables.
o The lower, right pane displays the contents of the output link. Unresolved column mapping will show the output in red.
o For now, ignore the Stage Variables window in the top, right pane. This will be discussed in a later module.
o The bottom area shows the column definitions (metadata) for the input and output links.

Create Column Mappings

Creating Stage Variables
Stage variables are used for a variety of purposes:
o Counters
o Temporary registers for derivations
o Controls for constraints
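The three uses listed above can be pictured as variables that keep their values from one row to the next while rows pass through the stage. The Python sketch below is an analogy only: the transform function, its variable names, and the added row_number column are hypothetical and do not reflect Transformer syntax.

```python
from typing import Dict, List


def transform(rows: List[Dict[str, str]], key: str = "id") -> List[Dict[str, object]]:
    """Pass through only the first row of each run of equal keys, numbering rows."""
    row_count = 0    # counter: persists across rows
    prev_key = None  # temporary register: remembers the previous row's key
    output = []
    for row in rows:
        row_count += 1
        is_new_key = row[key] != prev_key  # control value used by the constraint
        prev_key = row[key]
        if is_new_key:                     # constraint: drop consecutive repeats
            output.append({**row, "row_number": row_count})
    return output
```

Given rows with keys a, a, b, the sketch emits the first and third rows, with row_number values 1 and 3.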

Result

Adding Job Parameters
o Makes the job more flexible
o Parameters can be:
  - Used in constraints and derivations
  - Used in directory and file names
o Parameter values are determined at run time
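Using a parameter in a directory or file name amounts to leaving a placeholder in the path and filling it in when the job runs. The Python sketch below illustrates that substitution. The #Name# placeholder style follows the convention commonly used for DataStage job parameters, but these slides do not show the syntax, so treat it as an assumption; resolve_parameters and the parameter names are hypothetical.

```python
import re
from typing import Dict


def resolve_parameters(template: str, params: Dict[str, str]) -> str:
    """Replace each #Name# placeholder with the value supplied at run time."""
    def substitute(match: "re.Match[str]") -> str:
        name = match.group(1)
        if name not in params:
            raise KeyError(f"no value supplied for job parameter {name!r}")
        return params[name]

    return re.sub(r"#(\w+)#", substitute, template)
```

With the values {"SourceDir": "/data/in", "RunDate": "20090107"}, the template "#SourceDir#/customers_#RunDate#.txt" resolves to "/data/in/customers_20090107.txt", so the same job design can read a different file on each run.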

Adding Job Documentation
o Job Properties
  - Short and long descriptions
  - Shows in Manager
o Annotation stage
  - Is a stage on the tool palette
  - Shows on the job GUI (work area)

Job Properties Documentation

Annotation Stage on the Palette

Annotation Stage Properties

Final Job Work Area with Documentation

Compiling a Job
Before you can run your job, you must compile it. To compile it, click File>Compile or click the Compile button on the tool bar. The Compile Job window displays the status of the compile. A compile will generate OSH.

Errors or Successful Message
If an error occurs:
o Click Show Error to identify the stage where the error occurred. This will highlight the stage in error.
o Click More to retrieve more information about the error. This can be lengthy for parallel jobs.

Intro Part 5: Running Jobs

Module Objectives
After this module you will be able to:
- Validate your job
- Use DataStage Director to run your job
- Set run options
- Monitor your job's progress
- View job log messages

Prerequisite to Job Execution

DataStage Director
o Can schedule, validate, and run jobs
o Can be invoked from DataStage Manager or Designer
  - Tools > Run Director

Running Your Job
This shows the Director Status view. To run a job, select it and then click Job>Run Now. Better yet: Shift to log view from main Director screen. Then click green arrow to execute job.

Run Options - Parameters and Limits
The Job Run Options window is displayed when you click Job>Run Now. This window allows you to stop the job after:
o A certain number of rows.
o A certain number of warning messages.

You can validate your job before you run it. Validation performs some checks that are necessary in order for your job to run successfully. These include:
o Verifying that connections to data sources can be made.
o Verifying that files can be opened.
o Verifying that SQL statements used to select data can be prepared.

Click Run to run the job after it is validated. The Status column displays the status of the job run.
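The two run limits (stop after a number of rows, or after a number of warnings) can be sketched as a loop that counts both and ends early when either limit is reached. This Python sketch is an illustration under stated assumptions: run_with_limits, its status strings, and the convention that the process callback returns True when a row produced a warning are all hypothetical, not DataStage behavior.

```python
from typing import Callable, Iterable, Optional, Tuple


def run_with_limits(rows: Iterable[str],
                    process: Callable[[str], bool],
                    row_limit: Optional[int] = None,
                    warning_limit: Optional[int] = None) -> Tuple[str, int, int]:
    """Process rows until done or until a limit is hit.

    process(row) returns True if handling that row produced a warning.
    Returns (status, rows processed, warnings raised).
    """
    processed = warnings = 0
    for row in rows:
        if row_limit is not None and processed >= row_limit:
            return ("stopped: row limit reached", processed, warnings)
        processed += 1
        if process(row):
            warnings += 1
            if warning_limit is not None and warnings >= warning_limit:
                return ("aborted: warning limit reached", processed, warnings)
    return ("finished", processed, warnings)
```

A row limit is handy for a quick trial run against a large source, while a warning limit keeps a job with a systematic data problem from filling the log.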

Director Log View
Click the Log button in the toolbar to view the job log. The job log records events that occur during the execution of a job. These events include control events, such as the starting, finishing, and aborting of a job; informational messages; warning messages; error messages; and program-generated messages.

Message Details are Available

Other Director Functions
o Schedule job to run on a particular date/time
o Clear job log
o Set Director options
  - Row limits
  - Abort after x warnings