DataStage Training Day 1
Course Prerequisites
To gain full benefit from this class, you should have a strong working knowledge of:
Relational database management systems (RDBMS)
Structured Query Language (SQL)
The Windows GUI
Open Database Connectivity (ODBC)
What is DataStage?
DataStage is a comprehensive tool for the fast, easy creation and maintenance of data marts and data warehouses.
DataStage Architecture
DataStage Administrator
Repository
Manager
DataStage Designer
DataStage Director
Client Components
Designer: creates DataStage jobs that are compiled into executable programs.
Director: used to run and monitor DataStage jobs.
Manager: a user interface for viewing and editing the contents of the repository and for importing and exporting DataStage components.
Administrator: a user interface for administrative tasks such as setting up users and adding projects.
DataStage Designer
Repository
Proprietary file format.
A repository is created for each project.
A single server can handle multiple repositories.
Stores all source/target definitions.
Can be imported/exported in XML format.
Repository contents:
Data elements
Jobs
Routines
Shared containers
Stage types
Table definitions
Transforms
DataStage Administrator
Creating and attaching a project to DataStage
Executing a UniVerse command
Cleaning up a project
License
Permissions
Auto-purge of log files
Inactivity timeout
User for scheduled jobs
User name troubleshooting
DataStage Designer
Design Area
Stages repository
ODBC Stage
The ODBC stage can be used to extract, write, and aggregate data.
The ODBC stage can have any number of inputs and outputs.
The three editable properties for this stage are Stage, Inputs, and Outputs.
Note: the DSN has to be defined on the server machine as a system DSN.
Input: when data is added/loaded to a table.
Output: when data is read/extracted from a table.
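The ODBC stage itself is configured in the Designer GUI, but its input-link (load) and output-link (extract) behaviour can be sketched in code. The following minimal Python sketch uses the stdlib sqlite3 module to stand in for an ODBC connection; with a real system DSN you would connect via an ODBC driver instead (the DSN name "MyDSN" mentioned in the comment is hypothetical).

```python
import sqlite3

# A minimal sketch of what the ODBC stage does: load rows into a table
# (input link) and extract rows from a table (output link).  sqlite3 here
# stands in for a real ODBC connection against a system DSN (e.g. a
# hypothetical "MyDSN" on the server machine).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Input link: data is added/loaded to a table.
cur.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT)")
cur.executemany("INSERT INTO products VALUES (?, ?)",
                [(1, "bolt"), (2, "nut"), (3, "washer")])
conn.commit()

# Output link: data is read/extracted from the table.
rows = cur.execute("SELECT id, name FROM products ORDER BY id").fetchall()
print(rows)  # [(1, 'bolt'), (2, 'nut'), (3, 'washer')]
```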
Define SQL type:
Generated query: built from the columns chosen.
User-generated query (SQL override).
Stored procedure read.
Preview data
Specify joins if multiple tables are used, or any clause to restrict the data being extracted.
Verify the SQL generated to extract the data; it can be user-defined or generated.
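To make the "generated query" idea concrete, a small Python sketch of how such a SELECT could be assembled from the chosen columns, table, and an optional restricting clause (the table and column names below are illustrative, not from the course material). A user-defined query (SQL override) would simply replace the generated string entirely.

```python
# Sketch of a "generated query": the stage builds the SELECT from the
# columns chosen on the Columns tab; a WHERE clause can restrict the rows.
columns = ["id", "name"]            # columns chosen in the stage
table = "products"                  # illustrative table name
where = "id > 10"                   # optional clause restricting data

generated = "SELECT {} FROM {} WHERE {}".format(
    ", ".join(columns), table, where)
print(generated)  # SELECT id, name FROM products WHERE id > 10
```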
Isolation Level
Specify a suitable transaction isolation level. This specifies how potential conflicts between transactions (i.e., dirty reads, nonrepeatable reads, and phantom reads) are handled. Choose from:
None. The default isolation level for the database is used.
Read Uncommitted. Dirty reads, nonrepeatable reads, and phantom reads are all possible.
Read Committed. Dirty reads are not possible. Nonrepeatable reads and phantom reads are both possible.
Repeatable Read. Dirty reads and nonrepeatable reads are not possible, but phantom reads are possible.
Serializable. Transactions are serializable. Dirty reads, nonrepeatable reads, and phantom reads are not possible.
Versioning. Transactions are serializable but have higher concurrency than with Serializable. Versioning is implemented using a non-locking protocol.
Auto commit. Each write is committed separately. Do not use this in conjunction with transaction groups.
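The Auto commit option, where each write is committed separately, has a direct analogue in Python's stdlib sqlite3 module: passing isolation_level=None puts the connection in autocommit mode. This is only an illustration of the commit-per-write idea, not of DataStage's ODBC stage itself.

```python
import sqlite3

# isolation_level=None puts the connection in autocommit mode, so each
# write is committed separately -- the "Auto commit" behaviour above.
auto = sqlite3.connect(":memory:", isolation_level=None)
auto.execute("CREATE TABLE t (v INTEGER)")
auto.execute("INSERT INTO t VALUES (1)")   # committed immediately, no commit() call
count = auto.execute("SELECT COUNT(*) FROM t").fetchone()[0]
print(count)  # 1
```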
Update action
Specifies how the data is written. Choose the option you want from the drop-down list box:
Clear the table, then insert rows. Deletes the contents of the table and adds the new rows.
Insert rows without clearing. Inserts the new rows in the table.
Replace existing rows completely. Deletes the existing rows, then adds the new rows to the table.
Update existing rows only. Updates the existing data rows. If a row with the supplied key does not exist in the table, the table is not updated and a warning is logged.
Insert new or update existing rows. New rows are added or, if the insert fails, the existing rows are updated.
Update existing or insert new rows. The existing data rows are updated or, if this fails, new rows are added.
Call stored procedure. The data is written using a stored procedure. When you select this option, the Procedure name field appears.
User-defined SQL. The data is written using a user-defined SQL statement. When you select this option, the View SQL tab is replaced by the Enter SQL tab.
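The "Insert new or update existing rows" action (try the insert, fall back to an update on key collision) can be sketched as follows; sqlite3 again stands in for the ODBC target, and the table and column names are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stock (id INTEGER PRIMARY KEY, qty INTEGER)")
conn.execute("INSERT INTO stock VALUES (1, 10)")

def insert_or_update(conn, key, qty):
    # "Insert new or update existing rows": attempt the insert first and
    # fall back to an update when the key already exists.
    try:
        conn.execute("INSERT INTO stock VALUES (?, ?)", (key, qty))
    except sqlite3.IntegrityError:
        conn.execute("UPDATE stock SET qty = ? WHERE id = ?", (qty, key))

insert_or_update(conn, 1, 99)   # key exists -> update
insert_or_update(conn, 2, 5)    # new key    -> insert
rows = conn.execute("SELECT id, qty FROM stock ORDER BY id").fetchall()
print(rows)  # [(1, 99), (2, 5)]
```

The reverse order (update first, insert on zero rows affected) corresponds to "Update existing or insert new rows".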
Right click
Columns can be defined by adding a new row in the columns grid and editing its contents. You can specify data types, keys, etc. If the target table is already defined and imported, you can load the table definition as you did before. Column definitions can also be taken from the incoming link to this stage.
ODBC Stage - Recap
Define the connection.
Define/import the tables and the table metadata.
Choose one of the following to read or write data:
User-defined SQL
Generated query
Stored procedure
File name
Update Action
Note: the file has to be imported into the repository before using it in this stage. The procedure is similar to importing ODBC table definitions. Backup existing file creates a backup before overwriting or appending to it. This can be used if a job is reset during execution.
Additional Properties
Missing Column Action: allows you to specify the action for columns missing in the data being read. Options are:
Pad with NULL
Pad with empty string
Map empty string
Suppress Truncation Warnings: used if you are reading only three columns from a file that has more than three columns defined.
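The padding options can be sketched in a few lines of Python: a row read from a flat file may have fewer fields than the defined columns, and the missing trailing columns are padded rather than failing the row. The column names here are illustrative.

```python
# Sketch of "Missing Column Action": a flat-file row with fewer fields than
# the column definitions is padded with NULL (None here) or an empty string.
columns = ["id", "name", "city"]          # three defined columns

def read_row(line, pad=""):
    fields = line.rstrip("\n").split(",")
    # Pad missing trailing columns instead of rejecting the row.
    fields += [pad] * (len(columns) - len(fields))
    return dict(zip(columns, fields))

print(read_row("1,smith"))            # pads city with "" -> empty string
print(read_row("1,smith", pad=None))  # pads city with None -> NULL
```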
Aggregator Stage
The Aggregator stage classifies data rows from a single input link into groups and computes totals or other aggregate functions. The sums/totals are output from the stage. You can have only one input link, but many output links if the data has to be grouped in different ways.
The Aggregator is an active stage.
Before/After routines: routines that have been included in the repository and can be executed before or after the stage, e.g. to run a DOS command or send an email.
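The grouping-and-totalling behaviour can be sketched in plain Python (the "region"/"sales" columns are illustrative): one column plays the grouped role and another is summed, just as the Group and Derivation settings specify in the stage.

```python
# Minimal sketch of the Aggregator stage: rows from a single input link are
# grouped on one column ("region") and a total is computed for another ("sales").
rows = [
    {"region": "east", "sales": 100},
    {"region": "west", "sales": 40},
    {"region": "east", "sales": 60},
]

totals = {}
for row in rows:
    totals[row["region"]] = totals.get(row["region"], 0) + row["sales"]

print(totals)  # {'east': 160, 'west': 40}
```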
Derivation editor
Note: each column in the input stage must be either grouped or summarized. The standard columns for the data type appear with two changes:
Group: specifies whether a given column is used in the GROUP BY clause.
Derivation: each input column needs a derivation for this stage. This is discussed in the next slide.
Note: the functions available change according to the underlying column data type.
Transformation Stage
Handles extracted data to perform conversions and passes the data to another transformation stage or another DataStage stage.
Can have any number of inputs and outputs.
Has one primary data link and many reference links.
Double click the stage to open the transformation editor
Transformation Editor
Link Area: where all the column definitions, expressions, stage variables, and constraints are defined. The link area is divided into two parts: the left pane shows the input links, and the right pane shows the output columns as well as their derivations, constraints, and stage variables. Each of these is explained in detail.
Metadata Area: shows the column metadata, again split into two panes, one for the input columns and one for the output columns. Here you can edit the column properties as you deem necessary.
Shortcut Area:
When you use the transformation stage, you have already defined what input data is to be passed and changed. The input columns are transformed in the transformation editor on the output pane for each column chosen.
Input Links: this stage has one primary link, which carries the bulk of the data to be transformed. It can also use many other links as reference links, which affect how the data is transferred (lookup activity).
Reference Links: you can define key expressions for the input columns on the reference links. The most common expression specifies an equi-join between the primary table and the reference table.
Primary Link
Transformer Concepts
Reference Link
CAT_ID is changed to a key column and then its key expression is defined in the link area.
A join is specified between the primary link column CATEGORY_ID and the reference link key column CAT_ID. The key expressions are validated by the expression editor; if they are incorrect, they appear in red, denoting an error.
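The equi-join between the primary link's CATEGORY_ID and the reference link's key column CAT_ID amounts to a keyed lookup, which can be sketched as follows (the item names and category values are illustrative):

```python
# Sketch of a transformer reference-link lookup: the reference link is keyed
# on CAT_ID, and the primary link's CATEGORY_ID is the key expression
# (an equi-join between the two links).
reference = {10: "hardware", 20: "software"}          # CAT_ID -> category name

primary = [{"item": "drill",  "CATEGORY_ID": 10},
           {"item": "editor", "CATEGORY_ID": 20}]

output = [{"item": r["item"],
           "category": reference[r["CATEGORY_ID"]]}    # lookup via key expression
          for r in primary]
print(output)
```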
Add/delete columns: you can add new columns by right-clicking the output pane in the link area and then defining the column metadata in the metadata area. The new column derivations can be custom strings, derivations, concatenations, or standard functions applied to input columns or stage variables. These are discussed shortly.
Move/edit columns: in the output pane of the link area you can move columns up and down by dragging. In the metadata area you can edit the column definitions as required.
Before/After routines: this is an active stage, hence you can specify routines to be executed by passing suitable input values, as discussed before.
Entering expressions: whenever the insertion point is in an expression box, you can right-click to open the suggested operand menu. This gives you rich support for defining various elements and using predefined transformations, routines, and functions. It also allows you to define expressions based on stage variables.
Transformation Variables
You can define local variables for a given transformation stage. These can be used to define key expressions or the output column derivations. Stage variables are shown in the output pane of the transformer stage.
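A common use of a stage variable is to carry a value from one row to the next, for example remembering the previous row's key so a derivation can flag the first row of each group. The Python sketch below (with illustrative data) mimics that pattern with an ordinary variable that persists across the row loop:

```python
# Sketch of a stage variable: it persists between rows, so a derivation can
# compare the current row's key with the previous row's key.
rows = ["east", "east", "west", "west", "west"]

prev_key = None            # the "stage variable"
flags = []
for key in rows:
    flags.append(key != prev_key)   # derivation: first row of a new group?
    prev_key = key                  # variable updated after each row
print(flags)  # [True, False, True, False, False]
```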
Transformation Constraints
You can limit output data by specifying constraints in the transformation stage. Constraints are BASIC expressions that can be defined for each output link of a transformer stage. You can also specify reject links for rows that have not been written to any output link. To define constraints, do the following:
Specify the constraint expression as desired. This can be used while writing data for tables partitioned on certain keys (e.g. Key_ID > 10), or for any other filtering criterion desired.
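The constraint/reject behaviour can be sketched as a filter per output link, with rows that satisfy no link's constraint going to the reject link (the Key_ID values are illustrative):

```python
# Sketch of output constraints: each output link keeps only the rows that
# satisfy its expression; rows written to no output link go to the reject link.
rows = [{"Key_ID": 5}, {"Key_ID": 12}, {"Key_ID": 30}]

link_a = [r for r in rows if r["Key_ID"] > 10]     # constraint: Key_ID > 10
rejects = [r for r in rows if r not in link_a]     # reject link
print(link_a, rejects)
```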
Folder Stage
Folder stages are used to read/write data as files in a directory located on a DataStage server. A Folder stage can read multiple files from a single directory and deliver the files to the job as rows on an output link.
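The one-row-per-file behaviour can be sketched in Python: every file in a directory is delivered as a (file name, file contents) row. A temporary directory with illustrative files stands in for the server directory.

```python
import os
import tempfile

# Sketch of the Folder stage: each file in a directory becomes one output
# row of (file name, file contents).
d = tempfile.mkdtemp()                       # stands in for the server directory
for name, text in [("a.txt", "alpha"), ("b.txt", "beta")]:
    with open(os.path.join(d, name), "w") as f:
        f.write(text)

rows = []
for name in sorted(os.listdir(d)):           # one row per file
    with open(os.path.join(d, name)) as f:
        rows.append((name, f.read()))
print(rows)  # [('a.txt', 'alpha'), ('b.txt', 'beta')]
```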
Plug-in Stage
Plug-in stages are mostly used to access external programs that can be leveraged by DataStage applications. DataStage supports a number of plug-in stages to perform special tasks. They have to be installed separately on the server and configured before use. After installation, these stages have to be registered on the server system before being used. Examples: ORABULK, for bulk loading of data into an Oracle database; BCPLOAD, for bulk loading into MS SQL Server. Each plug-in stage has its own configurable properties, which can be checked in the manuals; please refer to them if you plan to use a plug-in stage.