
DataStage – Parameters – Schema Files

Job Properties
• View/Modify from Designer/Director

• General Properties
• Job Descriptions
• Enable RCP – To be discussed shortly
• Multiple Instances – To allow parallel execution of multiple instances. Job design must ensure that there is no conflict, e.g. writing into the same file.
• Before/After Job Subroutine

August 7, 2021 2
Job Properties

• Before/After Job Subroutine


• Execute subroutine – Creation/definition of these to be discussed later
• Does not return data values
• After job routine can be conditionally executed based on successful job execution

Recap Of Job Parameters

• Defining through Job Properties > Parameters


• Used to pass business & control parameters to the jobs
• Recap of sample usage:

#XXX#
Usage as stage parameter for string substitution

Default value optional

Direct usage for expression evaluation
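
The #XXX# string substitution recapped above can be pictured with a small Python sketch. This is only an illustration of the idea, not how DataStage implements it; the function name and the parameter names SRC_PATH and RUN_DATE are hypothetical.

```python
import re

def substitute_parameters(text, params):
    """Replace each #NAME# token in text with the value of parameter NAME."""
    def lookup(match):
        name = match.group(1)
        if name not in params:
            raise KeyError(f"job parameter {name!r} has no value")
        return str(params[name])
    return re.sub(r"#(\w+)#", lookup, text)

# Hypothetical parameter names and values, chosen only for illustration.
params = {"SRC_PATH": "/data/in", "RUN_DATE": "20210807"}
file_name = substitute_parameters("#SRC_PATH#/Regional_Sales_#RUN_DATE#.txt", params)
print(file_name)
```

The same stage property can then point at a different path or date on each run without any change to the job design.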

Recap Of Job Parameters

Setting Parameter Values


• Passed by calling sequence**/script/program
• If a value is set by the calling program, it overrides the default value
• If no default value, calling sequence/script MUST set parameter, else job fails
Used For
• Flexibility – Change business parameters
• Reuse – Run same job with different parameters to handle different needs
• Portability – set path, user name, password, etc. according to the environment

• Some Common Parameters

• Run Date
• Business Date
• Path
• Filename suffix/prefix (input/output)
• User Name/password
• Database connection parameters, DSNs, etc.
• Base currency, etc.

** - To be discussed later
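
The rules for setting parameter values (the caller's value overrides the default; a parameter with neither a default nor a supplied value makes the job fail) can be sketched as follows. This is a minimal Python illustration under assumed names; BASE_CURRENCY and BUSINESS_DATE are example parameters, not part of any real job.

```python
def resolve_parameters(defaults, supplied):
    """Merge caller-supplied values over defaults; fail if a parameter has no value.

    defaults maps parameter name -> default value, or None when no default is defined.
    """
    resolved = {}
    for name, default in defaults.items():
        if name in supplied:
            resolved[name] = supplied[name]   # caller's value overrides the default
        elif default is not None:
            resolved[name] = default          # fall back to the default
        else:
            raise ValueError(
                f"parameter {name!r} has no default and was not set by the caller"
            )
    return resolved

# Hypothetical parameters: one with a default, one without.
defaults = {"BASE_CURRENCY": "USD", "BUSINESS_DATE": None}
resolved = resolve_parameters(defaults, {"BUSINESS_DATE": "2021-08-07"})
print(resolved)
```

Calling `resolve_parameters(defaults, {})` raises an error, mirroring the job failure when a parameter without a default is not set.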

Recap Of Job Parameters

• Can also set/override Environment Variables Values - valid only within the job

OSH – Orchestrate Shell Script

• OSH is the Orchestrate shell script that is generated when a parallel job is compiled and is run by the parallel engine

• Some DS Features

• Schema Files
• Schema Files & RCP

Schema Files
• Alternative way to specify column definitions for data used in EE jobs
• Written in a plain text file
• Can be imported into the DataStage Repository
• Creating a Schema
• Using a text editor
• Follow correct syntax for definitions
• Import from an existing data set or file set
• Manager import > Table Definitions > Orchestrate Schema Definitions
• Select checkbox for a file with .fs or .ds
• Import from a database table
• Create from a Table Definition
• Click Parallel on Layout tab

Schema Files

• Schema file for data accessed through stages that have the “Schema Files” property, e.g. Sequential File
• Sample Use
• if source file format may change without functional impact to the DS code
• say columns inserted, reordered, deleted, etc.
• Job accesses the file only through the definition in the schema file
• Schema file may be changed without affecting the job(s)

RCP - Runtime Column Propagation

• Supports partial definition of metadata.


• Enable RCP to
• Recognize columns at runtime though they have not been used within a job
• Propagate through the job stream
• Design- and compile-time column mapping enforcement applies while RCP is off
• RCP is off by default
• Enable
• project level. (Administrator project properties)
• job level. (Job properties General tab)
• Stage. (Link Output Column tab)
• Always enable if using Schema Files
• To use RCP in a Sequential stage:
• Use the “Schema File” option & provide path name of the schema file
• When RCP is enabled:
• DataStage does not enforce mapping rules
• Danger of runtime error if incoming column names do not match the column names on the outgoing link
• Columns not used within the job are also propagated if a definition exists
• Note that RCP is available only for certain stages

RCP & Schema Files Demonstrated

• Consider this requirement statement:


• Regional_Sales.txt is a pipe-delimited sequential file
• It will contain
• Region_ID
• Sales_Total
• Job must read this file and compute
• Sales_Total_USD = Sales_Total*45
• Write the data into
• data set Regional_Sales.ds

So far a simple job will do
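
As a rough picture of what this simple job does, here is a Python sketch that reads pipe-delimited rows and adds the computed field. It is an analogy only; the sample rows are made up, and both columns are treated as integers to match the int32 types used in the schema shown later.

```python
import csv
import io

RATE = 45  # conversion factor from the requirement: Sales_Total_USD = Sales_Total * 45

def convert_sales(lines):
    """Read pipe-delimited Region_ID|Sales_Total rows and add Sales_Total_USD."""
    rows = []
    for region_id, sales_total in csv.reader(lines, delimiter="|"):
        rows.append({
            "Region_ID": int(region_id),
            "Sales_Total": int(sales_total),
            "Sales_Total_USD": int(sales_total) * RATE,
        })
    return rows

sample = io.StringIO("1|100\n2|250\n")  # made-up sample data
result = convert_sales(sample)
print(result)
```

Note that the column order and the delimiter are hard-coded here, which is exactly what the refinement cases that follow set out to avoid.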

RCP & Schema Files Demonstrated

• Refinement Case 1
• The input file may in the future
• include extra columns that are not relevant to the requirement, these must be
dropped/ignored by the job
• The record format may change, e.g. become comma delimited, order in which the fields
appear may change
• The job must be capable of accepting this input file without impact
• To Do
• Define a schema file to define the input file & point to it within the sequential file stage

record
{final_delim=end, record_delim='\n', delim='|', quote=double,
charset="ISO8859-1"}
(
REGION_ID:int32 {quote=none};
SALES_TOTAL:int32 {quote=none};
)
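
To show what information a job can take from such a file, the following Python sketch parses the small subset of schema syntax used above into a delimiter and an ordered column list. It is a simplified illustration, not the engine's parser: field-level properties are ignored and only the constructs shown in this example are handled.

```python
import re

SCHEMA_TEXT = r"""
record
{final_delim=end, record_delim='\n', delim='|', quote=double,
charset="ISO8859-1"}
(
REGION_ID:int32 {quote=none};
SALES_TOTAL:int32 {quote=none};
)
"""

def parse_schema(text):
    """Parse the subset of record-schema syntax shown above.

    Returns (properties, columns): a dict of record-level properties and a
    list of (name, type) pairs in file order.
    """
    match = re.search(r"record\s*\{(.*?)\}\s*\((.*)\)", text, re.S)
    if match is None:
        raise ValueError("not a record schema")
    props_text, fields_text = match.groups()
    # key=value pairs; values may be bare words or quoted (so delim=',' works)
    pairs = re.findall(r"(\w+)\s*=\s*('[^']*'|\"[^\"]*\"|\w+)", props_text)
    properties = {key: value.strip("'\"") for key, value in pairs}
    columns = []
    for field in fields_text.split(";"):
        field = re.sub(r"\{.*?\}", "", field).strip()  # drop per-field properties
        if field:
            name, _, type_name = field.partition(":")
            columns.append((name.strip(), type_name.strip()))
    return properties, columns

properties, columns = parse_schema(SCHEMA_TEXT)
print(properties["delim"], columns)
```

Because the delimiter and column order come from the file rather than from the job, a change in record format only requires a change to the schema text.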

RCP & Schema Files Demonstrated
• To Do
• Column definition will define all columns that must be carried through to the next stage
• Column names in the column definition must match those defined in the schema file
• Ensure RCP is disabled for the output links

record
{final_delim=end, record_delim='\n', delim='|', quote=double,
charset="ISO8859-1"}
(
REGION_ID:int32 {quote=none};
SALES_CITY:ustring[max=255];
SALES_ZONE:ustring[max=255];
SALES_TOTAL:int32 {quote=none};
)

• When the input format changes
• ONLY the schema file must be modified!
• Data Set will always contain the columns for which the definition is included within the stages as well as the computed field
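
The behaviour described for Refinement Case 1 (RCP disabled, so only columns defined in the stage reach the output) can be sketched in Python as below. This is an analogy under stated assumptions: the column list and delimiter are passed in as if they had been read from the schema file, and the sample rows and city/zone values are made up.

```python
import csv

STAGE_COLUMNS = ["REGION_ID", "SALES_TOTAL"]  # columns defined on the stage's output link

def read_without_rcp(lines, schema_columns, delim):
    """Read rows using the schema's layout, keep only stage-defined columns,
    and add the computed field."""
    out = []
    for values in csv.reader(lines, delimiter=delim):
        record = dict(zip(schema_columns, values))
        row = {name: int(record[name]) for name in STAGE_COLUMNS}
        row["SALES_TOTAL_USD"] = row["SALES_TOTAL"] * 45
        out.append(row)
    return out

# Original layout: two pipe-delimited columns.
old = read_without_rcp(["1|100"], ["REGION_ID", "SALES_TOTAL"], "|")
# Changed layout: comma-delimited with two extra columns; only the
# schema description passed in differs, the function is untouched.
new = read_without_rcp(
    ["1,Pune,West,100"],
    ["REGION_ID", "SALES_CITY", "SALES_ZONE", "SALES_TOTAL"],
    ",",
)
print(old == new)
```

The extra columns are silently dropped because nothing in the stage definition asks for them.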

RCP & Schema Files Demonstrated
• Refinement Case 2
• The input file may in the future include extra columns, BUT THESE MUST BE CARRIED THROUGH into the target Data Set as is
• The job must be capable of accepting this input file without impact

• To Do
• Define & use schema file
• Ensure RCP is enabled at the project level as well as for all output links along which data is to be propagated at run time
• Define all columns that require processing
• Other columns may or may not be defined
• In this case, Region_ID need not be defined in the stage
• But if a column is defined and found missing from the schema and/or data file at run time, the job will abort!

• When the input format changes


• ONLY the schema file must be modified!
• Data Set will contain ALL columns in the schema plus the computed field, unless a column is explicitly accessed & dropped within the job

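
Refinement Case 2 can be pictured the same way. In this Python sketch (an analogy, not DataStage behaviour in detail) every incoming column is carried through, only the column needed for processing is declared, and a declared column that is missing at run time raises an error, standing in for the job abort. The sample record is made up.

```python
def process_with_rcp(records, stage_columns):
    """Carry every incoming column through; fail if a stage-defined column is absent."""
    out = []
    for record in records:
        missing = [name for name in stage_columns if name not in record]
        if missing:
            raise KeyError(f"columns defined in the stage but missing at run time: {missing}")
        row = dict(record)  # columns not declared in the stage are propagated unchanged
        row["SALES_TOTAL_USD"] = int(record["SALES_TOTAL"]) * 45
        out.append(row)
    return out

rows = process_with_rcp(
    [{"REGION_ID": "1", "SALES_CITY": "Pune", "SALES_ZONE": "West", "SALES_TOTAL": "100"}],
    stage_columns=["SALES_TOTAL"],  # REGION_ID is not declared, yet still reaches the output
)
print(rows)
```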
• Some DS Features

• Shared Containers
• Shared Containers & RCP

Shared Container

• Reuse of logic across jobs in a project


• Set of stages that provide a frequently performed sequence of operations
• Included at compile time within the logic of the job within which it is invoked
• On change, job(s) must be recompiled
• Accepts parameters passed by calling job
• Can be used along with RCP features to provide higher reuse
• Care must be taken to ensure that no deadlocks or other data integrity issues are introduced through
shared logic being invoked simultaneously
• Server job functionality can be embedded within the Parallel Job
• Invoked multiple times to allow the process to function in parallel
• Note that parallel containers cannot be invoked within a server job
• Local containers
• are for making the job look less complicated
• Cannot be invoked from other jobs
• Can be converted to shared containers
• Can be deconstructed to embed the logic within the job itself
• Shared containers
• Can be converted to a local container within a specific job, while still retaining the original
shared container definition
• Can be deconstructed

Shared Container

• Consider validation of geography


• Region_ID must exist in Region_Master.txt & Zone must exist in Zone_Master.txt
• This rule is applicable for various streams including Regional_Sales and Employee_Master
• Basic Solution:
• Create individual jobs to lookup each source against the master files

• Refined Solution
• Create a Shared container – “Validate Geography”
• Select the stages that are to be shared
• Select menu item Edit > Construct Container > Shared

Shared Container

Stages are replaced by a single icon representing the shared container

• Input & Output stubs represent the number of links into & out of the container.
• Note that 2 output links are expected
• The parameters used within the stages are automatically inherited as the container’s properties
This is sufficient if
• All input files/streams on which validation must occur have the same column metadata &
• All jobs using the container have the same link names going in & out of the container

Shared Container
• To make it truly reusable
• Within the Shared Container Definition
• Rename the link & column names to generic names
• Ensure that the stage defines only the fields used within the processing, in this case, Zone & Region
• Ensure RCP is enabled on the output links. This ensures that all fields in the input are passed on to
the output

Input stage contains multiple fields.
Lookup stage within the container contains only the required fields.
RCP is enabled on the output link.

Shared Container

• Within the job(s)


• Column names that are used within the container must be mapped or modified before/after the
shared container is invoked
• Output links of the shared container must have RCP set
• The shared container icon within the job must be opened & input/output in the job must be
mapped against the corresponding link name in the container
• Note that parameters required by the stages within the container must be set through the
container’s invocation stage in the calling job
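
The combination of a generic container and column mapping in the calling job can be sketched in Python as follows. This is an analogy only: the function plays the role of the shared container, the rename step plays the role of the mapping done in each job, and the two return values stand in for the two output links (taken here, as an assumption, to be valid and rejected rows). All data and the Emp_* and Sales_Zone column names are made up.

```python
def validate_geography(records, valid_regions, valid_zones):
    """Split records into (valid, rejected) using only the generic Region and Zone columns.

    All other columns ride along untouched, which is the role RCP plays on the
    container's output links.
    """
    valid, rejected = [], []
    for record in records:
        ok = record["Region"] in valid_regions and record["Zone"] in valid_zones
        (valid if ok else rejected).append(record)
    return valid, rejected

def rename_columns(records, mapping):
    """Rename columns per mapping (old name -> new name), leaving other columns as-is."""
    return [{mapping.get(k, k): v for k, v in r.items()} for r in records]

# Made-up master data and input rows, for illustration only.
regions, zones = {"R1", "R2"}, {"North", "West"}
sales = [{"Region_ID": "R1", "Sales_Zone": "West", "Sales_Total": 100},
         {"Region_ID": "R9", "Sales_Zone": "West", "Sales_Total": 50}]
employees = [{"Emp_ID": 7, "Emp_Region": "R2", "Emp_Zone": "North"}]

# Each calling job maps its own column names to the container's generic names.
sales_ok, sales_bad = validate_geography(
    rename_columns(sales, {"Region_ID": "Region", "Sales_Zone": "Zone"}), regions, zones)
emp_ok, emp_bad = validate_geography(
    rename_columns(employees, {"Emp_Region": "Region", "Emp_Zone": "Zone"}), regions, zones)
print(len(sales_ok), len(sales_bad), len(emp_ok), len(emp_bad))
```

The two streams have different metadata, yet the same validation logic serves both because it touches only the two generic columns.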

Shared Container

Container can be reused as shown to validate the geography information of the employee-master file

Metadata different but handled through RCP.

Link Name Mapping


RCP Enabled on output links

Column names match those in the container

Shared Container

• If in the future

• Suppose geography validation no longer requires validation of Zone:


• Change only the shared container
• Recompile all jobs that invoke the container

• Suppose the city must also be validated:


• Provided all the inputs contain the required field,
• Change only the shared container
• Recompile all jobs that invoke the container
