Document Reference: DataStage Proposed Naming Standards
Version: 1.1
Date of Issue:
Reason for Issue: Updated for V8.1 new stages
Produced by: Data Integration Platform
Authorised:
05/02/2012 DataStage Development Standards and Guidelines Page 1 of 17
1. Version History

This table records the status and version history of this document.

Version   Date         Author     Status (draft, peer review etc)
0.1       05/02/2012   Suresh R   Initial Draft
2. Table of Contents

1. Version History ............................................. 2
2. Table of Contents ........................................... 3
3. Introduction ................................................ 4
4. DataWarehousing Training Schedule ........................... 5
   4.1.1 Introduction to SQL ................................... 5
   4.1.2 Introduction To Infosphere Datastage .................. 6
   4.1.3 Introduction To UNIX Operating System ................. 7
5. Development Standards ....................................... 9
   5.1 DataStage Objects Naming Standards ...................... 9
   5.2 Table Definitions Naming Conventions .................... 16
3. Introduction

This document provides the set of naming standards for IBM DataStage jobs. While it contains some recommendations specific to release 8.1, most of the topics will remain appropriate for future releases. It also provides a training schedule for DataWarehousing resources.
4. DataWarehousing Training Schedule

4.1.1 Introduction to SQL

What Can SQL do?
• SQL can execute queries against a database
• SQL can retrieve data from a database
• SQL can insert records in a database
• SQL can update records in a database
• SQL can delete records from a database
• SQL can create new databases
• SQL can create new tables in a database
• SQL can create stored procedures in a database
• SQL can create views in a database
• SQL can set permissions on tables, procedures, and views

Day 1: SQL Intro, SQL Syntax, SQL Select, SQL Distinct, SQL Where, SQL And & Or, SQL Order By, SQL Insert, SQL Update, SQL Delete, SQL avg(), SQL count(), SQL first(), SQL last(), SQL max(), SQL min(), SQL sum(), SQL Group By, SQL Having, SQL ucase(), SQL lcase(), SQL mid(), SQL len(), SQL round()

Day 2: Basic SQL Functions, Advanced SQL, SQL Like, SQL Wildcards, SQL In, SQL Between, SQL Alias, SQL Joins, SQL Inner Join, SQL Left Join, SQL Right Join, SQL Full Join, SQL Union, SQL Not Null, SQL Unique, SQL Primary Key, SQL Foreign Key, SQL Drop
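The Day 1 topics above can be exercised with a short illustrative script. This sketch uses Python's built-in sqlite3 module as a stand-in database; the emp table and its contents are invented purely for demonstration.

```python
import sqlite3

# Illustrative only: an invented emp table, exercised with the core Day 1
# statements (CREATE, INSERT, SELECT, WHERE, GROUP BY, ORDER BY) and the
# aggregate functions count() and avg() from the topic list.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE emp (emp_no INTEGER PRIMARY KEY, dept TEXT, salary REAL)")
cur.executemany(
    "INSERT INTO emp (emp_no, dept, salary) VALUES (?, ?, ?)",
    [(1, "SALES", 100.0), (2, "SALES", 200.0), (3, "HR", 150.0)],
)
cur.execute(
    "SELECT dept, COUNT(*), AVG(salary) FROM emp "
    "WHERE salary > 50 GROUP BY dept ORDER BY dept"
)
rows = cur.fetchall()
print(rows)  # [('HR', 1, 150.0), ('SALES', 2, 150.0)]
```

The same statements run largely unchanged on Oracle or DB2, which is why the schedule teaches standard SQL before the DataStage database stages.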
4.1.2 Introduction To Infosphere Datastage

Day 1: Datastage modules - an overview of the DataStage components and modules, with screenshots.
Day 2: Designing jobs - the DataStage palette - a list of all stages and activities used in DataStage, and how to develop and use Containers.
Day 3: Extracting and loading data - sequential files - description and use of sequential files (flat files, text files, CSV files) in DataStage.
Day 4: Extracting and loading data - ODBC and ORACLE stages - description and use of the ODBC and ORACLE stages (ORAOCI9) used for data extraction and data load. Covers ODBC input and output links, Oracle update actions and best practices.
Day 5: Transforming and filtering data - use of transformers to perform data conversions, mappings, validations and data refining.
Day 6: Performing lookups in DataStage - how to use database stages as a lookup source.
Day 7: Implementing the ETL process in DataStage - a step-by-step guide on how to implement the ETL process efficiently in DataStage. Contains tips on how to design and run a set of jobs executed on a daily basis.
Days 8-14: Design examples of the most commonly used DataStage jobs - a set of examples of job designs resolving real-life problems implemented in production datawarehouse environments in various companies. Includes DataStage jobs in the Canada Lands Project - job designs, screenshots and sample data.
Day 15: SCD implementation in DataStage - how to implement SCDs (slowly changing dimensions) in DataStage. All the Slowly Changing Dimension types are described separately: SCD Type 1, SCD Type 2, SCD Type 3 and 4.
Day 16: Header and trailer file processing - a sample DataStage job which processes a text file organized in a header-and-trailer format.
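The header-and-trailer format covered on Day 16 can be sketched in a few lines. This is only an illustration: the record layout, field delimiter and trailer count field are invented, and a real DataStage job would implement this with stages rather than hand-written code.

```python
# Hypothetical header-and-trailer file: first line is a header, last line
# a trailer carrying a record count, and the lines between are details.
def split_header_trailer(lines):
    """Return (header, details, trailer) from the lines of such a file."""
    stripped = [line.rstrip("\n") for line in lines]
    header, details, trailer = stripped[0], stripped[1:-1], stripped[-1]
    return header, details, trailer

sample = ["H|20120205|ACCOUNTS\n", "D|1001|250.00\n", "D|1002|75.50\n", "T|2\n"]
header, details, trailer = split_header_trailer(sample)

# A typical reconciliation check: the trailer count must match the
# number of detail records.
assert int(trailer.split("|")[1]) == len(details)
```

The reconciliation assertion at the end mirrors what the sample job would do before loading the detail records.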
4.1.3 Introduction To UNIX Operating System

Day 1:
• What is UNIX?
• Files and processes
• The Directory Structure
• Starting a UNIX terminal

Day 2:
• Listing files and directories
• Making Directories
• Changing to a different Directory
• The directories . and ..
• Pathnames
• More about home directories and pathnames

Day 3:
• Copying Files
• Moving Files
• Removing Files and directories
• Displaying the contents of a file on the screen
• Searching the contents of a file

Day 4:
• Redirection
• Redirecting the Output
• Redirecting the Input
• Pipes

Day 5:
• Wildcards
• Filename Conventions
• Getting Help

Day 6:
• File system security (access rights)
• Changing access rights
• Processes and Jobs
• Listing suspended and background processes
• Killing a process
• Other Useful UNIX commands
5. Development Standards

This section outlines the standards that must be followed for DataStage job development on all Projects.

5.1 DataStage Objects Naming Standards

DataStage Object: Projects
Naming Standard: XXXXXXXX-<purpose>
Description: Project names may be up to 18 characters long and may contain underscores and/or hyphens.
Examples:
• CGSBM-Dev_Phase1
• CGSBM-Test
• CGSBM-Prod

DataStage Object: Categories
Description: There will be two top-level job categories, USERS and QA.

The USERS area is where all development is carried out, and developer unit testing is performed there. Under the USERS area each developer has a subcategory named as their unix login. Each developer is free to use whatever subcategories they wish in their own category; however, it is strongly recommended that they follow the module category convention as per the QA area.

The QA area is where jobs are placed to be logged into version control for system testing. Once a developer has completed development and testing of their module, they will move it into the QA area under the relevant subcategory. In this example the QA area is subdivided into functional areas: Extraction (EXT), Initialisation (INI), Customer (CUS), Link (LNK), Cheque (CHQ), Tax (TAX), Investment (INV), Automated Request (ARQ), Standing Order (STO), Daily Function (DFN) and Scheduler (SHD). Each functional area is further subdivided into categories corresponding to the module names. The module names are the functional areas with a numeric suffix in the thousands range, e.g. EXT1000 and EXT2000 would be used to contain jobs in modules EXT1000 and EXT2000 respectively.

Category Hierarchy (non-standard sub-categories are broken out):
• Data Elements
• Jobs
  • QA: EXT (EXT1000, EXT2000), INI, CUS, LNK, CHQ, TAX, INV, ARQ, STO, DFN
  • USERS: User1 (TestJob1, TestJob2), User2

DataStage Object: Jobs
Naming Standard: jbFFFNNNNxyz
Description:
• "jb" is a required prefix.
• "FFF" is the functional area code of the job, e.g. EXT, INI, CUS, LNK, CHQ, TAX, INV, ARQ, STO, DFN or SHD.
• "NNNN" is the job number, which must start in the same 1000s range as the module number. Jobs increase in 10s, and the first job in a module is job 10.
• "xyz" is an optional descriptive purpose using InitCaps notation.
Example: jbEXT1010AccountsFile - the first extraction job in module EXT1000 (the first EXT module).

DataStage Object: Sequencers
Naming Standard: sqFFFNNNN
Description:
• "sq" is a prefix to identify sequencers. A sequencer name in DataStage can only start with an alphabetic character.
• "FFF" is the functional area.
• "NNNN" is the optional module number, required for sub-sequencers that run just the module; the sequencer for an entire functional area does not require it.
Examples:
• sqEXT - the sequencer for extraction
• sqCUS2000 - the sub-sequencer for the second Customer module

DataStage Object: Stages
Naming Standard: <Stage ID> + <source/target name OR action verb> + <"S" or "T">
Description:
• <Stage ID> indicates the type of stage (see the table below).
• <action verb> is applicable to active stages and indicates what action the stage is performing.
• <source/target name> is applicable to passive stages and would consist of the table or file name, e.g. "AccountsFile".
• "S" or "T" is a suffix applicable to passive stages to indicate whether the stage is a source or a target.
Examples: or_IssueRatingS (Oracle OCI source), xf_FilterReformats (Filters Reformat transformer).

Stages can be categorized into the following types:
1. Active Stages: transformation stages that contain the bulk of the transformation logic and business rules. This type of stage is used principally as a translation mechanism for any transformation logic; it can also be used to pass information from one job to another. Active stages use verbs to describe their action and the source and/or target to which the action applies.
2. Passive Stages: source and/or target stages that connect to sources of data, including relational databases, ODBC, hashed files, sequential files and datasets. Lookup or Reference stages connect to a lookup or reference data source.

Stage Type                   Stage ID
Aggregator                   ag
Basic Transformer            bt
Change Apply                 ca
Change Capture               cc
Checksum                     cs
Column Export                cx
Column Gen                   cg
Column Import                ci
Combine Records              cr
Compare                      cm
Complex Flat File            cf
Compress                     co
Container (local)            lc
Container (shared)           sc
Copy                         cp
Dataset                      ds
Decode                       de
Dedup                        dd
Difference                   di
External Source              es
External Target              et
Fileset                      fs
Filter                       fi
FTP                          ft
Funnel                       fu
Generic                      ge
Hashed                       hf
Head                         hd
Join                         jn
Lookup                       lu
Lookup Fileset               lf
Make Subrecord               ms
Make Vector                  mv
Merge                        mg
Modify                       mo
MQSeries                     mq
OraBulk                      ob
OraOCI                       or
Parallel Transformer         xf
Peek                         pk
Pivot                        pt
Promote Subrecord            ps
Remove Duplicates            rd
Row Gen                      rg
Sample                       sm
Sequential File              sq
Slowly Changing Dimension    sd
Sort                         st
Split Subrecord              ss
Split Vector                 sv
Surrogate Key Generator      sk
Switch                       sw
Tail                         tl
TeraBulk                     tb
Teradata                     td
Transformer                  xf
Wr Range Map                 wr

DataStage Object: Stage Variables
Naming Standard: sv<VARIABLENAME>
Description: "sv" to denote the stage variable, then the variable name in UPPERCASE.

DataStage Object: Links
Naming Standard: lk<action verb>Yyyyyy
Description:
• "lk" is a prefix to identify links.
• <action verb> describes what is happening to the data flowing through the link.
• "Yyyyyy" is an optional name of the table definition, field or other specific instance of a general stage object flowing through this link.
Example: lkSortAccountsData
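The job, sequencer and link conventions can be encoded as a quick checker. This is a hypothetical helper, not part of DataStage, and the regular expressions are one interpretation of the standard above.

```python
import re

# Hypothetical patterns for the naming standards described above:
#   jobs:       jbFFFNNNN plus an optional InitCaps purpose
#   sequencers: sqFFF with an optional four-digit module number
#   links:      lk + action verb + optional object name
JOB_NAME = re.compile(r"^jb[A-Z]{3}\d{4}(?:[A-Z][A-Za-z0-9]*)?$")
SEQ_NAME = re.compile(r"^sq[A-Z]{3}(?:\d{4})?$")
LINK_NAME = re.compile(r"^lk[A-Z][A-Za-z0-9]*$")

def check(name):
    """Return which convention, if any, a DataStage object name matches."""
    for kind, pattern in (("job", JOB_NAME), ("sequencer", SEQ_NAME),
                          ("link", LINK_NAME)):
        if pattern.match(name):
            return kind
    return None

print(check("jbEXT1010AccountsFile"))  # job
print(check("sqCUS2000"))              # sequencer
print(check("lkSortAccountsData"))     # link
print(check("EXT1010"))                # None
```

A checker like this could run as part of a QA step before a job is moved from the USERS area into the QA area.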
Links or workflows connect stages and carry data from sources through any transform stages into the target stage(s). Server jobs support two types of link:

• Stream Link: a link representing the flow of data from a source stage through transform stages to a target stage. This link is represented by a solid line.
• Reference Link: a link representing a table (data) lookup. It is used to provide information that might affect the way data is changed, but does not supply the data to be changed. This link is represented by a dotted line.

All links are named with respect to the source of the data that is on the link; in other words, all links are named with respect to the output stage of that link, not the input stage. An output or input link to a passive stage should describe, at a lower level (such as the table), the data that is flowing through it.

DataStage Object: Routines
Naming Standard: <prefix>Xxxxxxxxx
Description:
• "rt" is a prefix to identify a routine.
• "rtx" is a prefix for an external routine.
• "Xxxxxxxxx" is a description of what the routine does.
A routine accepts a series of arguments and returns an answer, and routines should be used for complex transformations. A routine may include an Action argument to indicate different return answers; if it does, the Action argument should accept relevant names indicating the return answer. Argument names should be changed from the defaults arg1, arg2, arg3, etc. to names relevant to the data each argument supplies.
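The Action-argument and argument-naming conventions for routines can be sketched as follows. DataStage server routines are actually written in DataStage BASIC; this Python sketch only illustrates the conventions, and the routine name, actions and logic are invented.

```python
# Hypothetical routine following the conventions above: an "rt" prefix,
# descriptive argument names (not arg1, arg2), and an Action argument
# whose values select the return answer.
def rtFormatAccountCode(account_code, action):
    """Return the account code formatted per the requested action."""
    if action == "UPPER":
        return account_code.upper()
    if action == "TRIMMED":
        return account_code.strip()
    return account_code  # unknown action: pass the value through unchanged

print(rtFormatAccountCode("  ab123  ", "TRIMMED"))  # ab123
print(rtFormatAccountCode("ab123", "UPPER"))        # AB123
```

Note how the action names ("UPPER", "TRIMMED") describe the return answer, as the standard requires.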
DataStage Object: Build Ops
Naming Standard: boGgggggggg
Description: A Build Op name starts with "bo", and "Ggggggggg" is a short description of its function. Build Ops are custom operators developed in C++. They can be used to implement specialist functionality that cannot be easily developed using a combination of the standard operators available in the GUI.

DataStage Object: Job Parameters
Naming Standard: all job parameters have a lowercase "p" prefix; the rest of the parameter name is upper case.

Parameter                            Description
pORACLE_SID / pDB2_SID               SID for Oracle/DB2
pORACLE_USER / pDB2_USER             Username to connect to Oracle/DB2
pORACLE_PASSWORD / pDB2_PASSWORD     Password to connect to Oracle/DB2
pDSPATH                              Datasets directory
pTARGET_FILE_DIR                     Directory for saving sequential files
pEFFECTIVE_DATE                      Effective date for processing

DataStage Object: Dataset Names
Naming Standard: Dataset file-naming conventions are described in later sections. Note that, on the DataStage canvas, Dataset stages must have the same name as the filename, excluding the file extension and prefixed by "ds", i.e. ds<FunctionalArea><ModuleNumber><purpose>. E.g. dsEXT1000SourceAccountsData.

DataStage Object: Table Definitions
Metadata/table definitions are described in the next section.
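The dataset naming rule above can be captured in a tiny helper. This is a hypothetical illustration of the ds<FunctionalArea><ModuleNumber><purpose> convention, not a DataStage facility.

```python
# Hypothetical helper building a dataset name per the
# ds<FunctionalArea><ModuleNumber><purpose> convention described above.
def dataset_name(functional_area, module_number, purpose):
    return "ds{}{}{}".format(functional_area.upper(), module_number, purpose)

name = dataset_name("EXT", 1000, "SourceAccountsData")
print(name)  # dsEXT1000SourceAccountsData
```

The result matches the example in the standard, and the Dataset stage on the canvas should then carry the same name, minus the file extension.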
DataStage Object: Logic & Functions

Logical Statements: all logical statements should follow this standard: the first letter in upper case and the following letters in lower case. Example:
If (InputDate = '') Then Step1 Else Step2

SQL Statements: SQL statements used in the Database stage should follow this standard: the first letter of each keyword in upper case and the following letters in lower case. Example:
Select EmpNo, EmpName From Emp_Table Where EmpNo Is NOT NULL

Logical Operators: logical operators such as 'AND', 'OR', 'NOT' and 'NOR' should be in upper case.

DataStage Functions: DataStage functions should be in InitCaps; each word in the function should start with upper case. Examples: Trim(), Str(), IsNull(), NullToEmpty(), NullToZero(), NullToValue(), etc.

5.2 Table Definitions Naming Conventions

Table definitions define the format of the data to be used at each stage of a DataStage job. They are stored in the Repository and can be copied by all the jobs in a project. Table definitions are created in the following ways:
1. Imported from an external source or target, such as a database table, CSV file or CFD.
2. Saved from a user-defined set of columns from a Dataset, sequential file or fileset definition.
3. Manually created.
The original source metadata will be imported using the utility in DataStage Manager. These column definitions must not be changed.
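The SQL casing standard can be demonstrated end to end. This is illustrative only: the table contents are invented and SQLite stands in for the actual database, but since SQL keywords are case-insensitive the statement runs exactly as the standard writes it.

```python
import sqlite3

# Runs the example statement from the standard: statement keywords in
# InitCaps ("Select", "From", "Where", "Is") and the logical operator
# "NOT" in upper case. Table contents are invented for illustration.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("Create Table Emp_Table (EmpNo INTEGER, EmpName TEXT)")
cur.executemany("Insert Into Emp_Table Values (?, ?)",
                [(1, "Alice"), (None, "NoNumber"), (2, "Bob")])
cur.execute("Select EmpNo, EmpName From Emp_Table Where EmpNo Is NOT NULL")
rows = cur.fetchall()
print(rows)  # [(1, 'Alice'), (2, 'Bob')]
```

The casing carries no meaning to the database; the standard exists purely so that SQL embedded in Database stages is consistent and easy to read.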
05/02/2012 DataStage Development Standards and Guidelines Page 17 of 17 .\Datasets\staging – contains the table defs of datasets that are stored between jobs within a functional area. \Datasets\deliver .contains the table defs of datasets that are produced from the output of a functional area that are intended to be used by a downstream functional area. Staging datasets must not be used to pass data across functional areas or as direct source for data to be sent through connect:direct.