
What is the Compiler in Datastage | Compilation process in Datastage

Compilation is the process of converting the GUI job design into code the engine can execute, that is, into a machine-understandable form. During this process it checks all the link requirements and mandatory stage property values, and reports any logical errors. The compiler produces OSH code.

What is Modeling in Datastage | Data Modeling


Modeling is the logical and physical representation of a source system. There are two popular modeling tools: ERWIN and ER-STUDIO. The source system is usually described by an ER model, while the target system has both an ER model and a dimensional model. Dimension: a table designed from the client's perspective; the data in a dimension table can be viewed in many ways. There are two modeling approaches: Forward Engineering (F.E) and Reverse Engineering (R.E). F.E is the process of building the model from scratch. Ex: a bank that requires a new datawarehouse. R.E is the process of altering an existing model, for example reusing one bank's model for another bank.

Advantages of Datamart
A Datamart is the access layer of the datawarehouse environment, created so that users can retrieve their data faster. The Datamart is a subset of the Datawarehouse: all the data available in the Datamart is also available in the Datawarehouse. A Datamart is created for a specific business area (for example a telecom database or a banking database). There are many reasons to create a Datamart, and it brings several advantages: it is easy to access frequently needed data when required by the client; access can be given to a specific group of users; performance is good; it is easy to create and maintain; it is focused on a specific business; and it is far cheaper to create a Datamart than to build a full Datawarehouse with huge space.

What are the types of Errors in Datastage?


You may get many errors in Datastage while compiling or running jobs. Some of them are as follows:
a) Source file not found: you are trying to read a file that does not exist under that name.
b) Fatal errors.
c) Data type mismatches: occur when the data types in the job do not match.
d) Field size errors.
e) Metadata mismatch.
f) Data type size differs between source and target.
g) Column mismatch.
h) Process time out: can occur when the server is busy.

What are the client components in Datastage 7.5X2 Version


In Datastage 7.5X2 there are 4 client components:
1) Datastage Designer
2) Datastage Director
3) Datastage Manager
4) Datastage Admin

In the Designer, we create, compile, and run jobs.
In the Director, we can view jobs, view logs, batch jobs, unlock jobs, schedule jobs, monitor jobs, and handle messages.
In the Manager, we can import and export jobs and configure nodes.
Using the Admin, we can create, organize, and delete projects.

Server Components of Datastage 7.5x2 Version


There are three architecture (server) components in Datastage 7.5x2:
a) Repository
b) Server (Engine)
c) Datastage Package Installer

Repository: the environment where we create, design, compile, and run jobs. Among the components it contains are jobs, table definitions, shared containers, and routines.
Server (Engine): runs the executable jobs that extract, transform, and load data into a datawarehouse.
Datastage Package Installer: a user interface used to install packaged Datastage jobs and plug-ins.

Features of Datastage | Datastage Features
Datastage features:
1) Any to Any (any source to any target)
2) Platform independent
3) Node configuration
4) Partition parallelism
5) Pipeline parallelism

1) Any to Any: Datastage can extract data from any source and load it into any target.
2) Platform independent: a job developed on one platform can run on any other platform; for example, a job designed for uniprocessor-level processing can also run on an SMP machine.
3) Node configuration: a technique to create logical CPUs; a node is a logical CPU.
4) Partition parallelism: a technique for distributing the data across the nodes based on partition techniques. Partition techniques are a) key based and b) key less. The key based techniques are 1) Hash 2) Modulus 3) Range 4) DB2. The key less techniques are 1) Same 2) Entire 3) Round Robin 4) Random.
5) Pipeline parallelism: extraction, transformation, and loading occur simultaneously.
Re-partitioning: the redistribution of already distributed data.
Reverse partitioning: reverse partitioning is called collecting. The collecting methods are Ordered, Round Robin, Sort Merge, and Auto.
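As a rough illustration of pipeline parallelism (a sketch only, not how the parallel engine is actually implemented), Python generators let extraction, transformation, and loading proceed row by row at the same time instead of stage after stage:

def extract():
    # Pretend source rows; a real job would read from a file or database.
    for row in ["10,sam", "20,tom", "30,emy"]:
        yield row

def transform(rows):
    for row in rows:
        dept, name = row.split(",")
        yield (int(dept), name.upper())

def load(rows):
    for row in rows:
        print("loaded", row)   # pretend target write

# Rows stream through all three stages concurrently, one at a time.
load(transform(extract()))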

What is OLTP? Uses of OLTP?


OLTP stands for Online Transaction Processing. An OLTP system is characterized by a large number of short online transactions. The main emphasis of an OLTP system is very fast query processing, so that end users get their data quickly. OLTP systems also maintain data integrity in multi-access environments, and their effectiveness is measured by the number of transactions per second.

How does Hash Partition work in Datastage


The hash partition technique sends rows with the same key column values to the same partition. In Datastage, partition techniques are divided into two types:
a) Key based partition techniques: Hash, Modulus, Range, DB2
b) Key less partition techniques: Same, Entire, Round Robin, Random

Example of the hash partition technique. Suppose we hash-partition on dept_no with the sample data below:
e_id,dept_no
1,10
2,10
3,20
4,20
5,30
6,40

The data is partitioned as follows: the 1st partition gets the two dept_no 10 rows, the 2nd partition the two dept_no 20 rows, the 3rd partition the dept_no 30 row, and the 4th partition the dept_no 40 row.
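A minimal Python sketch of the same idea (illustrative only; with a real hash function, two different key values may of course land in the same partition):

rows = [(1, 10), (2, 10), (3, 20), (4, 20), (5, 30), (6, 40)]
num_partitions = 4   # assume a 4-node configuration

# Rows with equal dept_no always hash to the same partition.
partitions = [[] for _ in range(num_partitions)]
for e_id, dept_no in rows:
    partitions[hash(dept_no) % num_partitions].append((e_id, dept_no))

for i, part in enumerate(partitions):
    print("partition", i, ":", part)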

How to create group id in Sort Stage in Datastage


Group ids can be created in two different ways, using:
a) Key Change Column
b) Cluster Key Change Column
Both options are used to create group ids. When we select either option and set it to True, it creates the group ids group wise: the data is divided into groups based on the key column, and the stage outputs 1 for the first row of every group and 0 for the rest of the rows in each group. Which option to use depends on the data coming from the source: if the incoming data is not sorted, we use Key Change Column to create group ids; if the incoming data is already sorted, we use Cluster Key Change Column.
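A small Python sketch of the key-change logic (the input here is assumed to already be grouped on the key, as the Sort stage guarantees):

rows = [("10", "siva"), ("10", "ram"), ("20", "tom"), ("20", "tiny"), ("30", "emy")]

prev_key = None   # sentinel; assumes no real key is None
for key, name in rows:
    key_change = 1 if key != prev_key else 0   # 1 on the first row of each group
    print(key, name, key_change)
    prev_key = key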

How to do sorting without a Sort stage


You can do this within a normal job. To read the data with a Sequential File stage, design the job as follows: Seq. File ------------> Dataset. To read the data in the Sequential File stage, open its properties and give the file name; you can give the file path by clicking Browse for the file. In Options select True (if the first line is column names); you can leave the rest of the options as they are. Now go to Columns, click Load, and select the table definition of the file you want to read (this must be the same file you gave in the properties). In the target Dataset stage, give the file name. Now for the sorting: open the Dataset stage properties and go to Partitioning. Select the partitioning type as Hash, select the key column (e_id, for example) from the available columns, and tick Perform Sort. Click OK, compile, and run. The data will be sorted in the target.
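In effect the link does a hash partition followed by a sort inside each partition, something like this Python sketch (per-partition order only, not a single global order):

rows = [(3, "sam"), (1, "tom"), (4, "emy"), (2, "tiny")]
num_partitions = 2

# Hash partition on the key column e_id.
partitions = [[] for _ in range(num_partitions)]
for row in rows:
    partitions[hash(row[0]) % num_partitions].append(row)

# Perform Sort: each partition is sorted on the key.
for part in partitions:
    part.sort(key=lambda r: r[0])
print(partitions)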

How to remove rows with special characters and load the rest of the data
Here we are going to see how to remove rows containing special characters and load the rest of the rows into the target. Sometimes we get data in which special characters have crept into some rows. If we want to remove the rows in which a column contains special characters, we can use the Alpha function: it drops the rows containing special characters and loads the remaining rows into the target. Take a Sequential File stage to read the data and a Transformer stage to apply the business logic. In the Transformer stage, write the Alpha function in the constraint, then drag and drop the columns into the target. Then compile and run.

Alpha Function
Checks if a string is alphabetic. If NLS is enabled, the result of this function is dependent on the current locale setting of the Ctype convention

Syntax
Alpha (string)

string is the string or expression you want to evaluate.


Remarks
Alphabetic strings contain only the characters a through z or A through Z. Alpha returns 1 if the string is alphabetic, a null value if the string is a null value, and 0 otherwise.

Examples
These examples show how to check that a string contains only alphabetic characters:
Column1 = "ABcdEF%"                               ;* the "%" character is non-alpha
Column2 = (If Alpha(Column1) Then "A" Else "B")   ;* Column2 set to "B"

Column1 = ""                                      ;* note that the empty string is non-alpha
Column2 = (If Alpha(Column1) Then "A" Else "B")   ;* Column2 set to "B"
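For comparison, a rough Python analogue of the Alpha-based constraint (note that Python's str.isalpha accepts any Unicode letter, which is broader than the a-z/A-Z check described above):

rows = [
    {"e_id": "1", "name": "sam"},
    {"e_id": "2", "name": "t#om"},   # contains a special character
    {"e_id": "3", "name": "emy"},
]

def alpha(s):
    # 1 if the string is non-empty and purely alphabetic, else 0.
    return 1 if s.isalpha() else 0

target = [r for r in rows if alpha(r["name"])]
print(target)   # the "t#om" row is dropped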

DataStage Functions
Following is the list of DataStage String Functions

Function             Usage
AlNum                Can be used to check if the given string has alphanumeric characters
Alpha                TRUE if the string is completely alphabetic
CompactWhiteSpace    All consecutive whitespace is reduced to a single space
Compare              Compares two strings for sorting
CompareNoCase        Compares two strings irrespective of case
CompareNum           Compares the first n characters of the two strings
CompareNumNoCase     Compares the first n characters of the two strings irrespective of case
Convert              Replaces a character in a string with the given character
Count                Counts the number of times a given substring occurs in a string
Dcount               Returns the count of delimited fields in a string
DownCase             Changes all uppercase letters in a string to lowercase
DQuote               Encloses a string in double quotation marks
Field                Returns one or more delimited substrings
Index                Finds the starting character position of a substring
Left                 Finds the leftmost n characters of a string
Len                  Length of the string, i.e. the total number of characters in it
Num                  Returns 1 if the string can be converted to a number
PadString            Returns the string padded with the optional pad character and optional length
Right                Finds the rightmost n characters of a string
Soundex              Returns a string which identifies a set of words that are phonetically similar
Space                Returns a string of N space characters
Squote               Encloses a string in single quotation marks
Str                  Repeats a string
StripWhiteSpace      Returns the string after removing all whitespace
Trim                 Removes all leading and trailing spaces and tabs; also reduces internal occurrences of spaces and tabs to one
TrimB                Removes all trailing spaces and tabs
TrimF                Removes all leading spaces and tabs
TrimLeadingTrailing  Returns a string with leading and trailing whitespace removed
Upcase               Changes all lowercase letters in a string to uppercase
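A few of these have close Python analogues; the sketch below is only an aid for readers coming from Python, not part of DataStage itself:

import re

s = "  Hello,World,  DataStage  "

dcount = len("a,b,c".split(","))             # Dcount("a,b,c", ",")  -> 3
field  = "a,b,c".split(",")[1]               # Field("a,b,c", ",", 2) -> "b"
down   = s.lower()                           # DownCase(s)
left3  = s.strip()[:3]                       # Left(Trim(s), 3)
trim   = re.sub(r"[ \t]+", " ", s.strip())   # Trim(s): strip ends, squeeze runs
print(dcount, field, down, left3, trim)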

Surrogate Key in Datastage


A Surrogate Key is a unique identification key, used as an alternative to the natural key. A natural key may be an alphanumeric composite key, whereas a surrogate key is always a single numeric key. The Surrogate Key stage is used to generate key columns for which characteristics can be specified: it generates sequential, incremental, unique integers from a given start point. It can have a single input link and a single output link.
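A minimal sketch of the idea (the column name surrogate_id and the start point are illustrative):

from itertools import count

rows = [{"cust": "tom"}, {"cust": "renu"}, {"cust": "james"}]
key = count(start=1)   # sequential integers from a provided start point

for row in rows:
    row["surrogate_id"] = next(key)   # single numeric key per row
print(rows)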

What is Lookup stage? Uses of Look up stage?


The Lookup stage is a processing stage that performs lookup operations. It operates on a dataset read into memory from any other parallel job stage that can output data. The main use of the Lookup stage is to map short codes in the input dataset onto expanded information from a lookup table, which is then joined to the incoming data. For example, suppose we get data with customer names and addresses, where the state is identified by a two or three letter code, such as mel for Melbourne or syd for Sydney, but we want each row to carry the full name of the state. Defining the code as the key column, the Lookup stage reads each line and uses the key to look up the full value in the lookup table.

It adds the state to the new column defined for the output link, so the full state name is added to each row based on the code. If the code is not found in the lookup table, the record is rejected; in this way the Lookup stage can also be used to validate rows.

The Lookup stage is a processing stage which performs horizontal combining. It supports N inputs (for normal lookup) or 2 inputs (for sparse lookup), 1 output, and 1 reject link.

Up to Datastage version 7 we had only 2 types of lookup: a) Normal lookup and b) Sparse lookup. In Datastage version 8, enhancements added two more: c) Range lookup and d) Caseless lookup.

Normal lookup: all the reference records are copied into memory and the primary records are cross-verified against them.
Sparse lookup: each primary record is sent to the reference source and cross-verified with the reference records there. We go for a sparse lookup when the reference data is too large to hold in memory and the number of primary records is relatively small compared with the reference data.
Range lookup: performs range checking on selected columns. For example, if we want to check which range a salary falls into in order to find the grade of an employee, we can use a range lookup.

In the Lookup stage properties you have a Constraints option. If you click the Constraints button you get the lookup-failure options continue, drop, fail, and reject. If you select continue, a left outer join is performed; if you select drop, an inner join is performed.
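A hedged Python sketch of a normal lookup, with the reference data held in memory and unmatched rows sent to a reject link:

reference = {"mel": "melbourne", "syd": "sydney"}   # code -> full state name

input_rows = [
    {"name": "tom", "state": "mel"},
    {"name": "renu", "state": "syd"},
    {"name": "james", "state": "per"},   # no match: goes to the reject link
]

output, reject = [], []
for row in input_rows:
    full = reference.get(row["state"])
    if full is None:
        reject.append(row)               # lookup failure = reject
    else:
        output.append({**row, "full_state": full})
print(output)
print(reject)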

Aggregator Stage and Filter Stage with example


Suppose we have data as below:
table_a
dno,name
10,siva
10,ram
10,sam
20,tom
30,emy
20,tiny
40,remo
We need the records whose dno occurs multiple times to go to one target, and the records whose dno is not repeated to go to another target. Take the job design as:

Read and load the data in a Sequential File stage. In the Aggregator stage select Group = dno, Aggregation type = Count Rows, and Count output column = dno_count (user defined). In Output, drag and drop the required columns, then click OK. In the Filter stage, set the first Where clause to dno_count>1 with Output link = 0, and the second Where clause to dno_count<=1 with Output link = 1. Drag and drop the outputs to the two targets, give the target file names, then compile and run the job. You will get the required data in the targets.
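The same logic in a short Python sketch (counts per group, then route by count):

from collections import Counter

rows = [(10, "siva"), (10, "ram"), (10, "sam"), (20, "tom"),
        (30, "emy"), (20, "tiny"), (40, "remo")]

dno_count = Counter(dno for dno, _ in rows)          # Aggregator: count rows per dno

target1 = [r for r in rows if dno_count[r[0]] > 1]   # Filter: dno_count > 1
target2 = [r for r in rows if dno_count[r[0]] <= 1]  # Filter: dno_count <= 1
print(target1)   # dno 10 and 20 rows
print(target2)   # dno 30 and 40 rows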

Funnel Stage with Real time example


Sometimes we get data in multiple files that belong to the same bank's customer information. Then we need to funnel the tables to get the data from the multiple files into a single file (table). For example, suppose we have the data in the two files below:
xyzbank1
e_id,e_name,e_loc
111,tom,sydney
222,renu,melbourne
333,james,canberra
444,merlin,melbourne
xyzbank2
e_id,e_name,e_loc
555,flower,perth
666,paul,goldenbeach
777,raun,aucland
888,ten,kiwi
For the Funnel stage, take the job design as:

Read and load the data into two Sequential File stages. Go to the Funnel stage properties and select Funnel Type = Continuous Funnel (or any other type, according to your requirement). In Output, drag and drop the columns (remember: the source column structures should be the same). Then click OK, give a file name for the target dataset, and compile and run the job.
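A tiny Python sketch of a continuous funnel: rows from identically-structured sources are combined into one output, with no particular ordering guarantee:

from itertools import chain

xyzbank1 = [(111, "tom", "sydney"), (222, "renu", "melbourne")]
xyzbank2 = [(555, "flower", "perth"), (666, "paul", "goldenbeach")]

funneled = list(chain(xyzbank1, xyzbank2))   # same column structure required
print(funneled)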

Column Generator with sample data example


Column Generator is a development/generating stage, used to generate a column with sample data based on a user-defined data type. Take the job design as: Seq.File --------> Col.Gen --------> DataSet. Take the source data as:
xyzbank
e_id,e_name,e_loc
555,flower,perth
666,paul,goldencopy
777,james,aucland
888,cheffler,kiwi
To generate a column (for example unique_id): first read and load the data in the Sequential File stage. Go to the Column Generator stage properties and select Column Method = Explicit. In Column To Generate, give the column name (for example unique_id). In Output, drag and drop; go to Columns, write the column name, and if needed change the SQL type and length for unique_id. Then compile and run.
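A rough Python sketch of what the stage adds (the generated values here are a simple increment; a real Column Generator cycles through type-appropriate sample values):

from itertools import count

rows = [("555", "flower", "perth"), ("666", "paul", "goldencopy"),
        ("777", "james", "aucland"), ("888", "cheffler", "kiwi")]

gen = count(1)
with_unique_id = [row + (next(gen),) for row in rows]   # append generated unique_id
print(with_unique_id)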

Basic Job Example: Sequential File Stage to Dataset


This is a basic job for Datastage learners. It shows how to read data and how to load it into the target. To read the data with a Sequential File stage, design the job as follows: Seq. File ------------> Dataset. Open the Sequential File stage properties and give the file name; you can give the file path by clicking Browse for the file. In Options select True (if the first line is column names); you can leave the rest of the options as they are. Now go to Columns and click Load, then select the table definition of the file you want to read (this must be the same file you gave in the properties). In the target Dataset stage, give the file name. Now compile and run; that's it, you will get the output.
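A minimal Python sketch of the same read-then-load idea (the file names are examples; the first line is taken as the column names):

import csv

with open("source.txt", newline="") as src:
    rows = list(csv.DictReader(src))   # first line = column names

with open("target.txt", "w", newline="") as tgt:
    writer = csv.DictWriter(tgt, fieldnames=rows[0].keys())
    writer.writeheader()
    writer.writerows(rows)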

Transformer Stage to filter the data


Suppose our requirement is to filter the data department wise from the file below:
samp_tabl
1,sam,clerck,10
2,tom,developer,20
3,jim,clerck,10
4,don,tester,30
5,zeera,developer,20
6,varun,clerck,10
7,luti,production,40
8,raja,production,40
And our requirement is to get the target data as below:
In Target1 we need the dept 10 and dept 40 employees.
In Target2 we need the dept 30 employees.
In Target3 we need the dept 20 and dept 40 employees.
Take the job design as below:

Read and load the data in the source file. In the Transformer stage, drag and drop the data to the target tables and write the expressions in the constraints as below:
dept_no=10 Or dept_no=40 for target 1
dept_no=30 for target 2
dept_no=20 Or dept_no=40 for target 3
Click OK, give the file names at the targets, and compile and run the job to get the output.
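A quick Python sketch of these constraints; note a row can satisfy more than one constraint (dept 40 reaches both target 1 and target 3):

rows = [(1, "sam", "clerck", 10), (4, "don", "tester", 30),
        (5, "zeera", "developer", 20), (7, "luti", "production", 40)]

target1 = [r for r in rows if r[3] in (10, 40)]   # dept_no=10 Or dept_no=40
target2 = [r for r in rows if r[3] == 30]         # dept_no=30
target3 = [r for r in rows if r[3] in (20, 40)]   # dept_no=20 Or dept_no=40
print(target1, target2, target3, sep="\n")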

Multiple Join stages to join three tables


Suppose we have three tables to join, but no single key column shared by all the tables, so we cannot join them with one Join stage. In this case we can use multiple Join stages to join the tables. You can take sample data as below (soft_com_1 joins soft_com_2 on dept_no, and soft_com_2 joins soft_com_3 on loc_id):
soft_com_1
e_id,e_name,e_job,dept_no
001,james,developer,10
002,merlin,tester,20
003,jonathan,developer,10
004,morgan,tester,20
005,mary,tester,20
soft_com_2
dept_no,d_name,loc_id
10,developer,200
20,tester,300
soft_com_3
loc_id,add_1,add_2
200,melbourne,victoria
300,brisbane,queensland
Take the job design as below:

Read and load the data in three Sequential File stages. In the first Join stage, go to Properties, select the key column as dept_no, and select Join Type = Inner. Drag and drop the required columns in Output and click OK. In the second Join stage, go to Properties, select the key column as loc_id, and select Join Type = Inner. Drag and drop the required columns in Output and click OK. Give a file name to the target file; that's it, compile and run the job.
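The two-step join, sketched in Python with inner-join semantics at each step:

emp  = [("001", "james", "developer", 10), ("002", "merlin", "tester", 20)]
dept = {10: ("developer", 200), 20: ("tester", 300)}        # dept_no -> (d_name, loc_id)
loc  = {200: ("melbourne", "victoria"), 300: ("brisbane", "queensland")}

joined = []
for e_id, e_name, e_job, dept_no in emp:
    if dept_no in dept:                    # first Join stage: key dept_no, inner
        d_name, loc_id = dept[dept_no]
        if loc_id in loc:                  # second Join stage: key loc_id, inner
            add_1, add_2 = loc[loc_id]
            joined.append((e_id, e_name, e_job, dept_no, d_name, loc_id, add_1, add_2))
print(joined)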



Merge Stage Example


The Merge stage is a processing stage which performs horizontal combining. It is one of the stages that perform this operation, like the Join stage and the Lookup stage; the differences between them are the data volumes they are suited to and their input requirements. Example for the Merge stage, with the sample tables below:

MergeStage_Master
cars,ac,tv,music_system
BMW,avlb,avlb,Adv
Benz,avlb,avlb,Adv
Camray,avlb,avlb,basic
Honda,avlb,avlb,medium
Toyota,avlb,avlb,medium

MergeStage_Update1
cars,cooling_glass,CC
BMW,avlb,1050
Benz,avlb,1010
Camray,avlb,900
Honda,avlb,1000
Toyota,avlb,950

MergeStage_Update2
cars,model,colour
BMW,2008,black
Benz,2010,red
Camray,2009,grey
Honda,2008,white
Toyota,2010,skyblue

Take the job design as below:

Read and load the data into all the input files. In the Merge stage, take cars as the key column. In Output, drag and drop all the columns to the output file. Give a file name to the target/output file; if you want, you can add reject links (n-1, one per update link). Compile and run the job to get the required output.
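A compact Python sketch of the merge: master rows are matched with update rows on the key cars, and matched update columns are added to the output row:

master  = {"BMW": {"ac": "avlb", "tv": "avlb"}, "Benz": {"ac": "avlb", "tv": "avlb"}}
update1 = {"BMW": {"CC": 1050}, "Benz": {"CC": 1010}}
update2 = {"BMW": {"colour": "black"}, "Benz": {"colour": "red"}}

merged = []
for cars, cols in master.items():
    row = {"cars": cars, **cols}
    row.update(update1.get(cars, {}))   # unmatched update rows could feed reject links
    row.update(update2.get(cars, {}))
    merged.append(row)
print(merged)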

Look up stage with example


The Lookup stage is a processing stage used to perform lookup operations and to map short codes in the input dataset onto expanded information from a lookup table, which is then joined to the incoming data and output. For example, suppose we have the primary data below:
table1
e_id,ename,e_state
100,sam,qld
200,jammy,vic
300,tom,tas
400,putin,wa
table1Ref
e_state,full_state
qld,queensland
vic,victoria
Take the job design as below:

Read and load the two tables in Sequential File stages. Go to the Lookup stage and drag and drop the primary columns to the output. Join e_state from the primary table to e_state in the reference table, and drag and drop full_state to the output. In the properties select Lookup Failure = Drop, then click OK. Give the target file name, then compile and run the job.

What to choose: Join stage or Lookup stage in Datastage


How do we choose between the Join stage and the Lookup stage? We need to be careful when selecting the stages and think about the performance of the job before choosing: time is precious to the clients, so we should aim to run the job in as little time as possible and try our best to get good performance out of it.

Both the Join stage and the Lookup stage do the same thing: they combine the tables we have. So why was the Lookup stage introduced? The Lookup stage has some extra benefits that do not come with the Join stage:
a) The Lookup stage does not require the data to be sorted, whereas sorting is mandatory for the Join stage.
b) In the Lookup stage, columns with different column names can be joined, which is not possible in the Join stage, where the column names must match.
c) The Lookup stage supports reject links; if the requirement demands reject links we cannot go with the Join stage, because the Join stage does not support them.
d) The Lookup stage has an option to fail the job if a lookup fails, which is useful when the lookup is expected to succeed.

The Lookup stage keeps the reference data in memory, which yields better performance for smaller volumes of data. If you have a large amount of reference data, you need to go with the Join stage.
