1. Explain the DataStage architecture.
Ans:- DataStage is an ETL tool: a client-server technology and integrated toolset used for designing, running, monitoring, and administering "data acquisition" applications known as jobs. A job is a graphical representation of the data flow from source to target, and it is designed with source definitions, target definitions, and transformation rules. The DataStage software consists of client and server components.
[Architecture diagram: the client components (DataStage Designer, DataStage Manager, DataStage Director, DataStage Administrator) communicate over TCP/IP with the server components (DataStage Server and DataStage Repository).]
When the DataStage software is installed on a PC, four client components automatically appear: DataStage Administrator, DataStage Designer, DataStage Director, and DataStage Manager.
DS client components:
1) DataStage Administrator: used to create or delete projects, clean up metadata stored in the repository, and install NLS.
2) DataStage Manager: used to perform tasks such as:
a) Create table definitions.
b) Perform metadata backup and recovery.
c) Create customized components.
3) DataStage Director: used to validate, schedule, run, and monitor DataStage jobs.
4) DataStage Designer: used to create the DataStage application known as a job. The following activities can be performed in the Designer window:
a) Create source definitions.
b) Create target definitions.
c) Develop transformation rules.
d) Design jobs.
DS server components:
DataStage Repository: the server-side component that stores the information needed to build the data warehouse.
DataStage Server: the server-side component that executes the jobs as we create and run them.
2. What is a job, and what are the types of jobs?
Ans:- A job is an ordered series of individual stages linked together to describe the flow of data from source to target. Three types of jobs can be designed:
a) Server jobs
b) Parallel jobs
c) Mainframe jobs
3. Have you worked on parallel jobs or server jobs?
Ans:- I have been working with parallel jobs for more than three years.
4. What is the difference between server jobs and parallel jobs?
Ans:-
Server jobs:
a) They handle smaller volumes of data, with lower performance.
b) They have fewer components.
c) Data processing is slow.
d) They work purely on SMP (Symmetric Multi-Processing) systems.
e) They rely heavily on the Transformer stage.
Parallel jobs:
a) They handle high volumes of data.
b) They work on parallel-processing concepts.
c) They apply parallelism techniques.
d) They follow MPP (Massively Parallel Processing); parallel jobs support hardware systems such as SMP and MPP to achieve parallelism.
e) They have more components compared to server jobs.
f) They work on the Orchestrate framework.
5. What is a configuration file? What is it used for in DataStage?
Ans:- It is a plain text file that holds the information about the processing and storage resources available for use during parallel job execution. The default configuration file has entries such as:
a) Node: a logical processing unit which performs all ETL operations.
b) Pools: named collections of nodes (or resources) that a job can be constrained to use.
c) Fastname: the server (host) name; our ETL jobs are executed using this name.
d) Resource disk: a permanent storage area which stores the repository components (datasets).
e) Resource scratch disk: a temporary storage area where staging operations (such as sorting) are performed.
(A sketch of such a file appears after question 7 below.)
6. What is a stage? Explain the various stages in DataStage.
Ans:- A stage defines a database, file, or processing operation. By behaviour there are two types of stages:
Active stages: stages which define data transformation and filtering.
Passive stages: stages which define read and write access only.
Stages are also divided into two types by origin:
a) Built-in stages: the standard stages that define extraction, transformation, and loading; these include all the database stages and all the processing stages in the Designer palette.
b) Plug-in stages: custom stages installed separately for tasks the built-in stages do not cover.
7. Explain parallelism techniques.
Ans:- Parallelism is the process of performing the ETL task in a parallel approach to build the data warehouse. There are two parallelism techniques:
A. Pipeline parallelism: the data flows continuously through the pipeline, and all stages in the job operate simultaneously. For example, if my source has 4 records, then as soon as the first record starts processing, the remaining records are processed simultaneously in the downstream stages.
B. Partition parallelism: the same job is effectively run simultaneously by several processors, each processor handling a separate subset of the total records. For example, if my source has 100 records and 4 partitions, the data is partitioned equally across the 4 partitions, so each partition gets 25 records; whenever the first partition starts, the remaining three partitions start processing simultaneously and in parallel.
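For illustration, here is a minimal sketch of what such a configuration file looks like, in the standard APT_CONFIG_FILE syntax. The host name and disk paths are invented placeholders, not values from any real project.

```
{
  node "node1"
  {
    fastname "etl_host"
    pools ""
    resource disk "/data/ds/pds_files/node1" {pools ""}
    resource scratchdisk "/data/ds/scratch/node1" {pools ""}
  }
  node "node2"
  {
    fastname "etl_host"
    pools ""
    resource disk "/data/ds/pds_files/node2" {pools ""}
    resource scratchdisk "/data/ds/scratch/node2" {pools ""}
  }
}
```

Here node defines two logical processing units on the same host (fastname), resource disk is the permanent dataset storage, and resource scratchdisk is the temporary staging area; with two nodes defined, a partitioned job runs two-way parallel.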
8. What are the data partitioning techniques in DataStage?
Ans:- In data partitioning, the data splits into various partitions distributed across the processors. The data partitioning techniques are:
a) Auto: the default partitioning technique; DataStage determines the best partition method to use depending upon the type of stage.
b) Hash: the records with the same values for the hash-key field are given to the same processing node.
c) Modulus: the partition is based on the key column value modulo the number of partitions; it is similar to hash partitioning.
d) Random: the records are randomly distributed across all processing nodes.
e) Range: related records are distributed to the same node; the range is specified based on a key column.
f) Round robin: the first record goes to the first processing node, the second record goes to the second processing node, and so on. This method is useful for creating equal-sized partitions.
g) Same: the records keep the partitioning they already have from the previous stage.
(A short code sketch of these methods follows the Sequential File stage description below.)
9) Explain each of the file stages.
File stages. Note: all file stages are passive stages, meaning they define just read or write access.
Sequential File stage: a file stage which can be used to read data from a file or write data to a file. It can support a single input link or a single output link, as well as a reject link.
[Screenshot: Sequential File stage properties.]
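As promised under question 8, here is a minimal Python sketch (not DataStage code) of how hash, modulus, and round-robin partitioning assign records to nodes. The record layout and the node count of 4 are invented for the example.

```python
# Minimal sketch of three partitioning methods from question 8.
# Each function maps a record (or record index) to one of n partitions.

def hash_partition(record, key, n):
    # Records with the same key value always land on the same node.
    return hash(record[key]) % n

def modulus_partition(record, key, n):
    # Like hash, but uses the numeric key value directly (key must be an integer).
    return record[key] % n

def round_robin_partition(index, n):
    # Record i goes to node i % n, giving equal-sized partitions.
    return index % n

records = [{"empno": i, "dept": i % 3} for i in range(100)]  # 100 sample records

# Round robin over 4 partitions, as in the 100-records/4-partitions example above.
parts = [[] for _ in range(4)]
for i, rec in enumerate(records):
    parts[round_robin_partition(i, 4)].append(rec)
print([len(p) for p in parts])  # -> [25, 25, 25, 25]

# Hash partitioning: every record of a given dept lands in the same partition.
by_hash = [[] for _ in range(4)]
for rec in records:
    by_hash[hash_partition(rec, "dept", 4)].append(rec)
```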
Dataset stage: another file stage; it can be used to store data in DataStage's internal format, which is tied to the operating system, so it takes less time to read or write the data.
[Screenshot: Dataset stage properties.]
File Set stage: another file stage; it can be used to read or write data as a file set. The file is saved with the extension ".fs", and the stage operates in parallel.
[Screenshot: File Set stage properties.]
10) What is the exact difference between a Dataset and a File Set?
Ans:- A dataset is in the internal format of DataStage. The main points to consider about a dataset before using it are:
1) It stores data in binary in DataStage's internal format, so it takes less time to read from or write to a dataset than to any other source or target.
2) It preserves the partitioning scheme, so you don't have to partition the data again.
3) You cannot view the data without DataStage.
About a file set:
1) It stores data in a format similar to a sequential file.
2) Its only advantage over a sequential file is that it preserves the partitioning scheme.
3) You can view the data, but in the order defined by the partitioning scheme.
Q) Why do we need datasets rather than sequential files?
When you use a sequential file as a source, at compilation time it is converted from ASCII to the native (internal) format, whereas with datasets no conversion is required. By default, sequential files are processed in sequence only, they can accommodate only up to 2 GB, and they do not support NULL values. All of the above can be overcome by using the Dataset stage: the sequential file is used to extract from and load into flat files (with the 2 GB limit), while the dataset is an intermediate stage that is processed in parallel when data is loaded into it, which improves performance. On the other hand, if you want to capture rejected data, you need to use a Sequential File or File Set stage; with a sequential file we can collect the rejects (badly formatted records) by setting the reject property to save (the other options are continue and fail). The final selection depends on the requirement.
Complex Flat File stage: this stage is used to read data from mainframe files. By using CFF we can read ASCII or EBCDIC (Extended Binary Coded Decimal Interchange Code) data, flatten the arrays found in COBOL files, and select only the required columns while omitting the rest.
[Screenshot: CFF stage properties.]
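As a side note on the EBCDIC point: outside DataStage, the same character-set conversion can be sketched in a few lines of Python using the standard-library cp037 codec (EBCDIC US/Canada). The sample bytes are invented.

```python
# Decoding EBCDIC bytes to a Python string, then re-encoding as ASCII.
# cp037 is the standard-library codec for EBCDIC (US/Canada).

ebcdic_bytes = bytes([0xC8, 0xC5, 0xD3, 0xD3, 0xD6])  # "HELLO" in EBCDIC cp037

text = ebcdic_bytes.decode("cp037")
print(text)                   # -> HELLO
print(text.encode("ascii"))   # -> b'HELLO'
```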
11) Explain the various types of processing stages.
Processing stages:
Aggregator stage: a processing stage which can be used to compute summaries for groups of input data. It can support a single input link, which carries the input data, and a single output link, which carries the aggregated data.
[Screenshot: double-click on the Aggregator stage to open its properties.]
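For illustration, a minimal Python sketch of what an Aggregator stage computes: grouping rows on a key and producing one summary row per group. The column names and values are invented.

```python
# Group input rows by "dept" and compute a count and total salary per group,
# roughly what an Aggregator stage does with a grouping key plus sum/count.
from collections import defaultdict

rows = [
    {"dept": 10, "salary": 3000},
    {"dept": 20, "salary": 4500},
    {"dept": 10, "salary": 2500},
]

groups = defaultdict(lambda: {"count": 0, "total_salary": 0})
for row in rows:
    g = groups[row["dept"]]
    g["count"] += 1
    g["total_salary"] += row["salary"]

for dept, summary in groups.items():
    print(dept, summary)   # one aggregated output row per group
```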
Copy stage: a processing stage which can be used simply to copy the input data to a number of output links. It can support a single input link and any number of output links.
[Screenshot: double-click on the Copy stage to open its properties.]
Filter stage: a processing stage which can be used to filter the data based on a given condition. It can support a single input link, any number of output links, and optionally one reject link.
[Screenshot: double-click on the Filter stage to open its properties.]
Switch stage: a processing stage which can be used to filter the input data based on given conditions. It can support a single input link and up to 128 output links.
[Screenshot: double-click on the Switch stage to open its properties.]
Q) What is the exact difference between the Filter stage and the Switch stage?
The functionality and responsibilities of both stages are the same; the difference is the way they execute. In the Filter stage we can give multiple conditions on multiple columns, but for every condition the data comes from the source system again, is filtered, and is loaded into the target. In the Switch stage we give multiple conditions on a single column; all the data comes from the source only once, and the Switch stage checks all the conditions and loads the matching rows into the targets.
Join stage: a processing stage which can be used to combine two or more input datasets based on a key field. It can support two or more input datasets and one output dataset, and it does not support a reject link. The Join stage can perform inner, left-outer, right-outer, and full-outer joins. An inner join displays the matched records from both side tables. A left-outer join shows the matched records from both sides, as well as the unmatched records from the left-side table. A right-outer join shows the matched records from both sides, as well as the unmatched records from the right-side table. A full-outer join shows the matched as well as the unmatched records from both sides.
[Screenshot: double-click on the Join stage to open its properties.]
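To make the join types concrete, here is a small Python sketch of inner and left-outer joins on a key field (right-outer and full-outer follow the same pattern with the sides swapped or combined). The table contents are invented.

```python
# Inner and left-outer join of two record lists on the key "deptno".

left  = [{"deptno": 10, "ename": "A"}, {"deptno": 20, "ename": "B"},
         {"deptno": 30, "ename": "C"}]
right = [{"deptno": 10, "dname": "SALES"}, {"deptno": 20, "dname": "HR"}]

right_by_key = {r["deptno"]: r for r in right}

inner, left_outer = [], []
for row in left:
    match = right_by_key.get(row["deptno"])
    if match:                                   # matched on both sides
        inner.append({**row, **match})
        left_outer.append({**row, **match})
    else:                                       # unmatched left-side record
        left_outer.append({**row, "dname": None})

print(inner)       # deptno 10 and 20 only
print(left_outer)  # deptno 30 kept, with dname = None
```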
Merge stage: a processing stage which can be used to merge multiple input datasets. It can support multiple input links; the first input link is called the "master input link" and the remaining links are called "update links". It can perform inner joins and left-outer joins only.
[Screenshot: double-click on the Merge stage to open its properties.]
Q) In which case is an inner join performed, and in which case a left-outer join?
The Merge stage has a property called Unmatched Masters Mode:
If Unmatched Masters Mode = Drop, it performs an inner join.
If Unmatched Masters Mode = Keep, it performs a left-outer join.
Lookup stage: a processing stage which can be used to look up values in relational (reference) tables. It can support multiple input links, a single output link, and a single reject link. This stage can perform inner joins and left-outer joins. A simple job illustrating the Lookup stage is shown in the diagram below.
[Screenshot: double-click on the Lookup stage to open its properties.]
Q) In which case does it perform an inner join, and in which case a left-outer join?
In the picture above there is a Constraints icon. Double-click on that icon and a window like the following appears:
[Screenshot: Lookup stage constraints window.]
The window has two options: Condition Not Met and Lookup Failure. By default, Lookup Failure = Fail. If Condition Not Met = Continue, the stage performs a left-outer join; if Condition Not Met = Drop, it performs an inner join. When the Continue option is set, the stage also supports the reject link.
Q) What is the main difference between Join, Merge, and Lookup?
They differ mainly in:
1. Input requirements
2. Treatment of unmatched records
3. Memory usage
1. Input requirements: Join supports two or more input links and a single output link, and does not support a reject link. Merge supports multiple input links and multiple output links, including reject links paired with the update links. Lookup supports multiple input links, a single output link, and one reject link. Join supports four types of joins (inner, left-outer, right-outer, and full-outer); Merge supports inner as well as left-outer only; Lookup likewise supports inner and left-outer only.
2. Treatment of unmatched records: Join does not capture any unmatched records because it does not support a reject link. Merge does not catch unmatched master records on the master link, but each unmatched update record goes to the corresponding update reject link. Lookup catches the unmatched primary records only, via its reject link.
3. Memory usage: if the reference dataset is larger than physical memory, go for the Join stage for better performance; if the reference dataset is smaller than physical memory, Lookup is recommended.
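A minimal Python sketch of why Lookup suits small reference data: the whole reference table is held in memory as a dictionary, and unmatched input rows can be rejected or passed through, mirroring the Condition Not Met behaviour. Data and column names are invented.

```python
# In-memory lookup: the entire reference table is loaded into a dict first,
# which is why Lookup is recommended only when the reference fits in memory.

reference = {10: "SALES", 20: "HR"}             # small reference dataset
stream = [{"deptno": 10}, {"deptno": 99}]        # primary input records

output, rejects = [], []
for row in stream:
    dname = reference.get(row["deptno"])
    if dname is not None:
        output.append({**row, "dname": dname})   # matched: enriched row
    else:
        rejects.append(row)                      # unmatched primary record
        # with "continue" semantics we would instead append
        # {**row, "dname": None} to output (left-outer behaviour)

print(output, rejects)
```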
Funnel stage: an active processing stage which can be used to combine multiple input datasets into a single output dataset.
Note: all the input datasets must have the same structure.
[Screenshot: Funnel stage properties.]
Remove Duplicates stage: a processing stage which can be used to remove duplicate data based on a key field. It can support a single input link and a single output link.
[Screenshot: double-click on the Remove Duplicates stage to open its properties.]
Sort stage: a processing stage which can be used to sort the data based on a key field, in either ascending or descending order. It can support a single input link and a single output link.
[Screenshot: Sort stage properties.]
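For illustration, a short Python sketch of the Sort and Remove Duplicates pair: sort on the key, then keep one record per key value, matching the usual practice of sorting before duplicate removal. The data is invented.

```python
# Sort on a key, then drop duplicate keys, keeping the first record per key --
# the typical Sort stage -> Remove Duplicates stage pattern.

rows = [{"empno": 2, "name": "B"}, {"empno": 1, "name": "A"},
        {"empno": 2, "name": "B2"}]

rows.sort(key=lambda r: r["empno"])        # Sort stage (ascending on empno)

deduped, last_key = [], object()
for row in rows:
    if row["empno"] != last_key:           # key change -> keep this record
        deduped.append(row)
        last_key = row["empno"]

print(deduped)  # empno 1 and the first empno 2 record
```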
Modify stage: a processing stage which can be used for null handling and data-type changes; we can also make some modifications to column lengths. It is used to change data types: for example, if the source contains a varchar column and the target expects an integer, we use the Modify stage and change the type according to the requirement.
[Screenshot: Modify stage properties.]
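A small Python sketch of the two Modify-stage tasks just described: converting a varchar (string) column to an integer, and substituting a default for NULLs. The column name and the default value are assumptions for the sketch.

```python
# Type conversion plus null handling, as a Modify stage might do:
# varchar "salary" -> integer, with NULL (None) replaced by a default of 0.

rows = [{"salary": "3000"}, {"salary": None}, {"salary": "4500"}]

converted = [
    {"salary": int(r["salary"]) if r["salary"] is not None else 0}
    for r in rows
]

print(converted)  # [{'salary': 3000}, {'salary': 0}, {'salary': 4500}]
```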
Pivot stage: a processing stage. Many people have the following misconceptions about the Pivot stage:
1) It converts rows into columns.
2) By using a Pivot stage, we can convert 10 rows into 100 columns and 100 columns into 10 rows.
3) (You can add more points here!)
Let me first tell you that a Pivot stage only CONVERTS COLUMNS INTO ROWS and nothing else; it doesn't do the reverse. Some DS professionals refer to this as NORMALIZATION. Another fact about the Pivot stage is that it is irreplaceable, i.e., no other stage has this functionality of converting columns into rows; that is what makes it unique. Let's cover how exactly it does it. For example, take a file with the following fields: sno, sname, m1, m2, m3. You would use a Pivot stage when you need to convert the three fields m1, m2, m3 into a single field, marks, which contains one value per row.
You would need the following output:
[Screenshot: Pivot stage properties and sample output.]
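A minimal Python sketch of the columns-to-rows conversion described above, using the sno, sname, m1, m2, m3 example from the text. The sample values are invented.

```python
# Pivot (normalize): turn the columns m1, m2, m3 into one "marks" column,
# producing one output row per input column value.

rows = [{"sno": 1, "sname": "ravi", "m1": 60, "m2": 70, "m3": 80}]

pivoted = [
    {"sno": r["sno"], "sname": r["sname"], "marks": r[col]}
    for r in rows
    for col in ("m1", "m2", "m3")
]

for p in pivoted:
    print(p)
# {'sno': 1, 'sname': 'ravi', 'marks': 60}
# {'sno': 1, 'sname': 'ravi', 'marks': 70}
# {'sno': 1, 'sname': 'ravi', 'marks': 80}
```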
Surrogate Key Generator stage: an important processing stage which can be used to generate sequence numbers while implementing slowly changing dimensions. It produces a system-generated key on dimension tables.
[Screenshot: Surrogate Key Generator stage properties.]
Q) What is the difference between a primary key and a surrogate key?
A surrogate key is an artificial identifier for an entity; surrogate keys are generated by the system sequentially. A primary key is a natural identifier for an entity; its values are entered manually, they uniquely identify each row, and there is no repetition of data.
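As a sketch of what the Surrogate Key Generator produces, here is a Python fragment assigning sequential system-generated keys to incoming dimension rows. The column names and the starting value are assumptions.

```python
# Assign a sequential surrogate key to each incoming dimension record,
# independent of the natural (primary) key carried in the data.
from itertools import count

next_key = count(start=1)          # assumed starting value

dim_rows = [{"cust_id": "C-100"}, {"cust_id": "C-200"}]
keyed = [{"cust_sk": next(next_key), **row} for row in dim_rows]

print(keyed)
# [{'cust_sk': 1, 'cust_id': 'C-100'}, {'cust_sk': 2, 'cust_id': 'C-200'}]
```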
Transformer stage: an active processing stage which allows filtering the data based on a given condition and can derive new data definitions by developing expressions. It can have a single input link, any number of output links, and also a reject link. The Transformer stage can perform data-cleaning and data-scrubbing operations. The stage generates code that is built with an external C++ compiler at job compilation (on Windows installations, Microsoft's compiler environment is used).
In the Transformer editor there are stage variables, derivations, and constraints:
Stage variable: an intermediate processing variable that retains its value during a read and does not pass its value into a target column.
Derivation: an expression that specifies the value to be passed on to the target column.
Constraint: a condition that evaluates to true or false and specifies the flow of data on a link.
Q) How can I define stage variables in the Transformer stage?
Click Stage Properties in the Transformer stage and a window appears; click on the Constraints icon to open the constraints window.
[Screenshots: Transformer stage properties and constraints windows.]
Q) What is the order of execution in the Transformer stage?
The order of execution is:
1) Stage variables
2) Constraints
3) Derivations
Change Capture stage: an active processing stage which can be used to capture the changes between two sources, called the after and before datasets. The source we are examining for changes is called the before dataset; the source used as the reference to capture the change is called the after dataset. A change code is added to the output dataset; by this change code we can recognize whether a record was deleted, inserted, or updated.
[Screenshot: Change Capture stage properties.]
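A small Python sketch of what the Change Capture stage emits: comparing the before and after datasets on a key and attaching a change code. Conventionally the stage uses 1 for insert, 2 for delete, and 3 for edit/update (plus 0 for unchanged copies); the sample data is invented.

```python
# Compare before/after datasets on the key "id" and attach a change_code:
# 1 = insert (new in after), 2 = delete (gone from after), 3 = edit (changed).

before = {1: {"name": "A"}, 2: {"name": "B"}}
after  = {1: {"name": "A2"}, 3: {"name": "C"}}

changes = []
for key, row in after.items():
    if key not in before:
        changes.append({"id": key, **row, "change_code": 1})   # insert
    elif row != before[key]:
        changes.append({"id": key, **row, "change_code": 3})   # edit/update
for key, row in before.items():
    if key not in after:
        changes.append({"id": key, **row, "change_code": 2})   # delete

print(changes)
```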
15) Explain the development and debug stages.
Development and debug stages:
Row Generator stage: produces a set of data fitting the specified metadata. It is useful when you want to test your job but have no real data available to process. It has no input links and a single output link.
[Screenshot: Row Generator stage properties.]
Column Generator stage: adds columns to the incoming data and generates mock data for these columns for each data row processed. It can have a single input link and a single output link.
[Screenshot: Column Generator stage properties.]
[Sample job for the Column Generator stage: screenshots of the job, its input data, and its output data.]
Head stage: helpful for testing and debugging applications with large datasets. This stage selects the TOP N rows from the input dataset and copies the selected rows to an output dataset. It can have a single input link and a single output link.
[Screenshot: Head stage properties.]
Tail stage: also helpful for testing and debugging applications with large datasets. This stage selects the BOTTOM N rows from the input dataset and copies the selected rows to an output dataset. It can have a single input link and a single output link.
[Screenshot: Tail stage properties.]
Sample stage: this stage has a single input link and any number of output links when operating in percent or period mode.
[Screenshot: Sample stage properties.]
Peek stage: it can have a single input link and any number of output links. It can be used to print the record column values to the job log.
[Screenshot: Peek stage properties.]
[Screenshot: Peek stage job with mock data.]
21) Can you explain a Type-2 implementation?
Ans:- SCD Type-2 addresses a common problem in data warehousing: maintaining the history information for a particular organization in the target. For every update in the source, a new record is inserted in the target. The implementation works as follows. There are two input datasets, the before and after datasets; for example, the source has an EMP table with 100 records. These two are connected to a Change Capture stage, which compares the two input datasets; the Change Capture stage is connected to a Transformer stage that has two output links, an insert link and an update link. The first time the job runs there is no change in the records, so the Change Capture stage gives change code = 1, and the records are initially loaded into the target through the insert link. The insert link is connected to a stored-procedure stage, which generates a sequence number for the records, and then to a Transformer stage that loads them into the target insert stage (TGT_INSERT). If any update occurs at the source level, the Change Capture stage gives change code = 3; using this change code, the Transformer stage routes those records through the update link to a Join stage. The Join stage joins the updated records with the target records, duplicate records are removed using a Remove Duplicates stage, and the output of the Join stage is connected to a Transformer stage that loads the update records into the target update stage (TGT_UPDATE). In this way, for every update in the source a new record is stored on the target side, preserving history.
17) What are environment variables?
Basically, an environment variable is a predefined variable that we can use while creating a DS job. We create/declare these variables in the DS Administrator, and while designing the job we set the properties for these variables. Environment variables are also called global variables. There are two types of variables:
1. Local variables
2. Environment variables (global variables)
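A condensed Python sketch of the Type-2 flow just described: change codes from the change-capture comparison drive inserts of new rows, and updates insert a fresh version with a new surrogate key. The column names, key counter, and current-flag convention are assumptions for the sketch.

```python
# SCD Type-2 core: inserts (code 1) get a new surrogate key; updates (code 3)
# expire the current target row and insert a new version, preserving history.
from itertools import count

sk = count(start=1)                       # surrogate key generator (assumed start)
target = []                               # target dimension table

def apply_change(row, change_code):
    if change_code == 3:                  # update: close off the current version
        for t in target:
            if t["emp_id"] == row["emp_id"] and t["current"]:
                t["current"] = False
    # both inserts and updates add a new current version with a fresh key
    target.append({"emp_sk": next(sk), **row, "current": True})

apply_change({"emp_id": 7, "city": "Pune"}, 1)    # initial load
apply_change({"emp_id": 7, "city": "Delhi"}, 3)   # source update -> new version

for t in target:
    print(t)
```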
Local variables: defined in a particular job and valid only for that job.
Environment variables: available in any job throughout the project; there are some default variables, and we can also define user-defined variables. For example, to connect to a database you need a user id, password, and schema. These are constant throughout the project, so they are created as environment variables. If there is any change in the password or schema, there is no need to worry about all the jobs: change the value at the environment-variable level and that will take care of all the jobs.
Creating project-specific environment variables:
1. Start DataStage Administrator.
2. Choose the project and click the "Properties" button.
3. On the General tab, click the "Environment..." button.
4. Click on the "User Defined" folder to see the list of job-specific environment variables.
18) Explain job parameters.
There is an icon in the toolbar to open Job Parameters, or you can press Ctrl+J to open the Job Parameters dialog box. Enter a parameter name and a corresponding default value for it, then use the parameter wherever you want with the #Parameter# notation. This helps you enter the value when you run the job. It is not necessary to always open the job to change a parameter value: when the job runs through a script, it is enough to give the parameter value on the command line of the script; otherwise you would have to change the value in the job, compile, and then run. So it is easy for users to handle jobs using parameters.
20) What are the differences between DataStage 7.5.2 and 8.0.1?
The main differences between DataStage 7.5.x and 8.0.1 are:
1. In 7.5.2 we have the Manager as a client; in 8.0.1 we don't have a Manager client, as the Manager client is embedded in the Designer client.
2. In 7.5.2 we required only operating-system authentication; in 8.0.1 we require both operating-system and DataStage authentication.
3. In 7.5.2 code and metadata are stored in a file-based system; in 8.0.1 code is file-based whereas metadata is stored in a database.
4. In 7.5.2, when a developer opens a particular job and another developer wants to open the same job, that job cannot be opened; in 8.0.1 the second developer can open the same job as a read-only job.
5. In 7.5.2 we don't have range lookup; in 8.0.1 we have range lookup.
6. In 7.5.2 a single Join stage can't support multiple references; in 8.0.1 a single Join stage can support multiple references.
7. In 7.5.2 QualityStage has a separate designer; in 8.0.1 QualityStage is integrated in the Designer.
8. In 8.0.1 the surrogate key is incremented automatically: a state file is used to store the maximum value of the surrogate key, so the next run continues from that value. Automatic increment of the surrogate key is not available in 7.5.2: the first time a job runs, surrogate keys are generated from the initial value to n, but the next time the same job is compiled and run, the keys are generated again from the initial value to n.
9. In 8.0.1 quick find and advanced find features are available; in 7.5.2 they are not.
10. In 8.0.1 a compare utility is available to compare two jobs, for example one in development and another in production; in 7.5.2 it is not possible.
Q) Explain METASTAGE in DS 8.0.1.
It is used to handle the metadata, which is very useful for data lineage and data analysis later on. Metadata is the type of data we are handling. These data definitions are stored in the repository and can be accessed with the use of MetaStage.
Q) How did you handle rejected data?
A reject link is defined and the reject data is loaded back into the DWH. Rejected data is typically bad data, like duplicates of primary keys or null rows where data is expected. A reject link has to be defined on every output link for which you wish to collect rejected data.
Q) Do you know about the INTEGRITY/QUALITY stage?
QualityStage can be integrated with DataStage. In QualityStage we have many stages, like Investigate, Match, and Survivorship; with these we can do the quality-related work, and we need the QualityStage plug-in to integrate it with DataStage.
Q) How can I call a stored procedure in DataStage?
There is a stage named Stored Procedure available in the DataStage palette under the Database category. You can use that stage to call your procedure in DataStage jobs.
Q) What are routines, and where/how are they written? Have you written any routine before?
I did not use routines at any time in my project, but I know about them. Routines are stored in the Routines branch of the DS Repository, where you can create, view, or edit them. The following are the different types of routines:
1) Transform functions
2) Before/after subroutines
3) Job control routines
Q) What is job control? How is it developed? Explain with steps.
Job control means controlling DataStage jobs through some other DataStage job. For example, consider two jobs, XXX and YYY. Job YYY can be executed from job XXX by using DataStage macros in routines. The following steps need to be followed in the routine (a sketch follows):
1. Attach the job using the DSAttachJob function.
2. Run the other job using the DSRunJob function.
3. Stop the job using the DSStopJob function.
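A minimal sketch of such a job control routine in DataStage BASIC, using the macro functions named above; the job name and the error handling are illustrative, not taken from a real project.

```
* Job control routine sketch: run job "YYY" from a controlling job.
* DSJ.ERRFATAL aborts this routine if the attach fails.

hJob = DSAttachJob("YYY", DSJ.ERRFATAL)

* Start the job with its default run options.
ErrCode = DSRunJob(hJob, DSJ.RUNNORMAL)

* Wait for it to finish, then check its status.
ErrCode = DSWaitForJob(hJob)
Status  = DSGetJobInfo(hJob, DSJ.JOBSTATUS)

If Status = DSJS.RUNFAILED Then
   * DSStopJob requests a running job to stop; shown here per step 3 above.
   ErrCode = DSStopJob(hJob)
End

ErrCode = DSDetachJob(hJob)
```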
Q) How do you kill a job in DataStage?
By killing the corresponding process ID.
Q) How do you eliminate duplicate rows?
Duplicates can be eliminated by loading the corresponding data into a hash file: specify the columns on which you want to eliminate duplicates as the keys of the hash file, so records with the same key overwrite one another.
……………………………………………………………ALL THE BEST…………………………………………………………………………….