
DATASTAGE INTERVIEW QUESTIONS

1. How do you run a job from the command prompt in UNIX?
A. dsjob -run -jobstatus projectname jobname

2. What are the command-line tools that import and export DataStage jobs?
A. dsimport.exe imports DataStage components, and dsexport.exe exports DataStage components.
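
A minimal shell sketch of running a job and checking its result; the install path, project name and job name here are assumptions, not values from the original answer:

#!/bin/sh
# Run a job, wait for it to finish, then report the dsjob exit code.
# DSHOME, project and job names below are illustrative placeholders.
DSHOME=/opt/IBM/InformationServer/Server/DSEngine
$DSHOME/bin/dsjob -run -jobstatus MyProject MyJob
RC=$?
# With -jobstatus, the exit code reflects the finishing status of the job
# (for example, a clean finish versus a finish with warnings).
echo "dsjob exit code: $RC"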

3. What are the data file and descriptor file in the Dataset stage?
A. The descriptor file and the data file together make up a dataset. The descriptor file contains the schema details and the address of the data, while the data file contains the data in native format (e.g. C:\IBM\InformationServer\Datasets\FILENAME.ds). There are also two more files related to datasets: the control file and the header file.

4. How do you populate source files?
A. There are many ways to populate them: one is by writing a SQL statement in Oracle; another is to generate the source file using the Row Generator stage.

5. How do you fix the error "OCI has fetched truncated data" in DataStage?
A. We can use the Change Capture stage to identify the truncated data. It captures not only truncated data but also duplicates, edits, inserts and unwanted data, i.e. every difference between the before and after data sets. OCI = Oracle Call Interface.

6. What is merge and how can it be done? Please explain with a simple example using two tables.
A. Merge is used to join two tables. It takes the key columns and sorts them in ascending or descending order. Consider two tables, Emp and Dept. If we want to join them, DeptNo is the common key, so we give that column as the key, sort DeptNo in ascending order, and join the two tables.

7. What is the difference between DataStage and Informatica?
A. The main differences are: DataStage has partitioning, parallelism, file lookup and Merge/Funnel stages. Informatica has no comparable concept of partitioning and parallelism, its file-lookup caching is poor (lookups against other tables take much longer), and there is no Merge/Funnel option, only Union and Union All.

* I have one source file (asked by TCS):

Eid ename
101 rao
---
102 ramu
----
103 eswar
----

How can I remove the underline (dash) rows so that the file is in normal format? Write a query or implement it in DataStage.
A. You can use a UNIX command such as: sed '/^-/d' file_name1 >> file_name2
Here file_name1 is the source file and file_name2 is the target file where you want to store the output.

* What is the difference between symmetric multiprocessing and massively parallel processing?
A. SMP is Symmetric Multiprocessing and MPP is Massively Parallel Processing. SMP supports limited parallelism (for example up to 64 processors), whereas MPP can support any number of nodes or processors and gives higher performance, since its nodes can all process fully in parallel.

A. Symmetric Multiprocessing (SMP): some hardware resources may be shared by the processors. The processors communicate via shared memory and run a single operating system. Cluster or Massively Parallel Processing (MPP): known as shared-nothing, in which each processor has exclusive access to its hardware resources. Cluster systems can be physically dispersed. The processors have their own operating systems and communicate via a high-speed network.

* How do you do usage analysis in DataStage?
A. 1. If you want to know whether some job is part of a sequence, right-click the job in the Manager and select Usage Analysis; it shows all the job's dependents. 2. You can find how many jobs are using a particular table. 3. You can find how many jobs are using a particular routine. In this way you can find all the dependents of a particular object; it is nested, so you can move forward and backward and see all the dependents.

* What are the repository tables in DataStage and what are they?
A. A data warehouse is a repository (centralized as well as distributed) of data, able to answer any ad hoc, analytical, historical or complex queries. Metadata is data about data; examples include data element descriptions, data type descriptions, attribute/property descriptions, range/domain descriptions, and process/method descriptions. The repository environment encompasses all corporate metadata resources: database catalogs, data dictionaries, and navigation services. Metadata includes things like the name, length, valid values, and description of a data element. Metadata is stored in a data dictionary and repository; it insulates the data warehouse from changes in the schema of operational systems. In DataStage, under the I/O and Transfer interface tab (input, output and transfer pages) you have four tabs, the last of which is Build, under which you can find the table name. The DataStage client components are: Administrator, which administers DataStage projects and conducts housekeeping on the server; Designer, which creates DataStage jobs that are compiled into executable programs; Director, which is used to run and monitor DataStage jobs; and Manager, which allows you to view and edit the contents of the repository.


A. The repository resides in a specified database. It holds all the metadata, raw data and the respective mapping information.

* What is a project? Specify its various components.
A. You always enter DataStage through a DataStage project. When you start a DataStage client you are prompted to connect to a project. Each project contains: DataStage jobs; built-in components, which are predefined components used in a job; and user-defined components, which are customized components created using the DataStage Manager or DataStage Designer.

* How do you implement a Type 2 slowly changing dimension in DataStage? Give an example.
A. Slowly changing dimensions are a common problem in data warehousing. For example, a customer called Lisa in company ABC lives in New York and later moves to Florida; the company must now modify her address. In general there are three ways to solve this problem. Type 1: the new record replaces the original record, with no trace of the old record kept at all. Type 2: a new record is added to the customer dimension table, so the customer is treated essentially as two different people. Type 3: the original record is modified to reflect the change. In Type 1 the new value overwrites the existing one, so no history is maintained (the history of where she lived before is lost); it is simple to use. In Type 2 a new record is added, so both the original and the new record are present and the new record gets its own primary key; the advantage is that historical information is maintained, but the dimension table grows, so storage and performance can become a concern. Type 2 should only be used if it is necessary for the data warehouse to track historical changes. In Type 3 there are two columns, one for the original value and one for the current value; for example, a new column shows the original address as New York and the current address as Florida. This keeps part of the history without increasing the table size, but when the customer moves from Florida to Texas the New York information is lost, so Type 3 should only be used if changes will occur only a finite number of times.

8. How can we write parallel routines in DataStage PX?
A. First, what routines are: routines are sets of functions defined in the DataStage Manager and called from the Transformer stage. In the Manager, select Routines on the left side of the window; a window opens with options such as Server Routines, Parallel Routines and Mainframe Routines. Select the type of routine you want and follow the dialog.

9. What is the exact difference between the Join, Merge and Lookup stages?
A. Join and Merge require less memory, whereas Lookup requires more memory because the reference data is loaded into memory.

10. What is job control? How can it be used? Explain with steps.
A. Job control (JCL stands for Job Control Language) is used to run a number of jobs at a time, with or without loops. Steps: click Edit in the menu bar, select Job Properties and enter the parameters, for example parameter/prompt/type pairs such as STEP_ID STEP_ID string, Source SRC string, DSN DSN string, Username unm string, Password pwd string. After editing these, go to the Job Control tab, select the jobs from the list box and run the job. Job control can also be achieved using a sequence job in DataStage 8.0.1, with or without loops: from the menu select New > Sequence Job and place the corresponding stages from the palette.

11. How do you kill a job?
A. A running job can be stopped from the DataStage Director or from the command line with dsjob -stop, and hung job resources can be released with Director's Clean Up Resources option. Integrity/QualityStage, for comparison, is a data-integration tool from Ascential used to standardize and integrate data from different sources; from version 8.5 onwards it is bundled with DataStage for integrating data from different kinds of sources.

12. What is the difference between "validated OK" and "compiled" in DataStage?
A. When we compile a job, the DataStage engine checks whether all the required properties have been supplied. When we validate a job, the engine checks whether all the given properties are valid.

13. How can I convert server jobs into parallel jobs?
A. The main differences between parallel and server jobs are 1) parallel processing and 2) partitioning, and we can approximate both in server jobs. 1) Parallel processing can be achieved by using the IPC stage, placing it between two passive stages. 2) Partitioning can be achieved by using the Link Partitioner and Link Collector stages, which split the data into up to 64 links and then collect the same data to send to the output link.

14. What are the different types of lookups in DataStage?
A. There are four types of lookups: 1. normal, 2. sparse, 3. caseless, 4. range. Normal lookup: the reference data is first loaded into memory and then the lookup is performed, so it takes more execution time. Sparse lookup: the SQL query is fired directly against the database for each input record, so it can execute faster than a normal lookup when the number of input rows is small relative to the reference data. In a single Lookup stage we can use only one reference link as a sparse lookup, whereas any number of normal lookup links can be used; once a sparse lookup link is defined, no normal lookup links can follow it, so define all the normal lookups first and the sparse lookup last.

15. What is the purpose of the Exception activity in DataStage 7.5?
A. The stages that follow an Exception activity are executed whenever an unhandled error occurs while the job sequencer is running.

16. How do you handle date conversions in DataStage? Convert mm/dd/yyyy format to yyyy-dd-mm.
A. The function to convert mm/dd/yyyy format to yyyy-dd-mm is:
Oconv(Iconv(Fieldname,"D/MDY[2,2,4]"),"D-YDM[4,2,2]")

17. What is APT_CONFIG in DataStage?
A. APT_CONFIG is an environment variable used to identify the configuration (*.apt) file. The configuration file holds the node information, the disk and scratch storage locations and the resource pools; DataStage understands the architecture of the system from this file and creates its parallel processes accordingly. For parallel processing at least one node is defined, and typically several, depending on the hardware.
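
A hypothetical two-node configuration file, only to illustrate the structure the answer describes; the host name and directory paths are assumptions:

{
	node "node1"
	{
		fastname "etlserver"
		pools ""
		resource disk "/data/ds/node1" {pools ""}
		resource scratchdisk "/scratch/ds/node1" {pools ""}
	}
	node "node2"
	{
		fastname "etlserver"
		pools ""
		resource disk "/data/ds/node2" {pools ""}
		resource scratchdisk "/scratch/ds/node2" {pools ""}
	}
}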

18. If the data in your job is partitioned on key 1 and you then aggregate on key 2, what issues could arise?
A. Because the data is partitioned on key 1, rows with the same value of key 2 can end up in different partitions, so the aggregation can produce incorrect (partial) results unless the data is re-partitioned on key 2; the extra repartitioning also makes the job take longer to execute.

19. What is the difference between a hashed file and a sequential file?
A. A hashed file stores the data based on a hashing algorithm and a key value; a sequential file is just a file with no key column. A hashed file can be used as a reference for a lookup; a sequential file cannot.

20. What is the difference between the Merge stage and the Join stage?
A. In the Merge stage the master link cannot have duplicate values, whereas the Join stage has no such constraint. Both stages need sorted input; sorting is typically done with a Sort stage or a link sort. Merge has n-1 reject links, whereas Join has no reject link. Merge performs 2 join types (inner join, left outer join), whereas Join performs 4 (inner, left outer, right outer, full outer). Using Merge we can capture the unmatched records of the update (reference) links, but using Join we cannot capture unmatched records.

21. What is the difference between an operational data store (ODS) and a data warehouse?
A. A data warehouse is a decision-support database for organisational needs: it is a subject-oriented, non-volatile, integrated, time-variant collection of data. An ODS (Operational Data Store) is an integrated collection of related, current information; it typically holds at most around 90 days of data.

A. An ODS (Operational Data Store) is in effect a mini data warehouse: it maintains roughly the last year of data, whereas the data warehouse maintains data for the whole business history.

22. Can we use a shared container as a lookup in DataStage server jobs?
A. No, we cannot use a container as a lookup. A container is used to reduce a complex view to a simpler view of stages or jobs; it only holds job design elements, not a source table or reference data of its own.

23. How can we call a routine in a DataStage job? Explain with steps.
A. Routines are used for implementing business logic. They are of two types: 1) before-job subroutines and 2) after-job subroutines. Steps: double-click the Transformer stage, right-click on any one of the mapping fields and select the DS Routines option; within the edit window supply the business logic and select either the Before or After subroutine option.

24. How do you find the number of rows in a sequential file?
A. There is a system variable named @INROWNUM that can be used in a processing stage (such as a Transformer). The data is processed record by record in the processing stage, and @INROWNUM gives the row number of the record being processed. For example, suppose a Sequential File stage is connected to a Transformer stage. Include a stage variable sv = @INROWNUM and use it as the derivation of one of the output columns, say countCol. At the output you can then see the count for each record, and the countCol value of the last record gives the total number of records in the sequential file.
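
Outside the job, a quick sanity check on a plain sequential file can also be done from the shell; the file path here is only an example:

# Count the lines (records) in a delimited sequential file.
wc -l < /data/source/customers.txt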

25. What is NLS in DataStage? How do we use NLS, and what are its advantages? If the NLS option was not chosen at installation time and I now want to use it, what can I do - reinstall DataStage, or uninstall and install again?
A. NLS stands for National Language Support. It is used to handle data in other languages such as French, German, Spanish, etc. (whose scripts are similar to English) in the data processed by the warehouse. Just reinstall; during the installation you will see the option to include NLS.

26. How do you clean the DataStage repository?
A. Go to DataStage Director, open the Job menu and choose Clean Up Resources, then clean up the repository. To further remove logs and locks for a particular job, select the job's process ID and log it out.

27. What are the Oconv() and Iconv() functions and where are they used?
A. Iconv is used to convert a date into DataStage's internal format, i.e. a format only DataStage understands. For example, a date arriving as mm/dd/yyyy is converted by Iconv into an internal day number such as 740; you can then derive it in your own format using Oconv. So to change mm/dd/yyyy to dd/mm/yyyy you use Iconv and Oconv together, for example:
Oconv(Iconv(InputDateString,"D/MDY[2,2,4]"),"D/DMY[2,2,4]")

A. Iconv and Oconv are not only used for date conversions; they handle many types of format conversion, such as Roman numerals, times, radix and numeral/ASCII conversions. The difference between them is: 1) Iconv converts the input into the system's internal format; 2) Oconv converts the internal format back into a user-readable format.

28. How can we implement a lookup in DataStage server jobs?
A. The DB2 or other database stages can be used for lookups.
A. In the Enterprise Edition, the Lookup stage can be used for lookups; the Lookup stage is available in parallel jobs only. In server jobs we implement lookups using hashed files and the Transformer stage.
A. Connect one input file directly to the Transformer stage; this forms the primary link. Connect another input file to a Hashed File stage and connect that to the Transformer; this forms the reference link. This functions as a lookup in server jobs.

29. How do we use NLS in DataStage? What are the advantages of NLS and where can we use it? Explain briefly.
A. With NLS we can: process data in a wide range of languages; use local formats for dates, times and money; and sort data according to local rules. If NLS is installed, various extra features appear in the product. For server jobs, NLS is implemented in the DataStage server engine; for parallel jobs, NLS is implemented using the ICU library.

30. What is a hashing algorithm? Explain briefly how it works.
A. Hashing is key-to-address translation: the value of a key is transformed into a disk address by an algorithm, usually a relative block and an anchor point within the block. How well the algorithms work is closely related to statistical probability. It sounds fancy, but these algorithms are usually quite simple and use division-and-remainder techniques; any good book on database systems covers them. Interestingly, these approaches are sometimes called "Monte Carlo techniques" because the behaviour of the hashing or randomizing algorithm can be simulated by a roulette wheel, where the slots represent the blocks and the balls represent the records (on this roulette wheel there are many balls, not just one).

31. How do you convert a DataStage server shared container to a parallel shared container?
A. I have never tried this, but here is some information that will save you a lot of time: you can convert a server job into a server shared container, and a server shared container can also be used inside parallel jobs as a shared container.
A. I think there is no way to convert it, but we can call the server shared container from parallel jobs.

32. How do you drop an index before loading data into the target, and rebuild it, in DataStage?
A. This can be achieved with the "Direct Load" option of the SQL*Loader utility.
A. If we are using an OCI or other database stage, we can use the Before/After SQL property in the stage properties.

33. What is the meaning of "file extender" in DataStage server jobs? Can we run a DataStage job from another job, and where is that file data stored?
A. File extender means adding columns or records to an already existing file. In DataStage we can run one DataStage job from another job.

35. How will you call an external function or subroutine from DataStage?
A. There is a DataStage option to call external programs: ExecSH.

36. What is the Hash File stage and what is it used for?
A. A hashed file is mainly used as a reference for lookups; we can also use the Hash File stage to avoid/remove duplicate rows by specifying the hash key on a particular field.

37. What are stage variables, derivations and constraints?
A. Stage variable: an intermediate processing variable that retains its value during a read and does not pass the value on to a target column. Derivation: an expression that specifies the value to be passed on to the target column. Constraint: a condition that is either true or false and controls the flow of data down a link.

38. Is it possible to call one job from another job in server jobs?
A. Yes, we can call a job from another job; strictly speaking you attach the other job through the job properties rather than "calling" it, and you can attach zero or more jobs. Steps: Edit > Job Properties > Job Control, click Add Job and select the desired job.

39. How can you implement complex jobs in DataStage?
A. A complex design means having more joins and more lookups; such a job design is called a complex job. We can implement any complex design in DataStage by following simple tips that also improve performance. There is no hard limit on the number of stages in a job, but for better performance use at most about 20 stages per job; if it exceeds 20 stages, split the work into another job. Use no more than about 7 lookups per Transformer; otherwise include one more Transformer.

40. Why do you use SQL*Loader or the OCI stage?
A. When the source data is enormous, or for bulk data, we can use the OCI stage or SQL*Loader depending on the source.

41. What are static hash files and dynamic hash files?
A. There are two types of hashed files: 1. static and 2. dynamic. A dynamic file is used when we do not know how much data will come from the source side, as it allows the file to grow automatically as data is loaded. A static file is used when we know the fixed amount of data we are loading into the target database; that is the scenario for using each type.

42. How do you implement slowly changing dimensions in DataStage?
A. 1) The first time data is loaded to the target table, all rows are inserts, with the effective start date set to the current date and the effective end date set to a default such as 9999-12-31. 2) When data arrives the second time for that slowly changing table, join the current data set and the existing data set in a Change Capture stage; it produces a change code for each row: code 1 if the record is new, 2 if it is a duplicate, 3 if it is an update. Then, in a Transformer stage, use stage variables so that when the change code is 3 the existing record's effective end date is set to the current date minus one, and so on.

43. What is job control? How is it developed? Explain with steps.
A. Job control means controlling DataStage jobs from another DataStage job. For example, consider two jobs XXX and YYY: job YYY can be executed from job XXX by using DataStage macros in routines. To execute one job from another, the following steps are needed in the routine: 1. attach the job using the DSAttachJob function; 2. run the other job using the DSRunJob function; 3. stop the job using the DSStopJob function.

44. What is the difference between a sequential file and a dataset? When do you use the Copy stage?
A. Sequential file: 1. used to extract from or load to flat files, with a maximum size of about 2 GB; 2. when used as a source it must be converted from ASCII into native format; 3. by default it is processed sequentially; 4. it does not support null values well; 5. it is processed at the server; 6. it supports .csv, .txt, .xls and similar formats. Dataset: 1. used as an intermediate stage; 2. no format conversion is required; 3. datasets are processed across the configured nodes of the parallel engine, so performance is improved; 4. it supports only the .ds format; 5. the 2 GB limit does not apply. The Copy stage copies a single input to one or more outputs (optionally renaming or dropping columns) and is often used as a placeholder in a design.

45. How do you find errors in a job sequence?
A. Using the DataStage Director log view we can find the errors in a job sequence.

46. What is a project? Specify its various components.
A. You always enter DataStage through a DataStage project. When you start a DataStage client you are prompted to connect to a project. Each project contains: DataStage jobs; built-in components, which are predefined components used in a job; and user-defined components, which are customized components created using the DataStage Manager or DataStage Designer.


47. What are constraints and derivations?
A. Constraints are used to check a condition and filter the data. Example: Cust_Id <> 0 set as a constraint means that only records meeting this condition are processed further. A derivation is a method of deriving a field's value, for example when you need to compute a SUM, AVG, etc.

48. If a DataStage job aborts after, say, 1000 records, how do you continue the job from the 1000th record after fixing the error?
A. By enabling checkpointing in the job sequence properties. If we then restart the sequence, it resumes from the point of failure, skipping the work that already completed. This option is available from release 7.5.

49. How do we automate DataStage jobs?
A. We can call a DataStage batch job from the command prompt using 'dsjob', and we can also pass all the parameters from the command prompt. Wrap this in a shell script and call the script from any of the schedulers available in the market. The second option is to schedule the jobs using the DataStage Director.
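
A sketch of such a wrapper script for an external scheduler; the install path, project, job and parameter names are illustrative assumptions:

#!/bin/sh
# Wrapper intended to be called by a scheduler (cron, AutoSys, Control-M, ...).
DSHOME=/opt/IBM/InformationServer/Server/DSEngine
PROJECT=MyProject
JOB=LoadSales
$DSHOME/bin/dsjob -run \
    -param RunDate=`date +%Y-%m-%d` \
    -param SourceDir=/data/incoming \
    -jobstatus "$PROJECT" "$JOB"
RC=$?
echo "dsjob finished with exit code $RC"
exit $RC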

50. How do you eliminate duplicate rows?
A. Duplicate records can be removed with the Remove Duplicates stage. The Sort stage also has an "Allow Duplicates" option; if we set it to false, it removes the duplicate records.

51. How do I create a DataStage engine stop/start script?

My idea is roughly as follows: run as dsadm (su from root with an encrypted password), set DSHOMEBIN=/Ascential/DataStage/home/dsadm/Ascential/DataStage/DSEngine/bin, check with ps -ef | grep DataStage whether any client connections exist and kill their PIDs if so, then run uv -admin -stop > /dev/null and uv -admin -start > /dev/null, verify the processes, check the connection and echo "Started properly".

A. Go to the path /DATASTAGE/PROJECTS/DSENGINE/BIN and run: uv -admin -stop, then uv -admin -start.

A. A fuller script (the DB2, WebSphere and Information Server paths and user names below are site-specific):

case "$1" in
  start)
    /bin/su - db2inst1 -c 'db2start'
    /bin/su - dasusr1 -c 'db2admin start'
    cd /apps/IBM/WebSphere/AppServer/bin/
    ./startServer.sh server1 -username wasadmin_XXX -password wasadmin_XXX
    cd /apps/IBM/InformationServer/ASBNode/bin/
    ./NodeAgents.sh start
    /bin/su - dsadm -c 'uv -admin -start'
    ;;
  stop)
    /bin/su - dsadm -c 'uv -admin -stop'
    cd /apps/IBM/InformationServer/ASBNode/bin/
    ./NodeAgents.sh stop
    cd /apps/IBM/WebSphere/AppServer/bin/
    ./stopServer.sh server1 -username wasadmin -password wasadmin
    /bin/su - dasusr1 -c 'db2admin stop'
    /bin/su - db2inst1 -c 'db2stop'
    ;;
  *)
    echo "Usage: $0 {start|stop|status|restart}"
    exit 1
    ;;
esac

52. What are the third-party tools used with DataStage?
A. AutoSys, TNG and Event Coordinator are some of the ones I know and have worked with.
A. Control-M and AutoSys.

53. What is the difference between DataStage developers and DataStage designers? What skills are required for each?
A. A DataStage developer is the one who codes the jobs; a DataStage designer designs the jobs, i.e. works from the blueprints and decides which stages are required to develop the code.
A. DataStage Designer is a client component in DataStage 7.5, 8.0.1, etc., whereas a DataStage developer is an ETL developer who designs jobs using the Designer client component.

54. How can you do an incremental load in DataStage?
A. Incremental load means a daily (delta) load: whenever you select data from the source, select only the records that were inserted or updated between the timestamp of the last successful load and the start date/time of today's load.

For this you pass those two dates as parameters: store the last run date and time in a file, read it into a job parameter, and supply the current date and time as the second parameter.
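
A sketch of the watermark approach the answer describes, driven from the shell; the file locations, project, job and parameter names are assumptions:

#!/bin/sh
# Keep the last successful load time in a file and pass both window
# boundaries to the job as parameters; advance the watermark only on success.
DSHOME=/opt/IBM/InformationServer/Server/DSEngine
STAMP_FILE=/etl/ctl/last_run.txt
LAST_RUN=`cat $STAMP_FILE`                 # e.g. 2011-06-01 02:00:00
NOW=`date '+%Y-%m-%d %H:%M:%S'`
$DSHOME/bin/dsjob -run \
    -param FromTS="$LAST_RUN" \
    -param ToTS="$NOW" \
    -jobstatus MyProject IncrementalLoad \
  && echo "$NOW" > $STAMP_FILE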

55. Do you know about MetaStage?
A. MetaStage is used to handle metadata, which is very useful for data lineage and data analysis later on. Metadata defines the type of data we are handling; these data definitions are stored in the repository and can be accessed with MetaStage.

A. MetaStage is used to store metadata: the definitions are stored in a repository and can be accessed using MetaStage; it helps with data lineage and data analysis and defines the type of data we are handling.

56. What are the types of parallel processing?
A. Parallel processing is broadly classified into two types: a) SMP - Symmetric Multiprocessing; b) MPP - Massively Parallel Processing.

A. Types of parallelism in DataStage: 1. pipeline parallelism and 2. partition parallelism. DataStage uses a combination of both.

57. Did you work in a UNIX environment?
A. Yes. You sometimes need to write UNIX programs that run in the background, such as batch programs, because DataStage jobs are often invoked as batch processes (for example every 24 hours), so UNIX knowledge is a must in order to run those programs in the background at the required intervals.

58. Is it possible to run parallel jobs in server jobs?
A. No, it is not possible to run parallel jobs inside server jobs, but server jobs can be executed from within parallel jobs (for example through a server shared container).

59. What are the Iconv and Oconv functions?
A. Iconv() converts a string to the internal storage format; Oconv() converts an expression to an output format.

60. If the size of a hash file exceeds 2 GB, what happens? Does it overwrite the current rows?
A. No, it does not overwrite rows: a 32-bit hashed file that reaches the 2 GB limit simply becomes unusable (corrupted), so for larger volumes the file must be created as, or resized to, a 64-bit file.

61. What happens if RCP is disabled?
A. Runtime column propagation (RCP): if RCP is enabled for a job, and specifically for the stage whose output connects to a shared container input, then the metadata is propagated at run time, so there is no need to map it at design time. If RCP is disabled for the job, OSH has to perform an import and export every time the job runs, and the processing time of the job increases.

62. What is the difference between DataStage and DataStage TX?
A. It is a tricky question to answer, but one thing I can tell you is that DataStage TX is not an ETL tool and is not a new version of DataStage 7.5; TX is used for ODS-type sources, as far as I know.

A. DataStage TX is not an ETL tool; it falls into the EAI category. TX helps connect various front ends to different back ends: it has built-in plug-ins for many different sources and targets, so it can make your application platform-independent, and it can also parse data.

63. Explain the differences between Oracle 8i and 9i.
A. Multiprocessing, and more support for dimensional modelling in the database.

64. What is the purpose of keys, and what is the difference between surrogate keys and natural keys?
A. We use keys to provide relationships between entities (tables); with primary and foreign key relationships we can maintain the integrity of the data. The natural key is the one coming from the OLTP system. The surrogate key is an artificial key that we create in the target data warehouse, and we can use these surrogate keys instead of the natural key; in SCD Type 2 scenarios surrogate keys play a major role.
A. Natural key: a key formed from attributes that already exist in the real world, e.g. an SSN. Surrogate key: it can be thought of as a replacement for the natural key that has no business meaning, e.g. a sequence in Oracle. PS: I think there is a bug in the DataStage 8.0.1 surrogate key generator; it seems to be very inconsistent.

65. Where do we use the Link Partitioner in a DataStage job? Explain with an example.
A. We use the Link Partitioner in DataStage server jobs. The Link Partitioner stage is an active stage that takes one input and allows you to distribute partitioned rows to up to 64 output links.
A. For example, if an organisation has 100 employees and you want to route the rows to different targets based on salary ranges, the Link Partitioner stage can be used to split the stream accordingly.

66. What are the steps involved in developing a job in DataStage?
A. The steps required are: select the data source stage depending on the sources (for example flat file, database, XML, etc.); select the required stages for the transformation logic, such as Transformer, Link Collector, Link Partitioner, Aggregator, Merge, etc.; and select the final target stage where you want to load the data, whether it is a data warehouse, data mart, ODS or staging area.

67. How can we pass parameters to a job by using a file?
A. You can do this by reading the parameter values from a UNIX file and then invoking the DataStage job from the command line; the job has the parameters defined, and their values are passed in by the UNIX script.
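
A minimal sketch of that approach, assuming a plain NAME=VALUE parameter file and simple values without spaces; all paths and names are illustrative:

#!/bin/sh
# Read NAME=VALUE pairs from a file and pass each one to dsjob as -param.
DSHOME=/opt/IBM/InformationServer/Server/DSEngine
PARAM_FILE=/etl/ctl/load_params.txt        # e.g. lines like SRC_DIR=/data/in
PARAMS=""
while read LINE; do
    PARAMS="$PARAMS -param $LINE"
done < $PARAM_FILE
$DSHOME/bin/dsjob -run $PARAMS -jobstatus MyProject MyJob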

68. What are environment variables in DataStage? Give some examples.
A. These are variables used at the project or job level. We can use them to configure the job, e.g. to associate the configuration file (without which you cannot run the job) or to increase the sequential-file or dataset read/write buffers. Example: $APT_CONFIG_FILE. There are many more environment variables; go to the job properties and click "Add Environment Variable" to see most of them.
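
For instance, an environment variable that has been added to a job as a parameter can be overridden at run time; the path and names below are assumptions used only to illustrate the idea:

# Run the job against a specific 4-node configuration file.
$DSHOME/bin/dsjob -run \
    -param '$APT_CONFIG_FILE=/opt/IBM/InformationServer/Server/Configurations/4node.apt' \
    -jobstatus MyProject MyJob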

69. Will DataStage consider the second constraint in the Transformer once the first condition is satisfied (if link ordering is given)?
A. Yes.

70. How can we join an Oracle source and a sequential file?
A. The Join and Lookup stages can be used to join an Oracle source and a sequential file.

71. What does the Separation option in a static hashed file mean?
A. The different hashing algorithms are designed to distribute records evenly among the groups of the file based on characters and their position in the record IDs. When a hashed file is created, Separation and Modulo respectively specify the group buffer size and the number of buffers allocated for the file. When a static hashed file is created, DataStage creates a file that contains the number of groups specified by the modulo. Size of hashed file = modulus (number of groups) * separation (buffer size).

72. How do you pass a filename as a parameter to a job?
A. 1. Go to DataStage Administrator > Projects > Properties > Environment > User Defined. Here you see a grid where you can enter your parameter name and the corresponding file path. 2. Go to the Stage tab of the job, select the NLS tab, click "Use Job Parameter" and select the parameter name you defined above. The selected parameter name appears in the text box beside the "Use Job Parameter" button; copy the parameter name from the text box and use it in your job, keeping the project default in the text box.

73. How do you track performance statistics and enhance them?
A. Through the Monitor (in Director) we can view the performance statistics.

74. How do you improve job performance?
A. By using partitioning techniques we can improve performance, e.g. hash, modulus, range, random, etc.

75. What are routines, where and how are they written, and have you written any routines before?
A. Routines are stored in the Routines branch of the DataStage Repository, where you can create, view or edit them using the Routine dialog box. The following program components are classified as routines:
- Transform functions. These are functions you can use when defining custom transforms. DataStage has a number of built-in transform functions, located in the Routines > Examples > Functions branch of the Repository. You can also define your own transform functions in the Routine dialog box.
- Before/After subroutines. When designing a job, you can specify a subroutine to run before or after the job, or before or after an active stage. DataStage has a number of built-in before/after subroutines, located in the Routines > Built-in > Before/After branch of the Repository. You can also define your own before/after subroutines using the Routine dialog box.
- Custom UniVerse functions. These are specialized BASIC functions that have been defined outside DataStage. Using the Routine dialog box, you can get DataStage to create a wrapper that enables you to call these functions from within DataStage. These functions are stored under the Routines branch in the Repository, and you specify the category when you create the routine. Further options apply if NLS is enabled.

76. How do you handle rejected rows in DataStage?
A. We can handle rejected rows with the help of constraints in a Transformer, in two ways: 1) by ticking the Reject Row option where we write our constraints in the Transformer properties, or 2) by using REJECTED in the expression editor of the constraint. Create a hashed file as temporary storage for rejected rows, create a link to it as one of the outputs of the Transformer, and apply either of the two steps above to that link; all the rows rejected by the constraints will then go to the hashed file.

77. What are the transaction size and array size in the OCI stage, and how are they used?
A. Transaction Size: this field exists for backward compatibility but is ignored from release 3.0 of the plug-in onwards; the transaction size for new jobs is handled by Rows Per Transaction on the Transaction Handling tab of the Input page. Rows Per Transaction: the number of rows written before a commit is executed for the transaction; the default value is 0, meaning all the rows are written before being committed to the table. Array Size: the number of rows written to or read from the database at a time; the default value is 1, meaning each row is written in a separate statement.

78. What is SQL tuning and how do you do it?
A. SQL tuning can be done using cost-based optimization. Important initialization (pfile) parameters include sort_area_size, sort_area_retained_size, db_file_multiblock_read_count, open_cursors, cursor_sharing and optimizer_mode (e.g. CHOOSE or RULE).

79. What are Modulus and Splitting in a dynamic hashed file?
A. The modulus size can be increased by contacting your UNIX admin.
A. In a dynamic hashed file the size of the file keeps changing as data is loaded. When the size of the file increases it is called "Modulus"; when the size of the file decreases it is called "Splitting". The modulus can be increased or decreased by contacting your DataStage admin.

80. What are the types of views in DataStage Director?
A. There are three types of views in DataStage Director: a) Job view - the dates when jobs were compiled; b) Status view - the status of the job's last run; c) Log view - warning messages, event messages and program-generated messages.

81. How can you ETL an Excel file into a data mart?
A. Open the ODBC Data Source Administrator found in Control Panel > Administrative Tools. Under the System DSN tab, add a DSN using the Microsoft Excel driver; then you will be able to access the XLS file from DataStage through the ODBC stage.

82. Did you parameterize the jobs or hard-code values in them?
A. Always parameterize the job: the values come either from the job properties or from a "parameter manager" type third-party tool. You should never hard-code parameters in your jobs. The variables most often parameterized in a job are the database DSN name, username, password, and the dates against which the data is to be checked.

83. How do you parameterize a field in a sequential file? (I am using DataStage as the ETL tool and a sequential file as the source.)
A. We cannot parameterize a particular field in a sequential file; instead we can parameterize the source file name of the Sequential File stage.

84. What is the default number of nodes in DataStage parallel edition?
A. The number of nodes depends on the number of processors in your system; if your system has two processors you will get two nodes by default.

85. Differentiate database data and data warehouse data.
A. By "database" one usually means OLTP (On-Line Transaction Processing); this can be the source systems or the ODS (Operational Data Store), which contains the transactional data.

A. Database data is for transactional purposes only, whereas data warehouse data is only for analytical purposes, i.e. decision making.

86. What is 'insert for update' in DataStage?
A. 'Insert for update' means the updated value is inserted as a new row in order to maintain history.

87. How does the hashed file do a lookup in server jobs? How does it compare the key values?
A. A hashed file is used for two purposes: 1. to remove duplicate records, and 2. as a reference for lookups. Each hashed file record has three parts: the hashed key, the key header and the data portion. The lookup is fast because the hashing algorithm is applied to the key value to locate the record directly.

88. What does a config file in Parallel Extender consist of?
A. The config file consists of: a) the number of processes or nodes, and b) the actual disk storage locations.

A. The APT configuration file holds the node, node pool, resource disk and scratch disk information. The node information matters because DataStage creates its back-end processes per node when running the jobs; the resource disk is where datasets are written, and the scratch disk is used for temporary space, for example when lookups and sorts are used in the jobs.
A. The APT config file is the configuration file that defines the nodes and the actual disk storage for a specific project.

89. Is it possible for two users to access the same job at a time in DataStage?
A. In DataStage 8 it is possible to open the job as read-only; in order to modify a read-only job, take a copy of it, as modification cannot be done on a read-only job. This feature is not available in DataStage 7.5.
A. On the access front I agree with the previous answer, and if the same job has to be run by two different users at a time, that is possible provided the job is a multiple-instance job and both users invoke it with different instance IDs.

90. What is the difference between the DRS and ODBC stages?
A. The DRS stage should be faster than the ODBC stage because it uses native database connectivity; you will need to install and configure the required database clients on your DataStage server for it to work. The Dynamic Relational Stage was created for PeopleSoft so that a job could run on any of the supported databases, and it supports ODBC connections too; read more about that in the plug-in documentation.

The ODBC stage uses the ODBC driver for a particular database, whereas DRS is a stage that tries to make it seamless to switch from one database to another; it uses the native connectivity of the chosen target.

91. How do you remove duplicates in a server job?
A. 1) Use a Hashed File stage, or 2) if you run the UNIX sort command (for example in a before-job subroutine), you can drop duplicated records with the -u option, or 3) use a Sort stage.
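
A small illustration of option 2, assuming the duplicates are exact line duplicates; the file names are placeholders:

# Sort the input and drop duplicate lines before the job reads the file.
sort -u /data/in/customers_raw.txt > /data/in/customers_dedup.txt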

92. How can you implement slowly changing dimensions in DataStage? Explain.
A. SCDs are of three types. Type 1: overwrite the change. Type 2: version the modified change. Type 3: keep historical versions of the modified change by adding a new column to hold the changed data.
A. Yes, you can implement SCDs in DataStage. For SCD Type 1, just use "insert rows else update rows" or "update rows else insert rows" as the update action of the target. For SCD Type 2, use a hashed file to look up the target, take three instances of the target, give different conditions depending on the process and different update actions in the targets, and use system variables such as the system date and null.
A. We can handle SCDs in the following ways. Type I: just overwrite. Type II: we need versioning and dates. Type III: add old and new copies of certain important fields. Hybrid dimensions are a combination of Type II and Type III.

A. Yes, you can implement Type 1, Type 2 or Type 3. Let me explain Type 2 with a timestamp. Step 1: the timestamp is created via a shared container, which returns the system time and a key; to satisfy the lookup condition we create a key column using the Column Generator. Step 2: our source is a dataset and the lookup table is an Oracle OCI stage. Using the Change Capture stage we find the differences; it returns a change code for each row, and based on that value we know whether the row is an insert, edit or update. If it is an update we stamp the new row with the current timestamp and keep the old row, with its old timestamp, as history.

93. Can you join a flat file and a database in DataStage? How?
A. Yes, we can do it in an indirect way. First create a job that populates the data from the database into a sequential file, say Seq_First1. Then take the flat file you have and use a Merge stage to join the two files. You have various join types in the Merge stage, such as pure inner join, left outer join, right outer join, etc., and you can use whichever suits your requirements.

94. What are the differences between DataStage 7.0 and 7.5 in server jobs?
A. There are a lot of differences: many new stages are available in DS 7.5, for example the CDC stage, the Stored Procedure stage, etc.

95. What are the dimensional modelling types and their significance?
A. Data modelling is of two types: 1. ER modelling and 2. dimensional modelling. Data modelling concepts cover three levels: i) the conceptual model, ii) the logical model, iii) the physical model.

96. How do you pass parameters to the job sequence if the job is running at night?
A. Two ways: 1. set the default values of the parameters in the job sequencer and map these parameters to the job; 2. run the job in the sequencer using the dsjob utility, where we can specify the value to be taken for each parameter.

97. What is the difference between the Lookup File Set stage and the Lookup stage?
A. The Lookup stage performs the lookup of the reference data set against the source data, whereas the Lookup File Set stage is used to create the reference data set that the Lookup stage then uses to perform the lookup against the source data.

98. What are the types of hashed files?
A. Hashed files are broadly classified into two types: a) static, subdivided into 17 types based on the primary key pattern; b) dynamic, subdivided into 2 types: i) generic and ii) specific. The default hashed file is dynamic Type 30.

98. What is the DS Designer used for? Did you use it?
A. You use the Designer to build jobs by creating a visual design that models the flow and transformation of data from the data source through to the target warehouse. The Designer's graphical interface lets you select stage icons, drop them onto the Designer work area, and add links.

99. What are the ORABULK and BCP stages?
A. ORABULK is used to bulk-load data into a single table of a target Oracle database. BCP is used to bulk-load data into a single table for Microsoft SQL Server and Sybase.

100. Suppose there are a million records; did you use OCI? If not, what stage do you prefer?
A. Use ORABULK.

101. Tell me about the environment in your last projects.
A.

102. How do you merge two files in DS?
A. Either use a copy/concatenate command as a before-job subroutine if the metadata of the two files is the same, or create a job to concatenate the two files into one if the metadata differs.

103. What are the difficulties faced in using DataStage? Or, what are the constraints in using DataStage?
A. 1) When the number of lookups is large; 2) handling what happens when a job aborts partway through loading the data for some reason.

104. How does DataStage handle user security?
A. We have to create users in the Administrator and give the necessary privileges to those users.

105. How can I specify a filter command for processing data while defining sequential file output data?
A. We have before-job and after-job subroutines, with which we can execute UNIX commands; the Sequential File stage also has a Filter option where we can apply a sort command or another filter command to the data.
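
For example, the filter entered against the stage can be an ordinary UNIX command; this one (a sketch, assuming a pipe-delimited layout) keeps only non-comment rows and sorts them on the first field:

grep -v '^#' | sort -t'|' -k1,1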

106. What is a dataset and what is a file set?
A. File set: it allows you to read data from or write data to a file set. The stage can have a single input link, a single output link and a single reject link, and it executes only in parallel mode. The data files plus the control file that lists them are together called a file set; this is useful because some operating systems impose a 2 GB limit on file size and you need to distribute files among nodes to prevent overruns. Datasets are used to carry data between parallel jobs in the engine's internal format, much as intermediate files are used in server jobs.

107. Please list the versions of the DataStage parallel and server editions and the years in which they were released.
A.

108. What happens if the output of a hashed file is connected to a Transformer? What error does it throw?
A. If the hashed file output is connected to a Transformer stage, the hashed file is treated as the lookup (reference) file as long as there is a primary link to the same Transformer stage; if there is no primary link, the hashed file link is treated as the primary link itself. You can do SCD in a server job by using this lookup functionality. It does not return any error code.

109. What is the order of execution inside the Transformer, with the stage editor having input links on the left-hand side and output links on the right?
A. Stage variables first, then constraints, then the column derivations (expressions).

110. Dimensional modelling is subdivided into which two types?
A. a) Star schema - simple and much faster; denormalized form. b) Snowflake schema - complex, with more granularity; a more normalized form.

111. What is the difference between a stage and an Informatica transformation?
A. Stages are what Informatica calls transformations, except that we call them stages in DataStage; beyond the naming there is no direct comparison between the two.

112. Is it possible to move data from an Oracle warehouse to an SAP warehouse using the DataStage tool?
A. We can use the DataStage Extract Pack for SAP R/3 and the DataStage Load Pack for SAP BW to transfer data from Oracle to an SAP warehouse. These plug-in packs are available with DataStage version 7.5.

113. What is the DS Director used for? Did you use it?
A. DataStage Director is the GUI used to monitor, run, validate and schedule DataStage server jobs.

114. What is the importance of the surrogate key in data warehousing?
A. The concept of the surrogate key comes into play when there is a slowly changing dimension in a table. In that situation we need a key by which we can identify the changes made to the dimension; these slowly changing dimensions can be of three types, namely SCD1, SCD2 and SCD3. Surrogate keys are system-generated keys, usually just a sequence of numbers, though they can also be alphanumeric values.

115. What is the default cache size? How do you change the cache size if needed?
A. The default read cache size is 128 MB. We can increase it by going into the DataStage Administrator, selecting the Tunables tab and specifying the cache size there.

116. What validations do you perform after creating jobs in the Designer? What different types of errors did you face during loading and how did you solve them?
A. Check the parameters; check that the input files exist; check that the input tables exist; and also check usernames, data source names, passwords and the like.

117. What is the DS Administrator used for? Did you use it?
A. The Administrator enables you to set up DataStage users, control the purging of the repository, and, if National Language Support (NLS) is enabled, install and manage maps and locales.

118. What are XML files, how do you read data from XML files, and what stage is used?
A. In the palette there are real-time stages such as XML Input, XML Output and XML Transformer.

119. What is the flow of loading data into fact and dimension tables?
A. Here is the sequence of loading a data warehouse: 1. the source data is first loaded into the staging area, where data cleansing takes place; 2. the data from the staging area is then loaded into the dimension/lookup tables; 3. finally the fact tables are loaded from the corresponding source tables in the staging area.

120. What will you do in a situation where somebody wants to send you a file, use that file as an input or reference, and then run the job?
A. Under Windows: use the Wait For File activity under the sequencer and then run the job; you can schedule the sequencer around the time the file is expected to arrive.
B. Under UNIX: poll for the file; once the file has arrived, start the job or sequencer, depending on the file.

121. What about system variables?
A. DataStage provides a set of read-only variables containing useful system information that you can access from a transform or routine:
@DATE - the internal date when the program started (see the Date function).
@DAY - the day of the month extracted from the value in @DATE.
@FALSE - the compiler replaces the value with 0.
@FM - a field mark, Char(254).
@IM - an item mark, Char(255).
@INROWNUM - input row counter, for use in constraints and derivations in Transformer stages.
@OUTROWNUM - output row counter (per link), for use in derivations in Transformer stages.
@LOGNAME - the user login name.
@MONTH - the current month extracted from the value in @DATE.
@NULL - the null value.
@NULL.STR - the internal representation of the null value, Char(128).
@PATH - the pathname of the current DataStage project.
@SCHEMA - the schema name of the current DataStage project.
@SM - a subvalue mark (a delimiter used in UniVerse files), Char(252).
@SYSTEM.RETURN.CODE - status codes returned by system processes or commands.
@TIME - the internal time when the program started (see the Time function).
@TM - a text mark (a delimiter used in UniVerse files), Char(251).
@TRUE - the compiler replaces the value with 1.
@USERNO - the user number.
@VM - a value mark (a delimiter used in UniVerse files), Char(253).
@WHO - the name of the current DataStage project directory.
@YEAR - the current year extracted from @DATE.
REJECTED - can be used in the constraint expression of an output link of a Transformer stage; REJECTED is initially TRUE, but is set to FALSE whenever an output link is successfully written.

122. How can I extract data from DB2 (on IBM iSeries) into the data warehouse with DataStage as the ETL tool? Do I first need ODBC to create the connectivity and an adapter for the extraction and transformation of data?
A. You would need to install ODBC drivers to connect to the DB2 instance (they do not come with the regular drivers we usually install; use the CD provided with the DB2 installation, which has the ODBC drivers to connect to DB2) and then try it out.

123. What happens if the job fails at night?
A. If a job fails there are options for sending a notification mail, for example from a job sequence using the Notification Activity.

124. What is the max capacity of Hash file in DataStage?
A. Take a look at the uvconfig file:

# 64BIT_FILES - This sets the default mode used to
# create static hashed and dynamic files.
# A value of 0 results in the creation of 32-bit
# files. 32-bit files have a maximum file size of
# 2 gigabytes. A value of 1 results in the creation
# of 64-bit files (ONLY valid on 64-bit capable platforms).
# The maximum file size for 64-bit
# files is system dependent. The default behavior
# may be overridden by keywords on certain commands.
64BIT_FILES 0

125. What other ETL tools have you worked with?
A. Ab Initio, DataStage EE (parallel edition) and Oracle's ETL tooling; there are around 7 ETL tools in the market.

126. What is the meaning of the following?
1) If an input file has an excessive number of rows and can be split up, then use standard logic to run jobs in parallel.
2) Tuning should occur on a job-by-job basis.
3) Use the power of the DBMS.

A. If you have SMP machines you can use the IPC, Link Collector and Link Partitioner stages for performance tuning; if you have cluster or MPP machines you can use parallel jobs.

127. What is meant by "Try to have the constraints in the 'Selection' criteria of the jobs itself; this will eliminate the unnecessary records even getting in before joins are made"?
A. This means trying to improve performance by avoiding the use of constraints wherever possible and instead filtering while selecting the data itself, using a WHERE clause; this improves performance.

128. How is DataStage 4.0 functionally different from the current Enterprise Edition? What are the exact changes?
A. There are a lot of changes in DS EE: the CDC stage, the Stored Procedure stage, etc.

129. How do you implement routines in DataStage?
A. There are three kinds of routines in DataStage: 1. server routines, used in server jobs and written in the BASIC language; 2. parallel routines, used in parallel jobs and written in C/C++; 3. mainframe routines, used in mainframe jobs.

130. Does Enterprise Edition only add parallel processing for better performance? Are any stages/transformations available in the Enterprise Edition only?
A. DataStage Standard Edition was previously called DataStage and DataStage Server Edition. DataStage Enterprise Edition was originally called Orchestrate, then renamed to Parallel Extender when purchased by Ascential. DataStage Enterprise offers server jobs, sequence jobs and parallel jobs; the Enterprise Edition adds parallel processing features for scalable, high-volume solutions, and although designed originally for UNIX it now supports Windows, Linux and UNIX System Services on mainframes. DataStage Enterprise MVS offers server jobs, sequence jobs, parallel jobs and MVS jobs; MVS jobs are designed using an alternative set of stages that are generated into COBOL/JCL code and transferred to a mainframe to be compiled and run (jobs are developed on a UNIX or Windows server and transferred to the mainframe). The first two versions share the same Designer interface but have a different set of design stages depending on the type of job you are working on: parallel jobs have parallel stages but also accept some server stages via a container; server jobs only accept server stages; MVS jobs only accept MVS stages. There are some stages common to all types (such as aggregation), but they tend to have different fields and options within the stage.
