
1) Define Data Stage?

DataStage is basically a tool that is used to design, develop and execute various applications to fill multiple tables in a data warehouse or data marts. It is a program for Windows servers that extracts data from databases and changes it into data warehouses. It has become an essential part of the IBM WebSphere Data Integration suite.

2) Explain how a source file is populated?

We can populate a source file in many ways, such as by creating a SQL query in Oracle, or by using a row generator extract tool etc.

3) Name the command line functions to import and export the DS jobs?

To import the DS jobs, dsimport.exe is used, and to export the DS jobs, dsexport.exe is used. (Illustrative commands are shown after this set of questions.)

4) What is the difference between Datastage 7.5 and 7.0?

In Datastage 7.5 many new stages are added for more robustness and smooth performance, such as the Procedure Stage, Command Stage, Generate Report etc.

5) In Datastage, how can you fix the truncated data error?

The truncated data error can be fixed by using the environment variable IMPORT_REJECT_STRING_FIELD_OVERRUN.

6) Define Merge?

Merge means to join two or more tables. The two tables are joined on the basis of primary key columns in both the tables.

7) Differentiate between data file and descriptor file?

As the names imply, data files contain the data and the descriptor file contains the description/information about the data in the data files.

8) Differentiate between Datastage and Informatica?

In Datastage, there is a concept of partitioning and parallelism for node configuration, while there is no concept of partitioning and parallelism in Informatica for node configuration. Also, Informatica is more scalable than Datastage, whereas Datastage is more user-friendly as compared to Informatica.
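As a sketch for question 3 above, both commands can be run from the DataStage client's bin directory. The host, credentials, project name and file path below are placeholders, and exact flags vary by DataStage version:

  dsimport.exe /H=hostname /U=username /P=password projectname C:\export\jobs.dsx
  dsexport.exe /H=hostname /U=username /P=password projectname C:\export\jobs.dsx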

9) Define Routines and their types?

Routines are basically collections of functions that are defined by the DS manager. They can be called via the transformer stage. There are three types of routines: parallel routines, mainframe routines and server routines.

10) How can you write parallel routines in Datastage PX?

We can write parallel routines in a C or C++ compiler. Such routines are also created in the DS manager and can be called from the transformer stage.

11) What is the method of removing duplicates, without using the Remove Duplicates stage?

Duplicates can be removed by using the Sort stage, with the option Allow Duplicates = False. (An illustrative stage configuration is shown after this set of questions.)

12) What steps should be taken to improve Datastage jobs?

In order to improve the performance of Datastage jobs, we have to first establish the baselines. Secondly, we should not use only one flow for performance testing. Thirdly, we should work in increments. Then, we should evaluate data skews. Then we should isolate and solve the problems, one by one. After that, we should distribute the file systems to remove bottlenecks, if any. Also, we should not include the RDBMS at the start of the testing phase. Last but not least, we should understand and assess the available tuning knobs.

13) Differentiate between Join, Merge and Lookup stage?

All three stages differ in the way they use memory storage, in the input requirements they impose, and in how they treat various records. Join and Merge need less memory as compared to the Lookup stage.

14) Explain Quality stage?

Quality stage is also known as Integrity stage. It assists in integrating different types of data from various sources.

15) Define Job control?

Job control can be best performed by using Job Control Language (JCL). This tool is used to execute multiple jobs simultaneously, without using any kind of loop.
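For question 11, a minimal sketch of a Sort stage configured to drop duplicates; the key column name is hypothetical:

  Sort stage properties (illustrative):
    Sorting Key      = CUST_ID
    Sort Order       = Ascending
    Allow Duplicates = False

With Allow Duplicates set to False, only the first row for each key value is passed downstream.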

16) Differentiate between Symmetric Multiprocessing and Massive Parallel Processing?

In Symmetric Multiprocessing, the hardware resources are shared by the processors. The processors run one operating system and communicate through shared memory. In Massive Parallel Processing, each processor accesses the hardware resources exclusively. This type of processing is also known as Shared Nothing, since nothing is shared between processors, and it is faster than Symmetric Multiprocessing.

17) What are the steps required to kill a job in Datastage?

To kill a job in Datastage, we have to kill the respective processing ID.

18) Differentiate between validated and compiled in Datastage?

In Datastage, validating a job means executing the job. While validating, the Datastage engine verifies whether all the required properties are provided or not. In the other case, while compiling a job, the Datastage engine verifies whether all the given properties are valid or not.

19) How to manage date conversion in Datastage?

We can use the date conversion function for this purpose, i.e. Oconv(Iconv(Fieldname, "Existing Date Format"), "Another Date Format"). (A worked example is shown after this set of questions.)

20) Why do we use exception activity in Datastage?

All the stages after the exception activity in Datastage are executed in case any unknown error occurs while executing the job sequencer.

21) Define APT_CONFIG in Datastage?

It is the environment variable that is used to identify the *.apt file in Datastage. It is also used to store the node information, disk storage information and scratch information.

22) Name the different types of Lookups in Datastage?

There are two types of Lookups in Datastage, i.e. Normal lkp and Sparse lkp. In Normal lkp, the data is saved in the memory first and then the lookup is performed. In Sparse lkp, the data is directly saved in the database. Therefore, the Sparse lkp is faster than the Normal lkp.

23) How can a server job be converted to a parallel job?

We can convert a server job into a parallel job by using the IPC stage and Link Collector.
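A worked example for question 19, assuming an input column holding dates like "10/14/2023" in mm/dd/yyyy format; the column name is hypothetical:

  Oconv(Iconv(OrderDate, "D/MDY[2,2,4]"), "D-YMD[4,2,2]")

Iconv first parses the string into DataStage's internal day number, and Oconv then renders that number in the target format, here yyyy-mm-dd, giving "2023-10-14".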

24) Define Repository tables in Datastage?

In Datastage, the Repository is another name for a data warehouse. It can be centralized as well as distributed.

25) Define OConv () and IConv () functions in Datastage?

In Datastage, the OConv () and IConv () functions are used to convert formats from one format to another, i.e. conversions of roman numerals, time, date, radix, numeral ASCII etc. IConv () is basically used to convert formats for the system to understand, while OConv () is used to convert formats for users to understand.

26) Explain Usage Analysis in Datastage?

In Datastage, Usage Analysis is performed within a few clicks. Launch Datastage Manager and right click the job. Then select Usage Analysis, and that's it.

27) How do you find the number of rows in a sequential file?

To find rows in a sequential file, we can use the system variable @INROWNUM. (An illustrative derivation is shown after this set of questions.)

28) Differentiate between Hash file and Sequential file?

The difference between the Hash file and the Sequential file is that the Hash file saves data using a hash algorithm and a hash key value, while the sequential file doesn't have any key value to save the data. Based on this hash key feature, searching in a Hash file is faster than in a sequential file.

29) How to clean the Datastage repository?

We can clean the Datastage repository by using the Clean Up Resources functionality in the Datastage Manager.

30) How is a routine called in a Datastage job?

In Datastage, routines are of two types, i.e. Before Sub Routines and After Sub Routines. We can call a routine from the transformer stage in Datastage.

31) Differentiate between Operational Datastage (ODS) and Data warehouse?

We can say that an ODS is a mini data warehouse. An ODS doesn't contain information for more than 1 year, while a data warehouse contains detailed information regarding the entire business.
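For question 27, a minimal sketch: in a Transformer, map @INROWNUM to an output column, and the value on the last row is the row count. The column name is hypothetical:

  RowNumber derivation: @INROWNUM

Note that in a parallel job @INROWNUM counts rows per partition, so a job that must count all rows would run the stage sequentially or sum the per-partition counts.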

32) NLS stands for what in Datastage?

NLS means National Language Support. It can be used to incorporate other languages, such as French, German, Spanish etc., in the data, as required for processing by the data warehouse. These languages have the same script as the English language.

33) Can you explain how one could drop the index before loading the data into the target in Datastage?

In Datastage, we can drop the index before loading the data into the target by using the Direct Load functionality of the SQL Loader utility. (An illustrative sequence is shown after this set of questions.)

34) How can one implement slowly changing dimensions in Datastage?

Slowly changing dimensions is not a concept related to Datastage. Datastage is used for ETL purposes and not for slowly changing dimensions.

35) How can one find bugs in a job sequence?

We can find bugs in a job sequence by using DataStage Director.

36) How are complex jobs implemented in Datastage to improve performance?

In order to improve performance in Datastage, it is recommended not to use more than 20 stages in each job. If you need to use more than 20 stages, it is better to use another job for those stages.

37) Name the third party tools that can be used in Datastage?

The third party tools that can be used with Datastage are Autosys, TNG and Event Coordinator. I have worked with these tools and possess hands-on experience with them.

38) Define Project in Datastage?

Whenever we launch the Datastage client, we are asked to connect to a Datastage project. A Datastage project contains Datastage jobs, built-in components and Datastage Designer or user-defined components.

39) How many types of hash files are there?

There are two types of hash files in DataStage, i.e. Static Hash Files and Dynamic Hash Files. The static hash file is used when a limited amount of data is to be loaded into the target database. The dynamic hash file is used when we don't know the amount of data from the source file.
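A hedged sketch for question 33, using Oracle SQL*Loader's direct path load; the table, index, control file and credentials are hypothetical:

  SQL> DROP INDEX idx_sales_cust;
  $ sqlldr userid=scott/tiger control=sales.ctl direct=true
  SQL> CREATE INDEX idx_sales_cust ON sales (cust_id);

Dropping (or marking unusable) the index before the bulk load and recreating it afterwards avoids per-row index maintenance during the load.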

40) Define Meta Stage?

In Datastage, MetaStage is used to save metadata that is helpful for data lineage and data analysis.

41) Have you ever worked in a UNIX environment and why is it useful in Datastage?

Yes, I have worked in a UNIX environment. This knowledge is useful in Datastage because sometimes one has to write UNIX programs, such as batch programs to invoke batch processing etc.

42) Differentiate between Datastage and Datastage TX?

Datastage is an ETL (Extract, Transform and Load) tool and Datastage TX is an EAI (Enterprise Application Integration) tool.

43) What do the sizes of a transaction and an array mean in Datastage?

Transaction size means the number of rows written before committing the records to a table. Array size means the number of rows written to or read from the table at a time.

44) How many types of views are there in the Datastage Director?

There are three types of views in the Datastage Director, i.e. Job View, Log View and Status View.

45) Why do we use a surrogate key?

In Datastage, we use a Surrogate Key instead of a unique key. The surrogate key is mostly used for retrieving data faster. It uses an index to perform the retrieval operation.

46) How are rejected rows managed in Datastage?

In Datastage, rejected rows are managed through constraints in the transformer. We can either place the rejected rows in the properties of a transformer, or create a temporary storage for rejected rows with the help of the REJECTED command. (An illustrative constraint is shown after this set of questions.)

47) Differentiate between ODBC and DRS stage?

The DRS stage is faster than the ODBC stage because it uses native databases for connectivity.
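For question 46, a minimal sketch of transformer link constraints; the link and column names are hypothetical:

  Output link 'valid'  constraint: Not(IsNull(in.CUST_ID))
  Output link 'reject' marked as the reject link, so it receives rows that satisfy no other constraint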

48) Define Orabulk and BCP stages?

The Orabulk stage is used to load a large amount of data into one target table of an Oracle database. The BCP stage is used to load a large amount of data into one target table of Microsoft SQL Server.

49) Define DS Designer?

The DS Designer is used to design the work area and add various links to it.

50) Why do we use Link Partitioner and Link Collector in Datastage?

In Datastage, the Link Partitioner is used to divide data into different parts through certain partitioning methods. The Link Collector is used to gather data from the various partitions/segments into a single stream and save it in the target table.

More questions

How did you handle reject data?
Ans: Typically a Reject-link is defined and the rejected data is loaded back into the data warehouse. So a Reject link has to be defined on every Output link from which you wish to collect rejected data. Rejected data is typically bad data like duplicates of primary keys or null rows where data is expected.

If worked with DS6.0 and latest versions, what are Link-Partitioner and Link-Collector used for?
Ans: Link Partitioner - used for partitioning the data. Link Collector - used for collecting the partitioned data.

How did you connect to DB2 in your last project?
Ans: Using DB2 ODBC drivers.

What are Routines and where/how are they written and have you written any routines before?
Ans: Routines are stored in the Routines branch of the DataStage Repository, where you can create, view or edit them. The following are the different types of routines: 1) Transform functions 2) Before-after job subroutines 3) Job Control routines.

What are OConv () and IConv () functions and where are they used?
Ans: IConv() - converts a string to an internal storage format. OConv() - converts an expression to an output format.
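As a worked illustration of IConv()/OConv() (the date values are illustrative): DataStage BASIC stores dates internally as a day number, with day 0 = 31 December 1967, so

  IConv("01/01/1968", "D/MDY[2,2,4]")  ->  1
  OConv(1, "D/MDY[2,2,4]")             ->  "01/01/1968"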

What is DS Director used for?
Ans: The Datastage Director is used to run and validate the jobs. We can go to the Datastage Director from the Datastage Designer itself.

What is DS Administrator used for?
Ans: The Administrator enables you to set up DataStage users, control the purging of the Repository, and, if National Language Support (NLS) is enabled, install and manage maps and locales.

What is DS Designer used for?
Ans: You use the Designer to build jobs by creating a visual design that models the flow and transformation of data from the data source through to the target warehouse. The Designer graphical interface lets you select stage icons, drop them onto the Designer work area, and add links.

What is DS Manager used for?
Ans: The Manager is a graphical tool that enables you to view and manage the contents of the DataStage Repository.

Explain METASTAGE?
Ans: MetaStage is used to handle the Metadata, which will be very useful for data lineage and data analysis later on. Metadata defines the type of data we are handling. These data definitions are stored in the repository and can be accessed with the use of MetaStage.

Do you know about the INTEGRITY/QUALITY stage?
Ans: QualityStage can be integrated with DataStage. In QualityStage we have many stages like Investigate, Match and Survivorship, so that we can do the quality-related work; to integrate it with Datastage we need the QualityStage plugin to achieve the task.

Explain the differences between Oracle 8i/9i?
Ans: Oracle 8i does not support the pseudo column sysdate but 9i supports it. In Oracle 8i we can create 256 columns in a table, but in 9i we can create up to 1000 columns (fields).

What are Static Hash files and Dynamic Hash files?
Ans: As the names themselves suggest what they mean. In general we use Type-30 dynamic Hash files. The data file has a default size of 2 GB and the overflow file is used if the data exceeds the 2 GB size.

How do you merge two files in DS?
Ans: Either use the Copy command as a Before-job subroutine if the metadata of the 2 files is the same, or create a job to concatenate the 2 files into one if the metadata is different. (An illustrative subroutine call follows this block.)
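A hedged sketch of the first option above, using the built-in ExecSH before-job subroutine; the file names are placeholders (on Windows, ExecDOS with the copy command would be the equivalent):

  Before-job subroutine: ExecSH
  Input value:           cat /data/file1.txt /data/file2.txt > /data/merged.txt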

What is Hash file stage and what is it used for?
Ans: Used for look-ups. It is like a reference table. It is also used in place of ODBC or OCI tables for better performance.

What are the command line functions that import and export the DS jobs?
Ans: A. dsimport.exe - imports the DataStage components. B. dsexport.exe - exports the DataStage components.

How would you call an external Java function which is not supported by DataStage?
Ans: Starting from DS 6.0 we have the ability to call external Java functions using a Java package from Ascential. In this case we can even use the command line to invoke the Java function, write the return values from the Java program (if any) to a file, and use that file as a source in a DataStage job.

Why do we have to load the dimensional tables first, then fact tables?
Ans: As we load the dimensional tables, the (primary) keys are generated, and these keys are foreign keys in the fact tables.

How will you determine the sequence of jobs to load into the data warehouse?
Ans: First we execute the jobs that load the data into the dimension tables, then the fact tables, then the aggregator tables (if any).

Tell me one situation from your last project where you faced a problem and how did you solve it?
Ans: A. The jobs in which data is read directly from OCI stages were running extremely slowly. I had to stage the data before sending it to the transformer to make the jobs run faster. B. A job aborted in the middle of loading some 500,000 rows. We had the option of either cleaning/deleting the loaded data and then running the fixed job, or running the job again from the row at which it had aborted. To make sure the load was proper, we opted for the former.

Does the selection of 'Clear the table and Insert rows' in the ODBC stage send a Truncate statement to the DB or does it do some kind of Delete logic?
Ans: There is no TRUNCATE on ODBC stages. It is 'Clear table...' and that is a 'delete from' statement. On an OCI stage such as Oracle, you do have both Clear and Truncate options. They are radically different in permissions (Truncate requires you to have alter table permissions, whereas Delete doesn't).

How are the Dimension tables designed?
Ans: Find where the data for this dimension is located. Figure out how to extract this data. Determine how to maintain changes to this dimension. Change the fact table and DW population routines.

What is the utility you use to schedule the jobs on a UNIX server other than using Ascential Director?
Ans: Use the crontab utility along with the dsjob command, with the proper parameters passed.
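An illustrative crontab entry for the answer above, running a job every day at 2 AM; the engine path, project and job names are placeholders:

  0 2 * * * /opt/Ascential/DataStage/DSEngine/bin/dsjob -run -jobstatus myproject myjob >> /tmp/myjob.log 2>&1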

What will you do in a situation where somebody wants to send you a file, use that file as an input or reference, and then run the job?
Ans: A. Under Windows: use the 'WaitForFileActivity' under the Sequencers and then run the job. Maybe you can schedule the sequencer around the time the file is expected to arrive. B. Under UNIX: poll for the file. Once the file has arrived, start the job or sequencer depending on the file.

What are Sequencers?
Ans: Sequencers are job control programs that execute other jobs with preset job parameters.

Differentiate Primary Key and Partition Key?
Ans: A Primary Key is a combination of unique and not null. It can be a collection of key values, called a composite primary key. A Partition Key is just a part of a Primary Key.

How did you handle an 'Aborted' sequencer?
Ans: In almost all cases we have to delete the data inserted by it from the DB manually, fix the job, and then run the job again.

What versions of DS have you worked with?
Ans: DS 7.0.2/6.0/5.2

How did you connect with DB2 in your last project?
Ans: Most of the time the data was sent to us in the form of flat files; the data is dumped and sent to us. In some cases where we needed to connect to DB2 for look-ups, we used ODBC drivers ('iSeries Access ODBC Driver 9.00.02.02' - ODBC drivers to connect to AS400/DB2) to connect to DB2 (or) DB2-UDB, depending on the situation and availability. Certainly DB2-UDB is better in terms of performance, as you know native drivers are always better than ODBC drivers.

Read the String functions in DS.
Ans: Functions like [] -> the sub-string function, and ':' -> the concatenation operator.
Syntax: string [ [ start, ] length ]  and  string [ delimiter, instance, repeats ]
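Worked examples of the string functions above (the values are illustrative):

  "DataStage"[1,4]     ->  "Data"       (substring: start 1, length 4)
  "a,b,c"[",", 2, 1]   ->  "b"          (2nd comma-delimited field, 1 field)
  "Data" : "Stage"     ->  "DataStage"  (concatenation)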

Is it possible to calculate a hash total for an EBCDIC file and have the hash total stored as EBCDIC using Datastage?
Ans: Currently, the total is converted to ASCII, even though the individual records are stored as EBCDIC.

When should we use ODS?
Ans: DWHs are typically read-only, batch updated on a schedule. ODSs are maintained in more real time, trickle-fed constantly.

What are the types of Parallel Processing?
Ans: Parallel Processing is broadly classified into 2 types: a) SMP - Symmetrical Multi Processing, b) MPP - Massive Parallel Processing.

What is the default cache size? How do you change the cache size if needed?
Ans: The default cache size is 256 MB. We can increase it by going into the Datastage Administrator, selecting the Tunables tab and specifying the cache size there.

How to handle date conversions in Datastage? Convert an mm/dd/yyyy format to yyyy-dd-mm?
Ans: We use a) the "Iconv" function - internal conversion, and b) the "Oconv" function - external conversion. The function to convert an mm/dd/yyyy format to yyyy-dd-mm is Oconv(Iconv(Fieldname, "D/MDY[2,2,4]"), "D-YDM[4,2,2]").
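A worked example of the conversion above, assuming an input value of "10/14/2023":

  Oconv(Iconv("10/14/2023", "D/MDY[2,2,4]"), "D-YDM[4,2,2]")  ->  "2023-14-10"

Iconv parses the mm/dd/yyyy string into the internal day number; Oconv then formats it back out as year-day-month separated by hyphens.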

How do you rename all of the jobs to support your new file-naming conventions?
Ans: Create an Excel spreadsheet with the new and old names. Export the whole project as a dsx. Write a Perl program which can do a simple rename of the strings by looking up the Excel file. Then import the new dsx file, probably into a new project for testing. Recompile all jobs. Be cautious that the names of the jobs have also been changed in your job control jobs or Sequencer jobs, so you have to make the necessary changes to these Sequencers.

How do you execute a Datastage job from the command line prompt?
Ans: Using the "dsjob" command as follows:
dsjob -run -jobstatus projectname jobname
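An illustrative dsjob invocation (the project, job and parameter names are placeholders); -param passes a job parameter, and -jobstatus waits for completion and returns the job's status as the exit code:

  dsjob -run -jobstatus -param SRC_DIR=/data/incoming myproject myjob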