
1. Main differences between DataStage 7.5.2 and 8.0.1
A. In 7.5.2 we have Manager as a separate client. In 8.0.1 there is no Manager client; its functionality is embedded into the Designer client.
B. In 7.5.2 QualityStage has a separate designer. In 8.0.1 QualityStage is integrated into the Designer.
C. In 7.5.2 both code and metadata are stored in a file-based system. In 8.0.1 code is still file based, whereas metadata is stored in a database repository.
D. In 7.5.2 we require only operating system authentication. In 8.0.1 we require both operating system authentication and DataStage authentication.
E. In 7.5.2 we don't have range lookup. In 8.0.1 we have range lookup.
F. In 7.5.2 a single Join stage can't support multiple references. In 8.0.1 a single Join stage can support multiple references.
G. In 7.5.2, when a developer has a particular job open, another developer cannot open the same job. In 8.0.1, when a developer has a job open, another developer can open the same job as a read-only job.
H. In 8.0.1 a compare utility is available to compare two jobs, for example one in development and one in production. In 7.5.2 this is not possible.
I. In 8.0.1 quick find and advanced find features are available; in 7.5.2 they are not.
J. In 7.5.2, the first time a job is run the surrogate key is generated from the initial value to n; if the same job is compiled and run again, the surrogate key is again generated from the initial value to n, i.e. automatic increment of the surrogate key is not available. In 8.0.1 the surrogate key is incremented automatically; a state file is used to store the maximum value of the surrogate key.


Data Modeling

1) E-R Modeling (Entity-Relationship Modeling) – used for OLTP systems

a) Logical Modeling: Logical modeling deals with gathering the business requirements and converting them into a model.
b) Physical Modeling: Physical modeling involves the actual design of a database according to the requirements that were established during logical modeling.
2) Dimensional Modeling
Dimensional modeling is divided into 2 types:
a) Star Schema - simple, denormalized form; much faster compared to snowflake.
b) Snowflake Schema - complex, with more granularity; more normalized form; slower.


Importance of Surrogate Key in Data warehousing?

Surrogate Key is the primary key in a dimension table. It is independent of the underlying database, i.e. the surrogate key is not affected by changes going on in the source database.

3. Differentiate between database data and data warehouse data?

Data in a database is for OLTP: a) detailed or transactional b) both readable and writable c) current d) volatile.
Data in a data warehouse is for OLAP: a) used for analysis and BI b) read-only c) historical d) non-volatile.

What is the flow of loading data into fact & dimensional tables?


First, data should be loaded into the dimension tables, where the surrogate keys are generated, and then into the fact tables. The surrogate keys are referenced as foreign keys in the fact tables.


Orchestrate Vs DataStage Parallel Extender?

Orchestrate was itself an ETL tool with extensive parallel processing capabilities, running on UNIX platforms. DataStage used Orchestrate with DataStage XE (the beta version of 6.0) to incorporate the parallel processing capabilities. Ascential then acquired Orchestrate, integrated it with DataStage XE and released a new version, DataStage 6.0, i.e. Parallel Extender.


What are Stage Variables, Derivations and Constraints?

Stage Variable - An intermediate processing variable that retains its value while rows are processed and does not pass its value directly into a target column (used, for example, for comparisons and if-else logic). We can also use it to compute a result once and reuse it in multiple columns of the target table(s).
Constraint - A constraint is like a filter condition which limits the number of records coming from the input link according to a business rule.
Derivation - An expression that specifies the value to be passed to the target column.


DataStage Lookup types?

Normal Lookup: Whenever DataStage performs the lookup, it places the reference (target) table data into a memory buffer. It is used when the reference table or file contains a relatively small amount of data.
Sparse Lookup: Instead of placing the reference data into a buffer, DataStage fires a SQL query against the database for each lookup. Use a sparse lookup when the reference data is much larger than the source data; the reference must be a database.
Range Lookup: This helps you search records based on a particular range. It searches only the records in that range instead of the entire record set, which gives better performance.

8. Explain about error handling in DataStage and best practices.

In a DataStage sequence there is an "Exception Handler" activity. When you are calling your jobs from a DataStage sequence you should do the following:
Step 1: Go to the job properties of the master sequence and check the checkboxes "Add checkpoints so sequence is restartable on failure" and "Automatically handle activities that fail".
Step 2: In your sequence use an Exception Handler activity. After the Exception Handler activity you may include an email Notification activity. If a job fails, control goes to the Exception Handler activity and an email is sent notifying the user that the sequence has failed.

9. What are the different types of links in DataStage?

There are 3 different types of links in DataStage:
1. Stream link - a straight link carrying the data flow.
2. Reference link - acts like a lookup.
3. Reject link - used to capture rejected rows (in parallel jobs).

10. How to use an Excel file as input in DataStage?


You can use an Excel file as input by importing it through ODBC.
Step 1: Go to Administrative Tools -> Data Sources (ODBC) -> System DSN. Click on the Add button and configure the corresponding .xls file in your system DSN. Make sure that the workbook contains the name of your Excel sheet.
Step 2: Import the Excel file into DataStage as an ODBC table definition.
Step 3: Use an ODBC stage as the input stage.


What is the default cache size? How do you change the cache size if needed?

The default cache size is 128 MB. We can increase it by going into DataStage Administrator, selecting the "Tunables" tab and specifying the cache size there.

12. Differentiate Primary Key and Partition Key?

A primary key is the key that we define on a table column or set of columns (composite primary key) to make sure that all the rows in the table are unique on that column or columns.
A partition key is the key that we use while partitioning a table (in the database) for processing the source records in ETL. We should define the partitioning based on the stages (in DataStage) or transformations (in Informatica) we use in a job (DataStage) or mapping (Informatica). We partition the data to improve the target load process.

13. How to remove locked jobs in DataStage?
Go to Director -> Tools -> "Clear Job Resources"; there you will find the PID number. Select that PID and click Logout, and the job gets released.


How do you execute DataStage job from UNIX command line prompt?

/opt/Ascential/DataStage/DSEngine/bin/dsjob -server $ServerName \
    -user $User \
    -password $ServerPassword \
    -run -wait -mode NORMAL \
    -param FmtDirectory=$FMT_PATH \
    -param ReFmtDirectory=$REFMT_PATH \
    -param Twin=$TWIN \
    -param Group=$GROUP \
    -param PrcsDte=$PROCDTE \
    -param Cmelnd=$CME_IND \
    -param Mndlnd=$MND_IND \
    IDL $DSSEQUENCE.${APPLCDE}
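More generally, a job or sequence can be started from the UNIX prompt with a much shorter command. The following is only a minimal sketch - the project name, job name and parameter here are placeholders, not values from the example above:

    dsjob -run -mode NORMAL -wait -param ProcessDate=20240101 MyProject MyJobOrSequence

The -wait option makes dsjob return only when the job has finished; dsjob -jobinfo MyProject MyJobOrSequence can then be used to check the finishing status.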


What are types of Hashed File?

A hashed file is classified into 2 types:
a) Static - subdivided into 17 types based on the primary key pattern.
b) Dynamic - the default hashed file is "Dynamic - Type 30 (Random)".
Three files are created when a hashed file is created: .data, .type and .over.

16. How to call a Stored Procedure which is in SQL Server database in DataStage job?


In the ODBC stage properties, click on Outputs -> General, select "Stored Procedure" and browse to get the stored procedure. We can also use a Stored Procedure stage while designing parallel jobs.

17. Explain what SUMMARY TABLES are and their use?
Summary tables contain summarized or "aggregated" data to tune up query performance.
Example: Suppose we have a table which contains the following columns: a) Medicine_Name b) Sell c) Time. The business requirement is to get the sales of each medicine on a monthly basis. If a query is fired against the detail table, it has to aggregate the medicine sales each time to get the monthly figures. Instead, if a summary table is created which contains the monthly sales records, the query cost decreases because the query gets the aggregated data directly from the summary table. In this scenario the summary table will contain the following columns: a) Medicine_Name b) Sell c) Month. Hence, for the sales of all days of a month, only one aggregated record will be present in the summary table, i.e. for each month one row containing aggregated data. This improves query performance.

18. Containers: Usage and Types?
A container is a collection of stages, used for the purpose of reusability. There are 2 types of containers:
a) Local Container: job specific.
b) Shared Container: for any job within a project. Shared containers are again of 2 types:
   1. Server shared container: used in server jobs (can also be used in parallel jobs).
   2. Parallel shared container: used only in parallel jobs.

19. Where the DataStage hash files are stored when they are created?
There are two ways of specifying where hashed files will be created:
a) Account: this is generally the project name. If this is selected, the hashed file is generated in the hashed file path specific to the project/account.
b) Folder location: if this option is selected, the hashed file is generated in the directory specified by the user.

20. How do you create a sequencer in DataStage parallel extender?
A Sequencer is an activity in a DataStage sequence which runs the output job connected to it based on the status of the input jobs (ALL/ANY). Steps to create a sequence:


a) Go to the menu and select File -> New.
b) In the window that appears select "Job Sequence".
The other way of running jobs in a sequence is using the "Job Control" options in DataStage. For this:
a) Open a parallel job.
b) Go to Job Properties -> Job Control. Here you can select the jobs which should run before this parallel job.

21. How and where do we use or integrate Java with DataStage?
Java can be integrated into DataStage using the Java Pack. There are two stages which come in the Java Pack:
a) Java Client: used for sources or targets.
b) Java Transformer: used for row-by-row processing.
Both stages allow you to use Java classes in your DataStage jobs.

22. Where does Director create its log files?
The log files are stored on the server; at the client side (Windows) the log files are not stored. The logs are maintained in RT_LOGn files on the DataStage server.
Deleting logs: you can set auto-purge of job logs from DataStage Administrator to delete the log files automatically.

23. Compare and Contrast ODBC and Plug-In stages?
ODBC: a) Poor performance. b) Can be used for a variety of databases. c) Can handle stored procedures.
Plug-in: a) Good performance. b) Database specific. c) Cannot handle stored procedures.

24. Use of Terminator Activity?
A Terminator activity can be used in a job sequence to ensure that the sequence is stopped cleanly if certain situations arise. You can have multiple Terminator activities and can place them anywhere in the sequence. They are connected to other stages by triggers which specify when a terminator will be invoked.

25. Types of Parallel Processing?
DataStage offers 2 types of parallelism to take advantage of the underlying hardware:
1. Pipeline parallelism
2. Partition parallelism
Hardware-wise there are 3 types of parallel processing systems available:
1. SMP (Symmetric Multi-Processing: multiple CPUs, shared memory, single OS).
2. MPP (Massively Parallel Processing: multiple CPUs, each having its own set of resources - memory, OS etc. - but physically housed in the same machine).
3. Clusters: same as MPP, but physically dispersed (not in the same box, connected via high-speed networks).

Explain the difference between Data Sets and File Sets? Where is each one used, how are they used, and what is the size limit?
The data set and the file set are both file stages. A data set can have a single input link or a single output link, whereas a file set can have a single input link or a single output link plus a single reject link. A file set can execute only in parallel mode, whereas a data set can be configured to execute either in parallel or in sequential mode. The default size limit for both is 2 GB.

What is a normal view and a materialized view?
A view is a virtual table representing the result of a database query. Whenever an ordinary view's table is queried or updated, the DBMS converts these into queries or updates against the underlying base tables. A materialized view takes a different approach: the query result is cached as a concrete table that may be updated from the original base tables from time to time. This enables much more efficient access, at the cost of some data being potentially out-of-date. It is most useful in data warehousing scenarios, where frequent queries of the actual base tables can be extremely expensive. A materialized view stores both the definition of the view and the rows resulting from its execution. The main disadvantage of a materialized view is that its contents get outdated when the underlying base tables are modified. You can refresh a materialized view using DBMS_MVIEW.REFRESH.

In DataStage, how can you see or delete a dataset in UNIX and Windows?
You can use orchadmin dump <dataset>.ds to see the data in the dataset in UNIX, and orchadmin <delete | del | rm> <dataset>.ds to delete the dataset (see the sample commands at the end of this block of questions). In Windows, delete the dataset using the "Dataset Management" option in Designer or Director.

When do you use a separate Sort stage and the inbuilt sort utility within a stage? What is the difference?

How can you run the job in sequential mode?
By specifying 1 node in the APT_CONFIG_FILE.

How do you check the number of nodes while running the job in UNIX?
Check the APT_CONFIG_FILE.

A Transformer stage is running in parallel. How can we run it in sequential mode?
Transformer properties -> Advanced -> Execution mode: there you can select the default, sequential or parallel mode of execution.

If a job compiled with "job completed successfully" but the Director log page shows warning messages, what will we do?
Generally we can ignore the warning messages shown in Director. If you want to abort your job based on the number of warning messages, you can specify that in DataStage Director -> Tools -> Options -> Limits tab -> Rows/Warnings.

A sequential file has 4 duplicate records. How do you get the 3rd one?

How can a server job be used in a PX job?
Through server shared containers. We can select the stages and define the functionality using them.

Explain the difference between a server transformer and a parallel transformer. Or: what is the difference between a BASIC Transformer and a NORMAL Transformer, and when do we go for each?
The main difference is that the server Transformer supports BASIC transforms only, whereas the parallel Transformer supports both basic and parallel transforms. The server transformer is BASIC-language compatible and the parallel transformer is C++-language compatible.
Differences:
1. A BASIC Transformer compiles in BASIC, whereas a Normal Transformer compiles in C++.
2. A BASIC Transformer does not run on multiple nodes, whereas a Normal Transformer can run on multiple nodes, giving better performance.
3. A BASIC Transformer takes less time to compile than the Normal Transformer.
Usage:
1. The BASIC Transformer should be used in server jobs.
2. The Normal Transformer should be used in parallel jobs, as it will run on multiple nodes there, giving better performance.
Transformer stage in server jobs vs Transformer stage in parallel jobs:
1. In server jobs the lookup has to be done in the Transformer stage, whereas in parallel jobs there is a separate Lookup stage for this.
2. In server jobs the Transformer supports a single input stream link and multiple input reference links, whereas in parallel jobs it supports only a single input stream link and no reference links.
3. In parallel jobs the Transformer has the capability to perform surrogate key generation as built-in functionality, whereas no such functionality is available in the server job Transformer stage.

What is DataStage DSTX?
DataStage Transformation Extender was developed, owned and marketed by Mercator (an aviation company). Later on Ascential Software purchased it and named it DataStage Transformation Extender (DSTX). Later, IBM purchased Ascential Software and renamed the product IBM WTX (WebSphere Transformation Extender), e.g. version 6.5.2 is Mercator, version 7.5.2 is DSTX, version 8.1 is WTX. DSTX/WTX is middleware that is used in complex transformations and is one of the major players in applications like EDI and SWIFT. It processes live data and is an EAI tool. Like DS PX, it has a designer where "maps" (the equivalent of jobs) are designed. Its metadata is defined in Type Trees in the Type Tree Maker. Its major rival software is Gentran (e.g. GIS).

How to implement the scenario: add 10 days to a particular date in DataStage PX?
Extract the day part of the date by using string functions, add the number of days you want, then put all the parts back together into a single string and convert the string back into a date object.
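As an illustration of the dataset commands mentioned above, here is a minimal sketch; the dataset name and the assumption that the parallel engine environment (PATH, APT_CONFIG_FILE) is already set up are part of this example only, not of the original answer:

    orchadmin describe my_dataset.ds      # show the dataset's schema and layout
    orchadmin dump my_dataset.ds          # print the records held in the dataset
    orchadmin rm my_dataset.ds            # delete the dataset and its data files

orchadmin removes the data files on the processing nodes as well as the .ds descriptor file, which is why it is preferred over deleting the descriptor with a plain rm.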

Where do you specify nodes for DataStage jobs?
In DataStage Manager -> Tools -> Configurations there is one default configuration file, named "default.apt"; there we can add any number of nodes, or we can add a new configuration file.

What is the difference between a Filter and a Switch stage?
A Switch stage can have a maximum of 128 output links. A Filter stage can have any number of output links.

What is the difference between the Filter and External Filter stages?
The Filter stage supports only SQL-style conditions in its Where property, whereas the External Filter stage supports UNIX commands in its Filter Command property.

How can we filter in the Sequential File stage? How do you remove duplicates in a flat file?
There is a Filter Command option in the input link properties tab where we can specify UNIX commands as well. To remove duplicates, use the Remove Duplicates stage.

How to remove duplicate records from a file and capture the rejected records in a separate file?
Sort the records using the key field and, in the Sort stage, set "key change column" to true. Zero is then assigned to the duplicate records; put a condition on the zero records and send them to the reject link.

How to enable runtime column propagation in DataStage? Where does this option exist?
There is an option in DataStage Administrator -> Projects tab -> Properties -> General tab -> "Enable runtime column propagation for parallel jobs". If you enable this, you can select runtime propagation to specify that columns encountered by a stage in a parallel job can be used even if they are not explicitly defined in the metadata. You can see the runtime propagation option in most of the active stages, in the output tab (if it exists), in the Columns sub-tab.

How to retrieve hashed file data from the Administrator command window?
Open DataStage Administrator and select your project. On the right-hand side you will see a "Command" button; click on it and you will get a command interface window. Here you can write select queries to retrieve data from a hashed file. For example, say you have created a hashed file called "TestHashFile"; to retrieve its contents fire the following query: select * from TestHashFile;
Note: in the Hashed File stage you will have to select the option "Use Account Name" to create the hashed file.

How can I handle the before/after job subroutine in the Transformer stage?

In the Transformer stage, click on the properties button in the toolbar (left corner). It displays a stage properties page like the other job properties pages; there you can specify before/after stage subroutines. At job level, select the Edit tab in the toolbar -> choose Job Properties -> select the job parameters -> choose the before/after routines -> select the ExecSH command.

What does a config file in parallel extender consist of? (APT_CONFIG_FILE)
The config file consists of the following:
a) Number of processors or nodes.
b) Actual disk storage locations.

What is the difference between "job control" and a Sequencer?
Job control: it just inserts the compilation code. Whenever you add a job it just places the executable code there; there is no diagrammatical representation and no interaction with the executable code. We can run, schedule and monitor it like a normal job.
Sequencer: a Sequencer is a diagrammatical representation of activities, where we can connect different types of objects through activities. A Sequencer can apply control to jobs, meaning we can run a job depending on a condition.

When do you use the Data Set stage, Lookup File Set stage and File Set stage in a parallel job? What is the significance of these files, and what are their differences?

How to load the top 3 marks (column name) from a sequential file to the target?
First sort the data on the Marks column using a Sort stage, then use a Transformer stage. In the constraints, use the system variable @INROWNUM (@INROWNUM <= 3). In this way you can get the top 3 marks. (OR) This can be achieved with Sort and External Filter stages: first sort the data in descending/ascending order, then use an External Filter stage and give the UNIX command head -3 in the box.

How to handle date conversions in DataStage? Convert an mm/dd/yyyy format to yyyy-dd-mm?
We use a) the "Iconv" function - internal conversion, and b) the "Oconv" function - external conversion. The function to convert mm/dd/yyyy format to yyyy-dd-mm is Oconv(Iconv(Fieldname, "D/MDY[2,2,4]"), "D-YDM[4,2,2]").

How to run a shell script within the scope of a DataStage job?
By using the "ExecSH" command in the before/after job properties, through which we can run shell scripts by providing the path and parameters for the script. In a sequence, the Execute Command activity can be used for the same purpose.

Functionality of Link Partitioner and Link Collector?
Server jobs mainly execute in a sequential fashion; the IPC stage, as well as the Link Partitioner and Link Collector, will simulate the parallel mode of execution for server jobs.

Link Partitioner: it receives data on a single input link and diverts the data to a maximum of 64 output links, where the data is processed by stages having the same metadata.
Link Collector: it collects the data from up to 64 input links, merges it into a single data flow and loads it into the target.
Both are active stages, and the design and mode of execution of the server jobs have to be decided by the designer.

54. What are XML files, how do we read data from XML files and what stage is to be used?
We can use the XML metadata importer to import the XML source definition, and the XML Input stage to read the XML document. For each and every element of the XML we should give the XPath expression in the XML Input stage.

55. What is Modulus, Splitting and Merging in a Dynamic Hashed File?
In hashed files the size of the file keeps on changing (according to the incoming data). Groups are numbered sequentially from 1; the number of groups is called the modulus. The operation of increasing the modulus is called splitting and decreasing it is called merging.

56. Types of views in DataStage Director?
There are 3 types of views in DataStage Director:
a) Status view - list of jobs, status, dates and time taken to run.
b) Schedule view - scheduling details of jobs.
c) Log view - log for the last few runs of each job, with warning messages, error messages, informational messages, event messages and program-generated messages.

57. How do you eliminate duplicate rows?
1. DataStage provides a "Remove Duplicates" stage in Enterprise Edition; using that, we can eliminate the duplicates based on a key column.
2. We can also use the Sort stage (option: Allow Duplicates = True/False).
3. The duplicates can be eliminated by loading the corresponding data into a hashed file, specifying the column(s) on which you want to eliminate duplicates as the key(s) of the hashed file.
SQL: delete from tablename a where rowid > (select min(rowid) from tablename b where a.key_values = b.key_values);

58. What are the difficulties faced in using DataStage?
1. DataStage is a unique product in terms of functions. For example, most databases and ETL tools use "UPPER" for converting lower-case data to upper case; DataStage uses "UCASE".
2. We don't have many of the date functions that are available in Informatica or in traditional relational databases.
3. It doesn't give you user-friendly error messages and informative messages in the DataStage Director job log.

59. Give the OS of the server and the OS of the client in your most recent project?
The server is on UNIX and the client machine, i.e. the machine where you design a job, is on Windows XP.

60. How do you catch bad rows from an OCI stage?
We place conditions, such as a 'where' clause, inside the OCI stage, and the rejected rows can be obtained.

What is the order of execution done internally in the transformer, with the stage editor having input links on the left-hand side and output links on the right-hand side?
1. First the left-side tables are joined.
2. Then the stage variables are executed.
3. Then the constraints in the transformer.
4. Finally the derivations on the output columns.

How do you pass the filename as a parameter for a job?
1. Go to DataStage Administrator -> Projects -> Properties -> Environment -> User Defined. Here you can see a grid where you can enter the parameter name and the corresponding path of the file.
2. Go to the stage tab of the job, click on "Use Job Parameter" and select the parameter name which you have given above. The selected parameter name appears in the text box beside the "Use Job Parameter" button. Copy the parameter name from the text box and use it in your job. Keep the project default in the text box.

How do you pass the parameter to a job sequence if the job is running at night?
Two ways:
1. Set the default values of the parameters in the job sequence and map these parameters to the job.
2. Run the job in the sequence using the dsjob utility, where we can specify the values for each parameter (see the sketch after this block of questions).

What happens if the job fails at night?
The job sequence will abort, and an error/exception-handling job will send a mail to the corresponding member. You can define a job sequence to send an email using the SMTP notification activity if the job fails, or log the failure to a log file using DSLogFatal/DSLogEvent from the job control tab in the job properties.

What is the Hash File stage and what is it used for?
It is used for lookups; it is like a reference table. It can also be used in place of ODBC/OCI tables for better performance. We can also use the Hash File stage to remove duplicate rows by specifying the hash key on a particular field or column.

Why do you use SQL*Loader or the OCI stage?
When the source data is enormous, or for bulk data extract/load, we use OCI or SQL*Loader depending on the source. If we have millions of records, then we use Orabulk. (In server jobs: OCI stage / SQL*Loader; in PE: the Oracle Enterprise stage.)
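For the overnight-parameter case above, a minimal sketch of invoking a sequence with a runtime parameter from a scheduler script might look like the following; the project, sequence and parameter names are placeholders, not taken from the original answer:

    dsjob -run -mode NORMAL -jobstatus \
        -param ProcessDate=`date +%Y%m%d` \
        MyProject SeqNightlyLoad

The -jobstatus option makes dsjob wait for the job to finish and set its exit status from the job's finishing status, which is convenient for checking failure from the calling script.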

How do you track performance statistics and enhance them?
You can right-click on a server job and select the "View performance statistics" option. This shows the output as a number of rows per second while the job runs. From Director we can also view this with the "Monitor" option.

How do you handle reject data?
Typically a reject link is defined and the rejected data is loaded back into the data warehouse or into a sequential file. Rejected data is typically bad data, like duplicates of primary keys or null rows where data is expected. A reject link has to be defined for every output link from which you wish to collect rejected data. Also, if you want to add data to an existing dataset or sequential file, you can write to the same dataset/sequential file with the stage set to APPEND mode.

What are static hashed files and dynamic hashed files?
Dynamic hashed files can automatically adjust their size - modulus (number of groups) and separation (group size) - based on the incoming data. Type 30 is the default dynamic hashed file used. Static hashed files do not adjust their modulus automatically and are best when the data is pre-known/static. Since static hashed files do not create hashing groups automatically, when a group cannot accommodate a row it goes to overflow. Overflow groups are used when the data size is equal to or greater than the specified large record size in dynamic hashed files. Overflows should be minimized as much as possible to optimize performance.

Did you parameterize the job or hard-code the values in the jobs?
We always parameterized the job. Either the values come from job properties, or through a shell script, or from a 'Parameter Manager' - a third-party tool. The often-parameterized variables in a job are: DB DSN name, username, password, and dates with respect to the data to be looked up against.

Do you know about MetaStage?
MetaStage is a repository in which you can store the metadata (DDLs etc.) and perform analysis on data dependencies, change impact etc.

How will you call an external function or subroutine from DataStage?
You can call external functions/subroutines by using before/after stage/job subroutines: ExecSH if it is a UNIX machine, ExecDOS if it is a Windows machine.

How do you merge two files in DS?
Use the Funnel stage to merge the two files in PE.

Have you ever been involved in updating DS versions, like DS 5.X? If so, tell us some of the steps you have taken in doing so.

Yes. The following are some of the steps I have taken in doing so:
1) Take a backup of the whole project(s) by exporting each project as a .dsx file.
2) See that you are using the same parent folder for the new version as well, so that your old jobs using hard-coded file paths continue to work.
3) After installing the new version, import the old project(s); you have to compile them again. You can use the 'Compile All' tool for this.
4) Make sure that all your DB DSNs are created with the same names as the old ones. This step is for moving DS from one machine to another.
5) In case you are just upgrading your DB from Oracle 8i to Oracle 9i, there is a tool on the DS CD that can do this for you. There is NO rework (recompilation of existing jobs/routines) needed after the upgrade.
6) Do not stop the 6.0 server before the upgrade to version 7.0; the install process collects the project information during the upgrade.

What other ETL tools have you worked with?
ETL tools include: 1) Informatica by Informatica Corporation 2) DataStage by IBM 3) Ab Initio by Ab Initio Software Corporation 4) Data Junction by Pervasive Software 5) Oracle Warehouse Builder by Oracle Corporation 6) MS SQL Server Integration Services by Microsoft 7) Decision Stream / Data Manager by Cognos.

What is DS Administrator used for - did you use it?
The Administrator enables you to set up DataStage users, control the purging of the repository and, if National Language Support (NLS) is enabled, install and manage maps and locales. It is primarily used to create/delete DataStage projects, assign user roles to a project and set the parameters for jobs at project level. Assigning users to the project can also be done here.

What is DS Manager used for - did you use it?
To manage the metadata, create/update routines and functions, and export/import jobs (.dsx or .xml) from Dev to Test and then to the production environment.

What is DS Designer used for - did you use it?
1. Design the server jobs and parallel jobs.
2. Create the local and shared containers.
3. Design the sequences.
4. Compile (single and multiple), validate and run the jobs.

What is DS Director used for - did you use it?
DataStage Director is used to validate, run, monitor and debug (viewing the warnings and errors in the log file) and to schedule the jobs.

Explain the differences between Oracle 8i and 9i?

1. In Oracle 8i we can create 256 columns in a table, but in 9i we can create up to 1000 columns.
2. Oracle 8i does not support the pseudo column SYSDATE, but 9i supports it.
3. 9i databases have more support for dimensional modeling.

What are Routines, where/how are they written, and have you written any routines before?
Routines are stored in the Routines branch of the DataStage Repository, where you can create, view or edit them using the Routine dialog box. The following program components are classified as routines:
- Transform functions: These are functions that you can use when defining custom transforms. DataStage has a number of built-in transform functions, which are located in the Routines > Examples > Functions branch of the Repository. You can also define your own transform functions in the Routine dialog box.
- Before/After subroutines: When designing a job, you can specify a subroutine to run before or after the job, or before or after an active stage. DataStage has a number of built-in before/after subroutines, which are located in the Routines > Built-in > Before/After branch of the Repository. You can also define your own before/after subroutines using the Routine dialog box.
- Custom UniVerse functions: These are specialized BASIC functions that have been defined outside DataStage. Using the Routine dialog box, you can get DataStage to create a wrapper that enables you to call these functions from within DataStage. These functions are stored under the Routines branch in the Repository. You specify the category when you create the routine.
- Job Control routines.

What other performance tunings have you done in your last project to increase the performance of slowly running jobs?
Extraction side:
1. Staged the data coming from ODBC/OCI stages or any database on the server using hashed/sequential files for optimum performance - the performance of hashed/sequential files is good because it avoids many trips to the database server.
2. Tuned the OCI stage 'Array Size' and 'Rows per Transaction' numerical values for faster selects, inserts and updates.
3. Tuned the 'Tunables' in Administrator for better performance for each project. The default cache size for hashed files is 128 MB; we can modify it to improve performance.
4. Used in-process and inter-process row buffering (for server jobs) and multiprocessing to improve performance.
5. Worked with the DB admin to create appropriate indexes on tables for better performance of DS queries.
6. Sorted the data as much as possible in the database and reduced the use of the DataStage sort for better performance of jobs - use an ORDER BY clause.
7. Converted some of the complex joins/business logic in DS to stored procedures for faster execution of the jobs.

(Note on routines: before writing a routine or a transform, make sure that the functionality required is not already available in one of the standard routines supplied in the SDK or DS utility categories.)

Transformation side:
8. Use the power of the RDBMS where possible.
9. Try not to use a Sort stage when you can use an ORDER BY clause in the database.
10. Using a constraint to filter records is much slower than performing a SELECT ... WHERE .... Constraints are generally CPU intensive and take a significant amount of time to process. This may be the case if the constraint calls routines or external macros, but if it is inline code then the overhead will be minimal.
11. Used sorted data for the Aggregator stage.
12. If an input file has an excessive number of rows and can be split up, then use standard logic to run the jobs in parallel.
Loading side:
13. Make every attempt to use the bulk loader for your particular database. Bulk loaders are generally faster than using ODBC or OLE.
14. Stored data in sequential/hashed files before loading it to the database, for data recovery in case the job aborts.
15. Tuning should occur on a job-by-job basis.

How can we implement Slowly Changing Dimensions in DataStage?
By using the Lookup stage and the Change Capture stage we can implement SCDs. We have 3 types of SCD:
Type 1: maintains the current values only.
Type 2: maintains both current and historical values.
Type 3: maintains the current value and partial historical values.

Why do we have to load the dimension tables first, then the fact tables?
As we load the dimension tables first, the surrogate (primary) keys are generated; these are the foreign keys in the fact tables.

Do you know about the INTEGRITY/QUALITY stage?
Integrity/QualityStage is a data-integration tool from Ascential which is used to standardize/integrate data from different sources and to cleanse it. QualityStage can be integrated with DataStage; we need the QualityStage plug-in to achieve this, so that we can do the quality-related work and integrate it with DataStage. In QualityStage we have many stages like investigate, match, survivorship etc.

How did you handle an 'Aborted' sequence?
By selecting the compilation options you can run the aborted sequence from the point where it was aborted: check "Add checkpoints so sequence is restartable on failure" and "Automatically handle activities that fail" in the sequence properties. For example, if you have 10 jobs (job1, job2, job3 etc.) in a sequence and job 5 aborted, you can restart the sequence from job 5 only; it will not re-run jobs 1, 2, 3 and 4. Please check these options in your sequence.

What are Sequencers?
Sequencers are job control programs that execute other jobs based on the current job parameters. A sequencer allows you to synchronize the control flow of multiple activities in a job sequence. It can have multiple input triggers as well as multiple output triggers. The sequencer operates in two modes:
ALL mode: in this mode all of the inputs to the sequencer must be TRUE for any of the sequencer's output triggers to fire.
ANY mode: in this mode, output triggers can be fired if any of the sequencer inputs are TRUE.

What is the utility you use to schedule the jobs on a UNIX server, other than using Ascential Director?
1. Use the crontab utility along with the dsexecute() function, with the proper parameters passed (a sample crontab entry is sketched at the end of this block of questions).
2. "AUTOSYS": through Autosys you can automate the job by invoking the shell script written to schedule the DataStage jobs.

What will you do in a situation where somebody wants to send you a file and use that file as an input or reference, and then run the job?
1. Under Windows: use the 'Wait For File' activity under the sequencer and then run the job.
2. Under UNIX: poll for the file; once the file has arrived, start the job or sequence depending on the file. Alternatively, you can schedule the sequence around the time the file is expected to arrive.

How would you call an external Java function which is not supported by DataStage?
Starting from DS 6.0 we have the ability to call external Java functions using a Java package from Ascential. In this case we can even use the command line to invoke the Java function, write the return values from the Java program (if any) to files, and use those files as a source in a DataStage job.

Read the string functions in DS.
Functions like [] -> the sub-string function, and ':' -> the concatenation operator.
Syntax: string [ [start,] length ] or string [ delimiter, instance, repeats ].

How did you connect to DB2 in your last project?
Using DB2 ODBC drivers, the DB2 plug-in stage or the Dynamic Relational stage.

What is the difference between a routine, a transform and a function?
1. A routine describes the business logic, whereas a transform specifies the movement of data from one location to another by applying transformation rules.
2. By using routines we can return values, but transforms cannot return values.
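As a minimal sketch of the crontab approach mentioned above - the paths, project name and job name are placeholders, and dsjob needs the DSEngine environment (dsenv) sourced first:

    # run the nightly sequence at 01:30 every day
    30 1 * * * . /opt/Ascential/DataStage/DSEngine/dsenv; /opt/Ascential/DataStage/DSEngine/bin/dsjob -run -wait MyProject SeqNightlyLoad >> /tmp/nightly_load.log 2>&1

In practice the dsjob call is usually wrapped in a shell script that also passes -param values and checks the return status.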

What are the command line functions that import and export DS jobs?
A. dsimport.exe - imports the DataStage components.
B. dsexport.exe - exports the DataStage components.
They are found in the client install directory (e.g. C:/Ascential/DataStage7.5.1/) and take parameters such as username, password, hostname, project name and filename (job name).

Difference between a hashed file and a sequential file?
1. A hashed file stores the data based on a hash algorithm and a key value; a sequential file is just a file with no key column.
2. A hashed file can be used as a reference for a lookup; a sequential file cannot.
3. A hashed file can be cached in DS memory (buffer) but a sequential file cannot.
4. Duplicates can be removed in a hashed file by specifying a key on a column; that is not true for a sequential file.
5. There is a limit of 2 GB for both hashed files and sequential files.

How do you rename all of the jobs to support your new file-naming conventions?
Create an Excel spreadsheet with the new and old names. Export the whole project as a .dsx. Write a Perl/shell program which does a simple rename of the strings, looking them up in the Excel file. Then import the new .dsx file, probably into a new project for testing. Recompile all jobs. Be cautious: the names of the jobs may also have to be changed in your job control jobs or sequencer jobs, so you have to make the necessary changes to these sequencers.

Does the selection of 'Clear the table and Insert rows' in the ODBC stage send a TRUNCATE statement to the DB or a DELETE statement?
There is no TRUNCATE on ODBC stages. It is "clear table", and that is a "delete from" statement; it just clears the data in the table each time the job runs. In an OCI stage such as Oracle, you do have both Clear and Truncate options. They are radically different in permissions (Truncate requires you to have alter table permissions, whereas Delete doesn't).

Tell me one situation from your last project where you faced a problem and how you solved it.
1. The jobs in which data is read directly from ODBC/OCI stages were running extremely slow. I had to stage the data before sending it to the transformer to make the jobs run faster.
2. A job aborted in the middle of loading some 500,000 rows. We had the option of either cleaning/deleting the loaded data and then running the fixed job, or running the job again from the row at which the job had aborted. To make sure the load was proper we opted for the former.

How can we create containers?
There are two types of containers:
1. Local container: available to that particular job only.
2. Shared container: can be used anywhere in the project.
Steps to create a local/shared container:
Step 1: Select the stages required.

Step 2: Edit -> Construct Container -> Local/Shared.
Shared containers are stored in the Shared Containers branch of the tree structure. They are of 2 types:
1. Server shared containers (can also be used in parallel jobs).
2. Parallel shared containers (cannot be used in server jobs).

How do you fix the error "OCI has fetched truncated data" in DataStage?
This error occurs when the Oracle stage tries to fetch a column value like 34.55676776 when its actual data type is decimal (10,2). The solution is to either truncate or round the data to 2 decimal positions.

What are the job parameters?
A job parameter is a runtime parameter which allows you to enter a value to restrict the data at job run time. A job parameter is created with a name, prompt, data type and default value, and it is referenced as #parameter_name#.

What about system variables?
DataStage provides a set of read-only variables containing useful system information that you can access from a transform or routine:
@DATE - the internal date when the program started; see the Date function.
@DAY - the day of the month extracted from the value in @DATE.
@MONTH - the current month extracted from the value in @DATE.
@YEAR - the current year extracted from @DATE.
@TIME - the internal time when the program started; see the Time function.
@TRUE - the compiler replaces the value with 1.
@FALSE - the compiler replaces the value with 0.
@INROWNUM - input row counter, for use in constraints and derivations in Transformer stages.
@OUTROWNUM - output row counter (per link), for use in derivations in Transformer stages.
@NULL - the null value.
@NULL.STR - the internal representation of the null value, Char(128).
@FM - a field mark, Char(254).
@VM - a value mark (a delimiter used in UniVerse files), Char(253).
@SM - a subvalue mark (a delimiter used in UniVerse files), Char(252).
@TM - a text mark (a delimiter used in UniVerse files), Char(251).
@IM - an item mark, Char(255).
@LOGNAME - the user login name.
@USERNO - the user number.
@WHO - the name of the current DataStage project directory.
@PATH - the pathname of the current DataStage project.
@SCHEMA - the schema name of the current DataStage project.
@SYSTEM.RETURN.CODE - status codes returned by system processes or commands.

How can we join one Oracle source and a sequential file?
The Join and Lookup stages can be used to join an Oracle source and a sequential file.

What's the difference between an operational data store (ODS) and a data warehouse?
An operational data store ("ODS") is a database designed to integrate data from multiple sources to facilitate operations, analysis and reporting. Because the data originates

.. job 3 and job 4). Suppose job sequencer synchronizes or control 4 job but job 1 have problem.. in this condition u should go to director and check what type of problem it is showing either data type problem. What’s the difference between DataStage Developers and DataStage Designers? What are the skills required for this. If job 1 has 10. DataStage developer is one who will code the jobs and DataStage designer is one who will deal with the blue prints and he will design the jobs the Stages those are required in developing the code. and specifically for those . An ODS is usually designed to contain low level or atomic (indivisible) data such as transactions and prices as opposed to aggregated or summarized data such as net contributions. Continue.Page 19 of 57 from multiple sources. Continue . Aggregated data is usually stored in the Data warehouse. Continue (ii) On Skip -. after run the job only 5000 data has been loaded in to target table remaining are not loaded and your job going to be aborted then. the integration often involves cleaning. How many places u can call Routines? Four Places u can call (i) Transform of routine (A) Date Transformation (B) Upstring Transformation (ii) Transform of the Before & After Subroutines (iii) XML transformation (iv)Web base transformation 108. 105. 109. What happens if RCP is disabled? Runtime column propagation (RCP): If RCP is enabled for any job. redundancy resolution and business rule enforcement. Again Run the job defiantly u gets successful massage.000 rows. So u should go to Run window--->Click---> Tracing--->Performance or In your target table --->general---> action---> select this option here two option (i) On Fail -. How can sort out the problem. How to create batches in DataStage from command prompt? 107. If job fail means data type problem or missing column action..Commit.. First u check how much data is already loaded and after that select on skip option then continue and what remaining position data not loaded then select On Fail . job 2. job fail or job aborted. Suppose that 4 jobs controlled by the sequencer like (job 1. warning massage. The data in ODS is volatile and the data which is nonvolatile.Commit . historical and time variant is DWH data. 106.

What happens if RCP is disabled?
Runtime column propagation (RCP): if RCP is enabled for a job, and specifically for those stages whose output connects to a shared container input, then the metadata will be propagated at run time, so there is no need to map it at design time. If RCP is disabled for the job, OSH has to perform an import and export every time the job runs, and the processing time of the job increases.

What are the third-party tools used with DataStage?
Autosys, TNG and Event Co-ordinator are some of them.

Where does a UNIX script of DataStage execute - on the client machine or on the server?
DataStage jobs are executed on the server machines only.

How can we pass parameters to a job by using a file?
You can do this by reading the parameters from a UNIX file and then calling the execution of the DataStage job from UNIX; the DS job has the parameters defined, and they are passed in by the UNIX script.

What are the default nodes for DataStage parallel edition?
The number of nodes depends on the number of processors in your system.

What are the repository tables in DataStage and what are they?
A data warehouse is a repository (centralized as well as distributed) of data. Metadata is data about data: it includes things like the name, length, valid values and description of a data element. Examples of metadata include data element descriptions, data type descriptions, attribute/property descriptions, range/domain descriptions and process/method descriptions. Metadata is stored in a data dictionary and repository; it insulates the data warehouse from changes in the schema of operational systems. The repository environment encompasses all corporate metadata resources: database catalogs, data dictionaries and navigation services.

What is 'insert for update' in DataStage?
There is a "lock for update" option in the Hashed File stage, which locks the hashed file for updating when the search key of the lookup is not found.

In DataStage, what are the I/O and Transfer pages?
Under the interface tab there are input, output and transfer pages. You will have 4 tabs, and the last one is Build; under that you can find the TABLE NAME.

What is version control?
Version Control is used to i) store/view different versions of DS jobs, ii) run different versions of the same job, and iii) revert to a previous version of a job.

I want to process 3 files sequentially, one by one. How can I do that, so that while processing, the files are fetched automatically?
If the metadata for all the files is the same, then create a job having the file name as a parameter, then use the same job in a routine and call the job with the different file names. Or you can create a sequence to call the job.

What happens if the output of a hashed file is connected to a transformer? What error does it throw?

121. How can you do incremental load in DataStage? Incremental load means daily load. You can implement SCD in server jobs by using Lookup functionality. 119. If we want to join these two tables we are having DeptNo as a common Key so we can give that column name as key and sort Deptno in ascending/descending order and can join those two tables. How can I connect my DB2 database on AS400 to DataStage? Do I need to use ODBC 1st to open the database connectivity and then use an adapter for just connecting between the two? You need to configure the ODBC connectivity for any database in the DataStage. This will not return any error code. What is NLS in DataStage? How we use NLS in DataStage? What advantages in that? At the time of installation I am not chosen that NLS option. If data is partitioned in your job on key 1 and you aggregate on key 2. What is Merging and how it can be done? Merge stage is used to join two tables. what issues could arise? Data will partition on both the keys and it will take some more time to execute. 120. then this will be treated as a primary link itself. Dept. What is the mean of Try to have the constraints in the 'Selection' criteria of the jobs itself? This will eliminate the unnecessary records even getting in before joins are made? This means try to improve the performance by avoiding use of constraints wherever possible and instead using them while selecting the data itself using a where clause. 125. now I want to use that options what can I do? To reinstall that DataStage or first uninstall and install once again? . Then you will be able to access the XLS file from DataStage. Whenever you are selecting data from source. 124. It’s a kind of lowest level of Oracle being used for loading the data. This improves performance. the hash file will be consider as a Lookup file. If there is no primary link to the same Transformer stage. Let us consider two tables Emp. 122.Page 21 of 57 thoughts? If Hash file’s output is connected to a transformer stage. What is the OCI? And how to use the ETL Tools? OCI uses the "Oracle Call Interface" of the oracle to load the data. Select the records which are loaded or Updated between the time stamp of the last successful load and today’s load starts Date and Time. How can ETL excel file to Data mart? Open the ODBC Data Source Administrator found in the control panel/administrative Tools. It takes the Key columns and sorts them in ascending/descending order. 123. Store the last run’s date and time in a file and read the parameter through job parameters and state second argument as current date and time. 126. Under the system DSN tab. For this you have to pass the parameters for those two dates. add the Driver to Microsoft Excel.

Once you install DS with NLS you get NLS support. NLS (National Language Support) is basically a locale/language setting (character set). Just log into Administrator and you can set the NLS of your project based on the requirement (you need to map the NLS with your project). Suppose you know you have a file with some Greek characters: if you set the NLS for Greek, then while running the job DataStage will recognize those special characters. By using NLS we can:
- Process the data in a wide range of languages.
- Use local formats for dates, times and money.
- Sort the data according to the local rules.
If NLS is installed, various extra features appear in the product. For server jobs, NLS is implemented in the DataStage server engine; for parallel jobs, NLS is implemented using the ICU library.

What is the difference between server jobs and parallel jobs?
Server jobs: these are available if you have installed DataStage Server Edition. They run on the DataStage server, connecting to other data sources as necessary. Server jobs can be run on SMP, MPP or cluster systems, but performance is lower (in terms of speed). Server jobs generate DataStage BASIC; in server (and mainframe) jobs you tend to do most of the work in/using the Transformer stage. If you have SMP machines you can use the IPC stage, Link Partitioner and Link Collector for performance tuning.
Parallel jobs: these are only available if you have installed Enterprise Edition. They run on DataStage servers that are SMP, MPP or cluster systems, and can also run on a separate z/OS (USS) machine if required. Performance is high (in terms of speed). Parallel jobs generate Orchestrate shell script (OSH), and mainframe jobs generate COBOL and JCL. In parallel jobs you tend to use specific stage types for specific tasks (and the Transformer stage doesn't do lookups). If you have cluster or MPP machines you can use parallel jobs.

How is DataStage 4.0 functionally different from the enterprise edition now? What are the exact changes?
There are lots of changes in DS EE. There are many extra stages, like the automatic partitioning and collection of data in the parallel environment, the CDC stage, the Stored Procedure stage, etc.

What is APT_CONFIG in DataStage?
DataStage understands the architecture of the system through the configuration file (APT_CONFIG_FILE). This file contains information about node names, disk storage, etc. APT_CONFIG is just an environment variable used to identify the *.apt file that holds the node information and the configuration of the SMP/MPP server; don't confuse the variable with the *.apt file itself (a minimal example configuration file is sketched after this block of questions).

What is the meaning of the following? 1) If an input file has an excessive number of rows and can be split up, then use standard logic to run jobs in parallel. 2) Tuning should occur on a job-by-job basis; use the power of the DBMS.
The point is about tuning the performance of the job: by using the power of the database - analyzing, creating indexes, creating partitions - one can improve the performance of the SQL used in the jobs.
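As a minimal sketch of forcing a parallel job to run on a single node, as described above - the file path, fastname and resource directories below are illustrative placeholders only:

    cat > /tmp/one_node.apt <<'EOF'
    {
      node "node1" {
        fastname "myserver"
        pools ""
        resource disk "/data/ds/disk1" {pools ""}
        resource scratchdisk "/data/ds/scratch1" {pools ""}
      }
    }
    EOF
    export APT_CONFIG_FILE=/tmp/one_node.apt    # jobs run after this use only one node

In a real project the configuration files normally live under the DataStage install (e.g. a default.apt with several nodes), and you simply point APT_CONFIG_FILE, or the job-level $APT_CONFIG_FILE parameter, at the one you want.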

How to implement routines in DataStage?
There are three kinds of routines in DataStage:
1. Server routines, used in server jobs, are written in the BASIC language.
2. Parallel routines, used in parallel jobs, are written in C/C++.
3. Mainframe routines, used in mainframe jobs, are written in COBOL/JCL.

What are DataStage multi-byte and single-byte file conversions in mainframe jobs? How do we use those conversions in DataStage? What is UTF-8 and what is it used for?

How can you implement complex jobs in DataStage?
Complex design means having more joins and lookups. We can easily implement any complex design in DataStage by following simple tips, in terms of increasing performance as well. There is no hard limit on the number of stages in a job, but for better performance:
1. Use at most about 20 stages in each job; if it exceeds that, go for another job.
2. Don't use more than 7 lookups in a Transformer stage; otherwise include one more Transformer stage.

What are the validations you perform after creating a job in Designer? What are the different types of errors you faced during loading and how did you solve them?
Check the parameters, check whether the input files/tables exist or not, and also check the usernames, passwords etc.

Can we use a shared container as a lookup in DataStage server jobs?
Yes, we can use a shared container as a lookup in server jobs. Wherever we use the same lookup in multiple places, we develop the lookup in a shared container and then use that shared container as the lookup.

Is it possible to run parallel jobs in server jobs?
No, you cannot run parallel jobs in the server edition. But you can run parallel jobs as server-like (sequential) jobs by changing the APT_CONFIG file and setting node = 1, or by setting a node map constraint with node1.

What is troubleshooting in server jobs? What are the different kinds of errors encountered while running any job?

Yes. 139. how many processes does DataStage create? It depends on the number of active stages and how they are linked. There are some stages that are common to all types (such as aggregation) but they tend to have different fields and options within that stage. Designed mainly for Unix. we can join a flat file and database in an indirect way. Parallel jobs have parallel stages but also accept some server stages via a container. parallel jobs. Only active stages can create a process. we can implement SCD in DataStage. MVS jobs are jobs designed using an alternative set of stages that are generated into Cobol/JCL code and are transferred to a mainframe to be compiled and run. MVS jobs. If you’re running 4 ways parallel and you have 10 stages on the canvas. sequence jobs. Based on return value we will find out whether this is for insert. • DataStage Enterprise MVS: Server jobs. • DataStage Enterprise Edition was originally called Orchestrate. BCP is used to load bulk data into a single table for Microsoft SQL server and Sybase. First create a job which can populate the data from database into a Sequential file and name it as Seq_First. the total no. Step 1: Time stamp we are creating via shared container. How can you implement slowly changing dimensions in DataStage? Can u join flat file and database in DataStage? How? Yes.g. The change capture Stage will return a value for chage_code. Server jobs only accept server stages. By using the Changes capture stage we will find out the differences. What are OraBulk and BCP stages? ORABULK is used to load bulk data into single table of target oracle database. . The first two versions share the same Designer interface but have a different set of design stages depending on the type of job you are working on. For satisfying the lookup condition we are creating a key column by using the column generator. The enterprise edition offers parallel processing features for scalable high volume solutions. MVS jobs only accept MVS stages. Jobs are developed on a Unix or Windows server and then transferred to the mainframe to be compiled and run. 141. Left Outer Join and Right Outer Join.Page 24 of 57 138. if there are 6 active stages (like transforms) linked by some passive stages. It returns system time and One key. parallel jobs. Take the flat file which you are having and use a Merge Stage to join these two files. Linux and Unix System Services on mainframes. or update. it now supports Windows. Does Enterprise Edition only add the parallel processing for better performance? Are any stages/transformations available in the enterprise edition only? • DataStage Standard Edition was previously called as DataStage Server Edition. 140. You have various join types in Merge Stage like Pure Inner Join. For e. sequence jobs. Row Merger and Row splitter are only present in parallel Stage. of processes will be 6x4=24. Edit. and then renamed to Parallel Extender when purchased by Ascential. You can use any one of these which suits your requirements. Step 2: Our source is Data set and Lookup table is oracle OCI stage. Let me explain SCD Type 2 for Time stamp. if it is insert we will modify With current timestamp and the old time stamp will keep as history. • DataStage Enterprise: Server jobs.

142. What is the exact difference between the Join, Merge and Lookup stages?
The three stages differ mainly in the memory they use. DataStage doesn't know how large your data is, so it cannot make an informed choice between a Join, a Merge or a Lookup for combining data; you have to decide. If the reference datasets are big enough to cause trouble, use a Join. A Join does a high-speed sort on the driving and reference datasets; this can involve I/O if the data is big enough, but the I/O is all highly optimized and sequential, and once the sort is over the join processing is very fast and never involves paging or other I/O. A Lookup is used just for comparison purposes when the reference data is small. Unlike Join and Lookup stages, the Merge stage allows you to specify several reject links, as many as there are update input links. Data selection is very easy in a Join and more difficult in a Merge; a Join supports only two inputs (files or databases) whereas a Merge supports any number of inputs and outputs; a Join doesn't support reject links whereas a Merge does. So, performance-wise, a Join is generally better than a Merge.

143. What is a data set? And what is a file set?
Dataset: DataStage parallel extender jobs use data sets to manage data within a job; you can think of each link in a job as carrying a data set. The Data Set stage allows you to store the data being operated on in a persistent form, which can then be used by other DataStage jobs. You can use orchadmin dump <dataset>.ds to view a data set in UNIX. Fileset: a mechanism DataStage uses to manage a collection of sequential files; the file collection metadata is held in a fileset file with an .fs extension.

144. What is user variable activity? When and where is it used?
Using the User Variables activity we can create variables in a job sequence; these variables are then available to all the activities in that sequence. This activity is most commonly placed at the start of the job sequence.

145. How can I specify a filter command for processing data while defining sequential file output data?
When you open a sequential file to edit, the Sequential File stage has a Filter Command check box option (Edit | General tab). Select it, then go to the required page (Inputs/Outputs); on that page's General tab you will find Filter Command enabled. Enter or browse to the required filter command, such as gzip (compress) or gunzip (uncompress) in Unix. (Command-line equivalents of these filter commands, and of the orchadmin dump in question 143, are sketched after question 147 below.)

146. How do you extract data from more than one heterogeneous source, for example one sequential file, Sybase and Oracle, in a single job?
Convert all the heterogeneous sources into sequential files and join them using a Merge stage, or write a user-defined query in the source itself to join them.

147. Can you convert a snowflake schema into a star schema?
Yes, we can convert it by attaching one hierarchy to the lowest level of another hierarchy.
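The command-level details above can be tried directly from a Unix shell: the orchadmin dump from question 143 and the gzip/gunzip filter commands from question 145. The data set and file names are assumptions for illustration.

#!/bin/sh
# Inspect the records of a persistent data set from the command line.
orchadmin dump customers.ds | head -20

# The filter command in a Sequential File stage simply pipes the data through a
# Unix command; these are the equivalent stand-alone calls.
gzip -c extract.txt > extract.txt.gz      # compress on output
gunzip -c extract.txt.gz | head -5        # uncompress on input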

148. DataStage from Staging to MDW (Master Data Warehouse) is only running at 1 row per second! What do we do to remedy this?
In general, if you have too many stages (especially Transformers and hash lookups), there is a lot of overhead and performance degrades drastically. If too many lookups are being done, I would suggest writing a query instead of doing several lookups. If you don't want to write the query and prefer intermediate stages, a re-ordering of the stages might be needed for good performance. Other things that could be looked into: 1. For reference lookups, ensure that you have appropriate indexes on the tables being queried; for hash lookups, set the hashing size and buffer size to appropriate values so that as much as possible is done in memory, with no I/O overhead to disk. 2. Enable row buffering and set an appropriate size for it. 3. Ensure proper elimination of data between stages so that data volume does not cause overhead; it is important to use appropriate objects between stages for performance. You can also re-partition the data whichever way you want.

149. There are three different types of user-created stages available for PX. What are they? Which would you use? What are the disadvantages of each type?
The three types are: i) Custom, ii) Build, iii) Wrapped.

150. How does the hash file do a lookup in server jobs? How does it compare the key values?
A hashed file is used for two purposes: 1. to remove duplicate records, and 2. for lookups. The hashed file consists of three parts: each record has a hash key, a key header and a data portion. By using the hashing algorithm and the key value, the lookup is faster.

151. What is the difference between DataStage and Informatica?
Informatica has more developer-friendly features, but when it comes to scalability and performance for massive transaction volumes it is much inferior to DataStage. A few areas where Informatica is inferior: 1. Parallelism – Informatica does not support full pipeline parallelism. 2. Partitioning – DataStage PX provides many more robust partitioning options than Informatica. 3. File lookup – Informatica supports flat-file lookup, but the caching is poor; DataStage supports hash files, lookup file sets and datasets for much more efficient lookups. 4. Merge/Funnel – DataStage has very rich functionality for merging or funnelling streams; in Informatica the only way is a Union, which is always a union-all.

152. What is the difference between build-ops and subroutines?

Build-ops generate C++ code (an object-oriented concept), whereas a subroutine is a normal program that you can call anywhere in your project.

153. Is it possible to call one job in another job in server jobs?
In a job's job control you can attach zero or more jobs. Strictly speaking we don't "call" a job; we attach/add the other job through Job Properties. Steps: Edit --> Job Properties --> Job Control, click Add Job and select the desired job.

154. How much would be the size of the database in DataStage? What is the difference between in-process and inter-process?
The database size varies and depends on the project. In-process means the server transfers only one row at a time to the target; inter-process means the server sends a group of rows to the target table. Both settings are available on the Tunables tab of the Administrator client component.

155. What is a hashing algorithm, and briefly, how does it work?
Hashing is key-to-address translation: the value of a key is transformed into a disk address by means of an algorithm, usually a relative block and an anchor point within the block. These algorithms are usually quite simple and use division and remainder techniques. How well they work is closely related to statistical probability; interestingly, these approaches are called "Monte Carlo techniques" because the behaviour of the hashing or randomizing algorithms can be simulated by a roulette wheel, where the slots represent the blocks and the balls represent the records.

156. Is it possible for two users to access the same job at the same time in DataStage?
No, it is not possible. DataStage will produce the error "Job is accessed by other user".

157. Is it possible to move data from an Oracle warehouse to an SAP warehouse using DataStage?
Yes. We can use the DataStage Extract Pack for SAP R/3 and the DataStage Load Pack for SAP BW to transfer the data from Oracle to the SAP warehouse. These plug-in packs are available with DataStage version 7.5.

158. If I add a new environment variable in Windows, how can I access it in DataStage?
You can view all the environment variables in Designer and can add and access them from Job Properties, for example $APT_CONFIG_FILE. Go to job properties and click on "add environment variable" to see more of the environment variables.

159. What are the differences between DataStage 7.0 and 7.5 in server jobs?
There are lots of new stages available in DS 7.5, e.g. the CDC Stage and the Stored Procedure Stage. In server jobs we have the Stored Procedure stage, the Command stage and a generate-report option in the File tab; in parallel jobs the Surrogate Key Generator and Stored Procedure stages were introduced; and in job sequences, stages such as the Start Loop, End Loop, Terminate Loop and User Variables activities were introduced.

160. What is the purpose of the exception activity in DataStage 7.5?
The stages following an exception activity are executed whenever an unknown error occurs while running the job sequencer.

161. What are the Profile Stage, Quality Stage and Audit Stage in DataStage?
The Profile stage is used to analyze the data and its relationships. The Quality stage is used to check the quality of the data. Audit Stage: ----

162. What is the difference between the DRS and ODBC stages?
The DRS (Dynamic Relational Stage) should be faster than the ODBC stage because it uses native database connectivity; you need to install and configure the required database clients on your DataStage server for it to work. ODBC uses the ODBC driver for a particular database. DRS is a stage that tries to make it seamless to switch from one database to another; it was leveraged for PeopleSoft so that a job could run on any of the supported databases. It uses native connectivity for the chosen target and supports ODBC connections too. We use the DRS stage in parallel jobs.

163. What is the DataStage engine? What is its purpose?
The DataStage server contains the DataStage engine; the DS server interacts with the client components and the Repository. Only when the engine is running can we develop jobs.

164. What are the environment variables in DataStage?
An environment variable is a predefined variable that can be used while creating a DS job. We create/declare these variables in Administrator and set their values while designing the job, in the job properties (Parameters tab). For example, for a database username/password we declare the variables in Admin and set the values, say scott and tiger, in the Parameters tab. Once we set a specific variable it is available to the project/job. (Passing such values from the command line is sketched after question 165 below.)

165. What are the steps involved in the development of a job in DataStage?
1. Select the data source stage depending on the source, e.g. flat file, database, XML. 2. Select the required stages for the transformation logic, such as Transformer, Link Collector, Link Partitioner, Aggregator, Merge, etc. 3. Select the final target stage where you want to load the data, whether a data warehouse, data mart, ODS or staging area.
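As referenced in question 164, parameters and environment variables can also be supplied when a job is started from the command line with the dsjob client. A minimal sketch; the project, job and parameter names are illustrative assumptions, not values from this document.

#!/bin/sh
# Start a job and pass values for two job parameters defined in the job design.
dsjob -run -param DB_USER=scott -param DB_PWD=tiger MyProject LoadCustomerDim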

166. What is a project? Specify its various components.
You always enter DataStage through a DataStage project; when you start a DataStage client you are prompted to connect to a project. Each project contains: 1. DataStage jobs. 2. Built-in components – predefined components used in a job. 3. User-defined components – customized components created using the DataStage Manager or DataStage Designer.

167. What is the purpose of using keys, and what is the difference between a surrogate key and a natural key?
We use keys to provide relationships between the entities (tables); by using primary and foreign key relationships we can maintain integrity between the data. The natural key is the one coming from the OLTP system, whereas the surrogate key is the artificial key which we create in the target data warehouse. We can use surrogate keys instead of the natural key; in SCD2 scenarios surrogate keys play a major role.

168. How do you do usage analysis in DataStage?
1. If you want to know whether a job is part of a sequence, right-click on the job in the Manager and select Usage Analysis; it will show all the job's dependents. 2. You can likewise find how many jobs are using a particular table, or 3. a particular routine. In this way you can find all the dependents of a particular object, and you can move forward and backward through them (it is like a nested view).

169. What is the difference between "validated OK" and "compiled" in DataStage?
When you compile a job, DataStage ensures that basic things such as all the important stage parameters and the mappings are correct, and then it creates an executable job. You validate a compiled job to make sure that all the connections are valid. When we say "validating a job", we are talking about running the job in "check only" mode. The following checks are made: connections are made to the data sources or data warehouse; SQL SELECT statements are prepared; files are opened; intermediate files in Hashed File, UniVerse or ODBC stages that use the local data source are created, if they do not already exist. It is like a dry run where you don't actually touch the live data, but you become confident that things will work: all the job parameters are set and a valid output can be expected after running the job. (A command-line validation run is sketched after question 171 below.)

170. What is the meaning of file extender in DataStage server jobs? Can we run a DataStage job from another job?
File extender means adding columns or records to the existing file. Yes, we can run one DataStage job from another job in DataStage.

171. Where are flat files actually stored? What is the path?
Flat files store the data, and the path can be given in the General tab of the Sequential File stage. Normally flat files are stored on FTP servers or in local folders. .CSV, .EXL and .TXT are the file formats available for flat files.
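As referenced in question 169, the "check only" validation run can also be requested from the command line instead of from Director. A minimal sketch, assuming a project and job name for illustration.

#!/bin/sh
# Run the job in validate mode (check-only), then query its status.
dsjob -run -mode VALIDATE MyProject LoadCustomerDim
dsjob -jobinfo MyProject LoadCustomerDim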

172. How do you implement a Type 2 slowly changing dimension in DataStage? Give an example.
Slowly changing dimensions are a common problem in data warehousing. For example, there is a customer called Lisa at company ABC who lives in New York; later she moves to Florida, and the company must modify her address. In general there are three ways to solve this problem. Type 1: the new record replaces the original record; no trace of the old record remains. Type 2: a new record is added to the customer dimension table. Type 3: the original record is modified to reflect the change. In Type 1 the new value overwrites the existing one, so no history is maintained; it is simple to use, but the history of where the person stayed last is lost. In Type 2 a new record is added, so both the original and the new record are present; the new record gets its own primary key and the customer is treated essentially as two different people. The advantage is that historical information is maintained, but the size of the dimension table grows and storage and performance can become a concern, so Type 2 should only be used if it is necessary for the data warehouse to track the historical changes. In Type 3 there are two columns, one to indicate the original value and one to indicate the current value: a new column is added which shows the original address as New York and the current address as Florida. This helps keep some part of the history without increasing the table size, but when the customer moves from Florida to Texas the New York information is lost, so Type 3 should only be used if the changes will occur only a finite number of times. (A SQL sketch of the Type 2 approach is given after question 174 below.)

173. How can we call a routine in a DataStage job? Explain the steps.
Routines are used for implementing business logic. There are two types: 1) before-subroutines and 2) after-subroutines. Steps: double-click the Transformer stage, right-click on any one of the mapping fields and select the [dsroutines] option; within the edit window give the business logic and select either of the options (before / after subroutine).

174. What is the difference between the Merge stage and the Join stage?
Other than full outer join (where more than two links are not supported), all the other join types support joining more than two inputs. 1) Number of reject links: a Join does not support reject links; a Merge has as many reject links as there are update input links (if there are n input links, one is the master link and n-1 are update links). 2) Data selection: in a Join there are various ways in which data is selected, i.e. you have different selection criteria for dropping or selecting a row, for example the different join types such as inner, outer (left, right, full) and cross join. In a Merge, data in the master record and the update records are merged only when both have the same value for the merge key columns.
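As promised in question 172, here is a hedged SQL sketch of the classic Type 2 pattern: expire the current row, then insert a new row with a fresh surrogate key. The table, column and sequence names and the credentials are assumptions for illustration; in a job this logic is normally spread across the lookup/change-capture stages described above.

#!/bin/sh
# Type 2 update applied directly in Oracle via SQL*Plus (credentials are placeholders).
sqlplus -s scott/tiger <<'EOF'
-- Expire the existing current row for the changed customer.
UPDATE customer_dim
   SET end_date = SYSDATE, current_flag = 'N'
 WHERE customer_nk = 1001 AND current_flag = 'Y';

-- Insert the new version with a fresh surrogate key and an open end date.
INSERT INTO customer_dim
  (customer_sk, customer_nk, address, start_date, end_date, current_flag)
VALUES
  (customer_sk_seq.NEXTVAL, 1001, 'Florida', SYSDATE, NULL, 'Y');

COMMIT;
EXIT;
EOF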

175. How can we create read-only jobs in DataStage?
A job can be made read-only by the following process: export the job in .dsx format and change the attribute that stores the read-only information from 0 (0 refers to an editable job) to 1 (1 refers to a read-only job). Then import the job again and either overwrite or rename the existing job so that you have both forms.

176. How do you parameterize a field in a sequential file? I am using DataStage as the ETL tool, with a sequential file as source.
We cannot parameterize a particular field in a sequential file; instead we can parameterize the source file name, e.g. #FILENAME#.

177. What is the difference between a sequential file and a data set? When is the Copy stage used?
Sequential file: acts as a source and as permanent storage for a target; its extension is .txt; a data set supports parallelism but a sequential file doesn't. Dataset (.ds): acts as a temporary storage stage; while using this stage the input data is partitioned and converted into the internal data set format; it is mainly used just before the target stage, which then makes it easy to load the data into the target. Copy: acts as a placeholder; it has a single input and many outputs. If you later want to add a new stage to your job it is then very easy; otherwise you would have to modify the whole job.

178. What is job control? How is it developed? Explain with steps.
Job control means controlling DataStage jobs through some other DataStage job. For example, consider two jobs XXX and YYY: job YYY can be executed from job XXX by using DataStage macros in Routines. To execute one job from another, the following steps are followed in the routine: 1. Attach the job using the DSAttachJob function. 2. Run the other job using the DSRunJob function. 3. Stop the job using the DSStopJob function. (A command-line equivalent is sketched after question 181 below.) JCL (Job Control Language) is used to run a larger number of jobs at a time, with or without loops. Steps: click Edit in the menu bar, select Job Properties and enter the parameters (parameter/prompt/type entries such as STEP_ID string, SRC string, DSN string, Username unm string, Password pwd string); after editing these, use the JCL button, select the jobs from the list box and run the job.

179. What is the meaning of an instance in DataStage? Explain with examples.

180. How do you drop an index before loading data into the target, and how do you rebuild it, in DataStage?
This can be achieved by the "Direct Load" option of the SQL*Loader utility.

181. How do you kill a job in DataStage?
You should use kill -14 jobid. Sometimes using kill -9 jobid leaves things in a bad state.
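As referenced in question 178, the same attach/run/wait pattern can be driven from the operating system with the dsjob client, which is often easier to script than BASIC job control. The project and job names are illustrative assumptions.

#!/bin/sh
# Run a controlled job, wait for it to finish and report its exit status.
dsjob -run -jobstatus MyProject JobYYY
echo "JobYYY finished with status $?"

# A hung job can be asked to stop cleanly before resorting to kill at OS level.
dsjob -stop MyProject JobYYY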

182. Can both the source systems and the target data warehouse (Oracle, SQL Server, etc.) be on a Windows environment, or should one of the systems be on UNIX/Linux?
In the Server Edition you can have both on Windows, but in PX the target (engine) should be on UNIX.

183. What do you mean by active stages and passive stages in the DataStage environment?
An active stage is the "T" of ETL, and a passive stage is the "E" and "L" of ETL.

184. Give one real-time situation where the Link Partitioner stage is used.
If we want to send data from the source to the targets more quickly, we use the Link Partitioner stage in server jobs; we can make a maximum of 64 partitions, and it is an active stage. The data sent from the Link Partitioner is collected by the Link Collector (also an active stage) at a maximum of 64 partitions. Although two active stages normally cannot be connected directly, this stage is allowed to connect to the Transformer or Aggregator stage; to avoid connecting an active stage such as the Transformer straight to the Link Collector, we use the Inter Process Communication (IPC) stage, and IPC can be used only when the target is a passive stage.

185. What is the difference between static hash files and dynamic hash files?
A static hash file cannot change its number of groups (modulus) except through manual resizing, whereas a dynamic hash file automatically changes its number of groups in response to the amount of data stored in the file.

186. What does the separation option in a static hash file mean?
The different hashing algorithms are designed to distribute records evenly among the groups of the file based on characters and their position in the record ids. When a hash file is created, Separation and Modulo respectively specify the group buffer size and the number of buffers allocated for the file. When a static hash file is created, DataStage creates a file that contains the number of groups specified by the modulo. Size of hash file = modulus (number of groups) x separation (buffer size).

187. How do you check the consistency and integrity of the model and the repository?

188. How do you clean the DataStage repository?
Remove the log files periodically.

189. What is Runtime Column Propagation and how is it used?
If your job has more columns than are defined in the metadata, and runtime column propagation is enabled, those extra columns are propagated to the rest of the job.

190. How can I schedule the cleaning of the &PH& file by dsjob?
Create a job with a dummy Transformer and Sequential File stage. In the Before-job subroutine, use ExecTCL to execute the following command: CLEAR.FILE &PH&. (An operating-system level alternative is sketched after question 193 below.)

191. What is the difference between symmetric multiprocessing and massively parallel processing?
Symmetric Multi Processing (SMP): some hardware resources may be shared by the processors; the processors communicate via shared memory and run a single operating system, so this is also called "tightly coupled multiprocessing". A single copy of the operating system is in charge of all the processors, and an SMP configuration doesn't usually exceed 16 processors. One main advantage of SMP is its ability to dynamically balance the workload among the processors (and as a result serve more users at a faster rate). SMP is better than MPP when online transaction processing is being done, in which many users access the same database with a relatively simple set of common transactions.
Cluster or Massively Parallel Processing (MPP): known as "shared nothing", in which each processor has exclusive access to its hardware resources. MPP is the processing of programs by multiple processors that work on different parts of the program and have their own operating systems and memories; the processors communicate via a high-speed network through message interfaces, and an interconnect arrangement of data paths allows messages to be sent between the processors running a single application or product. An MPP system can also be called a "loosely coupled" system, and cluster systems can be physically dispersed. There are cases in which up to 200 processors run for a single application. The setup for MPP is more complicated than for SMP, and an experienced thought process and good in-depth knowledge are needed to partition the database among the processors and to assign the work to them. An MPP is considered better than an SMP for applications that allow a number of databases to be searched in parallel.

192. How do I create a DataStage engine stop/start script?
A cleaned-up outline of the script (run it as the dsadm user; the install path varies by site):
#!/bin/bash
# Stop and restart the DataStage engine; run as dsadm (or su - dsadm).
DSHOME=/Ascential/DataStage/DSEngine
cd $DSHOME
# Check for client connections before stopping; kill -9 <pid> any that remain.
ps -ef | grep DataStage | grep -v grep
bin/uv -admin -stop  > /dev/null
bin/uv -admin -start > /dev/null
# Verify the engine processes are back and a client can connect.
echo "Started properly"

193. How do you find the process id? Explain the steps.
In UNIX you can find it by using the ps -ef command, which displays all the processes currently running on the system along with their process ids.
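For the &PH& housekeeping in question 190, many sites also simply age out the phantom log files from cron instead of clearing the file from a job. A minimal sketch, assuming a project directory path for illustration (&PH& lives under each project directory).

#!/bin/sh
# Delete phantom log entries older than 7 days for one project (path is an assumption).
find "/Ascential/DataStage/Projects/MyProject/&PH&" -type f -mtime +7 -exec rm -f {} \;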

194. What is the difference between the OCI stage and the ODBC stage?
Oracle OCI: we can write the source query in this stage but we can't write a lookup query in it; instead we use a Hashed File stage for the lookup. We have the facility to write multiple statements to run before (Oracle OCI/Output/SQL/Before) or after (Oracle OCI/Output/SQL/After) executing the actual query. We don't have a multi-row lookup facility in this stage.
ODBC: we can write both the source query and the lookup query in the stage itself. We don't have the facility to write multiple before/after statements, but we do have the multi-row lookup facility.

195. What are the transaction size and array size in the OCI stage?
Transaction size: the transaction size for new jobs is now handled by "Rows per transaction" on the Transaction Handling tab of the Input page. Rows per transaction: the number of rows written before a commit is executed for the transaction; the default value is 0, that is, all the rows are committed after being written to the data table. Array size: the number of rows written to or read from the database at a time; the default value is 1, that is, each row is written in a separate statement.

196. How do you improve the performance of a hash file?
You can improve the performance of a hashed file by: 1. Preloading the hash file into memory: enable the preload option in the hash file output stage. 2. Write caching options: data is written into cache before being flushed to disk; enabling this ensures that hash files are written onto cache in order before being flushed to disk, instead of in the order in which individual rows are written. 3. Pre-allocating: estimating the approximate size of the hash file so that the file does not need to be split too often after write operations.

197. What is the difference between ETL and ELT?
ETL usually scrubs the data and then loads it into the data mart or data warehouse, whereas ELT loads the data first and then uses the RDBMS to scrub it and reload it into the data mart or DWH. ETL = Extract >>> Transform >>> Load; ELT = Extract >>> Load >>> Transform.

198. How do you find the number of rows in a sequential file?
Using the Row Count system variable.

199. For what purpose are .dsx files used in DataStage?
.dsx is the standard file extension for all the various DataStage jobs when they are exported: whenever we export a job or a sequence, the file is exported in .dsx format. A standard usage is that we develop the job in our test environment and, after testing, export the file and save it as, say, x.dsx. This can be done using DataStage Manager.

200. How do you write and execute routines for PX jobs in C++?
You define and store the routines in the DataStage repository (for example in the Routines folder), and these routines are compiled and executed with a C++ compiler.

201. How can we generate a surrogate key in server/parallel jobs?
In parallel jobs we can use the Surrogate Key Generator stage. In server jobs we can use an inbuilt routine called KeyMgtGetNextValue. You can also generate the surrogate key in the database using a sequence generator. (A sketch of the database approach is given after question 207 below.)

202. What is the difference between DataStage and DataStage scripting?

203. What are the new features of DataStage 7.1 compared with DataStage 6.1?

204. If we have two sources with the same metadata, how do we check whether the data in the two sources is the same or not? If the data is not the same, I want to abort the job. How can we do this?
Use a Change Capture stage and output it into a Transformer. Write a routine to abort the job, initiated at the function @INROWNUM = 1. If the data does not match, it is passed into the Transformer and the job is aborted.

205. What is the difference between a reference link and a straight link?
A straight link is one where data is passed directly to the next stage; a reference link shows that the stage has a reference (a reference key) to the main table.

206. What are the disadvantages of a staging area?
The disadvantage of a staging area is disk space, as we have to dump the data into a local area.

207. How do you call procedures in DataStage?
Use the Stored Procedure stage.
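As referenced in question 201, the database option usually means an Oracle sequence whose NEXTVAL supplies the surrogate key during the dimension load. A minimal sketch via SQL*Plus; the sequence, table and credentials are assumptions for illustration.

#!/bin/sh
sqlplus -s scott/tiger <<'EOF'
-- One-time setup: a sequence dedicated to the dimension's surrogate key.
CREATE SEQUENCE customer_sk_seq START WITH 1 INCREMENT BY 1 CACHE 100;

-- At load time: NEXTVAL hands out the next surrogate key value.
INSERT INTO customer_dim (customer_sk, customer_nk, address)
VALUES (customer_sk_seq.NEXTVAL, 1001, 'New York');

COMMIT;
EXIT;
EOF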

208. 1. What are the various processes that start when the DataStage engine starts? 2. Is the DataStage engine responsible for compilation, execution, or both? 3. What changes need to be made on the database side if I have to use the DB2 stage?
Three processes start when the DataStage engine starts: 1. DSRPC, 2. DataStage Engine Resources, 3. DataStage telnet services.

209. Why is a hash file faster than a sequential file and the ODBC stage?
Hash files are stored on the local server and cached.

210. Where can you output data using the Peek stage?
The output of the Peek stage can be viewed in the DataStage Director log and can also be saved as a text file.

211. What is Ad-Hoc access? What is the difference between Managed Query and Ad-Hoc access?

212. What is the architecture of DataStage?

Some further questions that are commonly asked:
1. Difference between hash file and sequential file? What is modulus?
2. What are the iconv and oconv functions?
3. How can we join one Oracle source and a sequential file?
4. How can we implement slowly changing dimensions in DataStage?
5. How can we implement a lookup in DataStage server jobs?
6. What are the third-party tools used in DataStage?
7. What is the difference between a routine, a transform and a function?
8. What are job parameters?
9. What is a plug-in?
10. How can we improve the performance of DataStage jobs?
11. How can we create containers?
12. What about system variables?
13. What is the use of usage analysis?
14. Different ways a project can be moved to production? Ans: export-import and version control.
15. How do we use the DataStage Director and its run-time engine to schedule running the solution, test and debug its components, and monitor the resulting executable versions on an ad hoc or scheduled basis?
16. What database does the DataStage repository use? Answer: the Universe database.
17. How is scheduling done in the project?
18. Which version of DataStage is used in the project?
19. What performance tuning is required while dealing with large data?
20. What does the reject option in the Transformer do?

21. How do you define and use job parameters?
22. What are stage variables, system variables and environment variables?
23. How do you use routines in DataStage?
24. What is the difference between a shared container and a local container?
25. How do you connect to Oracle?
26. Please explain any ETL process that you have developed.
27. What is a hash file? What are the types of hash files?
28. If you make changes in a shared container, will they be reflected in all the jobs wherever you used that shared container?
29. Have you written any custom routines in your project? If so, explain.
30. How do you get log information into a file?
31. What are before-job and after-job subroutines? When do you use them?
32. How do you back up and restore the project?
33. What is Clear Status File and when do you use it?
34. What is Cleanup Resources and when do you use it?
35. Can I join a flat file and Oracle and load into Oracle? Is this possible?
36. While loading data into the target there is suddenly a problem and the process stops; how can you start loading from the records that were left?
37. What are the general problems that you face in DataStage?
38. What are the various reports that can be generated using DataStage?
39. How do you remove blank spaces from data?
40. What are active and passive stages?
41. What stages have you used in your project?
42. Could DataStage generate test cases?
43. What is the difference between a hash file and a sequential file?
44. What is the difference between a Transform and a routine?
45. What is a sequencer?

These are some DataStage PX job questions which can be asked in interviews:
1) What are the types of parallel processing?
2) What are SMP (Symmetric Multi Processing) and MPP (Massively Parallel Processing)?
3) What are CPU-limited, memory-limited and disk I/O-limited jobs?
4) Can one combine pipeline and partition parallelism?
5) What are the advantages of PX over server jobs?
6) Is it possible to create a user-defined stage in PX?
7) Can I use a hash file in PX?
8) What is the Surrogate Key stage?
9) What is the use of APT_DUMP_SCORE? Ans: to get messages in the logs such as the number of processes, number of nodes used and CPU time, and to get a breakdown of the resource usage.
10) What are the four types of joins possible in the Joiner stage? Ans: inner, left outer, right outer, full outer.
11) What are the components of APT_CONFIG_FILE? Ans: nodes, fast name, pools, resources.
12) What points need to be considered while creating the config file? Ans: the available nodes, the available memory, what other processes will execute on the same nodes, and whether there are any configuration restrictions (e.g. the DB only runs on certain nodes and ETL cannot run on them); also whether the hardware configuration is SMP, cluster or MPP.
13) When are wrappers created? Ans: only for executable UNIX or DOS commands.
14) When are build-ops created? Ans: when more functionality or complex logic is needed.
15) When are custom stages created? Ans: when new operators are needed which are not in EE.

16) What are the different job sequencer stages?
17) What are the iconv and oconv functions?
18) Can we implement slowly changing dimensions in DataStage? What are job parameters?
19) What does the reject option in the Transformer do?
20) How is parallelism executed?
21) What is RCP?
22) What is the difference between the Join, Merge and Lookup stages?
23) What is a dataset?
24) What are the differences between a dataset, a fileset and a lookup file set?

Questions on data warehousing concepts:
1. What is a data warehouse?
2. What is the difference between a data warehouse and a data mart?
3. What is a star schema?
4. What is a snowflake schema?
5. What are facts and dimensions?
6. What is a surrogate key?
7. What is normalization? Explain third normal form.
8. What is the difference between OLTP and OLAP?
9. Are you involved in data modeling? If yes, which tool/technique are you using?
10. Which schema modeling techniques have you used?
11. What do you mean by a summary table?
12. What are degenerated dimensions?

Oracle questions based on data warehousing:
1. What is parallel execution?
2. What are bitmap and B-tree indexes? Explain local vs. global variables.
3. What is a materialized view?
4. What is page size/array size in Oracle?
5. What are integrity constraints?
6. How can one tune SQL in Oracle?

214. How can we test jobs in DataStage?
Testing of jobs can be performed at many different levels: unit testing, SIT (System Integration Testing) and UAT (User Acceptance Testing) phases. Testing basically involves functionality and performance tests. First, data for the job needs to be created to test the functionality; by changing the data we see whether the requirements are met by the existing code. Each iteration of code change should be accompanied by a testing iteration. Performance tests basically involve load tests to see how well the existing code performs in a finite period of time; performance tuning can be performed on the SQL, the job design or the BASIC/OSH (Orchestrate Shell) code for faster processing times. In addition, all job designs should include error correction and fail-over support so that the code is robust.

215. What is a factless fact?
A fact table that contains no measures or facts is called a "factless fact table" or "junction table".

216. What is the Pivot stage? Why is it used? For what purpose will that stage be used?
The Pivot stage is used to turn horizontal rows into vertical ones. For example:

Source Table
ProdID   Q1_Sales   Q2_Sales
1010     123450     234550

Target Table
ProdID   Quarter_Sales   Quarter
1010     123450          Q1
1010     234550          Q2

217. What is an environment variable?
An environment variable is a predefined variable that we can use while creating a DS job. We can set it either at project level or at job level; once we set a specific variable it is available to the project/job. For that we can go to DS Administrator. We can also define new environment variables directly in Designer: View -- Job Properties -- Parameters -- Add Environment Variable -- under User Defined -- then add.

218. What is a routine?
Routines are stored in the Routines branch of the DataStage Repository, where you can create, view or edit them. The different types of routines are: 1) transform functions, 2) before/after job subroutines, 3) job control routines.

219. How do you read data from Excel files? Explain the steps.
1) Save the file in .csv (comma-separated values) format. 2) Use a flat file stage in the DataStage job canvas. 3) Double-click on the flat file stage and assign the .csv file you saved as the input file. 4) Import the metadata for the file (once you have imported or typed the metadata, click View Data to check the data values). Then do the rest of the transformation as needed.

220. What is the use of a hash file? Why can't we use a sequential file?
The primary use of the hash file is to do a lookup. You can use a sequential file for a lookup, but you would need to write your own routine to match the columns, and the coding time and execution time would be more expensive. When you generate a hash file, the file indexes the key with an inbuilt hashing algorithm, so when a lookup is made it is much faster; it also eliminates duplicate rows.

221. What is the meaning of performance tuning technique? Give an example.
Performance tuning means taking action to increase the performance of a slowly running job, for example: 1) using the Link Partitioner and Link Collector to speed up processing; 2) using sorted data for aggregation; 3) using a sorter at the source side and aggregation at the target side; 4) tuning the OCI stage "Array Size" and "Rows per Transaction" numerical values for faster selects, inserts and updates; 5) not using an IPC stage at the target side.

222. What are the main differences between a server job and a parallel job in DataStage?
There are a lot of differences in using the same stages in server and parallel jobs. In parallel, a sequential file (or any other file stage) can have either an input link or an output link, but in server it can have both (and more than one of each). Server jobs compile and run within the DataStage server, whereas parallel jobs compile and run on the DataStage UNIX server. A server job extracts all the rows from the source to the next stage, and only then does that stage become active and pass the rows on towards the target or DWH, which takes time. Parallel jobs provide two kinds of parallelism: 1. pipeline parallelism and 2. partition parallelism. With pipeline parallelism we extract some rows from the source and, at the same time, the next stage is already active and passing rows on towards the target or DWH; based on statistical performance, parallel jobs therefore run much faster.

223. Differentiate between pipeline and partition parallelism.
Pipeline: consider three CPUs connected in series; when data is fed into the first one it starts processing, and the data is simultaneously being transferred into the second CPU, and so on. You can compare this with three sections of pipe: as water enters the pipe it keeps moving on into all the sections. Pipeline parallelism maintains only one node between source and target. Partition: consider three CPUs connected in parallel and being fed with data at the same time, which reduces the load and improves efficiency. You can compare this with a single big pipe having three inbuilt pipes: as water is fed to them, they consume a large quantity in less time. Partition parallelism maintains more than one node between source and target.

224. What are integration and unit testing in DataStage?
Unit testing: in a DataStage scenario, unit testing is the technique of testing individual DataStage jobs for their functionality. Integration testing: when two or more jobs are tested collectively for their functionality.

225. How can we improve performance in the Aggregator stage?
We need to sort the data before sending it to the Aggregator. (An operating-system level illustration of pre-sorting is sketched after question 226 below.)

226. How do we create an index in DataStage?
CREATE INDEX emp_ename ON emp(ename)
  TABLESPACE users
  STORAGE (INITIAL 20K NEXT 20K PCTINCREASE 75);
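As referenced in question 225, the benefit of feeding sorted data to an aggregation can be seen even outside DataStage, since sorted input allows a single-pass, group-at-a-time aggregation. The file layout (pipe-delimited, grouping key in column 1, amount in column 2) is an assumption for illustration.

#!/bin/sh
# Pre-sort on the grouping key, then aggregate the sorted stream one group at a time.
sort -t '|' -k1,1 sales.txt |
awk -F'|' '
  NR > 1 && $1 != key { print key "|" total; total = 0 }   # key changed: emit finished group
  { key = $1; total += $2 }                                # accumulate current group
  END { if (NR > 0) print key "|" total }                  # emit the last group
'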

227. For what purpose is the stage variable mainly used?
A stage variable is a temporary in-memory storage variable in the Transformer stage. It can be used where you want to store a previous record's value in a variable, compare it with the current record's value and use the result in if-then-else conditional statements. If we are doing a calculation repeatedly, we can store its result in a stage variable.

228. What are the different types of file formats?
Some are comma-delimited CSV files, tab-delimited text files, and so on.

229. What is a sequential file that has a single input link?
A sequential file always has a single link because it cannot accept multiple links or threads; data in a sequential file always runs sequentially.

230. What is the Complex Flat File stage? In which situation do we use it?
A complex flat file can be used to read the data at the initial level. Using CFF we can read ASCII or EBCDIC (Extended Binary Coded Decimal Interchange Code) data, we can flatten arrays (COBOL files), and we can select the required columns and omit the rest.

231. What are the different types of star schema?
A multi-star schema, or galaxy schema, is one of the types of star schema.

232. What is the use of job control?
Job control is used for scripting. With the help of scripting we can set parameters for a job, execute it, do error handling and other tasks.

233. What is the Sequencer stage?
Say there are two jobs (J1 and J2) as the input links and one job (J3) on the output link of a Sequencer stage in a DS job sequence. The sequencer can be set to "ALL" or "ANY". If it is set to "ALL", the sequencer triggers the third job (J3) only after both of the input-linked jobs (J1 and J2) complete; if it is set to "ANY", it waits for any one of the jobs (J1 or J2) to complete and then triggers the third one.

234. What is a job sequence used for? What are batches? What is the difference between a job sequence and a batch?
A job sequence allows you to specify a sequence of server or parallel jobs to run. The sequence can also have control information; for example, you can specify different courses of action to take depending on whether a job in the sequence succeeds or fails. Once you have defined a job sequence it can be scheduled and run using DataStage Director, and it appears in the DataStage Repository and in the Director client as a job. A batch is a collection of jobs grouped together to perform a specific task; it is a special type of job created using DataStage Director which can be scheduled to run at a specific time. The difference between sequences and batches is that, unlike in a sequence, in a batch we cannot provide control information.

235. Suppose you have a table "sample" with three columns, where cola is the primary key:

cola   colb   colc
1      10     100
2      20     200
3      30     300

How will you fetch the record with the maximum cola value into the target system using DataStage?
You can use an OCI stage to read the source table. In the OCI stage write a user-defined SQL query such as "SELECT MAX(cola) FROM sample", which fetches the maximum value available in the table, and then load the data into the target table. (A command-line sketch of the same query is given after question 240 below.)

236. What is the difference between RELEASE THE JOB and KILL THE JOB?
Releasing the job releases it from any dependencies and runs it; killing the job kills the job that is currently running or scheduled to run.

237. Is it possible to query a hash file? Justify your answer.
Yes, it is possible. A hash file is like a database table for all practical purposes, so you can log in to the tssh/uvsh/TCL prompt and query it like any other database table.

238. Which partitioning should we use for the Aggregator stage in parallel jobs?
It is always preferable and appropriate to use a Sort stage before the Aggregator stage, because the data needs to be sorted. Based on the aggregation logic, we should sort the incoming data using hash partitioning on the keys and then use the same partitioning on the Aggregator stage. This is the most commonly used approach.

239. What is the use of tunables?
Tunables is a tab in the project properties in DataStage Administrator, in which we can change the value of the cache size between 000 and 999 MB; the default cache size is 128 MB.

240. What is the User Variables activity in DataStage?
The User Variables activity stage defines variables that are then used by the activities that follow it in the sequence.
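As referenced in question 235, the same user-defined query can be tested from the shell before being pasted into the OCI stage. The credentials are placeholders for illustration.

#!/bin/sh
# Fetch the maximum primary-key value from the sample table via SQL*Plus.
sqlplus -s scott/tiger <<'EOF'
SELECT MAX(cola) FROM sample;
EXIT;
EOF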

243. Aggregators – what does the warning "Hash table has grown to 'xyz' …" mean?
The Aggregator cannot land data onto disk the way the Sort stage does, so your system memory is occupied by the data going into the Aggregator. If system memory fills up you get that kind of message, because paging is required. Having dealt with that kind of error, my solution is to use multiple chunks of data and multiple Aggregators.

244. What is the difference between IBM WebSphere DataStage 7.5 (Enterprise Edition) and the standard Ascential DataStage 7.5 version?
IBM acquired DataStage from Ascential. DataStage 7.5 was released under the Ascential DataStage name, and only 8.0 was released as IBM WebSphere DataStage; IBM Information Server is also known as DS 8.0. In 8.0 there are only three client tools (Administrator, Designer, Director): the Manager has been removed and its import/export functions are included in Designer itself. Extra stages have been added, such as the SCD stage, with which we can implement SCD1 and SCD2 directly, and other advanced stages. Quality Stage, which is used for data validation and is very important for a DWH, is included, and 8.0 maintains its repository in DB2 rather than in files as 7.5.1 did. The advanced edition is WebSphere DataStage and QualityStage, which has more features such as Quality Stage and MetaStage; there are many more things available in Quality Stage.

245. What is the repository?
The repository resides in a specified database. It holds all the metadata: raw data definitions and all the respective mapping information.

246. What is the difference between the Merge stage and the Lookup stage?
Lookup stage: 1. used to perform lookups; 2. the data on the input link or reference links need not be sorted; 3. multiple reference links, a single input link, a single output link and a single reject link; 4. less memory usage. Merge stage: 1. combines the sorted master data set with the update data sets; 2. the data needs to be sorted; 3. several reject links and multiple output links can exist; 4. a large amount of memory usage.

247. What is the process to remove duplicate records using the Copy stage?
In the Copy stage there is an option to sort the input data, where you can select the key on the basis of which the sort is performed. There is also an option named "Unique"; select it and the stage returns only the unique records after the sort operation.

248. How do you extract job parameters from a file?
Through a User Variables activity in a sequence job, or by calling a routine.

249. What is TX and what is its use in DataStage?
As far as I know, TX stands for Transformation Extender, but I don't know exactly how it works or where it is used. WebSphere Transformation Extender (WTX) is the universal transformation engine for WebSphere that addresses complex data challenges in integration. Through its unique ability to speak to and process any data type in its native format, it tackles the "hard, ugly challenges" of integrating systems and information across the enterprise through a codeless, graphical approach to development.

250. If you can handle null handling and data type changes in ODBC stages, why do you need the Modify stage?
The Modify stage is used to change data types: if the source contains a varchar and the target contains an integer, we use the Modify stage and change the type according to the requirement. We can also modify lengths, apply functions, and so on.

251. What is the difference between the Dynamic and Static RDBMS stages?

252. Where can we use the Link Partitioner, Link Collector and IPC stages – in server jobs or in parallel jobs? And is SMP parallel or server?
We use the Link Partitioner and Link Collector stages in server jobs to speed up processing. Suppose you have a source, a target and a Transformer in between that does some processing: you can speed it up by using the Link Partitioner to split the data from the source into different links, applying the business logic, and then collecting the data back with the Link Collector and pumping it into the output. The IPC stage is also intended to speed up processing.

253. What are the different debug stages available in DataStage PX?
Head, Tail, Peek, Sample, Row Generator, Column Generator and Write Range Map.

254. Explain a specific scenario where we would use range partitioning.
If the data is large and you cannot process the full data in one pass, you will generally use range partitioning; it partitions the rows column-wise, i.e. by ranges of a column's values.

255. Have you ever been in contact with the DS administration people? In what situations, and what problems did you face?
Creating projects, killing hung jobs, maintaining users' rights, and so on.

256. How do you enable the DataStage engine?

257. In which situation do we use the RUN TIME COLUMN PROPAGATION option?
Runtime column propagation is used in the case of partial schema usage: when we only know about the columns to be processed and we want all the other columns to be propagated to the target as they are, we check the enable RCP option in Administrator, or on the output page Columns tab, or on the Stage page General tab, and we only need to specify the schema of the tables we are concerned with. According to the documentation: "Runtime column propagation (RCP) allows DataStage to be flexible about the columns you define in a job. If RCP is enabled for a project, you can just define the columns you are interested in using in a job, but ask DataStage to propagate the other columns through the various stages. So such columns can be extracted from the data source and end up on your data target without explicitly being operated on in between. Sequential files, unlike most other data sources, do not have inherent column definitions, and so DataStage cannot always tell where there are extra columns that need propagating. You can only use RCP on sequential files if you have used the Schema File property (see "Schema File" on page 5-8 and on page 5-31) to specify a schema which describes all the columns in the sequential file. You need to specify the same schema file for any similar stages in the job where you want to propagate columns. Stages that will require a schema file are: Sequential File, File Set, External Source, External Target, Column Import, Column Export." (An example schema file is sketched after question 261 below.)

258. What is the use of environment variables?
Environment variables are the ones which set up the environment. For example, to connect to a database you need a userid, password and schema; these are constant throughout the project, so they are created as environment variables. Once you set these variables in DataStage you can use them in any job as a parameter, wherever you want, with #Var#. Then, if there is any change to the password or schema, you don't need to worry about all the jobs: change it at the environment-variable level and that takes care of all the jobs.

259. What is the difference between job control and a job sequence?
Job control is specifically used to control a job: through it we can pass parameters, dashboard information, log file information, load-recovery information and so on. A job sequence is used to run a group of jobs based on conditions; for final/incremental processing we keep all the jobs in one sequence and run them together by defining triggers and conditions.

260. How do you define global variables in DS?
The User Variables stage allows you to define global variables within a sequence. These variables can be used anywhere in the sequence; their values are set by expressions in the stage's properties.

261. How do you find duplicate records using the Transformer stage in the server edition?
1. You can use a hash file, which by nature does not allow duplicates. 2. You can write a SQL query on the appropriate fields. 3. You can use a Transformer stage to identify and remove duplicates on one output and direct all input rows to another output (the "rejects"); this approach requires sorted input. Attach a reject link to see the duplicates for your verification.
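As referenced in question 257, RCP on a Sequential File stage relies on a schema file that describes every column in the file. A minimal sketch of creating one from the shell; the column names and types are assumptions for illustration, and the exact type syntax should be checked against the Orchestrate schema documentation for your release.

#!/bin/sh
# Write a simple schema file describing all columns of the sequential file.
cat > /data/ds/schemas/customers.schema <<'EOF'
record (
  cust_id: int32;
  cust_name: string[max=30];
  region: string[2];
)
EOF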

262. Does the type of partitioning change for SMP and MPP systems?

263. Is a Hashed file an active or a passive stage? When will it be useful?
A hash file is a passive stage. All files in DataStage are passive stages, like hash file, sequential file etc.; a stage which does some processing is called an active stage, e.g. Transformer, Aggregator, Sort. Hash files are generally used in intermediate jobs in DS, where there is a need to store data in an intermediate table format. In DS Server edition, hash files are used in jobs where lookups are required.

264. What is a phantom error in DataStage?
For every job in DataStage, a phantom is generated for the job as well as for every active stage which contributes to the job. These phantoms write logs regarding the stage/job. If any abnormality occurs, an error message is written, and these errors are called phantom errors. The logs are stored in the &PH& folder. Phantoms can be killed through the DataStage Administrator or at server level.

265. What are Orchestrate options in the generic stage? What are the option names and values? Name of an Orchestrate operator to call? What are the Orchestrate operators available in DataStage for the AIX environment?
Orchestrate is the old name for PX and EE. All the internal stages are generally called internal operators in PX. You can write custom operators if there is any need to add functionality that is not available in Orchestrate: you write the code in C++ and add the compiled code to DataStage. It is a very complex process and only a few people actually do this kind of coding. Please verify the Custom Operators documentation in the EE or PX versions.

266. How do you configure api_dump?

267. How to attach an MTR file (Map Trace) via email? The Map Trace is used to record all the execute map errors.
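Because the &PH& directory mentioned in question 264 only grows over time, administrators often clear out its old phantom logs. Below is a minimal housekeeping sketch; the project path and the 7-day retention are assumptions, and it should only be run when no jobs are active.

    #!/bin/sh
    # Delete phantom log files older than 7 days from a project's &PH& directory.
    PROJECT_DIR=/opt/IBM/InformationServer/Server/Projects/MyProject   # assumed install path
    find "$PROJECT_DIR/&PH&" -type f -mtime +7 -print -exec rm -f {} \;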

268. How to work with repeating fields in DataStage?
We should create some stage variables, define the value for the repeating fields there, and then just map these to the respective columns.

269. What are the important considerations while using a Join stage instead of Lookups?
1. If the volume of data is high then we should use the Join stage instead of Lookup.
2. If you need to capture mismatches between the two sources, lookups provide an easy option.

270. How can we increment the surrogate key value for every insert into the target database?
We have the Surrogate Key Generator stage in DataStage Enterprise Edition, which will generate keys incrementally. While implementing SCD type-2, due to the maintaining of history the primary key constraint would be violated, so to have a primary key for the table, for the sake of a unique key, we use the surrogate key.

271. How to find out individual counts for a particular table?
We have to use stage variables or pre-defined variables like @INROWNUM and @OUTROWNUM.

272. How do you load partial data after a job fails? The source has 10000 records, the job failed after 5000 records were loaded; how can I resume the load?
There are lots of ways of doing this, and there are some DB tools that do it automatically. If you want to do it manually: keep track of the number of records in a hash file or text file and update the file as you insert each record. If the job fails in the middle, read the number from the file and process the records from there only, ignoring the record numbers before that (instead of removing the 5000 records already loaded from the target); try the @INROWNUM function for better results. Generally the load job never fails unless there is a data issue; all data issues are cleared beforehand in the transform, and we keep the Extract, Transform and Load processes separate.

273. What is a phantom error in DataStage? How do you overcome this error?
A phantom process is an orphaned process. Sometimes some processes are still running on the server even though you kill the actual process; some threads keep running without any source process, and they are called phantom processes. The directory called &PH& captures the log of the phantom process. In Server edition, in the transformer stage the order of the output links should be such that the dependent links come after the independent links; otherwise the job will abort with a phantom error message, and the status of the job is Abort.
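For the manual-restart idea in question 272, the sketch below simply skips the rows that were already loaded before feeding the file back into the load job. The file name, and the assumption that loaded rows can be identified purely by position, are illustrative.

    #!/bin/sh
    # Re-create the input for a restarted load: drop the first 5000 records
    # (already loaded) and keep everything from record 5001 onwards.
    LOADED=5000
    tail -n +"$((LOADED + 1))" source.txt > remaining.txt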

274. What is the baseline to implement the partition or parallel execution method in a DataStage job? E.g., is it only advised for more than 2 million records?

275. A batch is running and it is scheduled to run in 5 minutes, but after 10 days the time changes to 10 minutes. What type of error is this and how do you fix it?
We need to pass some value.

276. How do you delete the header and footer on the source sequential file, and how do you create a header and footer on the target sequential file using DataStage?
In the Designer palette, under Development/Debug, we can find the Head & Tail stages; by using these we can do it. Alternatively, by using the UNIX sed command we can delete the header and footer: for the header sed -n '1!p' and for the footer sed -n '$!p'.

277. What is the difference between the Sequential stage & the Dataset stage? When do you use them?
A sequential file can be used as a final target file, but a dataset cannot be used as a final target; it can be used as a temporary target. Performance is very good with a dataset compared to a sequential file because of parallelism, so it improves performance. A dataset has no size limit, whereas a sequential file has a size limit. A sequential file does not show null values (it never accepts a null), but that is not the case with a dataset. A dataset can also be used like a hash file (just like the hash file in server jobs).

278. What is an Invocation ID?
An invocation id is what makes a multi-instance job unique at runtime. With normal jobs you can only have one instance running at any given time. Multi-instance jobs extend that and allow you to have multiple instances of that job running (hence the name). So you can run multiple copies of the same job as long as the currently running invocation ids are unique.

279. What are the errors you experienced with DataStage?
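A compact form of the sed approach in question 276 is sketched below; the file names are placeholders. The first command strips both header and footer in one pass, and the second shows one way to add a simple header and a record-count trailer to a target file.

    #!/bin/sh
    # Remove the first line (header) and the last line (footer) in one pass.
    sed '1d;$d' source_with_header_footer.txt > data_only.txt

    # Add a simple header and a trailer containing the record count.
    {
        echo "HEADER|$(date +%Y%m%d)"
        cat data_only.txt
        echo "TRAILER|$(wc -l < data_only.txt)"
    } > target_with_header_footer.txt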

If there is any fatal error the job gets aborted; if there are only warnings the job does not abort, but we have to handle those warnings as well, and the log file must be cleared with no warnings. Many errors come up in different jobs, such as "Parameter not found in job", load recovery issues, a control job failing because of something, a child job failing because of something, and so on.

280. I am running a huge amount of data through a Sort stage and it is very slow. Is there any option which can increase the performance within the Sort stage?
If the source is an Oracle database, you can write a user-defined query to sort and remove duplicates in the source itself. If that is not the case, it is better to go for a key-partitioning technique in the sort, keeping the same partitioning which is in the previous stage; remove duplicates and give a unique partition key. By maintaining suitable key-partitioning techniques you can improve the performance.

281. How can we improve the performance of the job while handling a huge amount of data?
1. Filter the data in the source itself (if it is an Oracle database, write user-defined SQL).
2. Don't allow duplicates.
3. Use appropriate partition methods for the different stages.
4. Use sorted input for the Aggregator.
5. Decrease the usage of Transformer stages.
6. Use the Tunables tab in the Administrator for cache memory and buffer size (only go for this if you have the knowledge, otherwise you will get problems like deadlocks).
7. Use a dataset for the reference link (for lookups).

282. How to connect two stages which do not have any common columns between them?
If the two stages don't have the same column name, then use a Transformer stage in between and map the required column: there you put an alias name and change the column name by selecting the Columns tab. If those are sources, then you had better use a Copy stage after one of them and change the column name according to your requirement, then join the two sources.

283. How to develop SCD using the LOOKUP stage?
We can implement SCD by using the LOOKUP stage, but only for SCD1, not for SCD2. We take the source (file or DB) and a dataset as the reference link (for the lookup) and then the LOOKUP stage; in this we compare the source with the dataset and give the condition. According to the requirement you use join, lookup or merge (it depends upon the data in the input dataset and the reference link data). After that, in the transformer we give the condition, and then we take two targets, one for insert and one for update; there we have to manually write the SQL insert and update statements (if it is an Oracle database then select user-defined SQL).

284. Hi friends, I know that while using the Lookup stage, if the lookup table has a huge amount of data we go for a sparse lookup, or else we use a normal lookup. Would anybody explain how the memory will be allocated for both types of lookups?
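When the source is a flat file rather than Oracle, the "sort and remove duplicates at the source" advice from questions 280 and 281 can be done at the operating-system level before the job runs, as in this sketch (the delimiter and key column are assumptions):

    #!/bin/sh
    # Sort on the first field and keep only one row per key value,
    # so the DataStage job receives pre-sorted, de-duplicated input.
    sort -t',' -k1,1 -u raw_input.csv > sorted_deduped.csv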

289. Then you can go for a Sparse Lookup. In case of a Normal Lookup. OR source as dataset and transformer (in this give constraint) and target. 290.Page 50 of 57 anybody explain how the memory will be allocated for both types of lookups? The lookup type is not decided by the data on the reference link but by the amount of data on the input link. and only when you believe a job has hung or aborted. peak. it means job is successfully completed but the data is not loaded in to the target. 286. Or Instead of join stage use Merge Stage to capture the reject records. First we have to take dataset. So memory comes into picture only for normal lookup's and not for Sparse lookup's. If you have a huge volume of data to be referenced. that is there are no performance issues in the source side. data type length problem…etc 288. And a SQL where clause is directly fired on the reference link database for fetching the lookup key value. What is the use of clear status file in DS? How we will use it? Explain with examples? When you clear a job’s status file you reset the status records associated with all stages in that job. If a record is duplicated 3 times then how to get middle duplicated record? Is it advisable to use basic Transformer in Parallel jobs? I think we have to use remove duplicate stage here to remove duplicates. after that remove duplicates stage and target. 285. If you want to capture Reject records then you can use transformer (or filter) after Join Stage and capture the null values. You should therefore use this option with great care. head etc) are to monitor the jobs and to generate mock data when we don’t have real time data to test. In particular. How can I load a flat file into target as fast as I can? Assuming that the source bottleneck is not there. But showing errors in director window. tail. before you clear a status file you should: 1) Try to reset the job 2) Ensure that all the job’s processes have ended. There is no memory allocation for this type of lookup. Lookup stage requires all the data to be loaded in memory 1st & then only it can start matching process with inputs. which stage will you use? Join or Lookup stage? Why? For huge size of data we must use join. If and only if your input link has few records. What is the purpose of Debugging stages? In real time where we will use? The main uses of Debugging Stages (row gen. . In DataStage job successfully finished. 287. may be some miss matched data. The entire reference link data is read into the memory and then a lookup is done in the memory itself. Hence It is advisable to use join stage than Lookup stage. How do we fix and find the cause for the errors? I think in designer u find the green color but in Director -> log view --> u find some fatal errors and job is aborted.

291. At the time of incremental loading I need to take the reusable file's path. In this situation how do I assign the path of the file to PROJDEF using job parameters?

292. Write a shell script in UNIX to run 2 jobs. How do you schedule the 2 jobs using UNIX crontab to run at a particular time?
Use crontab -e to edit your crontab file, or create one if it doesn't already exist. Then enter the following line into the file (a sketch of the runjob.sh script itself is given after the list below):
30 08 10 06 * /home/nikhilanshuman/runjob.sh
where /home/nikhilanshuman/runjob.sh is the path where the script was saved. Following is the description of the schedule:
• 30 – 30th minute
• 08 – 08 AM
• 10 – 10th day
• 06 – 6th month (June)
• * – every day of the week

293. Miscellaneous questions:
1. Is the hash file active or passive, if we take it as a source?
2. Can u take a sequential file as a lookup?
3. In a dynamic hash file there r two types: 1) generic 2) specific. What is the meaning?
4. How to connect the MERGE stage while the source is two tables?
5. What is the purpose of MERGE?
6. How can a DS job be scheduled in UNIX?
7. How do u know how many rows were rejected?
8. Use of the Universe stage?
9. What is a routine?
10. Diff between DS routine, DS transform and DS function in the transform stage?
11. If one table contains 100 rows and another one 1000, which one do we take as the lookup?
12. If a file has 100 records and the job aborts after 70 records, how to load the remaining?
13. Can shared containers replace server jobs in a parallel job?
14. Is it possible to extract an EXCEL sheet by using a SEQ file?
15. How do you decide when to go for join or lookup?
16. Diff between hash, range & entire partition keys?
17. In the Lookup stage in parallel, which partition key do u select?
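A minimal sketch of the runjob.sh referenced in question 292 is shown below, assuming the dsjob command-line client is on the PATH; the project and job names are placeholders. With -jobstatus, dsjob waits for the job and returns an exit code derived from its finishing status.

    #!/bin/sh
    # Run two DataStage jobs one after the other.
    PROJECT=MyProject        # assumed project name
    dsjob -run -jobstatus "$PROJECT" Job1
    dsjob -run -jobstatus "$PROJECT" Job2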

19. Can we take a SEQ file as source to a hash file, if I take the hash file as target?
20. Definition of system variables? Where do we use them? What are the system variables in your project?
21. What are plug-ins?
22. What is the purpose of a factless fact table?
23. If the sources are OCI & flat files, then why do u need ODBC in ur project rather than OCI?
24. I have two types of metadata in a SEQ file; will it work or not? If not, what error will it give?
25. Can we use the AGG & Transform stages for sorting? How?
26. How do you do an automation job?
27. How can we join one Oracle source & flat files?
28. How to connect a DB source, i.e. do we use OCI or some other stage?
29. What is the purpose of the filter option in the SEQ file in server jobs?
30. How to filter data by using a SEQ file?
31. Diff between clear log file & clear status file?
32. I want to process 3 files sequentially at a time; how can we do that in a job sequence?
33. What is version control in DataStage?
34. How to release a job?
35. Diff between parallel jobs and parallel shared containers?
36. How to give the scheduling time?
37. In a job sequence I have 5 jobs; I want to run only 1 & 4 and run the remaining later. How can u do that?
38. Diff between lookup file set & file set?
39. Data migration?
40. I have 1000 records; how to load the last 100 records?
41. How to do cleansing in your DataStage job?
42. Where do we write the routines?
43. How to execute a job sequence?
44. I am using an Oracle database but I am not using the OCI stage; without OCI or ODBC how can I extract the data?
45. I have 4 jobs; do they run at a time? Is it possible? What is the status?
46. Is it aborted, and why? Is it validating, and why?
47. What is an environment variable? Where is it stored?
48. Where is the hash file stored? What is the path?
49. Diff between the server Sequence stage & the parallel Sequence stage?
50. Diff between dataset & SEQ file in parallel?
51. Diff between the Change Apply & Change Capture stages?
52. Diff between the Copy and Transform stages?
53. Diff between Funnel, Difference, Filter and External Filter in parallel?
54. Diff between file set, dataset, lookup file set and SEQ file?
55. Diff between hash & SEQ file in server?
56. Diff between static & dynamic types in a hash file?
57. Diff between server SEQ & parallel SEQ files?
58. What are the buildup and audit stages? What is their purpose?
59. How to find multiple jobs' process ids without using cleanup resources in DS Director?
60. Where do we find the process ids from the last 5 job executions?
61. How many format types are there to export data by using DS Manager?
62. Can u convert server jobs into parallel jobs?
63. … if I have million records?
64. I have two files; one file contains 1,00,000 records and another one contains 1000 records. Is it possible to load the target?
65. Diff between hash file & dataset?
66. If the data is not properly processed on the nodes, or if there are one million records, then how will we set the nodes? How to set the configuration file node?

67. In Oracle Enterprise edition, when do we select Upsert mode?
68. Diff between server SEQ file & parallel SEQ file?
69. Diff between the server Transformer stage & the parallel Transformer stage?
70. In the OCI stage, when do we go for insert/update or update/insert?

294. What is the difference between the Change Capture and Change Apply stages?
Change Capture is used to capture the changes between two sources, i.e. to get the difference between the after dataset and the before dataset. The source in which we are looking for the change is called the before dataset, and the source which is used as a reference to capture the changes is called the after dataset. Change Capture adds one field called change_code to its output; by this change code one can recognize which kind of change it is, i.e. whether it is a delete, insert or update. The Change Apply stage is used along with the Change Capture stage: it takes the change code from Change Capture and applies all the changes to the before dataset based on the change code, writing the applied changes to the output.

295. What is the disadvantage of the Surrogate Key stage?
In some situations this surrogate key generates holes (it simply skips the ordering), e.g.: aa-1, bb-2, ba, bg, cc-3, dd-4, ddd-5. Here for ba and bg we don't have a sequence key because there are holes.

296. I am eager to know about DataStage Engine commands. Tell me something about that?
DataStage Engine commands are otherwise called Universe commands. They are used to retrieve the details that are stored in the DataStage repository in the form of files; we can also access the hash files and VOC contents by using those commands. E.g.: EVERY and LISTL are some of the commands. The following commands can be taken as DS Engine commands, used to start and stop the DS Engine:
DSHOME/bin/uv -admin -start
DSHOME/bin/uv -admin -stop

297. Using a server job, how do I transform data in an XML file into a sequential file? I have used an XML Input stage, an XML Transformer and a sequential file.
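The two engine commands quoted in question 296 are often wrapped in a small restart script such as the sketch below. It assumes DSHOME is already set (normally by sourcing dsenv) and that no jobs are running when the engine is stopped; the 30-second pause is an arbitrary choice.

    #!/bin/sh
    # Stop the DataStage engine, wait briefly, then start it again.
    "$DSHOME/bin/uv" -admin -stop
    sleep 30
    "$DSHOME/bin/uv" -admin -start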

Import the XSD for the XML file via Manager -> metadata importer and store it. To load data directly from an XML file to a sequential file, use these stages in the design: FOLDER -> XML INPUT -> SEQUENTIAL FILE (if transformation is required, you can use a Transformer stage between the XML Input stage and the sequential file stage; if no transformation functions need to be applied to the data in the XML file, there is no need for the Transformer). In the Folder stage properties, assign the file path (it can be given as a wildcard); the whole data in the file is taken as a single row and streamed through the Folder stage, so in its output the metadata will be 'File' (the file name, key, varchar) and 'Data' (the single row of data, long varchar 9999). In the XML Input stage -> Source, select 'File' as the source column with the content as the file path, or select 'Data' and the XML document, whichever applies. On the output link, import the column metadata from the stored table definition and map it directly to the sequential file (or to a Transformer and then the sequential file).

298. How do you remove duplicates using the Transformer stage in DataStage?
Double-click on the Transformer stage, go to Stage Properties (the first icon in the header line), go to Inputs -> Partitioning, select a partition technique (other than Auto), enable Perform Sort, then enable Unique, and select the required column name. The output will now have unique values, so the duplicates are removed.

299. How to fetch the last row from a particular column in a sequential file?

300. Suppose we have 3 jobs in a sequencer; while running, if job 1 fails then we still have to run job 2 and job 3. How can we run them?
To run a job even if its previous job in the sequence failed, you need to go to the TRIGGER tab of that particular job activity in the sequence itself. There you will find three fields. Name: this is the name of the link going to the next job (e.g. for job activity 1 the link name will be the link going to job activity 2). Expression Type: this allows you to trigger the next job activity based on the status you want; there are many other options available. Expression: this is editable for some options; for example, for expression type "Failed" you cannot change this field. So if job 1 fails and you want to run job 2 and job 3, go to the trigger properties of job 1 and select the expression type "Failed (Conditional)". This way you can run job 2 even if job 1 is aborted.

301. What is the use of combinability mode?
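Outside a sequencer, the same "carry on even if job 1 fails" behaviour from question 300 can be sketched with the dsjob command-line client, as below. The project and job names are placeholders, and dsjob is assumed to be on the PATH (it normally lives under the DSEngine bin directory).

    #!/bin/sh
    # Run Job1; whether or not it succeeds, still run Job2 and Job3.
    PROJECT=MyProject                       # assumed project name
    if ! dsjob -run -jobstatus "$PROJECT" Job1; then
        echo "Job1 did not finish OK -- continuing with Job2 and Job3 anyway"
    fi
    dsjob -run -jobstatus "$PROJECT" Job2
    dsjob -run -jobstatus "$PROJECT" Job3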

302. If we define the values of variables in the DSParams file, then is there any need to give the values at job level or project level? And how do we configure this file at job level, so that we need not hardcode the values?

303. Explain the Wrapped, Build and Custom stages.
Wrapped stage: enables you to run an existing sequential program in parallel.
Build stage: enables you to write a C expression that is automatically generated into a parallel custom stage.
Custom stage: provides a complete C++ API for developing complex and extensible stages.

304. What are the DataStage standards to be considered?

305. How will the high-level design be prepared in DataStage?

306. How will the low-level design be prepared in DataStage?

307. How will the technical design be prepared in DataStage?

308. What is an Orchestrate schema? Distinguish internal data types (Orchestrate schema) vs. external data types.
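One common way to avoid hardcoding the values asked about in question 302 is to pass them in at run time as job parameters, for example from a shell script via dsjob as sketched below. The parameter names, project and job are placeholders, and the environment variables are assumed to be set by the calling environment.

    #!/bin/sh
    # Supply connection details as job parameters instead of hardcoding them.
    dsjob -run \
          -param DBUser="$DB_USER" \
          -param DBPassword="$DB_PASSWORD" \
          -jobstatus MyProject LoadCustomerDim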

309. What are these terms used for in DataStage - Downstream and Upstream?

Things to learn:
1. All stages (Server + Parallel).
2. SCD Type 2.
3. Loading data into dimension tables & fact tables.
4. Performance tuning.
5. Constraints & derivations in a Transformer.

Manager - Routines:
1. Find job
2. ExecSH
3. DSWaitForFile
4. DSSendMail

SDK date routines:
1. Convert date/timestamp to OraOCI format.
2. Convert date/timestamp to ODBC format.
3. DateCurrentGMTTime.

User-defined routines:
1. Any date to YYYY-MM-DD format (using the date functions Iconv and Oconv).
2. GetNextKeyValue: this routine is used in server jobs for reading the maximum surrogate key value of a dimension from the hash file (the maximum key value is loaded into the hash file by a job at the start of the sequence). This is for inserting a new record into the dimension table.

Transforms:
1. Dates
2. Strings
3. Numeric

Functions:
1. Date functions: current timestamp.
2. Null handling.
3. Strings (Concat, Trim, Left, Right).
4. Numeric functions.

Manager:
1. Table definitions & metadata creation.
2. Multiple job compilation.

311. How do you do an Oracle 4-way inner join if there are 4 Oracle input files?

312. What is SQL tuning? How do you do it?

313. What is a batch program and how can we generate one?
