1. Explain how you can implement slowly changing dimensions in DataStage.
2. Is it possible to join a flat file and a database table in DataStage? If yes, how?
3. What is the exact difference between the Join, Merge and Lookup stages?
4. What is the DataStage Director used for?
5. In what way can you implement a Lookup in DataStage Server jobs?
6. How can you implement complex jobs in DataStage?
7. What is Merge, and how is it used?
8. State the difference between DataStage and Informatica.
9. State the difference between server jobs and parallel jobs.
10. Is it possible to run parallel jobs as server jobs?

Difference between Query calculation and Layout calculation

A query calculation is used to work with the tables themselves (to apply changes to the query). Layout calculations are used to make changes to the appearance of reports.

Query calculation is used to perform data scrubbing. Layout calculation is used to provide run-time information, but not to perform any operation on the data.

In DataStage, what is scheduling and how does it work?

DataStage is, in short, a consolidation point for all of the data: historical and current data is dumped into a database, and for this we use the different DataStage components. Basically DataStage is an ETL tool, meaning it extracts the data, applies particular rules to it, and loads it into the database or format the user wants. There are two ways to schedule a job: 1) through DataStage itself, 2) through AutoSys.

In DataStage, after the job is developed and compiles cleanly in Designer, it can be run immediately or scheduled to run at a particular frequency (daily, weekly, etc.) using Director. This scheduling simply defines when the job runs.

A schedule specifies the date and time at which to run the job. Schedules can be created from the DataStage client components: choose the Run Job option and select the date and time.

Look for DSTX


1. What is the flow of loading data into fact and dimension tables?
A) Fact table - a table with a collection of foreign keys corresponding to the primary keys in the dimension tables; it consists of fields with numeric values. Dimension table - a table with a unique primary key.

Load - data should first be loaded into the dimension tables. Based on the primary key values in the dimension tables, the data is then loaded into the fact table.
2. What is the default cache size? How do you change the cache size if needed?
A) The default cache size is 256 MB. We can increase it by going into DataStage Administrator, selecting the Tunables tab and specifying the cache size there.
3. What are the types of Hashed File?
A) Hashed files are classified broadly into 2 types: a) Static - subdivided into 17 types based on the primary key pattern; b) Dynamic - subdivided into 2 types: i) Generic, ii) Specific. Dynamic files do not perform as well as a well-designed static file, but they do perform better than a badly designed one. When creating a dynamic file you can specify a number of settings (although all of these have default values). By default a hashed file is "Dynamic - Type Random 30 D".
4. What does a Config File in Parallel Extender consist of?
A) The config file consists of the following: a) the number of processes or nodes; b) the actual disk storage locations.
5. What is Modulus and Splitting in a Dynamic Hashed File?
A) In a hashed file the size of the file keeps changing as data is added and removed. If the size of the file increases it is called "Modulus"; if the size of the file decreases it is called "Splitting".
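As an illustration of the parallel configuration file mentioned in question 4, a minimal one-node file usually looks something like the sketch below; the host name and disk paths are invented placeholders, not values from any real project:

    {
      node "node1"
      {
        fastname "etl_server"
        pools ""
        resource disk "/data/ds/datasets" {pools ""}
        resource scratchdisk "/data/ds/scratch" {pools ""}
      }
    }

Adding further node blocks (with their own resource disks) is what gives the parallel engine more partitions to work with.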

6. What are Stage Variables, Derivations and Constants?
A) Stage Variable - an intermediate processing variable that retains its value during a row read and does not pass the value on to a target column. Derivation - an expression that specifies the value to be passed on to the target column. Constant - a condition that is either true or false and that controls the flow of data down a link.
7. What are the types of views in DataStage Director?
A) There are 3 types of views in DataStage Director: a) Job view - dates when jobs were compiled; b) Status view - status of the job's last run; c) Log view - warning messages, event messages and program-generated messages.
8. What are the types of parallel processing?
A) Parallel processing is broadly classified into 2 types: a) SMP - Symmetric Multi-Processing; b) MPP - Massively Parallel Processing.
9. Orchestrate vs. DataStage Parallel Extender?
A) Orchestrate itself is an ETL tool with extensive parallel processing capabilities, running on UNIX platforms. DataStage used Orchestrate with DataStage XE (the beta version of 6.0) to incorporate the parallel processing capabilities. Ascential then purchased Orchestrate and integrated it with DataStage XE, releasing a new version, DataStage 6.0, i.e. Parallel Extender.
10. What is the importance of the Surrogate Key in data warehousing?
A) A surrogate key is a primary key for a dimension table. Its main importance is that it is independent of the underlying database, i.e. the surrogate key is not affected by changes going on in the source database.
11. How do you run a shell script within the scope of a DataStage job?
A) By calling it with "ExecSH" in the Before/After job properties.
12. How do you handle date conversions in DataStage? Convert a mm/dd/yyyy format to yyyy-dd-mm?

A) We use a) the "Iconv" function for internal conversion and b) the "Oconv" function for external conversion. The function to convert mm/dd/yyyy format to yyyy-dd-mm is Oconv(Iconv(Fieldname, "D/MDY[2,2,4]"), "D-MDY[2,2,4]").
13. How do you execute a DataStage job from the command line prompt?
A) Using the "dsjob" command, as follows: dsjob -run -jobstatus projectname jobname
14. What is the functionality of Link Partitioner and Link Collector?
A) Link Partitioner: it splits data into various partitions or data flows using various partition methods. Link Collector: it collects the data coming from the partitions, merges it into a single data flow and loads it to the target.
15. Differentiate Primary Key and Partition Key?
A) A primary key is a combination of unique and not null; it can be a collection of key values called a composite primary key. A partition key is just a part of the primary key. There are several methods of partitioning, such as Hash, DB2 and Random; when using Hash partitioning we specify the partition key.
16. Differentiate database data and data warehouse data?
A) Data in a database is a) detailed or transactional, b) both readable and writable, c) current.
17. What are the types of Dimensional Modeling?
A) Dimensional modeling is subdivided into 2 types: a) Star schema - simple and much faster, denormalized form; b) Snowflake schema - complex with more granularity, more normalized form.
18. Dimension modeling types along with their significance?
A) Data modeling is broadly classified into 2 types: a) E-R diagrams (entity-relationship) and b) dimensional modeling.

19. Compare and contrast ODBC and Plug-In stages?
A) ODBC: a) poorer performance, b) can be used for a variety of databases, c) can handle stored procedures. Plug-In: a) good performance, b) database specific (only one database), c) cannot handle stored procedures.
20. Containers: usage and types?
A) A container is a collection of stages used for the purpose of reusability. There are 2 types of containers: a) Local container - job specific; b) Shared container - can be used in any job within a project.
Q 21 What are the Ascential DataStage products and connectivity options?
A) Ascential products: Ascential DataStage, Ascential DataStage EE, Ascential DataStage EE MVS, Ascential DataStage TX, Ascential QualityStage, Ascential MetaStage, Ascential RTI, Ascential ProfileStage, Ascential AuditStage, Ascential Commerce Manager, Industry Solutions. Connectivity: Files, RDBMS, Real-time, PACKs, EDI, Other.

Q 22 Explain the DataStage architecture?
DataStage contains two sets of components: client components and server components.
Client components: DataStage Administrator, DataStage Manager, DataStage Designer, DataStage Director.
Server components: DataStage Engine, Meta Data Repository, Package Installer.

DataStage Administrator: used to create projects. It contains a set of project properties; we can set the buffer size (by default 128 MB), increase it if needed, and set environment variables. In the Tunables settings we have in-process and inter-process row buffering: in-process means the data is read sequentially within one process, inter-process means a downstream process reads the data as it arrives.
DataStage Manager: we can view and edit the metadata repository, import table definitions, create routines and transforms, and export DataStage components in .xml or .dsx format.
DataStage Designer: we can create jobs, compile single or multiple jobs, call routines, transforms, functions and macros, declare stage variables in a Transformer, write constraints, and run jobs.
DataStage Director: we can run, schedule, monitor and release jobs. Schedules can be daily, weekly, monthly or quarterly.

Q 23 What is the Meta Data Repository?
Metadata is data about the data. The repository also contains query statistics, ETL statistics, business subject areas, source information, target information and source-to-target mapping information.
Q 24 What is the DataStage Engine?
It is the engine running in the background on the server that executes the jobs; the client tools just interface to it and to the metadata.
Q 25 What is Dimensional Modeling?
Dimensional modeling is a logical design technique that seeks to present the data in a standard framework that is intuitive and allows high-performance access.
Q 26 What is a Star Schema?
A star schema is a de-normalized multi-dimensional model. It contains centralized fact tables surrounded by dimension tables. Dimension table: it contains a primary key and descriptive attributes referenced by the fact table. Fact table: it contains foreign keys to the dimension tables, measures and aggregates.
Q 27 What is a surrogate key?
It is a 4-byte integer which replaces the transaction / business / OLTP key in the dimension table, so we can store up to about 2 billion records.

Q 28 Why do we need a surrogate key?
It is used for integrating the data and serves better as a primary key: it helps with index maintenance, joins, table size, key updates, disconnected inserts and partitioning.
Q 29 What is a Snowflake schema?
It is a partially normalized dimensional model in which at least one dimension is represented by two or more hierarchically related tables.
Q 30 Explain the types of fact tables?
Factless fact: it contains only foreign keys to the dimension tables. Additive fact: measures can be added across any dimension, e.g. discount amount. Semi-additive: measures can be added across only some dimensions. Non-additive: measures cannot be added across any dimension, e.g. percentages and averages. Conformed fact: the equations or the measures of two fact tables are the same, and the facts are measured across the dimensions with the same set of measures.
Q 31 Explain the types of dimension tables?
Conformed dimension: a dimension table connected to more than one fact table, where the granularity defined in the dimension table is common across those fact tables. Junk dimension: a dimension table which contains only flags. Monster dimension: a dimension that changes rapidly. De-generative dimension: a line-item-oriented fact table design, where the dimension attribute lives in the fact table.
Q 32 What are stage variables?
Stage variables are declarations in the Transformer stage used to store values. Stage variables are active at run time (their memory is allocated at run time).

Q 33 What is a sequencer?
It sets the sequence of execution of server jobs.
Q 34 What are active and passive stages?
Active stage: active stages model the flow of data and provide mechanisms for combining data streams, aggregating data and converting data from one data type to another, e.g. Transformer, Aggregator, Sort, Row Merger etc. Passive stage: a passive stage handles access to databases for the extraction or writing of data, e.g. Universe, UniData, file types, DRS stage, IPC stage etc.
Q 35 What is ODS?
An Operational Data Store is a staging area where data can be rolled back.

Q 36 What are Macros?
They are built from DataStage functions and do not require arguments. A number of macros are provided in the JOBCONTROL.H file to facilitate getting information about the current job, and the links and stages belonging to the current job. They can be used in expressions (for example in Transformer stages), in job control routines, in filenames and table names, and in before/after subroutines. These macros provide the functionality of using the DSGetProjectInfo, DSGetJobInfo, DSGetStageInfo and DSGetLinkInfo functions with the DSJ.ME token as the JobHandle, and can be used in all active stages and before/after subroutines. The macros provide the functionality for all the possible InfoType arguments of the DSGet...Info functions; see the function call help topics for more details. The available macros are:
DSHostName, DSProjectName, DSJobStatus, DSJobName, DSJobController, DSJobStartDate, DSJobStartTime, DSJobStartTimestamp, DSJobWaveNo, DSJobInvocations, DSJobInvocationId, DSStageName, DSStageLastErr, DSStageType, DSStageInRowNum, DSStageVarList, DSLinkRowCount, DSLinkLastErr, DSLinkName.
Examples - to obtain the name of the current job: MyName = DSJobName
To obtain the full current stage name: MyName = DSJobName : "." : DSStageName
Q 37 What is KeyMgtGetNextValue?
It is a built-in transform that generates sequential numbers. Its input type is literal string and its output type is string.
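As a small, hedged illustration of these macros, a before/after subroutine or Transformer derivation might combine them with DSLogInfo to record which job and stage produced a message; the message text and category below are made up for the example:

    * Identify the current job and stage in an informational log message
    Msg = "Processing started in " : DSJobName : "." : DSStageName
    Call DSLogInfo(Msg, "JobAudit")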

Q 38 What are stages?
Stages are either passive or active. Active stages model the flow of data and provide mechanisms for combining data streams, aggregating data and converting data from one data type to another. Passive stages handle access to databases for extracting or writing data.
Q 39 What index is created on a data warehouse?
A bitmap index is created in a data warehouse.
Q 40 What is a container?
A container is a group of stages and links. Containers enable you to simplify and modularize your server job designs by replacing complex areas of the diagram with a single container stage. DataStage provides two types of container: Local containers, which are created within a job and are only accessible by that job (a local container is edited in a tabbed page of the job's Diagram window); and Shared containers, which are created separately and stored in the Repository in the same way that jobs are. You can also use shared containers as a way of incorporating server job functionality into parallel jobs. There are two types of shared container.
Q 41 What is a function? (Job Control - examples of transform functions)
Functions take arguments and return a value. BASIC functions: a function performs mathematical or string manipulations on the arguments supplied to it and returns a value. Some functions have no arguments; most have one or more. Arguments are always in parentheses, separated by commas, as shown in this general syntax: FunctionName(argument, argument).
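To make the transform-function idea concrete, here is a minimal, hypothetical DataStage BASIC routine body of the kind created under the Routines branch and then called from a Transformer derivation. The routine name PadAccountCode and its single argument Arg1 are invented for illustration:

    * Hypothetical transform function "PadAccountCode" with one argument, Arg1.
    * The result must be assigned to Ans, which the routine returns.
    Ans = Fmt(Arg1, "R%8")   ;* right-justify and zero-fill the code to 8 characters

It could then be referenced in a derivation as, for example, PadAccountCode(InLink.AccountCode).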

.and after-stage subroutines. Some of the functions can also be used for getting status information on the current job. Specify the job you want to control DSAttachJob Set parameters for the job you want to control DSSetParam Set limits for the job you want to control DSSetJobLimit Request that a job is run DSRunJob Wait for a called job to finish DSWaitForJob Gets the meta data details for the specified link DSGetLinkMetaData Get information about the current project DSGetProjectInfo Get buffer size and timeout value for an IPC or Web Service stage DSGetIPCStageProps Get information about the controlled job or current job DSGetJobInfo Get information about the meta bag properties associated with the named job DSGetJobMetaBag Get information about a stage in the controlled job or current job DSGetStageInfo Get the names of the links attached to the specified stage DSGetStageLinks Get a list of stages of a particular type in a job.. DSGetStagesOfType Get information about the types of stage in a job.. these are useful in active stage expressions and before.run and controlled from the first job.. DSGetStageTypes Get information about a link in a controlled job or current job DSGetLinkInfo . Use this function . To do this .

Get information about a controlled job's parameters - DSGetParamInfo
Get the log event from the job log - DSGetLogEntry
Get a number of log events on the specified subject from the job log - DSGetLogSummary
Get the newest log event, of a specified type, from the job log - DSGetNewestLogId
Log an event to the job log of a different job - DSLogEvent
Stop a controlled job - DSStopJob
Return a job handle previously obtained from DSAttachJob - DSDetachJob
Log a fatal error message in a job's log file and abort the job - DSLogFatal
Log an information message in a job's log file - DSLogInfo
Put an info message in the job log of the job controlling the current job - DSLogToController
Log a warning message in a job's log file - DSLogWarn
Generate a string describing the complete status of a valid attached job - DSMakeJobReport
Insert arguments into a message template - DSMakeMsg
Ensure a job is in the correct state to be run or validated - DSPrepareJob
Interface to the system send-mail facility - DSSendMail
Log a warning message to a job log file - DSTransformError
Convert a job control status or error code into an explanatory text message - DSTranslateCode

Suspend a job until a named file either exists or does not exist - DSWaitForFile
Check whether a BASIC routine is cataloged, either in the VOC as a callable item or in the catalog space - DSCheckRoutine
Execute a DOS or DataStage Engine command from a before/after subroutine - DSExecute
Set a status message for a job to return as a termination message when it finishes - DSSetUserStatus
Q 42 What are Routines?
Routines are stored in the Routines branch of the DataStage Repository, where you can create, view or edit them. The following programming components are classified as routines: transform functions, before/after subroutines, custom UniVerse functions, ActiveX (OLE) functions, and Web Service routines.
Q 43 What is a DataStage Transform?
Q 44 What are MetaBrokers?
Q 45 What is usage analysis?
Q 46 What is a job sequencer?
Q 47 What are the different activities in a job sequencer?
Q 48 What are triggers in DataStage? (conditional, unconditional, otherwise)
Q 49 Have you generated job reports?
Q 50 What is a plug-in?
Q 51 Have you created any custom transform? Explain. (Oconv)
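Tying the routine and job-control material above together, a job control routine written in DataStage BASIC might look like the hedged sketch below; the job name, parameter name and values are invented for illustration and would be replaced with real ones:

    * Attach the job we want to control, set a parameter, run it and wait
    hJob = DSAttachJob("LoadCustomerDim", DSJ.ERRFATAL)
    ErrCode = DSSetParam(hJob, "RunDate", "2005-01-31")
    ErrCode = DSRunJob(hJob, DSJ.RUNNORMAL)
    ErrCode = DSWaitForJob(hJob)
    Status = DSGetJobInfo(hJob, DSJ.JOBSTATUS)
    If Status = DSJS.RUNFAILED Then
       Call DSLogWarn("LoadCustomerDim finished with errors", "JobControl")
    End
    ErrCode = DSDetachJob(hJob)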

Question: Dimension Modeling types along with their significance?
Answer: Data modeling is broadly classified into 2 types: A) E-R diagrams (entity-relationship); B) Dimensional modeling.
Question: Dimensional modeling is again subdivided into 2 types.
Answer: A) Star schema - simple and much faster, denormalized form. B) Snowflake schema - complex with more granularity, more normalized form.
Question: Differentiate database data and data warehouse data?
Answer: Data in a database is A) detailed or transactional, B) both readable and writable, C) current.
Question: Importance of the Surrogate Key in data warehousing?
Answer: A surrogate key is a primary key for a dimension table. Its main importance is that it is independent of the underlying database, i.e. the surrogate key is not affected by the changes going on in the source database.
Question: What is the flow of loading data into fact and dimension tables?

Answer: Fact table - a table with a collection of foreign keys corresponding to the primary keys in the dimension tables; it consists of fields with numeric values. Dimension table - a table with a unique primary key. Load - data should first be loaded into the dimension tables; based on the primary key values in the dimension tables, the data is then loaded into the fact table.
Question: What are Stage Variables, Derivations and Constants?
Answer: Stage Variable - an intermediate processing variable that retains its value during a row read and does not pass the value on to a target column. Derivation - an expression that specifies the value to be passed on to the target column. Constant - a condition that is either true or false and that controls the flow of data down a link.
Question: Differentiate Primary Key and Partition Key?
Answer: A primary key is a combination of unique and not null; it can be a collection of key values called a composite primary key. A partition key is just a part of the primary key. There are several methods of partitioning, such as Hash, DB2 and Random; when using Hash partitioning we specify the partition key.
Question: Orchestrate vs. DataStage Parallel Extender?
Answer: Orchestrate itself is an ETL tool with extensive parallel processing capabilities, running on UNIX platforms. DataStage used Orchestrate with DataStage XE (the beta version of 6.0) to incorporate the parallel processing capabilities. Ascential then purchased Orchestrate and integrated it with DataStage XE, releasing a new version, DataStage 6.0, i.e. Parallel Extender.

Question: What is the default cache size? How do you change the cache size if needed?
Answer: The default cache size is 256 MB. We can increase it by going into DataStage Administrator, selecting the Tunables tab and specifying the cache size there.
Question: What are the types of Hashed File?
Answer: Hashed files are classified broadly into 2 types: A) Static - subdivided into 17 types based on the primary key pattern; B) Dynamic - subdivided into 2 types: i) Generic, ii) Specific. The default hashed file is "Dynamic - Type Random 30 D".
Question: What are static hash files and dynamic hash files?
Answer: As the names themselves suggest, a static hash file has a fixed size (modulus) decided when it is created, while a dynamic hash file grows and shrinks with the data. In general we use Type-30 dynamic hash files.
Question: What is the Hash File stage and what is it used for?
Answer: It is used for look-ups; it is like a reference table. It is also used in place of ODBC or OCI tables for better performance. The data file has a default size limit of 2 GB, and an overflow file is used if the data exceeds the 2 GB size.

Question: What is the usage of containers? What are their types?
Answer: A container is a collection of stages used for the purpose of reusability. There are 2 types of containers: A) Local container - job specific; B) Shared container - can be used in any job within a project.
Question: Compare and contrast ODBC and Plug-In stages?
Answer: ODBC - poorer performance, can be used for a variety of databases, can handle stored procedures. Plug-In - good performance, database specific (only one database), cannot handle stored procedures.
Question: How do you execute a DataStage job from the command line prompt?
Answer: Using the "dsjob" command, as follows: dsjob -run -jobstatus projectname jobname
Question: What are the command line programs that import and export DataStage jobs?
Answer: dsimport.exe imports DataStage components; dsexport.exe exports DataStage components.
Question: How do you run a shell script within the scope of a DataStage job?
Answer: By calling it with "ExecSH" in the Before/After job properties.

Question: What are OConv() and IConv() functions and where are they used?
Answer: IConv() converts a string to an internal storage format; OConv() converts an expression to an output format.
Question: How do you handle date conversions in DataStage? Convert mm/dd/yyyy format to yyyy-dd-mm?
Answer: We use a) the "Iconv" function for internal conversion and b) the "Oconv" function for external conversion. The function to convert mm/dd/yyyy format to yyyy-dd-mm is Oconv(Iconv(Fieldname, "D/MDY[2,2,4]"), "D-MDY[2,2,4]").
Question: What does a config file in Parallel Extender consist of?
Answer: The config file consists of the following: a) the number of processes or nodes; b) the actual disk storage locations.
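As a hedged illustration of the Iconv/Oconv pair, the DataStage BASIC fragment below (for example in a Transformer derivation or a test routine) parses an external mm/dd/yyyy string into the internal day number and then writes it back out; the variable names and the sample value are made up, and the output code shown produces the ISO-style yyyy-mm-dd form:

    InDate   = "12/31/2005"                      ;* external mm/dd/yyyy value (made up)
    Internal = Iconv(InDate, "D/MDY[2,2,4]")     ;* internal day number
    OutDate  = Oconv(Internal, "D-YMD[4,2,2]")   ;* "2005-12-31"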

Question: Types of parallel processing?
Answer: Parallel processing is broadly classified into 2 types: a) SMP - Symmetric Multi-Processing; b) MPP - Massively Parallel Processing.
Question: What is Modulus and Splitting in a Dynamic Hashed File?
Answer: In a hashed file the size of the file keeps changing as data is added and removed. If the size of the file increases it is called "Modulus"; if the size of the file decreases it is called "Splitting".
Question: Types of views in DataStage Director?
Answer: There are 3 types of views in DataStage Director: a) Job view - dates when jobs were compiled; b) Status view - status of the job's last run; c) Log view - warning messages, event messages and program-generated messages.
Question: Did you parameterize the job or hard-code the values in the jobs?
Answer: Always parameterize the job. There is no reason to hard-code parameters into your jobs; the values come either from Job Properties or from a 'Parameter Manager' - a third-party tool. The most often parameterized variables in a job are: DB DSN name, username, password, and the dates with respect to which the data is to be looked up.
Question: Functionality of Link Partitioner and Link Collector?
Answer: Link Partitioner: it splits data into various partitions or data flows using various partition methods. Link Collector: it collects the data coming from the partitions, merges it into a single data flow and loads it to the target.

Question: Have you ever been involved in updating DataStage versions, e.g. from DS 5.X? If so, tell us some of the steps you took.
Answer: Yes. The following are some of the steps: definitely take a backup of the whole project(s) by exporting each project as a .dsx file. See that you are using the same parent folder for the new version, so that old jobs using hard-coded file paths still work. Do not stop the 6.0 server before the upgrade; the 7.0 install process collects project information during the upgrade, and there is no rework (recompilation of existing jobs/routines) needed after the upgrade. Make sure that all your DB DSNs are created with the same names as the old ones. When moving DataStage from one machine to another, after installing the new version import the old project(s) and compile them all again - you can use the 'Compile All' tool for this. If you are also upgrading your database from Oracle 8i to Oracle 9i, there is a tool on the DS CD that can do this for you.
Question: How did you handle reject data?
Answer: Typically a reject link is defined and the rejected data is loaded back into the data warehouse. A reject link has to be defined on every output link from which you wish to collect rejected data. Rejected data is typically bad data, like duplicate primary keys or null rows where data is expected.

Question: What other performance tunings have you done in your last project to increase the performance of slowly running jobs?
Answer: Staged the data coming from ODBC/OCI/DB2UDB stages, or any database on the server, using hash/sequential files, both for optimum performance and for data recovery in case a job aborts. Tuned the OCI stage 'Array Size' and 'Rows per Transaction' values for faster inserts, updates and selects. Tuned the 'Project Tunables' in Administrator for better performance. Used sorted data for the Aggregator. Sorted the data as much as possible in the database and reduced the use of DS sorts for better job performance. Removed unused data from the source as early as possible in the job. Worked with the DB admin to create appropriate indexes on tables for better performance of DS queries. Converted some of the complex joins/business logic in DS to stored procedures for faster execution of the jobs. If an input file has an excessive number of rows and can be split up, used standard logic to run jobs in parallel. Before writing a routine or a transform, made sure the required functionality was not already provided by one of the standard routines supplied in the sdk or ds utilities categories.

Make every attempt to use the bulk loader for your particular database; bulk loaders are generally faster than using ODBC or OLE. Try not to use a Sort stage when you can use an ORDER BY clause in the database. Try to have the constraints in the 'Selection' criteria of the jobs themselves; this eliminates unnecessary records before joins are made. Constraints are generally CPU intensive and take a significant amount of time to process: using a constraint to filter a record set is much slower than performing a SELECT ... WHERE ... (this is particularly the case if the constraint calls routines or external macros, but if it is inline code the overhead will be minimal). Use the power of the DBMS. Tuning should occur on a job-by-job basis.
Question: Tell me one situation from your last project where you faced a problem and how you solved it.
Answer: 1. The jobs in which data was read directly from OCI stages were running extremely slowly; I had to stage the data before sending it to the Transformer to make the jobs run faster. 2. A job aborted in the middle of loading some 500,000 rows. We had the option of either cleaning/deleting the loaded data and re-running the fixed job, or running the job again from the row at which it had aborted; to make sure the load was proper we opted for the former.
Question: Tell me the environment in your last projects.
Answer: Give the OS of the server and the OS of the client of your most recent project.

Question: How did you connect to DB2 in your last project?
Answer: Most of the time the data was sent to us in the form of flat files - the data was dumped and sent to us. In some cases, where we needed to connect to DB2 for look-ups for instance, we used ODBC drivers to connect to DB2 (or DB2-UDB) depending on the situation and availability, e.g. the 'iSeries Access ODBC Driver 9.00.02.02' to connect to AS/400 DB2. Certainly DB2-UDB native drivers are better in terms of performance, as native drivers are always better than ODBC drivers.
Question: What are Routines, where/how are they written, and have you written any routines before?
Answer: Routines are stored in the Routines branch of the DataStage Repository, where you can create, view or edit them. The different types of routines are: 1. Transform functions, 2. Before/After job subroutines, 3. Job control routines.
Question: How did you handle an 'Aborted' sequencer?
Answer: In almost all cases we have to delete the data it inserted from the database manually, fix the job, and then run the job again.
Question: What are Sequencers?

Answer: Sequencers are job control programs that execute other jobs with preset job parameters.
Question: What will you do in a situation where somebody wants to send you a file and use that file as an input or reference before the job runs?
Answer: Under Windows: use the 'WaitForFileActivity' stage under the Sequencers and then run the job. Under UNIX: poll for the file; once the file has arrived, start the job or sequencer that depends on it. You could also schedule the sequencer around the time the file is expected to arrive.
Question: What utility do you use to schedule jobs on a UNIX server other than Ascential Director?
Answer: Use the crontab utility along with a script that executes the job (e.g. via dsjob), passing the proper parameters.
Question: Describe the string functions in DS.
Answer: Operators like [ ] - substring extraction - and ':' - concatenation. Syntax: string [ [ start, ] length ] and string [ delimiter, instance, repeats ].
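A brief, hedged DataStage BASIC illustration of these operators; the variable names and values are made up:

    Acct   = "GB-1234-XYZ"
    Region = Acct[1,2]             ;* substring: first 2 characters -> "GB"
    Last3  = Acct[3]               ;* with no start, takes the last 3 characters -> "XYZ"
    Part2  = Acct["-",2,1]         ;* second "-"-delimited group -> "1234"
    Full   = Region : "/" : Part2  ;* concatenation -> "GB/1234"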

Question: Did you work in a UNIX environment?
Answer: Yes - it is one of the most important requirements.
Question: How would you call an external Java function which is not supported by DataStage?
Answer: Starting from DS 6.0 we have the ability to call external Java functions using a Java package from Ascential. Alternatively, we can use the command line to invoke the Java function, write the return values from the Java program (if any) to a file, and use that file as a source in a DataStage job.
Question: How will you determine the sequence of jobs that load the data warehouse?
Answer: First we execute the jobs that load the data into the dimension tables, then the fact tables, then the aggregate tables (if any).
Question: The above might raise another question: why do we have to load the dimension tables first, then the fact tables?
Answer: As we load the dimension tables, the primary keys are generated, and these keys are the foreign keys in the fact tables.
Question: Does the selection of 'Clear the table and Insert rows' in the ODBC stage send a TRUNCATE statement to the database, or does it do some kind of DELETE logic?
Answer: There is no TRUNCATE on ODBC stages. 'Clear the table' issues a DELETE FROM statement. On an OCI stage such as Oracle, you do have both Clear and Truncate options; they are radically different in permissions (TRUNCATE requires you to have ALTER TABLE permission, where DELETE doesn't).

Question: What versions of DS have you worked with?
Answer: DS 7.5, DS 7.0.2, DS 6.0, DS 5.2.
Question: How do you rename all of the jobs to support your new file-naming conventions?
Answer: Create an Excel spreadsheet with the new and old names. Export the whole project as a .dsx file. Write a Perl program which can do a simple rename of the strings by looking them up in the Excel file. Then import the new .dsx file, probably into a new project for testing. Recompile all jobs. Be cautious: the job names may also need to be changed in your job control jobs or sequencer jobs, so make the necessary changes to those sequencers.
Question: When should we use an ODS?
Answer: Data warehouses are typically read-only and batch-updated on a schedule; ODSs are maintained in more real time, trickle-fed constantly.
Question: What other ETL tools have you worked with?
Answer: Informatica, and also DataJunction if it is present on your resume.
Question: How good are you with your PL/SQL?
Answer: On a scale of 1-10, say 8.5-9.

Question: What is the difference between a DataStage Developer and a DataStage Designer?
Answer: A DataStage developer is one who codes the jobs. A DataStage designer is one who designs the job - I mean he deals with the blueprints and designs the jobs and the stages that are required in developing the code.
Question: What are the requirements for your ETL tool?
Answer: It all depends on what you need the ETL to do; think about what processes it is going to perform. Do you have large sequential files (1 million rows, for example) that need to be compared every day versus yesterday? If so, then ask how each vendor would do that. Are they requiring you to load yesterday's file into a table and do lookups? If so, RUN!! Are they doing a match/merge routine that knows how to process this in sequential files? Then maybe they are the right one. If you are small enough in your data sets, then either would probably be OK.
Question: What are the main differences between Ascential DataStage and Informatica PowerCenter?
Answer: Chuck Kelley's answer: You are right, they have pretty much similar functionality. However, it all depends on what you need the ETL to do: think about the processes involved (for example the large-file comparison and match/merge versus lookup scenario above) and ask each vendor how their tool would handle them.

Les Barbusinski's answer: Without getting into specifics, here are some differences you may want to explore with each vendor: Does the tool use a relational or a proprietary database to store its metadata and scripts? If proprietary, why? What add-ons are available for extracting data from industry-standard ERP, accounting and CRM packages? Can the tool's metadata be integrated with third-party data modeling and/or business intelligence tools? If so, how and with which ones? How well does each tool handle complex transformations, and how much external scripting is required? What kinds of languages are supported for ETL script extensions? Almost any ETL tool will look like any other on the surface; the trick is to find out which one will work best in your environment. The best way I've found to make this determination is to ascertain how successful each vendor's clients have been using their product - especially clients who closely resemble your shop in terms of size, industry, in-house skill sets, source systems, platforms, data volumes and transformation complexity. Ask both vendors for a list of their customers with characteristics similar to your own that have used their ETL product for at least a year. Then interview each client (preferably several people at each site) with an eye toward identifying unexpected problems, benefits, or quirkiness with the tool that have been encountered by that customer.

Also ask each customer - if they had it all to do over again - whether or not they'd choose the same tool and why; you might be surprised at some of the answers.
Joyce Bischoff's answer: You should do a careful research job when selecting products. You should first document your requirements, identify all possible products and evaluate each product against the detailed requirements. There are numerous ETL products on the market, and it seems that you are looking at only two of them. If you are unfamiliar with the many products available, you may refer to www.tdan.com, the Data Administration Newsletter, for product lists. If you ask the vendors, they will certainly be able to tell you which of their product's features are stronger than the other product's; ask both vendors and compare the answers, which may or may not be totally accurate. After you are very familiar with the products, call their references and be sure to talk with technical people who are actually using the product. You will not want the vendor to have a representative present when you speak with someone at the reference site, and it is also not a good idea to depend upon a high-level manager at the reference site for a reliable opinion of the product - managers may paint a very rosy picture of any selected product so that they do not look like they selected an inferior product.
Question: In how many places can you call Routines?
Answer: You can call them in four places: 1. Transforms of a routine: a. date transformation,

b. upstring transformation; 2. transforms of the Before and After subroutines; 3. XML transformation; 4. web-based transformation.
Question: What is the batch program and how is it generated?
Answer: A batch program is the program DataStage generates at run time and maintains itself, but you can easily change it on the basis of your requirements (extraction, transformation, loading). Batch programs are generated depending on your job nature, either a simple job or a sequencer job; you can see this program under the job control option.
Question: Suppose 4 jobs are controlled by a sequencer (job 1, job 2, job 3, job 4). Job 1 has 10,000 rows, but after the run only 5,000 rows have been loaded into the target table and the job aborts. How can you sort out the problem?
Answer: Suppose the job sequencer synchronizes or controls the 4 jobs but job 1 has a problem. In this situation go to the Director and check what kind of problem is being reported: a data type problem, a warning message, a missing column action, a job failure or a job abort. If the job failed it usually means a data type problem or a missing column action. First check how much data has already been loaded, then in the Run window click Tracing -> Performance, or in your target table go to General -> Action, where there are two options: (i) On Fail - Commit / Continue and (ii) On Skip - Commit / Continue. Select the On Skip option so the rows already loaded are skipped and the run continues, and for the remaining rows that were not loaded select On Fail - Continue.

Then run the job again; this time you should get a successful run.
Question: What happens if RCP (runtime column propagation) is disabled?
Answer: In that case OSH has to perform an import and an export every time the job runs, and the processing time of the job is increased.
Question: How do you rename all of the jobs to support your new file-naming conventions?
Answer: Create an Excel spreadsheet with the new and old names. Export the whole project as a .dsx file. Write a Perl program which can do a simple rename of the strings by looking them up in the Excel file. Then import the new .dsx file, probably into a new project for testing. Recompile all jobs. Be cautious: the job names may also have to be changed in your job control jobs or sequencer jobs, so make the necessary changes to those sequencers.
Question: What will you do in a situation where somebody wants to send you a file and use that file as an input or reference before running the job?
Answer: A. Under Windows: use the 'WaitForFileActivity' stage under the Sequencers and then run the job. B. Under UNIX: poll for the file; once the file has arrived, start the job or sequencer that depends on it. You could also schedule the sequencer around the time the file is expected to arrive.
Question: What are Sequencers?

Answer: Sequencers are job control programs that execute other jobs with preset job parameters.
Question: How did you handle an 'Aborted' sequencer?
Answer: In almost all cases we have to delete the data it inserted from the database manually, fix the job, and then run the job again.
Question: What is the difference between the Filter stage and the Switch stage?
Ans: There are two main differences, and probably some minor ones as well. 1) The Filter stage can send one input row to more than one output link; the Switch stage cannot - the C switch construct has an implicit break in every case. 2) The Switch stage is limited to 128 output links; the Filter stage can have a theoretically unlimited number of output links. (Note: this is not a challenge!)
Question: How can I achieve constraint-based loading using DataStage 7.5? My target tables have interdependencies, i.e. primary key / foreign key constraints. I want my primary key tables to be loaded first and then my foreign key tables, and the primary key tables should be committed before the foreign key tables are executed. How can I go about it?
Ans: 1) Create a job sequencer to load your tables in sequential mode. In the sequencer, call all the primary-key table loading jobs first, followed by the foreign-key table jobs; when triggering the foreign-key table load jobs, trigger them only when the primary-key load jobs have run successfully (i.e. an OK trigger). 2) To improve the performance of the job, you can disable all the constraints on the tables and load them. Once loading is done, check the integrity of the data; raise whatever does not meet it as exception data and cleanse it.

3) If you use star schema modeling, when you create the physical database from the model you can delete all the constraints, and referential integrity is then maintained at the ETL process level by looking up all your dimension keys while loading the fact tables. Once all dimension keys are assigned to a fact row, the dimensions and the fact can be loaded together. (Normally, when loading with constraints enabled, performance goes down drastically.) This is only a suggestion.
Question: How do you pass a filename as a parameter to a job?
Ans: During job development we can create a parameter 'FILE_NAME', and the value can be passed in when the job is run.
Question: How did you handle an 'Aborted' sequencer?
Ans: In almost all cases we have to delete the data it inserted from the database manually, fix the job, and then run the job again.
Question: How do you merge two files in DS?
Ans: Either use a copy command as a before-job subroutine if the metadata of the 2 files is the same, or create a job to concatenate the 2 files into one if the metadata is different.
Question: How do you eliminate duplicate rows?
Ans: DataStage provides a Remove Duplicates stage in the Enterprise Edition; using that stage we can eliminate the duplicates based on a key column.
Question: Is there a mechanism available to export/import individual DataStage ETL jobs from the UNIX command line?

Ans: Try dscmdexport and dscmdimport. You can only export full projects from the command line, so this won't handle the "individual job" requirement. You can find the export and import executables on the client machine, usually someplace like C:\Program Files\Ascential\DataStage.
Question: What is the difference between the JOIN stage and the MERGE stage?
Answer: JOIN: performs join operations on two or more data sets input to the stage and then outputs the resulting data set. MERGE: combines a sorted master data set with one or more sorted update data sets. The columns from the records in the master and update data sets are merged so that the output record contains all the columns from the master record plus any additional columns from each update record that are required. A master record and an update record are merged only if both of them have the same values for the merge key column(s) that we specify; merge key columns are one or more columns that exist in both the master and update records.
Question: Advantages of DataStage?
Answer: Business advantages: it helps in making better business decisions; it is able to integrate data coming from all parts of the company; it helps to understand new and already existing clients; we can collect data about different clients with it.

It makes the research of new business possibilities possible. We can analyze trends in the data it reads. It offers the possibility of organizing complex business intelligence. It accelerates the running of the project.
Technological advantages: it handles all company data and adapts to the needs; it is flexible and scalable; it is easily implementable.
1. What is the architecture of DataStage?
Basically the architecture of DS is a client/server architecture, with client components and server components. The client components are of 4 types: 1. DataStage Designer, 2. DataStage Director, 3. DataStage Manager, 4. DataStage Administrator. The Designer is used to design the jobs. The Manager is used to import and export projects and to view and edit the contents of the repository. The Administrator is used for creating projects, deleting projects and setting environment variables. The Director is used to run the jobs, validate the jobs and schedule the jobs. Server components:

DS server: runs executable server jobs, under the control of the DS Director, that extract, transform and load data into a DWH. Repository or project: a central store that contains all the information required to build a DWH or data mart. DS Package Installer: a user interface used to install packaged DS jobs and plug-ins.
2. What is version controlling in DS?
In DS, version controlling is used to back up the project or jobs. Version control tools are of 2 types: 1. VSS - Visual SourceSafe; 2. CVSS - Concurrent Visual SourceSafe. VSS is designed by Microsoft, but the disadvantage is that only one user can access it at a time; other users have to wait until the first user completes the operation. With CVSS many users can access it concurrently, but compared to VSS the cost of CVSS is high.
3. What are the stages you have worked on?
4. I have some jobs where the log details should be deleted automatically every month. What steps do you have to take for that?
We have to set the autopurge option in DS Administrator.
5. I want to run multiple jobs within a single job. How can you handle that?
In the job properties set the option ALLOW MULTIPLE INSTANCES. This option is available from DS version 7.1 onwards.
6. What is the difference between clearing the log file and clearing the status file?

Clear log - we can clear the log details by using the DS Director; under the Job menu a Clear Log option is available, and by using it we can clear the log details of a particular job. Clear status file - lets the user remove the status records associated with all stages of the selected jobs (in DS Director).
7. My job takes 30 minutes to run and I want to run it in less than 30 minutes. What steps do we have to take?
By using the performance tuning aspects available in DS we can reduce the run time. Tuning aspects: in DS Administrator - in-process and inter-process row buffering; between passive stages - the inter-process (IPC) stage; in the OCI stage - array size and transaction size; and also use the Link Partitioner and Link Collector stages between passive stages.
8. If a job is locked by some user, how can you unlock that particular job in DS?
We can unlock the job by using the Clean Up Resources option, which is available in DS Director; otherwise we can find the PID (process id) and kill the process on the UNIX server.
9. How do you do row transposition in DS?
The Pivot stage is used for transposition. Pivot is an active stage that maps sets of columns in an input table to a single column in an output table.
10. I developed a job with 50 stages, and at run time one stage is missing. How can you identify which stage is missing?
By using the usage analysis tool, which is available in DS Manager, we can find out which items are used in the job.

11. What is a container? How many types of containers are available? Is it possible to use a container as a look-up?
A container is a group of stages and links. Containers enable you to simplify and modularize your server job designs by replacing complex areas of the diagram with a single container stage. DataStage provides two types of container: Local containers, which are created within a job and are accessible only by that job; and Shared containers, which are created separately and stored in the Repository in the same way that jobs are, so any job in the project can use them. Yes, we can use a container as a look-up.
12. How do you deconstruct a shared container?
To deconstruct a shared container, first you have to convert the shared container to a local container, and then deconstruct the container.
13. I am getting an input value like X = Iconv("31 DEC 1967", "D"). What is the value of X?
The value of X is zero. The Iconv function converts a string to an internal storage format; it takes 31 Dec 1967 as day zero and counts days from that date.
14. What are unit testing, integration testing and system testing?
Unit testing: for DS, a unit test checks for data type mismatches, column mismatches and the size of each data type.
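A hedged DataStage BASIC illustration of the internal day-number behaviour described in question 13; the variable names are made up:

    X = Iconv("31 DEC 1967", "D")     ;* internal day number: 0
    Y = Iconv("01 JAN 1968", "D")     ;* internal day number: 1
    D = Oconv(X, "D DMY[2,A3,4]")     ;* back to external form: "31 DEC 1967"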

Integration testing: according to the dependencies, all jobs are integrated into one sequence; that is called a control sequence. System testing: system testing is nothing but the performance tuning aspects in DS.
15. I have three jobs A, B and C which are dependent on each other. I want to run jobs A and C daily and job B only on Sunday. How can you do it?
First schedule the A and C jobs Monday to Saturday in one sequence. Next take the three jobs, in dependency order, in one more sequence and schedule that sequence for Sunday only.
16. What are the command line programs that import and export the DS jobs?
dsimport.exe - to import the DataStage components; dsexport.exe - to export the DataStage components.
17. What is the use of the Nested Condition activity?
Nested Condition allows you to further branch the execution of a sequence depending on a condition.
18. How many hashing algorithms are available for static hash files and dynamic hash files?
Sixteen hashing algorithms for static hash files; two hashing algorithms for dynamic hash files (GENERAL or SEQ.NUM).
19. What happens when you have a job that links two passive stages together?
Obviously there is some process going on; under the covers DS inserts a cut-down Transformer stage between the passive stages, which just passes data straight from one stage to the other.