What is the difference between the E-R model and dimensional modelling?
The E-R (Entity-Relationship) model is used for two-dimensional, relational databases such as SQL Server or Oracle, where a table is organised as rows and columns. OLTP systems are generally built on this two-dimensional model. Dimensional modelling, by contrast, works with more than two dimensions: a cube represents a multidimensional model in which the data warehouse stores summarised information, and that data can be retrieved much more easily than from a normal OLTP database.

For example, assume PROD, GEOG, TIME and MEAS are four dimensions stored in a data warehouse. If you want to know the sales of Lux (PROD) in North India (GEOG) during October 2006 (TIME), measured in units of Lux 75 grams (MEAS), then FACT_TBL(PROD=LUX, GEOG=NORTH_INDIA, TIME=OCT06, MEAS=Units) would return some quantity, say 75,809 units, meaning that many units were sold in North India during that period. You could answer the same question from a normal OLTP system, but as the volume of data grows the OLTP system cannot tolerate the load and query performance dies down. For this and many other reasons we need a data warehouse instead of a normal OLTP system.

What is the architecture of a data warehousing project? What is the flow?
1) Data warehousing starts with data modelling, i.e. the creation of dimensions and facts.
2) Data is collected from source systems such as OLTP, CRM and ERP applications.
3) Cleansing and transformation are done with an ETL (Extraction, Transformation, Loading) tool.
4) At the end of the ETL process the target tables (dimensions and facts) are loaded with data that satisfies the business rules.
5) Finally, reporting (OLAP) tools deliver the information used for decision support.

Discuss the advantages and disadvantages of the star and snowflake schemas.
In a star schema every dimension has a primary key and a dimension table has no parent table, whereas in a snowflake schema a dimension table has one or more parent tables. In a star schema the hierarchies of a dimension are stored in the dimension table itself; in a snowflake schema the hierarchies are broken out into separate tables. These hierarchies let you drill down the data from the topmost level to the lowest level.
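Returning to the FACT_TBL example above, a minimal sketch of the kind of star-schema query involved. The table and column names here are hypothetical illustrations, not taken from the original text:

    -- Units of Lux 75g sold in North India during October 2006
    SELECT SUM(f.units_sold) AS units
    FROM   fact_sales    f
    JOIN   dim_product   p ON f.product_key = p.product_key
    JOIN   dim_geography g ON f.geog_key    = g.geog_key
    JOIN   dim_time      t ON f.time_key    = t.time_key
    WHERE  p.product_name = 'Lux 75g'
    AND    g.region       = 'North India'
    AND    t.month_name   = 'Oct'
    AND    t.year_no      = 2006;

The fact table holds only keys and measures; all the descriptive filtering happens on the joined dimension tables.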
Compare the data warehousing top-down approach with the bottom-up approach.
In the top-down approach we first build the data warehouse and then build the data marts from it. This needs more cross-functional skills, takes more time and is more costly. In the bottom-up approach we first build the data marts and then the data warehouse; the data mart built first serves as a proof of concept for the others, and the approach takes less time and costs less.

Definition of data marts?
A data mart is a subset of the data warehouse; you can also think of a data mart as holding the data of one subject area. For example, an organization with HR, Finance, Communications and Corporate Services divisions could create a data mart for each division. Historical data is stored in the data marts first and then finally exported to the data warehouse.

What is the difference between E-R modelling and dimensional modelling?
E-R modelling captures the relationships between entities in normalized form. Dimensional modelling captures the relationships between dimensions in denormalized form.

Are OLAP databases also called decision support systems? True/false?
True.

What is the difference between OLAP and a data warehouse?
A data warehouse is the place where the data is stored for analysis, whereas OLAP is the process of analysing the data, managing aggregations and partitioning information into cubes for in-depth visualisation.

What is the difference between data warehousing and business intelligence?
Data warehousing deals with all aspects of managing the development, implementation and operation of a data warehouse or data mart, including metadata management, data acquisition, data cleansing, data transformation, storage management, data distribution, data archiving, operational reporting, analytical reporting, security management, backup/recovery planning, and so on. Business intelligence, on the other hand, is a set of software tools that enable an organization to analyse measurable aspects of its business, such as sales performance, profitability, operational efficiency, effectiveness of marketing campaigns, market penetration among certain customer groups, cost trends, anomalies and exceptions. Typically the term "business intelligence" encompasses OLAP, data visualisation, data mining and query/reporting tools. Think of the data warehouse as the back office and business intelligence as the entire business including the back office: the business needs the back office to function, but a back office without a business to support makes no sense.

Why is denormalization promoted in universe designing?
In a relational data model, some lookup tables are kept separate for normalization purposes. In dimensional data modelling (star schema) these tables are merged into a single DIMENSION table for performance and for slicing data. Because the tables are merged into one large dimension table, the complex intermediate joins disappear and dimension tables are joined directly to fact tables. Although this introduces redundancy in the dimension table, its size is only about 15% of the fact table, which is why denormalization is promoted in universe designing.

What is a factless fact table? Where have you used it in your project?
A factless fact table contains nothing but dimensional keys. It is used to support negative analysis reports, for example a store that did not sell a particular product in a given period.

What is a snapshot?
A snapshot is a static data source: a permanent local copy or picture of a report, suitable for disconnected networks. We cannot add columns to a snapshot, but we can sort, group and aggregate it, and it is mainly used for analysing historical data.

What are non-additive facts, in detail?
A fact may be a measure, a metric or a dollar value. Measures and metrics are non-additive facts; a dollar value is an additive fact. If we want the amount for a particular place over a particular period of time, we can add the dollar amounts and come up with a total. For a non-additive fact, for example the heights of citizens by geographical location, when we roll up city-level data to state level we should not add the heights; instead we might use the values to derive a count.

Data warehouse interview questions only: What is a source qualifier? What is the difference between DSS and OLTP?

What is a cube, why do we create cubes, and what is the difference between ETL cubes and OLAP (reporting) cubes?
Any schema, table or report that gives meaningful information about one attribute with respect to more than one other attribute can be called a cube. For example, in a product table with Product ID and Sales columns we can analyse Sales with respect to Product Name; if we analyse Sales with respect to Product as well as Region (Region being an attribute of a Location table), the resulting report, table or schema is a cube. ETL cubes are built in the staging area to load frequently accessed reports to the target. Reporting cubes are built after the actual load of all the tables to the target, depending on the customer's business-analysis requirements.

What is a surrogate key?
A surrogate key is a substitute for the natural primary key.

What are aggregate tables?
An aggregate table contains a summary of existing warehouse data grouped to certain levels of the dimensions. Retrieving the required data from the detail table, which may have millions of records, takes longer and affects server performance. To avoid this we aggregate the table to the required level and use that instead. Aggregate tables reduce the load on the database server, improve query performance and return results much faster.
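A minimal sketch of how such an aggregate table might be built from a detail-level fact table, using Oracle-style CREATE TABLE AS SELECT. The table and column names are hypothetical:

    -- Summarise the detail sales fact to a monthly, per-product grain
    CREATE TABLE agg_sales_month AS
    SELECT t.year_no,
           t.month_no,
           f.product_key,
           SUM(f.units_sold)   AS units_sold,
           SUM(f.sales_amount) AS sales_amount
    FROM   fact_sales f
    JOIN   dim_time   t ON f.time_key = t.time_key
    GROUP  BY t.year_no, t.month_no, f.product_key;

Queries that only need monthly totals can then read this much smaller table instead of scanning the detail fact table.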
How is data stored in the data warehouse after it has been extracted and transformed from heterogeneous sources, and where does the data go from the data warehouse?
Data in the data warehouse is stored in relational tables; most data warehouses follow the snowflake schema approach.

What is the difference between hierarchies and levels?
Levels are the columns available in a dimension table. A hierarchy is a way of arranging levels from top to bottom (or bottom to top), for example Region, Country, State, City or Year, Month, Day, Hour. Multi-level hierarchies can be natural, such as Year, Month, Day, but a hierarchy does not have to be natural: you can create one purely for navigation or reporting purposes, for example Days to Manufacture and Safety Stock Level, where there is no relationship between the two attributes. In a natural hierarchy you should define attribute relationships between the levels. Levels are constructed from attributes.

What is the difference between a data warehouse and BI?
DATA WAREHOUSE: a data warehouse is an integrated, time-variant, subject-oriented and non-volatile collection of data in support of management's decision-making process.
BUSINESS INTELLIGENCE: business intelligence is the process of extracting data, converting it into information and then into a knowledge base.

What are non-additive facts?
Additive: additive facts can be summed up across all of the dimensions in the fact table.
Semi-additive: semi-additive facts can be summed up across some of the dimensions in the fact table, but not the others.
Non-additive: non-additive facts cannot be summed up across any of the dimensions present in the fact table.

What are the different architectures of a data warehouse?
There are three typical architectures.
Basic architecture: end users access data that is derived from several sources through the data warehouse. Flow: Source -> Warehouse -> End Users.
Data warehouse with a staging area: when the data derived from the sources needs to be cleaned and processed before being put into the warehouse, a staging area is used. Flow: Source -> Staging Area -> Warehouse -> End Users.
Data warehouse with a staging area and data marts: when the warehouse architecture must be customised for different groups in the organization, data marts are added. Flow: Source -> Staging Area -> Warehouse -> Data Marts -> End Users.

What modelling tools are available in the market?
These tools are used for data/dimension modelling: Oracle Designer, ERwin (Entity Relationship for Windows), Informatica (cubes/dimensions), Embarcadero, PowerDesigner (Sybase).

What is the main difference between a schema in an RDBMS and schemas in a data warehouse?
RDBMS schema: used for OLTP systems; the traditional, older style of schema; normalized; difficult to understand and navigate; cannot easily solve complex extract problems; modelling tends to be poor for analysis.
DWH schema: used for OLAP systems; a newer generation of schema; denormalized; easy to understand and navigate; complex extract problems can be solved easily; a very good model for analysis.

What is ODS?
ODS stands for Operational Data Store.

What is a general-purpose scheduling tool?
The basic purpose of a scheduling tool in a DW application is to streamline the flow of data from source to target at a specific time or based on some condition.

What is the need for a surrogate key? Why is the primary key not used as the surrogate key?
A surrogate key is an artificial identifier for an entity. Surrogate key values are generated by the system sequentially (like the Identity property in SQL Server or a Sequence in Oracle) and do not describe anything. A primary key is a natural identifier for an entity; primary key values are entered by the user and uniquely identify each record, with no repetition of data.
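A minimal sketch of generating a surrogate key with a database sequence, in Oracle-style syntax; the table and column names are illustrative only:

    -- Surrogate key populated from a sequence, independent of the natural key
    CREATE SEQUENCE customer_sk_seq START WITH 1 INCREMENT BY 1;

    CREATE TABLE dim_customer (
        customer_sk   NUMBER        PRIMARY KEY,  -- surrogate key
        customer_id   VARCHAR2(20),               -- natural/business key from the source
        customer_name VARCHAR2(100)
    );

    INSERT INTO dim_customer (customer_sk, customer_id, customer_name)
    VALUES (customer_sk_seq.NEXTVAL, 'C1001', 'Asha Rao');

In SQL Server the same effect is achieved with an IDENTITY column instead of a sequence.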
Need for a surrogate key rather than the primary key.
If a column is made the primary key and later the data type or length of that column needs to change, then all the foreign keys that depend on that primary key must also be changed, making the database unstable. Surrogate keys make the database more stable because they insulate the primary-and-foreign-key relationships from changes in data types and lengths.

What are conformed dimensions?
A conformed dimension is a single, coherent view of the same piece of data throughout the organization: a dimension table in a star-schema data mart that adheres to a common structure and therefore allows queries to be executed across star schemas. The same dimension is used in all subsequent star schemas defined. For example, the Calendar dimension is commonly needed in most data marts; by making this Calendar dimension adhere to a single structure, regardless of which data mart it is used in, you can query by date/time from one data mart to another. This enables reporting across the complete data warehouse in a simple format.

How are the dimension tables designed?
Find where the data for the dimension is located. Figure out how to extract this data. Determine how to maintain changes to the dimension. Change the fact table and the DW population routines accordingly.

What are the advantages of data mining over traditional approaches?
Data mining is used to estimate the future. For example, using data mining we can predict the future of a business in terms of revenue, employees, customers, orders and so on. Traditional approaches use simple algorithms to estimate the future and do not give results as accurate as data mining.

Which automation tool is used in data warehouse testing?
No tool-based testing is done in DWH; only manual testing is done.

What is the difference between OLTP and OLAP?
OLTP is OnLine Transaction Processing: it contains normalised tables and online data, with frequent inserts, updates and deletes.

What is a Snowflake Schema?
Snowflake schemas normalise dimensions to eliminate redundancy; that is, the dimension data is grouped into multiple tables instead of one large table. For example, a product dimension table in a star schema might be normalised into a products table, a product_category table and a product_manufacturer table in a snowflake schema. While this saves space, it increases the number of dimension tables and requires more foreign-key joins. The result is more complex queries and reduced query performance.
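A minimal sketch of the snowflaking described above, contrasting one denormalised product dimension with the same dimension split into the products, product_category and product_manufacturer tables mentioned in the answer. Column names are hypothetical:

    -- Star schema: one denormalised product dimension
    CREATE TABLE dim_product (
        product_key   INT PRIMARY KEY,
        product_name  VARCHAR(100),
        category_name VARCHAR(50),
        manufacturer  VARCHAR(50)
    );

    -- Snowflake schema: the same dimension normalised into three tables
    CREATE TABLE product_category (
        category_id   INT PRIMARY KEY,
        category_name VARCHAR(50)
    );
    CREATE TABLE product_manufacturer (
        manufacturer_id   INT PRIMARY KEY,
        manufacturer_name VARCHAR(50)
    );
    CREATE TABLE products (
        product_key     INT PRIMARY KEY,
        product_name    VARCHAR(100),
        category_id     INT REFERENCES product_category(category_id),
        manufacturer_id INT REFERENCES product_manufacturer(manufacturer_id)
    );

A query that slices sales by category now needs two extra joins in the snowflake version, which is the performance cost the answer above refers to.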
What is a hybrid slowly changing dimension?
Account information, for example, is usually maintained in two categories: current account data and time-of-event account data. We may have two sets of tables: CUR_ACCT, a fast-moving dimension containing information such as Balance, and TOE_ACCT, which contains information such as contact details and phone number, where history is important but the values change slowly. In this respect the TOE_ACCT table qualifies as a slowly changing dimension: whatever change is made in the source, whether an UPDATE or an INSERT, a new entry is made on the target side for each record, and the history is maintained in the target.

What is the datatype of the surrogate key?
Surrogate keys are normally sequencers that keep increasing as new records are inserted into the table. The standard datatype is integer.

What are the steps to build the data warehouse?
1. Understand the business requirements.
2. Once the business requirements are clear, identify the grains (levels).
3. Once the grains are defined, design the dimension tables at the lowest-level grain.
4. Once the dimensions are designed, design the fact table with the key performance indicators (facts).
5. Once the dimensions and fact tables are designed, define the relationships between the tables using primary keys and foreign keys. In the logical phase the database design looks like a star, which is why it is called a star schema design.

What are the different methods of loading dimension tables?
Conventional load: before loading the data, all the table constraints are checked against the data.
Direct load (faster loading): all the constraints are disabled and the data is loaded directly; later the data is checked against the table constraints and the bad data is not indexed.

What is a linked cube?
A cube can be stored on a single Analysis Server and then defined as a linked cube on other Analysis Servers. End users connected to any of these servers can then access the cube. This arrangement avoids the more costly alternative of storing and maintaining copies of the cube on multiple Analysis Servers. To end users a linked cube looks like a regular cube. Linked cubes can be connected using TCP/IP or HTTP.

What are slowly changing dimensions?
Dimensions that change over time are called slowly changing dimensions. For instance, a product price changes over time, people change their names for some reason, and country and state names may change over time. These are a few examples of slowly changing dimensions, since some changes happen to them over a period of time, and happen rarely. If the data in a dimension table happens to change very rarely, it is called a slowly changing dimension.

What is the difference between snowflake and star schemas? In what situations is a snowflake schema better than a star schema, and when is the opposite true?
A star schema has a centralised fact table surrounded by different dimensions; in a snowflake schema the dimensions of the same star schema are split into further dimension tables. A star schema contains highly denormalised data, whereas a snowflake schema is partially normalised. A star schema cannot have parent tables for its dimensions, but a snowflake schema can. Reasons to choose a star schema: fewer joins, a simpler database, and support for drill-up. Reasons to choose a snowflake schema: when we need to provide separate dimensions derived from existing dimensions. Disadvantage of the snowflake schema: query performance is lower because more joins are required.

What are the various reporting tools in the market?
1. MS Excel 2. Business Objects (Crystal Reports) 3. Cognos (Impromptu, PowerPlay) 4. Microstrategy 5. MS Reporting Services 6. Informatica Power Analyzer 7. Actuate 8. Hyperion (BRIO) 9. Oracle Express OLAP 10. Proclarity

What is a degenerate dimension table?
A degenerate dimension is a dimension value stored in the fact table itself; such dimensions do not have their own dimension tables.

Give examples of degenerated dimensions.
A degenerated dimension is a dimension key without a corresponding dimension table. Example: in a point-of-sale transaction fact table we have Date Key (FK), Product Key (FK), Store Key (FK), Promotion Key (FK), and POS Transaction Number. The Date dimension corresponds to the Date Key and the Product dimension corresponds to the Product Key. In a traditional parent-child database, the POS Transaction Number would be the key to the transaction header record that contains all the information valid for the transaction as a whole, such as the transaction date and store identifier. But in this dimensional model we have already extracted that information into other dimensions. The POS Transaction Number therefore looks like a dimension key in the fact table but has no corresponding dimension table, so it is a degenerated dimension.
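A minimal sketch of the point-of-sale fact table described above, with the POS transaction number kept as a degenerate dimension. The names are hypothetical and the dimension tables referenced in the comments are assumed to exist elsewhere:

    CREATE TABLE fact_pos_sales (
        date_key        INT NOT NULL,          -- FK to a date dimension
        product_key     INT NOT NULL,          -- FK to a product dimension
        store_key       INT NOT NULL,          -- FK to a store dimension
        promotion_key   INT NOT NULL,          -- FK to a promotion dimension
        pos_txn_number  VARCHAR(20) NOT NULL,  -- degenerate dimension: no dimension table behind it
        units_sold      INT,
        sales_amount    DECIMAL(12,2)
    );

Grouping rows by pos_txn_number regroups the individual line items of one till receipt, which is exactly the "fastest way to group similar transactions" use mentioned later in this document.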
What is a Star Schema?
A star schema is a way of organising tables so that results can be retrieved from the database easily and quickly in a warehouse environment. A star schema usually consists of one or more dimension tables arranged around a fact table; because the layout looks like a star, it got its name.

What is the main difference between the Inmon and Kimball philosophies of data warehousing?
The two differ in how the data warehouse is built.
Kimball: first data marts, then, in a combined way, the data warehouse. Kimball views data warehousing as a constituency of data marts. Data marts are focused on delivering business objectives for departments in the organization, and the data warehouse is a conformed dimension of the data marts; a unified view of the enterprise is obtained from dimensional modelling at the local, departmental level.
Inmon: first the data warehouse, later the data marts. Inmon believes in creating a data warehouse on a subject-by-subject-area basis, so the development of the data warehouse can start with data from the online store, and other subject areas can be added as the need arises; point-of-sale (POS) data can be added later if management decides it is necessary.

What are the data types present in BO, and what happens if we implement a view in the designer and report?
To my knowledge, a view is at the database level, whereas an alias is a different name given to the same table to resolve loops in a universe; an alias is different from a view in the universe. The data types in Business Objects are: 1. Character 2. Date 3. Long text 4. Number. These are called object types in Business Objects.

Why is the fact table in normal form?
The fact table basically consists of the index keys of the dimension/lookup tables and the measures; whenever a table holds only keys and measures, that itself implies the table is in normal form.

What is meant by metadata in the context of a data warehouse and why is it important?
Metadata is data about data. Examples of metadata include data element descriptions, data type descriptions, attribute/property descriptions, range/domain descriptions and process/method descriptions. The repository environment encompasses all corporate metadata resources: database catalogs, data dictionaries and navigation services. Metadata includes things like the name, length, valid values and description of a data element. Metadata is stored in a data dictionary and repository. It insulates the data warehouse from changes in the schema of operational systems. Metadata synchronization is the process of consolidating, relating and synchronizing data elements with the same or similar meaning from different systems; it joins these differing elements together in the data warehouse to allow easier access.

What is a fact table?
A fact table contains the measurements, metrics or facts of a business process. If your business process is "Sales", then a measurement of this business process, such as the monthly sales number, is captured in the fact table. The fact table also contains the foreign keys to the dimension tables.

What does the level of granularity of a fact table signify?
In simple terms, the level of granularity defines the extent of detail. As an example, consider geographical levels of granularity: we may analyse data at the levels of COUNTRY, REGION, TERRITORY, CITY and STREET. In this case the highest level of granularity (the most detailed level) is STREET.

How do you load the time dimension?
Every data warehouse maintains a time dimension. It is kept at the most granular level at which the business runs (for example week, day of the month and so on). Depending on the data loads, these time dimensions are updated: a weekly process gets updated every week and a monthly process every month. Generally we load the time dimension using a sequential file as the source stage and a transformer stage in which we manually write Month and Year functions; for the lower level, i.e. Day, there is also a function to implement loading of the time dimension.

What is a Data Warehouse?
A data warehouse is a repository of integrated information, available for queries and analysis. Data and information are extracted from heterogeneous sources as they are generated. This makes it much easier and more efficient to run queries over data that originally came from different sources. Typical relational databases are designed for online transaction processing (OLTP) and do not meet the requirements for effective online analytical processing (OLAP); as a result, data warehouses are designed differently from traditional relational databases.

Differences between star and snowflake schemas?
The star schema is created when all the dimension tables link directly to the fact table; since the graphical representation resembles a star it is called a star schema. In a snowflake schema a dimension table can link to the fact table only indirectly, for example a Product table linking to a product_class table through its primary key and thus indirectly to the fact table. As a sample, consider the star schema for a sales fact for the year 1998: the dimensions created are Store, Customer, Product_class and time_by_day, and the fact table contains foreign keys that link to those dimension tables. It must be noted that the foreign keys in the fact table link to the primary keys of the dimension tables.

What type of indexing mechanism do we need to use for a typical data warehouse?
On the fact table it is best to use bitmap indexes. Dimension tables can use bitmap and/or the other types of clustered/non-clustered, unique/non-unique indexes. To my knowledge, SQL Server does not support bitmap indexes; only Oracle supports bitmaps.

What is Dimensional Modelling?
Dimensional modelling is a design concept used by many data warehouse designers to build their data warehouses. In this design model all the data is stored in two types of tables: fact tables and dimension tables. The fact table contains the facts/measurements of the business, and the dimension table contains the context of the measurements, i.e. the dimensions on which the facts are calculated.

What are the methodologies of data warehousing?
There are mainly two: 1. the Ralph Kimball model and 2. the Inmon model. The Kimball model is always structured as denormalised; the Inmon model is structured as normalised. Depending on its requirements, a company chooses one of the two models for its DWH.

What is Normalization, First Normal Form, Second Normal Form, Third Normal Form?
Normalization can be defined as segregating a table into two different tables so as to avoid duplication of values.

Steps in building the data model: while the ER model lists and defines the constructs required to build a data model, there is no standard process for doing so. Some methodologies, such as IDEF1X, specify a bottom-up approach.

Why is data modeling important?
Data modeling is probably the most labour-intensive and time-consuming part of the development process. Why bother, especially if you are pressed for time? A common …

What are semi-additive and factless facts, and in which scenarios will you use such kinds of fact tables?
Semi-additive: semi-additive facts can be summed up for some of the dimensions in the fact table, but not the others. For example, Current_Balance and Profit_Margin are facts; Current_Balance is a semi-additive fact, as it makes sense to add it up across all accounts (what is the total current balance for all accounts in the bank?), but it does not make sense to add it up through time (adding up the current balances of a given account for each day of the month does not give us any useful information).
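A minimal sketch of the Current_Balance example above: summing across accounts is meaningful, while summing the same measure across days is not, so an average or period-end value is normally used for the time dimension. Table and column names are hypothetical:

    -- Meaningful: total balance held across all accounts on one day
    SELECT date_key, SUM(current_balance) AS total_balance
    FROM   fact_account_balance
    WHERE  date_key = 20061031
    GROUP  BY date_key;

    -- Not meaningful to SUM across days; take an average (or the month-end row) instead
    SELECT account_key, AVG(current_balance) AS avg_balance_for_month
    FROM   fact_account_balance
    WHERE  date_key BETWEEN 20061001 AND 20061031
    GROUP  BY account_key;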
What are non-additive facts?
A fact table typically has two types of columns: those that contain numeric facts (often called measurements) and those that are foreign keys to dimension tables. A fact table contains either detail-level facts or facts that have been aggregated; fact tables that contain aggregated facts are often called summary tables, and a fact table usually contains facts with the same level of aggregation. Though most facts are additive, they can also be semi-additive or non-additive. Additive facts can be aggregated by simple arithmetical addition; a common example is sales. Semi-additive facts can be aggregated along some of the dimensions and not along others; an example is inventory levels. Non-additive facts cannot be added at all; an example is averages.

What is the difference between a view and a materialized view?
View: the SQL statement is stored in the database and can be used like a table; every time you access the view, the SQL statement executes.
Materialized view: the results of the SQL are stored in table form in the database; the SQL statement executes only once, and after that every time you run the query the stored result set is used. Pros include quick query results.

Explain degenerated dimensions.
Data items that are not facts and that do not fit into the existing dimensions are termed degenerate dimensions. Degenerate dimensions are the fastest way to group similar transactions and are used when fact tables represent transactional data. A degenerate dimension is a dimension that has only a single attribute, typically represented as a single field in the fact table. Degenerate dimensions can be used as part of the primary key of the fact table, but they cannot act as foreign keys.

Is it correct/feasible to develop a data mart using an ODS?
Yes, it is correct to develop a data mart using an ODS, because an ODS stores transactional data covering only a few days (little historical data), which is just what a data mart requires.

What is a factless fact table?
A factless fact table captures the many-to-many relationships between dimensions but contains no numeric or textual facts. Factless fact tables are often used to record events or coverage information. Common examples include:
- identifying product promotion events (to determine promoted products that didn't sell)
- tracking student attendance or registration events
- tracking insurance-related accident events
- identifying building, facility and equipment schedules for a hospital or university
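A minimal sketch of a factless fact table for the student-attendance example above, together with the kind of count query it supports. Names are hypothetical and the referenced dimensions are assumed to exist:

    -- Only foreign keys, no measures: each row records "this student attended this class on this day"
    CREATE TABLE fact_attendance (
        date_key    INT NOT NULL,  -- FK to a date dimension
        student_key INT NOT NULL,  -- FK to a student dimension
        class_key   INT NOT NULL   -- FK to a class dimension
    );

    -- Analysis is done by counting rows, e.g. attendance per class for October 2006
    SELECT class_key, COUNT(*) AS attendance_count
    FROM   fact_attendance
    WHERE  date_key BETWEEN 20061001 AND 20061031
    GROUP  BY class_key;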
What is VLDB?
Very Large Database (VLDB) is a term sometimes used to describe databases occupying magnetic storage in the terabyte range and containing billions of table rows. Typically these are decision support systems or transaction processing applications serving large numbers of users.

What are data marts?
A data mart is a small data warehouse: a data warehouse divided into smaller units according to the business requirements. For example, the data warehouse of an organization might be divided into individual data marts such as a Sales data mart, a Marketing data mart, a Finance data mart, an HR data mart and so on. Data marts are used to improve performance during the retrieval of data.

Can a dimension table contain numeric values?
Yes, we can have numeric values in a dimension table, but they are not frequently updated, as a dimension table contains relatively constant data that changes only on some occasions.

What is a rapidly changing dimension?
There is no dimension formally called a rapidly changing dimension in DWH; here, a rapidly changing dimension is one that holds transactional data rather than staging data, whereas a degenerate dimension is data that is dimensional in nature but stored in a fact table.

What is the definition of normalized and denormalized views and what are the differences between them?
I would like to add one more point: in an OLTP environment the data is in normalized form, so data must be fetched from the respective master tables through primary and foreign keys, and more tables are scanned or referred to for a single query. For example, in a banking application we have separate tables for a customer's personal details, address details, transaction details and so on. In an OLAP environment the data is in denormalized form: all these details can be stored in one single table, so for a query the number of tables queried is smaller and we avoid scanning multiple tables for a single customer record.

What is ETL?
ETL is an abbreviation for "Extract, Transform and Load". It is the process of extracting data from operational data sources or external data sources; transforming the data, which includes cleansing, aggregation, summarization and integration; and loading the data into some form of data warehouse.

What is a junk dimension?
A "junk" dimension is a collection of random transactional codes, flags and/or text attributes that are unrelated to any particular dimension. The junk dimension is simply a structure that provides a convenient place to store these junk attributes.
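A minimal sketch of the junk-dimension idea above: miscellaneous low-cardinality flags are collected into one small dimension rather than each getting its own table. Names are hypothetical:

    CREATE TABLE dim_order_junk (
        junk_key        INT PRIMARY KEY,
        payment_type    VARCHAR(10),  -- e.g. 'CASH', 'CARD'
        is_gift_wrapped CHAR(1),      -- 'Y' / 'N'
        order_channel   VARCHAR(10)   -- e.g. 'WEB', 'STORE'
    );

    -- The fact table then carries a single junk_key column instead of several scattered flag columns.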
What is an aggregate table and an aggregate fact table?
An aggregate table contains summarized data. For example, in sales we may have only date-level transactions; if we want to create a report like sales by product per year, we aggregate the date values into tables such as week_agg, month_agg, quarter_agg and year_agg, and to retrieve data from these tables we use an @aggregate function. A materialized view is an aggregated table.

What is a conformed fact?
Conformed dimensions (and facts) are tables that have a fixed structure; there is no need to change their metadata, and they can be used with any number of facts in the application without any changes.

What is active data warehousing?
An active data warehouse provides information that enables decision-makers within an organization to manage customer relationships nimbly, efficiently and proactively. The marketplace is coming of age as we progress from first-generation "passive" decision-support systems to current- and next-generation "active" data warehouse implementations. Active data warehousing is about integrating advanced decision support with day-to-day, even minute-to-minute, decision making in a way that increases the quality of customer touches, encourages customer loyalty and thus secures the organization's bottom line.

What is the difference between ODS and OLTP?
ODS: data held alongside the data warehouse as a stand-alone store; the current data changes only when it is loaded through ETL on a scheduled basis, and no further transactions take place on the current data that is part of the data warehouse.
OLTP: data held in an online system connected to the network, where transactional updates happen within seconds and the summarized values change every second.

What is data mining?
Data mining is the process of extracting hidden trends from a data warehouse. For example, an insurance data warehouse can be used to mine data for the highest-risk people to insure in a certain geographical area.

What is data cleansing?
It is nothing but polishing the data. For example, one source system may store Gender as M and F while another stores it as MALE and FEMALE, and the various subsystems may each hold a different version of a customer's address. We need to polish and clean this data before it is added to the data warehouse; we might need an address-cleansing tool to get the customers' addresses into a clean and consistent form.

Explain SCD type 1, type 2 and type 3.
SCD means that if the data in a dimension happens to change very rarely, it is treated as a slowly changing dimension. There are mainly three SCD types.
What is a BUS schema?
A BUS schema, or BUS matrix (in the Kimball approach), is used to identify the common dimensions across business processes, i.e. it is a way of identifying conformed dimensions.

Summarize the difference between OLTP, ODS and Data Warehouse.
OLTP means online transaction processing; it is nothing but a database (Oracle, SQL Server and DB2 are typical examples). OLTP databases, as the name implies, handle real-time transactions, which inherently have some special requirements.
ODS stands for Operational Data Store. It is a final integration point in the ETL process: we load the data into the ODS before loading the values into the target.
A data warehouse is an integrated, subject-oriented, non-volatile and time-variant collection of data used to take management decisions.

Why are OLTP database designs generally not a good idea for a data warehouse?
OLTP cannot store historical information about the organization. It is used for storing the details of daily transactions, while a data warehouse is a huge store of historical information obtained from different data marts for making intelligent decisions about the organization.

What are the possible data marts in retail sales?
Examples: product information, location, sales, time.

What are data validation strategies for data mart validation after the loading process?
Data validation makes sure that the loaded data is accurate and meets the business requirements. Strategies are the different methods followed to meet the validation requirements.

What is data cleaning? How is it done?
I can simply describe it as purifying the data. Data cleansing is the act of detecting and removing and/or correcting a database's dirty data, i.e. data that is incorrect, out-of-date, redundant, incomplete or formatted incorrectly.

What is the level of granularity of a fact table?
The level of granularity is the level of detail that you put into the fact table in a data warehouse; in other words, it is how much detail you are willing to store for each transactional fact. For example, based on the design you can decide to store the sales data for each individual transaction, or product sales with respect to each minute, or aggregate the data up to the minute and store that. It also means that we can have data aggregated for a year for a given product while still being able to drill the data down to a monthly, weekly and daily basis; the lowest level is known as the grain. Going down to the details is granularity.
Which columns go to the fact table and which columns go to the dimension table?
The aggregation or calculated-value columns go to the fact table and the detailed information goes to the dimension table. It also depends on the granularity at which the data is stored. Foreign key elements, along with the business measures such as sales in dollar amount or units (quantity sold), are stored in the fact table; a date may also be a business measure in some cases.

What is real-time data warehousing?
Real-time data warehousing is a combination of two things: 1) real-time activity and 2) data warehousing. Real-time activity is activity that is happening right now; the activity could be anything, such as the sale of widgets. Data warehousing captures business activity data; real-time data warehousing captures business activity data as it occurs. As soon as the business activity is complete and there is data about it, the completed activity data flows into the data warehouse and becomes available instantly. In other words, real-time data warehousing is a framework for deriving information from data as the data becomes available.

What is an ER diagram?
The Entity-Relationship (ER) model was originally proposed by Peter Chen in 1976 [Chen76] as a way to unify the network and relational database views.

What is a CUBE in the data warehousing concept?
Cubes are a logical representation of multidimensional data. The edges of the cube contain dimension members and the body of the cube contains data values.

What is a lookup table?
A lookup table is nothing but a "lookup": it gives values to a referenced table (it is a reference). For example, a lookup table called STATES provides the actual state name ('Texas') in place of 'TX' in the output. It is used at run time; to add on, it saves joins and space in terms of transformations.

What are SCD1, SCD2 and SCD3?
SCD Type 1: the attribute value is overwritten with the new value, obliterating the historical attribute values.
SCD Type 2: a new record with the new attributes is added to the dimension table, for example when the product roll-up changes for a given product. Historical fact table rows continue to reference the old dimension key with the old roll-up attribute; going forward, the fact table rows reference the new surrogate key with the new roll-up, thereby perfectly partitioning history.
SCD Type 3: attributes are added to the dimension table to support two simultaneous roll-ups, perhaps the current product roll-up as well as "current version minus one", or the current version and the original; the roll-up attribute is merely updated with the current value.
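A minimal sketch of the SCD Type 2 handling described above: the existing dimension row is closed off and a new row with a new surrogate key is inserted. The table, columns and key values are hypothetical, and the dimension is assumed to carry the usual SCD housekeeping columns (current flag and effective dates):

    -- Expire the current version of the product row
    UPDATE dim_product
    SET    row_is_current = 'N',
           row_end_date   = DATE '2006-10-31'
    WHERE  product_id     = 'P100'
    AND    row_is_current = 'Y';

    -- Insert a new version with a new surrogate key and the changed roll-up
    INSERT INTO dim_product
        (product_sk, product_id, product_name, category_name, row_is_current, row_start_date)
    VALUES
        (90871, 'P100', 'Lux 75g', 'Personal Care', 'Y', DATE '2006-11-01');

    -- New fact rows reference product_sk = 90871; existing fact rows keep the old surrogate key.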
What is a data warehousing hierarchy?
Hierarchies are logical structures that use ordered levels as a means of organizing data. A hierarchy can be used to define data aggregation; for example, in a time dimension, a …

What are the various ETL tools in the market?
1. Informatica PowerCenter 2. Ascential DataStage 3. Essbase (Hyperion) 4. Ab Initio 5. BO Data Integrator 6. SAS ETL 7. MS DTS 8. Oracle OWB 9. Pervasive Data Junction 10. Cognos DecisionStream

Why should you put your data warehouse on a different system than your OLTP system?
A data warehouse is part of OLAP (On-Line Analytical Processing); it is the source from which BI tools fetch data for analytical, reporting or data mining purposes. The nature of the data in OLTP is current, volatile and highly normalized, whereas the DWH contains historical, denormalized, integrated, subject-oriented data. The warehouse generally contains data covering the whole life cycle of the company or product, while the OLTP system generally holds data limited to the last couple of months or a year at most. Since the two systems are different in nature and functionality, we should always keep them on different systems.

Explain the advantages of RAID 1, 1/0 and 5. What type of RAID setup would you put your transaction logs on?
Raid 0 - striped: makes several physical hard drives look like one hard drive; no redundancy but very fast; may be used for temporary space where loss of the files will not result in loss of committed data.
Raid 1 - mirroring: each hard drive in the array has a twin holding an exact copy of its data, so if one drive fails the other is used to pull the data; Raid 1 is half the speed of Raid 0 and the read and write performance are good; hard drives are cheap now, so I always recommend Raid 1.
Raid 1/0 - striped, then mirrored: similar to Raid 1 and sometimes faster, depending on the vendor implementation.
Raid 5 - great for read-only systems: write performance is about a third of Raid 1 but read performance is the same as Raid 1, so Raid 5 is great for a DW but not good for OLTP.

What is a dimension table?
A dimension table is a collection of hierarchies and categories along which the user can drill down and drill up; it contains only the textual attributes.

What are the differences between static and dynamic caches?
A static cache stores the looked-up values in memory and does not change throughout the run of the session. A dynamic cache stores the values in memory and changes dynamically during the run of the session; it is used for SCD types, where the target table changes and the cache changes with it.

Do you need separate space for the data warehouse and the data marts?
In the data warehouse all the information of the enterprise is present, but a data mart is specific to a particular analysis such as sales or production. So a data mart is subject oriented, and since the warehouse is nothing but a collection of data marts, we consider it subject oriented as well. For individual analysis we need data marts, which are useful for decision support for management.

What is the difference between E-R modelling and dimensional modelling?
In E-R modelling the data is represented as entities and attributes and is in normalized form. In dimensional modelling the data is represented in the form of facts and dimensions: the fact tables contain only numeric measures and foreign keys, and the dimensions are used to give context to the data in the fact tables. Using these facts and dimensions we can form OLAP cubes for analysis.

What are the types of dimensional modelling?
1. Star schema 2. Star flake schema 3. Snowflake schema 4. Extended star schema 5. Galaxy schema

What is a star schema?
The fact table surrounded by different dimensions and their respective levels is called a star schema.

What is a star flake schema?
The fact table surrounded by different dimensions and their respective levels with a single level of hierarchy is called a star flake schema.

What is a snowflake schema?
The fact table surrounded by different dimensions and their respective levels with multiple hierarchies is called a snowflake schema.

What is an extended star schema?
It is nothing but a star schema that contains some additional data; this data is placed in a separate table and connected to a dimension.

What is a galaxy schema?
Sometimes two facts share common dimensions; this is called a fact constellation or galaxy schema.
ASCENTIAL DATASTAGE 7.5

What other performance tunings have you done in your last project to increase the performance of slowly running jobs?
1) Minimize the usage of Transformer stages (use Copy, Filter or Row Generator instead where possible).
2) Use SQL code while extracting the data; handle the nulls; minimize the warnings.
3) Reduce the number of lookups in a job design; use not more than 20 stages in a job.
4) Use an IPC stage between two passive stages to reduce processing time.
5) Drop indexes before data loading and recreate them after loading data into the tables.
6) There is no hard limit on the number of stages (such as 20 or 30), but we can break a job into smaller jobs and use Dataset stages to store the intermediate data.
7) Check the write cache of the hash file. If the same hash file is used both for lookup and as a target, disable this option.
8) Don't use more than 7 lookups in the same transformer; introduce new transformers if it exceeds 7 lookups.
9) Use the Preload to memory option in the hash file output. If the hash file is used only for lookup then enable Preload to memory; this improves performance. Also, use sorted data for the Aggregator.
10) Use Write to cache in the hash file input.
11) Write into the error tables only after all the transformer stages.
12) Reduce the width of the input record: remove the columns you will not use.
13) Cache the hash files you are reading from and writing into, and make sure your cache is big enough to hold the hash files. (Use ANALYZE.FILE or HASH.HELP to determine the optimal settings for your hash files; this also minimizes overflow on the hash file.)
14) If possible, break the input into multiple threads and run multiple instances of the job.
15) Stage the data coming from ODBC/OCI/DB2UDB stages for optimum performance and for data recovery in case the job aborts.
16) Tune the OCI stage 'Array Size' and 'Rows per Transaction' values for faster inserts, updates and selects.
17) Tune the 'Project Tunables' in the Administrator for better performance; also check the order of execution of the routines.
18) Sort the data as much as possible in the database and reduce the use of DS-Sort for better job performance.
19) Remove the data not used from the source as early as possible in the job.
20) Work with the DB admin to create appropriate indexes on tables for better performance of DS queries.
21) Convert some of the complex joins/business rules in DS to stored procedures for faster execution of the jobs.
22) If an input file has an excessive number of rows and can be split up, use standard logic to run jobs in parallel.
23) Constraints are generally CPU intensive and take a significant amount of time to process. This may be the case if the constraint calls routines or external macros, but if it is inline code then the overhead will be minimal.
24) Try to have the constraints in the 'Selection' criteria of the jobs itself; this eliminates unnecessary records before the joins are made.
25) Tuning should occur on a job-by-job basis.
26) Using a constraint to filter a record set is much slower than performing a SELECT … WHERE ….
27) Make every attempt to use the bulk loader for your particular database; bulk loaders are generally faster than ODBC or OCI.

How can I extract data from DB2 (on IBM i-series) to the data warehouse via Datastage as the ETL tool? Do I first need to use ODBC to create connectivity and use an adapter for the extraction and transformation of the data?
You would need to install ODBC drivers to connect to the DB2 instance (they do not come with the regular drivers we usually install; use the CD provided with the DB2 installation, which has the ODBC drivers to connect to DB2) and then try it out.

How do you improve the performance of a hash file?
You can improve the performance of a hashed file by:
1. Preloading the hash file into memory: enable the preloading option in the hash file output stage.
2. Write caching options: data is written into cache before being flushed to disk; you can enable this to ensure that hash files are written to cache in order before being flushed to disk, instead of the order in which individual rows are written.
3. Preallocating: estimate the approximate size of the hash file so that the file need not be split so often after write operations.

What is DS Designer used for - did you use it?
You use the Designer to build jobs by creating a visual design that models the flow and transformation of data from the data source to the target warehouse. The Designer's graphical interface lets you select stage icons, drop them onto the Designer work area, and add links.

How can we pass parameters to a job by using a file?
You can do this by passing parameters from a UNIX file and then calling the execution of a Datastage job. The DS job has the parameters defined, which are passed by UNIX.
Can you join a flat file and a database in Datastage? How?
Yes, we can do it in an indirect way. First create a job which populates the data from the database into a sequential file, named for example Seq_First1. Then take the flat file you have and use a Merge stage to join the two files. The Merge stage offers various join types such as Pure Inner Join, Left Outer Join and Right Outer Join; use whichever suits your requirements.

How can you implement slowly changing dimensions in Datastage? Explain.

Can anyone tell me how to extract data from more than one heterogeneous source, for example a sequential file, Sybase and Oracle in a single job?
Yes, you can extract data from two heterogeneous sources in Datastage using the Transformer stage; it is simple, you just need to form a link between the two sources in the Transformer stage.

What is a project? Specify its various components.
You always enter Datastage through a Datastage project. When you start a Datastage client you are prompted to connect to a project.

What are built-in components and user-defined components?
Built-in components are predefined components used in a job. User-defined components are customized components created using the Datastage Manager or Datastage Designer.

How do we use the NLS function in Datastage? What are the advantages of NLS? Where can we use it? Explain briefly.
Using NLS we can:
- process data in a wide range of languages
- use local formats for dates, times and money
- sort data according to local rules
If NLS is installed, various extra features appear in the product. For Server jobs, NLS is implemented in the Datastage Server engine; for Parallel jobs, NLS is implemented using the ICU library.

What is an environment variable? What is it used for?
An environment variable is a predefined variable that we can use while creating a DS job. We can set it either at project level or at job level; once we set a specific variable, it becomes available to the project/job. We can also define new environment variables; for that we go to DS Admin.

What are the third-party tools used with Datastage?
Autosys, TNG and Event Coordinator are some of the ones I know of and have worked with.

What is APT_CONFIG in Datastage?
APT_CONFIG is just an environment variable used to identify the *.apt file that holds the node information and the configuration of the SMP/MPP server.

How do you kill a job in Datastage?
By killing the respective process ID.

If a Datastage job aborts after, say, 1000 records, how do you continue the job from the 1000th record after fixing the error?
By specifying checkpointing in the job sequence properties (this option is available from the 7.5 edition). When we restart the job, it starts by skipping up to the failed record.

If you're running 4-way parallel and you have 10 stages on the canvas, how many processes does Datastage create?
The answer is 40: you have 10 stages and each stage can be partitioned and run on 4 nodes, which makes the total number of processes generated 40.

Did you parameterize the jobs or hard-code the values in them?
Always parameterize the job; there is no reason to hard-code parameters in your jobs. The variables most often parameterized in a job are the DB DSN name, username and password. The values come either from Job Properties or from a 'Parameter Manager', a third-party tool.
What is merge and how can it be done? Please explain with a simple example taking two tables.
Merge is used to join two tables. It takes the key columns and sorts them in ascending or descending order. Consider two tables, Emp and Dept: if we want to join these two tables, DeptNo is the common key, so we give that column as the key, sort DeptNo in ascending order and join the two tables.

What are the default nodes for the Datastage parallel edition?
The number of nodes depends on the number of processors in your system. If your system supports two processors, we get two nodes by default.

Is it possible to run parallel jobs in server jobs?
No, it is not possible to run parallel jobs in server jobs, but server jobs can be executed within parallel jobs.

Is it possible for two users to access the same job at the same time in Datastage?
No, it is not possible for two users to access the same job at the same time; DS will produce the error "Job is accessed by other user".

Do you know about MetaStage?
MetaStage is used to handle metadata, which is very useful for data lineage and data analysis later on. Metadata defines the type of data we are handling; these data definitions are stored in the repository and can be accessed with the use of MetaStage.

What is the difference between the Merge stage and the Join stage?
Merge and Join stage differences: 1. the Merge stage has reject links; 2. it can take multiple update links; 3. if you use it for comparison, the first matching data will be the output, because it uses the update links to extend the primary details coming from the master link.

How can we join one Oracle source and a sequential file?
A Join or a Lookup can be used to join an Oracle source and a sequential file.

What are the enhancements made in Datastage 7.5 compared with 7.0?
Many new stages were introduced compared to Datastage version 7.0. In the job sequence, stages such as the Start Loop activity, End Loop activity, Terminate Loop activity and User Variables activity were introduced. In parallel jobs the Surrogate Key stage and Stored Procedure stage were introduced. In server jobs we have the Stored Procedure stage, the Command stage, and a generate-report option in the File tab.

What is the purpose of the exception activity in Datastage 7.5?
The stages that follow an exception activity are executed whenever an unknown error occurs while running the job sequencer.

What is the difference between Datastage and Informatica?
The main difference is the vendor: each product has strengths that come from its architecture, and we have to choose products based on the business needs. For Datastage it is a top-down approach.

What are modulus and splitting in a dynamic hashed file?
The modulus size can be increased by contacting your Unix admin.

What is DS Manager used for - did you use it?
The Manager is a graphical tool that enables you to view and manage the contents of the Datastage Repository.

What are static hash files and dynamic hash files?
Hashed files have a default size established by their modulus and separation when you create them, and this can be static or dynamic.
use a join. Once the sort is over the join processing is very fast and never involves paging or other I/O Unlike Join stages and Lookup stages. When a Static Hash file is created. 34. 33. This can involve I/O if the data is big enough. DATASTAGE creates a file that contains the number of groups specified by modulo. There are many groups as the specified by the modulus. Size of Hash file = modulus (no. A join does a high-speed sort on the driving and reference datasets. How do you eliminate duplicate rows? The Duplicates can be eliminated by loading the corresponding data in the Hash file. the Merge stage allows you to specify several reject links as many as input links. Merge and lookup is the three stages differ mainly in the memory they use Datastage doesn't know how large your data is.Overflow space is only used when data grows over the reserved size for someone of the groups (sectors) within the file. but the I/O is all highly optimized and sequential. Importance of Surrogate Key in Data warehousing? 25 . 36. When a hashed file is created. groups) * Separations (buffer size) 35. What is the exact difference between Join. the Lookup stage can be used for doing lookups. Separation and modulo respectively specifies the group buffer size and the number of buffers allocated for a file. Here's how to decide which to use: if the reference datasets are big enough to cause trouble. 32. How can we implement Lookup in Datastage Server jobs? The DB2 stage can be used for lookups. In the Enterprise Edition. so cannot make an informed choice whether to combine data using a join stage or a lookup stage. What does separation option in static hash-file mean? The different hashing algorithms are designed to distribute records evenly among the groups of the file based on characters and their position in the record ids. Specify the columns on which u want to eliminate as the keys of hash. Merge and Lookup Stage? The exact difference between Join.
The concept of the surrogate key comes into play when there is a slowly changing dimension in a table. In such a condition there is a need for a key by which we can identify the changes made in the dimensions. These slowly changing dimensions can be of three types, namely SCD1, SCD2 and SCD3. Surrogate keys are system-generated keys; mainly they are just sequences of numbers, but they can be alphanumeric values also.

37. What is the Hash File stage and what is it used for?
We can use the Hash File stage to avoid/remove duplicate rows by specifying the hash key on a particular field.

38. What is version control?
Version Control stores different versions of DS jobs, runs different versions of the same job, reverts to a previous version of a job, and also lets you view version histories.

39. How do you find the number of rows in a sequential file?
Using the Row Count system variable.

40. Suppose there are a million records; did you use OCI? If not, which stage do you prefer?
Use Orabulk.

41. How do you run a job from the command prompt in UNIX?
Using the dsjob command, for example: dsjob -run -jobstatus projectname jobname

42. How do we do the automation of DS jobs?
We can call a DataStage batch job from the command prompt using 'dsjob', and we can also pass all the parameters from the command prompt. Then call this shell script from any of the schedulers available in the market. The second option is to schedule these jobs using the DataStage Director.
How do you find errors in a job sequence?
Using the DataStage Director we can find the errors in a job sequence.

How good are you with your PL/SQL?
We will not be writing PL/SQL in DataStage; SQL knowledge is enough.

How do you pass parameters to the job sequence if the job is running at night?
Two ways: 1. Set the default values of the parameters in the job sequencer and map these parameters to the job. 2. Run the job in the sequencer using the dsjob utility, where we can specify the values to be taken for each parameter.

If I add a new environment variable in Windows, how can I access it in DataStage?
You can view all the environment variables in Designer and check them in Job Properties; you can add and access the environment variables from Job Properties.

What is the difference between the DRS (Dynamic Relational Stage) and the ODBC stage?
The DRS stage should be faster than the ODBC stage as it uses native database connectivity: it uses the native connectivity for the chosen target, whereas ODBC uses the ODBC driver for a particular database. You will need to install and configure the required database clients on your DataStage server for it to work. DRS is a stage that tries to make it seamless to switch from one database to another; the Dynamic Relational Stage was leveraged for PeopleSoft to have a job run on any of the supported databases. It supports ODBC connections too. Read more of that in the plug-in documentation.

What are the transaction size and array size in the OCI stage? How can these be used?
Transaction Size - the number of rows written before a commit is executed for the transaction. The default value is 0, that is, all the rows are written before being committed to the data table. This field exists for backward compatibility, but it is ignored for release 3.0 and later of the plug-in; the transaction size for new jobs is now handled by Rows per transaction on the Transaction Handling tab of the Input page.
Array Size - the number of rows written to or read from the database at a time. The default value is 1, that is, each row is written in a separate statement.
How do you track performance statistics and enhance them?
Through the Monitor we can view the performance statistics.

What is the meaning of "Try to have the constraints in the 'Selection' criteria of the jobs itself; this will eliminate the unnecessary records even getting in before joins are made"?
This means try to improve performance by avoiding the use of constraints wherever possible and instead filtering while selecting the data itself, using a WHERE clause. This improves performance.

How do you drop an index before loading data into the target and rebuild it in DataStage?
This can be achieved with the "Direct Load" option of the SQL*Loader utility.

There are three different types of user-created stages available. What are they?
These are the three different stages: i) Custom, ii) Build, iii) Wrapped.

How will you call an external function or subroutine from DataStage?
There is a DataStage option to call external programs: ExecSH.

What is DS Administrator used for? Did you use it?
The Administrator enables you to set up DataStage users, control the purging of the Repository, and, if National Language Support (NLS) is enabled, install and manage maps and locales.

What is the maximum capacity of a hash file in DataStage?
Take a look at the uvconfig file:
# 64BIT_FILES - This sets the default mode used to
# create static hashed and dynamic files.
# A value of 0 results in the creation of 32-bit
# files. 32-bit files have a maximum file size of
# 2 gigabytes. A value of 1 results in the creation
# of 64-bit files (ONLY valid on 64-bit capable platforms).
# The maximum file size for 64-bit files is system dependent.
# The default behavior may be overridden by keywords on certain commands.
64BIT_FILES 0

What is the difference between symmetric multiprocessing and massively parallel processing?
Symmetric Multiprocessing (SMP) - some hardware resources may be shared by processors. The processors communicate via shared memory and have a single operating system. Cluster or Massively Parallel Processing (MPP) - known as shared-nothing, in which each processor has exclusive access to its hardware resources. Cluster systems can be physically dispersed; the processors have their own operating system and communicate via a high-speed network.

What is the order of execution done internally in the Transformer, with the stage editor having input links on the left-hand side and output links on the right?
Stage variables, constraints and column derivations or expressions.

What are stage variables, derivations and constraints?
Stage variable - an intermediate processing variable that retains its value during the read and doesn't pass the value into a target column. Constraint - a condition that is either true or false and that specifies the flow of data on a link. Derivation - an expression that specifies the value to be passed on to the target column.

How do you implement a Type 2 slowly changing dimension in DataStage? Give me an example.
Slowly changing dimensions are a common problem in data warehousing. For example: there exists a customer called Lisa in a company ABC and she lives in New York. Later she moves to Florida; the company must modify her address now. In general there are three ways to solve this problem. Type 1: the new record replaces the original record; no trace of the old record remains at all.
In Type 1 the new value overwrites the existing one, which means no history is maintained. Type 2: a new record is added into the customer dimension table; therefore the customer is treated essentially as two different people, and both the original and the new record will be present. The new record gets its own primary key. The advantage of Type 2 is that historical information is maintained, but the size of the dimension table grows, so storage and performance can become a concern. Type 2 should only be used if it is necessary for the data warehouse to track the historical changes. Type 3: the original record is modified to reflect the change. There will be two columns, one to indicate the original value and the other to indicate the current value; for example, a new column will be added which shows the original address as New York and the current address as Florida. This helps in keeping some part of the history, and the table size is not increased; but one problem is that when the customer moves from Florida to Texas the New York information is lost - the history of where the person stayed before that is lost. Therefore Type 3 should only be used if the changes will occur only a finite number of times.

What is the functionality of the Link Partitioner and Link Collector?
Server jobs mainly execute in sequential fashion; the IPC stage, as well as the Link Partitioner and Link Collector, simulate a parallel mode of execution over server jobs having a single CPU. Link Partitioner: it receives data on a single input link and diverts the data to a maximum of 64 output links, and the data processed by the same stage has the same metadata. Link Collector: it collects the data from up to 64 input links, merges it into a single data flow and loads it to the target. Both are active stages, and the design and mode of execution of server jobs has to be decided by the designer.

What happens if the job fails at night?
The job sequence aborts.

What is job control? How can it be used? Explain with steps.
JCL stands for Job Control Language; it is used to run a number of jobs at a time, with or without using loops. Steps: click Edit in the menu bar, select 'Job Properties', and enter the parameters, for example (parameter, prompt, type): STEP_ID, STEP_ID, string; Source, SRC, string; DSN, DSN, string; Username, unm, string; Password, pwd, string. After editing the above, go to the job control tab, select the jobs from the list box and run the job.
What is the difference between in-process and inter-process?
In-process: you can improve the performance of most DataStage jobs by turning in-process row buffering on and recompiling the job. This allows connected active stages to pass data via buffers rather than row by row. Note: you cannot use in-process row buffering if your job uses COMMON blocks in transform functions to pass data between stages. This is not recommended practice, and it is advisable to redesign your job to use row buffering rather than COMMON blocks.
Inter-process: use this if you are running server jobs on an SMP parallel system. This enables the job to run using a separate process for each active stage, which will run simultaneously on a separate processor. Note: you cannot use inter-process row buffering if your job uses COMMON blocks in transform functions to pass data between stages; again, it is advisable to redesign your job to use row buffering rather than COMMON blocks.

What is the difference between DataStage and DataStage TX?
It is a critical question to answer, but one thing I can tell you is that DataStage TX is not an ETL tool and is not a new version of DataStage 7.5. TX is used for ODS sources - this much I know.

If the size of the hash file exceeds 2 GB, what happens? Does it overwrite the current rows?
Yes, it overwrites the file.

Do you know about the INTEGRITY/QUALITY stage?
Integrity/QualityStage is a data integration tool from Ascential which is used to standardize and integrate the data from different sources.

How much would be the size of the database in DataStage?

How can you do an incremental load in DataStage?
Incremental load means daily load. Whenever you are selecting data from the source, select the records which are loaded or updated between the timestamp of the last successful load and today's load start date and time. For this you have to pass parameters for those two dates.
Store the last run date and time in a file, read it through job parameters, and set the second argument to the current date and time.

What is the meaning of "file extender" in DataStage server jobs? Can we run a DataStage job from one job to another job?
File extender means adding columns or records to an already existing file in DataStage. Yes, we can run a DataStage job from one job to another job.

What are XML files, how do you read data from XML files, and which stage is to be used?
In the palette there are real-time stages such as XML Input, XML Output and XML Transformer.

Where are flat files actually stored? What is the path?
Flat files store the data, and the path can be given in the General tab of the Sequential File stage.

What is a data set, and what is a file set?
File set: it allows you to read data from or write data to a file set. The stage can have a single input link, a single output link and a single rejects link. It only executes in parallel mode. The data files and the file that lists them are called a file set. This capability is useful because some operating systems impose a 2 GB limit on the size of a file, and you need to distribute files among nodes to prevent overruns. Data sets are used to import data in parallel jobs, like ODBC in server jobs.

What is the default cache size? How do you change the cache size if needed?
The default read cache size is 128 MB. We can increase it by going into the DataStage Administrator, selecting the Tunables tab and specifying the cache size.

How do you remove duplicates without using the Remove Duplicates stage?
In the target, make the column a key column and run the job.

How do you merge two files in DS?
Either use a copy command as a before-job subroutine if the metadata of the two files is the same, or create a job to concatenate the two files into one if the metadata is different.
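As a rough sketch of the before-job variant (assuming a UNIX server; the file paths and routine name below are invented for illustration, not taken from the original), a before-job routine could shell out to concatenate the files:

* Before-job routine body: concatenate two source files that share the same metadata.
* The paths are placeholders.
Cmd = "cat /data/in/file1.txt /data/in/file2.txt > /data/in/merged.txt"
Call DSExecute("UNIX", Cmd, ShellOutput, SysReturnCode)
If SysReturnCode <> 0 Then
   Call DSLogFatal("Concatenation failed: " : ShellOutput, "MergeFilesBefore")
End

The job would then read the single merged file through one Sequential File stage.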
Where does the UNIX script of DataStage execute - on the client machine or on the server?
DataStage jobs are executed on the server machines only; nothing is stored on the client machine.

What about system variables?
DataStage provides a set of read-only variables containing useful system information that you can access from a transform or routine, for use in constraints and derivations in Transformer stages:
@DATE The internal date when the program started. See the Date function.
@DAY The day of the month extracted from the value in @DATE.
@MONTH The current month extracted from the value in @DATE.
@YEAR The current year extracted from @DATE.
@TIME The internal time when the program started. See the Time function.
@INROWNUM Input row counter.
@OUTROWNUM Output row counter (per link). For use in derivations in Transformer stages.
@TRUE The compiler replaces the value with 1.
@FALSE The compiler replaces the value with 0.
@NULL The null value.
@NULL.STR The internal representation of the null value, Char(128).
@IM An item mark, Char(255).
@FM A field mark, Char(254).
@VM A value mark (a delimiter used in UniVerse files), Char(253).
@SM A subvalue mark (a delimiter used in UniVerse files), Char(252).
@TM A text mark (a delimiter used in UniVerse files), Char(251).
@LOGNAME The user login name.
@USERNO The user number.
@WHO The name of the current DataStage project directory.
@PATH The pathname of the current DataStage project.
@SCHEMA The schema name of the current DataStage project.
@SYSTEM.RETURN.CODE Status codes returned by system processes or commands.
REJECTED Can be used in the constraint expression of an output link of a Transformer stage. REJECTED is initially TRUE, but is set to FALSE whenever an output link is successfully written.
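To show where these typically appear, here is a small illustrative sketch of Transformer expressions that use them (the input column In.Amount is a made-up example, not from the document):

* Derivation for a load-timestamp column:
Oconv(@DATE, "D-YMD[4,2,2]") : " " : Oconv(@TIME, "MTS")
* Derivation for a running row number on the output link:
@OUTROWNUM
* Derivation that substitutes zero for a null input value:
If IsNull(In.Amount) Then 0 Else In.Amount
* Constraint that skips a header row on the input:
@INROWNUM > 1
* Constraint on a reject link:
REJECTED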
What are the command-line functions that import and export DS jobs?
A. dsimport.exe - imports the DataStage components. B. dsexport.exe - exports the DataStage components.

What is DS Director used for? Did you use it?
DataStage Director is the GUI used to monitor, run, validate and schedule DataStage server jobs.

What is the difference between DataStage developers and DataStage designers? What are the skills required for each?
A DataStage designer is one who designs the job, i.e. he deals with the blueprints and designs the jobs; a DataStage developer is one who codes the jobs, i.e. works with the stages required in developing the code.

What other ETL tools have you worked with?
Ab Initio, DataStage EE (parallel edition), Oracle ETL - there are about 7 ETL tools in the market.

What will you do in a situation where somebody wants to send you a file, use that file as an input or reference, and then run the job?
Under Windows: use the 'WaitForFileActivity' under the sequencers and then run the job. Under UNIX: poll for the file; once the file has arrived, start the job or sequencer depending on the file. Maybe you can schedule the sequencer around the time the file is expected to arrive.

Dimensional modeling is again subdivided into two types:
a) Star schema - simple and much faster; denormalized form. b) Snowflake schema - complex with more granularities; more normalized form.
What is the difference between an operational data store (ODS) and a data warehouse?
A data warehouse is a decision support database for organizational needs. It is a subject-oriented, integrated, non-volatile, time-variant collection of data, able to answer ad hoc, analytical, historical or complex queries. ODS (Operational Data Store) is an integrated collection of related information; it contains a maximum of 90 days of information.

What are the Repository Tables in DataStage and what are they?
A data warehouse is a repository (centralized as well as distributed) of data. Metadata is data about data. Examples of metadata include data element descriptions, data type descriptions, attribute/property descriptions, range/domain descriptions, valid values, and process/method descriptions. Metadata includes things like the name, length and description of a data element. The repository environment encompasses all corporate metadata resources: database catalogs, data dictionaries, and navigation services. Metadata is stored in a data dictionary and repository; it insulates the data warehouse from changes in the schema of operational systems. In DataStage, the input, output and transfer pages under the Interface tab have four tabs, and the last one is Build, under which you can find the TABLE NAME.

What is the Sequencer stage in a job sequence? What are the conditions?
A Sequencer allows you to synchronize the control flow of multiple activities in a job sequence. It can have multiple input triggers as well as multiple output triggers. The Sequencer operates in two modes: ALL mode, in which all of the inputs to the sequencer must be TRUE for any of the sequencer outputs to fire, and ANY mode, in which output triggers can be fired if any of the sequencer inputs are TRUE.

How many jobs have you created in your last project?
100+ jobs for every 6 months if you are in development, or around 40 jobs for every 6 months if you are in testing, although it need not be the same number for everybody.

1. What about system variables?
2. How can we create containers?
3. How can we improve the performance of DataStage?
4. What are the job parameters?
5. What is the difference between a routine, a transform and a function?
6. What are all the third-party tools used in DataStage?
7. How can we implement a Lookup in DataStage server jobs?
8. How can we implement Slowly Changing Dimensions in DataStage?
9. How can we join one Oracle source and a sequential file?
10. What are the iconv and oconv functions?
11. What is the difference between a hash file and a sequential file?

What is the maximum number of characters we can give for a job name in DataStage?

How do you pass a filename as the parameter for a job?
1. Go to DataStage Administrator -> Projects -> Properties -> Environment -> User Defined. Here you can see a grid where you can enter your parameter name and the corresponding path of the file. 2. Go to the Stage tab of the job, select the NLS tab, click on "Use Job Parameter" and select the parameter name which you gave above. The selected parameter name appears in the text box beside the "Use Job Parameter" button. Copy the parameter name from the text box and use it in your job, keeping the project default in the text box.

How do you remove duplicates in a server job?
1) Use a hashed file stage, or 2) if you use the sort command in UNIX (before-job subroutine) you can reject duplicated records using the -u parameter, or 3) use a Sort stage.

What is the utility you use to schedule the jobs on a UNIX server other than using Ascential Director?
AUTOSYS: through AutoSys you can automate the job by invoking the shell script written to schedule the DataStage jobs.

Is it possible to call one job in another job in server jobs?
I think we can call a job within another job. In fact, "calling" doesn't sound right, because you attach/add the other job through job properties; you can attach zero or more jobs. Steps will be Edit --> Job Properties --> Job Control, then add the desired jobs.
Containers: usage and types?
A container is a collection of stages used for the purpose of reusability. There are 2 types of containers: a) Local container: job specific; b) Shared container: can be used in any job within a project. There are two types of shared container: 1. Server shared container, used in server jobs (it can also be used in parallel jobs); 2. Parallel shared container, used in parallel jobs. You can also include server shared containers in parallel jobs as a way of incorporating server job functionality into a parallel stage (for example, you could use one to make a server plug-in stage available to a parallel job).

If data is partitioned in your job on key 1 and then you aggregate on key 2, what issues could arise?
Data will be partitioned on both the keys; it will hardly take any more time for execution.

How do you clean the DataStage repository?
Remove log files periodically.

What is job control? How is it developed? Explain with steps.
Job control means controlling DataStage jobs through some other DataStage job. Example: consider two jobs, XXX and YYY. The job YYY can be executed from job XXX by using DataStage macros in Routines. To execute one job from another job, the following steps need to be followed in the routine: 1. Attach the job using the DSAttachJob function. 2. Run the other job using the DSRunJob function. 3. Stop the job using the DSStopJob function.
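A minimal sketch of those steps in DataStage BASIC job-control code (the job name "YYY" is just the example above; the status check is added only for illustration):

* Attach, run and wait for a job from controlling code.
hJob = DSAttachJob("YYY", DSJ.ERRFATAL)
ErrCode = DSRunJob(hJob, DSJ.RUNNORMAL)
ErrCode = DSWaitForJob(hJob)                 ;* block until YYY finishes
Status = DSGetJobInfo(hJob, DSJ.JOBSTATUS)
If Status = DSJS.RUNFAILED Then
   Call DSLogWarn("Job YYY failed", "JobControl")
End
ErrCode = DSDetachJob(hJob)
* DSStopJob(hJob) would be used instead of DSWaitForJob when the
* controlling job needs to abort a run that is still in progress.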
What does a config file in Parallel Extender consist of?
a) The number of processes or nodes; b) the actual disk storage locations.

How can you implement complex jobs in DataStage?
Complex design means having more joins and more lookups; that job design will be called a complex job. We can easily implement any complex design in DataStage by following simple tips, in terms of increasing performance also. There is no limitation on the number of stages used in a job, but for better performance use at most 20 stages in each job; if it exceeds 20 stages then go for another job. Use not more than 7 lookups per transformer; otherwise include one more transformer.

What are the validations you perform after creating jobs in Designer? What are the different types of errors you faced during loading, and how did you solve them?
Check the parameters, check whether the input files exist, check whether the input tables exist, and also check usernames, data source names, passwords and the like.

What is the user variables activity? When is it used, and where? Give a real example.
Using the user variables activity we can create some variables in the job sequence; these variables are available to all the activities in that sequence. Most probably this activity is at the start of the job sequence.

I want to process 3 files sequentially, one by one; while processing, it should fetch the files automatically. How can I do that?
If the metadata for all the files is the same, then create a job having the file name as a parameter, then use the same job in a routine and call the job with a different file name each time, or you can create a sequencer to run the job, as sketched below.
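A hedged sketch of the routine variant, assuming the reusable job is called "LoadFile" and takes a parameter named FILE_NAME (both names invented here, not from the original):

* Run the same parameterized job once per file, in order.
FileList = "file1.txt" : @FM : "file2.txt" : @FM : "file3.txt"
For i = 1 To 3
   hJob = DSAttachJob("LoadFile", DSJ.ERRFATAL)
   ErrCode = DSSetParam(hJob, "FILE_NAME", FileList<i>)
   ErrCode = DSRunJob(hJob, DSJ.RUNNORMAL)
   ErrCode = DSWaitForJob(hJob)        ;* finish one file before starting the next
   ErrCode = DSDetachJob(hJob)
Next i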
What happens if the output of a hash file is connected to a transformer? What error does it throw?
If the hash file output is connected to a Transformer stage, the hash file will be considered the lookup file if there is a primary link to the same Transformer stage; if there is no primary link, then it will be treated as the primary link itself. This will not return any error code.

What is 'insert for update' in DataStage?
I think 'insert to update' means the updated value is inserted in order to maintain history.

How can I convert server jobs into parallel jobs?
I have never tried doing this; however, I have some information which will help you save a lot of time. You can do SCD in a server job by using lookup functionality. You can convert your server job into a server shared container; the server shared container can also be used in parallel jobs as a shared container.

Can we use a shared container as a lookup in DataStage server jobs? I am using DataStage 7.5; we can use a shared container more than once in a job. Is there any limit to using it? I ask because in my job I used the shared container at 6

What are the Iconv() and Oconv() functions and where are they used?
Iconv() converts a string into DataStage's internal storage format; Oconv() converts an expression into an output format. Iconv is used to convert a date into the internal format that only DataStage understands: a date coming in as mm/dd/yyyy is converted into a number (an internal day count), and you can then present that number in any format of your own by using Oconv. Suppose you want to change mm/dd/yyyy to dd/mm/yyyy: you use Iconv and Oconv together, e.g. Oconv(Iconv(DateFromInputString, <iconv format - see Help>), <oconv format>).
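As a concrete, hedged illustration of that pattern (the link/column name InLink.OrderDate is invented; the conversion codes are the standard D/MDY and D/DMY date codes):

* External mm/dd/yyyy string -> internal day number:
InternalDate = Iconv("10/25/2006", "D/MDY[2,2,4]")     ;* returns the internal day count DataStage stores
* Internal day number -> dd/mm/yyyy output string:
OutDate = Oconv(InternalDate, "D/DMY[2,2,4]")          ;* "25/10/2006"
* As typically written in a single Transformer derivation:
Oconv(Iconv(InLink.OrderDate, "D/MDY[2,2,4]"), "D/DMY[2,2,4]")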
What is the flow of loading data into fact and dimension tables?
Here is the sequence of loading a data warehouse: 1. The source data is first loaded into the staging area, where data cleansing takes place. 2. The data from the staging area is then loaded into the dimensions/lookups. 3. Finally the fact tables are loaded from the corresponding source tables in the staging area.

DataStage from staging to the MDW is only running at 1 row per second! What do we do to remedy this?
I am assuming that there are too many stages and that at any time only two flows are working; we need to find which flow is causing the problem and provide the solution. In general, if you have too many stages (especially transformers and hash lookups) there will be a lot of overhead and the performance will degrade drastically, so: 1. Ensure that you eliminate data properly between stages so that data volumes do not cause overhead; there might be a reordering of stages needed for good performance. 2. Ensure that you have appropriate indexes while querying. 3. If too many lookups are being done, I would suggest writing a query instead of doing several lookups. It may seem embarrassing to have a tool and still write a query, but that is best at times; if you do not want to write the query, use intermediate stages. Other things in general that could be looked into: 1) for massive transactions, set the hashing size and buffer size to appropriate values to perform as much as possible in memory, so there is no I/O overhead to disk; 2) enable row buffering and set an appropriate size for the row buffer; 3) it is important to use appropriate objects between stages for performance.

What is the difference between a sequential file and a data set? When do we use the Copy stage?
The Sequential File stage stores a small amount of data, with any extension, in order to access the file, whereas a data set is used to store a huge amount of data and it opens only with the extension .ds.
The Copy stage copies a single input data set to a number of output data sets. Each record of the input data set is copied to every output data set. Records can be copied without modification, or you can drop or change the order of columns.

What happens if RCP is disabled?
Runtime column propagation (RCP): if RCP is enabled for a job, meta data is propagated at run time, so there is no need to map it at design time. If RCP is disabled for the job, and specifically for those stages whose output connects to a shared container input, OSH has to perform an import and export every time the job runs, and the processing time of the job is also increased.

What are Routines, where/how are they written, and have you written any routines before?
Routines are stored in the Routines branch of the DataStage Repository, where you can create, view or edit them using the Routine dialog box. You specify the category when you create the routine. The following program components are classified as routines:
• Transform functions. These are functions that you can use when defining custom transforms. DataStage has a number of built-in transform functions, which are located in the Routines ➤ Examples ➤ Functions branch of the Repository. You can also define your own transform functions in the Routine dialog box.
• Before/After subroutines. When designing a job, you can specify a subroutine to run before or after the job, or before or after an active stage. DataStage has a number of built-in before/after subroutines, which are located in the Routines ➤ Built-in ➤ Before/After branch of the Repository. You can also define your own before/after subroutines using the Routine dialog box.
• Custom UniVerse functions. These are specialized BASIC functions that have been defined outside DataStage. Using the Routine dialog box, you can get DataStage to create a wrapper that enables you to call these functions from within DataStage. These functions are stored under the Routines branch in the Repository. If NLS is enabled, be aware of any mapping requirements when using custom UniVerse functions.

How can we call a routine in a DataStage job? Explain with steps.
Routines are used for implementing business logic. They are of two types: 1) before subroutines and 2) after subroutines. Steps: double-click on the Transformer stage, right-click on any one of the mapping fields, select the [DS Routines] option, give the business logic within the edit window, and select either of the options (before/after subroutine).
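For illustration only (this exact routine is not in the original document), a small transform function created through the Routine dialog might look like the following; it takes a single argument, Arg1, and returns its result in Ans:

* Transform function: trim and upper-case a value, defaulting nulls/blanks.
If IsNull(Arg1) Or Trim(Arg1) = "" Then
   Ans = "UNKNOWN"
End Else
   Ans = UpCase(Trim(Arg1))
End

Once compiled, it can be called from a Transformer derivation just like the built-in transform functions.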
What is OCI, and how is it used by ETL tools?
OCI doesn't mean Orabulk data; it is the "Oracle Call Interface" of Oracle being used to load the data. It is the lowest level of Oracle used for loading data.

What are the Orabulk and BCP stages?
ORABULK is used to load bulk data into a single table of a target Oracle database. BCP is used to load bulk data into a single table for Microsoft SQL Server and Sybase.

How can you ETL an Excel file into a data mart?
Open the ODBC Data Source Administrator found in Control Panel/Administrative Tools; under the System DSN tab, add the Microsoft Excel driver. Then you will be able to access the XLS file from DataStage.

What are the types of parallel processing?
Parallel processing is broadly classified into 2 types: a) SMP - Symmetric Multiprocessing; b) MPP - Massively Parallel Processing.

How can we improve the performance of DataStage jobs?
Performance and tuning of DS jobs: 1. Establish baselines. 2. Avoid the use of only one flow for tuning/performance testing. 3. Work in increments. 4. Evaluate data skew. 5. Isolate and solve. 6. Distribute file systems to eliminate bottlenecks. 7. Do not involve the RDBMS in initial testing. 8. Understand and evaluate the tuning knobs available.

What are the different types of lookups in DataStage?
There are two types of lookups: the Lookup stage and the lookup file set. Lookup: a lookup references another stage or database to get data from it and transform it to the other database. Lookup file set: it allows you to create a lookup file set or reference one for a lookup. The stage can have a single input link or a single output link; the output link must be a reference link. The stage can be configured to execute in parallel or sequential mode when used with an input link. When creating lookup file sets, one file is created for each partition; the individual files are referenced by a single descriptor file, which by convention has the suffix .fs.

How can we create containers?
There are two types of containers: 1. Local container 2. Shared container. A local container is available only to that particular job, whereas shared containers can be used anywhere in the project. Local container: Step 1: select the stages required; Step 2: Edit > Construct Container > Local. Shared container: Step 1: select the stages required; Step 2: Edit > Construct Container > Shared. Shared containers are stored in the Shared Containers branch of the tree structure.

How do you populate source files?
There are many ways to populate them; writing a SQL statement in Oracle is one way.

What are the differences between DataStage 7.0 and 7.5 in server jobs?
There are lots of differences: many new stages are available in DS 7.5, for example the CDC stage, the Stored Procedure stage, etc.

Briefly describe the various client components.
There are four client components. DataStage Designer: a design interface used to create DataStage applications (known as jobs). Each job specifies the data sources, the transforms required, and the destination of the data. Jobs are compiled to create executables that are scheduled by the Director and run by the Server.
DataStage Director: a user interface used to validate, schedule, run and monitor DataStage jobs. DataStage Manager: a user interface used to view and edit the contents of the Repository. DataStage Administrator: a user interface used to configure DataStage projects and users.

What are the environment variables in DataStage? Give some examples.
These are variables used at the project or job level. We can use them to configure the job, i.e. to associate the configuration file (without this you cannot run your job), increase the sequential or data set read/write buffer, and so on. Example: $APT_CONFIG_FILE. Like the above we have many environment variables; go to Job Properties and click on "Add Environment Variable" to see most of them.

What are the steps involved in the development of a job in DataStage?
The steps required are: select the data source stage depending upon the source, e.g. flat file, database, XML, etc.; select the stages required for the transformation logic, such as Transformer, Link Collector, Link Partitioner, Aggregator, Merge, etc.; select the final target stage where you want to load the data, whether it is a data warehouse, data mart, ODS, staging area, etc.

What are the types of views in DataStage Director?
There are 3 types of views in DataStage Director: a) Job view - dates of jobs compiled; b) Log view - status of the job's last run; c) Status view - warning messages, event messages, and program-generated messages.

What is the difference between "validated OK" and "compiled" in DataStage?
When we say "validating a job", we are talking about running the job in "check only" mode. The following checks are made:
- Connections are made to the data sources or the data warehouse.
- SQL SELECT statements are prepared.
- Files are opened. Intermediate files in Hashed File, UniVerse, or ODBC stages that use the local data source are created, if they do not already exist.

Is it possible to move data from an Oracle warehouse to a SAP warehouse using the DataStage tool?
We can use the DataStage Extract Pack for SAP R/3 and the DataStage Load Pack for SAP BW to transfer the data from Oracle to the SAP warehouse. These plug-in packs are available with DataStage version 7.5.

What is the purpose of using keys, and what is the difference between surrogate keys and natural keys?
We use keys to provide relationships between the entities (tables). By using primary and foreign key relationships we can maintain the integrity of the data. The natural key is the one coming from the OLTP system; the surrogate key is the artificial key which we create in the target DW. We can use these surrogate keys instead of the natural key. In SCD2 scenarios surrogate keys play a major role.

How does DataStage handle user security?
We have to create users in the Administrator and give the necessary privileges to those users.

Why do you use SQL*Loader or the OCI stage?
When the source data is enormous, or for bulk data, we can use OCI or SQL*Loader depending upon the source.

Where do we use the Link Partitioner in a DataStage job? Explain with an example.
We use the Link Partitioner in DataStage server jobs. The Link Partitioner stage is an active stage which takes one input and allows you to distribute partitioned rows to up to 64 output links.

How do you parameterize a field in a sequential file?
I am using DataStage as the ETL tool and a sequential file as the source. We cannot parameterize a particular field in a sequential file; instead we can parameterize the source file name of a sequential file.
How do you implement Type 2 slowly changing dimensions in DataStage? Explain with an example.
We can handle SCDs in the following ways. Type 1: just use "Insert rows Else Update rows" or "Update rows Else Insert rows" in the update action of the target. Type 2: use the following steps: a) use one hash file to look up the target; b) take 3 instances of the target; c) give different conditions depending on the process; d) give different update actions in the target; e) use system variables like Sysdate and Null.

How do you handle rejected rows in DataStage?
We can handle rejected rows in two ways, with the help of constraints in a Transformer: 1) by writing our constraints in the Rejected cell in the properties of the Transformer, or 2) by using REJECTED in the expression editor of the constraint. Create a hash file as temporary storage for rejected rows, create a link and use it as one of the outputs of the transformer, and apply either of the two steps above on that link. All the rows which are rejected by all the constraints will go to the hash file.

What are the difficulties faced in using DataStage?
1) If the number of lookups is large. 2) If clients want currency as integers in conjunction with characters, like 2m, 3l. 3) What happens if the job aborts while loading the data due to some reason.

Does Enterprise Edition only add parallel processing for better performance? Are any stages/transformations available in the Enterprise Edition only?
• DataStage Standard Edition was previously called DataStage and DataStage Server Edition.
• DataStage Enterprise Edition was originally called Orchestrate, then renamed to Parallel Extender when purchased by Ascential. The Enterprise Edition offers parallel processing features for scalable, high-volume solutions. Designed originally for UNIX, it now supports Windows, Linux and Unix System Services on mainframes.
• DataStage Enterprise: server jobs, parallel jobs, sequence jobs.
• DataStage Enterprise MVS: server jobs, parallel jobs, sequence jobs, MVS jobs. MVS jobs are jobs designed using an alternative set of stages that are generated into COBOL/JCL code and are transferred to a mainframe to be compiled and run. Jobs are developed on a UNIX or Windows server and transferred to the mainframe to be compiled and run.
The first two versions share the same Designer interface but have a different set of design stages depending on the type of job you are working on. Parallel jobs have parallel stages but also accept some server stages via a container. Server jobs only accept server
stages, and MVS jobs only accept MVS stages. There are some stages that are common to all types (such as aggregation), but they tend to have different fields and options within each stage.

If data is partitioned in your job on key 1 and then you aggregate on key 2, what issues could arise?
Data will be partitioned on both the keys; it will hardly take any more time for execution.

Dimension modeling types along with their significance:
1) E-R diagrams; 2) dimensional modeling: a) logical modeling, b) physical modeling.

What is job control? How is it developed? Explain with steps.
Controlling DataStage jobs through some other DataStage jobs. Example: consider two jobs XXX and YYY. The job YYY can be executed from job XXX by using DataStage macros in Routines. To execute one job from the other job, the following steps need to be followed in the routine: 1. Attach the job using the DSAttachJob function. 2. Run the other job using the DSRunJob function. 3. Stop the job using the DSStopJob function.

Is it possible to call one job in another job in server jobs?
I think we can call a job within another job. In fact, "calling" doesn't sound right, because you attach/add the other job through job properties; you can attach zero or more jobs. Steps: Edit --> Job Properties --> Job Control, then click on Add Job and select the desired job.

How do you clean the DataStage repository?
Remove log files periodically.

What is the utility you use to schedule the jobs on a UNIX server other than using Ascential Director?
AUTOSYS: through AutoSys we can automate the job by invoking the shell script written to schedule the DataStage jobs.

Containers: usage and types?
A container is a collection of stages used for the purpose of reusability. There are 2 types of containers: a) Local container: job specific; b) Shared container: used in any job within the project. There are two types of shared container:
1. Server shared container, used in server jobs (it can also be used in parallel jobs). 2. Parallel shared container, used in parallel jobs. You can also include server shared containers in parallel jobs as a way of incorporating server job functionality into a parallel stage (for example, you could use one to make a server plug-in stage available to a parallel job).

Explain the process of taking a backup in DataStage.
Any DataStage objects, including whole projects, can be exported to a file; this exported file can then be imported back into DataStage. Import and export can be used for many purposes, such as: 1) backing up jobs and projects; 2) maintaining different versions of a job or project; 3) moving DataStage jobs from one project to another - just export the objects, move to the other project, and re-import them into the new project; 4) sharing jobs and projects between developers - the export files, when zipped, are small and can easily be emailed from one developer to another.

What are the validations you perform after creating jobs in Designer?
Validation guarantees that the DataStage job will be successful. It carries out the following without actually processing data: 1) connections are made to the sources, which are stored in the Manager repository; 2) it opens the files; 3) it prepares the SQL statements necessary for fetching the data; 4) it makes all connections from source to target so they are ready for data processing; 5) it checks for parameters, data source names, passwords and the like.

How can you implement complex jobs in DataStage?
Complex design means having more joins and more lookups; that job design will be called a complex job. We can easily implement any complex design in DataStage by following simple tips, in terms of increasing performance also. There is no limitation on using stages in a job, but for better performance use at most 20 stages in each job; if it exceeds 20 stages, go for another job. Use not more than 7 lookups for a transformer; otherwise include one more transformer.

What is the meaning of the following?
1) If an input file has an excessive number of rows and can be split up, then use standard logic to run jobs in parallel. Answer: row partitioning and collecting. 2) Tuning should occur on a job-by-job basis; use the power of the DBMS. If you have SMP machines you can use IPC, Link Collector and Link Partitioner for performance tuning; if you have cluster/MPP machines you can use parallel jobs.

What is DS Manager used for?
The Manager is a graphical tool that enables you to view and manage the contents of the DataStage Repository. The main use of its export and import is sharing jobs and projects from one project to another.

How do you handle date conversions in DataStage?
We use a) the "Iconv" function for internal conversion and b) the "Oconv" function for external conversion. For example, the function to convert the mm/dd/yyyy format to mm-dd-yyyy is Oconv(Iconv(Fieldname, "D/MDY[2,2,4]"), "D-MDY[2,2,4]").

What different types of errors have you faced during loading and how did you solve them? How do you fix the error "OCI has fetched truncated data" in DataStage?
Can we use the Change Capture stage to get the truncated data? Members, please confirm.

What is the user variables activity? When and where is it used? Give a real example.
By using the user variables activity we can create some variables in the job sequence; these variables are available to all the activities in that sequence. Most probably this activity is at the start of the job sequence.

What is the importance of the surrogate key in data warehousing?
The concept of the surrogate key comes into play when there is a slowly changing dimension in a table. In such a condition there is a need for a key by which we can identify the changes made in the dimensions.

What are constraints and derivations?
A constraint specifies the condition under which data flows through the output link; constraints are nothing but business rules or logic. Example: Cust_Id<>0 is set as a constraint, and it means that only those records meeting this condition will be processed further. A derivation is a method of deriving the fields; for a simple example, the input column is a derivation that passes its value to the target column.

What is the difference between constraints and derivations?
Constraints are applied to links, whereas derivations are applied to columns. Constraints are used to check for a condition and filter the data; to filter we need to pass constraints. For example, we have to split a customers.txt file into customer address files based on customer country: suppose we want US customer addresses, we need to pass a constraint for US customers, and similarly for Canadian and Australian customers. Derivations specify the expression used to pass values to the target column, for example if you need to get some SUM, AVG, etc.
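To make the distinction concrete, a small hedged sketch of what might be typed into a Transformer (the link and column names here are invented examples):

* Constraint on an output link (a row passes only when the expression is true):
In.Cust_Id <> 0 And In.Country = "US"
* Derivation for a target column (an expression that produces the value):
Trim(In.FirstName) : " " : Trim(In.LastName)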
How can we implement a lookup in DataStage server jobs?
We can use a hash file as a lookup in server jobs. The hash file needs at least one key column to be created.

How can you implement slowly changing dimensions in DataStage?
You can implement Type 1, Type 2 or Type 3. Let me try to explain Type 2 with a timestamp. Step 1: the timestamp is created via a shared container; it returns the system time and one key. For satisfying the lookup condition we create a key column by using the Column Generator. Step 2: our source is a data set and the lookup table is an Oracle OCI stage. By using the Change Capture stage we find out the differences; the Change Capture stage returns a value in change_code, and based on the return value we find out whether the row is an insert or an update. If it is an insert, we write it with the current timestamp and the old timestamp is kept as history.

What is data cleaning? How is it done?
I can simply describe it as purifying the data. Data cleansing: the act of detecting and removing and/or correcting a database's dirty data (i.e. data that is incorrect, out-of-date, redundant, incomplete, or formatted incorrectly).

Why are OLTP database designs not generally a good idea for a data warehouse?
OLTP cannot store historical information about the organization. It is used for storing the details of daily transactions, while a data warehouse is a huge store of historical information obtained from different data marts for making intelligent decisions about the organization. OLTP databases, as the name implies, handle real-time transactions, which inherently have some special requirements.

Summarize the difference between OLTP, ODS and data warehouse.
OLTP means online transaction processing; it is nothing but a database - Oracle, SQL Server and DB2 are OLTP databases. ODS stands for Operational Data Store; it is the final integration point in the ETL process, and we load the data into the ODS before we load the values into the target. A data warehouse is a collection of integrated, non-volatile, time-variant data which is used to take management decisions.

What is the level of granularity of a fact table?
The level of granularity means the level of detail that you put into the fact table in a data warehouse; it would mean what detail you are willing to store for each transactional fact. For example, based on the design you can decide to put the sales data for each transaction - product sales with respect to each minute - or you can aggregate it up to the minute and store that data.
To add on: it also means that we can have (for example) data aggregated for a year for a given product, as well as data that can be drilled down to monthly, weekly and daily levels; the lowest level is known as the grain. Going down to details is granularity, and it also depends on the granularity at which the data is stored.

What is a lookup table?
A lookup table is nothing but a 'lookup': it gives values to a referencing table (it is a reference). It saves joins and space in terms of transformations. For example, a lookup table called STATES can provide the actual state name ('Texas') in place of 'TX' in the output. It is used at run time.

What are SCD1, SCD2 and SCD3?
SCD Type 1: the attribute value is overwritten with the new value, obliterating the historical attribute values. For example, when the product roll-up changes for a given product, the roll-up attribute is merely updated with the current value.
SCD Type 2: a new record with the new attributes is added to the dimension table. Historical fact table rows continue to reference the old dimension key with the old roll-up attribute; going forward, the fact table rows will reference the new surrogate key with the new roll-up, thereby perfectly partitioning history.
SCD Type 3: attributes are added to the dimension table to support two simultaneous roll-ups - perhaps the current product roll-up as well as the "current version minus one", or the current version and the original.

What is real-time data warehousing?
Real-time data warehousing is a combination of two things: 1) real-time activity and 2) data warehousing. Real-time activity is activity that is happening right now; the activity could be anything, such as the sale of widgets. Once the activity is complete, there is data about it. Data warehousing captures business activity data; real-time data warehousing captures business activity data as it occurs. As soon as the business activity is complete and there is data about it, the completed activity data flows into the data warehouse and becomes available instantly. In other words, real-time data warehousing is a framework for deriving information from data as the data becomes available.

Which columns go to the fact table and which columns go to the dimension table?
The aggregated or calculated value columns go to the fact table, and the detailed information goes to the dimension table. Foreign key elements, along with business measures such as sales in $ amount and units (quantity sold), are stored in the fact table; date may be a business measure in some cases.

What is a CUBE in the data warehousing concept?
Cubes are a logical representation of multidimensional data. The edge of the cube contains the dimension members and the body of the cube contains the data values.

What is an ER diagram?
The Entity-Relationship (ER) model was originally proposed by Peter Chen in 1976 [Chen76] as a way to unify the network and relational database views.
What is the main difference between a schema in an RDBMS and schemas in a data warehouse?
RDBMS schema: * used for OLTP systems * traditional and old schema * normalized * difficult to understand and navigate * cannot solve extract and complex problems * poorly modelled.
DWH schema: * used for OLAP systems * new-generation schema * de-normalized * easy to understand and navigate * extract and complex problems can be easily solved * very good model.

What is the need for a surrogate key? Why is the primary key not used as the surrogate key?
A primary key is a natural identifier for an entity; in primary keys all the values are entered manually by the users and are uniquely identified, and there will be no repetition of data. A surrogate key is an artificial identifier for an entity; surrogate key values are generated by the system sequentially (like the Identity property in SQL Server and a Sequence in Oracle), and they do not describe anything. The need for a surrogate key rather than the primary key: if a column is made a primary key and later the data type or length of that column needs to change, then all the foreign keys that depend on that primary key have to be changed, making the database unstable. Surrogate keys make the database more stable because they insulate the primary and foreign key relationships from changes in data types and lengths.

What is a snowflake schema?
Snowflake schemas normalize dimensions to eliminate redundancy; that is, the dimension data is grouped into multiple tables instead of one large table. For example, a product dimension table in a star schema might be normalized into a products table, a product_category table, and a product_manufacturer table in a snowflake schema. While this saves space, it increases the number of dimension tables and requires more foreign key joins. The result is more complex queries and reduced query performance.

INTERVIEW QUESTIONS (GENERAL)
What is a WH schema like the "star schema" or "snowflake schema", and what are their advantages/disadvantages under different conditions? How do you design an optimized data warehouse, both from a data upload and a query performance point of view? What exactly are parallel processing and partitioning, and how can they be employed to optimize the data warehouse design? What are the preferred indexes and constraints for a DWH?
How will the volume of data (from medium to very high) and the frequency of querying affect the design considerations? Why a data warehouse? What is the difference between OLTP and OLAP? What are the features of a DWH? Do you know some more ETL tools? What is the use of the staging area? Do you know the life cycle of a WH? Did you hear about star schemas? Tell me about yourself. How many dimensions and facts are there in your project? What is a dimension? What is the difference between a DWH and a data mart?

1. How can you explain a DWH to a layman? 2. What are MOLAP and ROLAP? What is the difference between them? 3. What are the different schemas used in a DWH? Which one is most commonly used? 4. What is a snowflake schema? Explain. 5. What is a star schema? Explain. 6. How do you decide that you have to go for warehousing in the requirement study? 7. What are all the questions you put to your client when you are designing a DWH?

Project: project description and all.
Oracle: how many types of indexes are there; which indexes are used in a warehouse; what is the difference between TRUNCATE and DELETE TABLE; how do you optimize a query; read optimization in Oracle.

• Why should you do indexing first of all?
• What are indexes and what are the different types?
• What sort of indexing is done in facts and why?
• What sort of indexing is done in dimensions and why?
• What are materialized views?
• How can you explain a data warehouse to a layman?
• What is a data warehouse?
• What is the difference between OLTP and OLAP?
• As a DBA, what differences do you think a DWH architecture should have, or what are the parameters that you are concerned about when taking a DWH into account?
• What sort of normalization will you have on dimensions and facts?
• Do you mean to say that if we increase the system resources like RAM, hard disk and processor, we can as well make an OLTP system behave as a DWH?
• What is a Star schema?
• What is a Snowflake schema?
• What is the difference between those two? Which one would you choose and why?
• What are dimensions?
• What are facts?
• What are SCDs and what are the different types? Which of the types have you used in your project?
• What is a cube? Can cubes exist independently? Can cubes be sources to another application?
• What is Slice and Dice?
• What is a date dimension? How many maximum number of rows can a date dimension have?
• How have you handled sessions? How can you handle multiple sessions and batches?
• What is your idea of an Authenticator?
• How can you create a catalog and what are its types?
• What is Power Play Transformer? What is Power Play Administrator? (Same as above question.)
• What are hot files? What are snapshots?
• What is CDC?
• What are Informatica PowerCenter's capabilities?
• On what platform was your Informatica server?
• How many mappings were there in your projects?
• How many transformers have you used? How many transformers did your biggest mapping have?
• What are the different types of transformers? What are active and passive transformers?
• What is the difference between Source and Joiner transformers? Why is a source used and why is a joiner used? What is the difference?
• How have you done reporting?
• Which was your source and which was your target platform?
• What is your role in the projects that you have done?

I-GATE INTERVIEW QUESTIONS
1] What is the difference between snowflake and star schemas?
2] How will you come to know that you have to do performance tuning?
3] Describe your project.
4] How many dimensions and facts are in your project?
5] Draw SCD Type 1 and SCD Type 2.
6] If you are getting timestamp data for the target and you have one port in the target with a date data type, how will you load it?
7] What are the different types of lookups?
8] What condition will you give in the Update Strategy transformation for SCD Type 1?
9] What are the different types of variables in the Update Strategy transformation?
10] What are target-based commit and source-based commit?
11] Why do you think SCD Type 2 is critical?
12] What are the types of facts?
13] What is a factless fact?
14] If I am returning one port through a connected lookup, then why do you need an unconnected lookup?
15] If duplicate rows are coming from a flat file, how will you remove them using Informatica?
16] If duplicate rows are coming from a relational table, how will you remove them using Informatica?
17] If I do not give the group-by option in the Aggregator transformation, what will be the result?
18] What is multidimensional analysis?
19] If I give all the characteristics of a data warehouse to an OLTP system, will it be a data warehouse?
20] What are the characteristics of a data warehouse?
21] What is the break-up of your team?
22] How will you do performance tuning in a mapping?
23] Which is better for performance, a static or a dynamic cache?
24] What is target load order?
25] What are the transformations you have worked on?
26] What is the naming convention you are using?
27] How are you getting data from the client?
28] How will you convert rows into columns and columns into rows using Informatica?
29] How will you enable a test load?
30] Did you work with connected and unconnected lookups? Tell the difference.
31] Did you ever use the Normalizer?

Here are some questions I had faced during interviews.
• How can you explain a data warehouse to a layman?
• What is a data warehouse?
• What is the difference between OLTP and OLAP?
• As a DBA, what differences do you think a DWH architecture should have, or what are the parameters that you are concerned about when taking a DWH into account?
• Do you mean to say that if we increase the system resources like RAM, hard disk and processor, we can as well make an OLTP system behave as a DWH?
• What sort of normalization will you have on dimensions and facts?
• What are indexes and what are the different types?
• What sort of indexing is done in facts and why?
• What sort of indexing is done in dimensions and why?
• What are materialized views?
• Why should you do indexing first of all?
• What is a Star schema?
• What is a Snowflake schema?
• What is the difference between those two? Which one would you choose and why?
• What sort of indexing is done in facts and why? What sort of indexing is done in dimensions and why?
• What sort of normalization will you have on dimensions and facts?
• What are materialized views?
• What are dimensions? What are facts?
• What are SCDs and what are the different types? Which of the types have you used in your project?
• What is a cube? Can cubes exist independently? Can cubes be sources to another application?
• What is Slice and Dice?
• What is a date dimension? How many maximum number of rows can a date dimension have?
• How have you handled sessions? How can you handle multiple sessions and batches?
• What is your idea of an Authenticator?
• How can you create a catalog and what are its types?
• What is Power Play Transformer? What is Power Play Administrator? (Same as above question.)
• What are hot files? What are snapshots?
• What is CDC?
• What are Informatica PowerCenter's capabilities?
• On what platform was your Informatica server?
• How many mappings were there in your projects?
• How many transformers have you used? How many transformers did your biggest mapping have?
• What are the different types of transformers? What are active and passive transformers?
• What is the difference between Source and Joiner transformers? Why is a source used and why is a joiner used? What is the difference?
• How have you done reporting?
• Which was your source and which was your target platform?
• What is your role in the projects that you have done?