
February 2013 (Spring Drive) - Bachelor of Computer Application (BCA) - Semester 6 - BC0058 Data Warehousing - 4 Credits

Q1. With a necessary diagram, explain the Data Warehouse Development Life Cycle.

Ans. The Data Warehouse development life cycle covers two vital areas: warehouse management and data management. The former deals with defining the project activities and gathering the requirements; the latter covers the remaining stages, from modelling the warehouse through to implementation. The life cycle proceeds through the following stages, as shown in the diagram:

Figure: Data Warehouse Development Life Cycle
Define the Project → Gather Requirements → Model the Warehouse → Validate the Model → Design the Warehouse → Validate the Design → Implementation

Q2. What is Metadata? What is its use in Data Warehouse Architecture?

Ans. Metadata in a Data Warehouse is similar to the data dictionary or the data catalog in a Database Management System. In the data dictionary you keep information about the logical data structures, about the files and their addresses, about the indexes, and so on; in other words, the data dictionary contains data about the data in the database. This is the commonly used definition, but for the Data Warehouse it needs to be elaborated: here metadata also describes the source systems, the extraction and transformation rules, the load schedules, and the structure of the dimension and fact tables, and it helps administrators manage the warehouse and helps end users locate and interpret its contents.
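Purely as an illustration (not part of the original answer; all field names below are hypothetical), a single metadata entry for one warehouse column might record where the data came from, how it was transformed and when it is loaded:

# Hypothetical metadata record for one warehouse column: "data about the data".
metadata_entry = {
    "target_table": "fact_sales",
    "target_column": "sales_amount",
    "source_system": "retail_oltp",
    "source_table": "orders",
    "source_column": "order_total",
    "transformation": "converted to INR and rounded to 2 decimals",
    "load_frequency": "daily",
}

# Administrators or end users can query such entries to find and interpret data.
print(f"{metadata_entry['target_table']}.{metadata_entry['target_column']} "
      f"comes from {metadata_entry['source_system']}.{metadata_entry['source_table']}")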

Q3. Write briefly about any four ETL tools. What is transformation? Briefly explain the basic transformation types.

Ans. ETL Tools: Although an ETL process can be created using almost any programming language, creating one from scratch is quite complex. Increasingly, companies are buying ETL tools to help in the creation of ETL processes. An ETL tool must be able to communicate with the many different relational databases and read the various file formats used throughout an organization. Many ETL vendors now have data profiling, data quality and metadata capabilities. Four ETL tools are:
1. Microsoft DTS
2. Pervasive Data Junction
3. Hummingbird Genio
4. Clover ETL

Transformation: Data transformations are often the most complex and, in terms of processing time, the most costly part of the ETL process. They can range from simple data conversions to extremely complex data scrubbing techniques.

Basic Tasks in Data Transformation: Data transformation contains the following basic tasks:
Selection: This task takes place at the beginning of the whole process of data transformation. The task of selection usually forms part of the extraction function itself.
Splitting / Joining: This task includes the types of data manipulation you need to perform on the selected parts of source records. Joining of parts selected from many source systems is more widespread in the Data Warehouse environment.
Conversion: This is an all-inclusive task. It includes a large variety of rudimentary conversions of single fields for two primary reasons: one, to standardize among the data extracted from disparate source systems, and the other, to make the fields usable and understandable to the users.
Summarization: It is not feasible to keep data at the lowest level of detail in your Data Warehouse. It may be that none of your users ever need data at the lowest granularity for analysis or querying. For example, for a grocery chain, sales data at the lowest level of detail for every transaction at the checkout may not be needed; storing sales by product by store by day in the Data Warehouse may be quite adequate. So, in this case, the data transformation function includes summarization of daily sales by product and by store (a minimal sketch of this kind of summarization appears at the end of this answer).
Enrichment: This task is the rearrangement and simplification of individual fields to make them more useful for the Data Warehouse environment. You may use one or more fields from the same input record to create a better view of the data for the Data Warehouse. This principle is extended when one or more fields originate from multiple records, resulting in a single field for the Data Warehouse.
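To make the summarization task concrete, here is a minimal sketch in Python (illustrative only; the record layout and field names are assumptions) that rolls detailed checkout transactions up to sales by store, by product and by day:

from collections import defaultdict
from datetime import date

# Hypothetical detail records: one row per checkout transaction.
transactions = [
    {"store": "S01", "product": "P100", "day": date(2013, 2, 1), "amount": 25.50},
    {"store": "S01", "product": "P100", "day": date(2013, 2, 1), "amount": 12.00},
    {"store": "S02", "product": "P200", "day": date(2013, 2, 1), "amount": 7.75},
]

# Summarization: roll the transactions up to sales by store, by product, by day.
summary = defaultdict(float)
for t in transactions:
    summary[(t["store"], t["product"], t["day"])] += t["amount"]

for (store, product, day), total in sorted(summary.items()):
    print(store, product, day.isoformat(), round(total, 2))

The warehouse then stores only these summarized rows instead of every checkout transaction.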

Q4. What are ROLAP, MOLAP and HOLAP? What is Multidimensional Analysis? How do we achieve it?

Ans. ROLAP: These are intermediate servers that stand between a relational back-end server and the client front-end tools. ROLAP servers include optimization for each DBMS back end, implementation of aggregation navigation logic, and additional tools and services. ROLAP technology tends to have greater scalability than MOLAP technology. The DSS server of MicroStrategy, for example, adopts the ROLAP approach.
MOLAP: These servers support multidimensional views of data through array-based multidimensional storage engines. They map multidimensional views directly to data cube array structures. The advantage of using a data cube is that it allows fast indexing to precomputed summarized data. Many MOLAP servers adopt a two-level storage representation to handle dense and sparse data sets: denser sub-cubes are identified and stored as array structures, whereas sparse sub-cubes employ compression technology for efficient storage utilization.
HOLAP: The hybrid OLAP approach combines ROLAP and MOLAP technology, benefiting from the greater scalability of ROLAP and the faster computation of MOLAP. For example, a HOLAP server may allow large volumes of detail data to be stored in a relational database, while aggregations are kept in a separate MOLAP store. Microsoft SQL Server 2000 supports a hybrid OLAP server.
Multidimensional analysis means analysing a measure such as sales across several dimensions at once, for example by product, by location and by time. It is achieved by organizing the data as a cube on one of the servers described above and applying operations such as roll-up, drill-down, slice and dice; a small illustration follows.
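To illustrate multidimensional analysis (a minimal sketch with made-up data, not tied to any particular OLAP server), the fact rows below carry a sales measure along product, city and month dimensions; a roll-up drops the city dimension and a slice fixes one month:

from collections import defaultdict

# Hypothetical fact rows: sales measured along product, city and month dimensions.
facts = [
    {"product": "P100", "city": "Mumbai", "month": "2013-01", "sales": 100.0},
    {"product": "P100", "city": "Delhi",  "month": "2013-01", "sales": 80.0},
    {"product": "P200", "city": "Mumbai", "month": "2013-02", "sales": 60.0},
]

def roll_up(rows, dims):
    """Aggregate sales over the chosen dimensions (the other dimensions are dropped)."""
    totals = defaultdict(float)
    for r in rows:
        totals[tuple(r[d] for d in dims)] += r["sales"]
    return dict(totals)

# Roll-up: total sales by product and month (the city dimension is dropped).
print(roll_up(facts, ("product", "month")))

# Slice: fix the month dimension to a single value, then analyse the remaining dimensions.
january = [r for r in facts if r["month"] == "2013-01"]
print(roll_up(january, ("product", "city")))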

Q5. Explain the testing process for a Data Warehouse with a necessary diagram.

Ans. Testing Process for Data Warehouse
Requirements Testing: The main aim of requirements testing is to check the stated requirements for completeness. Requirements can be tested on the following factors:
1. Are the requirements Complete?
2. Are the requirements Singular?
3. Are the requirements Ambiguous?
4. Are the requirements Developable?
5. Are the requirements Testable?
Successful requirements are those that are structured closely around business rules and that address functionality and performance. These business rules and requirements provide a solid foundation for the data architects. Using the defined requirements and business rules, a high-level design of the data model is created.

Figure: Process for Data Warehouse Testing. The diagram shows the flow from the business requirements and the high level design through Requirements Testing (the QA team reviews the BRD for completeness), Review of HLD, Test Case Preparation (the QA team builds the test plan and develops the test cases and SQL queries), Test Execution (Unit Testing, Functional Testing, Regression Testing, Performance Testing), and finally User Acceptance Testing (UAT).

Unit Testing: Unit testing for a Data Warehouse is WHITE BOX testing. It should check the ETL procedures/mappings/jobs and the reports developed. The developers usually do this. Unit testing involves the following:
1. Whether the ETLs are accessing and picking up the right data from the right source.
2. Testing the rejected records that don't fulfil the transformation rules.
3. Checking the source system connectivity.
4. Extracting the right data.
5. Security permissions need to be checked.
Regression Testing: Regression testing is the revalidation of existing functionality with each new release of code. When building test cases, remember that they will likely be executed multiple times as new releases are created due to defect fixes, enhancements or upstream system changes. A simple but effective and efficient strategy to retest basic functionality is to store the source data sets and results from successful runs of the code and compare new test results with previous runs. When doing a regression test, it is much quicker to compare results to a previous execution than to do an entire data validation again (a minimal sketch of such a comparison appears below).
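The following minimal sketch (an illustration, not part of the original answer; the rows are made up) compares the results of a new test run against the stored baseline by treating each result set as a set of rows:

# Hypothetical result rows from a validation query: (store, product, day, total_sales).
baseline = {                                  # results saved from a previous successful run
    ("S01", "P100", "2013-02-01", 37.50),
    ("S02", "P200", "2013-02-01", 7.75),
}
current = {                                   # results produced by the new release of the code
    ("S01", "P100", "2013-02-01", 37.50),
    ("S02", "P200", "2013-02-01", 9.75),      # this value changed in the new release
}

missing = baseline - current                  # baseline rows that no longer appear (or changed)
unexpected = current - baseline               # rows that are new or changed in the new run

if not missing and not unexpected:
    print("Regression check passed: results match the baseline run.")
else:
    print(f"Regression check failed: {len(missing)} baseline rows missing, "
          f"{len(unexpected)} unexpected rows.")

Comparing the sets this way reports both rows that disappeared and rows that appeared unexpectedly, without repeating the full data validation.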

Integration Testing: Integration testing shows how the application fits into the overall flow of all upstream and downstream applications. Integration testing involves the following:
1. Sequence of ETL jobs in a batch.
2. Dependency and sequencing.
3. Job re-startability.
4. Initial loading of records into the Data Warehouse.
5. Error log generation.
Scenarios to be covered in Integration Testing: Integration testing would cover end-to-end testing for the DWH. The coverage of the tests would include the following:
1. Count Validation - Record count verification of DWH back-end/reporting queries against source and target as an initial check (a minimal sketch of this check appears after this list).
2. Source Isolation - Validation after isolating the driving sources.
3. Dimensional Analysis - Data integrity between the various source tables and relationships.
4. Statistical Analysis - Validation of various calculations.
5. Data Quality Validation - Check for missing data, negatives and consistency. Field-by-field data verification can be done to check the consistency of source and target data.
6. Granularity - Validate at the lowest granular level possible.
7. Other Validations - Graphs, slice/dice, meaningfulness, accuracy.
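As an illustration of the count validation in item 1 above, the sketch below (hypothetical table names; in-memory SQLite databases stand in for the real source and target systems) runs the same count against source and target and flags a mismatch:

import sqlite3

def row_count(conn, table):
    """Return the number of rows in the given table."""
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]

# In-memory stand-ins for the source system and the warehouse target.
source = sqlite3.connect(":memory:")
target = sqlite3.connect(":memory:")

source.execute("CREATE TABLE orders (order_id INTEGER, amount REAL)")
source.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 10.0), (2, 20.0), (3, 30.0)])

target.execute("CREATE TABLE fact_orders (order_id INTEGER, amount REAL)")
target.executemany("INSERT INTO fact_orders VALUES (?, ?)", [(1, 10.0), (2, 20.0)])

src_count = row_count(source, "orders")
tgt_count = row_count(target, "fact_orders")
status = "OK" if src_count == tgt_count else "MISMATCH"
print(f"source count = {src_count}, target count = {tgt_count} -> {status}")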

User Acceptance Testing: User acceptance testing typically focuses on the data loaded into the Data Warehouse and on any views that have been created on top of the tables, not on the mechanics of how the ETL application works. Use data that is either from production or as near to production data as possible. Test database views by comparing the view contents to what is expected. Plan for the system test team to support users during UAT.

Q6. What is testing? Differentiate between Data Warehouse testing and traditional software testing.

Ans. Testing for a Data Warehouse is quite different from testing the development of OLTP systems. The main areas of testing for OLTP include testing user input for valid data types, edge values, etc. Testing for a Data Warehouse, on the other hand, cannot and should not duplicate all of the error checks done in the source system; Data Warehouse implementations must largely take in what the OLTP system has produced. Testing for a Data Warehouse falls into three general categories: testing the ETL process, testing that the reports and other artifacts in the Data Warehouse provide correct answers, and testing that the performance of all the Data Warehouse components is acceptable.
Here are some main areas of testing that should be done for the ETL process:
o Making sure that all the records in the source system that should be brought into the Data Warehouse actually are extracted into the Data Warehouse: no more, no less.
o Making sure that all of the components of the ETL process complete successfully.
o All of the extracted source data is correctly transformed into dimension tables and fact tables.
o All of the extracted and transformed data is successfully loaded into the Data Warehouse.
A Data Warehouse load is system triggered, whereas an OLTP transaction is user triggered.
Volumes of test data:
o The test data in a transaction system is a very small sample of the overall production data. Typically, to keep matters simple, we include as many test cases as are needed to comprehensively cover all possible test scenarios, within a limited set of test data.
o A Data Warehouse typically has large test data, as one tries to fill up the maximum possible combinations and permutations of dimensions and facts.
o For example, if you are testing the location dimension, you would like the location-wise sales revenue report to have some revenue figures for most of the 90 cities and the 34 states. This would mean that you have to have thousands of sales transaction records at the sales office level (a small sketch of such a coverage check follows).
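As a sketch of the coverage idea in the last point (the city codes and sales rows below are made up for illustration), the test data can be checked to see how many members of the location dimension actually carry sales figures:

# The dimension's 90 cities (hypothetical codes) and a tiny illustrative test data set.
dimension_cities = {f"CITY_{i:02d}" for i in range(1, 91)}

test_sales = [
    {"city": "CITY_01", "amount": 120.00},
    {"city": "CITY_02", "amount": 75.50},
    {"city": "CITY_07", "amount": 33.00},
    # ... a realistic warehouse test set would contain thousands of such rows
]

covered = {row["city"] for row in test_sales} & dimension_cities
coverage = len(covered) / len(dimension_cities)
print(f"{len(covered)} of {len(dimension_cities)} cities have sales in the test data "
      f"({coverage:.0%} coverage)")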
