
Bachelor of Computer Application (BCA) – Semester 6 BC0058 – Data Warehousing

1. With a necessary diagram, explain the Data Warehouse Development Life Cycle.
Ans:

The data warehouse life cycle covers two vital areas: warehouse management and data management. The former deals with defining the project activities and gathering requirements. The life cycle of data warehouse development is:

Define the Project -> Gather Requirements -> Model the Warehouse -> Validate the Model -> Design the Warehouse -> Validate the Design


Life cycle steps of a data warehouse: managing a data warehouse project is an ongoing activity; it is not like a traditional system project. Unlike a transaction processing system, which focuses on automating a process and making it faster and more efficient, the data warehouse is concerned with the execution of the warehousing process and with the data itself.

2. What is metadata? What is its use in data warehouse architecture?
Ans:

Metadata in a data warehouse is similar to the data dictionary or the data catalog in a database management system. The data dictionary contains data about the data in the database: in the data dictionary you keep the information about the logical data structures, the information about the files and addresses, the information about the indexes, and so on.

The logical metadata repository can be centralized or distributed, depending on the business needs and organizational requirements. It would most likely be a group of data stores consisting of objects and relational data. The active metadata manager would be the core component of the metadata architecture and would ideally consist of the following components:
- Meta Data Capture: initial capture of metadata from a variety of sources.
- Meta Data Synchronizer: processes to keep the metadata up to date.
- Meta Data Search Engine: the front end for users to search and access metadata.
- Meta Data Results Manager: processes the results of a metadata search and allows the user to make an appropriate selection.
- Meta Data Query Trigger: triggers an appropriate query tool to get data from a data warehouse or any other source, based on the selection made by the user in the Meta Data Results Manager.
- Meta Data Alerter: as part of a push technology, notifies the subscribers about any new changes to the metadata contents, depending on the user profile.

3. What is transformation? Briefly explain the basic transformation types. Write briefly about any four ETL tools.
Ans:

An ETL process can be created using almost any programming language, but creating one from scratch is quite complex, so companies are buying ETL tools to help in the creation of ETL processes.
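As an illustration of "data about data", a metadata repository can be sketched as a simple catalog. This is a minimal sketch only; all table, column, and source names below are hypothetical examples, not part of any real system.

```python
# Minimal sketch of a metadata repository: a catalog of "data about data".
# All table, column, and source names are hypothetical examples.

metadata_repo = {
    "sales_fact": {
        "source": "orders_oltp.order_lines",
        "load_frequency": "daily",
        "columns": {
            "sale_amount": {"type": "DECIMAL(12,2)", "description": "Gross sale amount"},
            "product_id":  {"type": "INTEGER",       "description": "FK to product_dim"},
        },
    },
}

def describe(table: str) -> str:
    """Return a human-readable summary of a table's metadata."""
    entry = metadata_repo[table]
    cols = ", ".join(entry["columns"])
    return f"{table} (from {entry['source']}, loaded {entry['load_frequency']}): {cols}"

print(describe("sales_fact"))
```

A real repository would of course be a set of data stores rather than an in-memory dictionary, but the structure, tables described by their sources, load schedules, and column definitions, is the same idea.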

A good ETL tool must be able to communicate with the many different relational databases and read the various file formats used throughout an organization. ETL tools have started to migrate into enterprise application integration, or even enterprise service bus, systems that now cover much more than just the extraction, transformation, and loading of data. Many ETL vendors now have data profiling, data quality, and metadata capabilities. Some ETL tools are:
- PL/SQL
- SAS Data Integrator / SAS Integration Studio
- Ascential DataStage
- Cognos DecisionStream
- Microsoft DTS
- Business Objects Data Integrator

Transformation: data transformations are often the most complex and, in terms of processing time, the most costly part of the ETL process. They can range from simple data conversions to extremely complex data scrubbing techniques. The most common transformation types are:
- Format revisions: a common type of data transformation. These revisions include changes to the data types and lengths of individual fields. In your source system, product package types may be indicated by codes and names in which the fields are numeric and text data types, and the lengths of the package types may vary among the different source systems.
- Decoding of fields: when you deal with multiple source systems, you are bound to have the same data items described by a plethora of field values, and you will come across these quite often. For example, the coding for gender, with one source system using 1 and 2 for male and female and another system using M and F.
- Calculated and derived values: the extracted data from the sales system contains sales amounts, sales units, and operating cost estimates by product. You will have to calculate the total cost and the profit margin before the data can be stored in the data warehouse.
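Two of the transformation types above, decoding of fields and calculated/derived values, can be sketched in a few lines. The field names and the gender code mapping are illustrative assumptions, not a prescribed implementation.

```python
# Sketch of two transformation types: decoding of fields (standardizing
# gender codes from different source systems) and calculated/derived
# values (computing profit from sales and cost). Field names are made up.

GENDER_DECODE = {"1": "M", "2": "F", "M": "M", "F": "F"}

def transform(record: dict) -> dict:
    out = dict(record)
    # Decoding of fields: map each source system's code to one standard.
    out["gender"] = GENDER_DECODE[str(record["gender"])]
    # Calculated and derived values: profit = sales amount - operating cost.
    out["profit"] = record["sale_amount"] - record["operating_cost"]
    return out

row = {"gender": "1", "sale_amount": 120.0, "operating_cost": 90.0}
print(transform(row))  # gender decoded to "M", profit 30.0
```

The point of the decode table is that whichever source system a record came from, the warehouse stores one agreed representation.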

- Splitting of a single field: earlier legacy systems stored the names and addresses of customers and employees in large text fields; for example, the first name, middle initial, and last name were stored as one large text value in a single field.
- Merging of information: this type of data transformation does not literally mean the merging of several fields to create a single field of data.
- Character set conversion: this type of transformation relates to the conversion of character sets to an agreed standard character set for textual data in the data warehouse. If you have mainframe legacy systems as source systems, the source data from these systems will be in EBCDIC characters.
- Conversion of units of measurement.
- Date/time conversion.
- Summarization.
- De-duplication.

4. What are ROLAP, MOLAP, and HOLAP? What is multidimensional analysis? How do we achieve it?
Ans:

ROLAP (Relational Online Analytical Processing): these are intermediate servers that stand between a relational back-end server and client front-end tools. They use a relational or extended-relational DBMS to store and manage warehouse data, and OLAP middleware to support missing pieces. ROLAP servers include optimization for each DBMS back end, implementation of aggregation navigation logic, and additional tools and services. ROLAP technology tends to have greater scalability than MOLAP technology.

MOLAP (Multidimensional Online Analytical Processing): these servers support multidimensional views of data through array-based multidimensional storage engines. They map multidimensional views directly to data cube array structures. The advantage of using a data cube is that it allows fast indexing to precomputed summarized data. Notice that with multidimensional data stores, storage utilization may be low if the data set is sparse.

HOLAP (Hybrid Online Analytical Processing): the hybrid OLAP approach combines ROLAP and MOLAP technology, benefiting from the greater scalability of ROLAP and the faster computation of MOLAP. A HOLAP server may allow large volumes of detail data to be stored in a relational database, while aggregations are kept in a separate MOLAP store.
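The "splitting of a single field" transformation listed earlier can be sketched as follows. The "First Middle Last" name format and the output field names are assumptions for illustration; real name parsing is messier.

```python
# Sketch of the "splitting of a single field" transformation: legacy
# systems often stored a full name in one text column, while the
# warehouse wants separate name components. Format assumed for the sketch.

def split_name(full_name: str) -> dict:
    """Split 'First M Last' into components; middle initial is optional."""
    parts = full_name.split()
    if len(parts) == 3:
        first, middle, last = parts
    else:
        first, last = parts[0], parts[-1]
        middle = ""
    return {"first_name": first, "middle_initial": middle, "last_name": last}

print(split_name("John Q Public"))
```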

Multidimensional analysis: multidimensional analysis is a data analysis process that groups data into two or more categories: data dimensions and measurements. Strictly speaking, two- and higher-dimensional data sets are multidimensional, but the term tends to be applied only to data sets with three or more dimensions. For example, a data set consisting of the number of wins for a single football team at each of several years is a single-dimensional (in this case, longitudinal) data set. A data set consisting of the number of wins for several football teams in a single year is also a single-dimensional (in this case, cross-sectional) data set. A data set consisting of the number of wins for several football teams over several years is a two-dimensional data set. Two-dimensional data sets are also called panel data.

5. Explain the testing process for a data warehouse with a necessary diagram.
Ans:

The testing process for a data warehouse is:

Requirement testing: the main aim of requirement testing is to check the stated requirements for completeness. Successful requirements are those that are structured closely to business rules and address functionality and performance. The requirements are mostly around reporting, so it becomes important to verify whether these reporting requirements can be met using the data available. These business rules and requirements provide a solid foundation to the data architects: using the defined requirements and business rules, a high-level design of the data model is created.

Unit testing: unit testing for a data warehouse is white-box testing. It should check the ETL procedures and the reports developed. The developers usually do this. Unit testing involves:
- Checking the source system connectivity.
- Verifying whether the ETL jobs are accessing and picking up the right data from the right source, and extracting the right data.
- Verifying that all the data transformations are correct according to the business rules and that the data warehouse is correctly populated with the transformed data.
- Testing the rejected records that don't fulfil the transformation rules.
- Error log generation.

System testing: system testing includes testing only within the ETL application. The endpoints for system testing are the input and output of the ETL code being tested. The process also covers regression testing and checking of security permissions.
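The football-wins example from the multidimensional analysis discussion in question 4 can be written out as small data sets, with a simple roll-up along the year dimension. The team names and win counts are made up for illustration.

```python
# The football-wins example as data sets of increasing dimensionality.
# Team names and numbers are invented for illustration.

# One team over several years: single-dimensional (longitudinal).
longitudinal = {2019: 8, 2020: 11, 2021: 9}

# Several teams in one year: single-dimensional (cross-sectional).
cross_sectional = {"Lions": 8, "Bears": 6, "Hawks": 12}

# Several teams over several years: two-dimensional (panel data).
panel = {
    ("Lions", 2019): 8, ("Lions", 2020): 11,
    ("Bears", 2019): 6, ("Bears", 2020): 7,
}

# A simple "roll-up" along the year dimension: total wins per team.
def rollup_by_team(cube: dict) -> dict:
    totals = {}
    for (team, _year), wins in cube.items():
        totals[team] = totals.get(team, 0) + wins
    return totals

print(rollup_by_team(panel))  # {'Lions': 19, 'Bears': 13}
```

Rolling up (aggregating away) one dimension like this is the basic operation behind the precomputed summaries that MOLAP cubes store.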

Integration testing: integration testing shows how the application fits into the overall flow of all upstream and downstream applications. When creating integration test scenarios, consider how the overall process can break, and focus on the touch points between applications rather than within one application. Integration testing will involve the following:
- The sequence of ETL jobs in the batch.
- Initial loading of records, then loading at a later date to verify the newly inserted or updated data.
- Testing the rejected records that don't fulfil the transformation rules.
- Error log generation.
- Dependency and sequencing.
- Job restart ability.

The overall test flow: the QA team reviews the BRD for completeness and reviews the high-level design (requirements testing); it then builds a test plan and develops test cases and SQL queries (test case preparation); test execution covers unit testing, functional testing, regression testing, and performance testing, and is followed by user acceptance testing (UAT).
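The batch-sequencing and job-restart points above can be sketched as a tiny job runner that checkpoints completed jobs and skips them on restart. The job names and checkpoint mechanism are illustrative assumptions, not a real scheduler.

```python
# Sketch of "sequence of ETL jobs in batch" and "job restart ability":
# run jobs in order, record each completed step, and on restart skip
# whatever already finished. Job names are illustrative.

def run_batch(jobs, completed=None):
    """Run (name, callable) jobs in sequence, skipping any in `completed`."""
    completed = list(completed or [])
    for name, job in jobs:
        if name in completed:
            continue  # restart ability: do not rerun finished jobs
        job()
        completed.append(name)  # checkpoint after each success
    return completed

log = []
jobs = [
    ("extract", lambda: log.append("extract")),
    ("transform", lambda: log.append("transform")),
    ("load", lambda: log.append("load")),
]

# Suppose a previous run failed after "extract"; restart from the checkpoint:
state = run_batch(jobs, completed=["extract"])
print(log)    # ['transform', 'load'] -- extract was not rerun
print(state)  # ['extract', 'transform', 'load']
```

An integration test for restart ability would do exactly this: kill the batch mid-run, restart it, and verify that completed jobs are not repeated and the final state is correct.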

User acceptance testing (UAT): the main reason for building a data warehouse application is to make data available to business users. Users know the data best, and their participation in the testing effort is a key component of the success of a data warehouse implementation. User acceptance testing typically focuses on the data loaded into the data warehouse and any views that have been created on top of the tables: the user cares about the data, not the mechanics of how the ETL application works. The user will likely have questions about how the data is populated and will need to understand the details of how the ETL works. It is important that users sign off and clearly understand how the views are created. During UAT:
- Use data that is either from production or as near to production data as possible.
- Test database views, comparing view contents to what is expected.
- Plan for the system test team to support the users during UAT.
- Consider how the user would require the data loaded during UAT, and negotiate how often the data will be refreshed.

6. What is testing? Differentiate between data warehouse testing and traditional software testing.
Ans:

Testing for a data warehouse is quite different from testing the development of OLTP systems. The main areas of testing for OLTP include testing user input for valid data types and edge values, such as making sure postal codes are associated with the correct city and state. Data warehouse implementations, on the other hand, must pretty much take in what the OLTP system has produced; they cannot and should not duplicate all of the error checks done in the source system, even though there are some data quality improvements that are practical to do. Testing for a data warehouse falls into three general categories: testing the ETL; testing that reports and other artifacts in the data warehouse provide correct answers; and, lastly, testing that the performance of all the data warehouse components is acceptable.

Here are some of the main areas of testing that should be done for the ETL process:
- Making sure that all the records in the source system that should be brought into the data warehouse actually are extracted into the data warehouse: no more, no less.
- Making sure that all of the components of the ETL process complete successfully.
- Making sure that all of the extracted source data is correctly transformed into dimension tables and fact tables.
- Making sure that all of the extracted and transformed data is successfully loaded into the data warehouse.