
Syllabus

Unit-I: Introduction to Data Warehousing | Data Warehousing Design Consideration and Dimensional Modeling
Unit-II: An Introduction to Oracle Warehouse Builder | Defining and Importing Source Data Structures
Unit-III: Designing the Target Structure | Creating the Target Structure in OWB
Unit-IV: Extract, Transform, and Load Basics | Designing and Building an ETL Mapping
Unit-V: ETL: Transformations and Other Operators | Validating, Generating, Deploying, and Executing Objects
Unit-VI: Extra Features | Data Warehousing and OLAP

Index

1. Introduction to Data Warehousing | Data Warehousing Design Consideration and Dimensional Modeling
2. An Introduction to Oracle Warehouse Builder | Defining and Importing Source Data Structures
3. Designing the Target Structure | Creating the Target Structure in OWB
4. Extract, Transform, and Load Basics | Designing and Building an ETL Mapping
5. ETL: Transformations and Other Operators | Validating, Generating, Deploying, and Executing Objects
6. Extra Features | Data Warehousing and OLAP



Unit-I
Topics:
Introduction to D.W | D.W Design Consideration
and Dimensional Modelling



Q1. What is data warehouse? List and explain the characteristics of data warehouse.
Answer:

Data Warehouse:
• It is a centrally managed and integrated database containing data from the operational sources in an organization.
• A data warehouse is a powerful database model that significantly enhances the user's ability to quickly analyze large multidimensional data sets.
• It cleanses and organizes data to allow users to make business decisions based on facts. And so, the data in a data warehouse must have strong analytical characteristics.
• A data warehouse is a decisional database system.
• Data Warehouse Delivers Enhanced Business Intelligence.
• Data Warehouse Saves Time.
• Data Warehouse Enhances Data Quality and Consistency.
• Data Warehouse Provides Historical Intelligence.
• Knowledge Discovery and Decision Support

Characteristics of Data Warehouse:


Following are different characteristics of data warehousing as follows:

Subject Oriented Data: - A data warehouse is organized around major subjects, such as
customer, vendor, product, and Sales. It focuses on the modeling and analysis of data rather
than day-to-day business operations.
Integrated Data: -A data warehouse is constructed by integrating data from multiple
heterogeneous data sources.
Time Referenced Data: - A data warehouse is a repository of historical data. It gives the view
of the data for a designated time frame.
Non Volatile Data: - A data warehouse is always a physically separate store of data
transformed from the application data found in the operational environment.
Due to this separation, a data warehouse does not require transaction processing, recovery,
and concurrency control mechanisms.
The non-volatility of data enables users to dig deep into history and arrive at specific business
decisions based on facts.



Q2. What are operational databases? Explain the characteristics of a data warehouse.

Operational Database:-
The Operational databases are often used for on-line transaction processing (OLTP). It deals
with day-to-day operations such as banking, purchasing, manufacturing, registration,
accounting, etc.
These systems typically get data into the database. Each transaction processes information
about a single entity. Following are some examples of OLTP queries:
– What is the price of 2GB Kingston Pen drive?
– What is the email address of the president?

The purpose of these queries is to support business operations.
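
As a minimal sketch of what such an OLTP query looks like in SQL (the table and column names here are invented for illustration and are not from any schema described in these notes), the first question above could be answered with a single-row lookup:

    SELECT list_price
    FROM   products                                   -- hypothetical OLTP table
    WHERE  product_name = '2GB Kingston Pen drive';

Each such query touches a single entity, which is typical of OLTP workloads.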

Characteristics of Data Warehouse:


Following are different characteristics of data warehousing as follows:

Subject-oriented:
A data warehouse is organized around major subjects, such as customer, vendor, product, and
Sales. It focuses on the modeling and analysis of data rather than day-to-day business
operations.
Integrated:
A data warehouse is constructed by integrating data from multiple heterogeneous data sources.
Time variant:
A data warehouse is a repository of historical data. It gives the view of the data for a designated
time frame.
Non-volatile:
A data warehouse is always a physically separate store of data transformed from the application
data found in the operational environment.
Due to this separation, a data warehouse does not require transaction processing, recovery,
and concurrency control mechanisms.
Q3. Draw and explain data warehouse architecture.
Or
Explain the framework of data warehouse.
Data Warehouse Architecture:-
A Data Warehouse Architecture (DWA) is a way of representing the overall structure of data,
communication, processing and presentation that exists for end-user computing within the
enterprise.
The architecture is made up of a number of interconnected parts as follows:
• Source System
• Source data transport layer
• Data Quality control and data profiling layer
• Metadata management layer
• Data integration layer
• Data processing layer
• End user reporting layer

The figure below shows the architecture of a data warehouse, which consists of various layers interconnected with each other.
Source System
Operational systems process data to support critical operational needs. In order to do this,
operational databases have been historically created to provide an efficient processing structure
for a relatively small number of well-defined business transactions.
The goal of data warehousing is to free the information locked up in the operational systems
and to combine it with information from other external sources of data.
Large organizations are acquiring additional data from outside databases. This information
includes demographic, econometric, competitive and purchasing trends.



Source data transport layer
The data transport layer of the DWA largely constitutes data trafficking. It particularly
represents the tools and processes involved in transporting data from the source systems to the
enterprise warehouse system.
Data Quality control and data profiling layer
Data quality causes the most concern in any data warehousing solution. Incomplete and
inaccurate data will jeopardize the success of the data warehouse.
Data warehouses do not generate their own data; rather they rely on the input data from the
various source systems.
Metadata management layer
Metadata is the information about data within the enterprise. In order to have a fully functional
warehouse, it is necessary to have a variety of metadata available, as well as facts about the end-user views of data and information about the operational databases.
Data integration layer
The data integration layer is involved in scheduling the various tasks that must be accomplished
to integrate data acquired from various source systems.
A lot of formatting and cleansing activities happen in this layer so that the data is consistent
across the enterprise.
Data processing layer
This layer consists of data staging and enterprise warehouse. Data staging often involves
complex programming, but increasingly warehousing tools are being created that help in this
process.
Staging may also involve data quality analysis programs and filters that identify patterns and
structures within existing operational data.
End user reporting layer
Success of a data warehouse implementation largely depends upon ease of access to valuable
information. Based on the business needs, there can be several types of reporting architectures
and data access layers.

Q4. Write any five significant differences between OLTP database and Data warehouse
database.



Q5. Differentiate between operational system & informational system.

Operational System | Informational System

Current value is the data content | Data is archived, derived and summarized
Data structure is optimized for transactions | Data structure is optimized for complex queries
Access frequency is high | Access frequency is medium to low
Data access type is read, update and delete | Data access type is read only
Usage is predictable and repetitive | Usage is ad-hoc and random
Response time is in sub-seconds | Response time is in several seconds to minutes
Large number of users | Relatively small number of users
E.g. current data of sales | E.g. old data of sales

Q6. What are the various levels of data redundancy in data warehouse?
Or
Describe virtual data warehouse and central data warehouse

There are three levels of redundancy:



Virtual Data Warehouses:-

End users can access the operational databases directly, using whatever tools are enabled by the data access network. That is, it provides on-the-fly data for decision support purposes.
This approach is flexible and has a minimum amount of redundant data. However, this approach can put an unplanned query load on the operational systems.
The advantages of this approach are:

• Flexibility
• No data redundancy
• Provides end-users with the most current corporate information
• Virtual data warehouses often provide a starting point for organizations to learn what end users are really looking for.

Central Data Warehouses:-


It is a single physical repository that contains all data for a specific functional area, department
division or enterprise. A central data warehouse may contain information from multiple
operational systems. A central data warehouse contains time variant data.
The advantages of this approach are:
• Security
• Ease of management

The disadvantages are:
• Performance implications
• Expansion is expensive at times



Distributed data warehouse:-
Certain components are distributed across a number of different physical locations. Large
organizations are pushing decision making down to LANs or local computers serving local
decision makers.
Q7. Write a short note on Granularity of facts.

Granularity of Facts:-

The granularity of a fact is the level of detail at which it is recorded. If data is to be analyzed
effectively, it must be all at the same level of granularity.
The more granular the data, the more we can do with it. Excessive granularity brings needless
space consumption and increases complexity. We can always aggregate details.
Granularity is not absolute or universal across industries. This has two implications. First, grains
are not predetermined; second, what is granular to one business may be summarized for
another.

Granularity is determined by:


• Number of parts to a key
• Granularity of those parts

Adding elements to an existing key always increases the granularity of the data; removing any
part of an existing key decreases its granularity.

Granularity is also determined by the inherent cardinality of the entities participating in the
primary key. Data at the order level is more granular than data at the customer level, and data
at the customer level is more granular than data at the organization level.
Q8. Explain the various types of additivity of facts with examples.

A fact is something that is measurable and is typically numerical values that can be aggregated.

Following are the three types of facts:



Additive:-

Facts that are additive across all dimensions are referred to as Fully Additive or Additive.
Example of Additive Fact:-

The Sales_Amount can be summed up over all of the dimensions


(Time, Customer, Item, Location, Branch)


Semi-Additive:

Facts that are additive across some of the dimensions, but not all, are referred to as Semi-Additive.
Example of Semi-Additive Fact:-
Suppose a Bank stores current balance by account by end of each day.

The Balance cannot be summed up across Time dimension. It does not make sense if we sum
the current balance by date.



Non-Additive:-

Facts that are not additive across any dimension are referred to as Non-Additive.
Non-additive facts are usually the result of ratio or other calculations.

Example of Non-Additive Fact:-

The Price cannot be summed up across any dimension.


Percentages and ratios are non-additive.
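
To make the three cases concrete, the following hedged SQL sketch shows how each kind of fact is typically aggregated (the table and column names are illustrative only, not from a specific schema in these notes):

    -- Additive: Sales_Amount can be summed over any dimension
    SELECT time_key, SUM(sales_amount) AS total_sales
    FROM   sales_fact
    GROUP  BY time_key;

    -- Semi-additive: a daily balance can be summed across accounts,
    -- but across time we use AVG (or the last value), never SUM
    SELECT account_key, AVG(balance) AS avg_balance
    FROM   daily_balance_fact
    GROUP  BY account_key;

    -- Non-additive: unit price is never summed; recompute it from additive facts
    SELECT item_key, SUM(sales_amount) / SUM(quantity_sold) AS avg_unit_price
    FROM   sales_fact
    GROUP  BY item_key;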

Q9. Explain dimensional model with the help of diagram.


Dimensional Model:-
The purpose of dimensional model is to improve performance by matching data structures to
queries. Users query the data warehouse looking for data like:
Total sales in volume and revenue for the NE region for product 'XYZ' for a certain period this
year compared to the same period last year.
The central theme of a dimensional model is the star schema, which consists of a central fact
table, containing measures, surrounded by qualifiers or descriptors called dimensions table.
Dimension Table
A dimension table consists of tuples of attributes of the dimension.
Fact Table
A fact table contains the data, and dimensions identify each tuple in that data.
In a star schema, if a dimension is complex and contains relationships such as hierarchies, it is
compressed or flattened to a single dimension.
Another version of star schema is a snowflake schema. In a snowflake schema complex
dimensions are normalized.
Here, dimensions maintain relationships with other levels of the same dimension. The
dimension tables are usually highly non-normalized structures.
These tables can be normalized to reduce their physical size and eliminate redundancy which
will result in a snowflake schema. Most dimensions are hierarchic.

Q10. Explain star schema model with the help of diagram.
The relational implementation of dimensional model is done using star schema. It represents
multi-dimensional data.

A star schema consists of a central fact table containing measures and a set of dimension
tables.

Dimension table:- A dimension table consists of tuples of attributes of the dimension.


Fact table:- A fact table contains the data, and dimensions identify each tuple in that data.

In the star schema model, a fact table is at the center of the star and the dimension tables are the points of the star.

A star schema represents one central set of facts. The dimension tables contain descriptions
about each of the aspects.

Say, for example, a warehouse that stores sales data: there is a sales fact table that stores facts about sales, while dimension tables store data about locations, clients, items, times, and branches.

Examples of sales facts are unit sales, dollar sales, sale cost etc. Facts are numeric values
which enable users to query and understand business performance metrics by summarizing
data.
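
As a rough illustration of how such a star schema could be declared in Oracle SQL (a sketch only; the table names, columns, and sizes below are invented for the example and are not the structures built later in these notes):

    CREATE TABLE time_dim (
      time_key      NUMBER PRIMARY KEY,
      calendar_date DATE,
      month_name    VARCHAR2(20),
      year_num      NUMBER(4)
    );

    CREATE TABLE store_dim (
      store_key     NUMBER PRIMARY KEY,
      store_name    VARCHAR2(50),
      city          VARCHAR2(50),
      region        VARCHAR2(20)
    );

    CREATE TABLE product_dim (
      product_key   NUMBER PRIMARY KEY,
      product_name  VARCHAR2(50),
      category      VARCHAR2(50)
    );

    -- Fact table at the center of the star; each row is located by its dimension keys
    CREATE TABLE sales_fact (
      time_key      NUMBER REFERENCES time_dim(time_key),
      store_key     NUMBER REFERENCES store_dim(store_key),
      product_key   NUMBER REFERENCES product_dim(product_key),
      unit_sales    NUMBER,
      dollar_sales  NUMBER(10,2)
    );

A query then joins the fact table to whichever dimension tables it needs and aggregates the measures.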



Q11. Differentiate between star schema and snowflake schema.

Characteristics | Star Schema | Snowflake Schema

Ease of maintenance/change | Has redundant data and hence less easy to maintain/change | No redundancy and hence easier to maintain and change
Ease of use | Less complex queries and easy to understand | More complex queries and hence less easy to understand
Query performance | Fewer foreign keys and hence lower query execution time | More foreign keys and hence higher query execution time
Type of data warehouse | Good for a data mart with simple relationships (1:1 or 1:many) | Good to use for the data warehouse core, to simplify complex relationships (many:many)
Joins | Fewer joins | Higher number of joins
Dimension table | Contains only a single dimension table for each dimension | May have more than one dimension table for each dimension
When to use | When the dimension table contains a smaller number of rows, we can go for a star schema | When the dimension table is relatively big in size, snowflaking is better as it reduces space
Normalization/De-normalization | Both dimension and fact tables are in de-normalized form | Dimension tables are in normalized form but the fact table is still in de-normalized form

Q12. Short note on Helper Table.

Helper tables usually take one of two forms:


• Help for multi-valued dimensions
• Helper tables for complex hierarchies
Multi-Valued Dimensions:-

Take a situation where a household can own many insurance policies, yet any policy could be
owned by multiple households.
The simple approach to this is the traditional resolution of the many-to-many relationship, called
an associative entity.
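
A minimal sketch of that associative (bridge) table in SQL, with illustrative names only, is shown below:

    CREATE TABLE household_dim (
      household_key  NUMBER PRIMARY KEY,
      household_name VARCHAR2(50)
    );

    CREATE TABLE policy_dim (
      policy_key  NUMBER PRIMARY KEY,
      policy_type VARCHAR2(30)
    );

    -- Helper (associative) table resolving the many-to-many relationship
    CREATE TABLE household_policy (
      household_key NUMBER REFERENCES household_dim(household_key),
      policy_key    NUMBER REFERENCES policy_dim(policy_key),
      PRIMARY KEY (household_key, policy_key)
    );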



Complex Hierarchies:-
A hierarchy is a tree structure, such as an organization chart. Hierarchies can involve some
form of recursive relationship.

Recursive relationships come in two forms: "self" relationships (1:M) and "bill of materials" relationships (M:M). A self-relationship involves one table whereas a bill of materials involves two.

These structures have been handled in ER modeling from its inception. They are generally
supported as a self-relating entity or as an associative entity in which an associate entity
(containing parent-child relationships) has dual relationships to a primary entity.



Unit-II
Topics:
An Introduction to O.W.B | Defining and
Importing Source Data Structures



Q1. Explain the various steps involved in installing oracle database software.

The following steps are used to install the Oracle database software:-

• Download the appropriate install file from the Oracle website
• Unzip the install files into a folder to begin the installation
• Run the setup.exe file from that folder to launch the Oracle Universal Installer program (OUI) to begin the installation

Step1:-
Asks for your email address and Oracle support password to configure security updates.

Step2:-
Following are the installation options:
• Create and configure a database
• Install database software only
• Upgrade an existing database

Step3:-
Here you can select the type of installation you want to perform.
The following are the installation types:
• Single Instance database installation
• Real Application Cluster database installation (RAC)

Step4:-
To select the language in which your product will run.

Step5:-
We can choose the edition of the database to install: Enterprise, Standard, Standard Edition
One, or Personal Edition.

Step6:-
This step asks us to specify the installation location for storing Oracle configuration files and
software files.



Step7:-
In this step Oracle checks the environment to see whether it meets the requirements for
successful installation. The prerequisite checks include checking of operating system, physical
memory, swap space, network configuration etc.
Step8
Shows the installation summary.

Step9
The actual installation happens in step 9. A progress bar proceeds to the right as the installation
happens and steps for Prepare, Copy Files, and Setup Files are checked off as they are done.

Step10
Shows the success or failure of database installation.

Q2. What are the hardware and software requirements for installing oracle warehouse
builder?

Following are the Databases which support OWB:

• Oracle Database 12c R1 Standard Edition
• Oracle Database 12c R1 Enterprise Edition
• Oracle Database 11g R2 Standard Edition
• Oracle Database 11g R2 Enterprise Edition
• Oracle Database 11g R1 Standard Edition
• Oracle Database 11g R1 Enterprise Edition



The enterprise edition of the database gives you the power to use full features of the data
warehouse. But the standard edition does not support all warehouse features. The OWB with
the standard edition allows us to deploy only limited types of objects.

Following are the hardware requirements:

• Intel Core 2 Duo or higher processor
• 1 GB RAM
• 10 GB to 15 GB hard disk space

Operating system requirements:
It supports UNIX and Windows platforms.
Windows support: Windows Vista (Business Edition/Enterprise Edition/Ultimate Edition), Windows XP, Windows Server 2003, Windows 7.

Q3. What is a listener? How is it configured?


Listener:-
The Listener is a named process which runs on the Oracle Server. The listener process runs
constantly in the background on the database server computer waiting for requests from clients
to connect to the Oracle database.
It receives connection requests from the client and manages the traffic of these requests to the
database server.

Configuring Listener:-
Run Net Configuration Assistant to configure a listener.

Step1:-
The first screen is a welcome screen. Select Listener Configuration option from it and then click
next button.



Step 2:-
The second screen allows you to add, reconfigure, delete or rename a listener. Choose Add
from the given option to configure a new listener and click next.

Step3:-
The third screen asks you to enter a name for the listener. The default name is “LISTENER”.
Enter a new name or continue with the default and then click next button to proceed.



Step4:-
The fourth screen is the protocol selection screen. By default the TCP protocol is selected in this
screen. TCP is the standard communication protocol for internet and most local networks.
Select the protocol and click next.

Step5:-
The fifth and final screen asks for the TCP/IP port number on which the listener will run. The default port number is 1521; continue with the default.

It will ask us if we want to configure another listener. Select no to finish the listener
configuration.



Then click on Next Button.
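
For reference, the wizard records these choices in the listener.ora file under the Oracle home. A generated entry typically looks roughly like the following (a sketch; the host name and exact layout depend on your environment):

    LISTENER =
      (DESCRIPTION_LIST =
        (DESCRIPTION =
          (ADDRESS = (PROTOCOL = TCP)(HOST = localhost)(PORT = 1521))
        )
      )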

Q4. Explain various steps used for creating database.


The following steps are used to create the database:-
Run Database Configuration Assistant to create database.

Step1:-
The first step is to specify what action to take. Since we do not have a database created, we'll
select the Create a Database option.
Step2:-
This step will offer the following three options for a database template to select:
• General Purpose or Transaction Processing
• Custom Database
• Data Warehouse
We are going to choose the Data Warehouse option for our purposes.



Step3:-
This step of the database creation will ask for a database name. Provide the data warehouse name as ACMEDW. As the database name is typed in, the SID is automatically filled in to match it.
Step4:-
This step of the database creation process asks whether we want to configure Enterprise
Manager. The box is checked by default; leave it checked.
Step5:-
Click on the option to Use the Same Administrative Password for All Accounts and enter a
password.
Step6:-
This step is about storage. We'll leave it at the default of File System for storage management.
Step7:-
This step is for specifying the locations where database files are to be created. This can be left
at the default for simplicity
Step8:-
The next screen is for configuring recovery options. We would want to make sure to use the
Flash Recovery option and to enable archiving.
Step9:-
This step is where we can have the installation program create some sample schemas in the
database for our reference, and specify any custom scripts to run.
Step10:-
The next screen is for Initialization Parameters. These are the settings that are put in place to
define various options for the database such as Memory options.
Step11:-
This step is automatic maintenance and we'll deselect that option and move on, since we don't
need that additional functionality.
Step12:-
The next step is the Database Storage screen referred to earlier. Here the locations of the pre-
built data and control files can be changed if needed.

On the final Database Configuration Screen, there is a button in the lower right corner labeled
Password Management.
We’ll scroll down until we see the OWBSYS schema and click on the check box to uncheck it
(indicating we want it unlocked) and then type in a password and confirm it as shown in the
following image:



Q5. Explain OWB components and architecture with diagram.
OWB Architecture:
The diagram below shows the architecture of Oracle Warehouse Builder (OWB). It consists of two parts: the OWB client and the OWB server.
Following are the client side components:
• Design Center
• Repository Browser
Following are the server side components:
• Control Center Service
• Repository
• Target Schema

Design Center:-
The Design Center is the primary graphical user interface for creating the logical design of the data warehouse.
The Design Center is used to:
• Import source objects
• Design ETL processes
• Define the integration solution
Control Center Manager:-
It is a part of the design center. It manages communication between target schema and design
center. As soon as we define a new object in the Design Center, the object is listed in the
Control Center Manager under its deployment location.
Repository Browser:-



It is another user interface used to browse the design metadata.
Target Schema:-
The target schema is where OWB deploys objects to, and where the execution of the ETL processes that load our data warehouse takes place. It is the target into which we load our data and the data objects that we designed in the Design Center, such as cubes, dimensions, views, and mappings. The target schema contains Warehouse Builder components such as synonyms that enable the ETL mappings to access the audit/service packages in the repository.
Warehouse Builder Repository:-
The repository schema stores metadata definitions for all the sources, targets, and ETL
processes that constitute our design metadata. In addition to containing design metadata, a
repository can also contain the runtime data generated by the Control Center Manager and
Control Center Service.
Workspaces:-
In defining the repository, we create one or more workspaces, with each workspace
corresponding to a set of users working on related projects.

Q6 .What is the relationship between OWBSYS and Oracle Warehouse Builder?


Oracle configures its databases with most of the pre-installed schemas locked, and so users
cannot access them. It is necessary to unlock them specifically, and assign our own passwords
to them if we need to use them.
One of them is the OWBSYS schema. This is the schema that the installation program
automatically installs to support the Warehouse Builder. We will be making use of it when we
start running OWB.
So under Password Management we see the OWBSYS schema and click on the check box to
uncheck it (indicating we want it unlocked) and then type in a password and confirm it.
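
The same unlock can also be performed later from SQL*Plus with standard statements (a sketch; choose your own password):

    -- Run as a DBA user such as SYS or SYSTEM
    ALTER USER owbsys ACCOUNT UNLOCK;
    ALTER USER owbsys IDENTIFIED BY owb_password;   -- owb_password is a placeholder
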
Q7. Explain various steps used for configuring repository and workspaces.
The following steps are used to configure the repository and workspaces:-
Run Repository Assistant under Warehouse Builder | Administration.

The steps for configuration are as follows:



We run the Repository Assistant application on the server, and the first step asks us for the database connection information:
Host Name, Port Number, and Oracle Service Name for a SQL*Net connection.
The Host Name, Port Number, and Oracle Service Name options are as follows:

• The Host Name is the name assigned to the computer on which we've installed the database, and we can just leave it at LOCALHOST.
• The Port Number is the one we assigned to the listener back when we had installed it. It defaults to the standard 1521.
• The Oracle Service Name is the name of the database, i.e. ACMEDW.
Q8. What is design center? Explain the functions of project explorer and connection
explorer windows.
Design Center:-
The Design Center is the main graphical interface used for the logical design of the data
warehouse. Through Design Center we define our sources and targets and design our ETL
processes to load the target from the source. The logical design will be stored in a workspace in
the Repository on the server.
The diagram below shows the Design Center, which consists of the Project Explorer, the Connection Explorer, and the Global Explorer.
Project Explorer:-
In the Project Explorer window we can create objects that are relevant to our project. It has nodes for
each of the design objects we'll be able to create.
We need to design an object under the Databases node to model the source database. If we
expand the Databases node in the tree, we will notice that it includes both Oracle and Non-
Oracle databases.
It also has option to pull data from flat files. The Project Explorer can also be used for defining
the target structure.
Connection Explorer:-
The Connection Explorer is where the connections are defined to our various objects in the
Project Explorer. The workspace has to know how to connect to the various databases, files,
and applications we may have defined in our Project Explorer.
As we begin creating modules in the Project Explorer, it will ask for connection information and
this information will be stored and be accessible from the Connection Explorer window.
Connection information can also be created explicitly from within the Connection Explorer.
Global Explorer
The Global Explorer is used to manage these objects. It includes objects such as Public
Transformations or Public Data Rules.
A transformation is a function, procedure, or package defined in the database in Oracle's
procedural SQL language called PL/SQL. Data rules are rules that can be implemented to
enforce certain formats in our data.
Q9. Write a procedure to create new project in OWB. What is difference between a
module and a project?
Following steps are used to create new project in OWB.
Step1: Launch the Design Center
Step2: Right-click on the project name in the Project Explorer and select Rename from the
resulting pop-up menu. Alternatively, we can select the project name, then click on the Edit
menu entry, and then on Rename.
Module:
Modules are grouping mechanisms in the Projects Navigator that correspond to locations in the
Locations Navigator. A single location can correspond to one or more modules. However, a
given module can correspond to only one metadata location and data location at a time.
The association of a module to a location enables you to perform certain actions more easily in
Oracle Warehouse Builder. For example, group actions such as creating snapshots, copying,
validating, generating, deploying, and so on, can be performed on all the objects in a module by
choosing an action on the context menu when the module is selected
Project:-
A project contains one or more modules. There are two types of modules: Oracle and Non-Oracle.



All modules, including their source and target objects, must have locations associated with them
before they can be deployed. We cannot view source data or deploy target objects unless there
is a location defined for the associated module.
Q10. Explain the procedure for defining source metadata manually with Data Object
Editor
The following steps are used to create metadata manually with the Data Object Editor. Suppose, in the example below:-
Project: ACME_DW_PROJECT
Module: ACME_POS
We are going to define source metadata for the following table columns
ITEMS_KEY number(22)
ITEM_NAME varchar2(50)
ITEM_CATEGORY varchar2(50)
ITEM_VENDOR number(22)
ITEM_SKU varchar2(50)
ITEM_BRAND varchar2(50)
ITEM_LIST_PRICE number(6,2)
ITEM_DEPT varchar2(50)
Before we can continue building our data warehouse, we must have all our source table
metadata created. It is not a particularly difficult task. However, attention to detail is important to
make sure what we manually define in the Warehouse Builder actually matches the source
tables we're defining. The tool the Warehouse Builder provides for creating source metadata is
the Data Object Editor, which is the tool we can use to create any object in the Warehouse
Builder that holds data such as database tables. The steps to manually define the source
metadata using Data Object Editor are:
1. To start building our source tables for the POS transactional SQL Server database, let's
launch the OWB Design Center if it's not already running. Expand the ACME_DW_PROJECT
node and take a look at where we're going to create these new tables. We have imported the
source metadata into the SQL Server ODBC module so that is where we will create the tables.
Navigate to the Databases | Non-Oracle | ODBC node, and then select the ACME_POS module
under this node. We will create our source tables under the Tables node, so let's right-click on
this node and select New, from the pop-up menu. As no wizard is available for creating a table,
we are using the Data Object Editor to do this.
2. Upon selecting New, we are presented with the Data Object Editor screen. It's a clean slate
that we get to fill in, and will look similar to the following screenshot:
There are a number of facets to this interface but we will cover just what we need now in order
to create our source tables. Later on, we'll get a chance to explore some of the other aspects of
this interface for viewing and editing a data object. The fields to be edited in this Data Object
Editor are as follows:



The first tab it presents to us is the Name tab where we'll give a name to the first table we're
creating. We should not make up table names here, but use the actual name of the table in the
SQL Server database. Let's start with the Items table. We'll just enter its name into the Name
field replacing the default, TABLE_1, which it suggested for us. The Warehouse Builder will
automatically capitalize everything we enter for consistency, so there is no need to worry about
whether we type it in uppercase or lowercase.
Let's click on the Columns tab next and enter the information that describes the columns of the
Items table. How do we know what to fill in here? Well, that is easy because the names must all
match the existing names as found in the source POS transactional SQL Server database. For
sizes and types, we just have to match the SQL Server types that each field is defined as,
making allowances for slight differences between SQL Server data types and the corresponding
Oracle data types.
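
For comparison, if the same Items metadata were written as plain DDL instead of through the editor, the column list given earlier would correspond roughly to the following Oracle table definition (a sketch; the real source lives in SQL Server and is only described, not created, in OWB):

    CREATE TABLE items (
      items_key        NUMBER(22),
      item_name        VARCHAR2(50),
      item_category    VARCHAR2(50),
      item_vendor      NUMBER(22),
      item_sku         VARCHAR2(50),
      item_brand       VARCHAR2(50),
      item_list_price  NUMBER(6,2),
      item_dept        VARCHAR2(50)
    );
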
Q11. Explain the steps for importing the metadata for a flat file.
Use the Import Metadata Wizard to import metadata definitions into modules.

The steps involved in creating the module and importing the metadata for a flat file are:

1. The first task is to create a new module to contain our file definition. If we look in the
Project Explorer under our project, we'll see that there is a Files node right below the Databases
node. Right-click on the Files node and select New from the pop-up menu to launch the wizard.



2. When we click on the Next button on the Welcome screen, we notice a slight difference
already. The Step 1 of the Create Module wizard only asks for a name and description. The
other options we had for databases above are not applicable for file modules. We'll enter a
name of ACME_FILES and click on the Next button to move to Step 2.

3. We need to edit the connection in Step 2. So we'll click on the Edit button. As we see in the following image, it only asks us for a name, a description, and the path to the folder where the files are.

4. The Name field is prefilled with the suggested name based on the module name. As it did for
the database module location names, it adds that number 1 to the end. So, we'll just edit it to
remove the number and leave it set to ACME_FILES_LOCATION.

5. Notice the Type drop-down menu. It has two entries: General and FTP. If we select FTP (File
Transfer Protocol used for getting a file over the network), it will ask us for slightly more
information.

6. The simplest option is to store the file on the same computer on which we are running the
database. This way, all we have to do is enter the path to the folder that contains the file. We
should have a standard path we can use for any files we might need to import in the future. So
we create a folder called Getting Started with OWB_files, which we'll put in the D: drive. Choose
any available drive with enough space and just substitute the appropriate drive letter. We'll click
on the Browse button on the Edit File System Location dialog box, choose the file path, and
click on the OK button.

7. We'll then check the box for Import after finish and click on the Finish button.

That's it for the Create Module Wizard for files

Q12. Explain the steps for creating oracle database module.


To create an Oracle Database module, right-click on the Databases | Oracle node in the
Project Explorer of Warehouse Builder and select New... from the pop-up menu. The first
screen that will appear is the Welcome screen, so just click on the Next button to continue.
Then we have the following two steps:



Step1:-
In this step we give our new module a name, a status, and a description that is optional. On the
next window after the Welcome screen, type in a name for the module. The module status is a
way of associating our module with a particular phase of the process, and we'll leave it at
Development, and click on the Next button to proceed.

Step2:-
The screen starts by suggesting a connection name based on the name we gave the module.
Click on the Edit button beside the Name field to fill in the details. This will display the following
screen:

Q13. What is the significance of HS parameters in the heterogeneous service configuration file?
Answer:
The file named initdg4odbc.ora is the default init file for using ODBC connections. This file
contains the HS parameters that are needed for the Database Gateway for ODBC

Following are two parameters used in initdg4odbc.ora file:

HS_FDS_CONNECT_INFO = <odbc data_source_name>


HS_FDS_TRACE_LEVEL = <trace_level>

First Parameter is used for:



The HS_FDS_CONNECT_INFO line is where we specify the ODBC DSN. So replace the <odbc data_source_name> string with the name of the Data Source, which is, for example, ACME_POS.

Second Parameter is used for:


The HS_FDS_TRACE_LEVEL line is for setting a trace level for the connection. The trace level
determines how much detail gets logged by the service and it is OK to set the default as 0
(zero).
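
Putting the two parameters together, a minimal init file for the example data source would look like this (a sketch; ACME_POS is the example DSN used above):

    # initdg4odbc.ora - Database Gateway for ODBC parameters
    HS_FDS_CONNECT_INFO = ACME_POS
    HS_FDS_TRACE_LEVEL = 0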



Unit-III
Topics:
Designing the Target Structure |Creating the
Target Structure in OWB



Q1. Short note on cube and dimensions.
Cube:-

As an example, consider sales, i.e. data about products sold and to be sold in a company.
The dimensions become the business characteristics about the sales, for example:
• A time dimension—users can look back in time and check various time periods
• A store dimension—information can be retrieved by store and location
• A product dimension—various products for sale can be broken out
Think of the dimensions as the edges of a cube, and the intersection of the dimensions as the
measure we are interested in for that particular combination of time, store, and product.

A picture is worth a thousand words, so let's look at what we're talking about in the following
image:

Think of the width of the cube, or a row going across, as the product dimension. Every piece of
information or measure in the same row refers to the same product, so there are as many rows
in the cube as there are products.

Think of the height of the cube, or a column going up and down, as the store dimension. Every
piece of information in a column represents one single store, so there are as many columns as
there are stores.

Finally, think of the depth of the cube as the time dimension, so any piece of information in the
rows and columns at the same depth represent the same point in time. The intersection of each
of these three dimensions locates a single individual cube in the big cube, and that represents
the measure amount we're interested in. In this case, it's dollar sales for a single product in a
single store at a single point in time.

Q2. Explain implementation of dimensional model.


There are two options: a relational implementation and a multidimensional implementation.
The relational implementation, which is the most common for a data warehouse structure, is
implemented in the database with tables and foreign keys.



The multidimensional implementation requires a special feature in a database that allows
defining cubes directly as objects in the database.
Relational implementation (Star Schema):-
The term relational is used because the tables in it relate to each other in some way. For a
relational data warehouse design, the relational characteristics are retained between tables.
A data warehouse dimensional design that is represented relationally in the database will have
one main table to hold the primary facts, or measures we want to store, such as count of items
sold or dollar amount of sales.
The ER diagram of such an implementation would be shaped somewhat like a star, and thus
the term star schema is used to refer to this kind of an implementation. The main table in the
middle is referred to as the fact table because it holds the facts, or measures.
The tables surrounding the fact table are known as dimension tables. These are the
dimensions of our cube. These tables contain descriptive information, which places the facts in
a context that makes them understandable.

Multidimensional implementation (OLAP):-


A multidimensional implementation or OLAP (Online Analytical Processing) requires a
database with special features that allow it to store cubes as actual objects in the database, and
not just tables that are used to represent a cube and dimensions. It also provides advanced
calculation and analytic content built into the database to facilitate advanced analytic querying.
The Oracle Database Enterprise Edition has an additional feature that can be licensed called
OLAP that embeds a full-featured OLAP server directly in an Oracle database. These kinds of
analytic databases are well suited to providing the end user with increased capability to perform
highly optimized analytical queries of information.
Q3. Explain multidimensional implementation of data warehouse.

The multidimensional implementation requires a special feature in a database that allows defining cubes directly as objects in the database.

In relational implementation data is organized into dimension tables, fact tables and materialized
views. A multidimensional implementation requires a database with special features that allow it
to store cubes as actual objects in the database.



It also provides advanced calculation and analytic content built into the database to facilitate
advanced analytic querying. These analytic databases are quite frequently utilized to build a
highly specialized data mart, or a subset of the data warehouse, for a particular user
community.

MOLAP uses array-based multidimensional storage instead of a relational database. MOLAP tools generally utilize a pre-calculated data set referred to as a data cube.

The data required for the analysis is extracted from relational data warehouse or other data
sources and loaded in a multidimensional database which looks like a hypercube. Hypercube is
a cube with many dimensions.

Q4. What is module? Explain source module and target module.


Module:
Modules are grouping mechanisms in the Projects Navigator that correspond to locations in the
Locations Navigator. A single location can correspond to one or more modules. However, a
given module can correspond to only one metadata location and data location at a time.

The association of a module to a location enables you to perform certain actions more easily in
Oracle Warehouse Builder. For example, group actions such as creating snapshots, copying,
validating, generating, deploying, and so on, can be performed on all the objects in a module by
choosing an action on the context menu when the module is selected

All modules, including their source and target objects, must have locations associated with them
before they can be deployed. You cannot view source data or deploy target objects unless there
is a location defined for the associated module.

Source Module:-
A source module groups the metadata definitions of the source systems from which data is extracted.
It accepts no input from the data stream because it is used at the start of a workflow.
It represents the place where the source data is stored.
Target Module:-
A target module groups the definitions of the objects into which the warehouse data is loaded.
It accepts input from the data stream.
It represents the place into which the data is loaded.

Q5. List and explain the functionalities that can be performed by OWB in order to create
data warehouse.
The Oracle Warehouse Builder is a tool provided by Oracle, which can be used at every stage
of the implementation of a data warehouse, from initial design and creation of the table structure
to the ETL process and data-quality auditing. So, the answer to the question of where it fits in is
everywhere.



We can choose to use any or all of the features as needed for our project, so we do not need to
use every feature. Simple data warehouse implementations will use a subset of the features and
as the data warehouse grows in complexity, the tool provides more features that can be
implemented. It is flexible enough to provide us a number of options for implementing our data
warehouse.

List of Functions:
• Data modeling
• Extraction, Transformation, and Load (ETL)
• Data profiling and data quality
• Metadata management
• Business-level integration of ERP application data
• Integration with Oracle business intelligence tools for reporting purposes
• Advanced data lineage and impact analysis

Oracle Warehouse Builder is also an extensible data integration and data quality solutions
platform. Oracle Warehouse Builder can be extended to manage metadata specific to any
application, and can integrate with new data source and target types, and implement support for
new data access mechanisms and platforms, enforce your organization's best practices, and
foster the reuse of components across solutions.

Q6. What is a target schema? How is a target module created?

Target schema:-
A target schema contains the data objects that contain your data warehouse data. The target
schema is going to be the main location for the data warehouse. When we talk about our "data
warehouse" after we have it all constructed and implemented, the target schema is what we will
be referring to. You can design a relational target schema or a dimensional target schema.
Every target module must be mapped to a target schema.

Creation of target module

• Launch Design Center.
• Right-click on the Databases | Oracle node in the Project Explorer.
• Select New... from the pop-up menu.
• The Welcome screen appears. Click Next.

Step1

Enter module name and select the module status as ‘Development’


Select the module type as ‘Data Warehouse Target’
Step2



Specify the location name, user name, password, host, port, and service name.
Click Finish.
Q7. What is time dimension? Discuss various steps involved in creating a time
dimension using time dimension wizard.

The Time/Date dimension provides the time series information to describe warehouse data.
Most of the data warehouses include a time dimension.

Also the information it contains is very similar from warehouse to warehouse. It has levels such
as days, weeks, months, etc. The Time dimension enables the warehouse users to retrieve data
by time period.

Creation of Time dimension

• Launch Design Center.
• Expand the Databases node under any project where you want to create a Time dimension.
• Then right-click on the Dimensions node, and select New | Using Time Wizard... to launch the Time Dimension Wizard.
• The first screen is a welcome screen which shows the various steps involved in the creation of a time dimension.
Step1: Provide name and description

Step2: Set the storage type

Step3: Define the range of data stored in the time dimension

Step4: Choose the levels

Step5: Summary of Time Dimension before creation of the Sequence and Map

Step6: Progress Status

Q8. Explain various characteristics of a dimension.


A dimension has the following four characteristics:



Levels
Each dimension has one or more levels. It defines the levels where aggregation takes place or
data can be summed.
The OWB supports the following levels for Time dimension:
Day
Fiscal week
Calendar week
Fiscal month
Calendar month
Fiscal quarter
Calendar quarter
Fiscal year
Calendar year

Dimension Attributes
The Attributes are actual data items that are stored in the dimension that can be found at more
than one level. Say for example the time dimension has following attributes in each level: id
(identifies that level), Start and end date (designate time period of that level), time span (number
of days in the time period), description etc.

Level Attributes
Each level has Level Attributes associated with it that provide descriptive information about the
value in that level. For example, Day level has level attributes such as day of week, day of
month, day of quarter, day of year etc.

Hierarchies
It is composed of certain levels in order. There can be one or more hierarchies in a dimension.
The month, quarter and year can be a hierarchy. The data can be viewed at each of these levels, and each level rolls up into the next level of the hierarchy.

Q9. Write notes on the following


1. Slowly changing dimension
2. Surrogate keys

Slowly Changing Dimension


Slowly Changing Dimension (SCD) refers to the fact that dimension values will change over
time. Although this doesn't happen often, they will change, and hence the "slowly" designation.
In other words, dimensions that change slowly over time are known as slowly changing dimensions. These changes need to be tracked in order to report historical data.
The OWB allows the following options for slowly changing dimensions.
Type 1 - Do not keep a history. This means we basically do not care what the old value was and
just change it.



Type 2 - Store the complete change history. This means we definitely care about keeping that
change along with any change that has ever taken place in the dimension.
Type 3 - Store only the previous value. This means we only care about seeing what the
previous value might have been, but don't care what it was before that.
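
As an illustration of how a Type 2 dimension keeps its history, the dimension table usually carries effective and expiration dates plus a current-record flag; the sketch below uses invented column names:

    -- Each change inserts a new row; the previous row is closed off by its expiration date
    CREATE TABLE customer_dim (
      customer_key    NUMBER PRIMARY KEY,   -- surrogate key
      customer_id     VARCHAR2(20),         -- natural key from the source system
      customer_city   VARCHAR2(50),         -- attribute whose history we track
      effective_date  DATE,
      expiration_date DATE,
      current_flag    CHAR(1)
    );
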
Surrogate Keys:-
Surrogate keys are artificial keys that are used as a substitute for source system primary keys.
They are generated and maintained within the data warehouse.
A Surrogate Key is a NUMBER type column and is generated using a Sequence. The
management of surrogate keys is the responsibility of the data warehouse. The Surrogate Keys
are used to uniquely identify each record in a dimension.

Example:-
The source tables have columns such as AIRPORT_NAME or CITY_NAME which are stated as
the primary keys (according to the business users) but, these can change and we could
consider creating a surrogate key called, say, AIRPORT_ID.
This would be internal to the warehouse system and as far as the client is concerned you may
display only the AIRPORT_NAME. Surrogate keys are numeric values and hence Indexing is
faster.
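
A minimal sketch of how such a surrogate key could be generated with an Oracle sequence (the AIRPORT_DIM table and the sample value are invented for the example; AIRPORT_ID and AIRPORT_NAME are the columns discussed above):

    CREATE TABLE airport_dim (
      airport_id   NUMBER PRIMARY KEY,   -- surrogate key generated below
      airport_name VARCHAR2(50)          -- natural key from the source system
    );

    CREATE SEQUENCE airport_seq START WITH 1 INCREMENT BY 1;

    -- AIRPORT_ID is the surrogate key; AIRPORT_NAME stays as a descriptive attribute
    INSERT INTO airport_dim (airport_id, airport_name)
    VALUES (airport_seq.NEXTVAL, 'Heathrow');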



Unit-IV
Topics:
Extract, Transform, and Load Basics | Designing and Building an ETL Mapping



Q1. What is ETL? Explain the importance of source target map.

ETL stands for extract, transform, and load. The ETL process transforms the data from an application-oriented structure into a corporate data structure. Once the source and target structures are defined, we can move on to the following activities in constructing a data warehouse:

• Work on extracting data from sources
• Perform any transformations on the data
• Load into the target data warehouse structure

The data warehouse architect builds a source-to-target data map before ETL processing starts.
The source-to-target map specifies:

• What data must be placed in the data warehouse environment
• Where that data comes from (known as the source or system of record)
• The logic or calculation or data reformatting that must be done to the data

The data mapping is the input needed to feed the ETL process. Mappings are visual
representations of the flow of data from source to target and the operations that need to be
performed on the data.

Q2. What is staging? What are its benefits? Explain the situation where staging is
essential.

Staging:-
Staging is the process of copying the source data temporarily into tables in target database. The
purpose is to perform any cleaning and transformations before loading the source data into the
final target tables. Staging stores the results of each logical step of transformation in staging
tables. The idea is that in case of any failure you can restart your ETL from the last successful
staging step.

Staging makes sense in the following cases:

• Large amounts of data to load
• Many transformations to perform on that data while loading
• Pulling data from non-Oracle databases

This process will take a lot longer if we directly access the remote database to pull and
transform data. We'll also be doing all of the manipulations and transformations in memory and
if anything fails, we'll have to start all over again.

Benefits:-
• Source database connection can be freed immediately after copying the data to the staging
area. The formatting and restructuring of the data happens later with data in the staging area.

• If the ETL process needs to be restarted, there is no need to go back to disturb the source
system to retrieve the data.



It provides you a single platform even though you have heterogeneous source systems.

This is the layer where the cleansed and transformed data is temporarily stored. Once the data
is ready to be loaded to the warehouse, we load it in the staging database. The advantage of
using the staging database is that we add a point in the ETL flow where we can restart the load
from. The other advantages of using staging database is that we can directly utilize the bulk
load utilities provided by the databases and ETL tools while loading the data in the
warehouse/mart, and provide a point in the data flow where we can audit the data.

In the absence of a staging area, the data load will have to go from the OLTP system to the
OLAP system directly, which in fact will severely hamper the performance of the OLTP system.
This is the primary reason for the existence of a staging area. Without applying any business
rule, pushing data into staging will take less time because there are no business rules or
transformation applied on it.

Disadvantages:
• It takes more space in the database and it may not be cost effective for the client.
• A disadvantage of staging is disk space, as we have to dump data into a local area.

Q3. Write the steps for building staging area table using Data Object Editor.
It is explained here with example.
STEP 1:-Navigate to the Databases | Oracle | ACME_DATA WAREHOUSE module. We will
create our staging table under the Tables node, so let’s right-click on that node and select
New.... from the pop-up menu.
STEP 2:-Upon selecting New..., we are presented with the Data Object Editor screen. However,
instead of looking at an object that has been created already, we're starting with a brand-new one.
STEP 3:-The first tab is the Name tab, where we'll give our new table a name. Let's call it
POS_TRANS_STAGE, for Point-of-Sale transaction staging table. We'll just enter the name into
the Name field, replacing the default TABLE_1 that it suggested for us.
STEP 4:-Let’s click on the Columns tab next and enter the information that describes the
columns of our new table. We have listed the key data elements that we will need for creating
the columns. We didn’t specify any properties of those data elements other than the name, so
we’ll need to figure that out.
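The data elements above are only listed by name, so the following DDL is a hedged sketch of what the
generated script for such a staging table might look like; the column names and sizes are assumptions,
not definitions taken from the text.

-- Illustrative DDL for a point-of-sale staging table
CREATE TABLE pos_trans_stage (
  sale_quantity       NUMBER,
  sale_dollar_amount  NUMBER(10,2),
  sale_date           DATE,
  product_name        VARCHAR2(50),
  store_name          VARCHAR2(50)
);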

Q4. What are mapping operators? Explain any two source target mapping operators in
detail.
Mapping operators
These are the basic design elements to construct an ETL mapping. Used to represent sources
and targets in the data flow. Also used to represent how to transform the data from source to
target.

Following is a list of the source and target mapping operators:

Cube Operator-An operator that represents a cube. This operator will be used to represent a
cube in our mapping.



Dimension Operator-An operator that represents dimensions. This operator will be used in our
mapping to represent them.
External Table Operator-This operator is used to access data stored in flat files as if they
were tables.
Table Operator-It represents a table in the database.
Constant-Represents constant values that are needed. Produces a single output view that can
contain one or more constant attributes.
View Operator-Represents a database view.
Sequence Operator-It represents a database sequence, which is an automatic generator of
sequential unique numbers and is mostly used for populating a primary key field.
Construct Object-This operator can be used to actually construct an object in our mapping.

Q5. List and explain the use of various windows available in mapping editor.
Mapping-The mapping window is the main working area on the right where we will design the
mapping. This window is also referred to as the canvas.

Explorer-This window is similar to the Project Explorer in the Design Center. It has two tabs: the
Available Objects tab and the Selected Objects tab.

Mapping properties-The Mapping properties window displays the various properties that can be set
for objects in our mapping. When an object is selected on the canvas, its properties will be displayed
in this window.

Palette-The Palette contains each of the objects that can be used in our mapping. We can click
on the object we want to place in the mapping and drag it onto the canvas.

Bird’s Eye View-This window displays a miniature version of the entire canvas and allows us to scroll
around the canvas without using the scroll bars.

Q.6 Explain the various OWB operators.


There are three types of operators in Oracle Warehouse Builder, as follows:

 Source and Target Operators
 Data Flow Operators
 Pre/Post Processing Operators
Source and Target Operator:
The Warehouse Builder provides operators that we will use to represent the sources of our data
and the targets into which we will load data.
Following are some of the operators:

Cube Operator: An operator that represents a cube. This operator will be used to represent a
cube in our mapping.
Dimension Operator: An operator that represents dimensions. This operator will be used in our
mapping to represent them.



External Table Operator: This operator is used to access data stored in flat files as if they were
tables.
Table Operator: It represents a table in the database.
Constant: Represents constant values that are needed. Produces a single output view that can
contain one or more constant attributes.
View Operator: Represents a database view.
Sequence Operator: It represents a database sequence, which is an automatic generator of
sequential unique numbers and is mostly used for populating a primary key field.
Data Flow Operator:
The true power of a data warehouse lies in the restructuring of the source data into a format that
greatly facilitates the querying of large amounts of data over different time periods. For this, we
need to transform the source data into a new structure. That is the purpose of the data flow
operators.
Following are some of the operators:
Aggregator: When we need to sum the data up to a higher level, or apply some other
aggregation-type function such as an average, we use the Aggregator operator.
Deduplicator: Sometimes our data records will contain duplicate combinations that we want to
weed out so that we're loading only unique combinations of data. The Deduplicator operator will do
this for us.
Filter: This will limit the rows from an output set to criteria that we specify. It is generally
implemented in a where clause in SQL to restrict the rows that are returned.
Joiner: This operator will implement an SQL join on two or more input sets of data. A join takes
records from one source and combines them with the records from another source using some
combination of values that are common between the two.
Set Operation: This operator will allow us to perform an SQL set operation on our data such as
a union (returning all rows from each of two sources, either ignoring the duplicates or including
the duplicates) or intersect (which will return common rows from two sources).
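These data flow operators ultimately generate ordinary SQL. A hedged sketch of the constructs they
correspond to, using invented table and column names:

-- Aggregator: sum the sales up to the month level
SELECT product_id,
       TRUNC(sale_date, 'MM') AS sale_month,
       SUM(amount)            AS total_amount
FROM   pos_trans_stage
GROUP  BY product_id, TRUNC(sale_date, 'MM');

-- Deduplicator: keep only unique combinations of values
SELECT DISTINCT product_id, store_id
FROM   pos_trans_stage;

-- Set Operation: union of two sources, ignoring duplicates
SELECT product_id FROM current_sales
UNION
SELECT product_id FROM archived_sales;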
Pre/Post Processing Operators
There is a small group of operators that allow us to perform operations before the mapping
process begins, or after the mapping process ends. These are the pre- and post-processing
operators.
Mapping Input Parameter: This operator allows us to pass one or more parameters into a mapping
process.
Mapping Output Parameter: This is similar to the Mapping Input Parameter operator, but it
provides a value as output from our mapping.
Post-Mapping Process: Allows us to invoke a function or procedure after the mapping
completes its processing.
Pre-Mapping Process: It allows us to invoke a function or procedure before the mapping
process begins.

Q7. Briefly explain the functions of filter and joiner operators.



Filter
• This will limit the rows from an output set to criteria that we specify.
• It is generally implemented in a where clause in SQL to restrict the rows that are returned.
• We can connect a filter to a source object, specify the filter criteria, and get only those records
that we want in the output.
• It has Filter Condition property to specify the filter criteria.

Joiner
• This operator will implement an SQL join on two or more input sets of data, and produces a
single output row set.

• That is it combines data from multiple input sources into one.

• A join takes records from one source and combines them with the records from another source
using some combination of values that are common between the two.

• It has a property called Join Condition that specifies how the rows from the input sources are matched.
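In the generated code these two operators correspond to a WHERE clause and a join condition. A minimal
sketch with hypothetical table and column names:

-- Filter: only rows that satisfy the Filter Condition are passed on
SELECT *
FROM   pos_trans_stage
WHERE  sale_date >= DATE '2007-01-01';

-- Joiner: combine rows from two sources on values they have in common
SELECT s.sale_date, s.amount, p.product_name
FROM   pos_trans_stage s
JOIN   products        p ON p.product_id = s.product_id;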

Q8. What are data flow operators? Explain the concept of pivot operator with example.
Data flow operators

A data warehouse requires restructuring of the source data into a format that is congenial for the
analysis of data. The data flow operators are used for this purpose. These operators are
dragged and dropped into our mapping between our sources and targets. Then they are
connected to those sources and targets to indicate the flow of data and the transformations that
will occur on that data as it is being pulled from the source and loaded into the target structure.

Pivot

The pivot operator enables you to transform a single row of attributes into multiple rows.
Suppose we have source records of sales data for the year that contain a separate column for
each quarter's sales:
YEAR   Q1_SALES   Q2_SALES   Q3_SALES   Q4_SALES
2005      10000      15000      14000      25000

We wish to transform the data set to the following with a row for each quarter:

YEAR   QTR   SALES
2005   Q1    10000
2005   Q2    15000
2005   Q3    14000
2005   Q4    25000
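In plain Oracle SQL (version 11g and later) the same reshaping can be expressed with UNPIVOT. This is
only an illustrative equivalent, assuming the source rows sit in a table called SALES_BY_QTR; in OWB the
Pivot operator produces the equivalent logic for us.

-- Turn one row per year into one row per quarter
SELECT year, qtr, sales
FROM   sales_by_qtr
UNPIVOT (sales FOR qtr IN (q1_sales AS 'Q1',
                           q2_sales AS 'Q2',
                           q3_sales AS 'Q3',
                           q4_sales AS 'Q4'));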



Q9. What is expression operator? Explain the mapping of a date field SALE_DATE to a
numeric field DAY_CODE by applying TO_CHAR() and TO_NUMBER() functions through
expression operator. The string format for the TO_CHAR() function is 'YYYYMMDD'.
Answer:
The expression operator represents an SQL expression that can be applied to the output to
produce the desired result. Any valid SQL code for an expression can be used, and we can
reference input attributes to include them as well as functions.

• Drag the Expression operator onto the mapping.

• It has two groups defined—an input group, INGRP1—and an output group, OUTGRP1.

• Link the SALE_DATE attribute of source table to the INGRP1 of the EXPRESSION operator.

• Right-click on OUTGRP1 and select Open Details... from the pop-up menu.
• This will display the Expression Editor window for the expression.

• Click on the Output Attributes tab and add a new output attribute OUTPUT1 of number type
and click OK.

• Click on OUTPUT1 output attribute in the EXPRESSION operator and turn our attention to the
property window of the Mapping Editor.

• The Property Window shows Expression as its first property.

• Click the blank space after the label Expression.

• This shows a button with three dots.
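
Clicking that button opens an editor in which the expression can be typed. Written as plain SQL, and
assuming the SALE_DATE column from the question and the usual 'YYYYMMDD' format mask, the
transformation applied to each row is equivalent to:

-- Convert a DATE such as 15-DEC-2007 into the numeric day code 20071215
SELECT TO_NUMBER(TO_CHAR(sale_date, 'YYYYMMDD')) AS day_code
FROM   pos_trans_stage;

In the Expression property itself, the same logic would reference the input attribute of the operator,
for example TO_NUMBER(TO_CHAR(INGRP1.SALE_DATE, 'YYYYMMDD')).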


Q10. Explain the Indexes and Partitions tab in the Table Editor.
Indexes Tab:
An index can greatly facilitate rapid access to a particular record. It is generally useful for
permanent tables that will be repeatedly accessed in a random manner by certain known
columns of data in the table.
It is not desirable to go through the effort of creating an index on a staging table, which will only
be accessed for a short amount of time during a data load. Also, it is not really useful to create
an index on a staging table that will be accessed sequentially to pull all the data rows at one
time.
An index is best used in situations where data is pulled randomly from large tables, but doesn't
provide any benefit in speed if you have to pull every record of the table.

Partitions Tab:
A partition is a way of breaking down the data stored in a table into subsets that are stored
separately. This can greatly speed up data access for retrieving random records, as the
database will know the partition that contains the record being searched for based on the
partitioning scheme used.
It can directly home in on a particular partition to fetch the record by completely ignoring all the
other partitions that it knows won't contain the record.
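Outside the editor, these two tabs correspond to ordinary Oracle DDL. The following is only a sketch with
made-up table, column, and partition names:

-- Index: speeds up random lookups by a known column
CREATE INDEX idx_sales_product ON sales_fact (product_id);

-- Range partitioning: each year's rows are stored, and searched, separately
CREATE TABLE sales_fact_part (
  sale_date   DATE,
  product_id  NUMBER,
  amount      NUMBER(10,2)
)
PARTITION BY RANGE (sale_date) (
  PARTITION p2006 VALUES LESS THAN (DATE '2007-01-01'),
  PARTITION p2007 VALUES LESS THAN (DATE '2008-01-01'),
  PARTITION pmax  VALUES LESS THAN (MAXVALUE)
);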



Unit-V
ETL: Transformations and Other Operators

Validating, Generating, Deploying, and Executing Objects



Q1. Short note on ETL transformation.
The process of extracting data from source systems and bringing it into the data warehouse is
commonly called ETL, which stands for extraction, transformation, and loading. The ETL functions
are combined into one tool to pull data out of one database and place it into another
database.
Extract is the process of reading data from a database. During extraction, the desired data is
identified and extracted from many different sources, including database systems and
applications. Very often, it is not possible to identify the specific subset of interest; therefore
more data than necessary has to be extracted, so the identification of the relevant data will be
done at a later point in time.
Depending on the source system's capabilities (for example, operating system resources),
some transformations may take place during this extraction process. The size of the extracted
data varies from hundreds of kilobytes up to gigabytes, depending on the source system and
the business situation.
The same is true for the time delta between two (logically) identical extractions: the time span
may vary between days/hours and minutes to near real-time. Web server log files, for example,
can easily grow to hundreds of megabytes in a very short period of time.
Transform is the process of converting the extracted data from its previous form into the form it
needs to be in so that it can be placed into another database. Transformation occurs by using
rules or lookup tables or by combining the data with other data. After data is extracted, it has to
be physically transported to the target system or to an intermediate system for further
processing.
Load is the process of writing the data into the target database. ETL is used to migrate data
from one database to another, to form data marts and data warehouses.
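As a hedged, minimal illustration of the three steps expressed directly in SQL (all names are invented),
an extract over a database link, a transformation of each row, and a load into the target can be combined
in a single statement:

-- Extract from the source, Transform on the way, Load into the target
INSERT INTO sales_fact (day_code, product_name, amount)
SELECT TO_NUMBER(TO_CHAR(t.sale_date, 'YYYYMMDD')),  -- transform: date to numeric day code
       TRIM(UPPER(t.product_name)),                  -- transform: tidy the product name
       t.amount
FROM   pos_transactions@source_db t;                 -- extract over a database link
COMMIT;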

Q2. What is the purpose of main attribute group in a cube? Discuss about dimension
attributes and measures in the cube.

The first group represents the main attributes for the cube and contains the data elements to which we
will need to map. Other groups represent the dimensions that are linked to the cube. As far as
the dimensions are concerned, we create separate mappings for them prior to the cube mapping.
The data we map for the dimensions will be to attributes in the main cube group, which will
indicate to the cube which record is applicable from each of the dimensions.

• Cube has attributes for surrogate and business identifiers defined for each dimension of the
cube.
• All business identifiers are prefixed with the name of the dimension
• The name of a dimension is used as the surrogate identifier for that dimension.

• Say for example, if SKU and NAME are two business identifiers in PRODUCT dimension, then
the main attribute group will have three PRODUCT related identifiers; PRODUCT_SKU,
PRODUCT_NAME, PRODUCT.



• Apart from surrogate and business identifiers, the main attribute group also contains the
measures we have defined for the cube.
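Taking the PRODUCT example above, here is a hedged sketch of how the main attribute group of a SALES
cube might look if it were written out as a relational table. The column names follow the prefixing rule
just described; the measures are invented for illustration.

-- Illustrative relational shape of a cube's main attribute group
CREATE TABLE sales_cube (
  product       NUMBER,         -- surrogate identifier for the PRODUCT dimension
  product_sku   VARCHAR2(20),   -- business identifier SKU, prefixed with the dimension name
  product_name  VARCHAR2(50),   -- business identifier NAME, prefixed with the dimension name
  quantity      NUMBER,         -- measure
  sales_amount  NUMBER(10,2)    -- measure
);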
Q3. Write the steps to add a primary key for a column of a table in the Data Object Editor, with a
suitable example.

Here, the table name is COUNTIES_LOOKUP.


To add a primary key, we'll perform the following steps:
1. In the Design Center, open the COUNTIES_LOOKUP table in the Data Object Editor by
double-clicking on it under the Tables node.
2. Click on the Constraints tab.
3. Click on the Add Constraint button.
4. Type PK_COUNTIES_LOOKUP (or any other naming convention we might choose) in the
Name column.
5. In the Type column, click on the drop-down menu and select Primary Key.
6. Click on the Local Columns column, and then click on the Add Local Column button.
7. Click on the drop-down menu that appears and select the ID column.
8. Close the Data Object Editor.
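When this constraint is eventually deployed, it corresponds to DDL of roughly the following form. The
column name ID is taken from step 7; the constraint name matches the one typed in step 4.

-- DDL equivalent of the primary key added through the Constraints tab
ALTER TABLE counties_lookup
  ADD CONSTRAINT pk_counties_lookup PRIMARY KEY (id);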

Q4. Short note on Control Center Manager


Control Center Manager:
The Control Center Manager is the interface the Warehouse Builder provides for interacting with
the target schema. This is where the deployment of objects and subsequent execution of
generated code takes place.
The Design Center is for manipulating metadata only on the repository. Deployment and
execution take place in the target schema through the Control Center Service.
The Control Center Manager is our interface into the process where we can deploy objects and
mappings, check on the status of previous deployments, and execute the generated code in the
target schema.
We launch the Control Center Manager from the Tools menu of the Design Center main menu.
We click on the very first menu entry, which says Control Center Manager.
This will open up a new window to run the Control Center Manager.



Q5. What are the two ways of validating repository objects in object editor?
Following are the two ways to validate an object from the Data Object Editor:
 Right-click on the object displayed on the Canvas and select Validate from the pop-up menu
 Select the object displayed on the Canvas and then click on the Validate icon on the
toolbar

Deploy Actions: The following deploy actions are available for an object:
 Create: Create the object; if an object with the same name already exists, this can generate
an error upon deployment
 Upgrade: Upgrade the object in place, preserving data
 Drop: Delete the object
 Replace: Delete and recreate the object; this option does not preserve data

Q6. Explain the concept of validating and generating objects.


Validating Objects
 The process of validation is all about making sure the objects and mappings we've defined
in the Warehouse Builder have no obvious errors in design.
 Oracle Warehouse Builder runs a series of validation tests to ensure that data object
definitions are complete and that scripts can be generated and deployed.
 When these tests are complete, the results are displayed.
 Oracle Warehouse Builder enables you to open object editors and correct any invalid
objects before continuing.
 Validating objects and mapping can be done with the help of Design Center.
 Validation of repository objects can be done with the help of Data Object Editor.
 Validation of mapping can be done through Mapping Editor.



Generating Objects
 Generation deals with creating the code that will be executed to create the objects and run
the mapping
 With the generation step in the Warehouse Builder, we can generate the code that we need
to use to build and load our data warehouse.
 The objects—dimensions, cube, tables, and so on—will have SQL Data Definition Language
(or DDL) statements produced, which when executed will build the objects in the database.
 The mappings will have the PL/SQL code produced that, when run, will load the objects.
 Like validation, generation also can be done with the help of Data Object Editor and Mapping
Editor.

Q7. Write the steps for validating and generating in Data Object Editor
(I) Validating in the Data Object Editor:
Consider that we have the POS_TRANS_STAGE staging table defined.
Let's double-click on the POS_TRANS_STAGE table name in the Design Center to launch the
Data Object Editor so that we can discuss validation in the editor.

(i) We can right-click on the object displayed on the Canvas and select Validate from the pop-up
menu.
(ii) We can select Validate from the Object menu on the main editor menu bar.
(iii) To validate every object currently loaded into our Data Object Editor, we select Validate
All from the Diagram menu entry on the main editor menu bar. We can also press the Validate
icon on the General Toolbar.

Unlike when validating from the Design Center, here we get another window created in the editor, the
Generation window, which appears below the Canvas window.

When we validate from the Data Object Editor, it is on an object-by-object basis for objects
appearing in the editor canvas. But when we validate a mapping in the Mapping editor, the
mapping as a whole is validated all at once. Let's close the Data Object Editor and move on to
discuss validating in the Mapping Editor.



But as with the generation from the Design Center, we'll have the additional information
available. The procedure for generating from the editors is the same as for validation, but the
contents of the results window will be slightly different depending on whether we're in the Data
Object Editor or the Mapping Editor. Let's discuss each individually as we previously did.

(II)Generating in the Data Object Editor:

Let's return to the Data Object Editor and open our POS_TRANS_STAGE table in the editor by
double-clicking on it in the Design Center.

To review, the options we have for generating are:

(i) the Generate... menu entry under the Object main menu,

(ii) the Generate entry on the pop-up menu when we right-click on an object, and

(iii) the Generate icon on the general toolbar, right next to the Validate icon.

Q8. What is object deployment? Explain the functions of control center manager.
 Deployment is the process of creating physical objects in the target schema based on the
logical definitions created using the Design Center.



 The process of deploying is where the database objects are actually created and PL/SQL
code is actually loaded and compiled in the target database.
 During the initial stages of design, no physical objects have been created in the target schema.
 Operations such as importing metadata for tables, defining objects, mappings, and so
forth are performed with the OWB Design Center client.
 These objects are created as Warehouse Builder repository objects.
 So for the actual deployment of object in the target database, we have to use Control
Center Service, which must be running for the deployments to function.
 The Design Center creates a logical design of the data warehouse.
 The logical design will be stored in a workspace in the Repository on the server.
 The Control Center Manager is used for the creation of physical objects into the target
schema by deploying the logical design.
 The Control Center Manager is used to execute the design by running the code associated
with the ETL that we have designed.
 The Control Center Manager interacts with the Control Center Service, which runs on the
server.
 The Target Schema is where OWB will deploy the object to, and where the execution of the
ETL processes that load our data warehouse will take place.
Q9. Explain full and intermediate generation styles.
Full and intermediate generation styles
The generation style has two options we can choose from, full or intermediate. The Full option
will display the code for all operators in the complete mapping for the operating mode selected.
The Intermediate option allows us to investigate code for subsets of the full mapping option. It
displays code at the attribute group level of an individual operator.

If no attribute group is selected when we select the intermediate option in the drop-down menu,
we'll immediately get a message in the Script tab saying the following:
Please select an attribute group.
When we click on an attribute group in any operator on the mapping, the Script window
immediately displays the code for setting the values of that attribute group.

When we selected the Intermediate generation style, the drop-down menu and buttons on the
right-hand side of the window became active. We have a number of options for further
investigation of the code that is generated.



Unit-VI
Extra Features

Data warehousing and OLAP



Q1. Explain Metadata Change Management.
Metadata Change Management:-

Metadata change management includes keeping track of different versions of an object or
mapping as we make changes to it, and comparing objects to see what has changed. It is
always a good idea to save a working copy of objects and mappings when they are complete
and function correctly.

If we need to make modifications later and something goes wrong, or we just want to reproduce
a system from an earlier point in time, we have a ready-made copy available for use. We won't
have to try to manually back out of any changes we might have made.

We would also be able to make comparisons between that saved version of the object and the
current version to see what has been changed.

The Warehouse Builder has a feature called the Recycle Bin for storing deleted objects and
mappings for later retrieval.

It allows us to make copies of objects by including a clipboard for copying and pasting to and
from, which is similar to an operating system clipboard. It also has a feature called Snapshots,
which allows us to make a copy (or snapshot) of our objects at any point during the cycle of
developing our data warehouse that can later be used for comparisons.

Q2. What is recycling bin? Describe the features of warehouse builder recycle bin
window.
Recycle Bin

 The Recycle Bin in OWB is similar to the recycle bin in operating systems.
 OWB keeps deleted objects in the Recycle Bin.
 The deleted objects can be restored from the Recycle Bin.
 To undo a deletion, select an object from the Recycle Bin and click Restore.
 If the “Put in Recycle Bin” check box is checked while deleting, then the object will be sent to the
Recycle Bin.
Warehouse Builder Recycle Bin
 The Warehouse Builder Recycle Bin window can be opened by clicking on the Tools menu and
selecting the Recycle Bin option.
 This window has a content area which shows all deleted objects.
 The content is shown with Object Parent as well as Time Deleted information.
 The Object Parent means the project from which the object was deleted, and the Time Deleted is
when we deleted the object.
Below the content area there are two buttons:
o One for restoring a deleted object
o Another for emptying the contents of the Recycle Bin



Q3. What is a snapshot? Explain full snapshot and signature snapshot.
Snapshot
 A snapshot is a point in time version of an object.
 The snapshot of an object captures all the metadata information about that object at the
time when the snapshot is taken.
 It enables you to compare the current object with a previously taken snapshot.
 Since objects can be restored from snapshots, it can be used as a backup mechanism.

There are two types of snapshot as follows:-

Full Snapshots:

 Full snapshots provide complete metadata of an object that you can use to restore it later.
 So it is suitable for making backups of objects.
 Full snapshots take longer to create and require more storage space than signature
snapshots.
Signature Snapshots:

 It captures only the signature of an object.


 A signature contains enough information about the selected metadata component to detect
changes when compared with another snapshot or the current object definition.
 Signature snapshots are small and can be created quickly.

Q4. What are the different operations that can be performed on a snapshot of an object
that is created?

We can perform following operation on snapshot of object:


Restore: We can restore a snapshot from here, which will copy the snapshot objects back to
their place in the project, overwriting any changes that might have been made. It is a way to get
back to a previously known good version of an object if, for example, some future change
should break it for whatever reason.
Delete: If we do not need a snapshot anymore, we can delete it. However, be careful as there is
no recycle bin for deleted snapshots. Once it's deleted, it's gone forever. It will ask us if we are
sure before actually deleting it.



Convert to Signature: This option will convert a full snapshot to a signature snapshot.
Export: We can export full snapshots like we can export regular workspace objects. This will
save the metadata in a file on disk for backup or for importing later.
Compare: This option will let us compare two snapshots to each other to see what the
differences are.

Q4. Explain the export feature of Metadata Loader.


Metadata Loader (MDL):-

Workspace objects can be exported and saved to a file. We can export anything from
an entire project down to a single data object or mapping. Following are the benefits of
Metadata Loader exports and imports:
 Backup
 To transport metadata definitions to another repository for loading

If we choose an entire project or a collection such as a node or module, it will export all objects
contained within it. If we choose any subset, it will also export the context of the objects so that
it will remember where to put them on import.

Say for example if we export a table, the metadata also contains the definition for:
 The module in which it resides
 The project the module is in.

We can also choose to export any dependencies on the object being exported if they exist.
Following steps are used to export an object in the Design Center:
Select the project by clicking on it and then select Design | Export | Warehouse Builder
Metadata from the main menu.

Accept the default file names and locations, and click on the Export button.



Q5. Short note on

1. Metadata Snapshots
2. The Import Metadata Wizard

Metadata Snapshots:-

A snapshot captures all the metadata information about the selected objects and their
relationships at a given point in time. While an object can only have one current definition in a
workspace, it can have multiple snapshots that describe it at various points in time.
Snapshots are stored in the Oracle Database, in contrast to Metadata Loader exports, which are
stored as separate disk files. You can, however, export snapshots to disk files. Snapshots are
also used to support the recycle bin, providing the information needed to restore a deleted
metadata object.

When you take a snapshot, you capture the metadata of all or specific objects in your
workspace at a given point in time. You can use a snapshot to detect and report changes in
your metadata. You can create snapshots of any objects that you can access from the Projects
Navigator.
A snapshot of a collection is not a snapshot of just the shortcuts in the collection but a snapshot
of the actual objects.
Import Metadata Wizard:-

The Import Metadata Wizard automates importing metadata from a database into a module in
Oracle Warehouse Builder. You can import metadata from Oracle Database and non-Oracle
databases.
Each module type that stores source or target data structures has an associated Import Wizard,
which automates the process of importing the metadata to describe the data structures.
Importing metadata saves time and avoids keying errors, for example, by bringing metadata
definitions of existing database objects into Oracle Warehouse Builder.
The Welcome page of the Import Metadata Wizard lists the steps for importing metadata from
source applications into the appropriate module. The Import Metadata Wizard for

Oracle Database supports importing of tables, views, materialized views, dimensions, cubes,
external tables, sequences, user-defined types, and PL/SQL transformations directly or through
object lookups using synonyms.

When you import an external table, Oracle Warehouse Builder also imports the associated
location and directory information for any associated flat files.



Q6. What are the matching strategies for synchronizing workspace objects with its
mapping operator? Explain inbound and outbound synchronization.

Synchronizing means maintaining consistency between a workspace object and its mapping
operator.
It can be achieved with the help of:-
 Inbound Synchronization
 Outbound Synchronization
Inbound Synchronization:-
Inbound synchronization uses the specified workspace object to update the matching operator in our
mapping. It means that changes in the workspace object will be reflected in the mapping operator.
Outbound Synchronization:-
The outbound option updates the workspace object with the changes we've made to the
operator in the mapping. It means that changes in the mapping operator will be reflected in the
workspace object.
Following are the three matching strategies:-
Match by Object Identifier
Each source attribute is identified with a uniquely created ID internal to the Warehouse Builder
metadata. The unique ID stored in the operator for each attribute is exactly the same as that of the
corresponding attribute in the workspace object with which the operator is synchronized. This
matching strategy compares the unique object identifier of an operator attribute with that of a
workspace object.
Match by Object Name
This strategy matches the bound names of the operator attributes to the physical names of the
workspace object attributes.

Match by Object Position

This strategy matches operator attributes with attributes of the selected workspace object by
position. The first attribute of the operator is synchronized with the first attribute of the
workspace object, the second with the second, and so on.

Q7. Explain OLAP Terminologies.

OLAP Terminologies:
Cube:-

Data in OLAP databases is stored in cubes. Cubes are made up of dimensions and measures.
A cube may have many dimensions.
Dimensions:-
In an OLAP database cube, categories of information are called dimensions. Some dimensions
could be Location, Products, Stores, and Time.



Measures:-
Measures are the numeric values in an OLAP database cube that are available for analysis. The
measures could be margin, cost of goods sold, unit sales, budget amount, and so on.
Multidimensional:-
Multidimensional databases create cubes of aggregated data that anticipate how users think
about business models. These cubes also deliver this information efficiently and quickly. Cubes
consist of dimensions and measures. Dimensions are categories of information. For example,
locations, stores and products are typical dimensions. Measures are the content values in a
database that are available for analysis.
Members:-
In an OLAP database cube, members are the content values for a dimension. In the location
dimension, they could be Mumbai, Thane, Mulund, and so on. These are all values for location.

Q8. Explain multidimensional database architecture with suitable diagram.


One of the design objectives of the multidimensional server is to provide fast, linear access to
data regardless of the way the data is being requested.
The simplest request is a two-dimensional slice of data from an n-dimensional hypercube. The
objective is to retrieve the data equally fast, regardless of the requested dimensions. The
requested data is a compound slice in which two or more dimensions are nested as rows or
columns.
The second role of the server is to provide calculated results. By far the most common
calculation is aggregation; but more complex calculations, such as ratios and allocations, are
also required.
In fact, the design goal should be to offer a complete algebraic ability where any cell in the
hypercube can be derived from any of the others, using all standard business and statistical
functions, including conditional logic.

Q9. Explain MOLAP and its advantages and disadvantages.

MOLAP (Multi-dimensional Online Analytical Processing):-



MOLAP stands for Multi-dimensional Online Analytical Processing. MOLAP is the most
commonly used storage type. It is designed to offer maximum query performance to the users. The data
and aggregations are stored in a multidimensional format, compressed and optimized for
performance.
When a cube with MOLAP storage is processed, the data is pulled from the relational database,
the aggregations are performed, and the data is stored in the AS database in the form of binary
files. The data inside the cube will refresh only when the cube is processed, so latency is high.
Advantages:
 The data is stored on the OLAP server in optimized format; queries (even complex
calculations) are faster than ROLAP.
 The data is compressed so it takes up less space.
 The data is stored on the OLAP server; we don’t need to keep the connection to the relational
database.
 Cube browsing is fastest using MOLAP.
Disadvantages:
This doesn't support real-time analysis, i.e., newly inserted data will not be available for analysis until
the cube is processed.

Q10. Explain ROLAP and its advantages and disadvantages.

ROLAP stands for Relational Online Analytical Processing. ROLAP tools provide a multidimensional
view of relational data.
All of the relational OLAP vendors store the data in a special way known as a star or snowflake
schema. The most common form of these stores the data values in a table known as the fact
table.
One dimension is selected as the fact dimension and this dimension forms the columns of the
fact table. The other dimensions are stored in additional tables with the hierarchy defined by
child-parent columns.



The dimension tables are then relationally joined with the fact table to allow multidimensional
queries. The data is retrieved from the relational database into the client tool by SQL queries.

Since SQL was designed as an access language to relational databases, it is not necessarily
optimal for multidimensional queries.

The vast majority of ROLAP applications are for simple analysis of large volumes of information.
Retail sales analysis is the most common one.

The complexity of setup and maintenance has resulted in relatively few applications of ROLAP
to financial data warehousing applications such as financial reporting or budgeting.
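A minimal star-schema sketch (all table and column names are invented) shows how a multidimensional
question is answered with ordinary relational SQL: the fact table is joined to its dimension tables and
the measure is aggregated by the requested levels.

-- Total sales by product category and month from a star schema
SELECT p.category,
       t.month_name,
       SUM(f.sales_amount) AS total_sales
FROM   sales_fact  f
JOIN   product_dim p ON p.product_key = f.product_key
JOIN   time_dim    t ON t.time_key    = f.time_key
GROUP  BY p.category, t.month_name;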
Q11. Explain RAP (Real-Time Analytical Processing).

Real-Time Analytical Processing (RAP):-

RAP takes the approach that derived values should be calculated on demand, not pre-calculated.
This avoids both the long calculation time and the data explosion that occur with the
pre-calculation approach used by most OLAP vendors.

In order to calculate on demand quickly enough to provide fast response, data must be stored in
memory. This greatly speeds calculation and results in very fast response to the vast majority of
requests.

Another refinement of this would be to calculate numbers when they are requested but to retain
the calculations (as long as they are still valid) so as to support future requests. This has two
compelling advantages.

First, only those aggregations which are needed are ever performed. In a database with a
growth factor of 1,000 or more, many of the possible aggregations may never be requested.
Second, in a dynamic, interactive update environment, budgeting being a common example,
calculations are always up to date.

There is no waiting for a required pre-calculation after each incremental data change. It is
important to note that since RAP does not pre-calculate, the RAP database is typically 10 per
cent to 25 per cent the size of the data source.



This is because the data source typically requires something like 50-100 bytes per record and
possibly more. Generally, the data source stores one number per record that will be input into
the multidimensional database. Since RAP stores one number (plus indexes) in approximately
12 bytes, the size ratio between RAP and the data source is typically between 12/100 = 12%
and 12/50 = 24%.

Q12. Explain Data Explosion & Data Sparsity.

Data Explosion:-

The following example demonstrates the concept of data explosion in data warehousing.

Consider what happens when a 200 MB source file explodes to 10 GB. The database no longer
fits on any laptop for mobile computing. When a 1 GB source file explodes to 50 GB it cannot be
accommodated on typical desktop servers.

In both cases, the time to pre-calculate the model for every incremental data change will likely
be many hours. So even though disk space is cheap, the full cost of pre-calculation can be
unexpectedly large.

Therefore it is critically important to understand what data sparsity and data explosion are, what
causes them, and how they can be avoided, for the consequences of ignoring data explosion
can be very costly and, in most cases, result in project failure.

There are three main factors that contribute to data explosion:

 Sparsely populated base data increases the likelihood of data explosion
 Many dimensions in a model increase the likelihood of data explosion
 A high number of calculated levels in each dimension increases the likelihood of data
explosion

Data Sparsity:-

Input data or base data in OLAP applications is typically sparse (not densely populated). Also,
as the number of dimensions increase, data will typically become sparser (less dense). Data
explosion is the phenomenon that occurs in multidimensional models where the derived or
calculated values significantly exceed the base values.

Q13. Differentiate between ROLAP and MOLAP



ROLAP vs MOLAP:
 ROLAP is used when the data warehouse contains relational data; MOLAP is used when the data
warehouse contains relational as well as non-relational data.
 In ROLAP the cubes are created dynamically; in MOLAP the cubes are pre-built and already stored.
 ROLAP does not contain prefabricated cubes; MOLAP contains prefabricated cubes.
 ROLAP uses an analytical server; MOLAP uses an MDDB server.
 ROLAP is comparatively slower than MOLAP; MOLAP is comparatively faster than ROLAP.
 ROLAP consumes less memory; MOLAP consumes more memory.
 The ROLAP architecture is simple and easy to implement; the MOLAP architecture is more difficult
to implement.

