You are on page 1of 15

OLAP IN THE DATA WAREHOUSE

A Multidimensional Data Model


A multidimensional model views data in the form of a data-cube.

A data cube enables data to be modelled and viewed in multiple dimensions. It is defined by
dimensions and facts.

The dimensions are the perspectives or entities concerning which an organization keeps
records.

For example, a shop may create a sales data warehouse to keep records of the store's sales
for the dimension time, item, and location.

A multidimensional data model is organized around a central theme, for example, sales. This
theme is represented by a fact table. Facts are numerical measures.

The fact table contains the names of the facts or measures of the related dimensional tables.

Consider the data of a shop for items sold per quarter in the city of Delhi.

The data is shown in the table. In this 2D representation, the sales for Delhi are shown for the
time dimension (organized in quarters) and the item dimension.

Star Schema
A star schema is the elementary form of a dimensional model, in which data are organized
into facts and dimensions.

A fact is an event that is counted or measured, such as a sale or log in.

A dimension includes reference data about the fact, such as date, item, or customer.

A star schema is a relational schema where a relational schema whose design represents a
multidimensional data model. The star schema is the explicit data warehouse schema. It is
known as star schema.

A table in a star schema which contains facts and connected to dimensions.

A fact table has two types of columns: those that include fact and those that are foreign
keys to the dimension table.

The primary key of the fact tables is generally a composite key that is made up of all of its
foreign keys.

A dimension is an architecture usually composed of one or more hierarchies that categorize


data. If a dimension has not got hierarchies and levels, it is called a flat dimension or list.

The primary keys of each of the dimensions table are part of the composite primary
keys of the fact table.

Fact tables store data about sales while dimension tables data about the geographic
region (markets, cities), clients, products, times, channels.

Snowflake Schema
A snowflake schema is equivalent to the star schema.

"A schema is known as a snowflake if one or more dimension tables do not connect directly
to the fact table but must join through other dimension tables."

The snowflake schema is an expansion of the star schema where each point of the star
explodes into more points.

It is called snowflake schema because the diagram of snowflake schema resembles a


snowflake.

The following diagram shows a snowflake schema with two dimensions, each having three
levels. A snowflake schemas can have any number of dimension, and each dimension can
have any number of levels.

Example: Figure shows a snowflake schema with a Sales fact table, with Store, Location,
Time, Product, Line, and Family dimension tables.

The Market dimension has two dimension tables with Store as the primary dimension table,
and Location as the outrigger dimension table. The product dimension has three dimension
tables with Product as the primary dimension table, and the Line and Family table are the
outrigger dimension tables.
Following Figure shows a simple STAR schema for sales in a manufacturing company.
The sales fact table include quantity, price, and other relevant metrics. SALESREP,
CUSTOMER, PRODUCT, and TIME are the dimension tables.
The STAR schema for sales, as shown above, contains only five tables, whereas the
normalized version now extends to eleven tables. We will notice that in the snowflake
schema, the attributes with low cardinality in each original dimension tables are
removed to form separate tables. These new tables are connected back to the original
dimension table through artificial keys.

A snowflake schema is designed for flexible querying across more complex dimensions
and relationship.

It is suitable for many to many and one to many relationships between dimension
levels.

Fact Constellation Schema


A Fact constellation means two or more fact tables sharing one or more dimensions.
It is also called Galaxy schema.
Fact Constellation Schema is a sophisticated database design that is difficult to
summarize information.

Fact Constellation Schema can implement between aggregate Fact tables or


decompose a complex Fact table into independent simplex Fact tables.

A fact constellation schema is shown in the figure below.

This schema defines two fact tables, sales, and shipping.

Sales are treated along four dimensions, namely, time, item, branch, and location.
The schema contains a fact table for sales that includes keys to each of the four
dimensions, along with two measures: Rupee_sold and units_sold.

The shipping table has five dimensions, or keys: item_key, time_key, shipper_key,
from_location, and to_location, and two measures: Rupee_cost and units_shipped.

Concept hierarchy
A concept hierarchy represents a series of mappings from a set of low-level concepts
to larger-level, more general concepts.

Concept hierarchy organizes information or concepts in a hierarchical structure or a


specific partial order, which are used for defining knowledge in brief, high-level
methods, and creating possible mining knowledge at several levels of abstraction.

Concept hierarchy enables raw information to be managed at a higher and more


generalized level of abstraction.

There are several types of concept hierarchies which are as follows –

Schema Hierarchy − Schema hierarchy represents the total or partial order between
attributes in the database.

It can define existing semantic relationships between attributes. In a database, more


than one schema hierarchy can be generated by using multiple sequences and
grouping of attributes.

Set-Grouping Hierarchy − A set-grouping hierarchy constructs values for a given


attribute or dimension into groups or constant range values. It is also known as
instance hierarchy because the partial series of the hierarchy is represented on the set
of instances or values of an attribute.

These hierarchies have more functional sense and are so approved than other
hierarchies.

Operation-Derived Hierarchy − Operation-derived hierarchy is represented by a set


of operations on the data.

These operations are defined by users, professionals, or the data mining system.

These hierarchies are usually represented for mathematical attributes. Such operations
can be as easy as range value comparison, as difficult as a data clustering and data
distribution analysis algorithm.
Rule-based Hierarchy − in a rule-based hierarchy either a whole concept hierarchy or
an allocation of it is represented by a set of rules and is computed dynamically based
on the current information and rule definition.

A lattice-like architecture is used for graphically defining this type of hierarchy, in which
each child-parent route is connected with a generalization rule.

OLAP Operations in the Multidimensional Data Model


In the multidimensional model, the records are organized into various dimensions, and
each dimension includes multiple levels of abstraction described by concept
hierarchies.

This organization support users with the flexibility to view data from various
perspectives.

Consider the OLAP operations which are to be performed on multidimensional data.

Roll-Up
The roll-up operation (also known as drill-up or aggregation operation) performs
aggregation on a data cube, by climbing down concept hierarchies, i.e., dimension
reduction. Roll-up is like zooming-out on the data cubes.

The roll-up operation aggregates the data by ascending the location hierarchy from
the level of the city to the level of the country.

When a roll-up is performed by dimensions reduction, one or more dimensions are


removed from the cube.

For example, consider a sales data cube having two dimensions, location and time.
Roll-up may be performed by removing, the time dimensions, appearing in an
aggregation of the total sales by location, relatively than by location and by time.

Consider the following cubes illustrating temperature of certain days recorded


weekly:
Temperature 64 65 68 69 70 71 72 75 80 81 83 85

Week1 1 0 1 0 1 0 0 0 0 0 1 0

Week2 0 0 0 1 0 0 1 2 0 1 0 0

By doing this, we contain the following cube:

Temperature cool mild hot

Week1 2 1 1

Week2 2 1 1

Drill-Down
The drill-down operation (also called roll-down) is the reverse operation of roll-up.
Drill-down is like zooming-in on the data cube. It navigates from less detailed record
to more detailed data.

Drill-down can be performed by either stepping down a concept hierarchy for a


dimension or adding additional dimensions.

Because a drill-down adds more details to the given data, it can also be performed by
adding a new dimension to a cube.
For example, a drill-down on the central cubes of the figure can occur by introducing
an additional dimension, such as a customer group.

Example
Drill-down adds more details to the given data:

Temperature cool mild hot

Day 1 0 0 0

Day 2 0 0 0

Day 3 0 0 1

Day 4 0 1 0

Day 5 1 0 0

Day 6 0 0 0

Day 7 1 0 0

Day 8 0 0 0

Day 9 1 0 0

Day 10 0 1 0

Day 11 0 1 0

Day 12 0 1 0

Day 13 0 0 1

Day 14 0 0 0
Slice
A slice is a subset of the cubes corresponding to a single value for one or more
members of the dimension.

For example, a slice operation is executed when the customer wants a selection on
one dimension of a three-dimensional cube resulting in a two-dimensional site.

So, the Slice operations perform a selection on one dimension of the given cube, thus
resulting in a subcube.

For example, if we make the selection, temperature=cool we will obtain the following
cube:

Temperature cool

Day 1 0

Day 2 0

Day 3 0

Day 4 0

Day 5 1

Day 6 1

Day 7 1
Day 8 1

Day 9 1

Day 11 0

Day 12 0

Day 13 0

Day 14 0

Dice
The dice operation describes a subcube by operating a selection on two or more
dimension.

Temperature cool hot

Day 3 0 1

Day 4 0 0

Consider the following diagram, which shows the dice operations.


The dice operation on the cubes based on the following selection criteria involves three
dimensions.

o (location = "Toronto" or "Vancouver")


o (time = "Q1" or "Q2")
o (item =" Mobile" or "Modem")

Pivot
The pivot operation is also called a rotation. Pivot is a visualization operations
which rotates the data axes in view to provide an alternative presentation of
the data.

Types of OLAP
There are three main types of OLAP servers are as following:

ROLAP stands for Relational OLAP, an application based on relational


DBMSs.
It uses relational or extended DBMS to store and manage warehouse data.
ROLAP has basically 3 main components: Database Server, ROLAP server,
and Front-end tool.
Advantages of ROLAP –
 ROLAP is used for handle the large amount of data.
 ROLAP tools don’t use pre-calculated data cubes.
 Data can be stored efficiently.
 ROLAP can leverage functionalities inherent in the relational database.

Disadvantages of ROLAP –
 Performance of ROLAP can be slow.
 In ROALP, difficult to maintain aggregate tables.
 Limited by SQL functionalities.

MOLAP stands for Multidimensional OLAP, an application based on


multidimensional DBMSs.
The storage utilization may be low with multidimensional data stores. Many
MOLAP server handle dense and sparse data sets by using two levels of
data storage representation. MOLAP has 3 components: Database Server,
MOLAP server, and Front-end tool.
HOLAP stands for Hybrid OLAP, an application using both relational and
multidimensional techniques.
Hybrid is a combination of both ROLAP and MOLAP.
It offers functionalities of both ROLAP and as well as MOLAP like faster
computation of MOLAP and higher scalability of ROLAP.
The aggregations are stored separately in MOLAP store. Its server allows
storing the large data volumes of detailed information.
Advantages of HOLAP –
 HOLAP provides the functionalities of both MOLAP and ROLAP.
 HOLAP provides fast access at all levels of aggregation.
Disadvantages of HOLAP –
HOLAP architecture is very complex to understand because it supports both
MOLAP and ROLAP.

You might also like