
DATA WAREHOUSING

Public

Introduction to DWH
Introduction, course agenda and assumptions
Discussions and clarifications
What is a data warehouse?
Need for a data warehouse (reporting aspects, BI application types (ad hoc, standard reporting, analytic applications, dashboards) and audience)
Different approaches to data warehouse design
Advantages and disadvantages of the different data warehouse designs
Layers of a DWH: OLTP, OLAP, data mart, ODS, etc.
Dimensional modeling fundamentals
Basic characteristics of fact and dimension tables
Types of dimensions
Types of facts

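What is a DWH:
A data warehouse is a collection of data that is:
Subject-oriented: organized around business subjects such as customers, products, and sales.
Integrated: data from different source systems is made consistent (names, codes, units).
Time-variant: data is stored with a time element and kept over a long history.
Non-volatile: once loaded, data is not updated or deleted in normal operation; new data is added to it.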


OLAP vs OLTP

OLAP | OLTP
Historical processing of information. | Day-to-day processing.
Used by knowledge workers such as executives, managers, and analysts. | Used by clerks, DBAs, or database professionals.
Used to analyze the business. | Used to run the business.
Provides summarized and consolidated data. | Provides primitive and highly detailed data.
Provides a summarized, multidimensional view of data. | Provides a detailed, flat relational view of data.
Number of users: hundreds. | Number of users: thousands.
Number of records accessed: millions. | Number of records accessed: tens.
Database size: 100 GB to 100 TB. | Database size: 100 MB to 100 GB.
Based on the star schema or snowflake schema. | Based on the entity-relationship model.
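To make the workload contrast concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The `orders` table and both functions are hypothetical. For brevity both workloads run against one in-memory table; in a real system the OLAP query would run against the warehouse, not the OLTP database.

```python
import sqlite3

# Hypothetical orders table, used for both workloads to show the contrast.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, "
    "order_date TEXT, amount REAL)"
)

# OLTP-style work: a short transaction that touches a handful of rows.
with conn:  # commits on success
    conn.execute("INSERT INTO orders VALUES (1, 10, '2024-01-05', 120.0)")
    conn.execute("INSERT INTO orders VALUES (2, 11, '2024-01-20', 80.0)")
    conn.execute("INSERT INTO orders VALUES (3, 10, '2024-02-02', 50.0)")

def orders_for_customer(customer_id):
    """OLTP read: fetch one customer's orders (few rows, full detail)."""
    return conn.execute(
        "SELECT order_id, amount FROM orders WHERE customer_id = ? ORDER BY order_id",
        (customer_id,),
    ).fetchall()

def revenue_by_month():
    """OLAP read: scan all rows and return a summarized view."""
    return conn.execute(
        "SELECT substr(order_date, 1, 7) AS month, SUM(amount) "
        "FROM orders GROUP BY month ORDER BY month"
    ).fetchall()

print(orders_for_customer(10))  # [(1, 120.0), (3, 50.0)]
print(revenue_by_month())       # [('2024-01', 200.0), ('2024-02', 50.0)]
```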


DWH Architecture:

Data warehouses can be architected in many different ways, depending on the specific needs of a business. One model that is popular in many organizations is the "bus architecture", in which data marts are integrated through shared, conformed dimensions.


DWH Architecture:
Different data warehousing systems have different structures. Some have an ODS, while others have multiple data marts.
In general, data warehouse systems are built from the following layers:


Staging Area:
The staging area is where temporary tables are held on the data warehouse server.
A staging area is needed to hold the extracted data and to perform data cleansing and merging before the data is loaded into the warehouse.
In the absence of a staging area, the data load has to go directly from the OLTP system to the OLAP system.
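The load steps described for the staging area (hold, cleanse, merge, then load) can be sketched with `sqlite3` as below. The table names `stg_customer` and `dw_customer` are hypothetical, and the cleansing rules (trimming names, capitalizing city names) are assumptions chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables: a staging table for raw extracts and a warehouse table.
conn.execute("CREATE TABLE stg_customer (customer_id INTEGER, name TEXT, city TEXT)")
conn.execute(
    "CREATE TABLE dw_customer (customer_id INTEGER PRIMARY KEY, name TEXT, city TEXT)"
)

# 1. Land raw rows from two source systems in the staging area as-is.
crm_rows = [(1, "  Alice ", "pune"), (2, "Bob", "MUMBAI")]
billing_rows = [(1, "Alice", "Pune"), (3, "Carol ", "delhi")]
conn.executemany("INSERT INTO stg_customer VALUES (?, ?, ?)", crm_rows + billing_rows)

# 2. Cleanse in staging: trim whitespace and standardize city capitalization.
conn.execute(
    "UPDATE stg_customer SET name = trim(name), "
    "city = upper(substr(trim(city), 1, 1)) || lower(substr(trim(city), 2))"
)

# 3. Merge: keep one row per customer_id and load it into the warehouse.
#    MIN() is a simplistic rule that works here because duplicates are
#    identical after cleansing; real merge rules depend on the sources.
conn.execute(
    "INSERT INTO dw_customer (customer_id, name, city) "
    "SELECT customer_id, MIN(name), MIN(city) FROM stg_customer GROUP BY customer_id"
)

# 4. The staging table is temporary; clear it after the load.
conn.execute("DELETE FROM stg_customer")
conn.commit()

print(conn.execute("SELECT * FROM dw_customer ORDER BY customer_id").fetchall())
```

Doing the cleansing in staging keeps the warehouse table free of half-cleaned rows: only the merged result is ever inserted into `dw_customer`.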


ODS:
An operational data store (ODS) is an integrated database of operational data. It is often used as an interim logical area for a data warehouse.
Its sources include legacy systems, and it contains current or near-term data.
An ODS may contain only about 30 days of information, while a data warehouse typically contains years of data.
An ODS is designed to perform relatively simple queries quickly on smaller volumes of data, such as finding the orders of a customer or looking for available items in a retail store.
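A minimal sketch of the ODS behaviour described above: keep only recent operational data and answer simple lookups quickly. The table `ods_orders`, the index, and the 30-day retention window are assumptions for illustration; the actual retention period depends on the business.

```python
import sqlite3
from datetime import date, timedelta

RETENTION_DAYS = 30  # assumption: this ODS keeps roughly the last 30 days

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ods_orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, "
    "order_date TEXT, status TEXT)"
)
# Index supporting the typical quick lookup: the orders of one customer.
conn.execute("CREATE INDEX ix_ods_orders_customer ON ods_orders (customer_id)")

today = date(2024, 3, 31)  # fixed "current date" so the example is repeatable
conn.executemany(
    "INSERT INTO ods_orders VALUES (?, ?, ?, ?)",
    [
        (1, 7, (today - timedelta(days=2)).isoformat(), "SHIPPED"),
        (2, 7, (today - timedelta(days=45)).isoformat(), "DELIVERED"),
        (3, 8, (today - timedelta(days=10)).isoformat(), "OPEN"),
    ],
)

def purge_old_rows(current_day):
    """Drop rows older than the retention window; history lives in the DWH."""
    cutoff = (current_day - timedelta(days=RETENTION_DAYS)).isoformat()
    conn.execute("DELETE FROM ods_orders WHERE order_date < ?", (cutoff,))
    conn.commit()

def orders_of_customer(customer_id):
    """Simple, fast query on a small volume of current data."""
    return conn.execute(
        "SELECT order_id, status FROM ods_orders WHERE customer_id = ? ORDER BY order_id",
        (customer_id,),
    ).fetchall()

purge_old_rows(today)
print(orders_of_customer(7))  # [(1, 'SHIPPED')]
```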

Data Mart:
A data mart is the access layer of the DWH environment that is used
to get data out to the users.
The data mart is a subset of the data warehouse that is usually
oriented to a specific business line or team.
DWH | Data Mart
Holds multiple subject areas. | Holds only one subject area.
Holds very detailed information. | May hold more summarized data.
Works to integrate all data sources. | Concentrates on integrating information from a given subject area or set of source systems.
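As a sketch of a data mart being a subject-oriented, more summarized subset of the warehouse, the snippet below derives a Retail-only monthly table from a warehouse table that spans several business lines. All table and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical warehouse table spanning several business lines.
conn.execute(
    "CREATE TABLE dw_sales (sale_date TEXT, business_line TEXT, product TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO dw_sales VALUES (?, ?, ?, ?)",
    [
        ("2024-01-03", "Retail", "Shoes", 100.0),
        ("2024-01-15", "Retail", "Shirts", 40.0),
        ("2024-02-09", "Retail", "Shoes", 60.0),
        ("2024-01-11", "Wholesale", "Shoes", 900.0),
    ],
)

# Data mart for the Retail team only: one subject area, summarized by month.
conn.execute(
    """
    CREATE TABLE mart_retail_monthly AS
    SELECT substr(sale_date, 1, 7) AS month, product, SUM(amount) AS revenue
    FROM dw_sales
    WHERE business_line = 'Retail'
    GROUP BY month, product
    """
)

rows = conn.execute(
    "SELECT month, product, revenue FROM mart_retail_monthly ORDER BY month, product"
).fetchall()
print(rows)
```

The Wholesale row never reaches the mart, and individual sales are rolled up to one row per month and product.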

Dimensional Modeling:
Dimensional modeling is the design concept used by many data warehouse designers to build their data warehouse.
The dimensional model is the underlying data model used by many of the commercial OLAP products available in the market today.
In this model, all data is contained in two types of tables:
Dimension tables
Fact tables

Two types of dimensional model are commonly used in a DWH:
Star Schema
Snowflake Schema


Dimensional Modeling:
Star Schema
A star schema model can be depicted as a simple star: a central table contains fact data, and multiple tables radiate out from it, connected by the primary and foreign keys of the database.
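A minimal star schema sketch in SQLite: one central fact table with a foreign key to each of three dimension tables, plus a typical star join. The table names and the `YYYYMMDD` integer date key are assumptions (a common convention, not something the slides prescribe).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Dimension tables: descriptive attributes, one surrogate key each.
    CREATE TABLE dim_date    (date_key INTEGER PRIMARY KEY, full_date TEXT, month TEXT);
    CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);
    CREATE TABLE dim_store   (store_key INTEGER PRIMARY KEY, city TEXT);

    -- Central fact table: measures plus a foreign key to each dimension.
    CREATE TABLE fact_sales (
        date_key    INTEGER REFERENCES dim_date(date_key),
        product_key INTEGER REFERENCES dim_product(product_key),
        store_key   INTEGER REFERENCES dim_store(store_key),
        quantity    INTEGER,
        amount      REAL
    );

    INSERT INTO dim_date VALUES (20240105, '2024-01-05', '2024-01'),
                                (20240210, '2024-02-10', '2024-02');
    INSERT INTO dim_product VALUES (1, 'Running Shoe', 'Footwear'), (2, 'T-Shirt', 'Apparel');
    INSERT INTO dim_store VALUES (1, 'Pune'), (2, 'Delhi');
    INSERT INTO fact_sales VALUES
        (20240105, 1, 1, 2, 200.0),
        (20240105, 2, 2, 3, 60.0),
        (20240210, 1, 2, 1, 100.0);
    """
)

# A typical star join: the fact table joins directly to each dimension it needs.
query = """
    SELECT d.month, p.category, SUM(f.amount) AS revenue
    FROM fact_sales f
    JOIN dim_date d    ON f.date_key = d.date_key
    JOIN dim_product p ON f.product_key = p.product_key
    GROUP BY d.month, p.category
    ORDER BY d.month, p.category
"""
print(conn.execute(query).fetchall())
```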
Snowflake Schema
The snowflake schema represents a dimensional model which is also composed of a central fact table and a set of constituent dimension tables, which are further normalized into sub-dimension tables.
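The same idea as a snowflake, sketched under the same assumptions: the product dimension is normalized so that category names live in a separate sub-dimension table (`dim_category`, a hypothetical name), and reaching the category takes one extra join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Sub-dimension split out of the product dimension (the "snowflake").
    CREATE TABLE dim_category (category_key INTEGER PRIMARY KEY, category_name TEXT);
    CREATE TABLE dim_product (
        product_key  INTEGER PRIMARY KEY,
        name         TEXT,
        category_key INTEGER REFERENCES dim_category(category_key)
    );
    CREATE TABLE fact_sales (
        product_key INTEGER REFERENCES dim_product(product_key),
        amount      REAL
    );

    INSERT INTO dim_category VALUES (10, 'Footwear'), (20, 'Apparel');
    INSERT INTO dim_product VALUES (1, 'Running Shoe', 10), (2, 'Sandal', 10), (3, 'T-Shirt', 20);
    INSERT INTO fact_sales VALUES (1, 200.0), (2, 50.0), (3, 60.0);
    """
)

# Reaching the category now takes one extra join through the sub-dimension.
query = """
    SELECT c.category_name, SUM(f.amount) AS revenue
    FROM fact_sales f
    JOIN dim_product  p ON f.product_key = p.product_key
    JOIN dim_category c ON p.category_key = c.category_key
    GROUP BY c.category_name
    ORDER BY c.category_name
"""
print(conn.execute(query).fetchall())
```

The trade-off is visible here: the category text is stored once instead of being repeated on every product row, at the cost of an additional join in queries.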

Dimension:
A dimension table contains the descriptive attributes of the facts.
Dimensions store the textual descriptions of the business.
The different types of dimensions are:
Conformed Dimension:
A dimension that has exactly the same meaning and content when it is referred to from different fact tables.
Junk Dimension:
A junk dimension is a collection of miscellaneous transactional codes, flags, and/or text attributes that are unrelated to any particular dimension. The junk dimension simply provides a convenient place to store these attributes.
Degenerate Dimensions:
A degenerate dimension is a dimension attribute that is stored as part of the fact table rather than in a separate dimension table, for example an invoice or order number.
Role Playing Dimensions:
A role-playing dimension is one where the same dimension key, along with its associated attributes, can be joined to more than one foreign key in the fact table, for example a date dimension used for both the order date and the ship date.
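The sketch below shows three of these dimension types in one small SQLite schema. It is illustrative only, and all names are hypothetical: `dim_order_flags` is a junk dimension built from every combination of two unrelated flags, `order_number` is a degenerate dimension kept in the fact table, and `dim_date` plays two roles (order date and ship date) through two foreign keys.

```python
import sqlite3
from itertools import product

conn = sqlite3.connect(":memory:")

# Role-playing: one date dimension, referenced by two foreign keys in the fact.
conn.execute("CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, full_date TEXT)")
conn.executemany("INSERT INTO dim_date VALUES (?, ?)",
                 [(20240101, "2024-01-01"), (20240104, "2024-01-04")])

# Junk: every combination of a few unrelated low-cardinality flags, one key each.
conn.execute(
    "CREATE TABLE dim_order_flags (flags_key INTEGER PRIMARY KEY, "
    "is_gift INTEGER, payment_type TEXT)"
)
for key, (is_gift, payment) in enumerate(product([0, 1], ["CARD", "CASH"]), start=1):
    conn.execute("INSERT INTO dim_order_flags VALUES (?, ?, ?)", (key, is_gift, payment))

# Fact table: order_number is a degenerate dimension (no table of its own).
conn.execute(
    """CREATE TABLE fact_orders (
        order_number   TEXT,
        order_date_key INTEGER REFERENCES dim_date(date_key),
        ship_date_key  INTEGER REFERENCES dim_date(date_key),
        flags_key      INTEGER REFERENCES dim_order_flags(flags_key),
        amount         REAL)"""
)

def flags_key_for(is_gift, payment_type):
    """Look up the junk-dimension row for one combination of flags."""
    return conn.execute(
        "SELECT flags_key FROM dim_order_flags WHERE is_gift = ? AND payment_type = ?",
        (is_gift, payment_type),
    ).fetchone()[0]

conn.execute(
    "INSERT INTO fact_orders VALUES (?, ?, ?, ?, ?)",
    ("INV-1001", 20240101, 20240104, flags_key_for(1, "CARD"), 75.0),
)

# The same dim_date plays two roles via two aliases in one query.
row = conn.execute(
    """SELECT f.order_number, od.full_date AS ordered, sd.full_date AS shipped,
              j.is_gift, j.payment_type
       FROM fact_orders f
       JOIN dim_date od ON f.order_date_key = od.date_key
       JOIN dim_date sd ON f.ship_date_key  = sd.date_key
       JOIN dim_order_flags j ON f.flags_key = j.flags_key"""
).fetchone()
print(row)  # ('INV-1001', '2024-01-01', '2024-01-04', 1, 'CARD')
```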

Dimension:
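Slowly Changing Dimensions:
Dimension attributes that change over time. Whether the history of changes to a particular attribute should be preserved in the data warehouse depends on the business requirement.

Different strategies exist for handling slowly changing dimensions:
SCD Type 1: overwrite the old value; no history is kept.
SCD Type 2: add a new row for each change, so the full history is kept.
SCD Type 3: add a column that holds the previous value, so limited history is kept.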
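The three SCD strategies can be sketched in plain Python on an in-memory list of dimension rows. This is a minimal illustration, not a production loader: the `CustomerRow` fields (a surrogate key, a business key, `valid_from`/`valid_to` dates for Type 2, and a `previous_city` column for Type 3) are hypothetical names chosen for the example.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import Optional

@dataclass
class CustomerRow:
    surrogate_key: int
    customer_id: str                      # business key from the source system
    city: str
    previous_city: Optional[str] = None   # used only by the Type 3 variant
    valid_from: date = date(1900, 1, 1)   # used only by the Type 2 variant
    valid_to: Optional[date] = None       # None means "current row"

def scd1(rows, customer_id, new_city):
    """Type 1: overwrite the attribute in place; no history is kept."""
    return [replace(r, city=new_city) if r.customer_id == customer_id else r for r in rows]

def scd2(rows, customer_id, new_city, change_date):
    """Type 2: close the current row and add a new row with a new surrogate key."""
    next_key = max(r.surrogate_key for r in rows) + 1
    out = []
    for r in rows:
        if r.customer_id == customer_id and r.valid_to is None:
            out.append(replace(r, valid_to=change_date))
            out.append(CustomerRow(next_key, customer_id, new_city, valid_from=change_date))
        else:
            out.append(r)
    return out

def scd3(rows, customer_id, new_city):
    """Type 3: keep only the previous value in a separate column."""
    return [
        replace(r, previous_city=r.city, city=new_city) if r.customer_id == customer_id else r
        for r in rows
    ]

start = [CustomerRow(1, "C042", "Pune")]
print(scd1(start, "C042", "Mumbai"))
print(scd2(start, "C042", "Mumbai", date(2024, 6, 1)))
print(scd3(start, "C042", "Mumbai"))
```

Each function returns new rows and leaves `start` unchanged, which makes it easy to compare what each strategy preserves.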

Rapidly Changing Dimensions:
A dimension attribute that changes frequently is a rapidly changing attribute.

Fact:
A fact table consists of the measurements, metrics, or facts of a business process.
These measurable facts are used to assess business value and to forecast future business.
The different types of facts are:

Additive:
Additive facts are facts that can be summed up through all of the dimensions in the fact table. A sales fact is a good example of an additive fact.

Semi-Additive:
Semi-additive facts are facts that can be summed up through some of the dimensions in the fact table, but not the others. For example, a daily balance fact can be summed up through the customer dimension but not through the time dimension.

Non-Additive:
Non-additive facts are facts that cannot be summed up through any of the dimensions in the fact table. For example, facts that hold percentages or calculated ratios.

Factless Fact Table:
In the real world, a fact table may contain no measures or facts at all. Such tables are called "factless fact tables"; they record only that an event occurred, for example a student attending a class.
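The four kinds of facts can be contrasted with a few lines of plain Python. The data is made up for illustration, and the choice of the last day's value for balances and of a recomputed ratio for rates are common approaches, shown here as assumptions rather than the only options.

```python
from collections import defaultdict

# Additive: sales amounts can be summed along every dimension.
sales = [("2024-01-01", "A", 30.0), ("2024-01-02", "A", 20.0), ("2024-01-02", "B", 10.0)]
total_sales = sum(amount for _, _, amount in sales)

# Semi-additive: hypothetical daily balances (day, customer, balance).
balances = [
    ("2024-01-01", "A", 100.0), ("2024-01-01", "B", 50.0),
    ("2024-01-02", "A", 120.0), ("2024-01-02", "B", 50.0),
]
# Summing across customers for one day is meaningful...
total_by_day = defaultdict(float)
for day, _, bal in balances:
    total_by_day[day] += bal
# ...but summing one customer's balance across days is not; along the time
# dimension take the period-end value (or an average) instead.
last_day = max(day for day, _, _ in balances)
closing_by_customer = {cust: bal for day, cust, bal in balances if day == last_day}

# Non-additive: a ratio must be recomputed from its additive parts, not summed.
# (orders, visits) per store; the overall rate is sum(orders) / sum(visits).
conversions = {"store1": (1, 2), "store2": (1, 10)}
wrong_rate = sum(o / v for o, v in conversions.values())  # 0.5 + 0.1: meaningless
right_rate = (sum(o for o, _ in conversions.values())
              / sum(v for _, v in conversions.values()))

# Factless: rows only record that an event happened; analysis counts rows.
attendance = [("2024-01-01", "student1", "math"), ("2024-01-01", "student2", "math")]
math_attendance = sum(1 for _, _, course in attendance if course == "math")

print(total_sales, dict(total_by_day), closing_by_customer, right_rate, math_attendance)
```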
