
Data Modeling for Data Warehouse

• How to structure the data in your data warehouse?
• Process that produces abstract data models for one or more database components of the
  data warehouse
• Modeling for a warehouse is different from modeling for an operational database
• Also called Dimensional Modeling, Star Schema Modeling, or Fact/Dimension Modeling

Modeling Techniques

• Entity-Relationship Modeling
    • Traditional modeling technique
    • Technique of choice for OLTP
    • Suited for corporate data warehouse
• Dimensional Modeling
    • Analyzing business measures in the specific business context
    • Helps visualize very abstract business questions
    • End users can easily understand and navigate the data structure

In simple terms, dimensional modeling is a data modeling method that stores data in such a way
that it is relatively easy to retrieve from the database. ER modeling, by contrast, stores data
in such a way that there is less redundancy.

Goals and Benefits of Dimensional Modeling

1. Faster Data Retrieval
2. Better Understandability
3. Extensibility

Dimensions and Facts


In a dimensional model, everything is divided into two distinct categories: dimensions or
measures. Anything we try to model must fit into one of these two categories.

Dimensions:
Dimensions are the objects or context. That is, dimensions are the 'things' about which
something is being said. In a sales model, for example, the store and the food item are
dimensions.

Facts/Measures:
Measures are the quantifiable values, usually numeric, such as a sales amount or a quantity sold.
Measures are not stored in the dimension tables. A separate table is created for storing
measures. This table is called the fact table.
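
As a rough sketch of this split, the following Python snippet uses the standard-library sqlite3 module to create one dimension table and one fact table, then sums the measures per dimension member. The table names, column names (store_dim, sales_fact, quantity, amount) and sample rows are hypothetical and chosen only for illustration.

```python
import sqlite3

# In-memory database; all names and rows below are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension table: descriptive context about each store.
CREATE TABLE store_dim (
    store_key  INTEGER PRIMARY KEY,
    store_name TEXT,
    city       TEXT
);
-- Fact table: numeric measures plus a key pointing at the dimension.
CREATE TABLE sales_fact (
    store_key INTEGER REFERENCES store_dim(store_key),
    quantity  INTEGER,
    amount    REAL
);
INSERT INTO store_dim VALUES (1, 'Downtown', 'City A'), (2, 'Airport', 'City B');
INSERT INTO sales_fact VALUES (1, 2, 10.0), (1, 1, 5.0), (2, 4, 20.0);
""")


def sales_by_store(connection):
    """Aggregate the measures in the fact table, grouped by a dimension attribute."""
    return connection.execute("""
        SELECT d.store_name, SUM(f.quantity), SUM(f.amount)
        FROM sales_fact AS f
        JOIN store_dim AS d ON d.store_key = f.store_key
        GROUP BY d.store_name
        ORDER BY d.store_name
    """).fetchall()


totals = sales_by_store(conn)
print(totals)
```

The measures (quantity, amount) live only in sales_fact, while the descriptive attributes live only in store_dim.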

Schema Types:

Why?
In dimensional modeling, we can create different schemas to suit our requirements. We need
various schemas to accomplish things such as accommodating the hierarchies of a dimension or
maintaining change histories of information.

Star Schema

Star schema is the simplest kind of schema where one fact table is present in the center of the
schema surrounded by multiple dimension tables.
In a star schema all the dimension tables are connected only with the fact table and no
dimension table is connected with any other dimension table.

Benefits of Star Schema
Star schema is probably the most popular schema in dimensional modeling because of its
simplicity and flexibility.
In a star schema design, any dimension attribute can be reached from the fact table by
traversing a single join, which makes this type of schema well suited to information retrieval
(faster query processing).
Note that all the hierarchies (or levels) of the members of a dimension are stored in the
single dimension table.
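
To illustrate the point about hierarchies, here is a minimal sketch in Python with sqlite3, assuming a hypothetical store dimension whose city, region, and country levels all sit in one table. The names store_dim, sales_fact, and rollup, and the sample data, are assumptions made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One dimension table holds every level of the (hypothetical) store hierarchy.
CREATE TABLE store_dim (
    store_key INTEGER PRIMARY KEY,
    city      TEXT,
    region    TEXT,
    country   TEXT
);
CREATE TABLE sales_fact (store_key INTEGER, amount INTEGER);
INSERT INTO store_dim VALUES
    (1, 'City A', 'East', 'Country X'),
    (2, 'City B', 'East', 'Country X'),
    (3, 'City C', 'West', 'Country X');
INSERT INTO sales_fact VALUES (1, 10), (2, 20), (3, 5), (1, 15);
""")

LEVELS = ("city", "region", "country")


def rollup(connection, level):
    """Total sales at one hierarchy level, always using a single join."""
    if level not in LEVELS:
        # The level is interpolated into the SQL text, so restrict it to known columns.
        raise ValueError(f"unknown level: {level}")
    query = f"""
        SELECT d.{level}, SUM(f.amount)
        FROM sales_fact AS f
        JOIN store_dim AS d ON d.store_key = f.store_key
        GROUP BY d.{level}
        ORDER BY d.{level}
    """
    return connection.execute(query).fetchall()


by_region = rollup(conn, "region")
by_country = rollup(conn, "country")
print(by_region, by_country)
```

Whichever level is requested, the query joins the fact table to exactly one dimension table.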

Snowflake Schema
A snowflake schema is like a star schema, except that one or more dimension tables are
connected to other dimension tables as well as to the central fact table.

This has an obvious disadvantage in terms of information retrieval, since we need to read more
tables (and traverse more SQL joins) to get the same information.
For example, to find every food and food type sold from store 1, the SQL queries for the star
and snowflake schemas look like this:
SQL Query for Star Schema:
SELECT DISTINCT f.name, f.type
FROM food f
JOIN sales_fact t ON f.key = t.food_key
WHERE t.store_key = 1;

SQL Query for Snowflake Schema:
SELECT DISTINCT f.name, tp.type_name
FROM food f
JOIN type tp ON f.type_key = tp.key
JOIN sales_fact t ON f.key = t.food_key
WHERE t.store_key = 1;

As this example shows, the snowflake schema requires one more join than the star schema (to
connect one more table) to retrieve the same information. This extra join is why a snowflake
schema generally performs worse for retrieval.
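
The comparison above can be tried out with a small sketch in Python and sqlite3. It builds the food, type, and sales_fact tables named in the queries, once in star form and once in snowflake form, and runs both queries for store 1. The sample rows are invented for illustration, an ORDER BY is added so the output is deterministic, and the identifiers key and type are double-quoted because they are keywords in some SQL dialects.

```python
import sqlite3

# Star form: the food type is stored directly in the food dimension.
star = sqlite3.connect(":memory:")
star.executescript("""
CREATE TABLE food ("key" INTEGER PRIMARY KEY, name TEXT, "type" TEXT);
CREATE TABLE sales_fact (food_key INTEGER, store_key INTEGER);
INSERT INTO food VALUES (1, 'apple', 'fruit'), (2, 'carrot', 'vegetable'),
                        (3, 'bread', 'bakery');
INSERT INTO sales_fact VALUES (1, 1), (2, 1), (1, 1), (3, 2);
""")

# Snowflake form: the food type is moved out into its own table.
snow = sqlite3.connect(":memory:")
snow.executescript("""
CREATE TABLE "type" ("key" INTEGER PRIMARY KEY, type_name TEXT);
CREATE TABLE food ("key" INTEGER PRIMARY KEY, name TEXT, type_key INTEGER);
CREATE TABLE sales_fact (food_key INTEGER, store_key INTEGER);
INSERT INTO "type" VALUES (10, 'fruit'), (20, 'vegetable'), (30, 'bakery');
INSERT INTO food VALUES (1, 'apple', 10), (2, 'carrot', 20), (3, 'bread', 30);
INSERT INTO sales_fact VALUES (1, 1), (2, 1), (1, 1), (3, 2);
""")

STAR_QUERY = """
SELECT DISTINCT f.name, f."type"
FROM food AS f
JOIN sales_fact AS t ON f."key" = t.food_key
WHERE t.store_key = ?
ORDER BY f.name
"""

SNOWFLAKE_QUERY = """
SELECT DISTINCT f.name, tp.type_name
FROM food AS f
JOIN "type" AS tp ON f.type_key = tp."key"
JOIN sales_fact AS t ON f."key" = t.food_key
WHERE t.store_key = ?
ORDER BY f.name
"""

star_rows = star.execute(STAR_QUERY, (1,)).fetchall()
snow_rows = snow.execute(SNOWFLAKE_QUERY, (1,)).fetchall()
print(star_rows)
print(snow_rows)
```

Both queries return the same rows; the snowflake version simply needs the additional join to the type table to get there.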

Slowly Changing Dimensions (data in dimension tables changes slowly or rarely)

3 Types
Certain kinds of dimension attribute changes need to be handled differently in the data
warehouse.

• Type I - Overwrite the existing data
    • e.g. name correction, description changes
• Type II - Partition history (preserves historical data)
    • e.g. packaging change, customer movement
    • Create a new dimension record with a new surrogate key
• Type III - Organizational changes (keeps previous and current values in the same record)
    • e.g. sales force reorganization
    • Show sales broken down by new and old organizations
    • Need to create an old and a new field
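
The three types can be sketched roughly as follows in Python with sqlite3. Everything here is a hypothetical illustration: the tables customer_dim and rep_dim, their columns, and the is_current flag (one common way to mark the active Type II row, not something the notes above prescribe) are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer_dim (
    customer_key INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
    customer_id  TEXT,                               -- natural (business) key
    name         TEXT,
    city         TEXT,
    is_current   INTEGER DEFAULT 1                   -- assumed flag for the active row
);
CREATE TABLE rep_dim (
    rep_key      INTEGER PRIMARY KEY,
    rep_name     TEXT,
    current_org  TEXT,   -- the "new" field
    previous_org TEXT    -- the "old" field
);
INSERT INTO customer_dim (customer_id, name, city) VALUES ('C1', 'Jon Smth', 'Austin');
INSERT INTO rep_dim VALUES (1, 'Ana', 'East', NULL);
""")


def scd_type1_fix_name(connection, customer_id, new_name):
    """Type I: overwrite the value in place; the old value is lost."""
    connection.execute(
        "UPDATE customer_dim SET name = ? WHERE customer_id = ?",
        (new_name, customer_id),
    )


def scd_type2_move(connection, customer_id, new_city):
    """Type II: retire the current row and add a row with a new surrogate key."""
    (name,) = connection.execute(
        "SELECT name FROM customer_dim WHERE customer_id = ? AND is_current = 1",
        (customer_id,),
    ).fetchone()
    connection.execute(
        "UPDATE customer_dim SET is_current = 0 WHERE customer_id = ? AND is_current = 1",
        (customer_id,),
    )
    connection.execute(
        "INSERT INTO customer_dim (customer_id, name, city) VALUES (?, ?, ?)",
        (customer_id, name, new_city),
    )


def scd_type3_reorg(connection, rep_key, new_org):
    """Type III: shift the current value into the 'previous' field, then set the new one."""
    connection.execute(
        "UPDATE rep_dim SET previous_org = current_org, current_org = ? WHERE rep_key = ?",
        (new_org, rep_key),
    )


scd_type1_fix_name(conn, "C1", "Jon Smith")
scd_type2_move(conn, "C1", "Dallas")
scd_type3_reorg(conn, 1, "West")

customers = conn.execute(
    "SELECT customer_key, name, city, is_current FROM customer_dim ORDER BY customer_key"
).fetchall()
reps = conn.execute("SELECT rep_name, current_org, previous_org FROM rep_dim").fetchall()
print(customers)
print(reps)
```

After these calls the customer has two rows (the old city kept as history, the new city marked current), while the sales rep still has one row carrying both the old and the new organization.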
