
Design and Implement a Data Warehouse

TEAM MENTORS – SHAHID A, PRACHI SARANG


TEAM MEMBERS – ASWINI, MONY TOPPO

JAN 22, 2019


Data Warehouse
• A data warehouse (DW or DWH), also known as an enterprise data
warehouse (EDW), is a system used for reporting and data analysis, and
is considered a core component of business intelligence.

• DWs are central repositories of integrated data from one or more
sources.

• They store current and historical data in one single place and are used
for creating analytical reports for knowledge workers throughout the
enterprise.

Design of Data Warehouse:
There are three strategies for designing a data warehouse:
Bottom-Up Design:
In the bottom-up approach, data marts are first created to provide
reporting and analytical capabilities for specific business processes;
these data marts can then be integrated into the enterprise data warehouse.
Top-Down Design:
In the top-down approach, "atomic" data, that is, data at the greatest
level of detail, is stored in the enterprise data warehouse, and data
marts are then built from it.
Hybrid Design:
In the hybrid approach, the DW database is kept in third normal form to
eliminate data redundancy, and the design combines features of both of
the approaches above.
Implementation of Database
Implementation follows the ETL process, i.e. Extraction, Transformation and Loading.

Designing and implementing the database involves these seven steps:

Step 1: Determine Business Objectives
Step 2: Collect and Analyze Information
Step 3: Identify Core Business Processes
Step 4: Construct a Conceptual Data Model
Step 5: Locate Data Sources and Plan Data Transformations
Step 6: Set Tracking Duration
Step 7: Implement the Plan
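The ETL process can be sketched in a few lines of Python. This is a minimal illustration, not a production pipeline: the standard-library csv and sqlite3 modules stand in for a real source system and warehouse, and the CSV content, the sales_stage table and the cleaning rules are all hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical source extract: raw sales rows as they might arrive
# from an operational system (CSV text stands in for the real source).
RAW_CSV = """order_id,order_date,product,amount
1001,2019-01-05,widget,20.00
1002,2019-01-05,gadget,
1003,2019-01-06,WIDGET,5.50
"""


def extract(text):
    """Extract: read raw rows from the source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: clean and standardise rows before loading."""
    cleaned = []
    for row in rows:
        if not row["amount"]:                    # drop rows missing the measure
            continue
        cleaned.append((
            int(row["order_id"]),
            row["order_date"],
            row["product"].strip().lower(),      # standardise product labels
            float(row["amount"]),
        ))
    return cleaned


def load(conn, rows):
    """Load: insert the cleaned rows into a (hypothetical) staging table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales_stage ("
        "order_id INTEGER PRIMARY KEY, order_date TEXT, product TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales_stage VALUES (?, ?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(extract(RAW_CSV)))
print(conn.execute(
    "SELECT product, SUM(amount) FROM sales_stage GROUP BY product").fetchall())
```

Keeping the three stages as separate functions mirrors the E, T and L steps and lets each one be tested or replaced on its own.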

Dimension and Fact tables
Fact Table:
• A fact table is the primary table in a dimensional model.
• A fact table contains:
• Measurements/facts
• Foreign keys to dimension tables
Dimension table:
• A dimension table contains the dimensions (descriptive context) of a fact.
• Dimension tables are joined to the fact table via foreign keys.
• Dimension tables are typically de-normalized tables.
Contd.
Dimension attributes should be:
1. Verbose (labels consisting of full words)
2. Descriptive
3. Complete (having no missing values)
4. Discretely valued (having only one value per dimension table row)
5. Quality assured (having no misspellings or impossible values)
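Some of these attribute rules can be checked automatically during loading. The following is a minimal sketch of a row validator for "complete", "discretely valued" and "quality assured"; the product attributes and the ALLOWED_CATEGORIES set are hypothetical, and "verbose" and "descriptive" are left to human review here.

```python
# Hypothetical controlled vocabulary for a product dimension's category.
ALLOWED_CATEGORIES = {"Electronics", "Furniture", "Stationery"}


def check_dimension_row(row):
    """Return a list of rule violations for one dimension row (a dict)."""
    problems = []
    for attr, value in row.items():
        # Complete: no missing values.
        if value is None or value == "":
            problems.append(f"{attr}: missing value")
        # Discretely valued: one value per attribute, not a collection.
        elif isinstance(value, (list, tuple, set)):
            problems.append(f"{attr}: multiple values")
    # Quality assured: no misspelt or impossible values in a controlled attribute.
    category = row.get("category")
    if isinstance(category, str) and category and category not in ALLOWED_CATEGORIES:
        problems.append(f"category: unexpected value {category!r}")
    return problems


print(check_dimension_row({"product_key": 2, "product_name": "", "category": "Furnture"}))
```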

Star and Snowflake Schema
Star Schema
• The star schema architecture is the simplest data warehouse schema.

• It is called a star schema because the diagram resembles a star, with
points radiating from a center.

• The center of the star consists of the fact table and the points of the star
are the dimension tables.
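A small star schema can be built and queried in SQLite to show the shape described above. This is a sketch with hypothetical table names, columns and rows: one fact table in the center, three de-normalized dimension tables around it, and a query that joins the fact table to the dimensions it needs and aggregates a measure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables (the points of the star): descriptive and de-normalized.
CREATE TABLE dim_date    (date_key INTEGER PRIMARY KEY, full_date TEXT, month_name TEXT, year INTEGER);
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product_name TEXT, category TEXT);
CREATE TABLE dim_store   (store_key INTEGER PRIMARY KEY, store_name TEXT);

-- Fact table (the center of the star): measures plus one foreign key per dimension.
CREATE TABLE fact_sales (
    date_key    INTEGER REFERENCES dim_date(date_key),
    product_key INTEGER REFERENCES dim_product(product_key),
    store_key   INTEGER REFERENCES dim_store(store_key),
    quantity    INTEGER,
    amount      REAL
);

INSERT INTO dim_date    VALUES (20190122, '2019-01-22', 'January', 2019);
INSERT INTO dim_product VALUES (1, 'Office Chair', 'Furniture'), (2, 'Desk Lamp', 'Electronics');
INSERT INTO dim_store   VALUES (10, 'Store A');
INSERT INTO fact_sales  VALUES (20190122, 1, 10, 2, 300.0), (20190122, 2, 10, 5, 125.0);
""")

# Each dimension is a single join away from the fact table.
query = """
SELECT p.category, d.month_name, SUM(f.amount) AS total_amount
FROM fact_sales f
JOIN dim_product p ON f.product_key = p.product_key
JOIN dim_date    d ON f.date_key    = d.date_key
GROUP BY p.category, d.month_name
ORDER BY p.category
"""
for row in conn.execute(query):
    print(row)
```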

Star Schema Example
The main characteristics of the star schema:

• Simple structure
• Efficient queries, since each dimension is a single join away from the fact table
• Relatively long load times for the de-normalized dimension tables

Snowflake Schema
• The snowflake schema architecture is a more complex variation of the
star schema, because the tables that describe the dimensions are normalized.

• The snowflake schema is represented by centralized fact tables which
are connected to multiple dimensions.

• In the snowflake schema, dimensions are normalized into multiple
related tables, whereas the star schema’s dimensions are de-normalized,
with each dimension represented by a single table.
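The same kind of query against a snowflaked dimension needs one extra join. In this sketch (hypothetical tables and rows), the category attribute is normalized out of the product dimension into its own table, so the fact table reaches it through the product table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Snowflaked product dimension: category attributes live in their own table.
CREATE TABLE dim_category (category_key INTEGER PRIMARY KEY, category_name TEXT);
CREATE TABLE dim_product  (product_key INTEGER PRIMARY KEY, product_name TEXT,
                           category_key INTEGER REFERENCES dim_category(category_key));
CREATE TABLE fact_sales   (product_key INTEGER REFERENCES dim_product(product_key), amount REAL);

INSERT INTO dim_category VALUES (1, 'Furniture'), (2, 'Electronics');
INSERT INTO dim_product  VALUES (1, 'Office Chair', 1), (2, 'Desk', 1), (3, 'Desk Lamp', 2);
INSERT INTO fact_sales   VALUES (1, 300.0), (2, 450.0), (3, 125.0);
""")

# fact -> product -> category: the extra join is the cost of storing each
# category name only once instead of repeating it on every product row.
query = """
SELECT c.category_name, SUM(f.amount)
FROM fact_sales f
JOIN dim_product  p ON f.product_key  = p.product_key
JOIN dim_category c ON p.category_key = c.category_key
GROUP BY c.category_name
ORDER BY c.category_name
"""
print(conn.execute(query).fetchall())
```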

Snowflake Schema
Slowly Changing Dimensions
• Slowly Changing Dimensions (SCDs) are dimensions whose attributes change slowly over time rather than regularly.
• The changes are slow but unpredictable, rather than following a regular schedule.
E.g. an employee's transfer changes the regional office ID recorded for that employee.

• These scenarios can sometimes cause referential integrity problems and can be handled
with several methodologies:

• Type 0: retain original
• Type 1: overwrite
• Type 2: add new row
• Type 3: add new attribute
• Type 4: add history table
• Type 6: combination of Types 1, 2 and 3
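Types 1 and 2 can be contrasted on the employee-transfer example above. This is a minimal in-memory sketch; the EmployeeRow fields, the valid_from/valid_to/is_current bookkeeping columns and the surrogate-key rule (max + 1) are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class EmployeeRow:
    surrogate_key: int
    employee_id: str                   # natural (business) key
    regional_office_id: str
    valid_from: date
    valid_to: Optional[date] = None    # None marks the open-ended current row
    is_current: bool = True


def scd_type1(rows: List[EmployeeRow], employee_id: str, new_office: str) -> None:
    """Type 1: overwrite the attribute in place; the old value is lost."""
    for row in rows:
        if row.employee_id == employee_id:
            row.regional_office_id = new_office


def scd_type2(rows: List[EmployeeRow], employee_id: str,
              new_office: str, change_date: date) -> None:
    """Type 2: expire the current row and add a new row with a new surrogate key."""
    current = next(r for r in rows if r.employee_id == employee_id and r.is_current)
    current.valid_to = change_date
    current.is_current = False
    rows.append(EmployeeRow(
        surrogate_key=max(r.surrogate_key for r in rows) + 1,
        employee_id=employee_id,
        regional_office_id=new_office,
        valid_from=change_date,
    ))


history = [EmployeeRow(1, "E42", "R01", date(2017, 4, 1))]
scd_type2(history, "E42", "R07", date(2019, 1, 22))
for row in history:
    print(row)
```

Facts loaded before the transfer keep pointing at surrogate key 1 (office R01), while new facts use key 2 (office R07), which is how Type 2 preserves history.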

Examples
• Type 1:

• Type 2:

• Type 3:

Contd.
• Type 4:

• Type 6:
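For Type 4 (add history table), a minimal sketch: the main dimension keeps only the current values, and each superseded version is moved to a separate history table. The dictionary-based "current" table and the HistoryRow fields are hypothetical stand-ins for real tables.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List


@dataclass
class HistoryRow:
    employee_id: str
    regional_office_id: str
    valid_from: date
    valid_to: date


def scd_type4(current: Dict[str, dict], history: List[HistoryRow],
              employee_id: str, new_office: str, change_date: date) -> None:
    """Type 4: archive the old version in the history table, then update
    the current dimension row in place."""
    old = current[employee_id]
    history.append(HistoryRow(employee_id, old["regional_office_id"],
                              old["valid_from"], change_date))
    current[employee_id] = {"regional_office_id": new_office,
                            "valid_from": change_date}


current = {"E42": {"regional_office_id": "R01", "valid_from": date(2017, 4, 1)}}
history = []
scd_type4(current, history, "E42", "R07", date(2019, 1, 22))
print(current)
print(history)
```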

Dimension Relationships
• A relationship between a dimension and a measure group consists of
the dimension and fact tables participating in the relationship and a
granularity attribute that specifies the granularity of the dimension in
the particular measure group.
• Types of dimensions:
1. Conformed dimensions
2. Junk dimensions
3. Role-playing dimensions
4. Degenerate dimensions

Conformed and Junk Dimensions
• Conformed dimension
– Shared by multiple fact tables.
– Used when all business users have the same definitions for the dimension.
Figure: Conformed Dimension
• Junk dimension
– Dimension table targeted to a single fact table.
– Groups miscellaneous low-cardinality flags and indicators into one table
instead of giving each flag its own dimension.
Figure: Junk Dimension
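A conformed dimension is what makes "drill-across" reporting possible: two fact tables share one dimension, each is aggregated on its own, and the results are combined on the shared attribute. The sketch below assumes hypothetical sales and shipment fact tables sharing a date dimension.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One conformed date dimension shared by two fact tables.
CREATE TABLE dim_date       (date_key INTEGER PRIMARY KEY, month_name TEXT);
CREATE TABLE fact_sales     (date_key INTEGER REFERENCES dim_date(date_key), amount REAL);
CREATE TABLE fact_shipments (date_key INTEGER REFERENCES dim_date(date_key), units INTEGER);

INSERT INTO dim_date       VALUES (20190101, 'January'), (20190201, 'February');
INSERT INTO fact_sales     VALUES (20190101, 500.0), (20190201, 200.0);
INSERT INTO fact_shipments VALUES (20190101, 40), (20190201, 15), (20190201, 5);
""")

# Aggregate each fact table separately, then join the results on the
# conformed month_name attribute (same meaning in both fact tables).
query = """
WITH s AS (SELECT d.month_name, SUM(f.amount) AS sales
           FROM fact_sales f JOIN dim_date d USING (date_key) GROUP BY d.month_name),
     h AS (SELECT d.month_name, SUM(f.units) AS units
           FROM fact_shipments f JOIN dim_date d USING (date_key) GROUP BY d.month_name)
SELECT s.month_name, s.sales, h.units
FROM s JOIN h ON s.month_name = h.month_name
ORDER BY s.sales DESC
"""
print(conn.execute(query).fetchall())
```

Aggregating before joining avoids multiplying rows when the two fact tables have different numbers of rows per month.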
Role Playing and Degenerate Dimensions
• Role-playing dimension
– Has multiple valid relationships with a fact table.
– Plays different roles in a fact table depending on the context
(e.g. one date dimension used as both order date and ship date).
Figure: Role Playing Dimension

• Degenerate dimension
– Used by a single fact table.
– Dimension value (e.g. an invoice number) is stored directly in the fact table.
– No corresponding dimension table.
Figure: Degenerate Dimension
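Both ideas fit in one small fact table. In this sketch (hypothetical names), order_date_key and ship_date_key both reference the same date dimension, which is joined twice under different aliases (role-playing), while invoice_no is kept in the fact table with no dimension table of its own (degenerate).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, full_date TEXT);

-- Two foreign keys to the same dimension (role-playing) and an
-- invoice number stored only in the fact table (degenerate).
CREATE TABLE fact_orders (
    order_date_key INTEGER REFERENCES dim_date(date_key),
    ship_date_key  INTEGER REFERENCES dim_date(date_key),
    invoice_no     TEXT,
    amount         REAL
);

INSERT INTO dim_date    VALUES (20190120, '2019-01-20'), (20190122, '2019-01-22');
INSERT INTO fact_orders VALUES (20190120, 20190122, 'INV-0001', 99.5);
""")

# Each role gets its own alias of the same physical dimension table.
query = """
SELECT f.invoice_no, od.full_date AS order_date, sd.full_date AS ship_date, f.amount
FROM fact_orders f
JOIN dim_date od ON f.order_date_key = od.date_key
JOIN dim_date sd ON f.ship_date_key  = sd.date_key
"""
print(conn.execute(query).fetchall())
```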
Facts
• Facts are the key metrics used to measure business results, such as:
– Sales
– Production
– Inventory
• Facts can be:
– Additive, e.g. sales
– Semi-additive, e.g. inventory levels
– Non-additive, e.g. profit percentage

What are Fact Tables?
• In data warehousing, a fact table consists of the measurements,
metrics or facts of a business process.
• It is located at the center of a star schema or a snowflake
schema, surrounded by dimension tables.
• A fact table typically has two types of columns: those that contain facts
and those that are foreign keys to dimension tables. The primary key
of a fact table is usually a composite key made up of all of its
foreign keys.
• Fact tables contain the content of the data warehouse and store
different types of measures: additive, non-additive and semi-additive.
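The composite primary key described above can be declared directly in the table definition. This sketch uses a hypothetical daily sales fact table with two foreign-key columns; the key makes the database reject a second row for the same date/product combination, which helps catch facts loaded twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical fact table: foreign keys + one measure, keyed on all foreign keys.
conn.execute("""
CREATE TABLE fact_daily_sales (
    date_key    INTEGER NOT NULL,
    product_key INTEGER NOT NULL,
    amount      REAL,
    PRIMARY KEY (date_key, product_key)
)
""")
conn.execute("INSERT INTO fact_daily_sales VALUES (20190122, 1, 300.0)")

# A duplicate key combination violates the composite primary key.
try:
    conn.execute("INSERT INTO fact_daily_sales VALUES (20190122, 1, 50.0)")
except sqlite3.IntegrityError as exc:
    print("rejected:", exc)
```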

Figure 1: Fact Table
Figure 2: Fact Table Example

Granularity

• Granularity refers to the level of detail at which facts are recorded.
• Facts can be at different levels of granularity.
• Granularity is determined by business needs.
• The grain is the lowest level of information stored in the fact table.
Example: year, quarter, month, period, week, day (date dimension)
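Storing facts at a fine grain keeps every coarser level available, because finer facts can always be rolled up. A minimal sketch, assuming hypothetical sales stored at day grain:

```python
from collections import defaultdict
from datetime import date

# Hypothetical facts stored at day grain (the lowest level kept).
daily_sales = {
    date(2019, 1, 21): 120,
    date(2019, 1, 22): 80,
    date(2019, 2, 1): 200,
}


def roll_up(facts, key_fn):
    """Aggregate fine-grained facts to a coarser grain chosen by key_fn."""
    totals = defaultdict(int)
    for day, amount in facts.items():
        totals[key_fn(day)] += amount
    return dict(totals)


by_month = roll_up(daily_sales, lambda d: (d.year, d.month))
by_quarter = roll_up(daily_sales, lambda d: (d.year, (d.month - 1) // 3 + 1))
print(by_month)
print(by_quarter)
```

The reverse is not possible: monthly totals cannot be split back into days, which is why the grain should be set at the lowest level the business needs.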

Various Types of Measures
1. Additive measures: measures that can be added across all dimensions.
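A quick way to see additivity: summing a sales amount along any single dimension produces totals that all add up to the same grand total. The sales rows below are hypothetical.

```python
# Hypothetical sales facts: (store, product, day, amount)
sales = [
    ("S1", "chair", "2019-01-21", 300),
    ("S1", "lamp",  "2019-01-22", 125),
    ("S2", "chair", "2019-01-22", 150),
]


def total_by(rows, index):
    """Sum the amount (last column) grouped by the column at `index`."""
    totals = {}
    for row in rows:
        totals[row[index]] = totals.get(row[index], 0) + row[3]
    return totals


by_store = total_by(sales, 0)     # {'S1': 425, 'S2': 150}
by_product = total_by(sales, 1)   # {'chair': 450, 'lamp': 125}
by_day = total_by(sales, 2)       # {'2019-01-21': 300, '2019-01-22': 275}

# Additive: whichever dimension we sum along, the grand total is the same.
print(sum(by_store.values()), sum(by_product.values()), sum(by_day.values()))
```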

2. Semi-additive measures

• Measures that can be added across some, but not all, dimensions.
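Inventory balances are the classic semi-additive case. In this sketch with hypothetical end-of-day stock levels, adding balances across stores for one day is meaningful, but adding one store's balances across days is not; a closing balance or an average over the period is used instead.

```python
# Hypothetical end-of-day stock levels: (store, day, units_on_hand)
inventory = [
    ("S1", "2019-01-21", 40),
    ("S2", "2019-01-21", 10),
    ("S1", "2019-01-22", 35),
    ("S2", "2019-01-22", 12),
]

# Additive across stores (for a single day): total stock on hand.
on_hand_jan22 = sum(units for store, day, units in inventory if day == "2019-01-22")  # 47

# Not additive across time: 40 + 35 = 75 units never existed at once.
# Use the closing balance or the average balance over the period instead.
s1_days = sorted((day, units) for store, day, units in inventory if store == "S1")
s1_closing = s1_days[-1][1]                                       # 35
s1_average = sum(units for _, units in s1_days) / len(s1_days)    # 37.5
print(on_hand_jan22, s1_closing, s1_average)
```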

3. Non-additive measures
• Measures that cannot be added across any dimension.
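Ratios such as profit percentage are non-additive: they must be recomputed from their additive components rather than summed. A minimal sketch with hypothetical store figures:

```python
# Hypothetical per-store revenue and profit.
stores = {
    "S1": {"revenue": 1000.0, "profit": 100.0},   # 10 % margin
    "S2": {"revenue": 250.0, "profit": 75.0},     # 30 % margin
}


def profit_pct(revenue, profit):
    """Profit as a percentage of revenue."""
    return 100.0 * profit / revenue


# Wrong: adding the percentages themselves.
summed_pct = sum(profit_pct(**s) for s in stores.values())          # 40.0

# Right: add the additive components first, then recompute the ratio.
total_pct = profit_pct(sum(s["revenue"] for s in stores.values()),
                       sum(s["profit"] for s in stores.values()))   # 14.0
print(summed_pct, total_pct)
```

For this reason, fact tables usually store the additive components (revenue and profit) and derive the percentage at query time.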

Discussion
