Each of these units must be treated separately and in combination, and since there may be multiple components in
each (multiple feeds to ETL, multiple databases or data repositories that constitute the warehouse, multiple data
marts), each of these subsystems must be individually validated.
Subject oriented: A data warehouse is organized around subject areas (sales, product, location, etc.).
Integrated: Data collected from multiple sources is integrated into a single, user-readable format.
Non volatile: Historical data is maintained.
Time variant: Data can be displayed by week, month, quarter, and year.
Subject oriented: A DWH is a subject-oriented database that supports the business needs of individual
departments in the enterprise.
Example: SALES, HR, ACCOUNTS, CLAIMS, etc.
Integrated:
Non Volatile:
Time variant:
According to Ralph Kimball, a DWH is a relational DB that is specifically designed for analyzing the business,
not for business transaction processing. A DWH is designed to support the decision-making process.
Since the DB contains the historical data required for business analysis, it is called a historical DB.
Since the DB is designed to support the decision-making process, it is called a decision support system (DSS).
A subset of a data warehouse is called a data mart. It supports the business needs of an individual department
within the enterprise.
Data warehouse                     Data mart
Corporate/Enterprise-wide          Departmental
Union of all data marts            A single business process
Data received from staging area    Star-join (facts & dimensions)
Data
Structure to suit the
Advantages:
Faster and easier implementation of manageable pieces
Favorable return on investment and proof of concept
Less risk of failure
Disadvantage: Each data mart has its own narrow view of data.
Conformed dimension
A conformed dimension is a dimension that has exactly the same meaning and content when being referred to from
different fact tables. As you can see in the figure below, the TIME and CUST dimensions are called conformed
dimensions because they are shared across multiple fact tables with the same meaning.
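The idea of one dimension shared by several fact tables can be illustrated with a minimal Python sketch. The table names (`dim_cust`, `fact_sales`, `fact_returns`), the customer names, and the amounts are all hypothetical and are not taken from the figure; the point is only that both fact tables resolve `cust_key` against the same dimension, so their totals can be lined up side by side.

```python
# Hypothetical conformed customer dimension, shared by both fact tables.
dim_cust = {1: "Acme", 2: "Globex"}

# Two hypothetical fact tables that reference the same cust_key values.
fact_sales = [
    {"cust_key": 1, "amount": 100.0},
    {"cust_key": 2, "amount": 40.0},
    {"cust_key": 1, "amount": 60.0},
]
fact_returns = [
    {"cust_key": 1, "amount": 20.0},
]


def total_by_customer(fact_rows):
    """Sum the amount per customer name, looked up in the shared dimension."""
    totals = {}
    for row in fact_rows:
        name = dim_cust[row["cust_key"]]
        totals[name] = totals.get(name, 0.0) + row["amount"]
    return totals


sales = total_by_customer(fact_sales)
returns = total_by_customer(fact_returns)

# Because the dimension is conformed, the two results share the same labels
# and can be combined into one report: name -> (sales, returns).
report = {
    name: (sales.get(name, 0.0), returns.get(name, 0.0))
    for name in dim_cust.values()
}
```

If each fact table used its own customer dimension with different keys or names, this side-by-side report would need an extra mapping step.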
Junk Dimension:
Examine your source legacy systems and review the individual fields in source data structures
for customer, product, order, sales territories, promotional campaigns, and so on.
Most of these fields wind up in the dimension tables. You will notice that some fields like
miscellaneous flags and textual fields are left in the source data structures. These include
yes/no flags, textual codes, and free form texts.
Some of these flags and textual data may be too unclear to be of real value. These
may be leftovers from past conversions from manual records created long ago. However,
many of the flags and texts could be of value once in a while in queries. These may not
be included as significant fields in the major dimensions. At the same time, these flags
and texts cannot be discarded either. So, what are your options? Here are the main
choices:
Exclude and discard all flags and texts. Obviously, this is not a good option for the
simple reason that you are likely to throw away some useful information.
Place the flags and texts unchanged in the fact table. This option is likely to swell up
the fact table to no specific advantage.
Make each flag and text a separate dimension table on its own. Using this option,
the number of dimension tables will greatly increase.
Keep only those flags and texts that are meaningful; group all the useful flags into a
single “junk” dimension. “Junk” dimension attributes are useful for constraining
queries based on flag/text values.
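The last option can be sketched in a few lines of Python. This is only an illustration under assumptions: the flag names (`gift_wrap`, `rush_order`, `payment_type`), the sample rows, and the helper `build_junk_dimension` are hypothetical. Each distinct combination of flag values becomes one row of the junk dimension, and the fact rows carry a single surrogate key instead of the individual flags.

```python
# Hypothetical source rows: one measure plus three low-cardinality flags.
source_rows = [
    {"order_id": 1, "amount": 120.0, "gift_wrap": "Y", "rush_order": "N", "payment_type": "CARD"},
    {"order_id": 2, "amount": 75.5, "gift_wrap": "N", "rush_order": "N", "payment_type": "CASH"},
    {"order_id": 3, "amount": 40.0, "gift_wrap": "Y", "rush_order": "N", "payment_type": "CARD"},
    {"order_id": 4, "amount": 99.0, "gift_wrap": "N", "rush_order": "Y", "payment_type": "CARD"},
]

FLAG_COLUMNS = ("gift_wrap", "rush_order", "payment_type")


def build_junk_dimension(rows, flag_columns):
    """Return (junk_dim, fact_rows).

    junk_dim maps each distinct flag combination to a surrogate key;
    fact_rows carry that key in place of the individual flags.
    """
    junk_dim = {}
    fact_rows = []
    for row in rows:
        combo = tuple(row[col] for col in flag_columns)
        # Assign the next surrogate key the first time a combination appears.
        junk_key = junk_dim.setdefault(combo, len(junk_dim) + 1)
        fact = {k: v for k, v in row.items() if k not in flag_columns}
        fact["junk_key"] = junk_key
        fact_rows.append(fact)
    return junk_dim, fact_rows


junk_dim, fact_rows = build_junk_dimension(source_rows, FLAG_COLUMNS)

# Constrain a query on a flag value through the junk dimension:
# total amount for orders where gift_wrap is "Y".
gift_keys = {key for combo, key in junk_dim.items() if combo[0] == "Y"}
gift_total = sum(f["amount"] for f in fact_rows if f["junk_key"] in gift_keys)
```

Four source rows collapse to three junk dimension rows here, because two orders share the same combination of flags.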
Degenerate dimension:
QUALITY THOUGHT Website: www. Qualtythoughttechnologies.com
Phone: 9963486280, 040- 40025423 Email: qthought99@gmail.com
QUALITY THOUGHT
Look closely at the example of the fact table. You find the attributes of order_number and order_line. These are not
measures or metrics or facts. Then why are these attributes in the fact table? When you pick up attributes for the
dimension tables and the fact tables from operational systems, you will be left with some data elements in the
operational systems that are neither facts nor strictly dimension attributes. Examples of such attributes are
reference numbers like order numbers, invoice numbers, order line numbers, and so on. These attributes are
useful in some types of analyses. For example, you may be looking for average number of products per order. Then
you will have to relate the products to the order number to calculate the average. Attributes such as order_number
and order_line in the example are called degenerate dimensions and these are kept as attributes of the
fact table.
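The "average number of products per order" analysis mentioned above can be sketched as follows. The fact rows and the helper `average_products_per_order` are hypothetical; only the column names `order_number` and `order_line` come from the text. The degenerate dimension `order_number` is used purely as a grouping key.

```python
from collections import defaultdict

# Hypothetical fact rows: order_number and order_line are degenerate
# dimensions stored directly in the fact table.
fact_rows = [
    {"order_number": "A100", "order_line": 1, "product_key": 11, "quantity": 2},
    {"order_number": "A100", "order_line": 2, "product_key": 12, "quantity": 1},
    {"order_number": "A100", "order_line": 3, "product_key": 15, "quantity": 4},
    {"order_number": "A101", "order_line": 1, "product_key": 11, "quantity": 1},
]


def average_products_per_order(rows):
    """Group fact rows by order_number and average the distinct product counts."""
    products_by_order = defaultdict(set)
    for row in rows:
        products_by_order[row["order_number"]].add(row["product_key"])
    return sum(len(p) for p in products_by_order.values()) / len(products_by_order)


avg_products = average_products_per_order(fact_rows)
```

Without `order_number` in the fact table there would be no way to tell which product rows belong to the same order.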
Types of Facts
Additive fact: Additive facts are facts that can be summed up through all of the dimensions in the fact table.
Example fact table columns: Date, Store, Product, Sales_Amount
Sales_Amount is an additive fact, because you can sum up this fact along any of the three dimensions present in
the fact table: date, store, and product.
Semi additive fact: Semi-additive facts are facts that can be summed up for some of the dimensions in the fact
table, but not the others.
Non additive fact: Non-additive facts are facts that cannot be summed up for any of the dimensions present in
the fact table.
Current_Balance is a semi-additive fact: it makes sense to add it up across all accounts (what is the total current
balance for all accounts in the bank?), but it does not make sense to add it up through time.
Profit_Margin is a non-additive fact, because it does not make sense to add it up at either the account level or
the day level.
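The semi-additive behavior of Current_Balance can be shown with a short Python sketch. The rows, dates, and helper names are hypothetical. Summing across accounts for one date gives a meaningful total; across dates, the sketch takes the latest balance per account instead of summing, which is one common way to handle such a fact.

```python
from collections import defaultdict

# Hypothetical daily snapshot rows for two accounts.
balances = [
    {"date": "2024-01-01", "account": "A", "current_balance": 100.0},
    {"date": "2024-01-01", "account": "B", "current_balance": 250.0},
    {"date": "2024-01-02", "account": "A", "current_balance": 120.0},
    {"date": "2024-01-02", "account": "B", "current_balance": 250.0},
]


def total_balance_by_date(rows):
    """Summing across the account dimension is meaningful for each date."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["date"]] += row["current_balance"]
    return dict(totals)


def closing_balance_by_account(rows):
    """Across time, keep the latest balance per account rather than summing."""
    latest = {}
    # ISO-formatted date strings sort chronologically.
    for row in sorted(rows, key=lambda r: r["date"]):
        latest[row["account"]] = row["current_balance"]
    return latest


by_date = total_balance_by_date(balances)
closing = closing_balance_by_account(balances)
```

Adding account A's two daily balances would give 220.0, a figure that corresponds to no real amount of money in the bank.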
Types of fact tables
Cumulative fact table: This type of fact table describes what has happened over a period of time.
It contains additive facts.
Snapshot fact table: This type of fact table describes the state of things at a particular point in time. It contains
non-additive and semi-additive facts.
Factless fact table: A factless fact table is a fact table that does not have any measures.
Let us say we are building a fact table to track the attendance of students.
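A minimal sketch of such an attendance table, with hypothetical key names and values: each row records only that a student attended a class on a date. There is no measure column, so analysis is done by counting rows.

```python
from collections import Counter

# Hypothetical factless fact rows: dimension keys only, no measures.
attendance_fact = [
    {"date_key": 20240101, "student_key": 1, "class_key": 10},
    {"date_key": 20240101, "student_key": 2, "class_key": 10},
    {"date_key": 20240101, "student_key": 1, "class_key": 20},
    {"date_key": 20240102, "student_key": 1, "class_key": 10},
]

# The existence of a row is the event itself, so questions are answered
# by counting rows per dimension value.
attendance_per_class = Counter(row["class_key"] for row in attendance_fact)
attendance_per_day = Counter(row["date_key"] for row in attendance_fact)
```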
An OLTP system is application oriented (e.g., purchase order processing is a function of an application),
whereas a DWH is subject oriented (subjects such as customer, product, item, and time).
Star schema:
A star schema is one in which a central fact table is surrounded by denormalized dimension tables. A star
schema can be simple or complex. A simple star schema consists of one fact table, whereas a complex star
schema has more than one fact table.
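A simple star schema can be sketched with Python's built-in sqlite3 module. The table names, column names, and sample rows are assumptions made for illustration. One fact table references two denormalized dimension tables, and a query reaches any descriptive attribute with a single join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Denormalized dimensions: the category lives inside the product dimension.
CREATE TABLE dim_store (store_key INTEGER PRIMARY KEY, store_name TEXT, zip_code TEXT);
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product_name TEXT, product_category TEXT);
-- Central fact table: foreign keys to the dimensions plus one measure.
CREATE TABLE fact_sales (
    store_key INTEGER REFERENCES dim_store(store_key),
    product_key INTEGER REFERENCES dim_product(product_key),
    sales_amount REAL
);
INSERT INTO dim_store VALUES (1, 'Downtown', '10001'), (2, 'Airport', '10002');
INSERT INTO dim_product VALUES (1, 'Espresso', 'Beverage'), (2, 'Bagel', 'Bakery');
INSERT INTO fact_sales VALUES (1, 1, 3.0), (1, 2, 2.5), (2, 1, 3.5), (2, 1, 3.0);
""")

# Sales by product category: one join from the fact table to one dimension.
rows = conn.execute("""
SELECT p.product_category, SUM(f.sales_amount)
FROM fact_sales AS f
JOIN dim_product AS p ON p.product_key = f.product_key
GROUP BY p.product_category
ORDER BY p.product_category
""").fetchall()
```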
Snowflake schema:
Advantages:
Normalized structures are easier to update and maintain
Disadvantages:
Degraded query performance because of additional joins
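The "additional joins" cost can be seen in a small sqlite3 sketch. The schema and data are hypothetical: the product dimension is normalized so that the category sits in its own table, and a query by category must therefore join through two tables instead of one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Normalized (snowflaked) dimension: category is split out of the product table.
CREATE TABLE dim_category (category_key INTEGER PRIMARY KEY, category_name TEXT);
CREATE TABLE dim_product (
    product_key INTEGER PRIMARY KEY,
    product_name TEXT,
    category_key INTEGER REFERENCES dim_category(category_key)
);
CREATE TABLE fact_sales (
    product_key INTEGER REFERENCES dim_product(product_key),
    sales_amount REAL
);
INSERT INTO dim_category VALUES (1, 'Beverage'), (2, 'Bakery');
INSERT INTO dim_product VALUES (1, 'Espresso', 1), (2, 'Latte', 1), (3, 'Bagel', 2);
INSERT INTO fact_sales VALUES (1, 3.0), (2, 4.0), (3, 2.5);
""")

# Sales by category now needs two joins: fact -> product -> category.
rows = conn.execute("""
SELECT c.category_name, SUM(f.sales_amount)
FROM fact_sales AS f
JOIN dim_product AS p ON p.product_key = f.product_key
JOIN dim_category AS c ON c.category_key = p.category_key
GROUP BY c.category_name
ORDER BY c.category_name
""").fetchall()
```

In exchange, renaming a category touches one row in `dim_category` rather than every product row that carries the category name.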
Dimension Table:
Dimension tables contain textual information that represents the attributes of the business.
Dimension tables are joined to a fact table through foreign key references.
Example (retail): store name, zip code, product name, product category, day of week
Fact Table: