
QUALITY THOUGHT

Data Warehouse Concepts


Data Warehouse Testing:
• A data warehouse is a repository of transactional data that has been extracted from original sources and
transformed so that query, analysis and reporting on trends within historic data are possible and efficient.
• The analyses provided by data warehouses may involve strategic planning, decision support, and
monitoring the outcomes of a chosen strategy.
• Typically, data that is loaded into a data warehouse is derived from diverse sources of operational data,
which may consist of data from databases, feeds, application files or flat files.
• The data must be extracted from these diverse sources, transformed to a common format, and loaded into
the data warehouse.
• It is further aggregated into a data mart for efficient reporting.
• The ETL (extract, transform, and load) process is a critical step in any data warehouse implementation, and
continues to be an area of major significance whenever the ETL code is updated.
• Once the data warehouse and data marts are populated, business intelligence applications facilitate
querying, analysis and reporting.
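
The extract, transform, and load steps described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production pipeline: the two source record sets, the field names (`cust`, `amt`, `day`), the `sales` table, and the helper functions are all hypothetical, and an in-memory SQLite database stands in for the warehouse.

```python
import sqlite3

# Hypothetical source 1: rows from an operational database.
db_rows = [{"cust": "Asha", "amt": "120.50", "day": "2024-01-05"}]
# Hypothetical source 2: flat-file lines laid out as dd/mm/yyyy|NAME|cents.
flat_file_lines = ["05/01/2024|RAVI|8000"]


def extract():
    """Extract: pull raw records from each source without changing them."""
    return db_rows, flat_file_lines


def transform(db_records, file_lines):
    """Transform: convert both layouts to (customer, sale_date, amount)."""
    common = []
    for r in db_records:
        common.append((r["cust"].title(), r["day"], float(r["amt"])))
    for line in file_lines:
        date_text, name, cents = line.split("|")
        dd, mm, yyyy = date_text.split("/")
        common.append((name.title(), f"{yyyy}-{mm}-{dd}", int(cents) / 100))
    return common


def load(conn, rows):
    """Load: insert the common-format rows into the warehouse table."""
    conn.execute("CREATE TABLE sales (customer TEXT, sale_date TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()


warehouse = sqlite3.connect(":memory:")
load(warehouse, transform(*extract()))
loaded = warehouse.execute(
    "SELECT customer, sale_date, amount FROM sales ORDER BY customer"
).fetchall()
```

After the run, `loaded` holds both records in one common format, regardless of which source they came from.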
An effective data warehouse testing strategy focuses on the main structures within the data warehouse
architecture:

1. The ETL layer
2. The data warehouse itself
3. Associated data marts
4. The front-end business intelligence/reporting applications

Each of these units must be treated separately and in combination, and since there may be multiple components in
each (multiple feeds to ETL, multiple databases or data repositories that constitute the warehouse, multiple data
marts), each of these subsystems must be individually validated.
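
One common way to validate an individual feed is to reconcile the source against the loaded target: row counts should match, a control total of a numeric column should match, and no source key should be missing. The sketch below is only an illustration; the `reconcile` helper and the `(key, amount)` row layout are assumptions, and real projects usually run equivalent checks as queries against both systems.

```python
def reconcile(source_rows, target_rows, tolerance=0.005):
    """Compare a source feed with the loaded target; return a list of issues."""
    issues = []
    if len(source_rows) != len(target_rows):
        issues.append(
            f"row count mismatch: source={len(source_rows)} target={len(target_rows)}"
        )
    source_total = sum(amount for _, amount in source_rows)
    target_total = sum(amount for _, amount in target_rows)
    if abs(source_total - target_total) > tolerance:
        issues.append(
            f"control total mismatch: source={source_total} target={target_total}"
        )
    missing = {key for key, _ in source_rows} - {key for key, _ in target_rows}
    if missing:
        issues.append(f"keys missing in target: {sorted(missing)}")
    return issues


source_rows = [(1, 100.0), (2, 250.0), (3, 75.5)]
good_target = [(1, 100.0), (2, 250.0), (3, 75.5)]
bad_target = [(1, 100.0), (2, 250.0)]  # row 3 was dropped during the load

clean_result = reconcile(source_rows, good_target)
broken_result = reconcile(source_rows, bad_target)
```

An empty list means the feed reconciles; each string in a non-empty list names one failed check.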

Generic Architecture for DWH:

QUALITY THOUGHT Website: www. Qualtythoughttechnologies.com


Phone: 9963486280, 040- 40025423 Email: qthought99@gmail.com

What is a Data Warehouse? What are the Characteristics of a DWH?


According to Inmon, the author of several books on data warehousing, a data warehouse is subject-oriented,
integrated, non-volatile, and time-variant.

Subject-oriented : The data warehouse is organized around subject areas (sales, product, location, etc.).
Integrated : Data collected from multiple sources is integrated into a single, consistent, user-readable format.
Non-volatile : Historical data is maintained; loaded data is not overwritten by day-to-day operations.
Time-variant : Data is kept with a time element so that it can be viewed by week, month, quarter, and year.

Subject oriented: A DWH is a subject-oriented database that supports the business needs of individual
departments in the enterprise.
Example: SALES, HR, ACCOUNTS, CLAIMS, etc.


Integrated:

Non Volatile:

Time variant:


According to Ralph Kimball, a DWH is a relational database that is specifically designed for analyzing the business,
not for business transaction processing. A DWH is designed to support the decision-making process.
Because the database contains the historical data required for business analysis, it is called a historical database.
Because the database is designed to support the decision-making process, it is called a decision support system (DSS).

What is Data mart?

A subset of a data warehouse is called a data mart. It supports the business needs of an individual department
within the enterprise.
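
As a rough illustration of "a subset of the warehouse", the sketch below derives a sales data mart from a warehouse table by keeping only one department's rows and pre-aggregating them. The table names (`dw_transactions`, `sales_mart`), the columns, and the sample rows are assumptions made for the example; SQLite is used only because it ships with Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical enterprise warehouse table covering several departments.
conn.execute(
    "CREATE TABLE dw_transactions (department TEXT, region TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO dw_transactions VALUES (?, ?, ?)",
    [
        ("SALES", "North", 100.0),
        ("SALES", "North", 50.0),
        ("SALES", "South", 70.0),
        ("HR", "North", 20.0),
    ],
)

# The sales data mart holds only the sales subject area, pre-aggregated
# by region so that departmental reports read far fewer rows.
conn.execute(
    """
    CREATE TABLE sales_mart AS
    SELECT region, SUM(amount) AS total_amount
    FROM dw_transactions
    WHERE department = 'SALES'
    GROUP BY region
    """
)
mart_rows = conn.execute(
    "SELECT region, total_amount FROM sales_mart ORDER BY region"
).fetchall()
```

The HR row never reaches the mart, which is the point: the sales department queries a smaller structure shaped for its own reports.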

Reasons for creating a data mart:

Easy access to frequently needed data
Improves end-user response time
Lower cost than implementing a full data warehouse

Data warehouse versus data mart:

DATA WAREHOUSE                          DATA MART

Corporate/Enterprise-wide               Departmental
Union of all data marts                 A single business process
Data received from staging area         Star-join (facts & dimensions)
Structure for corporate view of data    Structure to suit the departmental view of data
What are the types of data marts?

There are two types of data marts:

1. Dependent data marts
2. Independent data marts

Dependent Data marts: (Inmon approach)


Here data marts are developed from the DWH.
The data flow begins with data extraction from the operational data sources. This data is loaded into the staging
area, validated and consolidated to ensure a level of accuracy, and then transferred to the enterprise data
warehouse. Data marts are then created from the enterprise data warehouse.
Because data mart development depends on the enterprise data warehouse, this approach is also called the
top-down approach.
Advantage:
A truly corporate effort, an enterprise view of data.
Disadvantage:
Takes longer to build.
Needs high level of cross-functional skills.


Independent Data marts (Kimball approach)

Here the DWH is populated from the data marts.

First, department-specific databases known as data marts are designed; the data marts are then integrated into an
enterprise DWH. Because data mart development is independent of the DWH, this is called the bottom-up approach.

Advantage:
Faster and easier implementation of manageable pieces.
Favorable return on investment and proof of concept.
Less risk of failure.

Disadvantage:
Each data mart has its own narrow view of data.


What are the types of dimensions?


Conformed dimension
Junk dimension
Degenerate dimension
Slowly changing dimension

Conformed dimension
A conformed dimension is a dimension that has exactly the same meaning and content when referred to from
different fact tables. As the figure below shows, the TIME and CUST dimensions are conformed dimensions
because they are shared across multiple fact tables with the same meaning.
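
A dimension shared by several fact tables lets results from those fact tables be lined up on the same attribute. The sketch below is an assumption-laden illustration: the `time_dim`, `sales_fact`, and `returns_fact` tables and their sample rows are hypothetical, and SQLite stands in for the warehouse database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- One shared time dimension.
    CREATE TABLE time_dim (time_key INTEGER PRIMARY KEY, month TEXT);
    -- Two fact tables that both reference the same time_dim keys.
    CREATE TABLE sales_fact (time_key INTEGER, sales_amount REAL);
    CREATE TABLE returns_fact (time_key INTEGER, return_amount REAL);

    INSERT INTO time_dim VALUES (1, '2024-01'), (2, '2024-02');
    INSERT INTO sales_fact VALUES (1, 500.0), (1, 300.0), (2, 400.0);
    INSERT INTO returns_fact VALUES (1, 50.0), (2, 20.0), (2, 30.0);
    """
)

# Aggregate each fact table separately, then line the results up by month.
# Aggregating before joining avoids double counting rows.
report = conn.execute(
    """
    SELECT t.month, s.total_sales, r.total_returns
    FROM time_dim AS t
    JOIN (SELECT time_key, SUM(sales_amount) AS total_sales
          FROM sales_fact GROUP BY time_key) AS s ON s.time_key = t.time_key
    JOIN (SELECT time_key, SUM(return_amount) AS total_returns
          FROM returns_fact GROUP BY time_key) AS r ON r.time_key = t.time_key
    ORDER BY t.month
    """
).fetchall()
```

Because both fact tables use the same keys with the same meaning, sales and returns for a month can be reported side by side without any translation step.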

Junk Dimension:
Examine your source legacy systems and review the individual fields in source data structures
for customer, product, order, sales territories, promotional campaigns, and so on.
Most of these fields wind up in the dimension tables. You will notice that some fields like
miscellaneous flags and textual fields are left in the source data structures. These include
yes/no flags, textual codes, and free form texts.
Some of these flags and textual data may be too unclear to be of real value. These
may be leftovers from past conversions from manual records created long ago. However,
many of the flags and texts could be of value once in a while in queries. These may not
be included as significant fields in the major dimensions. At the same time, these flags
and texts cannot be discarded either. So, what are your options? Here are the main
choices:

• Exclude and discard all flags and texts. Obviously, this is not a good option for the
simple reason that you are likely to throw away some useful information.
• Place the flags and texts unchanged in the fact table. This option is likely to swell up
the fact table to no specific advantage.
• Make each flag and text a separate dimension table on its own. Using this option,
the number of dimension tables will greatly increase.
• Keep only those flags and texts that are meaningful; group all the useful flags into a
single “junk” dimension. “Junk” dimension attributes are useful for constraining
queries based on flag/text values.
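
The last option can be sketched as follows. This is one possible way to build a junk dimension, not the only one: the flag names (`gift_wrap`, `payment_type`), their values, the `junk_key` helper, and the sample orders are all hypothetical.

```python
from itertools import product

# Hypothetical low-cardinality flags left over in the source order records.
flag_values = {
    "gift_wrap": ["Y", "N"],
    "payment_type": ["CARD", "CASH"],
}

# Build the junk dimension: one row per combination of flag values,
# each identified by a surrogate key.
flag_names = list(flag_values)
junk_dim = {
    combo: key
    for key, combo in enumerate(product(*flag_values.values()), start=1)
}


def junk_key(order):
    """Look up the junk-dimension key for one source order record."""
    return junk_dim[tuple(order[name] for name in flag_names)]


source_orders = [
    {"order_number": 101, "amount": 40.0, "gift_wrap": "Y", "payment_type": "CASH"},
    {"order_number": 102, "amount": 15.0, "gift_wrap": "N", "payment_type": "CARD"},
]

# The fact rows carry a single junk key instead of two separate flag columns.
fact_rows = [(o["order_number"], junk_key(o), o["amount"]) for o in source_orders]
```

Two flags with two values each give a four-row junk dimension, and queries can still be constrained on either flag by joining through the single key.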

Degenerate dimension:

Look closely at the example of the fact table. You find the attributes of order_number and order_line. These are not
measures or metrics or facts. Then why are these attributes in the fact table? When you pick up attributes for the
dimension tables and the fact tables from operational systems, you will be left with some data elements in the
operational systems that are neither facts nor strictly dimension attributes. Examples of such attributes are
reference numbers like order numbers, invoice numbers, order line numbers, and so on. These attributes are
useful in some types of analyses. For example, you may be looking for average number of products per order. Then
you will have to relate the products to the order number to calculate the average. Attributes such as order_number
and order_line in the example are called degenerate dimensions and these are kept as attributes of the
fact table.
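
The "average number of products per order" analysis mentioned above can be sketched with plain Python. The fact rows and the `product_key` and `quantity` columns are hypothetical sample data; only `order_number` and `order_line` come from the text.

```python
from collections import defaultdict

# Hypothetical fact rows: (order_number, order_line, product_key, quantity).
# order_number and order_line are degenerate dimensions: they live in the
# fact table and have no dimension table of their own.
fact_rows = [
    (1001, 1, "P1", 2),
    (1001, 2, "P2", 1),
    (1001, 3, "P3", 5),
    (1002, 1, "P1", 1),
]

# Relate the products to the order number, as the text describes.
products_per_order = defaultdict(set)
for order_number, _order_line, product_key, _quantity in fact_rows:
    products_per_order[order_number].add(product_key)

average_products_per_order = sum(
    len(products) for products in products_per_order.values()
) / len(products_per_order)
```

Without `order_number` in the fact table there would be no way to group the lines that belong to the same order.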

Types of Facts

Additive fact: Additive facts are facts that can be summed up through all of the dimensions in the fact table.
For example, consider a fact table with the following columns:
Date
Store
Product
Sales_Amount
Sales_Amount is an additive fact, because you can sum up this fact along any of the three dimensions present in
the fact table: date, store, and product.
Semi additive fact: Semi-additive facts are facts that can be summed up for some of the dimensions in the fact
table, but not the others.
Non additive fact: Non-additive facts are facts that cannot be summed up for any of the dimensions present in
the fact table.
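
To make the additive case concrete, the sketch below sums Sales_Amount along each dimension of a small fact table. The rows, the `DIMENSIONS` mapping, and the `total_by` helper are hypothetical and exist only for this illustration.

```python
from collections import defaultdict

# Hypothetical rows of the fact table: (date, store, product, sales_amount).
sales_fact = [
    ("2024-01-01", "S1", "Pen", 10.0),
    ("2024-01-01", "S2", "Pen", 20.0),
    ("2024-01-02", "S1", "Ink", 5.0),
    ("2024-01-02", "S2", "Pen", 15.0),
]
DIMENSIONS = {"date": 0, "store": 1, "product": 2}


def total_by(dimension):
    """Sum Sales_Amount grouped by one dimension of the fact table."""
    totals = defaultdict(float)
    for row in sales_fact:
        totals[row[DIMENSIONS[dimension]]] += row[3]
    return dict(totals)


by_date = total_by("date")
by_store = total_by("store")
by_product = total_by("product")
```

Every grouping is meaningful and all of them add up to the same grand total, which is what makes the fact additive.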

Example of Semi-Additive and Non-Additive Facts

Consider a fact table with the following columns:
Date
Account
Current_Balance
Profit_Margin

Current_Balance is a semi-additive fact: it makes sense to add balances up across all accounts (what is the total
current balance for all accounts in the bank?), but it does not make sense to add them up through time (adding
the daily balances of one account over a month does not give a useful figure).

Profit_Margin is a non-additive fact, for it does not make sense to add them up for the account level or the day
level.
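
The difference between these two kinds of fact shows up as soon as the numbers are added. In the sketch below, the balances and the profit and revenue figures are invented sample data, and treating a margin as profit divided by revenue is an assumption made for the example.

```python
# Hypothetical daily snapshot rows: (date, account, current_balance).
balances = [
    ("2024-01-01", "A", 100.0),
    ("2024-01-01", "B", 200.0),
    ("2024-01-02", "A", 100.0),
    ("2024-01-02", "B", 250.0),
]

# Meaningful: add balances across accounts for a single date.
bank_total_jan_2 = sum(b for d, _, b in balances if d == "2024-01-02")

# Not meaningful: adding account A's balance across dates gives 200.0,
# although the account never held more than 100.0.
misleading_sum_a = sum(b for _, acct, b in balances if acct == "A")
# Across time, an average (or the latest value) is used instead.
average_balance_a = misleading_sum_a / 2

# A margin is a ratio, so it is recomputed from additive components
# (profit and revenue) rather than summed.
rows = [("A", 10.0, 100.0), ("B", 90.0, 300.0)]  # (account, profit, revenue)
sum_of_margins = sum(p / r for _, p, r in rows)
overall_margin = sum(p for _, p, _ in rows) / sum(r for _, _, r in rows)
```

Summing the two margins gives about 0.4, which describes nothing real, while recomputing from the totals gives the actual overall margin of 0.25.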
Types of fact tables
Cumulative fact table: This type of fact table describes what has happened over a period of time.
It contains additive facts.
Snapshot fact table: This type of fact table describes the state of things at a particular point in time. It contains
non-additive and semi-additive facts.
Factless fact table: A factless fact table is a fact table that does not have any measures.
For example, a fact table built to track the attendance of students needs only dimension keys (such as date,
student, and class); each row records that an attendance event took place, and attendance is analyzed by
counting rows.
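
A student attendance table of this kind can be sketched as a list of key tuples with no measure column. The key names and sample rows below are hypothetical.

```python
from collections import Counter

# Hypothetical factless fact table: each row is (date_key, student_key,
# class_key) and records only that the student attended. No measure column.
attendance_fact = [
    ("2024-01-08", "stu1", "math"),
    ("2024-01-08", "stu2", "math"),
    ("2024-01-09", "stu1", "math"),
    ("2024-01-09", "stu1", "physics"),
]

# Analysis is done by counting rows rather than summing a measure.
attendance_per_class = Counter(class_key for _, _, class_key in attendance_fact)
days_attended_by_stu1 = len(
    {date_key for date_key, student, _ in attendance_fact if student == "stu1"}
)
```

The existence of a row is itself the fact, so every question about attendance becomes a count over some combination of the dimension keys.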

Difference between OLTP and DWH?

OLTP/OPDB                                 OLAP/DSS (Decision Support System)

Application-oriented data                 Subject-oriented data
Used to run the business                  Used to analyze the business
Detailed data                             Summarized data
Current data                              Snapshot/historical data
Isolated data                             Integrated data
Clerical user                             Knowledge user
Few records accessed at a time (tens)     Large volumes of records accessed at a time (millions)
Read/update access                        Mostly read (batch update)
No data redundancy                        Redundant/repeated data
Database size 100 MB - 100 GB             Database size 100 GB - few terabytes


An OLTP system is application-oriented (for example, purchase order processing is a function of an application).

A DWH, by contrast, is subject-oriented (subjects such as customer, product, item, and time).
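
The two access patterns can be contrasted on a small table. This is only a sketch: the `orders` table and its rows are hypothetical, and a single SQLite table is used for both styles to keep the example short, whereas in practice the OLTP database and the warehouse are separate systems.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer TEXT, "
    "order_year INTEGER, amount REAL, status TEXT)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Asha", 2023, 100.0, "SHIPPED"),
        (2, "Ravi", 2023, 40.0, "SHIPPED"),
        (3, "Asha", 2024, 60.0, "NEW"),
    ],
)

# OLTP-style access: read and update one record identified by its key.
conn.execute("UPDATE orders SET status = 'SHIPPED' WHERE order_id = 3")
status_of_order_3 = conn.execute(
    "SELECT status FROM orders WHERE order_id = 3"
).fetchone()[0]

# OLAP-style access: a read-only scan that summarizes history by subject.
yearly_totals = conn.execute(
    "SELECT order_year, SUM(amount) FROM orders "
    "GROUP BY order_year ORDER BY order_year"
).fetchall()
```

The first statement touches one row and changes it; the second reads every row and changes nothing, which mirrors the "few records, read/update" versus "large volumes, mostly read" rows of the comparison.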

Star schema:

A star schema is one in which a central fact table is surrounded by denormalized dimension tables. A star
schema can be simple or complex: a simple star schema consists of one fact table, whereas a complex star
schema has more than one fact table.
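
A simple star schema with one fact table and two dimension tables might look like the following. The table and column names (`product_dim`, `store_dim`, `sales_fact`) and the sample rows are assumptions for illustration, with SQLite standing in for the warehouse database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Denormalized dimensions: category is stored directly on the product row.
    CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY,
                              product_name TEXT, category TEXT);
    CREATE TABLE store_dim (store_key INTEGER PRIMARY KEY,
                            store_name TEXT, city TEXT);
    -- Central fact table: foreign keys to each dimension plus a measure.
    CREATE TABLE sales_fact (
        product_key INTEGER REFERENCES product_dim(product_key),
        store_key INTEGER REFERENCES store_dim(store_key),
        sales_amount REAL);

    INSERT INTO product_dim VALUES (1, 'Pen', 'Stationery'),
                                   (2, 'Ink', 'Stationery'),
                                   (3, 'Mug', 'Kitchen');
    INSERT INTO store_dim VALUES (1, 'Store A', 'Hyderabad'),
                                 (2, 'Store B', 'Chennai');
    INSERT INTO sales_fact VALUES (1, 1, 10.0), (2, 1, 5.0),
                                  (3, 2, 8.0), (1, 2, 12.0);
    """
)

# A typical star query: one join per dimension, then group and aggregate.
sales_by_category_city = conn.execute(
    """
    SELECT p.category, s.city, SUM(f.sales_amount)
    FROM sales_fact AS f
    JOIN product_dim AS p ON p.product_key = f.product_key
    JOIN store_dim AS s ON s.store_key = f.store_key
    GROUP BY p.category, s.city
    ORDER BY p.category, s.city
    """
).fetchall()
```

Each dimension is reached with a single join from the fact table, which is why star queries stay simple even when many attributes are involved.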

Snowflake schema:

“Snowflaking” is a method of normalizing the dimension tables in a star schema. When you completely normalize
all the dimension tables, the resultant structure resembles a snowflake with the fact table in the middle.

Advantages:
Normalized structures are easier to update and maintain

Disadvantages:
Degraded query performance because of additional joins
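
Both the advantage and the disadvantage can be seen in a small sketch. Here a hypothetical product dimension is snowflaked by moving the category into its own `category_dim` table; all table names and rows are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Snowflaked product dimension: category moved to its own table.
    CREATE TABLE category_dim (category_key INTEGER PRIMARY KEY,
                               category_name TEXT);
    CREATE TABLE product_dim (
        product_key INTEGER PRIMARY KEY, product_name TEXT,
        category_key INTEGER REFERENCES category_dim(category_key));
    CREATE TABLE sales_fact (product_key INTEGER, sales_amount REAL);

    INSERT INTO category_dim VALUES (1, 'Stationery'), (2, 'Kitchen');
    INSERT INTO product_dim VALUES (1, 'Pen', 1), (2, 'Ink', 1), (3, 'Mug', 2);
    INSERT INTO sales_fact VALUES (1, 10.0), (2, 5.0), (3, 8.0);
    """
)

# Maintenance is easier: renaming a category touches exactly one row.
conn.execute(
    "UPDATE category_dim SET category_name = 'Office' WHERE category_key = 1"
)

# Querying costs an extra join (fact -> product -> category).
sales_by_category = conn.execute(
    """
    SELECT c.category_name, SUM(f.sales_amount)
    FROM sales_fact AS f
    JOIN product_dim AS p ON p.product_key = f.product_key
    JOIN category_dim AS c ON c.category_key = p.category_key
    GROUP BY c.category_name
    ORDER BY c.category_name
    """
).fetchall()
```

In a denormalized star dimension the same rename would have to update every product row in that category, but the report would need one join fewer.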


Dimension Table:

Dimension tables contain textual information that represents the attributes of the business.

Dimension tables are joined to a fact table through foreign key references.

Dimension Table Examples

• Retail – store name, zip code, product name, product category, day of week

• Telecommunications – call origin, call destination

• Banking – customer name, account number, branch, account officer

• Insurance – policy type, insured party

Fact Table:

• Contains facts (measures) and foreign keys to the dimension tables

• Can hold large volumes of data

Measures: the key numeric measurements that are used for business analysis and monitoring.
