
Data warehouse:

A data warehouse is a subject-oriented, integrated, time-variant and non-volatile collection
of data in support of management's decision-making process.

Put simply, it is a large store of data accumulated from a wide range of sources within a company
and used to guide management decisions.

Facts

The term FACT represents a single business measure. E.g. Sales, Qty Sold

The GRAIN of a FACT is the level of detail at which it is recorded. E.g. the grain of “Sales”
could be “for each PRODUCT, at each STORE, on every DAY”.

A FACT TABLE is the primary table in a dimensional model where the business measures or FACTS
are stored.

Each row in a FACT TABLE records the FACTS for one combination of dimension values. All FACTS in
a FACT TABLE must be at the SAME GRAIN.
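
As an illustration, the grain from the example above can be sketched with Python's built-in sqlite3 module. The table name sales_fact, its columns and the sample rows are hypothetical, chosen only for this sketch. The composite primary key is one possible way to enforce "one row per product, per store, per day".

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical fact table at the grain "one row per product, per store, per day".
conn.execute("""
    CREATE TABLE sales_fact (
        product_key INTEGER NOT NULL,
        store_key   INTEGER NOT NULL,
        sale_date   TEXT    NOT NULL,
        sales       REAL    NOT NULL,   -- FACT: sales amount
        qty_sold    INTEGER NOT NULL,   -- FACT: quantity sold
        PRIMARY KEY (product_key, store_key, sale_date)  -- enforces the grain
    )
""")

# Made-up sample rows, all at the same grain.
rows = [
    (1, 10, "2024-01-01", 30.0, 3),
    (1, 11, "2024-01-01", 20.0, 2),
    (2, 10, "2024-01-01", 15.0, 1),
    (1, 10, "2024-01-02", 50.0, 5),
]
conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?, ?)", rows)

# Because every row is at the same grain, the facts can be summed safely.
daily_totals = conn.execute(
    "SELECT sale_date, SUM(sales), SUM(qty_sold) FROM sales_fact "
    "GROUP BY sale_date ORDER BY sale_date"
).fetchall()
print(daily_totals)

# A second row for the same product, store and day would break the grain,
# so the primary key rejects it.
try:
    conn.execute("INSERT INTO sales_fact VALUES (1, 10, '2024-01-01', 5.0, 1)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
print(duplicate_rejected)
```

If some rows were stored per day and others per month in the same table, a plain SUM would double count, which is why all facts in one table share a grain.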

Dimensions

The term DIMENSION represents a single category or perspective by which associated FACTS are
interpreted and understood.

E.g. “Store” is a perspective by which sales are understood. It is the answer to the question
“Where did the sales occur?”

A DIMENSION TABLE is a table which holds a list of attributes or qualities of the dimension most
often used in queries and reports.

E.g. The “Store” dimension can have attributes such as the street and block number, the city, the
region and the country where it is located in addition to its name.

Every row in the DIMENSION TABLE represents a unique instance of that DIMENSION and has a
unique identifier called the DIMENSION KEY.
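
A minimal sketch of the “Store” example, again using Python's sqlite3 module. The names store_dim and sales_fact, the column names and the sample data are assumptions made for illustration. The fact table refers to the dimension through the DIMENSION KEY, and the dimension attributes answer “Where did the sales occur?”.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical Store dimension: one row per store, identified by store_key.
    CREATE TABLE store_dim (
        store_key INTEGER PRIMARY KEY,  -- DIMENSION KEY
        name TEXT, street TEXT, city TEXT, region TEXT, country TEXT
    );
    -- Hypothetical fact table that points at the dimension via store_key.
    CREATE TABLE sales_fact (
        store_key INTEGER REFERENCES store_dim(store_key),
        sale_date TEXT,
        sales REAL
    );
    INSERT INTO store_dim VALUES
        (1, 'Downtown', '12 Main St', 'Springfield', 'North', 'US'),
        (2, 'Harbour', '4 Quay Rd', 'Portsmouth', 'South', 'UK');
    INSERT INTO sales_fact VALUES
        (1, '2024-01-01', 100.0),
        (1, '2024-01-02', 50.0),
        (2, '2024-01-01', 70.0);
""")

# Interpret the sales facts from the perspective of the store's city.
sales_by_city = conn.execute("""
    SELECT d.city, SUM(f.sales)
    FROM sales_fact AS f
    JOIN store_dim AS d ON d.store_key = f.store_key
    GROUP BY d.city
    ORDER BY d.city
""").fetchall()
print(sales_by_city)
```

The same query could group by region or country instead, because those attributes live on the same dimension row.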

fact tables - normalized

dimension tables - denormalized, to reduce the number of joins

Surrogate key
A surrogate key is a unique identification key that acts as an artificial alternative to the production
key. The production key may be alphanumeric or composite, whereas the surrogate key is always a single
numeric key. If the production key is an alphanumeric field, an index on that field occupies more space,
so it is not advisable to join or index on it: fact tables in a data warehouse generally hold historical
data and are linked to many dimension tables. Joining on a numeric field gives better performance.
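
One way to picture surrogate key assignment is the small Python sketch below. The class SurrogateKeyGenerator and the sample production keys are hypothetical; real ETL tools and databases provide their own mechanisms (for example sequences or identity columns), so this only illustrates the idea of mapping each production key to a single integer.

```python
from itertools import count


class SurrogateKeyGenerator:
    """Hands out one integer surrogate key per distinct production key."""

    def __init__(self):
        self._next = count(1)          # 1, 2, 3, ...
        self._by_production_key = {}   # production key -> surrogate key

    def key_for(self, production_key):
        # Reuse the existing surrogate key if this production key was seen before.
        if production_key not in self._by_production_key:
            self._by_production_key[production_key] = next(self._next)
        return self._by_production_key[production_key]


keys = SurrogateKeyGenerator()

# Hypothetical composite, alphanumeric production keys (region, customer code).
a = keys.key_for(("EU", "CUST-00A7"))
b = keys.key_for(("US", "CUST-00A7"))
c = keys.key_for(("EU", "CUST-00A7"))  # same production key as `a`
print(a, b, c)
```

The fact table then stores only the small integer, while the dimension table keeps the original production key as an ordinary attribute.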

OLTP - transactional database

- directly accessed by users

- uses live information

Uses of data warehouse:

- A data warehouse is a database that is kept separate from the organization's operational
database. It is not updated frequently.
- It holds consolidated historical data, which helps the organization analyze its
business.
- A data warehouse helps executives organize, understand, and use their data to make
strategic decisions.
- Data warehouse systems help integrate diverse application systems.
- A data warehouse system supports analysis of consolidated historical data.

Star schema vs. Snowflake Schema

Understandability
- Star Schema: Easier for business users and analysts to query data.
- Snowflake Schema: May be more difficult for business users and analysts due to the number of
tables they have to deal with.

Dimension table
- Star Schema: Only has one dimension table for each dimension that groups related attributes.
Dimension tables are not in the third normal form.
- Snowflake Schema: May have more than one dimension table for each dimension due to the further
normalization of each dimension table. Dimension tables are in the third normal form (3NF).

Query complexity
- Star Schema: The query is very simple and easy to understand.
- Snowflake Schema: More complex query due to multiple foreign key joins between dimension tables.

Query performance
- Star Schema: High performance. The database engine can optimize and boost the query performance
based on a predictable framework.
- Snowflake Schema: More foreign key joins, therefore longer query execution time compared with
star schema.

When to use
- Star Schema: When dimension tables store a relatively small number of rows and space is not a
big issue, we can use star schema.
- Snowflake Schema: When dimension tables store a large number of rows with redundant data and
space is an issue, we can choose snowflake schema to save space.

Foreign Key Joins
- Star Schema: Fewer joins.
- Snowflake Schema: Higher number of joins.

Data warehouse system
- Star Schema: Works best in any data warehouse/data mart.
- Snowflake Schema: Better for small data warehouse/data mart.
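
The difference in joins can be sketched with Python's sqlite3 module. All table names, column names and rows below are made up for the illustration. The same question ("sales per product category") needs one join against a star-style product dimension and two joins against a snowflake-style one, where the category has been normalized into its own table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales_fact (product_key INTEGER, sales REAL);
    INSERT INTO sales_fact VALUES (1, 10.0), (2, 20.0), (3, 5.0);

    -- Star: one denormalized dimension table, category name repeated per product.
    CREATE TABLE star_product_dim (
        product_key INTEGER PRIMARY KEY, product_name TEXT, category_name TEXT
    );
    INSERT INTO star_product_dim VALUES
        (1, 'Apple', 'Fruit'), (2, 'Pear', 'Fruit'), (3, 'Leek', 'Vegetable');

    -- Snowflake: the category is normalized out into its own table.
    CREATE TABLE snow_category_dim (
        category_key INTEGER PRIMARY KEY, category_name TEXT
    );
    CREATE TABLE snow_product_dim (
        product_key INTEGER PRIMARY KEY, product_name TEXT,
        category_key INTEGER REFERENCES snow_category_dim(category_key)
    );
    INSERT INTO snow_category_dim VALUES (1, 'Fruit'), (2, 'Vegetable');
    INSERT INTO snow_product_dim VALUES
        (1, 'Apple', 1), (2, 'Pear', 1), (3, 'Leek', 2);
""")

# Star schema: a single join from the fact table to the dimension.
STAR_QUERY = """
    SELECT p.category_name, SUM(f.sales)
    FROM sales_fact f
    JOIN star_product_dim p ON p.product_key = f.product_key
    GROUP BY p.category_name ORDER BY p.category_name
"""

# Snowflake schema: an extra join to reach the category name.
SNOWFLAKE_QUERY = """
    SELECT c.category_name, SUM(f.sales)
    FROM sales_fact f
    JOIN snow_product_dim p ON p.product_key = f.product_key
    JOIN snow_category_dim c ON c.category_key = p.category_key
    GROUP BY c.category_name ORDER BY c.category_name
"""

star_result = conn.execute(STAR_QUERY).fetchall()
snowflake_result = conn.execute(SNOWFLAKE_QUERY).fetchall()
print(star_result)
print(snowflake_result)
```

Both queries return the same totals. The snowflake version stores each category name once, at the cost of the additional join.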

Star schema vs snowflake schema:

http://www.vertabelo.com/blog/technical-articles/data-warehouse-modeling-star-schema-vs-snowflake-schema

DW vs ODS:
An operational data store (or "ODS") is a database designed to integrate data from multiple
sources for additional operations on the data, for reporting, controls and operational decision
support. Unlike a production master data store, the data is not passed back to operational
systems. It may be passed for further operations and to the data warehouse for reporting.
Because the data originate from multiple sources, the integration often involves cleaning,
resolving redundancy and checking against business rules for integrity. An ODS is usually
designed to contain low-level or atomic (indivisible) data (such as transactions and prices) with
limited history that is captured "real time" or "near real time" as opposed to the much greater
volumes of data stored in the data warehouse generally on a less-frequent basis.
The general purpose of an ODS is to integrate data from disparate source systems in a single
structure, using data integration technologies like data virtualization, data federation, or extract,
transform, and load (ETL). This will allow operational access to the data for operational
reporting, master data or reference data management.
An ODS is not a replacement or substitute for a data warehouse or for a data hub but in turn
could become a source.

Data Warehouse Models


From the perspective of data warehouse architecture, we have the
following data warehouse models −

- Virtual Warehouse
- Data mart
- Enterprise Warehouse

Virtual Warehouse
A set of views over operational databases is known as a virtual
warehouse. It is easy to build a virtual warehouse, but doing so
requires excess capacity on operational database servers.
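
A virtual warehouse can be sketched as a database view defined over operational tables, here with Python's sqlite3 module. The tables orders and order_lines, the view daily_product_sales and the data are hypothetical. No data is copied: the view is computed from the operational tables each time it is queried, which is where the extra load on the operational server comes from.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical operational (OLTP) tables.
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY, customer TEXT, order_date TEXT
    );
    CREATE TABLE order_lines (
        order_id INTEGER, product TEXT, qty INTEGER, unit_price REAL
    );
    INSERT INTO orders VALUES (1, 'Ann', '2024-01-01'), (2, 'Bob', '2024-01-01');
    INSERT INTO order_lines VALUES
        (1, 'Pen', 2, 1.5), (1, 'Ink', 1, 4.0), (2, 'Pen', 4, 1.5);

    -- The "virtual warehouse": a summary view, with no separate storage.
    CREATE VIEW daily_product_sales AS
    SELECT o.order_date, l.product,
           SUM(l.qty) AS qty_sold,
           SUM(l.qty * l.unit_price) AS sales
    FROM orders o
    JOIN order_lines l ON l.order_id = o.order_id
    GROUP BY o.order_date, l.product;
""")

summary = conn.execute(
    "SELECT * FROM daily_product_sales ORDER BY product"
).fetchall()
print(summary)

# New operational data is visible through the view immediately.
conn.execute("INSERT INTO order_lines VALUES (2, 'Ink', 2, 4.0)")
updated = conn.execute(
    "SELECT qty_sold FROM daily_product_sales WHERE product = 'Ink'"
).fetchone()[0]
print(updated)
```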

Data Mart
A data mart contains a subset of organization-wide data. This subset of
data is valuable to specific groups in an organization.

In other words, we can claim that data marts contain data specific to a
particular group. For example, the marketing data mart may contain data
related to items, customers, and sales. Data marts are confined to
subjects.

Points to remember about data marts −

- Windows-based or Unix/Linux-based servers are used to implement data
marts. They are implemented on low-cost servers.
- The implementation cycle of a data mart is measured in short periods of time,
i.e., in weeks rather than months or years.
- The life cycle of a data mart may become complex in the long run if its planning and
design are not organization-wide.
- Data marts are small in size.
- Data marts are customized by department.
- The source of a data mart is a departmentally structured data warehouse.
- Data marts are flexible.

Enterprise Warehouse
- An enterprise warehouse collects all the information and subjects
spanning an entire organization.
- It provides enterprise-wide data integration.
- The data is integrated from operational systems and external information
providers.
- The volume of this information can vary from a few gigabytes to hundreds of gigabytes,
terabytes or beyond.
