DATA WAREHOUSING

Data Warehousing Concepts

What is a Data Warehouse?
A Data warehouse is a subject-oriented, integrated, time-varying, non-volatile collection of data in support of the management's decision –making process (OR) Data warehouse is a relational database for Query and Analysis Rather than for transaction processing. Subject-oriented(customer,products,sales,etc) Non-volatile Time-varying Integrated
(William.Inmon –1993)

   

Subject-oriented

A Data Warehouse is organized around major subjects, such as customer, supplier, product and sales. Rather than concentrating on day-to-day operations and transaction processing of an organization.

Integrated

A Data Warehouse is usually constructed by integrating multiple heterogeneous sources, such as relational databases, flat files, and on-line transaction records. data cleaning and data integration techniques are applied to ensure consistency in naming conventions, encoding structures, attribute measures and so on.

Time Variant

Data are stored to provide information from a historical perspective (example: the past 5 -10 years). Every key structure in the data warehouse contains, either implicitly or explicitly an element of time

Non-Volatile

A data warehouse is always a physically separate store of data transformed from the application data found in the operational environment. due to this separation, a data warehouse does not require transaction processing,recovery,and concurrency control mechanisms. it usually requires only two operations in data accessing :initial loading loading of data and access of data.

Comparison of OLTP systems and data warehousing systems
OLTP Systems Data warehousing systems

Holds Current data Stores detailed data Data is dynamic Transaction-driven Application -oriented Supports day-to-day decisions Serves large number of clerical/operational users

Holds Historical data stores detailed,lightly,and highly summarized data Data is largely static Analysis driven Subject -oriented Supports strategic decisions Serves relatively low number of managerial users

Dimension Modeling
Dimension: Dimension is a structure which consists of levels, and hierarchies is defined on each level. Example:

SEX

MALE

FEMALE

Dimension Modeling
Example:
Profession Level 0

Engineer

Secretary

Teacher

Level 1

Chemical

Civil

Executive

Junior

Elementary

High School

Level 2

FACTS

Fact: Fact consists of whole data with primary key, foreign key relation ship with dimensions and also consists of measures. There are Three types of facts
1.ADDITIVE FACTS 2.SEMI ADDITIVE FACTS 3.NON ADDITIVE FACTS

Fact less Fact & Conform dim

Fact less fact is a fact it does not contain Measures. A Dimension which can share more than one Fact is called Conform Dimension Collection of Star Schemas and Snowflake Schemas is called Galaxy.

Star schema & Snow flake Schema

Star schema: A Centralized fact table surrounded with dimension tables having Primary, Foreign key relation ship between them is called star schema. Snow flake Schema: A normalized star schema Is called Snow flake Schema Galaxy: Collection of Star schemas and snow flake Schema is called Galaxy.

Star Schema
Sex-Dim
Sex key Sex

Date-Dim
Date key Current year Current month Current week

Fact Table
Profession Key ----------------------------Sex key ---------------------------Address key ----------------------------Date key --------------------------------------Measures (Numeric)

Conform Dim

Address Dim
Address key Country State City

Profession-Dim
Profession-key Profession-class Title Level discipline

Snow Flake Schema
Item Dim Item key Item name Type Supplier key Fact Table Time key Item key Location key ------------------------------------Rupees sold Units sold Time Dim Time key Year Quarter Month day

Supplier Dim Supplier key Supplier name Supplier address Supplier type

Location Dim Location key Street City key

City Dim City key City name State Country Pin code

TYPES OF MAPPINGS
  

History Simple pass through (None) Slowly growing target (Full) Slowly changing dimension (depends)

Types of SCD’S
 

Slowly changing dimension—1 Slowly changing dimension—2 Time stamping versioning Flaging Slowly changing dimension—3

4. 5. 6. 

Slowly changing Dimension-1

SCD-1:When you does not want History use this kind of mapping (Only insert else Update takes place) it inserts the new row or Update the existing dimensions.

Slowly changing Dimension-2

SCD -2 (Time stamp): When you want

maintain full history use this kind of mapping. Inserts new and changed dimensions. Creates an effective date range to track changes.

SCD -2 (Versioning): Inserts new and

changed dimensions. Creates a version number and increments the primary key to track changes.

SCD -2 (Flaging): Inserts new and changed
dimensions. Flags the current version and increments the primary key to track changes.

Slowly changing Dimension-3
SCD -3 : when you want partial history use
this kind of mapping. It Inserts new dimensions. Updates changed values in existing dimensions. Optionally uses the load date to track changes.

Data Warehouse Execution Architecture

Architecture

DB2/400

Oracle ODBC SQL Server Native FTP Informatica (ETL) UNIX-HP ODBC Native

NCR Tera Data Warehouse

Flat files

Reporting( DSS) Servers

Source Systems

ETL

DATA WAREHOUSE

ODBC

Sign up to vote on this title
UsefulNot useful