Concepts
WHY A DATA WAREHOUSE?
• Multi-Dimensional Analysis of Data - Reporting
[Figure: Two Sales OLTP databases feed the Data Warehouse through Informatica (ETL) and a Staging area.
Hyderabad Sales, OLTP DB on SQL Server: Product (Char)
Chennai Sales, OLTP DB on Oracle Server: Sale ID (Numeric), Product (Varchar2)
Data Warehouse: SaleID (Decimal), Product (String)]
Integrated View Is The Essence Of A Data Warehouse
Non-volatile - Characteristics of a Data Warehouse
[Figure: Operational data is subject to insert, change, delete and replace. The Data Warehouse is loaded and then accessed read-only.]
[Figure: Data Warehouse containing SALES, ACCOUNTS, LOANS and HR.]
DATA ACQUISITION
* Data Extraction
* Data Transformation
* Data Loading
[Figure: Data Acquisition. Relational and Mainframe sources feed Staging (Buffer).]
DATA ACQUISITION - Data Extraction
Data Extraction:
It is the process of reading data from various types of sources,
such as relational sources, ERP sources, mainframe sources,
XML files and flat files.
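As a minimal sketch of extraction, the Python below reads rows from a flat file and from a relational source into one staging list. The sample data, the table name `sales`, and the helper names `extract_flat_file` and `extract_relational` are assumptions made for illustration; an in-memory SQLite database stands in for an OLTP server.

```python
import csv
import io
import sqlite3

def extract_flat_file(text):
    """Read delimited flat-file text into a list of row dicts."""
    return list(csv.DictReader(io.StringIO(text)))

def extract_relational(conn, query):
    """Read rows from a relational source into a list of row dicts."""
    cursor = conn.execute(query)
    columns = [col[0] for col in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]

# Hypothetical flat-file source (values arrive as strings).
flat_file = "sale_id,product\n1,Pen\n2,Book\n"

# Hypothetical relational source; in-memory SQLite stands in for an OLTP DB.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (sale_id INTEGER, product TEXT)")
conn.execute("INSERT INTO sales VALUES (3, 'Ink')")

# Both sources land in the same staging list, still in their source types.
staging = extract_flat_file(flat_file) + extract_relational(
    conn, "SELECT sale_id, product FROM sales"
)
```

Note that the flat-file rows carry `sale_id` as text while the relational rows carry it as an integer. Reconciling such differences is left to the transformation step.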
DATA ACQUISITION - Data Transformation
Data Transformation:
It is the process of cleaning the data and transforming it into
the required business format.
* Data Merging
* Data Cleansing
* Data Scrubbing
* Data Aggregation
Data Merging:
It is the process of combining data from multiple inputs and
loading it into a single output. There are two types of data merging activity:
* Join
* Union
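The two merging activities can be sketched in plain Python as follows. The sample rows and the helper names `join` and `union` are hypothetical. The join assumes the key is unique on the right-hand input, and the union keeps duplicate rows (the behaviour of SQL UNION ALL).

```python
# Hypothetical staging rows.
sales = [{"sale_id": 1, "product_id": 10}, {"sale_id": 2, "product_id": 20}]
products = [{"product_id": 10, "product": "Pen"},
            {"product_id": 20, "product": "Book"}]
hyderabad = [{"sale_id": 1, "product": "Pen"}]
chennai = [{"sale_id": 7, "product": "Ink"}]

def join(left, right, key):
    """Inner join: combine the columns of rows that share a key value."""
    index = {row[key]: row for row in right}  # assumes unique keys in `right`
    return [{**l, **index[l[key]]} for l in left if l[key] in index]

def union(*inputs):
    """Union: stack the rows of inputs that have the same columns."""
    return [row for rows in inputs for row in rows]

joined = join(sales, products, "product_id")   # widens each row
unioned = union(hyderabad, chennai)            # lengthens the row set
```

A join adds columns from a second input to each row, while a union adds rows from a second input with the same columns.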
Data Cleansing:
It is the process of removing unwanted data from staging,
OR
It is the process of correcting inconsistencies and inaccuracies in the data.
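A small sketch of both senses of cleansing, assuming hypothetical staging rows: rows with no product name are treated as unwanted and dropped, and the remaining rows get a consistent numeric `sale_id` and consistent casing. The rules themselves are illustrative assumptions, not rules taken from the slides.

```python
def cleanse(rows):
    """Drop unwanted rows and correct inconsistent values."""
    cleaned = []
    for row in rows:
        product = (row.get("product") or "").strip()
        if not product:
            continue  # unwanted: the row has no product name
        cleaned.append({
            "sale_id": int(row["sale_id"]),  # one numeric type for all sources
            "product": product.title(),      # one casing convention
        })
    return cleaned

# Hypothetical staging rows with typical inconsistencies.
staging = [
    {"sale_id": "1", "product": "  pen "},
    {"sale_id": "2", "product": ""},
    {"sale_id": 3, "product": "BOOK"},
]
clean = cleanse(staging)
```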
Data Scrubbing:
It is a process of deriving new data definitions using existing data.
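For instance, a revenue value can be derived from quantity and unit price. The sketch below assumes hypothetical column names (`quantity`, `unit_price`, `revenue`) and a hypothetical helper `scrub`.

```python
def scrub(row):
    """Return a copy of the row with a derived `revenue` column."""
    return {**row, "revenue": row["quantity"] * row["unit_price"]}

# Hypothetical staging rows; prices are whole numbers to keep the sketch simple.
rows = [
    {"sale_id": 1, "quantity": 3, "unit_price": 20},
    {"sale_id": 2, "quantity": 2, "unit_price": 15},
]
scrubbed = [scrub(r) for r in rows]
```

The existing columns are left unchanged; only the new definition is added.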
Data Aggregation:
It is the process of calculating summaries from detailed data.
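Aggregation is commonly understood as summarising detailed rows, for example totalling revenue per product. A minimal sketch, assuming hypothetical detail rows and a hypothetical helper `aggregate`:

```python
from collections import defaultdict

def aggregate(rows, group_key, measure):
    """Sum `measure` over all rows that share the same `group_key` value."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[group_key]] += row[measure]
    return dict(totals)

# Hypothetical detailed (transaction-level) rows.
detail = [
    {"product": "Pen", "revenue": 60},
    {"product": "Book", "revenue": 30},
    {"product": "Pen", "revenue": 40},
]
summary = aggregate(detail, "product", "revenue")
```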
Data Marts
1. Dependent Data Mart (DM)
2. Independent Data Mart (DM)
[Figure: A FACTS table holding foreign keys FK1, FK2, FK3 and FK4; DIMENSION 2 (Pk2) and DIMENSION 3 (Pk3) are among the dimension tables shown.]
STAR SCHEMA DATABASE DESIGN
SALES FACT: CUSTOMER_ID (FK), STORE_ID (FK), PRODUCT_ID (FK), DATE_ID (FK), QUANTITY (fact), REVENUE (fact)
CUSTOMER: Customer_id (PK), Cust_name, Address, Phone, Fax
TIME: DATE_ID (PK), YEAR, QUARTER, MONTH, WEEK, DAY
STORE: Store_id (PK), Country, Region, State, City, Store
PRODUCT: Product_id (PK), Category, Sub Category, Product
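The star schema above can be sketched with Python's built-in `sqlite3` module. The table name `time_dim` (used in place of TIME), the column types, and all sample rows are assumptions made for illustration. The query at the end shows the usual access pattern: join the fact table to its dimensions and aggregate the facts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, cust_name TEXT,
                       address TEXT, phone TEXT, fax TEXT);
CREATE TABLE time_dim (date_id INTEGER PRIMARY KEY, year INTEGER,
                       quarter INTEGER, month INTEGER, week INTEGER, day INTEGER);
CREATE TABLE store    (store_id INTEGER PRIMARY KEY, country TEXT, region TEXT,
                       state TEXT, city TEXT, store TEXT);
CREATE TABLE product  (product_id INTEGER PRIMARY KEY, category TEXT,
                       sub_category TEXT, product TEXT);
-- The fact table holds one foreign key per dimension plus the measures.
CREATE TABLE sales_fact (
    customer_id INTEGER REFERENCES customer(customer_id),
    store_id    INTEGER REFERENCES store(store_id),
    product_id  INTEGER REFERENCES product(product_id),
    date_id     INTEGER REFERENCES time_dim(date_id),
    quantity    INTEGER,
    revenue     INTEGER
);
-- Hypothetical sample rows.
INSERT INTO customer VALUES (1, 'Sample Customer', NULL, NULL, NULL);
INSERT INTO time_dim VALUES (20240105, 2024, 1, 1, 1, 5),
                            (20250210, 2025, 1, 2, 6, 10);
INSERT INTO store    VALUES (1, 'India', 'South', 'Telangana', 'Hyderabad', 'Store A');
INSERT INTO product  VALUES (1, 'Stationery', 'Writing', 'Pen'),
                            (2, 'Stationery', 'Paper', 'Notebook');
INSERT INTO sales_fact VALUES (1, 1, 1, 20240105, 3, 60),
                              (1, 1, 2, 20240105, 1, 50),
                              (1, 1, 1, 20250210, 2, 40);
""")

# Total quantity and revenue by year and product category.
rows = conn.execute("""
    SELECT t.year, p.category, SUM(f.quantity), SUM(f.revenue)
    FROM sales_fact f
    JOIN time_dim t ON t.date_id = f.date_id
    JOIN product  p ON p.product_id = f.product_id
    GROUP BY t.year, p.category
    ORDER BY t.year
""").fetchall()
```

Each dimension is reached from the fact table by a single join, which is the defining property of the star layout.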
Snowflake Schema
Fact table: CUSTOMER_ID (FK), STORE_ID (FK), PRODUCT_ID (FK), DATE_ID (FK), QUANTITY (fact), REVENUE (fact)
PRODUCT: Product_id (PK), Category, Sub Category, Product
STORE: Store_id (PK), Country, Region, State, City, Store
[Figure also shows separate Category and Sub Category boxes next to the dimension tables.]
Integrated Schema
[Figure: Tables D, A, X, C, B and Y linked by PK-FK relationships.]
A Slowly Changing Dimension (SCD) captures the changes that take place over a
period of time.
1. SCD Type 1
2. SCD Type 2
A Type 2 dimension maintains the full history in the target. For each
update, it inserts a new record in the target table.
3. SCD Type 3:
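The Type 2 behaviour described above (a new record for every update, with history kept) can be sketched as follows. The columns `surrogate_key`, `start_date`, `end_date` and `is_current` are a common convention assumed here, not something the slides specify, and the helper `apply_scd2` and the sample data are hypothetical.

```python
from datetime import date

def apply_scd2(dimension, customer_id, new_attrs, as_of):
    """Expire the current version of a customer and insert a new version."""
    current = next(
        (r for r in dimension
         if r["customer_id"] == customer_id and r["is_current"]),
        None,
    )
    if current is not None:
        if all(current.get(k) == v for k, v in new_attrs.items()):
            return  # no change, so no new version is needed
        current["is_current"] = False   # close the old version
        current["end_date"] = as_of
    dimension.append({
        "surrogate_key": len(dimension) + 1,  # assumed: simple running number
        "customer_id": customer_id,
        **new_attrs,
        "start_date": as_of,
        "end_date": None,
        "is_current": True,
    })

customer_dim = []
apply_scd2(customer_dim, 1, {"address": "Hyderabad"}, date(2024, 1, 1))
apply_scd2(customer_dim, 1, {"address": "Chennai"}, date(2025, 6, 1))
apply_scd2(customer_dim, 1, {"address": "Chennai"}, date(2025, 7, 1))  # unchanged
```

After these calls the dimension holds two rows for the same customer: the expired Hyderabad version and the current Chennai version, so the full history stays queryable.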