You are on page 1of 12

Virtual University of Pakistan

Data Warehousing
Lecture-4
Introduction and Background
Ahsan Abdullah
Assoc. Prof. & Head
Center for Agro-Informatics Research
www.nu.edu.pk/cairindex.asp
FAST National University of Computers & Emerging Sciences, Islamabad

1
DWH-Ahsan Abdullah
Introduction and Background

2
DWH-Ahsan Abdullah
How is it Different?
 Starts with a 6x12 availability requirement ...
but 7x24 usually becomes the goal.
 Decision makers typically don’t work 24 hrs a day and 7
days a week. An ATM system does.

 Once decision makers start using the DWH, and start


reaping the benefits, they start liking it…

 Start using the DWH more often, till want it available


100% of the time.

3
DWH-Ahsan Abdullah
How is it Different?
 Starts with a 6x12 availability requirement ...
but 7x24 usually becomes the goal.
 For business across the globe, 50% of the world may be
sleeping at any one time, but the businesses are up 100%
of the time.

 100% availability not a trivial task, need to take into


account loading strategies, refresh rates etc.

4
DWH-Ahsan Abdullah
How is it Different?
 Does not follows the traditional development
model
Requirements

 Program

Classical SDLC

 Requirements gathering
 Analysis
 Design
 Programming
 Testing
 Integration
 Implementation
5
DWH-Ahsan Abdullah
How is it Different?
 Does not follows the traditional development
model
DWH

Program

 Requirements
DWH SDLC (CLDS)

 Implement warehouse
 Integrate data
 Test for biasness
 Program w.r.t data
 Design DSS system
 Analyze results
 Understand requirement
6
DWH-Ahsan Abdullah
Data Warehouse Vs. OLTP

OLTP (On Line Transaction Processing)


Select tx_date, balance from tx_table
Where account_ID = 23876;

7
DWH-Ahsan Abdullah
Data Warehouse Vs. OLTP

DWH
Select balance, age, sal, gender from
customer_table, tx_table
Where age between (30 and 40) and
Education = ‘graduate’ and
CustID.customer_table =
Customer_ID.tx_table;

8
DWH-Ahsan Abdullah
Data Warehouse Vs. OLTP

OLTP DWH
Primary key used Primary key NOT used
No concept of Primary Index Primary index used
Few rows returned Many rows returned
May use a single table Uses multiple tables
High selectivity of query Low selectivity of query
Indexing on primary key Indexing on primary index
(unique) (non-unique)

9
DWH-Ahsan Abdullah
Data Warehouse Vs. OLTP
OLTP: OnLine Transaction Processing (MIS or Database System)

Data Warehouse OLTP


Scope * Application –Neutral * Application specific
* Single source of “truth” * Multiple databases with repetition
* Evolves over time * Off the shelf application
* How to improve business * Runs the business

Data * Historical, detailed data * Operational data


Perspective * Some summary * No summary
* Lightly denormalized * Fully normalized

Queries * Hardly uses PK * Based on PK


* Number of results * Number of results returned in
returned in thousands hundreds

Time factor * Minutes to hours * Sub seconds to seconds


* Typical availability 6x12 * Typical availability 24x7
10
DWH-Ahsan Abdullah
Comparison of Response Times
 On-line analytical processing (OLAP) queries must be
executed in a small number of seconds.
 Often requires denormalization and/or sampling.

 Complex query scripts and large list selections can


generally be executed in a small number of minutes.

 Sophisticated clustering algorithms (e.g., data mining)


can generally be executed in a small number of hours
(even for hundreds of thousands of customers).

11
DWH-Ahsan Abdullah
Putting the pieces together

Data Data Warehouse Server OLAP Servers Clients


(Tier 0) (Tier 1) (Tier 2) (Tier 3)
Semistructured


MOLAP
Sources Query/Reporting

www data
Meta
Data 
 Extract
Data Analysis 








IT
Archived
data
Transform
Load
(ETL)
Warehouse
ROLAP 
Data Mining

Business
Users
Users
Operational
Data Bases 

Data sources Data Marts  Tools
Business Users

12
DWH-Ahsan Abdullah

You might also like