
Data Warehouse Fundamentals

Outline

1. Part 1: Fundamentals
   1. Motivations
   2. Requirements
   3. Introduction

Database Laboratory 2
1- Motivation

• Decision making:
  - Decision support systems
  - Data mining
• Data analysis for decision makers:
  - Transformation of data into strategic information
  - Complex calculations but simple queries:
    - Total calls by region
    - Social category of the best clients in each country
    - Evolution of a product's market share …

Decision making

• Handle and visualize data quickly, in a way that is natural for decision makers (who are not computer scientists)

[Figure: multidimensional view. A cube with axes region, year, and product; each cell x holds sales.]

Decision making

• Requires quickly retrieving and analyzing data from various sources
  => Transversal (cross-department) system
• Data has to be:
  - Extracted
  - Grouped together and organized
  - Correlated with each other
  - Transformed (summary, aggregation)
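
The steps listed above (extract, group, correlate, transform) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two sources (`billing_rows`, `crm_rows`), their field names, and the sample values are hypothetical and do not come from the slides.

```python
from collections import defaultdict

# Hypothetical source 1: call counts from a billing system.
billing_rows = [
    {"customer_id": 1, "region": "Vaud", "calls": 3},
    {"customer_id": 2, "region": "Vaud", "calls": 5},
    {"customer_id": 3, "region": "Fribourg", "calls": 2},
]

# Hypothetical source 2: customer categories from a CRM system.
crm_rows = [
    {"customer_id": 1, "category": "business"},
    {"customer_id": 2, "category": "private"},
    {"customer_id": 3, "category": "business"},
]


def extract(source):
    """Extract: copy rows out of a source so the source is left untouched."""
    return [dict(row) for row in source]


def correlate(calls, customers):
    """Group and correlate: join both sources on customer_id."""
    category_by_id = {c["customer_id"]: c["category"] for c in customers}
    return [
        {**row, "category": category_by_id[row["customer_id"]]}
        for row in calls
    ]


def transform(rows):
    """Transform: aggregate total calls per (region, category)."""
    totals = defaultdict(int)
    for row in rows:
        totals[(row["region"], row["category"])] += row["calls"]
    return dict(totals)


summary = transform(correlate(extract(billing_rows), extract(crm_rows)))
print(summary)
```

The result is a small summary keyed by region and customer category, the kind of aggregated, cross-source data a decision maker would query instead of the raw rows.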

Business Intelligence Applications
[Figure: BI applications. Labels: transaction DB, data, DW, front end (business engine models, analysis, query/reporting, data mining), business manager, business analyst.]

• Telecom: transaction data = call detail records
• BI applications: traffic analysis, customer loyalty analysis, fraud detection, measuring promotion effectiveness
Data issues

• Massive volume
  - 5 terabytes already exist
  - 10 terabytes expected soon
• Dispersed, often difficult to access
• Badly integrated, or not integrated at all
• Complex
• Not structured for business queries

2- Requirements

• Stores summary data to support decision making
• Subject-oriented
• Integrated
• Non-volatile
• Covers a large time span
• Fast access

Subject orientation

[Figure: operational data is organized by application (sales system, payroll system, purchasing system); DW data is organized by subject (employee data, customer data, vendor data).]
Integrated data set

[Figure: the sales system, payroll system, and purchasing system all feed one integrated set of customer data.]

Non-volatile
[Figure: DBMS (sales system): create, update, delete, insert. DW (customer data): load, access.]

3- Introduction

• Database Management Systems:
  - Dedicated to OLTP (On-Line Transaction Processing)
  - Services:
    - Definition and manipulation of information
    - Efficient query language
    - Security
  - Add, retrieve, and delete tuples identified by a key
  - Many users at the same time
  - Operations executed quickly for each user

Relational databases

• Simple data model (Codd 70): the table
• The most widely used
• Suited to OLTP (SQL)
  - Many transactions and simple queries, each touching little data
• OLAP (ROLAP, MOLAP, HOLAP)
  - Fewer queries, but more complex and longer-running, requiring transformation of the data (aggregation) and analysis

=> Relational databases are not well suited to OLAP

Example
Table (relation) Calls
(columns are attributes, rows are tuples)

Region      Product    Calls
Vaud        Mobile     180
Vaud        Fax        244
Neuchatel   Mobile     318
Neuchatel   Standard   204
Fribourg    Standard   131
Fribourg    Fax        153

If we need details on a product … they are in another table

=> Not suited to complex queries over spatially referenced information
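
As a rough illustration, the Calls relation above can be loaded into an in-memory SQLite database with Python's standard `sqlite3` module. The table and column names follow the slide (product spelling normalized to the singular); the choice of SQLite and the reshaping into a region x product grid are assumptions made for this sketch.

```python
import sqlite3

# The six tuples of the Calls relation (product names in the singular).
CALLS = [
    ("Vaud", "Mobile", 180),
    ("Vaud", "Fax", 244),
    ("Neuchatel", "Mobile", 318),
    ("Neuchatel", "Standard", 204),
    ("Fribourg", "Standard", 131),
    ("Fribourg", "Fax", 153),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE calls (region TEXT, product TEXT, calls INTEGER)")
conn.executemany("INSERT INTO calls VALUES (?, ?, ?)", CALLS)

# "Total calls by region": a simple query that hides an aggregation.
totals_by_region = dict(
    conn.execute("SELECT region, SUM(calls) FROM calls GROUP BY region")
)

# Flat tuples reshaped into a region x product grid (None = no tuple).
products = sorted({product for _, product, _ in CALLS})
grid = {}
for region, product, calls in CALLS:
    grid.setdefault(region, dict.fromkeys(products))[product] = calls

print(totals_by_region)
print(grid["Vaud"])
```

The flat table stores one tuple per (region, product) pair, so any two-dimensional view has to be rebuilt by the application, as the loop at the end does.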

DB-OLTP vs. DB-OLAP
                  DB-OLTP                           DB-OLAP

Objectives        data collection,                  read, query, and analyze
                  day-to-day operations
Users             a department (employee)           transversal
Data types        management data                   analysis data
                  (current data)                    (with their history)
Information       detailed                          detailed + aggregated
Tuples accessed   about ten                         millions
Operations        simple, fixed queries             complex, ad hoc queries
                  selections and updates            selections
                  many transactions                 few transactions
                  short transactions                long transactions
                  real time                         batch
                  queries on detailed tuples        aggregations and group by
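
The difference between the two kinds of operations can be made concrete with a small sketch using Python's standard `sqlite3` module. The `call_record` table, its columns, and the sample rows are hypothetical; they are there only to contrast a keyed update with an aggregation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE call_record ("
    " id INTEGER PRIMARY KEY, region TEXT, year INTEGER, minutes INTEGER)"
)
conn.executemany(
    "INSERT INTO call_record (id, region, year, minutes) VALUES (?, ?, ?, ?)",
    [
        (1, "Vaud", 2001, 10),
        (2, "Vaud", 2002, 20),
        (3, "Fribourg", 2001, 5),
        (4, "Fribourg", 2002, 15),
    ],
)

# OLTP style: a short transaction touching one tuple identified by its key.
with conn:
    conn.execute("UPDATE call_record SET minutes = 12 WHERE id = 1")
one_row = conn.execute(
    "SELECT region, minutes FROM call_record WHERE id = 1"
).fetchone()

# OLAP style: a read-only query that scans every tuple and aggregates.
minutes_by_region_year = conn.execute(
    "SELECT region, year, SUM(minutes) FROM call_record"
    " GROUP BY region, year ORDER BY region, year"
).fetchall()

print(one_row)
print(minutes_by_region_year)
```

With four rows both queries are instant; the contrast matters when the second one has to scan millions of tuples.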

DB-OLTP vs. DB-OLAP

• DB-OLTP represents the multidimensional aspects of the real world as a flat view
• OLAP requirements:
  - Visualize data in several dimensions
  - Define and add dimensions
  - Manipulate data in an easy, efficient, and simple way

=> Multidimensional databases ("cube")

The Multidimensional Idea

[Figure: sales measured along 3 dimensions: region; time (year > quarter); product (product category > product type > product). The levels within a dimension set the granularity.]
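
One way to picture the idea is a dictionary keyed by one coordinate per dimension. The sketch below is only an illustration: the cell values are hypothetical, and "roll-up" and "slice" are the usual OLAP names for these two operations rather than terms used on the slides.

```python
from collections import defaultdict

# Hypothetical fact cells: (region, (year, quarter), product) -> sales.
cube = {
    ("Vaud", (2001, 1), "Fax"): 10,
    ("Vaud", (2001, 2), "Fax"): 20,
    ("Vaud", (2001, 1), "Mobile"): 30,
    ("Fribourg", (2001, 1), "Fax"): 5,
    ("Fribourg", (2002, 1), "Fax"): 7,
}


def roll_up_to_year(cells):
    """Coarsen the time dimension from quarter to year by summing cells."""
    result = defaultdict(int)
    for (region, (year, _quarter), product), sales in cells.items():
        result[(region, year, product)] += sales
    return dict(result)


def slice_by_region(cells, wanted_region):
    """Keep only the cells of one region (a 'slice' of the cube)."""
    return {key: v for key, v in cells.items() if key[0] == wanted_region}


by_year = roll_up_to_year(cube)
print(by_year[("Vaud", 2001, "Fax")])
print(slice_by_region(by_year, "Fribourg"))
```

Moving from quarter to year changes the granularity of the time dimension without touching the region or product dimensions.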

Data warehouse

• Long life span
• Includes the source databases, transformed
• Optimized to answer complex queries for analysis and decision making

General Architecture
[Figure: general architecture. Labels: external sources, internal sources, data acquisition, data integration component, data warehouse, metadata, monitoring and administration, OLAP server, query and data analysis component, OLAP queries/reports, data mining. Two phases are marked: construction & maintenance, and BI.]
DW Monitoring

• Identify growth factors and rate
• Identify what data is being used
• Identify who is using the data, and when
  => Avoid constant growth
  => Plan for evolution (trends)
• Identify useful granularity levels
• Control response time (latency)

Partitioning

• Improves performance and flexibility without giving up the details
• Partition the DW by date, business type, geography, …
• Data marts
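
Partitioning by date can be sketched as follows. This is a minimal illustration assuming facts held in memory; the `facts` rows and the two helper functions are hypothetical, and a real DW would rely on its storage engine's own partitioning features.

```python
from collections import defaultdict

# Hypothetical detailed facts: (date as "YYYY-MM-DD", region, sales).
facts = [
    ("2001-03-14", "Vaud", 10),
    ("2001-11-02", "Fribourg", 5),
    ("2002-01-20", "Vaud", 7),
    ("2002-06-30", "Neuchatel", 12),
]


def partition_by_year(rows):
    """Split the rows into one list per year; no detail row is dropped."""
    partitions = defaultdict(list)
    for row in rows:
        year = int(row[0][:4])
        partitions[year].append(row)
    return dict(partitions)


def total_sales(partitions, year):
    """Answer a per-year query by scanning only that year's partition."""
    return sum(sales for _date, _region, sales in partitions.get(year, []))


partitions = partition_by_year(facts)
print(sorted(partitions))
print(total_sales(partitions, 2002))
```

Every detail row is still present after the split, but a query restricted to one year reads only that year's rows.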

Conclusion

• Open questions and problems:
  - Leading the DWH: the process needs to be simplified
  - Maintenance: consistency and update problems
  - Schema integration: several sources -> how to obtain a single global schema?
  - Size effect: grows very quickly -> terabytes; how to manage this?
  - Very expensive

Thank You

