support systems
Agenda
• Data warehouse
• Characteristics, modelling and basic functionality
• Decision support system
• Definition, operation and comparisons
Data warehouse definition
Data warehouse characteristics
• ELT vs ETL:
• Order of steps: ELT is Extraction – Load – Transform; ETL is Extraction – Transform – Load
• Staging: ELT loads the data into staging before transforming it; ETL transforms the data before loading it into staging
• Location of the transformation: in ELT it takes place within the warehouse; in ETL it takes place outside the warehouse (in an external application)
• Language: ELT uses SQL to perform the transformations; ETL uses an external application language (Python, R, …)
• Data volume: ELT is suitable for larger amounts of data; ETL is suitable for smaller amounts of data
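To make the contrast concrete, here is a minimal sketch that uses Python's standard `sqlite3` module as a stand-in warehouse. The raw records, the table names (`staging_orders`, `sales_etl`, `sales_elt`) and the cleaning rules are all hypothetical; the only point is where the transformation runs: in application code for ETL, as SQL inside the database for ELT.

```python
import sqlite3

# Hypothetical raw source records: (order_id, amount as text, country code)
RAW_ORDERS = [(1, " 10.50 ", "vn"), (2, "7", "FR"), (3, "n/a", "vn")]


def run_etl(conn: sqlite3.Connection) -> None:
    """ETL: transform in the application (here Python), then load only clean rows."""
    cleaned = []
    for order_id, amount, country in RAW_ORDERS:
        try:
            value = float(amount.strip())
        except ValueError:
            continue  # drop records that fail validation
        cleaned.append((order_id, value, country.upper()))
    conn.execute("CREATE TABLE sales_etl (order_id INTEGER, amount REAL, country TEXT)")
    conn.executemany("INSERT INTO sales_etl VALUES (?, ?, ?)", cleaned)


def run_elt(conn: sqlite3.Connection) -> None:
    """ELT: load raw rows into staging first, then transform them with SQL in the database."""
    conn.execute("CREATE TABLE staging_orders (order_id INTEGER, amount TEXT, country TEXT)")
    conn.executemany("INSERT INTO staging_orders VALUES (?, ?, ?)", RAW_ORDERS)
    conn.execute(
        """
        CREATE TABLE sales_elt AS
        SELECT order_id,
               CAST(TRIM(amount) AS REAL) AS amount,
               UPPER(country) AS country
        FROM staging_orders
        WHERE TRIM(amount) GLOB '[0-9]*'  -- keep only amounts that start with a digit
        """
    )


conn = sqlite3.connect(":memory:")
run_etl(conn)
run_elt(conn)
etl_rows = conn.execute("SELECT * FROM sales_etl ORDER BY order_id").fetchall()
elt_rows = conn.execute("SELECT * FROM sales_elt ORDER BY order_id").fetchall()
print(etl_rows)  # [(1, 10.5, 'VN'), (2, 7.0, 'FR')]
print(etl_rows == elt_rows)  # True
```

The two validation rules only happen to agree on these inputs (an amount such as "1x" would pass the SQL `GLOB` test but fail `float()`); a real pipeline would define one rule and apply it in whichever layer does the transforming.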
Data warehouse modelling
• DWs are modelled using a multidimensional model:
• 1D: Sales over time
• 2D: Sales over time for each product
• 3D: Sales over time for each product per location
• 4D: Sales over time for each product per location per manufacturer
Data warehouse modelling
• For example, here are some data from the sales of an organization in 4D, represented by a 2D table:
Time  Product  Location  Manufacturer  Quantity
Q1    C        Z         1             83
Q2    A        X         1             81
Q1    B        Y         1             17
Q1    B        Z         1             84
Q3    B        Z         2             88
Q3    B        X         2             32
Q1    C        Y         2             80
Q2    C        X         2             58
Q2    B        X         1             27
Q3    B        Y         2             7
Q3    B        Z         1             20
Q2    C        Z         2             8
Q1    B        Z         2             64
Q3    C        X         1             87
Q3    C        X         2             73
Q2    B        Z         1             86
Q1    B        X         2             48
Q3    B        X         1             88
Q3    A        Z         2             13
Q2    B        Y         2             38
Q2    A        Y         1             92
Q2    C        Y         1             12
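The 1D to 4D views listed earlier are just different group-by sets over the same fact rows. The following standard-library Python sketch loads the 22 rows of the table and sums Quantity over a chosen subset of dimensions; the helper name `aggregate` is illustrative, not part of any DW product.

```python
from collections import defaultdict

DIMS = ("time", "product", "location", "manufacturer")

# The 4D sales excerpt: (time, product, location, manufacturer, quantity)
SALES = [
    ("Q1", "C", "Z", 1, 83), ("Q2", "A", "X", 1, 81), ("Q1", "B", "Y", 1, 17),
    ("Q1", "B", "Z", 1, 84), ("Q3", "B", "Z", 2, 88), ("Q3", "B", "X", 2, 32),
    ("Q1", "C", "Y", 2, 80), ("Q2", "C", "X", 2, 58), ("Q2", "B", "X", 1, 27),
    ("Q3", "B", "Y", 2, 7), ("Q3", "B", "Z", 1, 20), ("Q2", "C", "Z", 2, 8),
    ("Q1", "B", "Z", 2, 64), ("Q3", "C", "X", 1, 87), ("Q3", "C", "X", 2, 73),
    ("Q2", "B", "Z", 1, 86), ("Q1", "B", "X", 2, 48), ("Q3", "B", "X", 1, 88),
    ("Q3", "A", "Z", 2, 13), ("Q2", "B", "Y", 2, 38), ("Q2", "A", "Y", 1, 92),
    ("Q2", "C", "Y", 1, 12),
]


def aggregate(rows, dims):
    """Sum the Quantity measure grouped by the named dimensions."""
    positions = [DIMS.index(d) for d in dims]
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[p] for p in positions)] += row[-1]
    return dict(totals)


sales_1d = aggregate(SALES, ["time"])  # sales over time
sales_2d = aggregate(SALES, ["time", "product"])  # ... for each product
sales_3d = aggregate(SALES, ["time", "product", "location"])  # ... per location
sales_4d = aggregate(SALES, list(DIMS))  # ... per manufacturer
print(sales_1d)  # {('Q1',): 376, ('Q2',): 402, ('Q3',): 408}
```

Every view sums to the same total (1186); adding a dimension only splits the totals into finer cells.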
Data warehouse modelling
• Multidimensional data can be modelled in two ways:
• The star schema:
• A central fact table pointing to the surrounding dimension tables
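A minimal sketch of the star schema in SQLite, driven from Python's standard `sqlite3` module, loading the first four rows of the sales table. The table and column names (`fact_sales`, `dim_time`, `dim_product`, and so on) are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- One dimension table per dimension (hypothetical names)
    CREATE TABLE dim_time (time_id INTEGER PRIMARY KEY, quarter TEXT);
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, product TEXT);
    CREATE TABLE dim_location (location_id INTEGER PRIMARY KEY, location TEXT);
    CREATE TABLE dim_manufacturer (manufacturer_id INTEGER PRIMARY KEY, manufacturer TEXT);

    -- Central fact table: a foreign key to every dimension plus the measure
    CREATE TABLE fact_sales (
        time_id INTEGER REFERENCES dim_time (time_id),
        product_id INTEGER REFERENCES dim_product (product_id),
        location_id INTEGER REFERENCES dim_location (location_id),
        manufacturer_id INTEGER REFERENCES dim_manufacturer (manufacturer_id),
        quantity INTEGER
    );

    INSERT INTO dim_time VALUES (1, 'Q1'), (2, 'Q2'), (3, 'Q3');
    INSERT INTO dim_product VALUES (1, 'A'), (2, 'B'), (3, 'C');
    INSERT INTO dim_location VALUES (1, 'X'), (2, 'Y'), (3, 'Z');
    INSERT INTO dim_manufacturer VALUES (1, '1'), (2, '2');

    -- (Q1,C,Z,1,83), (Q2,A,X,1,81), (Q1,B,Y,1,17), (Q1,B,Z,1,84)
    INSERT INTO fact_sales VALUES (1, 3, 3, 1, 83), (2, 1, 1, 1, 81), (1, 2, 2, 1, 17), (1, 2, 3, 1, 84);
    """
)

# Star query: join the fact table directly to each dimension it needs, then aggregate
rows = conn.execute(
    """
    SELECT t.quarter, p.product, SUM(f.quantity)
    FROM fact_sales AS f
    JOIN dim_time AS t ON t.time_id = f.time_id
    JOIN dim_product AS p ON p.product_id = f.product_id
    GROUP BY t.quarter, p.product
    ORDER BY t.quarter, p.product
    """
).fetchall()
print(rows)  # [('Q1', 'B', 101), ('Q1', 'C', 83), ('Q2', 'A', 81)]
```

Each dimension is one join away from the fact table, which is what gives the star its shape.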
Data warehouse modelling
• Multidimensional data can be modelled in two ways:
• The snowflake schema:
• Dimension tables are normalized into further sub-tables, so branches extend outward from the fact table
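For comparison, a minimal snowflake sketch in the same style: the location dimension is normalized into a separate country table, so reaching the country level needs a second join. The table names, the country names and the assignment of locations to countries are hypothetical, and the other dimensions are left out for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Snowflake: the location dimension branches into a further country table
    CREATE TABLE dim_country (country_id INTEGER PRIMARY KEY, country TEXT);
    CREATE TABLE dim_location (
        location_id INTEGER PRIMARY KEY,
        location TEXT,
        country_id INTEGER REFERENCES dim_country (country_id)
    );
    CREATE TABLE fact_sales (
        location_id INTEGER REFERENCES dim_location (location_id),
        quantity INTEGER
    );

    INSERT INTO dim_country VALUES (1, 'Country 1'), (2, 'Country 2');
    -- Hypothetical assignment: X and Y are in Country 1, Z is in Country 2
    INSERT INTO dim_location VALUES (1, 'X', 1), (2, 'Y', 1), (3, 'Z', 2);
    INSERT INTO fact_sales VALUES (3, 83), (1, 81), (2, 17), (3, 84);
    """
)

# Two joins are needed: fact -> location -> country
rows = conn.execute(
    """
    SELECT c.country, SUM(f.quantity)
    FROM fact_sales AS f
    JOIN dim_location AS l ON l.location_id = f.location_id
    JOIN dim_country AS c ON c.country_id = l.country_id
    GROUP BY c.country
    ORDER BY c.country
    """
).fetchall()
print(rows)  # [('Country 1', 98), ('Country 2', 167)]
```

Normalizing removes the repeated country name from every location row, at the cost of an extra join per query.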
Data warehouse designing
Data warehouse functionality
Data warehouse benefits and downsides
• Benefits:
• Frees resources on the main database
• Prevents conflicts between transactions and analysis queries running at the same time
• Acts as a single point of access for analytical data
• Stores historical data
• Allows queries to be tuned to analysis requirements
• Downsides:
• Additional work to build and administer the database
• Must adapt as the source databases evolve
• Difficult to manage large multidimensional data
• Difficult to modify when requirements change
Beyond Data Warehouse
Decision Support System
Characteristics of DSS
• A DSS is composed of data warehouse (DW), OLAP and data mining (DM) technologies.
• A DSS attempts to combine the use of models or analytic techniques with traditional data access and retrieval functions.
• A DSS focuses specifically on features that make it easy for people without a computing background to use interactively.
• Bottom-tier:
• Data warehouse server → A central repository of information
that can be analyzed to make more informed decisions.
• Middle-tier:
• OLAP Server → for fast querying of the data warehouse.
• Top-tier:
• Query and reporting tools
• Analysis tools
• Data mining tools
Complete DSS
OLAP vs OLTP
• Process: OLTP is an online transactional system that manages database modifications. OLAP is an online analysis and data retrieval process.
• Characteristic: each OLTP transaction touches a small part of the database. OLAP is characterized by a large volume of data.
• Method: OLTP uses a traditional DBMS. OLAP uses the data warehouse.
• Operations: OLTP inserts, updates, and deletes data in the database. OLAP extracts, processes, and presents data, querying it for analytic and decision-making purposes.
• Source: OLTP and its transactions are the sources of data. Different OLTP databases become the source of data for OLAP.
• Data integrity: an OLTP database must maintain data integrity constraints. An OLAP database is not modified frequently, so data integrity is not an issue.
• Response time: OLTP responds in a short period of time. OLAP response time depends on the complexity of the data warehouse queries.
• Data quality: the data in an OLTP database is always detailed and organized. The data in the OLAP process might not be organized.
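The difference in workload can be sketched with Python's standard `sqlite3` module. For brevity both workloads run against one in-memory table (`orders`, a hypothetical name); in the architecture described here the OLAP query would instead run against the warehouse fed by the OLTP sources.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, product TEXT, quarter TEXT, quantity INTEGER)"
)


def place_order(conn, product, quarter, quantity):
    """OLTP: a short write transaction that inserts a single row."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute(
            "INSERT INTO orders (product, quarter, quantity) VALUES (?, ?, ?)",
            (product, quarter, quantity),
        )


def correct_quantity(conn, order_id, quantity):
    """OLTP: update one existing row inside its own transaction."""
    with conn:
        conn.execute("UPDATE orders SET quantity = ? WHERE id = ?", (quantity, order_id))


place_order(conn, "A", "Q1", 10)
place_order(conn, "B", "Q1", 5)
place_order(conn, "A", "Q2", 7)
correct_quantity(conn, 2, 6)  # order 2 (product B) is corrected from 5 to 6

# OLAP: a read-only query that scans and aggregates many rows for analysis
summary = conn.execute(
    "SELECT product, SUM(quantity) FROM orders GROUP BY product ORDER BY product"
).fetchall()
print(summary)  # [('A', 17), ('B', 6)]
```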
7. OLAP Operations
• There are five basic analytical
operations that can be performed
on an OLAP cube:
1. Roll up (or drill-up)
2. Drill down
3. Dice
4. Slice
5. Pivot
7. OLAP Operations
• Roll-up (also drill-up): Data is
summarized with increasing
generalization. It can be done by:
• Climbing up in the concept hierarchy
• Reducing the dimensions
• Example: In the cube given in the overview section, the roll-up operation is performed by climbing up the concept hierarchy of the Location dimension (City -> Country)
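Roll-up by climbing the Location hierarchy amounts to mapping each city to its country and summing. A minimal standard-library sketch; the city-to-country hierarchy and the cell values are hypothetical, since the overview cube is not reproduced here.

```python
from collections import defaultdict

# Hypothetical concept hierarchy for the Location dimension: city -> country
CITY_TO_COUNTRY = {"Hanoi": "Vietnam", "Da Nang": "Vietnam", "Paris": "France", "Lyon": "France"}

# Hypothetical cube cells at city level: (quarter, city) -> quantity
city_cube = {
    ("Q1", "Hanoi"): 40,
    ("Q1", "Da Nang"): 10,
    ("Q1", "Paris"): 25,
    ("Q2", "Paris"): 30,
    ("Q2", "Lyon"): 5,
}


def roll_up_location(cube, hierarchy):
    """Climb one level up the Location hierarchy by summing city cells into their country."""
    rolled = defaultdict(int)
    for (quarter, city), quantity in cube.items():
        rolled[(quarter, hierarchy[city])] += quantity
    return dict(rolled)


country_cube = roll_up_location(city_cube, CITY_TO_COUNTRY)
print(country_cube)  # {('Q1', 'Vietnam'): 50, ('Q1', 'France'): 25, ('Q2', 'France'): 35}
```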
7. OLAP Operations
• Drill down: Increasing levels of detail
are revealed. It can be done by:
• Moving down in the concept hierarchy
• Adding a new dimension
• Example: In the cube given in the overview section, the drill-down operation is performed by moving down the concept hierarchy of the Time dimension (Quarter -> Month).
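Drilling down from Quarter to Month only works if month-level data is actually stored; the quarter view is derived from it. A minimal sketch with hypothetical monthly figures and a hypothetical month-to-quarter mapping:

```python
from collections import defaultdict

# Hypothetical Time hierarchy (month -> quarter) and month-level facts.
# Months are the finest grain stored, so drilling down cannot go below them.
MONTH_TO_QUARTER = {"Jan": "Q1", "Feb": "Q1", "Mar": "Q1", "Apr": "Q2", "May": "Q2", "Jun": "Q2"}
monthly_sales = {"Jan": 10, "Feb": 20, "Mar": 5, "Apr": 8, "May": 12, "Jun": 0}


def quarter_view(monthly):
    """The summarized view: total quantity per quarter."""
    totals = defaultdict(int)
    for month, quantity in monthly.items():
        totals[MONTH_TO_QUARTER[month]] += quantity
    return dict(totals)


def drill_down(quarter, monthly):
    """Move down the Time hierarchy: reveal the months that make up one quarter."""
    return {m: q for m, q in monthly.items() if MONTH_TO_QUARTER[m] == quarter}


print(quarter_view(monthly_sales))  # {'Q1': 35, 'Q2': 20}
print(drill_down("Q1", monthly_sales))  # {'Jan': 10, 'Feb': 20, 'Mar': 5}
```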
7. OLAP Operations
• Pivot (also referred to as rotation): the axes of the view are rotated, giving an alternative cross tabulation of the same data.
• Example: Pivot the sliced sub-cube
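Pivoting a 2D view does not change any value, only which dimension runs along the rows and which along the columns. A minimal sketch on a hypothetical Location x Time sub-cube stored as nested dictionaries:

```python
# A small 2D view (rows: Location, columns: Time); the numbers are hypothetical
view = {
    "X": {"Q1": 5, "Q2": 7, "Q3": 1},
    "Y": {"Q1": 3, "Q2": 0, "Q3": 4},
}


def pivot(table):
    """Rotate the view: column labels become row labels and row labels become columns."""
    rotated = {}
    for row_label, cells in table.items():
        for col_label, value in cells.items():
            rotated.setdefault(col_label, {})[row_label] = value
    return rotated


rotated = pivot(view)
print(rotated)  # {'Q1': {'X': 5, 'Y': 3}, 'Q2': {'X': 7, 'Y': 0}, 'Q3': {'X': 1, 'Y': 4}}
```

Pivoting twice returns the original view.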
8. Types of OLAP Server
• Relational OLAP (ROLAP): for large volumes of data → stored in relational tables → static multidimensional view of data.
• Schema: star, snowflake
• Product: MetaCube, Red Brick, AXSYS Suite
8. Types of OLAP Server
• Multidimensional OLAP (MOLAP): limited data volumes → stored in multidimensional arrays → dynamic multidimensional view of data.
• Schema: cube
• Product: Oracle Essbase, IBM Cognos, and Apache Kylin
8. Types of OLAP Server
• Hybrid OLAP (HOLAP): combination of ROLAP and
MOLAP → faster performance (by using MOLAP)
and more detailed information (by using ROLAP)
• Product: Microsoft Analysis Services and SAP AG BI
Accelerator
9. Approach to OLAP Server
• Relational OLAP (ROLAP):
• Relational DBMS to store and manage data warehouse.
• OLAP middleware to support missing pieces.
• Multidimensional OLAP (MOLAP):
• Array-based storage structures.
• Direct access to array data structures.
• Hybrid OLAP (HOLAP):
• Storing detailed data in RDBMS.
• Storing aggregated data in MDBMS.
• User access via MOLAP tools.
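The storage difference can be sketched in a few lines. In the ROLAP style the facts stay as rows in a relational table and SQL aggregates them on demand; in the MOLAP style the quantities are pre-aggregated into a dense array indexed by dimension position, so a cell is read directly. A HOLAP server, as described above, would keep the detailed rows relationally and the aggregated array in the multidimensional store. The sketch uses the first four rows of the sales table with product and manufacturer dropped for brevity; the table name `sales` is hypothetical.

```python
import sqlite3

QUARTERS = ["Q1", "Q2", "Q3"]
LOCATIONS = ["X", "Y", "Z"]
# (quarter, location, quantity) from the first four rows of the sales table
FACTS = [("Q1", "Z", 83), ("Q2", "X", 81), ("Q1", "Y", 17), ("Q1", "Z", 84)]

# ROLAP-style: facts stay as rows in a relational table; SQL aggregates on demand
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (quarter TEXT, location TEXT, quantity INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", FACTS)
rolap_total = conn.execute(
    "SELECT SUM(quantity) FROM sales WHERE quarter = ? AND location = ?", ("Q1", "Z")
).fetchone()[0]

# MOLAP-style: quantities pre-aggregated into a dense array indexed by dimension position
cube = [[0] * len(LOCATIONS) for _ in QUARTERS]
for quarter, location, quantity in FACTS:
    cube[QUARTERS.index(quarter)][LOCATIONS.index(location)] += quantity
molap_total = cube[QUARTERS.index("Q1")][LOCATIONS.index("Z")]  # direct array lookup

print(rolap_total, molap_total)  # 167 167
print(cube)  # [[0, 17, 167], [81, 0, 0], [0, 0, 0]]
# The empty cells (the whole Q3 row here) still occupy space in the array
```

The zero-filled cells hint at why MOLAP needs more resources and scales less well as dimensions grow.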
ROLAP vs. MOLAP
• Resources: ROLAP high; MOLAP very high
• Flexibility: ROLAP high; MOLAP low
• Scalability: ROLAP high; MOLAP low
• Speed: ROLAP is good with small data sets and average for medium to large data sets; MOLAP is faster for small to medium data sets and average for large data sets
CONCLUSION
• A data warehouse (DW) is used to store analytical data
• A DSS relies mainly on OLAP and data mining as its data analysis and decision support technologies; OLAP is a significant improvement over query systems
• OLAP has three main types, ROLAP, MOLAP and HOLAP, each with different characteristics
Backup Slides
Data warehouse modelling
• To better view this data, we can pivot it to see it from another perspective
• Example: Time => Columns; Region => Rows; Quantity => Values
• The aggregate function is the sum over all products
(Input data: the same Time, Product, Location, Manufacturer, Quantity sales table as in the 4D example above.)
Row Labels   Q1    Q2    Q3    Grand Total
X            287   339   348   974
Y            265   310   293   868
Z            245   274   210   729
Grand Total  797   923   851   2571
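The same pivot can be reproduced in Python. This sketch uses only the 22 rows printed in the 4D example table; the slide's grand total (2571) is larger than the sum of those rows (1186), so the slide was evidently computed over more rows than are shown, and the numbers below differ from it. `pivot_sum` is a hypothetical helper name.

```python
from collections import defaultdict

# The 22-row excerpt: (time, product, region, manufacturer, quantity)
SALES = [
    ("Q1", "C", "Z", 1, 83), ("Q2", "A", "X", 1, 81), ("Q1", "B", "Y", 1, 17),
    ("Q1", "B", "Z", 1, 84), ("Q3", "B", "Z", 2, 88), ("Q3", "B", "X", 2, 32),
    ("Q1", "C", "Y", 2, 80), ("Q2", "C", "X", 2, 58), ("Q2", "B", "X", 1, 27),
    ("Q3", "B", "Y", 2, 7), ("Q3", "B", "Z", 1, 20), ("Q2", "C", "Z", 2, 8),
    ("Q1", "B", "Z", 2, 64), ("Q3", "C", "X", 1, 87), ("Q3", "C", "X", 2, 73),
    ("Q2", "B", "Z", 1, 86), ("Q1", "B", "X", 2, 48), ("Q3", "B", "X", 1, 88),
    ("Q3", "A", "Z", 2, 13), ("Q2", "B", "Y", 2, 38), ("Q2", "A", "Y", 1, 92),
    ("Q2", "C", "Y", 1, 12),
]
TIME, PRODUCT, REGION, MANUFACTURER, QUANTITY = range(5)


def pivot_sum(rows, row_dim, col_dim, value_dim=QUANTITY):
    """Cross-tabulate: SUM(value_dim) per (row_dim, col_dim) pair, 0 where no rows exist."""
    cells = defaultdict(int)
    for row in rows:
        cells[(row[row_dim], row[col_dim])] += row[value_dim]
    row_labels = sorted({r for r, _ in cells})
    col_labels = sorted({c for _, c in cells})
    return {r: {c: cells.get((r, c), 0) for c in col_labels} for r in row_labels}


# Time => Columns, Region => Rows, Quantity => Values (summed over all products)
table = pivot_sum(SALES, row_dim=REGION, col_dim=TIME)
quarters = sorted({row[TIME] for row in SALES})
column_totals = [sum(table[r][q] for r in table) for q in quarters]

print("Row Labels", *quarters, "Grand Total")
for region, by_quarter in table.items():
    print(region, *by_quarter.values(), sum(by_quarter.values()))
print("Grand Total", *column_totals, sum(column_totals))
# Row Labels Q1 Q2 Q3 Grand Total
# X 48 166 280 494
# Y 97 142 7 246
# Z 231 94 121 446
# Grand Total 376 402 408 1186
```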
Data warehouse modelling
• Another way to view multidimensional data is to slice it
• Here is the view of the Region and Time dimensions for product A made by manufacturer 2
(Input data: the same sales table as in the 4D example above, with the Location column labelled Region.)
Product A, Manufacturer 2
Row Labels   Q1    Q2    Q3    Grand Total
X            11    0     66    77
Y            16    33    29    78
Z            11    65    33    109
Grand Total  38    98    128   264
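Slicing fixes some dimensions to single values and keeps the rest as the visible axes. A minimal sketch over the 22 printed rows; the excerpt contains only one row for product A from manufacturer 2 (Q3, Z, 13), so most cells are zero here, whereas the slide's figures come from the fuller dataset. `slice_cube` and `region_time_view` are hypothetical helper names.

```python
# The 22-row excerpt: (time, product, region, manufacturer, quantity)
SALES = [
    ("Q1", "C", "Z", 1, 83), ("Q2", "A", "X", 1, 81), ("Q1", "B", "Y", 1, 17),
    ("Q1", "B", "Z", 1, 84), ("Q3", "B", "Z", 2, 88), ("Q3", "B", "X", 2, 32),
    ("Q1", "C", "Y", 2, 80), ("Q2", "C", "X", 2, 58), ("Q2", "B", "X", 1, 27),
    ("Q3", "B", "Y", 2, 7), ("Q3", "B", "Z", 1, 20), ("Q2", "C", "Z", 2, 8),
    ("Q1", "B", "Z", 2, 64), ("Q3", "C", "X", 1, 87), ("Q3", "C", "X", 2, 73),
    ("Q2", "B", "Z", 1, 86), ("Q1", "B", "X", 2, 48), ("Q3", "B", "X", 1, 88),
    ("Q3", "A", "Z", 2, 13), ("Q2", "B", "Y", 2, 38), ("Q2", "A", "Y", 1, 92),
    ("Q2", "C", "Y", 1, 12),
]
TIME, PRODUCT, REGION, MANUFACTURER, QUANTITY = range(5)
REGIONS = sorted({row[REGION] for row in SALES})
QUARTERS = sorted({row[TIME] for row in SALES})


def slice_cube(rows, fixed):
    """Keep only the rows whose fixed dimensions (index -> value) match."""
    return [row for row in rows if all(row[dim] == value for dim, value in fixed.items())]


def region_time_view(rows):
    """Lay rows out as a Region x Time grid, summing Quantity (0 where there are no rows)."""
    grid = {r: {q: 0 for q in QUARTERS} for r in REGIONS}
    for row in rows:
        grid[row[REGION]][row[TIME]] += row[QUANTITY]
    return grid


# Fix Product = A and Manufacturer = 2; Region and Time remain as the two axes
view = region_time_view(slice_cube(SALES, {PRODUCT: "A", MANUFACTURER: 2}))
print(view["Z"])  # {'Q1': 0, 'Q2': 0, 'Q3': 13}
```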
Data warehouse modelling
• The data can also be rolled up or drilled down
[Figure: drill-down from Time (Q1, Q2, Q3) to Time and Product (A, B, C within each quarter)]
• Drill-down: provides a finer view of the data (limited by the lowest level of data stored)
• Roll-up: groups data along a dimension to provide a more general view
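Moving between the Time view and the Time and Product view can be sketched on stored Time x Product cells. The values below are the Time x Product totals of the 22-row sales excerpt (the excerpt has no Q1 sales of product A); the helper names are hypothetical.

```python
from collections import defaultdict

# Time x Product totals of the 22-row sales excerpt (the detailed cells kept in storage)
time_product = {
    ("Q1", "B"): 213, ("Q1", "C"): 163,
    ("Q2", "A"): 173, ("Q2", "B"): 151, ("Q2", "C"): 78,
    ("Q3", "A"): 13, ("Q3", "B"): 235, ("Q3", "C"): 160,
}


def roll_up(cells):
    """Roll-up: remove the Product dimension by summing over it, giving totals per quarter."""
    totals = defaultdict(int)
    for (quarter, _product), quantity in cells.items():
        totals[quarter] += quantity
    return dict(totals)


def drill_down(cells, quarter):
    """Drill-down: reveal the per-product detail behind one quarter's total."""
    return {product: qty for (q, product), qty in cells.items() if q == quarter}


print(roll_up(time_product))  # {'Q1': 376, 'Q2': 402, 'Q3': 408}
print(drill_down(time_product, "Q2"))  # {'A': 173, 'B': 151, 'C': 78}
```

Drill-down is possible here only because the finer cells are stored; from the per-quarter totals alone the product breakdown could not be recovered.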