
Data warehouse and Decision support systems

• Le Nguyen Phan Long
• Trinh Tran Nguyen Chuong
• Instructor: Assoc. Prof. Tran Minh Quang

Agenda
• Data warehouse
• Characteristic, modelling and basic functionality
• Decision support system
• Definition, operation and comparisons

Data warehouse definition

• A database that houses analytical data
• Data is aggregated and formatted for analysis and reports
• Uses a relational database structure
• Can only be populated through a specific process (ELT)

Data warehouse definition

• A data warehouse is typically used in online analytical processing (OLAP) applications
• A decision support system (DSS) is also one application of a data warehouse

Data warehouse characteristics

• Data warehouse characteristics:
• Can only be populated through an ELT or ETL process
• Data is not real-time
• Data is stored for a long time (historical)
• Loaded data doesn’t change
• Same relational structure and query language as a normal DB

Data warehouse characteristics

• Variations of the data warehouse:
• Enterprise-wide data warehouses – provide analytics across the whole enterprise
• Virtual data warehouses – views of the transactional DB, built inside the DB itself
• Logical data warehouses – systems that provide data federation, distribution and virtualization
• Data marts – smaller DWs with fewer input sources, targeting a subset of the enterprise

• Some important terms:
• Operational data store (ODS): a DB that stores transactional data before it is transformed for the DW
• Analytical data store (ADS): a DB built for conducting analysis and producing reports

Data warehouse characteristics

• ELT vs ETL:

| ELT (Extract – Load – Transform) | ETL (Extract – Transform – Load) |
|---|---|
| Loads the data into the staging area before transforming it | Transforms the data before loading it into the staging area |
| Takes place within the warehouse | Takes place outside the warehouse (external application) |
| Transformations are performed with the SQL query language | Transformations are performed in an external application language (Python, R, …) |
| Suitable for larger amounts of data | Suitable for smaller amounts of data |
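
The difference between the two orders can be sketched with Python's built-in `sqlite3` module. This is only an illustration under assumptions: the raw rows, the table names (`staging`, `sales_etl`, `sales_elt`) and the cleaning rule are hypothetical, and an in-memory SQLite database stands in for a real warehouse.

```python
import sqlite3

# Hypothetical raw rows as they might arrive from a source system.
RAW_ROWS = [("q1", " c ", "83"), ("q2", "a", "81")]


def clean(row):
    """Normalize one raw row: upper-case the codes, cast the quantity to int."""
    time, product, quantity = row
    return (time.strip().upper(), product.strip().upper(), int(quantity))


def run_etl(conn):
    """ETL: transform in application code first, then load the finished rows."""
    conn.execute("CREATE TABLE sales_etl (time TEXT, product TEXT, quantity INTEGER)")
    conn.executemany("INSERT INTO sales_etl VALUES (?, ?, ?)", [clean(r) for r in RAW_ROWS])


def run_elt(conn):
    """ELT: load the raw rows into staging, then transform with SQL in the database."""
    conn.execute("CREATE TABLE staging (time TEXT, product TEXT, quantity TEXT)")
    conn.executemany("INSERT INTO staging VALUES (?, ?, ?)", RAW_ROWS)
    conn.execute(
        """
        CREATE TABLE sales_elt AS
        SELECT upper(trim(time))         AS time,
               upper(trim(product))      AS product,
               CAST(quantity AS INTEGER) AS quantity
        FROM staging
        """
    )


conn = sqlite3.connect(":memory:")
run_etl(conn)
run_elt(conn)
etl_rows = conn.execute("SELECT * FROM sales_etl ORDER BY time").fetchall()
elt_rows = conn.execute("SELECT * FROM sales_elt ORDER BY time").fetchall()
print(etl_rows == elt_rows)  # both routes end with the same cleaned rows
```

Both routes produce the same table; what changes is where the transformation runs (application code versus SQL inside the database).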

Data warehouse characteristics

• There are 2 kinds of data warehouse loading:
• Initial loading:
• Loads historical transactional data into the warehouse
• Performed only once, at the beginning of the DW
• Very time-consuming due to the amount of data
• Delta loading:
• Updates the DW by loading new data
• Performed periodically, based on business needs
• Smaller amount of data => faster loading time
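
One common way to tell the two kinds of load apart is a watermark: the highest key already present in the warehouse. The sketch below assumes a hypothetical `dw_sales` table and an increasing `id` column on the source rows; real systems often use timestamps or change logs instead.

```python
import sqlite3

# Hypothetical source rows: (id, quantity). The id doubles as the load watermark.
source = [(1, 83), (2, 81), (3, 17)]


def load(conn, source_rows):
    """Copy only the rows newer than the highest id already in the warehouse."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS dw_sales (id INTEGER PRIMARY KEY, quantity INTEGER)"
    )
    (watermark,) = conn.execute("SELECT COALESCE(MAX(id), 0) FROM dw_sales").fetchone()
    new_rows = [row for row in source_rows if row[0] > watermark]
    conn.executemany("INSERT INTO dw_sales VALUES (?, ?)", new_rows)
    return len(new_rows)


conn = sqlite3.connect(":memory:")
initial = load(conn, source)   # initial load: the warehouse is empty, all rows move
source.append((4, 84))         # a new transaction arrives later
delta = load(conn, source)     # delta load: only the new row moves
print(initial, delta)          # 3 1
```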

Data warehouse characteristics

• Data warehouse characteristics:


• Multidimensional conceptual view
• Unlimited dimensions and aggregation levels
• Unrestricted cross-dimensional operations
• Dynamic sparse matrix handling
• Client/server architecture
• Multiuser support
• Accessibility
• Transparency
• Intuitive data manipulation
• Inductive and deductive analysis
• Flexible distributed reporting

Data warehouse modelling
• DWs are modelled with a multidimensional model:
• 1D: Sales over time
• 2D: Sales over time for each product
• 3D: Sales over time for each product per location
• 4D: Sales over time for each product per location per manufacturer

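Data warehouse modelling
• For example, here is some sales data of an organization in 4D, represented by a 2D table

| Time | Product | Location | Manufacturer | Quantity |
|---|---|---|---|---|
| Q1 | C | Z | 1 | 83 |
| Q2 | A | X | 1 | 81 |
| Q1 | B | Y | 1 | 17 |
| Q1 | B | Z | 1 | 84 |
| Q3 | B | Z | 2 | 88 |
| Q3 | B | X | 2 | 32 |
| Q1 | C | Y | 2 | 80 |
| Q2 | C | X | 2 | 58 |
| Q2 | B | X | 1 | 27 |
| Q3 | B | Y | 2 | 7 |
| Q3 | B | Z | 1 | 20 |
| Q2 | C | Z | 2 | 8 |
| Q1 | B | Z | 2 | 64 |
| Q3 | C | X | 1 | 87 |
| Q3 | C | X | 2 | 73 |
| Q2 | B | Z | 1 | 86 |
| Q1 | B | X | 2 | 48 |
| Q3 | B | X | 1 | 88 |
| Q3 | A | Z | 2 | 13 |
| Q2 | B | Y | 2 | 38 |
| Q2 | A | Y | 1 | 92 |
| Q2 | C | Y | 1 | 12 |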
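
The 1D to 4D views can be read as the same fact rows grouped by more and more dimensions. The sketch below uses only the first six rows of the table; the `view` helper and the dimension names are illustrative assumptions, not part of the slides.

```python
from collections import defaultdict

# First six rows of the table: (time, product, location, manufacturer, quantity).
ROWS = [
    ("Q1", "C", "Z", 1, 83),
    ("Q2", "A", "X", 1, 81),
    ("Q1", "B", "Y", 1, 17),
    ("Q1", "B", "Z", 1, 84),
    ("Q3", "B", "Z", 2, 88),
    ("Q3", "B", "X", 2, 32),
]
DIMENSIONS = ("time", "product", "location", "manufacturer")


def view(rows, dims):
    """Sum the quantity over every dimension that is not listed in dims."""
    positions = [DIMENSIONS.index(d) for d in dims]
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[p] for p in positions)
        totals[key] += row[-1]
    return dict(totals)


one_d = view(ROWS, ["time"])             # sales over time
two_d = view(ROWS, ["time", "product"])  # sales over time for each product
print(one_d)  # {('Q1',): 184, ('Q2',): 81, ('Q3',): 120}
print(two_d)
```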

Data warehouse modelling
• Multidimensional data can be modelled in 2 ways:
• The star schema:
• A central fact table pointing to the surrounding dimension tables

Data warehouse modelling
• Multidimensional data can be modelled in 2 ways:
• The snowflake schema:
• Dimension tables branch out from the fact table into further sub-tables
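
A minimal sketch of both layouts, using `sqlite3` from the Python standard library. All table names, columns and rows are hypothetical: `fact_sales` with `dim_time` and `dim_product` forms a star, and splitting a product's category out into `dim_category` is the extra branch that turns it into a snowflake.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Snowflake branch: a sub-dimension that dim_product points to.
    CREATE TABLE dim_category (category_id INTEGER PRIMARY KEY, label TEXT);

    -- Dimension tables referenced directly by the fact table (the star).
    CREATE TABLE dim_time    (time_id INTEGER PRIMARY KEY, quarter TEXT);
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT,
                              category_id INTEGER REFERENCES dim_category(category_id));

    -- Central fact table: one row per measurement, keyed by the dimensions.
    CREATE TABLE fact_sales  (time_id INTEGER REFERENCES dim_time(time_id),
                              product_id INTEGER REFERENCES dim_product(product_id),
                              quantity INTEGER);

    INSERT INTO dim_category VALUES (1, 'Hardware');
    INSERT INTO dim_time     VALUES (1, 'Q1'), (2, 'Q2');
    INSERT INTO dim_product  VALUES (1, 'A', 1), (2, 'B', 1);
    INSERT INTO fact_sales   VALUES (1, 1, 10), (1, 2, 5), (2, 1, 7);
    """
)

# Star-style query: one join from the fact table to each dimension.
# Reaching the category needs one more join, which is the snowflake's cost.
totals = conn.execute(
    """
    SELECT t.quarter, c.label, SUM(f.quantity)
    FROM fact_sales f
    JOIN dim_time t     ON t.time_id = f.time_id
    JOIN dim_product p  ON p.product_id = f.product_id
    JOIN dim_category c ON c.category_id = p.category_id
    GROUP BY t.quarter, c.label
    ORDER BY t.quarter
    """
).fetchall()
print(totals)  # [('Q1', 'Hardware', 15), ('Q2', 'Hardware', 7)]
```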

Data warehouse designing

• A DW must support ad hoc querying = accessing data with any filters and conditions applied to the attributes (dimensions) in the fact table
• The fact table design must fit the intended purpose of the organization
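
Ad hoc querying can be pictured as a query whose filters are chosen at request time. The sketch below is an assumption-laden illustration: the `sales` table, the `ad_hoc_total` helper and the six sample rows are made up for the example, and column names are checked against a whitelist before being placed in the SQL text.

```python
import sqlite3

ALLOWED = {"time", "product", "location", "manufacturer"}

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (time TEXT, product TEXT, location TEXT, "
    "manufacturer INTEGER, quantity INTEGER)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?, ?)",
    [
        ("Q1", "C", "Z", 1, 83), ("Q2", "A", "X", 1, 81), ("Q1", "B", "Y", 1, 17),
        ("Q1", "B", "Z", 1, 84), ("Q3", "B", "Z", 2, 88), ("Q3", "B", "X", 2, 32),
    ],
)


def ad_hoc_total(conn, **filters):
    """Total quantity for any combination of equality filters on the dimensions."""
    unknown = set(filters) - ALLOWED
    if unknown:
        raise ValueError(f"unknown dimensions: {sorted(unknown)}")
    # Column names come from the whitelist; values are bound as parameters.
    where = " AND ".join(f"{col} = ?" for col in filters) or "1 = 1"
    sql = f"SELECT COALESCE(SUM(quantity), 0) FROM sales WHERE {where}"
    return conn.execute(sql, tuple(filters.values())).fetchone()[0]


print(ad_hoc_total(conn, time="Q1"))                 # 184
print(ad_hoc_total(conn, product="B", location="Z")) # 172
print(ad_hoc_total(conn))                            # 385
```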

Data warehouse designing

• The data extraction must involve these steps:
1. Extract data from multiple sources
2. Format the data for consistency in the DW
3. Clean the data’s errors (and correct them at the source)
4. Fit the data to the DW (conversion)
5. Load the data into the DW
• These steps must be performed carefully, at an optimized frequency
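
The five steps can be sketched as a small pipeline. Everything here is hypothetical: the two sources, their field names, the date formats and the rule that a non-numeric quantity counts as an error. Rejected rows are kept aside so they could be corrected at the source.

```python
def extract():
    """Step 1: gather rows from several (hypothetical) sources."""
    crm = [{"product": "a", "qty": "81", "date": "2023-04-01"}]
    shop = [
        {"product": "B", "qty": "n/a", "date": "01/02/2023"},
        {"product": "c", "qty": "83", "date": "15/01/2023"},
    ]
    return crm + shop


def format_row(row):
    """Step 2: one consistent format (ISO dates, upper-case product codes)."""
    date = row["date"]
    if "/" in date:  # assumed day/month/year in the second source
        day, month, year = date.split("/")
        date = f"{year}-{month}-{day}"
    return {"product": row["product"].upper(), "qty": row["qty"], "date": date}


def clean(rows):
    """Step 3: separate rows with a non-numeric quantity from the good ones."""
    good, rejected = [], []
    for row in rows:
        (good if row["qty"].isdigit() else rejected).append(row)
    return good, rejected


def convert(row):
    """Step 4: fit the row to the warehouse's column types."""
    return (row["date"], row["product"], int(row["qty"]))


def load(warehouse, rows):
    """Step 5: append the converted rows to the warehouse table."""
    warehouse.extend(rows)


warehouse = []
formatted = [format_row(r) for r in extract()]
good, rejected = clean(formatted)
load(warehouse, [convert(r) for r in good])
print(warehouse)      # [('2023-04-01', 'A', 81), ('2023-01-15', 'C', 83)]
print(len(rejected))  # 1
```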

Data warehouse designing

• Metadata of a data warehouse:
• Technical metadata: how the data was collected, and the structure and operation of the DW
• Business metadata: business rules, ownership, …

Data warehouse functionality

• Stores historical data and analytical data
• Supports quick and efficient querying using:
• Query transformation
• Index intersection and union
• Special ROLAP (relational OLAP) and MOLAP (multidimensional OLAP) functions
• SQL extensions
• Advanced join methods
• Intelligent scanning

Data warehouse benefits and downsides
• Benefits:
• Relieves resources from the main database
• Prevents conflicts between transactions and analysis queries
• Acts as a single point of contact for analytical data
• Stores historical data
• Allows queries to be tuned for analysis requirements

Data warehouse benefits and downsides
• Downsides:
• Additional construction and administration tasks for the database
• Must adapt to the evolution of the source database
• Difficult to manage large multidimensional data
• Difficult to modify when requirements change

Beyond Data Warehouse

• Data mart – a smaller DW for a specific subset of an enterprise
• Data lake – also contains data from multiple sources, but unstructured
• Cloud – also unstructured, but doesn’t rely on a local server

Decision Support System
Characteristics of DSS
• A DSS is composed of DW, OLAP and DM (data mining) technologies.

• A DSS should give well-structured information.

• A DSS attempts to combine the use of models or analytic techniques with traditional data access and retrieval functions.

• A DSS specifically focuses on features that make it easy to use by non-technical people in an interactive mode.

• A DSS emphasizes flexibility and adaptability to accommodate changes in the environment and the decision-making approach of the user.
Application of DSS
• Finance → Credit Card Analysis

• Insurance → Claims, Fraud Analysis

• Telecommunication → Call record analysis

• Transport → Logistics management

• Consumer goods → Promotion analysis

• Data Service providers → Value added data


Three-Tier DSS

• Bottom-tier:
• Data warehouse server → a central repository of information that can be analyzed to make more informed decisions.
• Middle-tier:
• OLAP Server → for fast querying of the data warehouse.
• Top-tier:
• Query and reporting tools
• Analysis tools
• Data mining tools
Complete DSS
OLAP vs OLTP

| Parameters | OLTP | OLAP |
|---|---|---|
| Process | It is an online transactional system. It manages database modification. | OLAP is an online analysis and data retrieving process. |
| Characteristic | Characterized by transactions that touch a small part of the database. | Characterized by a large volume of data. |
| Method | OLTP uses a traditional DBMS. | OLAP uses the data warehouse. |
| Query | Insert, update, and delete data from the database. | Extraction, processing, and presentation of data for analytic and decision-making purposes. |
| Source | OLTP and its transactions are the sources of data. | Different OLTP databases become the source of data for OLAP. |
| Data integrity | The OLTP database must maintain data integrity constraints. | The OLAP database is not modified frequently. Hence, data integrity is not an issue. |
| Response time | Short period of time. | Depends on the complexity of the data warehouse. |
| Data quality | The data in the OLTP database is always detailed and organized. | The data in the OLAP process might not be organized. |
7. OLAP Operations
• There are five basic analytical
operations that can be performed
on an OLAP cube:
1. Roll up (or drill-up)
2. Drill down
3. Dice
4. Slice
5. Pivot
7. OLAP Operations
• Roll-up (also drill-up): Data is
summarized with increasing
generalization. It can be done by:
• Climbing up in the concept hierarchy
• Reducing the dimensions
• Example: In the cube given in the
overview section, the roll-up operation is
performed by climbing up in the concept
hierarchy of Location dimension (City ->
Country)
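
A roll-up along a concept hierarchy can be sketched in a few lines. The cube referred to in the slide is not reproduced here, so the cell values below are hypothetical; only the City -> Country direction comes from the example.

```python
from collections import defaultdict

# Hypothetical hierarchy and cells: (city, quarter) -> quantity.
CITY_TO_COUNTRY = {"Delhi": "India", "Kolkata": "India", "Toronto": "Canada"}
CELLS = {
    ("Delhi", "Q1"): 10,
    ("Kolkata", "Q1"): 20,
    ("Toronto", "Q1"): 5,
    ("Delhi", "Q2"): 7,
}


def roll_up(cells, hierarchy):
    """Climb one level of the Location hierarchy: replace each city by its country."""
    totals = defaultdict(int)
    for (city, quarter), quantity in cells.items():
        totals[(hierarchy[city], quarter)] += quantity
    return dict(totals)


by_country = roll_up(CELLS, CITY_TO_COUNTRY)
print(by_country)  # {('India', 'Q1'): 30, ('Canada', 'Q1'): 5, ('India', 'Q2'): 7}
```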

7. OLAP Operations
• Drill down: Increasing levels of detail
are revealed. It can be done by:
• Moving down in the concept hierarchy
• Adding a new dimension
• Example: In the cube given in
overview section, the drill down
operation is performed by moving
down in the concept hierarchy
of Time dimension (Quarter ->
Month).
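
Drill-down only works if the finer level is actually stored. A minimal sketch, assuming hypothetical monthly figures: the quarter view is derived from the months, and drilling into a quarter simply exposes the months behind it.

```python
# Hypothetical detail data at the lowest stored level (month).
MONTHLY = {"Jan": 4, "Feb": 6, "Mar": 10, "Apr": 3, "May": 2, "Jun": 5}
QUARTER_MONTHS = {"Q1": ["Jan", "Feb", "Mar"], "Q2": ["Apr", "May", "Jun"]}


def quarter_view():
    """The summarized level: one total per quarter."""
    return {q: sum(MONTHLY[m] for m in months) for q, months in QUARTER_MONTHS.items()}


def drill_down(quarter):
    """Move down the Time hierarchy: show the months that make up one quarter."""
    return {month: MONTHLY[month] for month in QUARTER_MONTHS[quarter]}


print(quarter_view())    # {'Q1': 20, 'Q2': 10}
print(drill_down("Q1"))  # {'Jan': 4, 'Feb': 6, 'Mar': 10}
```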

7. OLAP Operations

• Slice and Dice: Projection operations are performed on the dimensions. A slice fixes one dimension to a single value; a dice selects values on two or more dimensions.
• Example:
• Dice is performed by selecting the following dimensions with criteria: Location = “Delhi” or “Kolkata”; Time = “Q1” or “Q2”; Item = “Car” or “Bus”
• Slice is performed on the dimension Time = “Q1”.
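
The two selections can be sketched over a small cube stored as a dictionary. The criteria are the ones from the example; the cell values and the extra cells (Mumbai, Q3, Phone) are hypothetical and exist only so that something is filtered out.

```python
# Hypothetical cube cells: (location, time, item) -> quantity.
CUBE = {
    ("Delhi", "Q1", "Car"): 5,
    ("Delhi", "Q2", "Bus"): 3,
    ("Kolkata", "Q1", "Bus"): 4,
    ("Mumbai", "Q1", "Car"): 8,
    ("Delhi", "Q3", "Car"): 2,
    ("Kolkata", "Q2", "Phone"): 6,
}


def dice(cube, locations, times, items):
    """Keep the cells that match the allowed values on all three dimensions."""
    return {
        key: value
        for key, value in cube.items()
        if key[0] in locations and key[1] in times and key[2] in items
    }


def slice_time(cube, time):
    """Fix Time to one value; the result has one dimension fewer."""
    return {(loc, item): value for (loc, t, item), value in cube.items() if t == time}


diced = dice(CUBE, {"Delhi", "Kolkata"}, {"Q1", "Q2"}, {"Car", "Bus"})
sliced = slice_time(CUBE, "Q1")
print(diced)
print(sliced)
```

The dice result is still a three-dimensional sub-cube, while the slice drops the Time dimension from the keys.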

7. OLAP Operations
• Pivot: Cross tabulation (also referred to as rotation) is performed.
• Example: Pivot the sliced sub-cube
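
Rotation can be sketched as swapping the two axes of a two-dimensional sub-cube. The sub-cube below is hypothetical, and `as_grid` is only a helper to show the cells as rows and columns.

```python
# Hypothetical 2D sub-cube (for example, the result of a slice): (location, item) -> quantity.
SUB_CUBE = {("Delhi", "Car"): 5, ("Kolkata", "Bus"): 4, ("Mumbai", "Car"): 8}


def pivot(cells):
    """Rotate the view: what used to be the row axis becomes the column axis."""
    return {(col, row): value for (row, col), value in cells.items()}


def as_grid(cells):
    """Arrange (row, col) -> value cells as a nested mapping row -> {col: value}."""
    grid = {}
    for (row, col), value in cells.items():
        grid.setdefault(row, {})[col] = value
    return grid


before = as_grid(SUB_CUBE)
after = as_grid(pivot(SUB_CUBE))
print(before)  # locations as rows, items as columns
print(after)   # items as rows, locations as columns
```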

8. Types of OLAP Server
• Relational OLAP (ROLAP): for large volumes of data → stored in relational tables → static multi-dimensional view of data.
• Schema: star, snowflake
• Product: MetaCube, Red Brick, AXSYS Suite

8. Types of OLAP Server
• Multidimensional OLAP (MOLAP) : limited data
volumes → stored in multidimensional array → Dynamic
multi-dimensional view of data.
• Schema: cube
• Product: Oracle Essbase, IBM Cognos, and Apache Kylin

8. Types of OLAP Server
• Hybrid OLAP (HOLAP): combination of ROLAP and
MOLAP → faster performance (by using MOLAP)
and more detailed information (by using ROLAP)
• Product: Microsoft Analysis Services and SAP AG BI
Accelerator

9. Approach to OLAP Server
• Relational OLAP (ROLAP):
• Relational DBMS to store and manage data warehouse.
• OLAP middleware to support missing pieces.
• Multidimensional OLAP (MOLAP):
• Array-based storage structures.
• Direct access to array data structures.
• Hybrid OLAP (HOLAP):
• Storing detailed data in RDBMS.
• Storing aggregated data in MDBMS.
• User access via MOLAP tools.
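
The storage difference between the first two approaches can be sketched as follows. This is a toy model under assumptions: a list of tuples stands in for a relational table, a nested list stands in for the multidimensional array, and the three sample rows are hypothetical.

```python
TIMES = ["Q1", "Q2", "Q3"]
PRODUCTS = ["A", "B", "C"]

# ROLAP-style storage: one row per fact, as in a relational table.
rows = [("Q1", "C", 83), ("Q2", "A", 81), ("Q1", "B", 17)]


def rolap_lookup(time, product):
    """Answer the question by filtering and aggregating the stored rows."""
    return sum(q for t, p, q in rows if t == time and p == product)


# MOLAP-style storage: a dense array with one cell per (time, product) pair.
# Empty combinations still occupy a cell, which is why sparsity matters.
array = [[0] * len(PRODUCTS) for _ in TIMES]
for t, p, q in rows:
    array[TIMES.index(t)][PRODUCTS.index(p)] += q


def molap_lookup(time, product):
    """Answer the question by direct access to the array cell."""
    return array[TIMES.index(time)][PRODUCTS.index(product)]


print(rolap_lookup("Q1", "C"), molap_lookup("Q1", "C"))  # 83 83
print(rolap_lookup("Q3", "A"), molap_lookup("Q3", "A"))  # 0 0
```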
ROLAP vs. MOLAP

| Characteristics | ROLAP | MOLAP |
|---|---|---|
| Schema | Uses a star schema. Additional dimensions can be added dynamically. | Uses data cubes. Additional dimensions require recreation of the data cube. |
| Database size | Medium to large | Small to medium |
| Access | Supports ad hoc requests | Limited to pre-defined dimensions |
| Resources | High | Very high |
| Flexibility | High | Low |
| Scalability | High | Low |
| Speed | Good with small data sets. Average for medium to large data sets. | Faster for small to medium data sets. Average for large data sets. |

CONCLUSION
• A data warehouse (DW) is used for storing analytical data

• A DW stores data in relational tables organized as a multidimensional model

• DW data is loaded using ETL or ELT

• A decision support system (DSS) is composed of DW, OLAP and DM.

• In a DSS, data analysis and decision support rely mainly on OLAP and data mining technology → OLAP is a significant improvement over query systems.

• OLAP has 5 basic analytical operations → performed on a data cube.

• OLAP has 3 main types: ROLAP, MOLAP and HOLAP, each with different functions.
Backup Slides

Containing unused materials


Data warehouse modelling
• To better view the sales data, we can pivot it to see it from another view
• Example: Time => columns; Region => rows; Quantity => value
• The aggregate function is the sum over all products

| Row Labels | Q1 | Q2 | Q3 | Grand Total |
|---|---|---|---|---|
| X | 287 | 339 | 348 | 974 |
| Y | 265 | 310 | 293 | 868 |
| Z | 245 | 274 | 210 | 729 |
| Grand Total | 797 | 923 | 851 | 2571 |
Data warehouse modelling
• Another way to view multidimensional data is to slice it
• Here is the view of the Region and Time dimensions for product A, made by manufacturer 2

Product A, Manufacturer 2

| Row Labels | Q1 | Q2 | Q3 | Grand Total |
|---|---|---|---|---|
| X | 11 | 0 | 66 | 77 |
| Y | 16 | 33 | 29 | 78 |
| Z | 11 | 65 | 33 | 109 |
| Grand Total | 38 | 98 | 128 | 264 |
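Data warehouse modelling
• The data can also be rolled up or drilled down

Roll-up view (Region by Time):

| Region | Q1 | Q2 | Q3 |
|---|---|---|---|
| X | 287 | 339 | 348 |
| Y | 265 | 310 | 293 |
| Z | 245 | 274 | 210 |

Drill-down view (Region and manufacturer by Time and Product):

| Region | Manufacturer | Q1 A | Q1 B | Q1 C | Q2 A | Q2 B | Q2 C | Q3 A | Q3 B | Q3 C |
|---|---|---|---|---|---|---|---|---|---|---|
| X | 1 | 79 | 89 | 41 | 98 | 17 | 83 | 99 | 89 | 63 |
| X | 2 | 11 | 24 | 43 | 0 | 62 | 79 | 66 | 2 | 29 |
| Y | 1 | 89 | 26 | 62 | 24 | 64 | 87 | 67 | 5 | 93 |
| Y | 2 | 16 | 43 | 29 | 33 | 47 | 55 | 29 | 97 | 2 |
| Z | 1 | 34 | 21 | 94 | 88 | 4 | 56 | 6 | 3 | 75 |
| Z | 2 | 11 | 84 | 1 | 65 | 19 | 42 | 33 | 87 | 6 |

• Drill-down: provides a finer view of the data (limited by the lowest level of data)
• Roll-up: groups data along a dimension, providing a more general view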
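
The relationship between the two views on this slide can be checked with a short script. The figures are the drill-down values from the slide; the function names and the data layout are assumptions made for the sketch.

```python
# Drill-down figures from the slide: (region, manufacturer) -> nine values
# ordered Q1 A, Q1 B, Q1 C, Q2 A, Q2 B, Q2 C, Q3 A, Q3 B, Q3 C.
DETAIL = {
    ("X", 1): [79, 89, 41, 98, 17, 83, 99, 89, 63],
    ("X", 2): [11, 24, 43, 0, 62, 79, 66, 2, 29],
    ("Y", 1): [89, 26, 62, 24, 64, 87, 67, 5, 93],
    ("Y", 2): [16, 43, 29, 33, 47, 55, 29, 97, 2],
    ("Z", 1): [34, 21, 94, 88, 4, 56, 6, 3, 75],
    ("Z", 2): [11, 84, 1, 65, 19, 42, 33, 87, 6],
}
QUARTERS = ["Q1", "Q2", "Q3"]
PRODUCTS = ["A", "B", "C"]


def cells():
    """Yield every detail cell as (region, manufacturer, quarter, product, quantity)."""
    for (region, maker), values in DETAIL.items():
        for i, quantity in enumerate(values):
            yield region, maker, QUARTERS[i // 3], PRODUCTS[i % 3], quantity


def roll_up():
    """Group away manufacturer and product, leaving Region by Time."""
    totals = {}
    for region, _maker, quarter, _product, quantity in cells():
        totals[(region, quarter)] = totals.get((region, quarter), 0) + quantity
    return totals


def slice_view(product, maker):
    """Fix product and manufacturer, leaving Region by Time for that slice."""
    return {(r, q): qty for r, m, q, p, qty in cells() if p == product and m == maker}


rolled = roll_up()
print(rolled[("X", "Q1")], rolled[("Y", "Q2")], rolled[("Z", "Q3")])  # 287 310 210
print(sum(rolled.values()))                                          # 2571
print(slice_view("A", 2)[("Z", "Q2")])                               # 65
```

The rolled-up totals match the Region by Time table, and `slice_view("A", 2)` reproduces the Product A, Manufacturer 2 figures from the slicing slide.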
