Aalborg University 2007 - DWML course 3 Aalborg University 2007 - DWML course 4
Data Analysis Problems
• The same data found in many different systems
  Example: customer data across different departments
  The same concept is defined differently
• Heterogeneous sources
  Relational DBS, On-Line Transaction Processing (OLTP)
  Unstructured data in files (e.g., MS Excel) and documents (e.g., MS Word)
• Data is suited for operational systems
  Accounting, billing, etc.
  Do not support analysis across business functions
• Data quality is bad
  Missing data, imprecise data, different use of systems
• Data are “volatile”
  Data deleted in operational systems (6 months)
  Data change over time; no historical information

Data Warehousing
• Solution: new analysis environment (DW) where data are
  Subject oriented (versus function oriented)
  Integrated (logically and physically)
  Time variant (data can always be related to time)
  Stable (data not deleted, several versions)
  Supporting management decisions (different organization)
• A good DW is a prerequisite for successful BI
• “Getting multidimensional data into the DW”
• Data from the operational systems are
  Extracted
  Cleansed
  Transformed
  Aggregated?
  Loaded into DW
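
The "time variant" and "stable" properties of the Data Warehousing slide can be illustrated with a small sketch. It assumes a hypothetical customer record with a city attribute; the names `CustomerVersion`, `apply_change`, and `city_as_of` are invented for this example and do not come from the slides. A change closes the current version and appends a new one, so nothing is deleted and every value can be related to time.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class CustomerVersion:
    # Hypothetical record layout, chosen only for illustration.
    customer_id: int
    city: str
    valid_from: date
    valid_to: Optional[date] = None  # None marks the current version


def apply_change(history, customer_id, new_city, change_date):
    """Close the current version (if any) and append a new one; nothing is deleted."""
    for version in history:
        if version.customer_id == customer_id and version.valid_to is None:
            if version.city == new_city:
                return  # value unchanged, keep the current version open
            version.valid_to = change_date
    history.append(CustomerVersion(customer_id, new_city, change_date))


def city_as_of(history, customer_id, day):
    """Return the city that was valid for the customer on the given day."""
    for version in history:
        if version.customer_id != customer_id:
            continue
        still_valid = version.valid_to is None or day < version.valid_to
        if version.valid_from <= day and still_valid:
            return version.city
    return None


history = []
apply_change(history, 1, "Aalborg", date(2005, 1, 1))
apply_change(history, 1, "Aarhus", date(2006, 7, 1))

print(city_as_of(history, 1, date(2005, 6, 1)))
print(city_as_of(history, 1, date(2007, 1, 1)))
```

An operational system would typically overwrite the city, which is the "no historical information" problem named on the Data Analysis Problems slide.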
Bus architecture
n x m versus n + m
[Figure: applications (Appl.) and databases (DB) connected through transformations (Trans.) to data marts (DM) and decision applications (D-Appl.), with and without a central DW; figure label: "inflexible, expensive"]

Top-down vs. Bottom-up
Top-down:
1. Design of DW
2. Design of DMs
In-between:
1. Design of DW for DM1
2. Design of DM2 and integration with DW
3. Design of DM3 and integration with DW
4. ...
Bottom-up:
1. Design of DMs
2. Maybe integration of DMs in DW
3. Maybe no DW
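
The arithmetic behind the "n x m versus n + m" title can be sketched as follows. The slide does not define n and m; this example assumes n is the number of source systems and m the number of data marts or decision applications, and the function names are invented for illustration.

```python
def point_to_point(n_sources, m_targets):
    """Transformations needed when every source feeds every target directly."""
    return n_sources * m_targets


def via_warehouse(n_sources, m_targets):
    """Transformations needed when sources load a DW and targets read from it."""
    return n_sources + m_targets


# Made-up sizes, only to show how the two counts grow.
for n, m in [(2, 2), (5, 4), (10, 8)]:
    print(n, m, point_to_point(n, m), via_warehouse(n, m))
```

Under this reading, the gap between the two counts widens quickly as sources and targets are added.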
On-Line Analytical Processing (OLAP)
• On-Line Analytical Processing
  Interactive analysis
  Explorative discovery
  Fast response times required
• OLAP operations
  Aggregation, e.g., SUM
  Starting level, (Year, City)
  Roll Up: Less detail
  Drill Down: More detail
  Slice/Dice: Selection, Year=2000
[Figure: example cell values (102, 250, 20, 25, 70, 57) at different aggregation levels, including "All Time"]

Performance Optimization
• Performance optimization
  Fine tune performance for important queries
  Aggregates, indexing, other optimizations (environment, partitioning)
• Using aggregates
  How can aggregates improve performance?
• Choosing aggregates
  Which aggregates should we materialize?
• Maintaining views
  How do we keep the (aggregate) views up to date?
• Bitmapped indices
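
The OLAP operations listed above, and the question of how aggregates can improve performance, can be sketched in a few lines. The fact rows and their values are made up for this example and are not the numbers from the slide's figure; `aggregate` and `roll_up_from_aggregate` are hypothetical helper names.

```python
from collections import defaultdict

# Hypothetical fact rows: (year, city, sales). Values are invented.
FACTS = [
    (1999, "Aalborg", 10),
    (1999, "Copenhagen", 30),
    (2000, "Aalborg", 15),
    (2000, "Copenhagen", 40),
]


def aggregate(rows, key):
    """SUM the sales of the rows, grouped by key(row)."""
    totals = defaultdict(int)
    for row in rows:
        totals[key(row)] += row[2]
    return dict(totals)


# Starting level (Year, City).
by_year_city = aggregate(FACTS, lambda r: (r[0], r[1]))

# Roll up: less detail, City is aggregated away.
by_year = aggregate(FACTS, lambda r: r[0])

# Roll up further to a single grand total.
grand_total = aggregate(FACTS, lambda r: "ALL")

# Slice: select Year=2000, then group by City.
slice_2000 = aggregate([r for r in FACTS if r[0] == 2000], lambda r: r[1])


def roll_up_from_aggregate(materialized):
    """Answer the per-year query from a stored (Year, City) aggregate
    instead of scanning the detailed fact rows again."""
    totals = defaultdict(int)
    for (year, _city), sales in materialized.items():
        totals[year] += sales
    return dict(totals)


print(by_year)
print(slice_2000)
print(roll_up_from_aggregate(by_year_city) == by_year)
```

Drill down is the reverse direction: going from `by_year` back to `by_year_city`. In a real DW the stored aggregate is usually far smaller than the fact table, which is where the performance gain would come from.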
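
The "Bitmapped indices" item on the Performance Optimization slide is not elaborated in the text. A minimal sketch of the general idea, assuming one bitmap per distinct column value and using Python integers as bit vectors, might look like this. The column data and the helper names are invented for illustration.

```python
def build_bitmap_index(values):
    """Map each distinct value to an int whose bit i is set when row i has that value."""
    index = {}
    for row_number, value in enumerate(values):
        index[value] = index.get(value, 0) | (1 << row_number)
    return index


def matching_rows(bitmap):
    """Return the row numbers whose bits are set in the bitmap."""
    rows = []
    row_number = 0
    while bitmap:
        if bitmap & 1:
            rows.append(row_number)
        bitmap >>= 1
        row_number += 1
    return rows


# Hypothetical columns of a five-row fact table.
years = [1999, 2000, 2000, 1999, 2000]
cities = ["Aalborg", "Aalborg", "Copenhagen", "Copenhagen", "Aalborg"]

year_index = build_bitmap_index(years)
city_index = build_bitmap_index(cities)

# Year=2000 AND City=Aalborg becomes a single bitwise AND of two bitmaps.
both = year_index[2000] & city_index["Aalborg"]
print(matching_rows(both))
```

Combining selection conditions with bitwise operations is what makes this kind of index attractive for columns with few distinct values.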
Data’s Way To The DW
• Extraction
  Extract from many heterogeneous systems
• Staging area
  Large, sequential bulk operations => flat files best?
• Cleansing
  Data checked for missing parts and erroneous values
  Default values provided and out-of-range values marked
• Transformation
  Data transformed to decision-oriented format
  Data from several sources merged, optimize for querying
• Aggregation?
  Are individual business transactions needed in the DW?
• Loading into DW
  Large bulk loads rather than SQL INSERTs
  Fast indexing (and pre-aggregation) required

DW Applications: Visualization
• Graphical presentation of complex result
• Color, size, and form help to give a better overview
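
The steps of "Data’s Way To The DW" can be sketched end to end on a tiny example. Everything concrete here is an assumption made for illustration: the staged flat file, its columns, the "UNKNOWN" default, the rule that negative amounts are out of range, and the `sales_by_city` table. SQLite's `executemany` stands in for a real bulk loader, which a production DW would more likely use.

```python
import csv
import io
import sqlite3

# Hypothetical flat-file extract in the staging area; columns are invented.
STAGED = """customer,city,amount
Anna,Aalborg,120
Bo,,80
Carl,Aarhus,-5
"""


def cleanse(row):
    """Provide a default for a missing city and mark out-of-range amounts."""
    amount = int(row["amount"])
    return {
        "customer": row["customer"],
        "city": row["city"] or "UNKNOWN",  # default value for missing data
        "amount": amount,
        "suspect": amount < 0,  # assumed rule: negative amounts are out of range
    }


def transform(rows):
    """Aggregate cleansed rows to total amount per city, skipping marked rows."""
    totals = {}
    for row in rows:
        if row["suspect"]:
            continue
        totals[row["city"]] = totals.get(row["city"], 0) + row["amount"]
    return sorted(totals.items())


def load(conn, city_totals):
    """Load all rows in one batch; a stand-in for a DBMS bulk-load utility."""
    conn.execute("CREATE TABLE sales_by_city (city TEXT PRIMARY KEY, amount INTEGER)")
    with conn:
        conn.executemany("INSERT INTO sales_by_city VALUES (?, ?)", city_totals)


cleansed = [cleanse(r) for r in csv.DictReader(io.StringIO(STAGED))]
conn = sqlite3.connect(":memory:")
load(conn, transform(cleansed))
result = conn.execute("SELECT city, amount FROM sales_by_city ORDER BY city").fetchall()
print(result)
```

This sketch aggregates before loading, which is only one answer to the slide's open question of whether individual business transactions are needed in the DW.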
Common DW Issues
• Metadata management
  Need to understand data = metadata needed
  Greater need than in OLTP applications as “raw” data is used
  Need to know about:
    ◆ Data definitions, dataflow, transformations, versions, usage, security
• DW project management
  DW projects are large and different from ordinary SW projects
    ◆ 12-36 months and US$ 1+ million per project
    ◆ Data marts are smaller and “safer” (bottom-up approach)
  Reasons for failure
    ◆ Lack of proper design methodologies
    ◆ High HW+SW cost (not so much anymore)
    ◆ Deployment problems (lack of training)
    ◆ Organizational change is hard… (new processes, data ownership, …)
    ◆ Ethical issues (security, privacy, …)

Summary
• Why Business Intelligence?
• Data analysis problems
• Data Warehouse (DW) introduction
• Analysis technologies that use the DW
  OLAP
  Data mining
  Visualization
• BI can provide many advantages to your organization
  A good DW is a prerequisite for BI
  But a DW is a means rather than a goal… it is only when it is heavily used that success is achieved