Session Objectives
Objectives: At the end of this session, you will be able to:
> Define Online Analytical Processing (OLAP)
> Explain the need for OLAP and its applications in BI
> Describe the various OLAP solutions and architectures
> Compare different OLAP architectures
> Identify the evaluation parameters to consider when selecting an OLAP tool
What is OLAP?
> OLAP (Online Analytical Processing) applications are designed for online, ad hoc data access and analysis.
> Data is organized into multiple dimensions.
> Provides access to analytical content such as time series and trend analysis views and summary-level information.
> A set of functionality that facilitates multidimensional analysis.
> Offers drill-down, drill-across, and slice-and-dice capabilities.
How many dimensions can we think in? E.g. analysis by branch, product, agent, year: 2 or 3.
How many types of values can we handle? E.g. Sales, Profit, Cost: 1 or 2.
How many levels can we handle? E.g. the number of products we can analyze.
Many parameters affect a measure (value), e.g. sales are influenced by product, region, time, distribution channel, etc.
Linear analysis = reports
Many totals are at one level
Difficult to identify the key parameters
OLAP in an Enterprise
Uses of OLAP
Departments: Finance, Marketing, Sales, Manufacturing
Analytical Capabilities:
> Used by analysts and managers.
> Offers an aggregated view of the data, such as total revenues by customer profile, product line, or geographical region.
> Provides the decision support front end for data warehousing.
> Advanced statistical, financial, and analytical calculations.
> Appropriate tools to access data from a relational database.
> Appropriate tools to access or manage multidimensional data.
Evolution of OLAP
Star Schema
> A star schema is a dimensional model created by mapping data entities from operational systems.
> It has a central table (the fact table) that links all the other tables (the dimension tables) together.
> Dimension: a category of related information. For example, year, month, day, and week are all part of the Time dimension.
> Measure: a property that can be summed or averaged, often using precomputed aggregates.
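The fact-table and dimension-table layout described above can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not a production design: the table names (fact_sales, dim_product, dim_time), the column names, and the sample rows are all hypothetical and do not come from the slides.

```python
import sqlite3

# In-memory database holding one fact table and two dimension tables.
# All table names, column names, and rows are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (product_id TEXT PRIMARY KEY, product_name TEXT);
CREATE TABLE dim_time (time_id INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
CREATE TABLE fact_sales (
    product_id TEXT REFERENCES dim_product(product_id),
    time_id INTEGER REFERENCES dim_time(time_id),
    sales_amount REAL
);
""")
conn.executemany("INSERT INTO dim_product VALUES (?, ?)",
                 [("p1", "bread"), ("p2", "milk")])
conn.executemany("INSERT INTO dim_time VALUES (?, ?, ?)",
                 [(1, 2002, 1), (2, 2002, 2)])
conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)",
                 [("p1", 1, 100.0), ("p2", 1, 40.0), ("p1", 2, 60.0)])

# The fact table sits in the middle: every query joins it to the
# dimension tables it needs and aggregates the measure.
rows = conn.execute("""
    SELECT p.product_name, t.year, SUM(f.sales_amount)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    JOIN dim_time AS t ON t.time_id = f.time_id
    GROUP BY p.product_name, t.year
    ORDER BY p.product_name
""").fetchall()
print(rows)
```

The measure (sales_amount) lives only in the fact table, while descriptive attributes such as product_name and year live in the dimension tables.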
[Figure: example measures: Gross Margin, Sales, Profitability, Cost]
> Facts or measures are the key performance indicators of an enterprise
> Factual data about the subject area
> Numeric, summarized
Dimension
[Figure: Sales Revenue (measure), qualified by the questions that dimensions answer]
What was sold? Whom was it sold to? When was it sold? Where was it sold?
> Dimensions put measures in perspective
> They are the what, when, and where qualifiers of the measures
> Dimensions could be products, customers, time, geography, etc.
Star Schema
CUBE
Multidimensional databases store information in the form of cubes. A cube is a collection of facts and related dimensions stored together in arrays.
[Figure: cube with dimensions Geography, Time, and Product; other labels: Sales, HR]
> Hierarchy: A hierarchy defines the navigation path for drilling up and drilling down. All attributes in a hierarchy belong to the same dimension.
> Levels: These are organized into one or more hierarchies, typically from a coarse-grained level (for example, Year) down to the most detailed one (for example, Day).
> Members: The individual category values (for example, 2002 or 21Jan2002).
> Measures: These are the data values that are summarized and analyzed. Examples of measures are sales figures or operational costs.
> Cells: Each cell is the intersection of one member from every dimension and stores the data for the measures.
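To make the terms above concrete, here is a small sketch of a Time hierarchy with Year, Month, and Day levels. The sales figures, the LEVELS mapping, and the roll_up helper are hypothetical names and data introduced only for illustration.

```python
from collections import defaultdict
from datetime import date

# Hypothetical measure values keyed by Day-level members.
daily_sales = {
    date(2001, 12, 31): 4,
    date(2002, 1, 21): 10,
    date(2002, 1, 22): 5,
    date(2002, 2, 3): 7,
}

# Levels of the Time hierarchy, coarse to fine. Each function maps a
# Day-level member to the member it belongs to at that level.
LEVELS = {
    "Year": lambda d: d.year,
    "Month": lambda d: (d.year, d.month),
    "Day": lambda d: d,
}

def roll_up(facts, level):
    """Sum the measure for every member of the requested level."""
    member_of = LEVELS[level]
    totals = defaultdict(int)
    for day, amount in facts.items():
        totals[member_of(day)] += amount
    return dict(totals)

by_year = roll_up(daily_sales, "Year")    # drill up to the coarsest level
by_month = roll_up(daily_sales, "Month")  # drill down one level
print(by_year, by_month)
```

Here 2002 is a member of the Year level, (2002, 1) is a member of the Month level, and moving between the two results is the drill-up and drill-down path that the hierarchy defines.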
[Figure: Time dimension hierarchy; the year 2001 is divided into Q1, Q2, Q3, and Q4 at the QUARTER level, and Q1 is labeled as a member]
Aggregates
Add up amounts for day 1.
In SQL: SELECT sum(amt) FROM SALE WHERE date = 1
sale
prodId  storeId  date  amt
p1      s1       1     12
p2      s1       1     11
p1      s3       1     50
p2      s2       1      8
p1      s1       2     44
p1      s2       2      4
Result: 81
Aggregates
Add up amounts by day.
In SQL: SELECT date, sum(amt) FROM SALE GROUP BY date
sale
prodId  storeId  date  amt
p1      s1       1     12
p2      s1       1     11
p1      s3       1     50
p2      s2       1      8
p1      s1       2     44
p1      s2       2      4
ans
date  sum
1     81
2     48
Another Example
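Add up amounts by day and product.
In SQL: SELECT prodId, date, sum(amt) FROM SALE GROUP BY date, prodId
sale
prodId  storeId  date  amt
p1      s1       1     12
p2      s1       1     11
p1      s3       1     50
p2      s2       1      8
p1      s1       2     44
p1      s2       2      4
Result:
prodId  date  amt
p1      1     62
p2      1     19
p1      2     48
Rollup: from totals by (date, product) to totals by date. Drill-down: the reverse, from totals by date to totals by (date, product).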
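The aggregate queries on the sale table can be run as written with Python's built-in sqlite3 module. The sketch below loads the six rows from the slides into an in-memory database; the ORDER BY clauses are additions made here only so that the row order is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sale (prodId TEXT, storeId TEXT, date INTEGER, amt INTEGER)"
)
conn.executemany(
    "INSERT INTO sale VALUES (?, ?, ?, ?)",
    [
        ("p1", "s1", 1, 12),
        ("p2", "s1", 1, 11),
        ("p1", "s3", 1, 50),
        ("p2", "s2", 1, 8),
        ("p1", "s1", 2, 44),
        ("p1", "s2", 2, 4),
    ],
)

# Add up amounts for day 1.
total_day1 = conn.execute(
    "SELECT sum(amt) FROM sale WHERE date = 1"
).fetchone()[0]

# Add up amounts by day (the rolled-up view).
by_day = conn.execute(
    "SELECT date, sum(amt) FROM sale GROUP BY date ORDER BY date"
).fetchall()

# Add up amounts by day and product (the drilled-down view).
by_day_product = conn.execute(
    "SELECT prodId, date, sum(amt) FROM sale "
    "GROUP BY date, prodId ORDER BY date, prodId"
).fetchall()

print(total_day1, by_day, by_day_product)
```

Adding prodId to the GROUP BY list is the drill-down step; removing it again is the rollup.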
Aggregates
> Operators: sum, count, max, min, median, and avg
> HAVING clause
> Using a dimension hierarchy:
  average by region (within store)
  maximum by month (within date)
Multi-dimensional cube:
      p1   p2
s1    12   11
s2          8
s3    50
dimensions = 2
3-D Cube
Multi-dimensional cube:
day 1:
      p1   p2
s1    12   11
s2          8
s3    50
day 2:
      p1   p2
s1    44
s2     4
s3
dimensions = 3
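One simple way to picture the 3-D cube is as a mapping from a (day, store, product) coordinate to a cell value. The sketch below builds such a mapping from the sale rows shown in the slides; storing the cube as a Python dict and the helper name cell are illustrative choices, not how a multidimensional database stores its arrays.

```python
# Fact rows from the slides: (prodId, storeId, date, amt).
SALE = [
    ("p1", "s1", 1, 12),
    ("p2", "s1", 1, 11),
    ("p1", "s3", 1, 50),
    ("p2", "s2", 1, 8),
    ("p1", "s1", 2, 44),
    ("p1", "s2", 2, 4),
]

# Each cell is addressed by one member from every dimension.
cube = {}
for prod, store, day, amt in SALE:
    key = (day, store, prod)
    cube[key] = cube.get(key, 0) + amt

def cell(day, store, prod):
    """Return the cell value, or None when the cell is empty."""
    return cube.get((day, store, prod))

# Print one 2-D grid per day, leaving empty cells blank.
for day in (1, 2):
    print(f"day {day}:")
    for store in ("s1", "s2", "s3"):
        values = [cell(day, store, prod) for prod in ("p1", "p2")]
        print(store, ["" if v is None else v for v in values])
```

Only 6 of the 12 possible cells hold a value, which is a small example of the sparsity that real cubes have to deal with.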
Example
Dimensions: Time, Product, Store
Attributes: Product (upc, price, ...), Store ...
Hierarchies:
  Product -> Brand -> ...
  Day -> Week -> Quarter
  Store -> Region -> Country
[Figure: cube with axes Product, Time (M T W Th F S S), and Store (LA, NY, SF); arrows mark roll-up to brand, roll-up to week, and roll-up to region]
Example cell: 56 units of bread sold in LA on M
day 1:
      p1   p2
s1    12   11
s2          8
s3    50
day 2:
      p1   p2
s1    44
s2     4
s3
Sum over days:
      p1   p2
s1    56   11
s2     4    8
s3    50
Sum over days and products:
s1    67
s2    12
s3    50
Sum over days and stores:
p1   110
p2    19
Grand total: 129
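The successive sums above amount to dropping one dimension at a time and adding up the cells that collapse together. A minimal sketch, assuming the cube is held as a dict keyed by (day, store, prod); the names DIMS and aggregate are hypothetical.

```python
from collections import defaultdict

DIMS = ("day", "store", "prod")

# Cells of the 3-D cube from the slides, keyed by (day, store, prod).
cube = {
    (1, "s1", "p1"): 12,
    (1, "s1", "p2"): 11,
    (1, "s2", "p2"): 8,
    (1, "s3", "p1"): 50,
    (2, "s1", "p1"): 44,
    (2, "s2", "p1"): 4,
}

def aggregate(cells, keep):
    """Sum the cells over every dimension that is not listed in keep."""
    positions = [DIMS.index(name) for name in keep]
    totals = defaultdict(int)
    for key, amt in cells.items():
        totals[tuple(key[i] for i in positions)] += amt
    return dict(totals)

by_store_prod = aggregate(cube, ("store", "prod"))  # sum over days
by_store = aggregate(cube, ("store",))              # sum over days and products
by_prod = aggregate(cube, ("prod",))                # sum over days and stores
grand_total = aggregate(cube, ())                   # sum over everything
print(by_store_prod, by_store, by_prod, grand_total)
```

Every aggregate is itself a smaller cube, so the same function can be applied again to an earlier result after adjusting the dimension list.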
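Rollup goes from the store-level cube to region-level totals; drill-down goes the other way.
day 1:
      p1   p2
s1    12   11
s2          8
s3    50
day 2:
      p1   p2
s1    44
s2     4
s3
Rolled up to region (summed over days):
           p1   p2
region A   56   11
region B   54    8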
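Rolling up along the Store hierarchy can be sketched by replacing each store with its parent region before summing. The store-to-region mapping below (s1 in region A, s2 and s3 in region B) is not stated in the slides; it is inferred from the totals shown and should be read as an assumption.

```python
from collections import defaultdict

# Cells of the 3-D cube from the slides, keyed by (day, store, prod).
cube = {
    (1, "s1", "p1"): 12,
    (1, "s1", "p2"): 11,
    (1, "s2", "p2"): 8,
    (1, "s3", "p1"): 50,
    (2, "s1", "p1"): 44,
    (2, "s2", "p1"): 4,
}

# Assumed Store -> Region level of the hierarchy (inferred, see above).
REGION_OF = {"s1": "region A", "s2": "region B", "s3": "region B"}

def roll_up_to_region(cells):
    """Sum over days and replace each store by its region."""
    totals = defaultdict(int)
    for (day, store, prod), amt in cells.items():
        totals[(REGION_OF[store], prod)] += amt
    return dict(totals)

by_region = roll_up_to_region(cube)
print(by_region)
```

Drill-down is the inverse navigation: starting from a region total, return to the store-level cells that contributed to it.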
day 1:
      p1   p2
s1    12   11
s2          8
s3    50
day 2:
      p1   p2
s1    44
s2     4
s3
Slice with TIME = day 1:
      p1   p2
s1    12   11
s2          8
s3    50
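A slice fixes one dimension to a single member and keeps the remaining dimensions. A minimal sketch over the same cube, held as a dict keyed by (day, store, prod); the helper name slice_by_day is hypothetical.

```python
# Cells of the 3-D cube from the slides, keyed by (day, store, prod).
cube = {
    (1, "s1", "p1"): 12,
    (1, "s1", "p2"): 11,
    (1, "s2", "p2"): 8,
    (1, "s3", "p1"): 50,
    (2, "s1", "p1"): 44,
    (2, "s2", "p1"): 4,
}

def slice_by_day(cells, wanted_day):
    """Fix TIME to one member and return the 2-D (store, prod) result."""
    return {
        (store, prod): amt
        for (day, store, prod), amt in cells.items()
        if day == wanted_day
    }

day1 = slice_by_day(cube, 1)
day2 = slice_by_day(cube, 2)
print(day1, day2)
```

Unlike a rollup, a slice does not add anything up: it selects a sub-cube and leaves the cell values unchanged.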
OLAP - Classification
Online Analytical Processing (OLAP) can be done on:
> Relational databases
> Multidimensional databases
OLAP products are grouped into three categories:
> Relational OLAP (ROLAP)
> Multidimensional OLAP (MOLAP)
> Hybrid OLAP (HOLAP)
MOLAP
Multidimensional OLAP (MOLAP) is a technology that uses a multidimensional database, which stores data as an n-dimensional cube.
[Figure: cube with dimensions Brand, Geography, and Age Group]
Architecture of MOLAP
[Figure: MOLAP architecture. Labeled components: Data Mart Server, RDBMS, Connectivity Middleware, MOLAP Server, MDDBMS/Data Cube, MOLAP Application, LAN, Router, Firewall. Other labels: Cube, Critical Size.]
Non-live connection: used for updating the MOLAP data cube only.
Issues:
> Size of data cube
> Cube deployment
> Size of update data set
MOLAP Products
Architecture of ROLAP
[Figure: ROLAP architecture. Labeled components: LAN, Router / Firewall, Intranet, Internet, Thin Clients, WWW Browser]
ROLAP Products
Brio Query Enterprise, Business Objects, Metacube, DSS Server, Information Advantage
Architecture of HOLAP
[Figure: HOLAP architecture, with components connected over a LAN]
HOLAP Products
SAS
MOLAP Vs ROLAP
Comparison of Architectures
Architectural feature                MOLAP               ROLAP
Number of dimensions                 Ten or fewer        Unlimited
Support for large number of users    Limited support     Good
Scalability                          Poor                Good
Complex multidimensional analysis    Easier to achieve   Difficult to achieve
Volume of data storage               Up to 50 GB         Hundreds of gigabytes and terabytes
Remaining features (storage of information; user interface and functionality; common access language; nature of data), ROLAP entries as listed: SQL result sets; normal SQL; stores detailed as well as summarized data.
MOLAP
> Essentially the definition of the dimensional model and calculation rules.
> Aggregation techniques: measures are precalculated and stored at each hierarchy summary level during load time.
> Drill down, drill up, drill across, and slicing/dicing.
> Instant response.
> Supports complex functions such as % change and ranking.
> Calculated from cubes.
ROLAP
> Uses two-dimensional tables stored in an RDBMS. (Data is stored in a star schema or snowflake schema.)
> Aggregation techniques: summary tables are implemented in the relational database.
> Drill down, drill up, slicing and dicing.
> Slower response.
> Limited value-added functions.
> Calculated on the fly from the database.
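The difference between aggregates precalculated at load time and aggregates calculated on the fly can be sketched in a few lines. This is a deliberately simplified model using the sale rows from the earlier slides: real MOLAP servers use array storage and real ROLAP servers issue SQL against summary tables, and the function names on_the_fly and precompute_all are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

DIMS = ("prodId", "storeId", "date")

# Fact rows from the earlier slides: (prodId, storeId, date, amt).
SALE = [
    ("p1", "s1", 1, 12),
    ("p2", "s1", 1, 11),
    ("p1", "s3", 1, 50),
    ("p2", "s2", 1, 8),
    ("p1", "s1", 2, 44),
    ("p1", "s2", 2, 4),
]

def on_the_fly(rows, group_by):
    """ROLAP-style: scan the detail rows each time a question is asked."""
    positions = [DIMS.index(name) for name in group_by]
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[i] for i in positions)] += row[3]
    return dict(totals)

def precompute_all(rows):
    """MOLAP-style: at load time, store totals for every dimension subset."""
    return {
        group_by: on_the_fly(rows, group_by)
        for size in range(len(DIMS) + 1)
        for group_by in combinations(DIMS, size)
    }

precomputed = precompute_all(SALE)       # paid once, during load
answer_fast = precomputed[("date",)]     # query time: a lookup
answer_slow = on_the_fly(SALE, ("date",))  # query time: a full scan
print(len(precomputed), answer_fast, answer_slow)
```

With three dimensions there are already 2^3 = 8 groupings to store, which hints at why cube size and update cost appear among the MOLAP issues, while the on-the-fly approach stores nothing extra but repeats the scan for every query.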
Parameter                                        MOLAP                                  ROLAP
Processing overhead for large input data sets    High                                   Low
Support for frequent updates                     Cannot handle frequent cube updates    Suitable for frequent updates
Resource requirements                            High                                   Low
Industry standard                                No current standards                   SQL standard
Access to the database through ODBC              Proprietary APIs; no ODBC access       Provides access through ODBC
Ability to support various deployments such as stand-alone, high speed client/server, intranet, extranet, Internet
Which is Preferred ?
Features to compare for MOLAP and ROLAP:
> Calculation intensity, complexity
> Data sparsity
> Database update
> Data volatility
> Volume of data
> Development time, learning curve
> Standards, interoperability
> Query response time
> Consistency, reliability
> Data loading time
> Security
> Network impact
> Vendor stability
OLAP - Summary
> Offers fast, flexible data summarization and analysis.
> OLAP servers are a superior technology for BI applications.
> Ability to summarize data in multiple ways and view trends over time.
> OLAP servers and relational databases can work in harmony.
Session Summary
In this session, we have:
> Understood the need for OLAP and the significance of multidimensional analysis in a data warehouse.
> Discussed the evolution of OLAP.
> Explained the architectures, characteristics, merits, and demerits of various OLAP solutions.
Thank you