OLAP – On Line Analytical Processing

Session Objectives
Objectives: At the end of this session, you will be able to:
> Define On Line Analytical Processing (OLAP)
> Understand the need for OLAP and its applications in BI
> Describe the various OLAP solutions and architectures
> Compare the different OLAP architectures
> Identify the evaluation parameters to consider when selecting an OLAP tool


What is OLAP?
> OLAP (On Line Analytical Processing) applications are designed for online, ad hoc data access and analysis.
> Data is organized into multiple dimensions.
> OLAP gives access to analytical content such as time-series and trend-analysis views and summary-level information.
> It is a set of functionality that facilitates multidimensional analysis.
> It offers drill-down, drill-across, and slice-and-dice capabilities.


OLAP - Fast Analysis
On Line Analytical Processing

No piles of paper, please!
Establish patterns
Data-based

Fast Analysis of Shared Multidimensional Information


• How many dimensions can we think in? E.g. year: 2 or 3
• How many types of values can we handle? E.g. cost: 1 or 2
• How many levels can we handle? E.g. sales, profit, number of products we can analyze: 5

Many parameters affect a measure (value): for example, sales are influenced by product, region, distribution channel, time, etc. Linear analysis means reports, and reports show many totals at only one level, which makes it difficult to identify the key parameters.

OLAP in an Enterprise

Uses of OLAP
Departments:
> Finance
> Marketing
> Sales
> Manufacturing
Analytical capabilities:
> Used by analysts and managers.
> Offers an aggregated view of the data, such as total revenues by customer profile, by geographical region, and by product line.

Functionality of OLAP Tools
> Provides the decision-support front end for data warehousing.
> Appropriate tools to access data from a relational database.
> Advanced statistical, financial, and analytical calculations.
> Appropriate tools to access or manage multidimensional data.

Features of OLAP Applications
OLAP analytical features:
> Multidimensional views of data
> Calculation-intensive capabilities
> Time intelligence
The calculation engine in OLAP tools has a wide range of built-in calculations, such as:
> Ratios
> Time calculations
> Statistics
> Ranking
> Custom formulas/algorithms
> Forecasting and modeling

Evolution of OLAP

> A star schema is a dimensional model created by mapping data entities from operational systems.
> It has a central table (the fact table) that links all the other tables (the dimension tables) together.
> Dimension: a grouping of the same category of information. For example, day, week, month, and year are all part of the Time dimension.
> Measure: a property that can be summed or averaged; these aggregates can be precomputed.
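To make the fact/dimension split concrete, here is a minimal sketch of a star schema using Python's built-in sqlite3 module. The fact rows and the store-to-region mapping (s1 in region A; s2 and s3 in region B) follow the sample data used later in this session; the table layout and the product names are hypothetical, chosen only for illustration.

```python
import sqlite3

# In-memory database; the schema is an illustrative assumption, not a product's layout.
con = sqlite3.connect(":memory:")
con.executescript("""
    -- Dimension tables: descriptive attributes for each dimension.
    CREATE TABLE product (prodId TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE store   (storeId TEXT PRIMARY KEY, region TEXT);
    -- Fact table: foreign keys to the dimensions plus the numeric measure amt.
    CREATE TABLE sale (
        prodId  TEXT REFERENCES product(prodId),
        storeId TEXT REFERENCES store(storeId),
        date    INTEGER,
        amt     INTEGER
    );
""")
con.executemany("INSERT INTO product VALUES (?, ?)",
                [("p1", "Milk"), ("p2", "Bread")])  # hypothetical names
con.executemany("INSERT INTO store VALUES (?, ?)",
                [("s1", "A"), ("s2", "B"), ("s3", "B")])
con.executemany("INSERT INTO sale VALUES (?, ?, ?, ?)", [
    ("p1", "s1", 1, 12), ("p2", "s1", 1, 11), ("p1", "s3", 1, 50),
    ("p2", "s2", 1, 8), ("p1", "s1", 2, 44), ("p1", "s2", 2, 4),
])

# A star join: the fact table joined to both dimensions, with the measure summed.
rows = con.execute("""
    SELECT st.region, p.name, SUM(f.amt)
    FROM sale AS f
    JOIN store   AS st ON f.storeId = st.storeId
    JOIN product AS p  ON f.prodId  = p.prodId
    GROUP BY st.region, p.name
    ORDER BY st.region, p.name
""").fetchall()
print(rows)  # [('A', 'Bread', 11), ('A', 'Milk', 56), ('B', 'Bread', 8), ('B', 'Milk', 54)]
```

Every query follows the same shape: filter and group on dimension attributes, aggregate the measure from the fact table.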

Facts and Measures
> Facts or measures are the key performance indicators of an enterprise.
> They are factual data about the subject area.
> They are numeric and summarized.

Dimension
What was sold? To whom was it sold? When was it sold? Where was it sold?
> Dimensions put measures in perspective.
> They are the what, when, and where qualifiers of the measures.
> Dimensions could be products, customers, time, geography, etc.

Star Schema

Star Schema Example

Star Schema with Sample Data

CUBE
– Multidimensional databases store information in the form of cubes.
– A cube is a collection of facts and related dimensions stored together in arrays.
(Figure: a cube labelled with Geography, Sales, HR, Time, and Product.)

Basic Terminology of a Cube
> Measures: the data values that are summarized and analyzed, for example sales figures or operational costs.
> Hierarchy: defines the navigation path for drilling up and drilling down, typically from a coarse-grained level (for example, Year) down to the most detailed one (for example, Day). All attributes in a hierarchy belong to the same dimension.
> Levels: the steps of a hierarchy; a dimension's levels are organized into one or more hierarchies.
> Members: the individual category values (for example, 2002 or 21Jan2002).
> Cells: the intersection of one member from every dimension; cells store the data for the measures.

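Basic Terminology of a Cube
> Dimensions consist of:
– Dimension name
– Level
– Hierarchy
– Member
(Figure: the Time dimension as a hierarchy with the levels of detail YEAR, with members 1999, 2000, and 2001, and QUARTER, with members Q1, Q2, Q3, and Q4 under each year.)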
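The level/hierarchy/member vocabulary can be sketched in a few lines of Python. This is a minimal illustration, assuming a single Time hierarchy YEAR > QUARTER > DAY with day members held as `datetime.date` values. The YEAR and QUARTER levels follow the figure; the DAY level's representation, the function names, and the sales values are hypothetical.

```python
from collections import defaultdict
from datetime import date

# Hypothetical Time dimension: one hierarchy, coarsest level first.
LEVELS = ["YEAR", "QUARTER", "DAY"]

def member_at(level: str, d: date) -> str:
    """Return the member of `level` that contains the day `d`."""
    if level == "YEAR":
        return str(d.year)
    if level == "QUARTER":
        return f"{d.year}-Q{(d.month - 1) // 3 + 1}"
    if level == "DAY":
        return d.isoformat()
    raise ValueError(f"unknown level {level!r}; expected one of {LEVELS}")

# Illustrative day-level cells: (day member, sales amount).
daily_sales = [(date(2001, 2, 10), 5), (date(2001, 5, 3), 7), (date(2001, 5, 20), 1)]

def aggregate_to(level: str) -> dict:
    """Drill up: sum the day-level cells into the members of a coarser level."""
    totals = defaultdict(int)
    for d, amt in daily_sales:
        totals[member_at(level, d)] += amt
    return dict(totals)

print(aggregate_to("QUARTER"))  # {'2001-Q1': 5, '2001-Q2': 8}
print(aggregate_to("YEAR"))     # {'2001': 13}
```

Each level is a function from a detailed member to its ancestor, so moving up the hierarchy is just re-keying and summing.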

Aggregates
> Add up amounts for day 1
> In SQL: SELECT sum(amt) FROM SALE WHERE date = 1
sale
prodId | storeId | date | amt
p1 | s1 | 1 | 12
p2 | s1 | 1 | 11
p1 | s3 | 1 | 50
p2 | s2 | 1 | 8
p1 | s1 | 2 | 44
p1 | s2 | 2 | 4
Result: 81

Aggregates
> Add up amounts by day
> In SQL: SELECT date, sum(amt) FROM SALE GROUP BY date
Input: the sale table above.
Result (ans):
date | sum
1 | 81
2 | 48
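The two SQL queries above can be run essentially as written against an in-memory SQLite database through Python's sqlite3 module. The `ORDER BY date` is an addition of this sketch so that the grouped rows come back in a predictable order; the table contents are the six sample rows from the slides.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sale (prodId TEXT, storeId TEXT, date INTEGER, amt INTEGER)")
con.executemany("INSERT INTO sale VALUES (?, ?, ?, ?)", [
    ("p1", "s1", 1, 12), ("p2", "s1", 1, 11), ("p1", "s3", 1, 50),
    ("p2", "s2", 1, 8), ("p1", "s1", 2, 44), ("p1", "s2", 2, 4),
])

# One aggregate over the filtered rows: the total for day 1.
day1_total = con.execute("SELECT sum(amt) FROM sale WHERE date = 1").fetchone()[0]

# One aggregate per group: a total for each day.
by_day = con.execute(
    "SELECT date, sum(amt) FROM sale GROUP BY date ORDER BY date"
).fetchall()

print(day1_total)  # 81
print(by_day)      # [(1, 81), (2, 48)]
```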

Another Example
> Add up amounts by day and product
> In SQL: SELECT date, prodId, sum(amt) FROM SALE GROUP BY date, prodId
Input: the sale table above.
Result:
prodId | date | amt
p1 | 1 | 62
p2 | 1 | 19
p1 | 2 | 48
Going from this result to the by-day totals is a roll-up; going from the by-day totals to this result is a drill-down.
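Roll-up and drill-down are aggregation at a coarser or finer grouping. A minimal pure-Python sketch, assuming the six sample rows above: summing the (date, prodId) totals over prodId reproduces the by-day totals, so a finer result can be rolled up without returning to the base table. The helper name `group_sum` is hypothetical.

```python
from collections import defaultdict

# Sample SALE rows: (prodId, storeId, date, amt).
SALE = [
    ("p1", "s1", 1, 12), ("p2", "s1", 1, 11), ("p1", "s3", 1, 50),
    ("p2", "s2", 1, 8), ("p1", "s1", 2, 44), ("p1", "s2", 2, 4),
]
COLUMN = {"prodId": 0, "storeId": 1, "date": 2}

def group_sum(rows, *dims):
    """Equivalent of SELECT dims, sum(amt) FROM SALE GROUP BY dims."""
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[COLUMN[d]] for d in dims)] += row[3]
    return dict(totals)

by_day_product = group_sum(SALE, "date", "prodId")  # finer level
by_day = group_sum(SALE, "date")                    # coarser level

# Roll up the finer result by summing away prodId.
rolled_up = defaultdict(int)
for (day, _prod), amt in by_day_product.items():
    rolled_up[(day,)] += amt

print(by_day_product)             # {(1, 'p1'): 62, (1, 'p2'): 19, (2, 'p1'): 48}
print(dict(rolled_up) == by_day)  # True
```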

Aggregates
> Operators: sum, min, max, count, median, and avg
> "Having" clause
> Using a dimension hierarchy:
– average by region (using the store hierarchy)
– maximum by month (using the date hierarchy)
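A short sqlite3 sketch of several of these operators combined with a HAVING clause, over the same six sample rows. The threshold of 20 is arbitrary. SQLite's core has no built-in median aggregate, so median is left out here.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sale (prodId TEXT, storeId TEXT, date INTEGER, amt INTEGER)")
con.executemany("INSERT INTO sale VALUES (?, ?, ?, ?)", [
    ("p1", "s1", 1, 12), ("p2", "s1", 1, 11), ("p1", "s3", 1, 50),
    ("p2", "s2", 1, 8), ("p1", "s1", 2, 44), ("p1", "s2", 2, 4),
])

# WHERE filters rows before grouping; HAVING filters whole groups after aggregation.
rows = con.execute("""
    SELECT storeId, sum(amt), min(amt), max(amt), count(*), avg(amt)
    FROM sale
    GROUP BY storeId
    HAVING sum(amt) > 20
    ORDER BY storeId
""").fetchall()
print(rows)
```

Store s2 (total 12) is dropped by the HAVING clause, leaving one row each for s1 and s3.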

The MOLAP Cube
Fact table view:
sale
prodId | storeId | amt
p1 | s1 | 12
p2 | s1 | 11
p1 | s3 | 50
p2 | s2 | 8
Multidimensional cube (dimensions = 2):
   | p1 | p2
s1 | 12 | 11
s2 |    | 8
s3 | 50 |

3-D Cube
Fact table view: the sale table above (prodId, storeId, date, amt).
Multidimensional cube (dimensions = 3), shown as one layer per day:
day 1:
   | p1 | p2
s1 | 12 | 11
s2 |    | 8
s3 | 50 |
day 2:
   | p1 | p2
s1 | 44 |
s2 | 4  |
s3 |    |
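In a MOLAP engine the cube is stored as an array addressed by dimension positions rather than as rows. A minimal sketch, assuming each dimension's members are mapped to array indices in the order listed: the six sample rows become a dense 2 × 3 × 2 array (date × store × product) in which empty cells hold None. Production MOLAP engines typically add compression and special handling for sparse cells, which this sketch omits; the `cell` helper is hypothetical.

```python
# Sample SALE rows: (prodId, storeId, date, amt).
SALE = [
    ("p1", "s1", 1, 12), ("p2", "s1", 1, 11), ("p1", "s3", 1, 50),
    ("p2", "s2", 1, 8), ("p1", "s1", 2, 44), ("p1", "s2", 2, 4),
]

# Member lists fix each dimension's index order (an assumption of this sketch).
DATES = [1, 2]
STORES = ["s1", "s2", "s3"]
PRODUCTS = ["p1", "p2"]

# Dense 3-D array cube[date][store][product]; None marks an empty cell.
cube = [[[None] * len(PRODUCTS) for _ in STORES] for _ in DATES]
for prod, store, day, amt in SALE:
    d, s, p = DATES.index(day), STORES.index(store), PRODUCTS.index(prod)
    cube[d][s][p] = (cube[d][s][p] or 0) + amt

def cell(day, store, prod):
    """Direct array lookup: no scan of the fact rows is needed."""
    return cube[DATES.index(day)][STORES.index(store)][PRODUCTS.index(prod)]

print(cube[0])              # [[12, 11], [None, 8], [50, None]]
print(cell(2, "s1", "p1"))  # 44
```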

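Example
(Figure: a cube with a Store axis (NY, SF, LA), a Product axis (Juice, Milk, Coke, Cream, Soap, Bread), and a Time axis (M, T, W, Th, F, S, S). One highlighted cell reads: 56 units of bread sold in LA on M. Arrows mark a roll-up to region along the Store axis and a roll-up to brand along the Product axis.)
> Dimensions: Time, Product, Store
> Attributes: Product (upc, price, …), Store (…)
> Hierarchies: Product → Brand → …; Day → Week → Quarter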

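Cube Aggregation: Roll-up
Example: computing sums over the 3-D cube above.
Sum over days (store × product), with row and column totals:
    | p1  | p2 | sum
s1  | 56  | 11 | 67
s2  | 4   | 8  | 12
s3  | 50  |    | 50
sum | 110 | 19 | 129
Each step from the 3-D cube toward the grand total (129) is a roll-up; the reverse direction is a drill-down.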
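Each roll-up in the example sums away one or more dimensions. A minimal pure-Python sketch, assuming the cube is held as a dict from (date, store, product) coordinates to amounts; the function name `roll_up` and its `keep` parameter are hypothetical. The coarser results are computed from the intermediate store × product result rather than from the base cells, which gives the same totals because sum is distributive.

```python
from collections import defaultdict

# Non-empty cells of the 3-D cube: (date, store, product) -> amt.
BASE = {
    (1, "s1", "p1"): 12, (1, "s1", "p2"): 11, (1, "s3", "p1"): 50,
    (1, "s2", "p2"): 8, (2, "s1", "p1"): 44, (2, "s2", "p1"): 4,
}

def roll_up(cells, keep):
    """Sum away every coordinate whose position is not listed in `keep`."""
    out = defaultdict(int)
    for coords, amt in cells.items():
        out[tuple(coords[i] for i in keep)] += amt
    return dict(out)

store_prod = roll_up(BASE, keep=(1, 2))    # sum over date
by_store = roll_up(store_prod, keep=(0,))  # then over product
by_prod = roll_up(store_prod, keep=(1,))   # or over store
grand = roll_up(store_prod, keep=())       # everything

print(by_store)  # {('s1',): 67, ('s3',): 50, ('s2',): 12}
print(by_prod)   # {('p1',): 110, ('p2',): 19}
print(grand)     # {(): 129}
```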

Aggregation Using Hierarchies
Hierarchy: store → region → country (store s1 is in region A; stores s2 and s3 are in region B).
Rolling the cube above up from store to region, summing over days:
         | p1 | p2
region A | 56 | 11
region B | 54 | 8
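Rolling up along a hierarchy replaces each member with its parent before summing. A minimal sketch, assuming the cube cells are a dict keyed by (date, store, product) and the slide's store-to-region mapping is a plain dict; `REGION_OF` and `roll_up_to_region` are hypothetical names. A further step to country would use a second parent mapping in the same way, but the slide does not name the country members, so that step is not shown.

```python
from collections import defaultdict

# Non-empty cells of the 3-D cube: (date, store, product) -> amt.
BASE = {
    (1, "s1", "p1"): 12, (1, "s1", "p2"): 11, (1, "s3", "p1"): 50,
    (1, "s2", "p2"): 8, (2, "s1", "p1"): 44, (2, "s2", "p1"): 4,
}

# Parent mapping for the store hierarchy: store -> region (from the slide).
REGION_OF = {"s1": "region A", "s2": "region B", "s3": "region B"}

def roll_up_to_region(cells):
    """Replace each store by its region, sum over date, keep product."""
    out = defaultdict(int)
    for (_day, store, prod), amt in cells.items():
        out[(REGION_OF[store], prod)] += amt
    return dict(out)

by_region = roll_up_to_region(BASE)
print(by_region)
# {('region A', 'p1'): 56, ('region A', 'p2'): 11, ('region B', 'p1'): 54, ('region B', 'p2'): 8}
```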

Slicing
> In SQL: SELECT * FROM SALE WHERE date = 1
Slicing the 3-D cube at TIME = day 1 leaves a 2-D cube:
   | p1 | p2
s1 | 12 | 11
s2 |    | 8
s3 | 50 |
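Slicing fixes one dimension at a single member; dicing, the closely related operation named in the "slice and dice" capability, keeps a subset of members on several dimensions. A minimal sketch over a dict-of-cells cube, assuming (date, store, product) keys; `slice_at_date` and `dice` are hypothetical helper names.

```python
# Non-empty cells of the 3-D cube: (date, store, product) -> amt.
BASE = {
    (1, "s1", "p1"): 12, (1, "s1", "p2"): 11, (1, "s3", "p1"): 50,
    (1, "s2", "p2"): 8, (2, "s1", "p1"): 44, (2, "s2", "p1"): 4,
}

def slice_at_date(cells, day):
    """Fix Time = day; the result is a 2-D (store, product) cube."""
    return {(store, prod): amt
            for (d, store, prod), amt in cells.items() if d == day}

def dice(cells, days, stores, products):
    """Keep only cells whose members fall in the given subsets (still 3-D)."""
    return {key: amt for key, amt in cells.items()
            if key[0] in days and key[1] in stores and key[2] in products}

day1 = slice_at_date(BASE, 1)
sub = dice(BASE, days={1, 2}, stores={"s1", "s2"}, products={"p1"})
print(day1)  # {('s1', 'p1'): 12, ('s1', 'p2'): 11, ('s3', 'p1'): 50, ('s2', 'p2'): 8}
print(sub)   # {(1, 's1', 'p1'): 12, (2, 's1', 'p1'): 44, (2, 's2', 'p1'): 4}
```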

OLAP Solutions and Architecture

OLAP - Classification
Online Analytical Processing (OLAP) can be done on:
> Relational databases
> Multidimensional databases
OLAP products are grouped into three categories:
> Relational OLAP (ROLAP)
> Multidimensional OLAP (MOLAP)
> Hybrid OLAP (HOLAP)

MOLAP
> Multidimensional OLAP
> MOLAP is a technology that uses a multidimensional database to store data as an n-dimensional cube.
(Figure: a cube with Brand and Geography axes.)

Architecture of MOLAP
> Data Mart Server: RDBMS, connectivity middleware
> MOLAP Server: MDDBMS/data cube, MOLAP application. Its connection to the data mart server is non-live and is used only for updating the MOLAP data cube.
> Desktop systems on the LAN: MOLAP client tools
> Thin clients (WWW browser) reach the server over the intranet/Internet through a router and firewall
Issues:
• Size of the data cube
• Cube deployment
• Size of the update data set

MOLAP Products
> Oracle Express Server (Oracle)
> Cognos PowerPlay Transformer
> Essbase (Hyperion Software)
> Holos (Seagate Software)

Architecture of ROLAP
> Data Mart Server: RDBMS, connectivity middleware
> ROLAP Server: ROLAP application
> Desktop systems on the LAN: ROLAP client tools
> Thin clients (WWW browser) reach the server over the intranet/Internet through a router/firewall
Issues:
• Aggregate awareness
• Response time
• Network capacity

ROLAP Products
> Brio Query Enterprise
> Business Objects
> Metacube
> DSS Server
> Information Advantage

Architecture of HOLAP
> MOLAP Server: MDDBMS/data cube, MOLAP application
> ROLAP Server: ROLAP application
> Desktop systems on the LAN: HOLAP client tools
> Router/firewall
Issues:
• Cube elements
• Integration with the RDBMS

HOLAP Products
> Holos (Seagate Software)
> Microsoft SQL Server OLAP Services
> Pilot Software's Pilot Decision Support Suite
> SAS


Comparison of Architectures
Architectural feature | MOLAP | ROLAP
Number of dimensions | Ten or less | Unlimited
Support for a large number of users | Limited support | Good
Scalability | Poor | Good
Complex multidimensional analysis | Easier to achieve | Difficult to achieve
Volume of data storage | Up to 50 GB | Hundreds of gigabytes and terabytes
Storage of information | Through cubes | SQL result sets
User interface and functionality | Good | Normal
Common access language | NA | SQL
Nature of data | Stores summarized data | Stores detailed as well as summarized data

Strengths and Weaknesses of MOLAP/ROLAP
Parameter | MOLAP | ROLAP
Application design | Essentially the definition of the dimensional model and calculation rules | Uses two-dimensional tables stored in RDBMSs (data is stored in a star schema or snowflake schema)
Aggregation techniques | Measures are precalculated and stored at each hierarchy summary level during load time | Summary tables are implemented in the relational database
Multidimensional analysis | Drill down, drill up, drill across, and slicing/dicing | Drill down, drill up, slicing, and dicing
Query performance | Instant response | Slower
Value-added functions | Supports complex functions such as % change, ranking, etc. | Limited value-added functions
User-defined calculations | Calculated from cubes | Calculated on the fly from the database
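The "Aggregation techniques" and "User-defined calculations" rows contrast precomputing aggregates at load time (MOLAP) with computing them on the fly (ROLAP). A minimal sketch of that trade-off, assuming the MOLAP side precomputes every one of the 2^n group-by combinations of the three sample dimensions; real MOLAP products typically choose which levels to store, and all function names here are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

DIMS = ("date", "storeId", "prodId")
# Sample SALE rows as (date, storeId, prodId, amt).
ROWS = [
    (1, "s1", "p1", 12), (1, "s1", "p2", 11), (1, "s3", "p1", 50),
    (1, "s2", "p2", 8), (2, "s1", "p1", 44), (2, "s2", "p1", 4),
]

def group_sum(dims):
    """Scan the detail rows and SUM(amt) grouped by `dims`."""
    out = defaultdict(int)
    for row in ROWS:
        out[tuple(row[DIMS.index(d)] for d in dims)] += row[3]
    return dict(out)

# MOLAP-style load: store every group-by combination up front (2**3 = 8).
PRECOMPUTED = {dims: group_sum(dims)
               for k in range(len(DIMS) + 1)
               for dims in combinations(DIMS, k)}

def molap_query(dims):
    return PRECOMPUTED[dims]  # lookup only; dims must be given in DIMS order

def rolap_query(dims):
    return group_sum(dims)    # aggregate from the detail rows at query time

print(len(PRECOMPUTED))                                        # 8
print(molap_query(("storeId",)) == rolap_query(("storeId",)))  # True
```

The lookup is instant but the load step and storage grow with the number of combinations, which mirrors the "Processing overhead" and "Resource requirements" rows that follow.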

Strengths and Weaknesses of MOLAP/ROLAP
Parameter | MOLAP | ROLAP
Processing overhead for large input data sets | High | Low
Support for frequent updates | Cannot handle frequent updates of cubes | Suitable for frequent updates
Resource requirements | High | Low
Industry standard | No current standards | SQL standard
Access to the database through ODBC | The databases have proprietary APIs and do not provide access through ODBC | Provides access through ODBC

Session Summary
In this session, we have:
> Understood the need for OLAP and the significance of multidimensional analysis in a data warehouse.
> Discussed the evolution of OLAP.
> Explained the architectures, characteristics, and merits and demerits of various OLAP solutions.

Thank you.

