Data Warehousing OLAP Technology
What is a Data Warehouse?
Defined in many different ways, but not rigorously.
A decision support database that is maintained separately from the organization's operational database.
Supports information processing by providing a solid platform of consolidated, historical data for analysis.
"A data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile collection of data in support of management's decision-making process." (W. H. Inmon)
Data warehousing: the process of constructing and using data warehouses.
January 28, 2013
Data Warehouse: Subject-Oriented
Organized around major subjects, such as customer, product, sales.
Focuses on the modeling and analysis of data for decision makers, not on daily operations or transaction processing.
Provides a simple and concise view around particular subject issues by excluding data that are not useful in the decision support process.
Data Warehouse: Integrated
Constructed by integrating multiple, heterogeneous data sources: relational databases, flat files, on-line transaction records.
Data cleaning and data integration techniques are applied to ensure consistency in naming conventions, encoding structures, attribute measures, etc. among different data sources.
E.g., hotel price: currency, tax, breakfast covered, etc.
When data is moved to the warehouse, it is converted.
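The following Python sketch shows the kind of conversion applied when hotel prices from two sources are loaded into the warehouse. The two source formats, the exchange rate, the tax rate, and the breakfast price are hypothetical values chosen for illustration. They are not figures from the slides.

```python
from dataclasses import dataclass

# Hypothetical conversion constants (assumptions, not real rates).
EUR_TO_USD = 1.10
BREAKFAST_USD = 15.0


@dataclass
class WarehousePrice:
    hotel: str
    usd_with_tax_no_breakfast: float  # the single convention used in the warehouse


def from_source_a(hotel: str, usd_before_tax: float, tax_rate: float) -> WarehousePrice:
    # Hypothetical source A: USD, tax not included, breakfast not included.
    return WarehousePrice(hotel, round(usd_before_tax * (1 + tax_rate), 2))


def from_source_b(hotel: str, eur_all_inclusive: float) -> WarehousePrice:
    # Hypothetical source B: EUR, tax and breakfast both included.
    usd = eur_all_inclusive * EUR_TO_USD
    return WarehousePrice(hotel, round(usd - BREAKFAST_USD, 2))


rows = [from_source_a("Hotel X", 100.0, 0.10), from_source_b("Hotel Y", 100.0)]
for r in rows:
    print(r.hotel, r.usd_with_tax_no_breakfast)
```

After conversion, both records follow one currency, tax, and breakfast convention, so they can be compared directly.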
Data Warehouse: Time-Variant
The time horizon for the data warehouse is significantly longer than that of operational systems.
Operational database: current-value data.
Data warehouse data: provide information from a historical perspective (e.g., the past 5-10 years).
Every key structure in the data warehouse contains an element of time, explicitly or implicitly.
But the key of operational data may or may not contain a "time element".
Data Warehouse: Non-Volatile
A physically separate store of data transformed from the operational environment.
Operational update of data does not occur in the data warehouse environment.
Does not require transaction processing, recovery, and concurrency control mechanisms.
Requires only two operations in data accessing: initial loading of data and access of data.
Data Warehouse vs. Heterogeneous DBMS
Traditional heterogeneous DB integration: a query-driven approach
  Build wrappers/mediators on top of heterogeneous databases.
  When a query is posed to a client site, a meta-dictionary is used to translate the query into queries appropriate for the individual heterogeneous sites involved, and the results are integrated into a global answer set.
  Complex information filtering; queries compete for resources.
Data warehouse: update-driven, high performance
  Information from heterogeneous sources is integrated in advance and stored in warehouses for direct query and analysis.
Data Warehouse vs. Operational DBMS
OLTP (on-line transaction processing)
  Major task of traditional relational DBMS
  Day-to-day operations: purchasing, inventory, banking, manufacturing, payroll, registration, accounting, etc.
OLAP (on-line analytical processing)
  Major task of data warehouse system
  Data analysis and decision making
Distinct features (OLTP vs. OLAP):
  User and system orientation: customer vs. market
  Data contents: current, detailed vs. historical, consolidated
  Database design: ER + application vs. star + subject
  View: current, local vs. evolutionary, integrated
  Access patterns: update vs. read-only but complex queries
OLTP vs. OLAP
                    OLTP                                    OLAP
users               clerk, IT professional                  knowledge worker
function            day-to-day operations                   decision support
DB design           application-oriented                    subject-oriented
data                current, up-to-date, detailed,          historical, summarized, multidimensional,
                    flat relational, isolated               integrated, consolidated
usage               repetitive                              ad-hoc
access              read/write, index/hash on prim. key     lots of scans
unit of work        short, simple transaction               complex query
# records accessed  tens                                    millions
# users             thousands                               hundreds
DB size             100 MB-GB                               100 GB-TB
metric              transaction throughput                  query throughput, response
Data Warehouse Design
Objectives
What is OLAP
Need for OLAP
Features & functions of OLAP
Different OLAP models
OLAP implementations
Demand for OLAP
There are three approaches to developing data marts (DM); in all approaches, data marts rest on the dimensional model.
Data marts are sufficient for basic data analysis.
Users need to go beyond such basic analysis.
Demand for OLAP
Need for multidimensional analysis
Fast access & powerful calculations
Limitations of other analysis methods, like:
  SQL
  Spreadsheets
  Report writers
Demand for OLAP
Traditional tools (report writers, query products, spreadsheets, and language interfaces) do not meet user expectations for multidimensional analysis with complex calculations.
Tools used with OLTP and basic DW environments are not up to the task.
OLAP is the Answer!
OLAP is a category of software technology that enables analysts, managers, and executives to gain insight into the data through fast, consistent, interactive access to a wide variety of possible views of information that has been transformed from raw data to reflect the real dimensionality of the enterprise as understood by the user.
Why is OLAP useful?
Facilitates multidimensional data analysis by pre-computing aggregates across many sets of dimensions.
Provides for:
  Greater speed and responsiveness
  Improved user interactivity
Data Warehouses
A data warehouse is based on a multidimensional data model, which views data in the form of a data cube.
A data cube allows data to be modeled and viewed in multiple dimensions.
In data warehousing literature, an n-D base cube is called a base cuboid. The topmost 0-D cuboid, which holds the highest level of summarization, is called the apex cuboid. The lattice of cuboids forms a data cube.
Lattice of Cuboids
0-D (apex) cuboid: all
1-D cuboids: time; item; location; supplier
2-D cuboids: time, item; time, location; time, supplier; item, location; item, supplier; location, supplier
3-D cuboids: time, item, location; time, item, supplier; time, location, supplier; item, location, supplier
4-D (base) cuboid: time, item, location, supplier
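The lattice can be enumerated mechanically. When there are no concept hierarchies, each subset of the dimensions is exactly one cuboid. Here is a small Python sketch; the function name cuboid_lattice is illustrative.

```python
from itertools import combinations


def cuboid_lattice(dims):
    """Map k -> list of k-D cuboids (tuples of dimension names), from 0-D apex to n-D base."""
    return {k: list(combinations(dims, k)) for k in range(len(dims) + 1)}


dims = ["time", "item", "location", "supplier"]
lattice = cuboid_lattice(dims)
for k, cuboids in lattice.items():
    tag = " (apex)" if k == 0 else " (base)" if k == len(dims) else ""
    print(f"{k}-D{tag}: {len(cuboids)} cuboid(s)")
```

For the four dimensions above, this yields 1 + 4 + 6 + 4 + 1 = 16 cuboids.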
CUBE
Fact table view: sale
prodId  storeId  date  amt
p1      c1       1     12
p2      c1       1     11
p1      c3       1     50
p2      c2       1     8
p1      c1       2     44
p1      c2       2     4

Multi-dimensional cube (dimensions = 3: product, store, date):
day 1   c1   c2   c3
p1      12        50
p2      11   8
day 2   c1   c2   c3
p1      44   4
p2
Aggregates
• Add up amounts for day 1 (sale fact table above)
• In SQL: SELECT sum(amt) FROM SALE WHERE date = 1
• Result: 81
Aggregates
• Add up amounts by day
• In SQL: SELECT date, sum(amt) FROM SALE GROUP BY date
• Result:
  date  sum
  1     81
  2     48
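Both queries can be checked against the sample data with Python's built-in sqlite3 module. Here SQLite simply stands in for a relational engine, and the column types are assumptions.

```python
import sqlite3

# In-memory copy of the sale fact table from the slides.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sale (prodId TEXT, storeId TEXT, date INTEGER, amt INTEGER)")
conn.executemany(
    "INSERT INTO sale VALUES (?, ?, ?, ?)",
    [("p1", "c1", 1, 12), ("p2", "c1", 1, 11), ("p1", "c3", 1, 50),
     ("p2", "c2", 1, 8), ("p1", "c1", 2, 44), ("p1", "c2", 2, 4)],
)

day1_total = conn.execute("SELECT sum(amt) FROM sale WHERE date = 1").fetchone()[0]
by_day = conn.execute("SELECT date, sum(amt) FROM sale GROUP BY date ORDER BY date").fetchall()
print(day1_total)  # 81
print(by_day)      # [(1, 81), (2, 48)]
```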
Aggregates
Operators: sum, count, max, min, median, avg
"Having" clause
Using dimension hierarchy:
  average by region (within store)
  maximum by month (within date)
Cube Aggregation
Example: computing sums over the 3-D cube (day 1, day 2)
Roll up over date:
        c1   c2   c3
p1      56   4    50
p2      11   8
Roll up further:
  by store: c1 67, c2 12, c3 50
  by product: p1 110, p2 19
  total: 129
Roll-up moves from the detailed cube toward these totals; drill-down moves in the opposite direction.
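Below is a minimal Python sketch of the roll-up arithmetic on the same six facts. The helper roll_up is hypothetical and not part of any OLAP product.

```python
from collections import defaultdict

# sale fact rows: (prodId, storeId, date, amt)
SALE = [("p1", "c1", 1, 12), ("p2", "c1", 1, 11), ("p1", "c3", 1, 50),
        ("p2", "c2", 1, 8), ("p1", "c1", 2, 44), ("p1", "c2", 2, 4)]


def roll_up(rows, keep):
    """Sum amt over every dimension not listed in `keep` (column indices into the row)."""
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[i] for i in keep)] += row[3]
    return dict(totals)


by_prod_store = roll_up(SALE, keep=(0, 1))  # date rolled up
by_store = roll_up(SALE, keep=(1,))         # date and product rolled up
by_prod = roll_up(SALE, keep=(0,))          # date and store rolled up
grand_total = roll_up(SALE, keep=())        # everything rolled up
print(by_prod_store)
print(by_store, by_prod, grand_total)
```

This sketch recomputes every aggregate from the base facts. An OLAP engine would usually derive a coarser aggregate, such as the by-store totals, from a finer one it has already computed.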
Cube Operators
Using the notation sale(store, product, date), where * stands for "all":
  sale(c1,*,*) = 67
  sale(c2,p2,*) = 8
  sale(*,p1,*) = 110
  sale(*,*,*) = 129
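The sale(...) notation can be read as a function that sums every fact matching its non-* arguments. The sketch below assumes the argument order (store, product, date) implied by the examples.

```python
# sale fact rows: (storeId, prodId, date, amt); "*" in a query means "any value".
SALE = [("c1", "p1", 1, 12), ("c1", "p2", 1, 11), ("c3", "p1", 1, 50),
        ("c2", "p2", 1, 8), ("c1", "p1", 2, 44), ("c2", "p1", 2, 4)]


def sale(store="*", prod="*", date="*"):
    """Evaluate sale(store, prod, date) by summing every fact that matches the pattern."""
    pattern = (store, prod, date)
    return sum(
        row[3]
        for row in SALE
        if all(p == "*" or p == v for p, v in zip(pattern, row[:3]))
    )


print(sale("c1", "*", "*"), sale("c2", "p2", "*"), sale("*", "p1", "*"), sale("*", "*", "*"))
```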
Extended Cube
day 1           c1   c2   c3   *
p1              12        50   62
p2              11   8         19
*               23   8    50   81
day 2           c1   c2   c3   *
p1              44   4         48
p2
*               44   4         48
* (all days)    c1   c2   c3   *
p1              56   4    50   110
p2              11   8         19
*               67   12   50   129
Example: sale(*,p2,*) = 19
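The extended cube can be materialized in one pass. Each fact adds its amount to the 2^3 = 8 cells formed by replacing any subset of its coordinates with *. The Python sketch below stores the cube as a dict, which is an illustrative choice of sparse representation.

```python
from itertools import product

# sale fact rows: (storeId, prodId, date, amt)
SALE = [("c1", "p1", 1, 12), ("c1", "p2", 1, 11), ("c3", "p1", 1, 50),
        ("c2", "p2", 1, 8), ("c1", "p1", 2, 44), ("c2", "p1", 2, 4)]


def extended_cube(rows):
    """Materialize every cell, including '*' (all) cells, of a 3-D cube with a sum measure."""
    cube = {}
    for store, prod, date, amt in rows:
        # Each fact contributes to 8 cells: every mix of its own coordinate values and '*'.
        for cell in product((store, "*"), (prod, "*"), (date, "*")):
            cube[cell] = cube.get(cell, 0) + amt
    return cube


cube = extended_cube(SALE)
print(cube[("*", "p2", "*")], cube[("*", "*", 1)], cube[("c1", "*", 1)], cube[("*", "*", "*")])
```

Cells that receive no facts, such as (c3, p2, 2), are simply absent from the dict.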
Aggregation Using Hierarchies
Customer hierarchy: customer -> region -> country
(customer c1 in Region A; customers c2, c3 in Region B)
Summing the cube (day 1, day 2) by product and region:
        region A   region B
p1      56         54
p2      11         8
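Aggregating along a hierarchy amounts to looking up each fact's higher-level member in the dimension table and grouping by that member. The sketch assumes the customer-to-region mapping from the slide is stored as a plain dict.

```python
from collections import defaultdict

# sale fact rows: (prodId, storeId, date, amt)
SALE = [("p1", "c1", 1, 12), ("p2", "c1", 1, 11), ("p1", "c3", 1, 50),
        ("p2", "c2", 1, 8), ("p1", "c1", 2, 44), ("p1", "c2", 2, 4)]

# Dimension table for the customer hierarchy: customer -> region.
REGION = {"c1": "A", "c2": "B", "c3": "B"}


def sum_by_product_region(rows, region_of):
    """Roll customers up to regions (and sum out date) while keeping product."""
    totals = defaultdict(int)
    for prod, cust, _date, amt in rows:
        totals[(prod, region_of[cust])] += amt
    return dict(totals)


totals = sum_by_product_region(SALE, REGION)
print(totals)
```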
Pivoting
Fact table view: the sale fact table shown earlier.
Multi-dimensional cube: the 3-D cube (product, store, date) built from it.
2-D view, product x store (date summed out):
        c1   c2   c3
p1      56   4    50
p2      11   8
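For a 2-D view, pivoting only swaps the row and column axes. The minimal sketch below works on a cross-tab stored as nested dicts, a representation chosen for illustration.

```python
def pivot(table):
    """Swap the row and column axes of a cross-tab stored as {row: {col: value}}."""
    rotated = {}
    for row_key, cols in table.items():
        for col_key, value in cols.items():
            rotated.setdefault(col_key, {})[row_key] = value
    return rotated


# Product x store view (date already summed out), as on the slide.
by_product = {"p1": {"c1": 56, "c2": 4, "c3": 50}, "p2": {"c1": 11, "c2": 8}}
by_store = pivot(by_product)
print(by_store)  # {'c1': {'p1': 56, 'p2': 11}, 'c2': {'p1': 4, 'p2': 8}, 'c3': {'p1': 50}}
```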
Cube Aggregates Lattice (top to bottom)
city, product, date: the base cube (day 1 and day 2)
city, product: p1 (c1 56, c2 4, c3 50); p2 (c1 11, c2 8)    city, date    product, date
city: c1 67, c2 12, c3 50    product: p1 110, p2 19    date
all: 129
Use a greedy algorithm to decide what to materialize.
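The slide does not say which greedy algorithm is meant. Assuming it refers to the greedy view-selection heuristic of Harinarayan, Rajaraman and Ullman (SIGMOD'96), a sketch looks like the one below, and the view sizes are hypothetical row counts. A view's cost is the size of the smallest materialized view it can be computed from. The benefit of a candidate v is the total cost reduction over every view answerable from v. At each step, the candidate with the largest benefit is materialized.

```python
DIMS = ("city", "product", "date")

# Hypothetical sizes (row counts) of each group-by view; illustrative only.
SIZE = {
    frozenset(DIMS): 100,
    frozenset({"city", "product"}): 50,
    frozenset({"city", "date"}): 80,
    frozenset({"product", "date"}): 60,
    frozenset({"city"}): 3,
    frozenset({"product"}): 2,
    frozenset({"date"}): 30,
    frozenset(): 1,
}


def greedy_materialize(size, k):
    """Pick k views to materialize in addition to the top (base) view.

    A view w can be computed from a materialized view v when w <= v (w's grouping
    attributes are a subset of v's); w's cost is the size of the smallest such v.
    """
    top = max(size, key=len)
    chosen = [top]

    def cost(w):
        return min(size[v] for v in chosen if w <= v)

    def benefit(v):
        return sum(max(0, cost(w) - size[v]) for w in size if w <= v)

    for _ in range(k):
        best = max((v for v in size if v not in chosen), key=benefit)
        chosen.append(best)
    return chosen[1:]


picked = greedy_materialize(SIZE, k=2)
for view in picked:
    print(sorted(view))
```

With these made-up sizes, (city, product) is chosen first with benefit 4 × 50 = 200, and (product) is chosen second with benefit 2 × 48 = 96.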
Dimension Hierarchies
city -> state -> all (all cities)
city   state
c1     CA
c2     NY
Dimension Hierarchies
With the city -> state hierarchy, the lattice gains new nodes (not all arcs shown):
all; city; product; date; state; city, product; city, date; product, date; state, product; state, date; city, product, date; state, product, date
Interesting Hierarchy
Time hierarchy: days -> weeks -> all, and days -> months -> quarters -> years -> all
Conceptual dimension table:
day   week   month   quarter   year
1     1      1       1         2000
2     1      1       1         2000
3     1      1       1         2000
4     1      1       1         2000
5     1      1       1         2000
6     1      1       1         2000
7     1      1       1         2000
8     2      1       1         2000
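A time dimension table like the one above can be generated from calendar dates. Following the slide's sample, this sketch numbers weeks in blocks of seven days from the first day of the table, which is an assumption and not ISO week numbering.

```python
from datetime import date, timedelta


def time_dimension(start, n_days):
    """Build a time dimension table: day -> week, and day -> month -> quarter -> year."""
    rows = []
    for i in range(n_days):
        d = start + timedelta(days=i)
        rows.append({
            "day": i + 1,
            "week": i // 7 + 1,  # 7-day blocks from the first day (assumption)
            "month": d.month,
            "quarter": (d.month - 1) // 3 + 1,
            "year": d.year,
        })
    return rows


table = time_dimension(date(2000, 1, 1), 8)
print([r["week"] for r in table])  # [1, 1, 1, 1, 1, 1, 1, 2]
```

Under this numbering, days 29 to 35 (Jan 29 to Feb 4, 2000) all fall in week 5 but belong to two different months. Weeks therefore cannot be rolled up into months, and the hierarchy needs two separate paths.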
A Sample Data Cube
Dimensions: Product (TV, PC, VCR, sum), Date (1Qtr, 2Qtr, 3Qtr, 4Qtr, sum), Country (U.S.A., Canada, Mexico, sum)
Example aggregate cells:
  Total annual sales of TVs in U.S.A.
  Total annual sales of PCs in U.S.A.
  Total annual sales of VCRs in U.S.A.
  Total annual sales in U.S.A.
  Total Q1 sales in U.S.A., Canada, and Mexico
  Total sales in Canada
  Total Q1 and Q2 sales in all countries
  Total sales in all countries
OLAP Operations
Roll-Up
Drill-Down
Slice & Dice
Pivot
Drill-Across
Drill-Through
[Figure: OLAP Operations]
[Figure: Slicing]
[Figure: Dicing (Sub-cube)]
[Figure: Roll-Up]
[Figure: Drill-Down]
Other OLAP Operations
o Drill-Across: queries involving more than one fact table.
o Drill-Through: uses SQL to drill through the bottom level of a data cube down to its back-end relational tables.
o Pivot (rotate): a visualization operation that rotates the data axes in view to provide an alternative presentation of the data. Other examples include rotating the axes in a 3-D cube, or transforming a 3-D cube into a series of 2-D planes.
Other OLAP Operations
o Moving Averages
o Growth Rates
o Depreciation
o Currency Conversion
o Statistical Functions
o Top N or Bottom N queries
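Several of these operations reduce to simple computations over a measure series. The Python sketch below uses hypothetical quarterly figures; real OLAP tools expose these operations as built-in functions with their own conventions.

```python
import heapq


def moving_average(values, window):
    """Trailing moving average; the first window-1 positions have no value (None)."""
    return [
        sum(values[i - window + 1:i + 1]) / window if i >= window - 1 else None
        for i in range(len(values))
    ]


def growth_rates(values):
    """Period-over-period growth rate, e.g. 0.2 means +20%."""
    return [(cur - prev) / prev for prev, cur in zip(values, values[1:])]


def top_n(totals, n):
    """Top-N members by measure value, largest first."""
    return heapq.nlargest(n, totals.items(), key=lambda kv: kv[1])


quarterly_sales = [100, 120, 90, 150]  # hypothetical figures
print(moving_average(quarterly_sales, 2))        # [None, 110.0, 105.0, 120.0]
print(growth_rates(quarterly_sales))
print(top_n({"c1": 67, "c2": 12, "c3": 50}, 2))  # [('c1', 67), ('c3', 50)]
```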
Conceptual vs. Actual
The "cube" is a logical way of visualizing the data in an OLAP setting, not how the data is actually represented on disk.
Two ways of storing data:
  ROLAP: Relational OLAP
  MOLAP: Multidimensional OLAP
OLAP & CUBE
Construction of the data cube is key to the operation of OLAP.
The computation process creates a set of aggregates on the various dimensions of the data.
The CUBE operator
[Figure: An example of the CUBE Operator]
The CUBE Operator
Proposed by Gray et al.*
Effectively involves a series of GROUP-BY operations to aggregate data.
Computes aggregates over the power set of the grouping attributes (one GROUP-BY per subset), according to:
  A measure
  An aggregator function
*J. Gray, S. Chaudhuri, A. Bosworth, A. Layman, D. Reichart, M. Venkatrao, F. Pellow, and H. Pirahesh. Data cube: A relational aggregation operator generalizing group-by, cross-tab, and sub-totals. Data Mining and Knowledge Discovery, 1:29-54, 1997.
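Many SQL engines provide this operator directly as GROUP BY CUBE (...). It is part of the SQL:1999 OLAP extensions and is available in, e.g., Oracle, SQL Server, and PostgreSQL. SQLite has no CUBE, so the sketch below spells the operator out as a UNION ALL of one GROUP BY per subset of the dimensions. It writes '*' for rolled-up dimensions to match the earlier slides; Gray et al. use an ALL value, and standard SQL uses NULL. Table and column names are assumed to be trusted, because they are interpolated into the SQL text.

```python
import sqlite3
from itertools import combinations


def cube_sql(table, dims, measure):
    """Emulate CUBE as a UNION ALL of one GROUP BY per subset of dims (2**n queries).

    Dimensions that are aggregated away are reported as the string '*'.
    """
    parts = []
    for k in range(len(dims), -1, -1):
        for group in combinations(dims, k):
            cols = [d if d in group else f"'*' AS {d}" for d in dims]
            sql = f"SELECT {', '.join(cols)}, sum({measure}) AS total FROM {table}"
            if group:
                sql += f" GROUP BY {', '.join(group)}"
            parts.append(sql)
    return "\nUNION ALL\n".join(parts)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sale (prodId TEXT, storeId TEXT, date INTEGER, amt INTEGER)")
conn.executemany(
    "INSERT INTO sale VALUES (?, ?, ?, ?)",
    [("p1", "c1", 1, 12), ("p2", "c1", 1, 11), ("p1", "c3", 1, 50),
     ("p2", "c2", 1, 8), ("p1", "c1", 2, 44), ("p1", "c2", 2, 4)],
)
rows = conn.execute(cube_sql("sale", ["storeId", "prodId", "date"], "amt")).fetchall()
print(len(rows))                     # 27 result rows across the 8 group-bys
print(("*", "*", "*", 129) in rows)  # True: the apex cell
```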
CUBING Problem
Problem: this generates a lot of data and work ($2^n$ group-bys in total, where n is the number of dimensions).
Solution: optimized algorithms that run faster, consume less memory, and perform fewer I/Os.
Efficient Computation of Data Cubes
o ROLAP-based cubing algorithms (Agarwal et al. '96)
o Array-based cubing algorithm (Zhao et al. '97)
S. Agarwal, R. Agrawal, P. M. Deshpande, A. Gupta, J. F. Naughton, R. Ramakrishnan, and S. Sarawagi. On the computation of multidimensional aggregates. In VLDB'96.
Y. Zhao, P. M. Deshpande, and J. F. Naughton. An array-based algorithm for simultaneous multidimensional aggregates. In SIGMOD'97.
Efficient Computation of Data Cubes
o How many cuboids are there in a cube with 3 dimensions?
o Answer: as many as there are group-by operations, $2^3 = 8$, when no hierarchies are involved.
o With hierarchies: Total cuboids = $\prod_{i=1}^{n} (L_i + 1)$, where $L_i$ is the number of levels associated with dimension $i$ (the +1 accounts for the virtual "all" level).
o E.g., 10 dimensions & 4 levels for each dimension: Total cuboids = $5^{10}$ = 9,765,625.
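The formula is easy to evaluate directly. Here is a short Python sketch; math.prod requires Python 3.8 or later.

```python
from math import prod


def total_cuboids(levels):
    """Number of cuboids when dimension i has levels[i] hierarchy levels (+1 for 'all')."""
    return prod(L + 1 for L in levels)


print(total_cuboids([1, 1, 1]))  # 8: three dimensions without hierarchies, i.e. 2**3
print(total_cuboids([4] * 10))   # 9765625: ten dimensions with 4 levels each, i.e. 5**10
```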
Approaches to OLAP Servers
It is all about which DBMS you choose to store your data warehouse data:
  RDBMS: ROLAP
  MDDB: MOLAP
  Both: HOLAP
Approaches to OLAP Servers
Three possibilities for OLAP servers:
(1) Relational OLAP (ROLAP)
  Relational and specialized relational DBMS to store and manage warehouse data
  OLAP middleware to support missing pieces
(2) Multidimensional OLAP (MOLAP)
  Array-based storage structures
  Direct access to array data structures
(3) Hybrid OLAP (HOLAP)
  Storing detailed data in RDBMS
  Storing aggregated data in MDBMS
  User access via MOLAP tools
ROLAP
Special schema design: star, snowflake
Special indexes: bitmap, multi-table join
Proven technology (relational model, DBMS); tends to outperform specialized MDDB, especially on large data sets
Products: IBM DB2, Oracle, Sybase IQ, RedBrick, Informix
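A bitmap index keeps one bit vector per distinct column value, so a conjunctive selection becomes a bitwise AND. The minimal sketch below runs over the sale fact table's columns and uses Python integers as uncompressed bitmaps; production systems typically compress them.

```python
def build_bitmap_index(column):
    """Map each distinct value to an int bitmap whose bit r is set when row r has that value."""
    index = {}
    for row, value in enumerate(column):
        index[value] = index.get(value, 0) | (1 << row)
    return index


def rows_of(bitmap):
    """List the row numbers whose bits are set."""
    return [r for r in range(bitmap.bit_length()) if bitmap >> r & 1]


# storeId and prodId columns of the sale fact table, one entry per row.
store = ["c1", "c1", "c3", "c2", "c1", "c2"]
prod = ["p1", "p2", "p1", "p2", "p1", "p1"]

store_idx = build_bitmap_index(store)
prod_idx = build_bitmap_index(prod)
# WHERE storeId = 'c1' AND prodId = 'p1'  ->  bitwise AND of two bitmaps
match = rows_of(store_idx["c1"] & prod_idx["p1"])
print(match)  # [0, 4]
```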
MOLAP
MDDB: a special-purpose data model
Facts stored in multi-dimensional arrays
Dimensions used to index the array
Sometimes built on top of a relational DB
Products: Pilot, Arbor Essbase, Gentia
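The minimal sketch below shows how dimension members can index a dense multidimensional array. Each member is mapped to a position, and a cell's location is computed in row-major order. The member lists and facts are the slides' sample data; real MOLAP engines add chunking and compression on top of this idea.

```python
# Dimension members in a fixed order; a member's position is its array index.
STORES = ["c1", "c2", "c3"]
PRODUCTS = ["p1", "p2"]
DATES = [1, 2]

# Dense 3-D array flattened in row-major order.
cells = [0] * (len(STORES) * len(PRODUCTS) * len(DATES))


def offset(store, prod, date):
    """Row-major offset: (s * |P| + p) * |D| + d."""
    s, p, d = STORES.index(store), PRODUCTS.index(prod), DATES.index(date)
    return (s * len(PRODUCTS) + p) * len(DATES) + d


FACTS = [("c1", "p1", 1, 12), ("c1", "p2", 1, 11), ("c3", "p1", 1, 50),
         ("c2", "p2", 1, 8), ("c1", "p1", 2, 44), ("c2", "p1", 2, 4)]
for store, prod, date, amt in FACTS:
    cells[offset(store, prod, date)] += amt

print(cells[offset("c1", "p1", 2)])  # 44
print(sum(cells))                    # 129
```

Half of the 12 cells stay zero here, which hints at why sparsity is a central concern for array-based storage.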
[Figure: ROLAP vs. MOLAP]
Hybrid OLAP (HOLAP)
o Best of both worlds
o Storing detailed data in RDBMS
o Storing aggregated data in MDBMS
o User access via MOLAP tools
[Figure: HOLAP architecture. Components: RDBMS server (user data, metadata), MDBMS server (multidimensional data, derived data), and a client with a multidimensional viewer and a relational viewer; paths labeled SQL-Read, SQL-Reach-Through, and multidimensional access.]
ROLAP, MOLAP, or HOLAP?
IF
A. You require write access
B. Your data is under 50 GB
C. Your timetable to implement is 60-90 days
D. Lowest level already aggregated
E. Data access on aggregated level
F. You're developing a general-purpose application for inventory movement or assets management
THEN consider an MDD/MOLAP solution for your data mart.
IF
A. Your data is over 100 GB
B. You have a "read-only" requirement
C. Historical data at the lowest level of granularity
D. Detailed access, long-running queries
E. Data assigned to lowest level elements
THEN consider an RDBMS/ROLAP solution for your data mart.
IF
A. OLAP on aggregated and detailed data
B. Different user groups
C. Ease of use and detailed data
THEN consider HOLAP for your data mart.
Conclusions
ROLAP: RDBMS -> star/snowflake schema
MOLAP: MDDB -> cube structures
ROLAP or MOLAP: the data models used play a major role in performance differences.
MOLAP: for summarized and relatively smaller volumes of data (100 GB)
ROLAP: for detailed and larger volumes of data
Both storage methods have strengths and weaknesses.
The choice is requirement-specific, though currently data warehouses are predominantly built using RDBMSs/ROLAP.
HOLAP is emerging as the OLAP server of choice.