Quotient Cube: How to Summarize the Semantics of a Data Cube

Laks V.S. Lakshmanan (Univ. of British Columbia)*
Jian Pei (State Univ. of New York at Buffalo)*
Jiawei Han (Univ. of Illinois at Urbana-Champaign)+

* The work is partially supported by NSERC and NCE/IRIS
+ The work is partially supported by NSF, UI, and Microsoft Research

Outline
• Introduction and motivation
• Cube lattice partitions
• Semantics preserving partitions
• Algorithms
• Experimental results
• Discussion and summary

Lakshmanan, Pei & Han. Quotient Cube: How to Summarize the Semantics of a Data Cube


Data Cube
Base table

Store  Product  Season  Sales
S1     P1       Spring  6
S1     P2       Spring  12
S2     P1       Fall    9

Aggregation: data cube with measure AVG(Sales)

Store  Product  Season  AVG(Sales)
S1     P1       Spring  6
S1     P2       Spring  12
S2     P1       Fall    9
S1     *        Spring  9
…      …        …       …
*      *        *       9
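The aggregation step can be sketched in a few lines of Python. This is only an illustration on the three-tuple base table of this slide; the names `BASE` and `cube_avg` are hypothetical, and seasons are abbreviated to `s` and `f` as in the lattice slides.

```python
from itertools import product

# Base table from the slide: (Store, Product, Season, Sales).
BASE = [("S1", "P1", "s", 6), ("S1", "P2", "s", 12), ("S2", "P1", "f", 9)]

def cube_avg(base):
    """Map every non-empty cell to the AVG of the base tuples it covers."""
    groups = {}
    for *dims, sales in base:
        # Each base tuple rolls up to 2^d cells: keep or star each dimension.
        for mask in product([False, True], repeat=len(dims)):
            cell = tuple("*" if star else v for v, star in zip(dims, mask))
            groups.setdefault(cell, []).append(sales)
    return {cell: sum(v) / len(v) for cell, v in groups.items()}

cube = cube_avg(BASE)
for cell in [("S1", "*", "s"), ("*", "P1", "*"), ("*", "*", "*")]:
    print(cell, cube[cell])
```

Enumerating all 2^d generalizations of every tuple is exponential in the number of dimensions, so this is suitable only for toy inputs.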


Previous Work: Efficient Cube Computation
• Compute a cube from a base table: e.g. (Agarwal et al. 96), (Zhao et al. 97)
• View materialization with space constraint: e.g. (Harinarayan et al. 96)
• Handling sparsity: e.g. (Ross & Srivastava 97)
• Cube compression: e.g. (Sismanis et al. 02), (Shanmugasundaram et al. 99), (Wang et al. 02)
• Approximation: e.g. (Barbara & Sullivan 97), (Barbara & Wu 00), (Vitter et al. 98)
• Constrained cube construction: e.g. (Beyer & Ramakrishnan 99)

Previous Work: Extracting Semantics From Cubes
• General contexts of patterns (Sathe & Sarawagi 01)
• Generalize association rules (Imielinski et al. 00)
• Cube gradient analysis (Dong et al. 01)


Cube (Cell) Lattice
• Many cells have the same aggregate value
• Can we summarize the semantics of the cube by grouping cells by aggregate values?

(S1,P1,s):6  (S1,P2,s):12  (S2,P1,f):9
(S1,*,s):9  (S1,P1,*):6  (*,P1,s):6  (S1,P2,*):12  (*,P2,s):12  (S2,*,f):9  (S2,P1,*):9  (*,P1,f):9
(S1,*,*):9  (*,*,s):9  (*,P1,*):7.5  (*,P2,*):12  (*,*,f):9  (S2,*,*):9
(*,*,*):9

A Naïve Attempt
• Put all cells having same aggregate value in a class
[Figure: the cube lattice with four classes C1 to C4 outlined, one class per distinct aggregate value (6, 12, 9, 7.5)]


Problems w/ the Naïve Attempt
• The result is not a lattice anymore!
– Anomaly: $C_3 \xrightarrow{\text{rollup}} C_4 \xrightarrow{\text{rollup}} C_3$
– The rollup/drilldown semantics is lost
[Figure: the cube lattice with the naive classes C1 to C4; for example, (S2,P1,f):9 rolls up to (*,P1,*):7.5, which rolls up to (*,*,*):9]

A Better Partitioning
• Quotient cube: a partitioning that preserves the rollup/drilldown semantics
[Figure: the cube lattice partitioned into five classes C1 to C5]


Problem Statement
• Given a cube, characterize a good way (quotient cube) of partitioning its cells into classes such that
– The partition generates a reduced lattice preserving the rollup/drilldown semantics
– The partition is optimal: the number of classes is as small as possible

• Compute quotient cubes efficiently

Why Is a Quotient Cube Useful?
• Semantic compression
• Semantic OLAP browsing
[Figure: the cube lattice partitioned into the five quotient-cube classes C1 to C5]


Why Is a Quotient Cube Useful?
• Semantic compression
• Semantic OLAP browsing

[Figure: the quotient lattice of classes C1 to C5, with class C3 opened to show its member cells]
(S2,P1,f):9  (S2,*,f):9  (S2,P1,*):9  (*,P1,f):9  (*,*,f):9  (S2,*,*):9



Convex Partitions
• A convex partition retains semantics

$c_1 \xrightarrow{\text{rollup}} c_2 \xrightarrow{\text{rollup}} c_3,\quad c_1, c_3 \in \mathit{CLS} \;\Rightarrow\; c_2 \in \mathit{CLS}$

[Figure: the cube lattice partitioned into five convex classes C1 to C5]
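The convexity condition can be checked mechanically. The sketch below, on the running three-tuple example, tests two partitions: grouping cells by AVG value and grouping cells by the set of base tuples they cover. All function names are hypothetical helpers introduced for illustration.

```python
from itertools import product

BASE = [("S1", "P1", "s", 6), ("S1", "P2", "s", 12), ("S2", "P1", "f", 9)]

def cells_with_cover(base):
    """Map each non-empty cell to the frozenset of base-row indices it covers."""
    cover = {}
    for i, (*dims, _) in enumerate(base):
        for mask in product([False, True], repeat=len(dims)):
            cell = tuple("*" if star else v for v, star in zip(dims, mask))
            cover.setdefault(cell, set()).add(i)
    return {c: frozenset(s) for c, s in cover.items()}

def rolls_up_to(c, d):
    """True if d is c itself or a generalization (rollup) of c."""
    return all(y == "*" or x == y for x, y in zip(c, d))

def is_convex(cls, all_cells):
    """No cell outside cls may lie between two members of cls."""
    return not any(
        rolls_up_to(lo, mid) and rolls_up_to(mid, hi)
        for lo in cls for hi in cls
        for mid in all_cells if mid not in cls
    )

def partition_by(key, cells):
    """Group cells into classes that share the same key value."""
    classes = {}
    for c in cells:
        classes.setdefault(key(c), set()).add(c)
    return list(classes.values())

cover = cells_with_cover(BASE)
avg = {c: sum(BASE[i][-1] for i in s) / len(s) for c, s in cover.items()}

naive = partition_by(avg.get, cover)       # one class per AVG value
by_cover = partition_by(cover.get, cover)  # one class per cover set

naive_ok = all(is_convex(cls, cover) for cls in naive)
cover_ok = all(is_convex(cls, cover) for cls in by_cover)
print(naive_ok, cover_ok)
```

On this data the by-value partition fails the check because (*,P1,*):7.5 sits between (S2,P1,f):9 and (*,*,*):9, while every cover class passes.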


A Non-convex Partition
• Anomaly: $C_3 \xrightarrow{\text{rollup}} C_4 \xrightarrow{\text{rollup}} C_3$
• The rollup/drilldown semantics is lost

[Figure: the cube lattice with the naive classes C1 to C4; for example, (S2,P1,f):9 rolls up to (*,P1,*):7.5, which rolls up to (*,*,*):9]

Connected Partitions
• Cells c1 and c2 are connected if a series of rollup/drilldown operations starting from c1 can reach c2
• Intuitively, (each class of) a partition should be connected
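A minimal sketch of a connectivity test for one class follows. It assumes that the rollup/drilldown path must stay inside the class and that each step stars or un-stars exactly one dimension; `parents` and `is_connected` are hypothetical names.

```python
from collections import deque

def parents(cell):
    """One-step rollups: replace a single non-* dimension with *."""
    return [cell[:i] + ("*",) + cell[i + 1:] for i, v in enumerate(cell) if v != "*"]

def is_connected(cls):
    """True if every cell of cls is reachable from any other cell through
    one-step rollups/drilldowns that never leave cls."""
    cls = set(cls)
    start = next(iter(cls))
    seen, queue = {start}, deque([start])
    while queue:
        c = queue.popleft()
        # Neighbors inside the class: parents of c, then children of c.
        nbrs = [p for p in parents(c) if p in cls]
        nbrs += [d for d in cls if c in parents(d)]
        for n in nbrs:
            if n not in seen:
                seen.add(n)
                queue.append(n)
    return seen == cls

chain = {("S1", "P2", "s"), ("S1", "P2", "*"), ("*", "P2", "*")}
split = {("S1", "P1", "s"), ("S2", "P1", "f")}
print(is_connected(chain), is_connected(split))
```

The second class is disconnected because its two cells can only meet through cells such as (*,P1,*) that are not members of the class.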


Cover Partition
• For a cell c, a tuple t in base table is in c’s cover if t can be rolled up to c
– E.g., Cov(S1,*,spring)={(S1,P1,spring), (S1,P2,spring)}

Store  Product  Season  Sales
S1     P1       Spring  6
S1     P2       Spring  12
S2     P1       Fall    9
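The cover of a cell can be computed with a single scan of the base table. The helper name `cov` below mirrors the slide's Cov notation but is otherwise an illustrative choice.

```python
BASE = [
    ("S1", "P1", "Spring", 6),
    ("S1", "P2", "Spring", 12),
    ("S2", "P1", "Fall", 9),
]

def cov(cell, base):
    """Base tuples (dimension values only) that can be rolled up to `cell`."""
    return [
        row[:-1] for row in base
        if all(c == "*" or c == v for c, v in zip(cell, row[:-1]))
    ]

result = cov(("S1", "*", "Spring"), BASE)
print(result)
```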

Cover Partitions Are Convex
• All cells having the same cover are in a class
• (S1,P2,s) and (*,P2,*) cover the same tuples in the base table ⇒ (S1,P2,*) and (*,P2,s) are in the same class
[Figure: the cube lattice partitioned into cover classes]
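As a sketch, the cover partition of the running example can be built by grouping cells on their cover sets. The function name `cover_partition` and the use of row indices to represent covers are assumptions made for this illustration.

```python
from itertools import product

BASE = [("S1", "P1", "s", 6), ("S1", "P2", "s", 12), ("S2", "P1", "f", 9)]

def cover_partition(base):
    """Group all non-empty cells by the set of base rows they cover."""
    cover = {}
    for i, (*dims, _) in enumerate(base):
        for mask in product([False, True], repeat=len(dims)):
            cell = tuple("*" if star else v for v, star in zip(dims, mask))
            cover.setdefault(cell, set()).add(i)
    classes = {}
    for cell, rows in cover.items():
        classes.setdefault(frozenset(rows), set()).add(cell)
    return classes

classes = cover_partition(BASE)
p2_class = classes[frozenset({1})]  # cells covering only the (S1,P2,s) row
print(len(classes), sorted(p2_class))
```

The class of (S1,P2,s) also contains (S1,P2,*), (*,P2,s), and (*,P2,*), matching the example on this slide.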


Cover Partitions Are Connected
• Cells c1 and c2 have the same cover ⇒ there must be some common ancestor c3 of c1 and c2 such that c3 has the same cover
– Cells c1 and c2 are in the same class and connected

[Figure: the cube lattice partitioned into cover classes]


Cover Partitions & Aggregates
• All cells in the same class of a cover partition carry the same aggregate value w.r.t. any aggregate function
– But cells in a class of MIN() may have different covers

• For COUNT() and SUM() over positive values, cover equivalence coincides with aggregate equivalence
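The two claims can be checked on the running example. The sketch assumes that the aggregate equivalence ≡f relates a cell and its one-step rollup when both have the same value, and then takes the transitive closure; this reading, and all helper names, are assumptions of the illustration.

```python
from itertools import product

BASE = [("S1", "P1", "s", 6), ("S1", "P2", "s", 12), ("S2", "P1", "f", 9)]

def cells_with_cover(base):
    """Map each non-empty cell to the frozenset of base-row indices it covers."""
    cover = {}
    for i, (*dims, _) in enumerate(base):
        for mask in product([False, True], repeat=len(dims)):
            cell = tuple("*" if star else v for v, star in zip(dims, mask))
            cover.setdefault(cell, set()).add(i)
    return {c: frozenset(s) for c, s in cover.items()}

def parents(cell):
    """One-step rollups of a cell."""
    return [cell[:i] + ("*",) + cell[i + 1:] for i, v in enumerate(cell) if v != "*"]

def classes_of(value, cells):
    """Connected components where an edge joins a cell and a parent
    only if both have the same aggregate value (assumed reading of ≡f)."""
    comp = {}
    def find(c):
        while comp.setdefault(c, c) != c:
            c = comp[c]
        return c
    for c in cells:
        for p in parents(c):
            if value[c] == value[p]:
                comp[find(c)] = find(p)
    groups = {}
    for c in cells:
        groups.setdefault(find(c), set()).add(c)
    return {frozenset(g) for g in groups.values()}

cover = cells_with_cover(BASE)
count = {c: len(s) for c, s in cover.items()}
minimum = {c: min(BASE[i][-1] for i in s) for c, s in cover.items()}

cover_classes = classes_of(cover, cover)   # cover equivalence
count_classes = classes_of(count, cover)   # COUNT equivalence
min_classes = classes_of(minimum, cover)   # MIN equivalence
print(len(cover_classes), len(count_classes), len(min_classes))
```

Here COUNT yields exactly the cover classes, whereas MIN yields fewer, coarser classes: the cells with MIN = 6 form one class although they have four different covers.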


Weak Congruence
• Weak congruence preserves semantics
[Figure: cells c, c′ in Class 1 and cells d, d′ in Class 2, linked by rollup arrows]

c, c′ ∈ Class 1, d, d′ ∈ Class 2, c rolls up to d, and d′ rolls up to c′ ⇒ Class 1 = Class 2

Weak Congruence = Convex
• Convex ⇔ no “hole” in the class ⇔ weak congruence
• They preserve the rollup/drilldown semantics
• Quotient cube lattice is the lattice of convex classes
• How to derive the coarsest quotient cube?

Monotone Aggregate Functions
• Monotone functions
– S ⊆ T ⇒ f(S) ≥ f(T)
– S ⊆ T ⇒ f(S) ≤ f(T)
– MIN(), MAX(), COUNT(), PSUM(), …

• The aggregate function f is monotone ⇒ ≡f is the unique coarsest partition
– MIN(): put all cells having the same MIN() value into a class

Non-monotone Functions
• Bad news: ≡f may or may not be convex (a weak congruence).
• Good news: the cover partition is convex (i.e., a weak congruence) and always yields a quotient cube w.r.t. any aggregate function!



How to Compute A QC
• Aggregate functions
– Monotone functions
– Non-monotone functions

• Settings
– The cube is available
– Only the base table is available


Monotone Functions
• The cube is available ⇒ grab all cells with the same aggregate value and put them into a class
• Only the base table is available ⇒ bottom-up, depth-first search
– For a cell, compute its cover, find the upper bound having the same aggregate value
– Group lower bounds by upper bounds
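The search over the base table can be sketched as follows. This is not the authors' algorithm; it is a simplified illustration of the cover-equivalence case, assuming that the upper bound of a cell is obtained by fixing every dimension on which all covered tuples agree. The names `upper_bound` and `quotient_upper_bounds` are hypothetical.

```python
BASE = [("S1", "P1", "s", 6), ("S1", "P2", "s", 12), ("S2", "P1", "f", 9)]

def cover(cell, base):
    """Base rows that roll up to `cell`."""
    return [r for r in base if all(c == "*" or c == v for c, v in zip(cell, r[:-1]))]

def upper_bound(cell, base):
    """Most specific cell with the same cover as `cell` (None if empty)."""
    rows = cover(cell, base)
    if not rows:
        return None
    return tuple(
        rows[0][i] if all(r[i] == rows[0][i] for r in rows) else "*"
        for i in range(len(cell))
    )

def quotient_upper_bounds(base):
    """Depth-first search from (*,...,*) that jumps to upper bounds."""
    n = len(base[0]) - 1
    found = set()

    def dfs(cell):
        ub = upper_bound(cell, base)
        if ub is None or ub in found:
            return
        found.add(ub)
        # Drill down on every dimension that is still generalized.
        for i in range(n):
            if ub[i] == "*":
                for v in {r[i] for r in cover(ub, base)}:
                    dfs(ub[:i] + (v,) + ub[i + 1:])

    dfs(("*",) * n)
    return found

print(upper_bound(("*", "P2", "*"), BASE))
print(sorted(quotient_upper_bounds(BASE)))
```

Each distinct upper bound identifies one class, so cells can be grouped by the upper bound they jump to. The sketch rescans the base table for every cell, which a practical implementation would avoid.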

Example: Cover QC
[Figure: the cube lattice of the running example, partitioned into cover classes]

Non-monotone Functions
• Class merging
• Find cover partition classes
• Merge classes as long as convexity is retained
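A greedy version of class merging for AVG() might look like the sketch below. It starts from the cover classes and merges two classes when they share the same AVG value, are adjacent in the lattice, and their union stays convex. The adjacency requirement and the greedy order are assumptions of this sketch, and all function names are hypothetical.

```python
from itertools import combinations, product

BASE = [("S1", "P1", "s", 6), ("S1", "P2", "s", 12), ("S2", "P1", "f", 9)]

def cells_with_cover(base):
    """Map each non-empty cell to the frozenset of base-row indices it covers."""
    cover = {}
    for i, (*dims, _) in enumerate(base):
        for mask in product([False, True], repeat=len(dims)):
            cell = tuple("*" if star else v for v, star in zip(dims, mask))
            cover.setdefault(cell, set()).add(i)
    return {c: frozenset(s) for c, s in cover.items()}

def rolls_up_to(c, d):
    """True if d is c itself or a generalization (rollup) of c."""
    return all(y == "*" or x == y for x, y in zip(c, d))

def is_convex(cls, all_cells):
    """No cell outside cls may lie between two members of cls."""
    return not any(
        rolls_up_to(lo, mid) and rolls_up_to(mid, hi)
        for lo in cls for hi in cls
        for mid in all_cells if mid not in cls
    )

def adjacent(a, b):
    """Some cell of one class is a one-step rollup of a cell of the other."""
    def step(c, d):
        return rolls_up_to(c, d) and sum(x != y for x, y in zip(c, d)) == 1
    return any(step(c, d) or step(d, c) for c in a for d in b)

def merge_classes(base):
    cover = cells_with_cover(base)
    avg = {c: sum(base[i][-1] for i in s) / len(s) for c, s in cover.items()}
    by_cover = {}
    for c, s in cover.items():
        by_cover.setdefault(s, set()).add(c)
    classes = list(by_cover.values())  # start from the cover partition
    merged = True
    while merged:
        merged = False
        for a, b in combinations(classes, 2):
            same = avg[next(iter(a))] == avg[next(iter(b))]
            if same and adjacent(a, b) and is_convex(a | b, cover):
                classes.remove(a)
                classes.remove(b)
                classes.append(a | b)
                merged = True
                break  # restart with the updated class list
    return classes

qc = merge_classes(BASE)
print(len(qc))
```

On the running example the six cover classes shrink to five: the class of (S1,*,s) absorbs (*,*,*), while the class of (S2,P1,f) cannot join them because (*,P1,*):7.5 would leave a hole.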


Example: AVG QC
[Figure: the cube lattice of the running example, partitioned into the classes of the AVG quotient cube]


Reduction Ratio vs. Dimensionality
[Figure: reduction ratio (%) versus dimensionality (2 to 10) for MinCube, QC_Cov, and QC_MIN]

# base tuples = 200k, Zipf factor = 2.0

Reduction Ratio vs. Zipf Factor
[Figure: reduction ratio (%) versus Zipf factor (0 to 3) for MinCube, QC_Cov, and QC_MIN]

# base tuples = 200k, # dimensions = 6

Reduction Ratio vs. Base Table Size
[Figure: reduction ratio (%) versus number of base tuples (0 to 1400k) for MinCube, QC_Cov, and QC_MIN]

Zipf factor = 2.0, # dimensions = 6

Runtime
[Figure: runtime (seconds) versus number of base tuples (0 to 1400k) for MinCube, QC_Cov, QC_MIN, and BUC]

Zipf factor = 2.0, # dimensions = 6

Compression Ratio on Weather Data Set
[Figure: two plots of reduction ratio (%) versus number of dimensions on the weather data set; one compares MinCube and QC_Cov, the other compares QC_Cov and QC_AVG]


Semantic Cube Exploration
• Theoretical foundation for semantic summarization in data cubes
– Concept and properties of quotient cubes

• Efficient algorithms for quotient cube construction
– Quotient cubes can be computed directly from base tables


Ongoing Research
• Efficient implementation of a quotient cube-based OLAP system
– Data warehouse built using quotient cubes
• Hierarchies and constraints
• Incremental maintenance
• Semantics-based OLAP and mining
• Efficient query answering

References (1)
• R. Agrawal and R. Srikant. Fast Algorithms for Mining Association Rules in Large Databases. VLDB 1994.
• S. Agarwal, R. Agrawal, P.M. Deshpande, A. Gupta, J.F. Naughton, R. Ramakrishnan, and S. Sarawagi. On the computation of multidimensional aggregates. VLDB, 1996.
• D. Barbara and M. Sullivan. Quasi-cubes: Exploiting approximation in multidimensional databases. SIGMOD Record, 26:12--17, 1997.
• D. Barbara and X. Wu. Using loglinear models to compress datacube. In WAIM'2000, pages 311--322, 2000.
• K. Beyer and R. Ramakrishnan. Bottom-up computation of sparse and iceberg cubes. In SIGMOD'99.

References (2)
• G. Birkhoff. Lattice Theory, 2nd edition. New York, American Mathematical Society (Colloquium Publications, vol. 25), 1948.
• S. Geffner, D. Agrawal, A. El Abbadi, and T. R. Smith. Relative prefix sums: An efficient approach for querying dynamic OLAP data cubes. In ICDE'99.
• J. Gray, A. Bosworth, A. Layman, and H. Pirahesh. Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Total. In ICDE'96.
• C.-T. Ho, J. Bruck, and R. Agrawal. Partial-sum queries in data cubes using covering codes. In PODS'97.
• J. Han, J. Pei, G. Dong, and K. Wang. Efficient Computation of Iceberg Cubes with Complex Measures. In SIGMOD'01.

References (3)
• V. Harinarayan, A. Rajaraman, and J. D. Ullman. Implementing data cubes efficiently. In SIGMOD'96.
• T. Imielinski, L. Khachiyan, and A. Abdulghani. Cubegrades: Generalizing Association Rules. Technical Report, Rutgers University, August 2000.
• H. V. Jagadish, J. Madar, and R.T. Ng. Semantic Compression and Pattern Extraction with Fascicles. In VLDB'99.
• K. Ross and D. Srivastava. Fast computation of sparse datacubes. In VLDB'97.
• G. Sathe and S. Sarawagi. Intelligent Rollups in Multidimensional OLAP Data. In VLDB'01.

References (4)
• J. Shanmugasundaram, U.M. Fayyad, and P. S. Bradley. Compressed Data Cubes for OLAP Aggregate Query Approximation on Continuous Dimensions. In SIGKDD'99.
• J. S. Vitter, M. Wang, and B. R. Iyer. Data cube approximation and histograms via wavelets. In CIKM'98.
• W. Wang, H. Lu, J. Feng, and J. X. Yu. Condensed cube: An effective approach to reducing data cube size. In ICDE'02.
• Y. Zhao, P. M. Deshpande, and J. F. Naughton. An array-based algorithm for simultaneous multidimensional aggregates. In SIGMOD'97.
• G.K. Zipf. Human Behavior and the Principle of Least Effort. Addison-Wesley, 1949.