12 views

Uploaded by bsvg

- Relational Algebra
- UT Dallas Syllabus for cs4347.501 06f taught by Ying Liu (yxl059100)
- MBAA Abdullah
- Swart’s Ten Percent Rule.pdf
- Building a Data Warehouse
- Exemplo Stored Procedure Sp Who3 SQL
- ADBMS-TypicalQueryOptimizer
- oamexport(2)
- DBMS
- Project Management for Construction_ Organization and Use of Project Information
- JA-2-Certified Professional in Advance Accounting
- Multiple Organization Architecture
- How To Eliminate duplication value fro mtable.pdf
- db1-01
- VbImportant Suff
- BIS 245 Week 8 Final Exam
- Gagan Virk Oracle 11.2.0.4 to 12.1.0.2 Upgrade Ste
- 1qfssParameterQueries 1-8
- Formulas Functions Excel
- Button

You are on page 1of 4

J. Gray, al, Microsoft Research

F. Pellow, al, IBM Research

Visualization and dimension reduction

The relational representation of N-dimensional data

What is CUBE

Presented by Jim Cao Why we need ALL value

Discussion led by Otto Summary

“Dimensionality Reduction”

Analyze car sales

Data analysis applications Focus on the role of model, year and color of the cars

Ignore the differences between two sales along the dimensions of date

Looking for anomalies or unusual patterns of sale or car dealership

Four steps to aggregate data across many dimensions

As a result, extensive constructs are used, such as cross-tabulation,

subtotals, roll-up and drill-down

¾formulating a query that extracts relevant data from a database

¾extracting the aggregated data from the database into a table

¾visualizing the results in a graphical way, and

¾analyzing the results and formulating a new query

Represent the dataset as an N-dimensional space

Example: Car sales for year 1994 and 1995 showed in table_1: If we need more dimensional generalization of these operators

Table_1: Table_2:

Model Sales

Chevy 290 Model Year Color Sales

Ford 220 Chevy 1994 black 50

Chevy 1995 black 85

If we need to know the sales for model, we can easily query it by:

Chevy 1994 white 40

SELECT sales

FROM table_1 Chevy 1995 white 115

GROUP BY model

1

Three Dimensional Aggregation (con.t) Three Dimensional Aggregation (con.t)

If we need to query the sales by model, by year, and by color, then how we

For Table_2a:

can do it?

Typically, we can make a report as showed by

Table_2a: • Concepts: going up the levels is called rolling-up the data.

Going down is called drilling-down into the data

• In this table, sales are rolled up by using totals and subtotals.

• Data is aggregated by Model, then by Year, then by Color.

• The report shows data aggregated at three levels, that is, at

Model level, Year level, and Color level.

• Data aggregated at each distinct level produces a sub-total.

The approach by using a pivot table in Excel is showed by table_2c:

What problems with Table_2a approach? Table_2c:

roll-up of N elements. That is, there are six columns in

table_2a

• Also, the representation of Table_2a is not relational,

because the empty cells (presumably NULL values), cannot

form a key What problems with pivot table approach?

• The pivot operator typically aggregating cells based on values in the cells.

• Pivot creates columns based on subsets of column values-this is a much

larger set!

• If one pivots on two columns containing N and M values, the resulting

pivot table has N x M values, that’s, so many columns and such obtuse

column names!

One more approach by adding an ALL value is available Table_3a: Sales summary

• Do not extend the result table to have many new columns

• Avoid the exponential growth of columns by overloading

column values

• The dummy value “ALL” has been added to fill in the

super-aggregation items

For Table_3a:

• This is a 3_dimensional roll-up

• It have three unions

• The fact is that aggregating over N dimensions requires N such UNIONS!

2

ALL value approach (con.t) ALL value approach (con.t)

Since table-3a is a relation, it could be built using SQL, like this statement: How is ALL value approach ?

• Expressing roll-up and cross-tab queries with conventional

SQL is daunting! Why?

• A six dimension cross tab requires a 64-way union of 64

different GROUP BY operators to build the underlying

representation.

• The resulting representation of aggregation is too complex to

analyze for optimization. On most SQL systems this will

result in 64 scans of the data, 64 sorts or hashes, and a long

wait

dimensional generalization of simple

aggregate functions SELECT Model, Year, Color, SUM (Sales) AS sales

• The N-1 lower-dimensional FROM Sales

aggregates appear as points, lines,

planes, cubes WHERE Model in ['Ford', 'Chevy'] AND year BETWEEN 1994 AND 1995

• The data cube operator builds a table GROUP BY CUBE Model, Year, Color

containing all these aggregate values

• The OD data cube is a point. • The cube is a relational operator, with GROUP BY and ROLL UP as degenerate

• The 1D data cube is a line with a forms of the operator. It can be conveniently specified by overloading the SQL

point. GROUP BY and ROLLUP

• The 2D data cube is a cross • It first aggregates over all the <select list> attributes in the GROUP BY clause as in

tabulation, a plane, two lines, and a a standard GROUP BY

point. • It UNIONs in each super-aggregate of the global cube—substituting ALL for the

• The 3D data cube is a cube with three aggregation columns

intersecting 2D cross tabs • If there are N attributes in the <select list>, there will be 2N -1 super-aggregate

value

• The super-aggregates are produced by ROLLUP, like running sum or average

Is the ALL value really needed? Is the ALL value really needed? (con.t)

What is the ALL value for? The introduction of ALL creates substantial complexity

• Each ALL value really represents a set—In the Table 3a Sales Summary • ALL becomes a new keyword denoting the set value

data cube, the respective sets are: • ALL [NOT] ALLOWED is added to the column definition syntax and to

Model. ALL = ALL (Model) = {Chevy, Ford} the column attributes in the system catalogs

Year. ALL = ALL (Year) = {1990,1991,1992} • If ALL presents a set then the other values of that domain must be treated

Color. ALL = ALL (Color) = {red, white, blue} as singleton sets in order to have uniform operators on the domain

• Each ALL value can be interpreted as a context-sensitive token • However, it is impossible to express results of CUBE as a single relation in

representing the set it represents. the current framework of SQL without ALL value!

• ALL value treated as the corresponding set defines the semantics of the • Therefore, the ALL value is needed.

• Relational operators (e.g., equals).

• A function ALL() generates the set associated with this value

3

Summary Discussion Questions

• The cube operator generalizes and unifies several common and • The authors state that "Veteran SQL implementers will be

popular concepts: such as aggregates, group by, histograms, terrified of the ALL value --- like NULL, it will create many

roll-ups and drill-downs and, cross tabs. special cases."

• The cube operator is based on a relational representation of What are some of the special cases that you can imagine are

aggregate data using the ALL value to denote the set over created by NULL?

which each aggregation is computed. What cases can you imagine being created by ALL?

• The data cube is easy to compute for a wide class of functions Do think ALL is a bigger or a lesser concern than NULL?

• SQL’s basic set of five aggregate functions needs careful

extension to include • How many applications can you imagine using Data Cubes?

• Does this strike you as a big or a small change to SQL? What

about to the mentality of relational databases?

- Relational AlgebraUploaded byRAYUDU8
- UT Dallas Syllabus for cs4347.501 06f taught by Ying Liu (yxl059100)Uploaded byUT Dallas Provost's Technology Group
- MBAA AbdullahUploaded byHelpline
- Swart’s Ten Percent Rule.pdfUploaded byCurtis
- Building a Data WarehouseUploaded byRamakrishnan Harihara Venkatasubramania
- Exemplo Stored Procedure Sp Who3 SQLUploaded byEmerson
- ADBMS-TypicalQueryOptimizerUploaded byGaurav Kispotta
- oamexport(2)Uploaded byAnonymous zkIfWMncM
- DBMSUploaded bySivanesh Seelan
- Project Management for Construction_ Organization and Use of Project InformationUploaded byFarid Morgan
- JA-2-Certified Professional in Advance AccountingUploaded byMandvi Yadav
- Multiple Organization ArchitectureUploaded byvsn0208
- How To Eliminate duplication value fro mtable.pdfUploaded byHacene Lamraoui
- db1-01Uploaded bymanikbasha009
- VbImportant SuffUploaded bysuebri
- BIS 245 Week 8 Final ExamUploaded byBerthahrobinson
- Gagan Virk Oracle 11.2.0.4 to 12.1.0.2 Upgrade SteUploaded byLuis Valencia
- 1qfssParameterQueries 1-8Uploaded byJulian Andres
- Formulas Functions ExcelUploaded bywahidasaba
- ButtonUploaded byJoyhill
- SQL-ExercicesdebasesdedonneessurlemodelerelationnelUploaded byHocine ALGERIA
- 2007-08 TE Engg Information TechnologyUploaded byBadgujar Vishal
- Oracle Interview Questions and AnswersUploaded bySudhakar Kaja
- powerviewpowerpivotbasics2013sp1Uploaded byclaudioX
- Cubefunctionsmdxbasics-Uploaded byjapera63
- DBAdapterUploaded bymesale22222
- Performance Tuning for DevelopersUploaded bybijutom
- Microsoft SQL Server 2000 Programming by ExampleUploaded byapi-26661003
- BCA II Syllabus fro ravishakar....Uploaded byNitesh
- Do I need to Learn Spreadsheet Application Like Excel?Uploaded bypaliumskills

- Pandorabot METAbolt Add-On ManualUploaded byMissy Restless
- CSEC Information Technology Syllabus with Specimen Papers (2020).pdfUploaded byFairazooden Mohammed
- Universe Guide to RetrieVeUploaded byJoseph Riccardo
- Configuring Application Isolation on Windows Server 2003Uploaded byKarimun Coklat
- Test PlanUploaded bythaheer0009
- AsperaEnterpriseServer Windows UserGuideUploaded byEdin Hodzic
- 3-cookiesflixUploaded bySamuel Bonilla
- CICS CommandsUploaded bypsmadhusudhan
- HP Non Stop OSUploaded byShaikh Saeed Alam
- By Hook or By Crook Overcoming Resource Limitations.pdfUploaded bygskn4u7183
- Never Change /tg/Uploaded bydusty_thoreau
- Route Scheduling - Techn...Gement (SCM) - SCN WikiUploaded bysuseev
- Manual Hdta2Uploaded byJ Oscar Molina Vargas
- WP_Bluetooth_50004_V01_1204Uploaded byArri Adhy
- PreguntasUploaded byerf7
- Bizhub 25 Message Board GuideUploaded byminoltaep4050
- CISCO CCNA Certifications_ CCNA 1_Module 3Uploaded bymrcuk21
- PcbnewUploaded bychivisalva
- Template CPT2Uploaded byV S Krishna Achanta
- clustering analysis in sas.pdfUploaded bySaad Baloch
- Phenotype Information for Existing GWAS StudiesUploaded byAMIA
- Fingerprint Image Enhancement By Develop Mehtre TechniqueUploaded byAnonymous IlrQK9Hu
- Fortran IntroUploaded byMathias Eder
- At 91 Sam CommunicationsUploaded byashishindemand
- Report of InternshipUploaded bySeemabKanwalQureshi
- UnixUploaded byJoseLuis Rojas
- PostgreSQL Functions by ExampleUploaded byNguyễn Đăng Hưng
- DBA Interview Q&AUploaded bykalimireddy
- tong_hop_kiem_tra (thi)Uploaded byapi-3776796
- A HYBRID FUZZY SYSTEM BASED COOPERATIVE SCALABLE AND SECURED LOCALIZATION SCHEME FOR WIRELESS SENSOR NETWORKSUploaded byJohn Berg