
From Tables and Spreadsheets to Data Cubes

◼ Data warehouses and OLAP tools are based on a multidimensional data
model, which views data in the form of a data cube
◼ A data cube, such as sales, allows data to be modeled and viewed
in multiple dimensions
◼ Dimension tables, such as item (item_name, brand, type) or
time (day, week, month, quarter, year), describe the dimensions
◼ The fact table contains measures (such as dollars_sold) and keys to
each of the related dimension tables

Han: Data Cubes 1


Data Cube Terminology

◼ A data cube supports viewing/modelling of a variable
(or a set of variables) of interest. Measures are used to
report the values of the particular variable with respect
to a given set of dimensions.
◼ A fact table stores measures as well as keys
representing relationships to various dimensions.
◼ Dimensions are perspectives with respect to which an
organization wants to keep records.
◼ A star schema defines a fact table and its associated
dimensions.



Conceptual Modeling of Data Warehouses
◼ Modeling data warehouses: dimensions & measures
◼ Star schema: A fact table in the middle connected to a
set of dimension tables
◼ Snowflake schema: A refinement of star schema
where some dimensional hierarchy is normalized into a
set of smaller dimension tables, forming a shape
similar to a snowflake
◼ Fact constellations: Multiple fact tables share
dimension tables, viewed as a collection of stars,
therefore called galaxy schema or fact constellation
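To make the star/snowflake difference concrete, here is a minimal Python sketch that uses plain dictionaries as tables. The table names, the `city_key` column and every row value are hypothetical; the point is only that the snowflake version stores each city's country once but needs an extra lookup (a join, in SQL terms) to reach it.

```python
# Star schema: the location dimension is one denormalized table, so the
# city -> province_or_state -> country chain is repeated in every row.
star_location = {
    1: {"street": "12 Main St", "city": "Vancouver",
        "province_or_state": "BC", "country": "Canada"},
    2: {"street": "40 Oak Ave", "city": "Vancouver",
        "province_or_state": "BC", "country": "Canada"},
}

# Snowflake schema: the city level is normalized into its own table and
# referenced through a hypothetical city_key column.
snow_location = {
    1: {"street": "12 Main St", "city_key": 10},
    2: {"street": "40 Oak Ave", "city_key": 10},
}
snow_city = {
    10: {"city": "Vancouver", "province_or_state": "BC", "country": "Canada"},
}


def country_star(location_key):
    # One lookup: every hierarchy level is stored in the dimension row.
    return star_location[location_key]["country"]


def country_snowflake(location_key):
    # Two lookups: follow city_key into the normalized city table.
    city_key = snow_location[location_key]["city_key"]
    return snow_city[city_key]["country"]


def stored_country_values(*tables):
    # Count how many times a "country" value is physically stored.
    return sum("country" in row for table in tables for row in table.values())
```

Both functions return "Canada" for either location, but the star tables store the country twice while the snowflake tables store it once. That is the usual trade-off: less redundancy in exchange for more joins at query time.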
Example of Star Schema
◼ Sales Fact Table: time_key, item_key, branch_key, location_key
(keys) and units_sold, dollars_sold, avg_sales (measures)
◼ time: time_key, day, day_of_the_week, month, quarter, year
◼ item: item_key, item_name, brand, type, supplier_type
◼ branch: branch_key, branch_name, branch_type
◼ location: location_key, street, city, province_or_state, country
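A minimal Python sketch of querying this star schema: dimension tables are dicts keyed by their surrogate key, and the fact table is a list of rows holding foreign keys plus measures. Only a subset of the columns above is modeled (branch and several attributes are omitted), and all row values as well as the helper `dollars_by` are hypothetical.

```python
from collections import defaultdict

# Dimension tables keyed by surrogate key (subset of the columns above).
time_dim = {1: {"quarter": "Q1", "year": 2004},
            2: {"quarter": "Q2", "year": 2004}}
item_dim = {1: {"item_name": "TV-32", "brand": "Acme", "type": "TV"},
            2: {"item_name": "PC-Pro", "brand": "Zenith", "type": "PC"}}
location_dim = {1: {"city": "Vancouver", "country": "Canada"},
                2: {"city": "Chicago", "country": "U.S.A."}}

# Fact table: foreign keys into each dimension plus the measures.
sales_fact = [
    {"time_key": 1, "item_key": 1, "location_key": 1,
     "units_sold": 3, "dollars_sold": 900.0},
    {"time_key": 1, "item_key": 2, "location_key": 2,
     "units_sold": 1, "dollars_sold": 1200.0},
    {"time_key": 2, "item_key": 1, "location_key": 2,
     "units_sold": 2, "dollars_sold": 600.0},
]


def dollars_by(fact_rows, dims):
    """Join each fact row to its dimensions and sum dollars_sold per group.

    dims is a list of (dimension_table, foreign_key_column, attribute).
    """
    totals = defaultdict(float)
    for row in fact_rows:
        group = tuple(table[row[fk]][attr] for table, fk, attr in dims)
        totals[group] += row["dollars_sold"]
    return dict(totals)
```

For example, `dollars_by(sales_fact, [(location_dim, "location_key", "country")])` groups sales by country; the fact table itself never stores "Canada" or "U.S.A.", it only points at them.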



A Concept Hierarchy: Dimension (location)

◼ all: all
◼ region: Europe ... North_America
◼ country: Germany ... Spain, Canada ... Mexico
◼ city: Frankfurt ... Vancouver ... Toronto
◼ office: L. Chan ... M. Wind
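Rolling up along this hierarchy amounts to mapping each member to its parent and re-aggregating. A minimal Python sketch, assuming a hypothetical child-to-parent table that covers only the city, country and region levels shown above (the office level is left out) and made-up sales figures:

```python
from collections import defaultdict

# Child -> parent links for part of the location hierarchy
# city < country < region < all.
parent = {
    "Frankfurt": "Germany", "Vancouver": "Canada", "Toronto": "Canada",
    "Germany": "Europe", "Canada": "North_America",
    "Europe": "all", "North_America": "all",
}


def ancestor(member, steps):
    """Climb the hierarchy `steps` levels up from `member`."""
    for _ in range(steps):
        member = parent[member]
    return member


def roll_up_hierarchy(cells, steps):
    """Re-aggregate {member: measure} at a coarser hierarchy level."""
    out = defaultdict(float)
    for member, measure in cells.items():
        out[ancestor(member, steps)] += measure
    return dict(out)


# Hypothetical city-level sales.
city_sales = {"Frankfurt": 100.0, "Vancouver": 40.0, "Toronto": 60.0}
```

One step up merges Vancouver and Toronto into Canada; three steps up reaches the single `all` node.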



View of Warehouses and Hierarchies

Specification of hierarchies
◼ Schema hierarchy
day < {month < quarter; week} < year
◼ Set_grouping hierarchy
{1..10} < inexpensive
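Both kinds of hierarchy can be read as functions from a lower-level value to a higher-level one. A minimal Python sketch, assuming ISO week numbering for the week level (other week conventions exist) and a hypothetical `price_group` function in which only the {1..10} group comes from the slide:

```python
from datetime import date


# Schema hierarchy day < {month < quarter; week} < year.
def month_of(d):
    return (d.year, d.month)


def quarter_of(d):
    # Computed from the month, so month < quarter holds.
    return (d.year, (d.month - 1) // 3 + 1)


def week_of(d):
    # ISO week numbering (an assumption of this sketch).
    iso_year, iso_week, _ = d.isocalendar()
    return (iso_year, iso_week)


# Set-grouping hierarchy {1..10} < inexpensive.
def price_group(price):
    # Prices outside the one group given on the slide stay ungrouped (None).
    return "inexpensive" if 1 <= price <= 10 else None


end_of_march = date(2004, 3, 31)
start_of_april = date(2004, 4, 1)
```

March 31 and April 1, 2004 fall in the same ISO week but in different months and quarters, which is why week sits on its own branch below year rather than below month.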



Multidimensional Data
◼ Sales volume as a function of product, month,
and region
Dimensions: Product, Location, Time
Hierarchical summarization paths:
◼ Product: Product < Category < Industry
◼ Location: Office < City < Country < Region
◼ Time: Day < {Month < Quarter; Week} < Year
(Figure: sales cube with axes Product, Region and Month)
A Sample Data Cube
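(Figure: a 3-D sales cube with dimensions Date (1Qtr, 2Qtr, 3Qtr, 4Qtr, sum), Product (TV, PC, VCR, sum) and Country (U.S.A., Canada, Mexico, sum). The highlighted cell (TV, sum, U.S.A.) holds the total annual sales of TV in U.S.A.; the corner cell (All, All, All) holds the grand total.)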
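Every "sum" cell in the figure is an aggregate over one or more dimensions. A minimal Python sketch, assuming a few hypothetical base cells keyed by (product, quarter, country) and using the string "all" for an aggregated dimension, computes every such aggregate by enumerating all 2^n ways of replacing dimensions with "all". This brute force is for illustration only; practical cube computation methods share work between aggregates.

```python
from collections import defaultdict
from itertools import product

ALL = "all"

# Hypothetical base cells: (product, quarter, country) -> sales.
base = {
    ("TV", "1Qtr", "U.S.A."): 10.0,
    ("TV", "2Qtr", "U.S.A."): 20.0,
    ("PC", "1Qtr", "U.S.A."): 5.0,
    ("TV", "1Qtr", "Canada"): 7.0,
}


def compute_cube(cells):
    """Add each base cell into all 2^n combinations of kept/ALL dimensions."""
    cube = defaultdict(float)
    for key, value in cells.items():
        for mask in product((False, True), repeat=len(key)):
            agg_key = tuple(ALL if m else k for k, m in zip(key, mask))
            cube[agg_key] += value
    return dict(cube)


cube = compute_cube(base)
```

Here `cube[("TV", ALL, "U.S.A.")]` plays the role of "total annual sales of TV in U.S.A." and `cube[(ALL, ALL, ALL)]` is the grand total.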


Browsing a Data Cube

◼ Visualization
◼ OLAP capabilities
◼ Interactive manipulation
Typical OLAP Operations

◼ Roll up (drill-up): summarize data
  ◼ by climbing up hierarchy or by dimension reduction
◼ Drill down (roll down): reverse of roll-up
  ◼ from higher level summary to lower level summary or detailed
  data, or introducing new dimensions
◼ Slice and dice:
  ◼ project and select
◼ Pivot (rotate):
  ◼ reorient the cube for visualization, e.g., turn a 3-D cube into a
  series of 2-D planes
◼ Other operations
  ◼ drill across: involving (across) more than one fact table
  ◼ …
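Several of these operations can be sketched on a cube stored as a dict from dimension-value tuples to a measure. A minimal Python sketch with hypothetical cells and helper names; roll-up is shown by dimension reduction, and drill-down is omitted because it needs the finer-grained data that a roll-up discards.

```python
from collections import defaultdict

# Hypothetical base cuboid: (product, quarter, country) -> dollars_sold.
DIMS = ("product", "quarter", "country")
cells = {
    ("TV", "Q1", "U.S.A."): 10.0,
    ("TV", "Q2", "U.S.A."): 20.0,
    ("PC", "Q1", "Canada"): 5.0,
    ("PC", "Q2", "U.S.A."): 8.0,
}


def roll_up(cells, dims, drop):
    """Roll up by dimension reduction: sum out dimension `drop`."""
    i = dims.index(drop)
    out = defaultdict(float)
    for key, value in cells.items():
        out[key[:i] + key[i + 1:]] += value
    return dims[:i] + dims[i + 1:], dict(out)


def slice_(cells, dims, dim, value):
    """Slice: fix one dimension to one value; that dimension disappears."""
    i = dims.index(dim)
    kept = {k[:i] + k[i + 1:]: v for k, v in cells.items() if k[i] == value}
    return dims[:i] + dims[i + 1:], kept


def dice(cells, dims, criteria):
    """Dice: keep cells whose values lie in the allowed set for each dimension."""
    idx = {dims.index(d): allowed for d, allowed in criteria.items()}
    kept = {k: v for k, v in cells.items()
            if all(k[i] in allowed for i, allowed in idx.items())}
    return dims, kept


def pivot(cells_2d):
    """Pivot a 2-D view: swap rows and columns."""
    return {(col, row): v for (row, col), v in cells_2d.items()}
```

For example, `roll_up(cells, DIMS, "country")` yields a product × quarter view, and `pivot` applied to that view turns it into quarter × product.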



A Star-Net Query Model
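(Figure: a star-net with one radial line per dimension and its footprints)
◼ Customer Orders: CONTRACTS, ORDER
◼ Shipping Method: AIR-EXPRESS, TRUCK
◼ Customer
◼ Time: ANNUALLY, QTRLY, DAILY
◼ Product: PRODUCT ITEM, PRODUCT GROUP, PRODUCT LINE
◼ Location: CITY, COUNTRY, REGION
◼ Organization: SALES PERSON, DISTRICT, DIVISION
◼ Promotion
◼ Each circle is called a footprint and represents one abstraction level
of its dimension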
Discovery-Driven Exploration of Data Cubes
◼ Hypothesis-driven: exploration by user, huge search space
◼ Discovery-driven (Sarawagi et al.’98)
  ◼ pre-compute measures indicating exceptions, guide user in the
  data analysis, at all levels of aggregation
  ◼ Exception: significantly different from the value anticipated,
  based on a statistical model
  ◼ Visual cues such as background color are used to reflect the
  degree of exception of each cell
  ◼ Computation of exception indicator (model fitting and
  computing SelfExp, InExp, and PathExp values) can be
  overlapped with cube construction
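Sarawagi et al. fit a statistical model to the cube and derive SelfExp, InExp and PathExp from how far cells deviate from the model's anticipated values; the sketch below is not their method. It substitutes a much simpler additive model (grand mean + row effect + column effect) on a single 2-D cuboid and flags cells whose standardized residual exceeds a hypothetical threshold, which is only loosely analogous to a SelfExp-style indicator. The function name, threshold and data are all assumptions.

```python
from statistics import mean, pstdev


def self_exceptions(table, threshold=1.5):
    """Flag cells far from an additive row + column expectation.

    table maps (row, col) -> value and must be a full grid.
    Returns {(row, col): standardized residual} for flagged cells only.
    """
    rows = sorted({r for r, _ in table})
    cols = sorted({c for _, c in table})
    grand = mean(list(table.values()))
    row_eff = {r: mean(table[r, c] for c in cols) - grand for r in rows}
    col_eff = {c: mean(table[r, c] for r in rows) - grand for c in cols}
    # Residual = actual value minus the value the simple model anticipates.
    resid = {(r, c): table[r, c] - (grand + row_eff[r] + col_eff[c])
             for r in rows for c in cols}
    scale = pstdev(list(resid.values())) or 1.0  # avoid dividing by zero
    return {cell: res / scale for cell, res in resid.items()
            if abs(res / scale) > threshold}


# Hypothetical product x month grid with one unusually high cell.
sales = {(p, m): 10.0 for p in ("A", "B", "C") for m in (1, 2, 3)}
sales[("B", 2)] = 40.0
```

On this grid only `("B", 2)` is flagged; in a browsing tool such scores would drive the background-color cues described above.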
Examples: Discovery-Driven Data Cubes



Software to Work with Data Cubes

◼ http://www.bi-verdict.com/
◼ http://www.bi-verdict.com/fileadmin/FreeAnalyses/Comment_OLAP_revival.htm



Summary
◼ Data warehouse
◼ A subject-oriented, integrated, time-variant, and nonvolatile
collection of data in support of management’s decision-
making process
◼ A multi-dimensional model of a data warehouse
◼ Star schema, snowflake schema, fact constellations
◼ A data cube allows users to view measures with respect to a given
set of dimensions
◼ OLAP operations: drilling, rolling, slicing, dicing and
pivoting



◼ Advantages of data cubes:
  ◼ Multi-dimensional analysis: Data cubes enable multi-dimensional
  analysis of business data, allowing users to view data from different
  perspectives and levels of detail.
  ◼ Interactivity: Data cubes provide interactive access to large amounts
  of data, allowing users to easily navigate and manipulate the data to
  support their analysis.
  ◼ Speed and efficiency: Data cubes are
  ◼ Data aggregation: Data cubes support complex calculations and data
  aggregation, enabling users to quickly and easily summarize large
  amounts of data.
  ◼ Improved decision-making: Data cubes provide a clear and
  comprehensive view of business data, enabling improved
  decision-making and business intelligence.
  ◼ Accessibility: Data cubes can be accessed from a variety of devices
  and platforms, making it easy for users to access and analyze business
  data.
  ◼ Data cubes give a summarised view of data.
  ◼ Data cubes store large volumes of data in a simple form.
  ◼ Data cube operations provide quicker and better analysis.
  ◼ Data cubes improve query performance.
◼ Disadvantages of data cubes:
  ◼ Complexity: OLAP systems can be complex to set up and maintain,
  requiring specialized technical expertise.
  ◼ Data size limitations: OLAP systems can struggle with very large data
  sets and may require extensive data aggregation or summarization.
  ◼ Performance issues: OLAP systems can be slow when dealing with large
  amounts of data, especially when running complex queries or
  calculations.
  ◼ Data integrity: Inconsistent data definitions and data quality issues
  can affect the accuracy of OLAP analysis.
  ◼ Cost: OLAP technology can be expensive, especially for
  enterprise-level solutions, due to the need for specialized hardware
  and software.
  ◼ Inflexibility: OLAP systems may not easily accommodate changing
  business needs and may require significant effort to modify or extend.
