Professional Documents
Culture Documents
all
0-D(apex) cuboid
time,location,supplier
3-D cuboids
time,item,location
time,item,supplier item,location,supplier
4-D(base) cuboid
time, item, location, supplier
Conceptual Modeling of Data
Warehouses
Modeling data warehouses: dimensions & measures
– Star schema: A fact table in the middle connected to a
set of dimension tables
– Snowflake schema: A refinement of star schema
where some dimensional hierarchy is normalized into
a set of smaller dimension tables, forming a shape
similar to snowflake
– Fact constellations: Multiple fact tables share
dimension tables, viewed as a collection of stars,
therefore called galaxy schema or fact constellation
Example of Star Schema
time
time_key item
day item_key
day_of_the_week Sales Fact Table item_name
month brand
quarter time_key type
year supplier_type
item_key
branch_key
branch location
location_key
branch_key location_key
branch_name units_sold street
branch_type city
dollars_sold state_or_province
country
avg_sales
Measure
Example of Snowflake Schema
time
time_key item
day item_key supplier
day_of_the_week Sales Fact Table item_name supplier_key
month brand supplier_type
quarter time_key type
year item_key supplier_key
branch_key
branch location
location_key
location_key
branch_key
units_sold street
branch_name
city_key
branch_type
dollars_sold city
city_key
avg_sales city
state_or_provinc
Measure e
s country
Example of Fact Constellation
time
time_key item Shipping Fact Table
day item_key
day_of_the_week Sales Fact Table item_name time_key
month brand
quarter time_key type item_key
year supplier_type
shipper_key
item_key
branch_key from_location
all all
Specification of hierarchies
Schema hierarchy
day < {month < quarter;
week} < year
Set_grouping hierarchy
{1..10} < inexpensive
Multidimensional Data
Office Day
Month
A Sample Data Cube
TV
od
PC U.S.A
Pr
VCR
Country
sum
Canada
Mexico
sum
Cuboids Corresponding to the Cube
all
0-D(apex) cuboid
product date country
1-D cuboids
3-D(base) cuboid
product, date, country
Browsing a Data Cube
Visualization
OLAP capabilities
Interactive manipulation
Typical OLAP Operations
Roll up (drill-up): summarize data
– by climbing up hierarchy or by dimension reduction
Drill down (roll down): reverse of roll-up
– from higher level summary to lower level summary or
detailed data, or introducing new dimensions
Slice and dice: project and select
Pivot (rotate):
– reorient the cube, visualization, 3D to series of 2D planes
Other operations
– drill across: involving (across) more than one fact table
– drill through: through the bottom level of the cube to its
back-end relational tables (using SQL)
Fig. 3.10 Typical
OLAP Operations
A Star-Net Query Model
Customer Orders
Shipping Method
Customer
CONTRACTS
AIR-EXPRESS
ORDER
TRUCK
PRODUCT LINE
Time Product
ANNUALY QTRLY DAILY PRODUCT ITEM PRODUCT GROUP
CITY
SALES PERSON
COUNTRY
DISTRICT
REGION
DIVISION
Location Each circle is
called a Promotion Organization
footprint
Design of Data Warehouse: A
Business Analysis Framework
Four views regarding the design of a data warehouse
– Top-down view
allows selection of the relevant information necessary for the
data warehouse
– Data source view
exposes the information being captured, stored, and managed
by operational systems
– Data warehouse view
consists of fact tables and dimension tables
– Business query view
sees the perspectives of data in the warehouse from the view of
end-user
Data Warehouse Design Process
Top-down, bottom-up approaches or a combination of both
– Top-down: Starts with overall design and planning (mature)
– Bottom-up: Starts with experiments and prototypes (rapid)
From software engineering point of view
– Waterfall: structured and systematic analysis at each step before
proceeding to the next
– Spiral: rapid generation of increasingly functional systems, short
turn around time, quick turn around
Typical data warehouse design process
– Choose a business process to model, e.g., orders, invoices, etc.
– Choose the grain (atomic level of data) of the business process
– Choose the dimensions that will apply to each fact table record
– Choose the measure that will populate each fact table record
Data Warehouse: A Multi-Tiered Architecture
Monitor
Metadata & OLAP Server
Other
sources Integrator
Analysis
Operational Extract Query
DBs Transform Data Serve Reports
Load
Refresh
Warehouse Data mining
Data Marts
Multi-Tier Data
Warehouse
Distributed
Data Marts