Professional Documents
Culture Documents
(1)
data mining
8
Data Warehouse is
S bj
Subject-Oriented
Oi d
TRS
D t W
Data Warehouse
h i Nonvolatile
is N l til
• High
g pperformance for both systems
y
– DBMS— tuned for OLTP: searching for particular records,
indexing, hashing, concurrency control, recovery
– Warehouse—tuned for OLAP: complex OLAP queries,
multidimensional view, consolidation (summarization and
aggregation)
ti )
Topics
dimensions
Table 3.2 A 2-D view of sales data for AllElectronics according to the
di
dimensions
i ti
time and
d item,
it where
h the
th salesl are from
f branches
b h located
l t d in
i
the city of Vancouver. The measure displayed is dollar_sold (in thousands).
20
From Tables and Spreadsheets
t Data
to D t Cubes
C b (3)
Table 3.3 A 3-D view of sales data for AllElectronics according to the
dimensions time, item, and location. The measure displayed is dollar_sold (in
thousands).
21
From Tables and Spreadsheets
t Data
to D t Cubes
C b (4)
Figure 3.1 A 3-D data cube representation of the data in Table 3.3,
according to the dimensions time, item, and location. The measure
displayed is dollar_sold (in thousands). 22
From Tables and Spreadsheets
t Data
to D t Cubes
C b (5)
24
Figure
g 3.14 Lattice of cuboids, making
g up p a 3-D data cube. Each
cuboid represents a different group-by. The base cuboid contains
the three dimensions city, item, and year.
25
The Curse of Dimensionality
26
Conceptual Modeling of Data
W h
Warehouses
• M
Modeling
d li data
d t warehouses:
h di
dimensions
i & measures
– Star schema: A fact table in the middle connected to a set
of dimension tables
– Snowflake schema: A refinement of star schema where
some dimensional hierarchy is normalized into a set of
smaller dimension tables, forming a shape similar to
snowflake
– Fact
F t constellations:
t ll ti M lti l fact
Multiple f t tables
t bl share
h di
dimensioni
tables, viewed as a collection of stars, therefore called
ggalaxyy schema or fact constellation
Star Schema
time
time_keyy item
day item_key
day_of_the_week Sales Fact Table
item_name
month brand
quarter
q time key
time_key type
year supplier_type
item_key
branch_key
branch location
location_key
branch_key location_key
dollars_sold street
branch_name
branch_type cit
city
unit_sold province_or_street
country
28
Snowflake Schema
time
supplier
time_key item supplier_key
supplier key
day
item_key supplier_type
day_of_the_week Sales Fact Table
month item_name
quarter time_key brand
year type
item_key supplier_key
branch_key
location key
location_key
branch
dollars_sold
branch_key location
branch_name units_sold
location_key city
branch_type
b h t street city_key
city city
province_or_street
country
Figure 3.4 Snowflake schema of a data warehouse for sales.
29
Fact Constellation Shipping Fact Table
time
item_key
time_key Sales Fact Table item
day time_key
ti k
day_of_the_week item_key shipper_key
month time_key item_name
quarter brand from_location
year item_key
k type to_location
supplier_type
branch_key dollars_sold
location key
location_key unit_shipped
pp
branch location
dollars_sold
branch_key location_key
branch_name unit_sold street
branch type
branch_type city shipper
hi
shipper
province_or_street shipper_key
country shipper_name
location key
location_key
shipper_type
Figure 3.5 Fact constellation schema of a data warehouse for sales 30
and shipping.