Lec4 - Dimensional Modeling

Data Warehousing
Data Warehousing Design
• Since the 1980s, data warehouses have evolved their own
design techniques, distinct from transaction-processing
systems
• Dimensional design techniques have emerged as the
dominant approach for most data warehouse databases
• Designing a data warehouse database is highly complex
• To begin a data warehouse project, we need answers for
questions such as:
– which user requirements are most important
– which data should be considered first
• For many enterprises the solution is data marts
• Few designers are willing to commit to an enterprise-wide
design that must meet all user requirements at one time
Dimensional Modeling
• The database component of a data warehouse is
described using a technique called
dimensionality modeling.
• Dimensionality modeling: A logical design
technique that aims to present the data in a
standard, intuitive form that allows for high-
performance access.
• Dimensionality modeling uses the concepts of
Entity–Relationship (ER) modeling with some
important restrictions
• Dimensional modeling provides a set of
methods and concepts that are used in data
warehouse (DWH) design.
• Dimensional modeling is a design technique
for databases intended to support end-user
queries in a data warehouse.
• Every dimensional model (DM) is composed
of one table with a composite primary key,
called the fact table, and a set of smaller
tables called dimension tables.
Fact Tables
• Facts are numerical values that can be
aggregated and analyzed
• A fact table has two types of columns: facts
and foreign keys to dimension tables.
• Contains two or more foreign keys
• Tend to have huge numbers of records
• Useful facts tend to be numeric and additive
Example
• This fact table contains foreign keys for the time,
product, and customer dimensions, and the measure
units sold.
• Suppose a company sells products to customers. Every sale is a
fact that happens within the company, and the fact table is used
to record these facts.
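A minimal sketch of such a fact table, using Python's built-in sqlite3 module. The table name, column names, and sample rows are assumptions made for illustration; they do not come from the slide's figure.

```python
import sqlite3

# In-memory database; all names and rows below are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE sales_fact (
        time_key     INTEGER NOT NULL,  -- foreign key to the time dimension
        product_key  INTEGER NOT NULL,  -- foreign key to the product dimension
        customer_key INTEGER NOT NULL,  -- foreign key to the customer dimension
        units_sold   INTEGER NOT NULL,  -- the numeric, additive fact
        PRIMARY KEY (time_key, product_key, customer_key)
    )
""")

# Every sale is recorded as one row of the fact table.
rows = [(1, 10, 100, 3), (1, 11, 100, 1), (2, 10, 101, 5)]
conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?)", rows)

# Because the fact is additive, it can be summed along any dimension.
total_units = conn.execute("SELECT SUM(units_sold) FROM sales_fact").fetchone()[0]
units_by_product = dict(conn.execute(
    "SELECT product_key, SUM(units_sold) FROM sales_fact GROUP BY product_key"
))
```

Here the composite primary key is made up entirely of the three foreign keys, and units_sold is the only measure.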
Dimension Tables
• Dimensions define hierarchies over, and descriptions of, the fact
values.
• Contain text and descriptive information
• The "1" side of a 1-M relationship with the fact table
• Generally the source of interesting constraints
• Typically contain the attributes for the SQL answer set.
• Dimension table has a primary key that uniquely
identifies each dimension row.
– This key is used to associate the Dimension table to a Fact
table.
• Dimension tables are normally de-normalized
Example
A customer dimension table, for example, normally includes the customer id,
name, address, gender, income group, education level, etc.
The Multi-Dimensional Model
[Figure: a fact table whose key columns (Prod Code, Time Code, Store Code) join it
to the dimension tables (Product Info, Time Info, Store Info, ...), alongside a
numerical measure (Sales Qty)]
• Each dimension table has a primary key that
corresponds exactly to one of the components
of the composite key in the fact table.
– In other words, the primary key of the fact table is
made up of two or more foreign keys.
• Another important feature of a DM is that all
natural keys are replaced with surrogate
keys.
– This means that every join between fact and
dimension tables is based on surrogate keys, not
natural keys
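A small sketch of how surrogate keys might be assigned, in plain Python. The function name assign_surrogate_keys and the SKU values are hypothetical. Real warehouses usually generate these keys in the ETL tool or the database, so this only illustrates the idea.

```python
from itertools import count

def assign_surrogate_keys(natural_keys):
    """Map each distinct natural key to a meaningless integer surrogate key."""
    next_key = count(1)
    mapping = {}
    for natural_key in natural_keys:
        if natural_key not in mapping:
            mapping[natural_key] = next(next_key)
    return mapping

# Natural keys as they arrive from a (hypothetical) source system.
product_keys = assign_surrogate_keys(["SKU-A17", "SKU-B02", "SKU-A17"])

# Fact rows store the surrogate key, never the natural key.
source_sales = [("SKU-A17", 3), ("SKU-B02", 1)]
fact_rows = [(product_keys[sku], qty) for sku, qty in source_sales]
```

The dimension table would keep the natural key as an ordinary attribute, while every join to the fact table uses the integer surrogate.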
Dimensional Modeling
• Dimensions are organized into hierarchies
– E.g., Time dimension: days → weeks → quarters
– E.g., Product dimension: product → product line →
brand
• Dimensions have attributes
– Time: Date, Month, Year
– Store: StoreID, City, State, Country, Region
CSE601 11
Dimension Hierarchies
• Store dimension: Total → Region → District → Stores
• Product dimension: Total → Manufacturer → Brand → Products
• Analysts tend to look at the data through a dimension at a particular “level” in the
hierarchy
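Looking at the data at a particular level can be sketched as a roll-up along the store hierarchy. The store rows, sales figures, and the roll_up helper are made up for illustration.

```python
from collections import defaultdict

# Store dimension rows keyed by store key (values are illustrative).
store_dim = {
    1: {"store": "S1", "district": "D1", "region": "North"},
    2: {"store": "S2", "district": "D1", "region": "North"},
    3: {"store": "S3", "district": "D2", "region": "South"},
}

# Fact rows: (store_key, sales_qty)
sales = [(1, 10), (2, 20), (3, 5), (1, 7)]

def roll_up(level):
    """Sum sales_qty at the chosen hierarchy level: store, district, or region."""
    totals = defaultdict(int)
    for store_key, qty in sales:
        totals[store_dim[store_key][level]] += qty
    return dict(totals)

by_district = roll_up("district")
by_region = roll_up("region")
```

Moving from "store" to "district" to "region" produces fewer, coarser totals from the same facts.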
Schema Design
• Schema is a logical description of the entire
database.
• It includes the name and description of records of
all record types including all associated data-items
and aggregates.
• Like a database, a data warehouse also
needs to maintain a schema.
• A database uses the relational model, while a data
warehouse uses a Star, Snowflake, or Fact
Constellation schema.
• Most data warehouses use a star schema to
represent the multi-dimensional model.
Star Schema
• The links between the fact table in the center and the
dimension tables in the extremities form a shape like a star.
• Because the bulk of data in a data warehouse is represented as facts,
the fact tables can be extremely large relative to the
dimension tables.
• It is important to treat fact data as read-only reference data
that will not change over time.
• The most useful fact tables contain one or more numerical
measures, or ‘facts’, that occur for each record
• Dimension tables, by contrast, generally contain
descriptive textual information.
• Dimension attributes are used as the constraints in data
warehouse queries
• Star schema: a logical structure that has a
fact table containing factual data in the
center, surrounded by dimension tables
containing reference data
• Each dimension in a star schema is
represented by only one dimension table.
– This dimension table contains the set of
attributes for that dimension.
– There is a fact table at the center. It contains the
keys to each of the dimensions.
– The fact table also contains the measures,
for example dollars sold and units sold.
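A sketch of a star schema and a typical query against it, using Python's sqlite3 module. Only the measures dollars sold and units sold come from the slide; the table names, the other columns, and the data are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- One table per dimension (names and rows are illustrative).
    CREATE TABLE time_dim (time_key INTEGER PRIMARY KEY, year INTEGER, quarter TEXT);
    CREATE TABLE item_dim (item_key INTEGER PRIMARY KEY, item_name TEXT, brand TEXT);

    -- Fact table at the center: keys to each dimension plus the measures.
    CREATE TABLE sales_fact (
        time_key     INTEGER REFERENCES time_dim(time_key),
        item_key     INTEGER REFERENCES item_dim(item_key),
        dollars_sold REAL,
        units_sold   INTEGER,
        PRIMARY KEY (time_key, item_key)
    );

    INSERT INTO time_dim VALUES (1, 2023, 'Q1'), (2, 2023, 'Q2');
    INSERT INTO item_dim VALUES (1, 'Pen', 'Acme'), (2, 'Ink', 'Acme'), (3, 'Pad', 'Zeta');
    INSERT INTO sales_fact VALUES
        (1, 1, 10.0, 5), (1, 3, 8.0, 2), (2, 1, 4.0, 2), (2, 2, 6.0, 1);
""")

# The dimension attribute (brand) is the constraint; the facts are aggregated.
query = """
    SELECT t.quarter, SUM(f.dollars_sold), SUM(f.units_sold)
    FROM sales_fact f
    JOIN time_dim t ON f.time_key = t.time_key
    JOIN item_dim i ON f.item_key = i.item_key
    WHERE i.brand = 'Acme'
    GROUP BY t.quarter
    ORDER BY t.quarter
"""
acme_by_quarter = conn.execute(query).fetchall()
```

Each dimension is reached with a single join from the fact table, which is what gives the schema its star shape.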
[Figures: star schema diagram, star schema examples, and a star schema with sample data]
Need for Aggregates
• Sizes of typical tables:
– Time dimension: 5 years x 365 days = 1825
– Store dimension: 300 stores reporting daily sales
– Product dimension: 40,000 products in each store
(about 4,000 sell in each store daily)
– Maximum number of base fact table records: about 2 billion
(lowest level of detail)
• A query involving 1 brand, all stores, 1 year:
retrieve/summarize over 7 million fact table rows.
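The sizing arithmetic can be checked with a few lines of Python. The variable names are ours. The number of products per brand is not stated on the slide, so the last line only back-computes what 7 million rows would imply, assuming every product of the brand sells in every store on every day.

```python
days = 5 * 365                     # time dimension: 1825 rows
stores = 300                       # store dimension
products_selling_daily = 4000      # products that actually sell per store per day

# One base fact row per (day, store, product sold).
max_base_fact_rows = days * stores * products_selling_daily

# Rows contributed by a single product across all stores for one year.
rows_per_product_per_year = 365 * stores

# Assumption-laden back-computation, not a figure from the slide.
implied_products_in_brand = 7_000_000 / rows_per_product_per_year
```

The product of the three sizes is 2,190,000,000, which matches the "2 billion" base-table estimate.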
Aggregating Fact Tables
• Aggregate fact tables are summaries of the
most granular data at higher levels along the
dimension hierarchies.
[Figure: hierarchy levels in a sales star schema]
– Product dimension: Product key, Product, Category, Department
– Store dimension: Store key, Store name, Territory, Region
– Time dimension: Time key, Date, Month, Quarter, Year
– Sales fact table: Product key, Time key, Store key, Unit sales, Sale dollars
• Multi-way aggregates: Territory – Category – Month
(data values at higher level)
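A sketch of building the Territory – Category – Month aggregate from the base fact table with sqlite3. The column names follow the figure where possible; the aggregate table name and all data values are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, product TEXT, category TEXT);
    CREATE TABLE store_dim   (store_key INTEGER PRIMARY KEY, store_name TEXT, territory TEXT);
    CREATE TABLE time_dim    (time_key INTEGER PRIMARY KEY, full_date TEXT, month TEXT);
    CREATE TABLE sales_fact  (product_key INTEGER, time_key INTEGER, store_key INTEGER,
                              unit_sales INTEGER, sale_dollars REAL);

    INSERT INTO product_dim VALUES (1, 'Cola', 'Drinks'), (2, 'Juice', 'Drinks');
    INSERT INTO store_dim   VALUES (1, 'Store A', 'East'), (2, 'Store B', 'East');
    INSERT INTO time_dim    VALUES (1, '2024-01-05', '2024-01'),
                                   (2, '2024-01-20', '2024-01'),
                                   (3, '2024-02-02', '2024-02');
    INSERT INTO sales_fact  VALUES (1, 1, 1, 4, 8.0), (2, 2, 2, 1, 3.0), (1, 3, 1, 2, 4.0);

    -- Aggregate fact table: the same measures, stored at higher hierarchy levels.
    CREATE TABLE sales_agg_territory_category_month AS
    SELECT s.territory, p.category, t.month,
           SUM(f.unit_sales)   AS unit_sales,
           SUM(f.sale_dollars) AS sale_dollars
    FROM sales_fact f
    JOIN product_dim p ON f.product_key = p.product_key
    JOIN store_dim   s ON f.store_key   = s.store_key
    JOIN time_dim    t ON f.time_key    = t.time_key
    GROUP BY s.territory, p.category, t.month;
""")

agg_rows = conn.execute(
    "SELECT * FROM sales_agg_territory_category_month ORDER BY month"
).fetchall()
```

Queries at territory, category, or month level can then read the smaller aggregate table instead of scanning the base fact table.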
Families of Stars
[Figure: a family of stars, with several fact tables each linked to multiple dimension tables]
Snowflake Schema
• A variant of the star schema where dimension
tables do not contain denormalized data.
– The normalization splits up the data into additional
tables.
– Unlike in a star schema, the dimension tables in a
snowflake schema are normalized.
– Because normalization reduces redundancy, a
snowflake schema is easier to maintain and
saves storage space.
For example, the item dimension table of the star schema is normalized and split into two
dimension tables, namely the item table and the supplier table.
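The split of item into item and supplier can be sketched in plain Python. The attribute names (supplier_name, supplier_type), the rows, and the snowflake_items helper are assumptions, since the original figure is not available.

```python
# Denormalized star-schema item dimension: supplier details repeat on every item row.
item_dim_star = [
    {"item_key": 1, "item_name": "Pen", "supplier_name": "Acme", "supplier_type": "Wholesale"},
    {"item_key": 2, "item_name": "Ink", "supplier_name": "Acme", "supplier_type": "Wholesale"},
    {"item_key": 3, "item_name": "Pad", "supplier_name": "Zeta", "supplier_type": "Retail"},
]

def snowflake_items(rows):
    """Split the item dimension into an item table and a supplier table."""
    supplier_keys = {}
    supplier_dim, item_dim = [], []
    for row in rows:
        supplier = (row["supplier_name"], row["supplier_type"])
        if supplier not in supplier_keys:
            # First time this supplier is seen: give it a new key and one row.
            supplier_keys[supplier] = len(supplier_keys) + 1
            supplier_dim.append({
                "supplier_key": supplier_keys[supplier],
                "supplier_name": supplier[0],
                "supplier_type": supplier[1],
            })
        # The item row keeps only a key pointing at the supplier table.
        item_dim.append({
            "item_key": row["item_key"],
            "item_name": row["item_name"],
            "supplier_key": supplier_keys[supplier],
        })
    return item_dim, supplier_dim

item_dim, supplier_dim = snowflake_items(item_dim_star)
```

Each supplier is now stored once, at the cost of an extra join whenever a query needs supplier attributes.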
Snowflake Schema
• Snowflake schema is a type of star schema
but a more complex model.
• “Snowflaking” is a method of normalizing
the dimension tables in a star schema.
• The normalization eliminates redundancy.
• The result is more complex queries and
reduced query performance.
Sales: Snowflake Schema
– Category: Category key, Product category
– Brand: Brand key, Brand name, Category key
– Product: Product key, Product name, Product code, Brand key
– Region: Region key, Region name
– Territory: Territory key, Territory name, Region key
– Salesrep: Salesrep key, Salesperson name, Territory key
– Sales fact: Product key, Time key, Customer key, ….
Snowflaking
• The attributes with low cardinality in each
original dimension table are removed to
form separate tables. These new tables are
linked back to the original dimension table
through artificial keys.
– Product: Product key, Product name, Product code, Brand key
– Brand: Brand key, Brand name, Category key
– Category: Category key, Product category
Snowflake Schema
• Advantages:
– Normalized structures are easier to update and
maintain
• Disadvantages:
– Browsing through the contents is more difficult
– Query performance degrades because of additional joins
Fact Constellation Schema
• A fact constellation has multiple fact tables.
It is also known as galaxy schema.
• The following diagram shows two fact
tables, namely sales and shipping.
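A sketch of a fact constellation with the two fact tables, sales and shipping, sharing the same dimension tables. All column names and rows are assumptions, and the query is one possible way to compare the two facts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables shared by both fact tables (illustrative names).
    CREATE TABLE time_dim (time_key INTEGER PRIMARY KEY, month TEXT);
    CREATE TABLE item_dim (item_key INTEGER PRIMARY KEY, item_name TEXT);

    -- Two fact tables in the same schema.
    CREATE TABLE sales_fact    (time_key INTEGER, item_key INTEGER, units_sold INTEGER);
    CREATE TABLE shipping_fact (time_key INTEGER, item_key INTEGER, units_shipped INTEGER);

    INSERT INTO time_dim VALUES (1, '2024-01'), (2, '2024-02');
    INSERT INTO item_dim VALUES (1, 'Pen'), (2, 'Ink');
    INSERT INTO sales_fact    VALUES (1, 1, 5), (2, 1, 3), (1, 2, 2);
    INSERT INTO shipping_fact VALUES (1, 1, 4), (2, 1, 4), (2, 2, 1);
""")

# Aggregate each fact table separately, then combine through the shared dimension.
# Joining the two fact tables directly would multiply rows and inflate the sums.
query = """
    SELECT i.item_name, s.units_sold, sh.units_shipped
    FROM item_dim i
    JOIN (SELECT item_key, SUM(units_sold) AS units_sold
          FROM sales_fact GROUP BY item_key) s ON s.item_key = i.item_key
    JOIN (SELECT item_key, SUM(units_shipped) AS units_shipped
          FROM shipping_fact GROUP BY item_key) sh ON sh.item_key = i.item_key
    ORDER BY i.item_name
"""
sold_vs_shipped = conn.execute(query).fetchall()
```

Because both fact tables reference item_dim and time_dim, their measures can be compared along those shared dimensions.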
The “Fact Constellation” Schema
– Store Dimension: STORE KEY, Store Description, City, State, District ID,
District Desc., Region_ID, Region Desc., Regional Mgr.
– Time Dimension: PERIOD KEY, Period Desc, Year, Quarter, Month, Day,
Current Flag, Sequence
– Product Dimension: PRODUCT KEY, Product Desc., Brand, Color, Size, Manufacturer
– Fact Table: STORE KEY, PRODUCT KEY, PERIOD KEY, Dollars, Units, Price
– District Fact Table: District_ID, PRODUCT_KEY, PERIOD_KEY, Dollars, Units, Price
– Region Fact Table: Region_ID, PRODUCT_KEY, PERIOD_KEY, Dollars, Units, Price
What is the Best Design?
• Performance benchmarking can be used to
determine which design is best.
• Snowflake schema: easier to maintain dimension
tables when dimension tables are very large
(reduces overall space). It is not generally
recommended in a data warehouse environment.
• Star schema: more effective for data cube
browsing (fewer joins), which can improve query performance.
Starflake Schema
• A hybrid structure that contains a mixture
of star and snowflake.
– The most appropriate database schemas use a
mixture of de-normalized star and normalized
snowflake schemas.
