DWH-Concepts New PDF
Details
Star Schema Dimension Tables
Dimension tables
Dimension tables usually referred to simply as 'dimensions'
Spend extra effort to add dimensional attributes
Dimension Keys
Synthetic keys
Each table assigned a unique primary key, specifically generated for the data warehouse
[Diagram: dimension tables, each with a dimension key]
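A minimal Python sketch of synthetic key assignment. The class name SurrogateKeyGenerator and the source-system identifiers are hypothetical, chosen only to illustrate the idea of a key generated inside the warehouse rather than taken from a source system.

```python
from itertools import count


class SurrogateKeyGenerator:
    """Hands out warehouse-generated integer keys, one per natural key."""

    def __init__(self):
        self._next = count(1)    # start value of 1 is an arbitrary choice
        self._by_natural = {}    # natural (source-system) key -> synthetic key

    def key_for(self, natural_key):
        # Reuse the existing synthetic key if this source record was seen before.
        if natural_key not in self._by_natural:
            self._by_natural[natural_key] = next(self._next)
        return self._by_natural[natural_key]


dealer_keys = SurrogateKeyGenerator()
k1 = dealer_keys.key_for("CRM-00017")  # first dealer seen
k2 = dealer_keys.key_for("ERP-A9")     # second dealer, different source system
k3 = dealer_keys.key_for("CRM-00017")  # same dealer again, same key
```

The synthetic key carries no business meaning, so the dimension stays insulated from changes to source-system identifiers.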
Dimension Columns
Dimension attributes
Specify the way in which measures are viewed: rolled up, broken out or summarized
Often follow the word “by” as in “Show me … by Quarter”
Frequently referred to as 'Dimensions'
[Diagram: a dimension table with a key column and attribute columns]
Star Schema Fact Table
Process measures
Start by assigning one fact table per business subject area
Fact tables store the process measures (aka Facts)
Compared to
[Diagram: Fact Table with columns fact1, fact2, fact3]
Fact Table Primary Key
Every fact table
Multi-part primary key added
Fact Table Sparsity
Sparsity
Term used to describe the very common situation
where a fact table does not contain a row for
every combination of every dimension table row
for a given time period
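Sparsity can be made concrete with a small calculation. The sketch below uses made-up model, dealer, and day values and compares the number of possible dimension combinations with the number of fact rows actually stored.

```python
from itertools import product

# Hypothetical dimension members for one short time period.
models = ["m1", "m2", "m3"]
dealers = ["d1", "d2"]
days = ["2024-01-01", "2024-01-02"]

# Fact rows exist only where a sale actually happened.
sales_facts = {
    ("m1", "d1", "2024-01-01"): 2,
    ("m3", "d2", "2024-01-01"): 1,
    ("m2", "d1", "2024-01-02"): 4,
}

possible = len(list(product(models, dealers, days)))  # every combination
populated = len(sales_facts)                          # rows actually stored
sparsity = 1 - populated / possible                   # share of empty cells
```

Here 3 of 12 possible combinations have a row, so the fact table is 75% sparse; no rows are stored for the empty combinations.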
Fact Table Grain
Grain
The level of detail represented by a row in the fact table
Must be identified early
Cause of greatest confusion during design process
Example
Each row in the fact table represents the daily item sales total
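As an illustration of the example grain, the following sketch rolls individual sales transactions up to one row per day and item. The transaction tuples and the function name to_daily_item_grain are assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical source transactions: (date, item, amount)
transactions = [
    ("2024-01-01", "widget", 10.0),
    ("2024-01-01", "widget", 5.0),
    ("2024-01-01", "gadget", 7.5),
    ("2024-01-02", "widget", 3.0),
]


def to_daily_item_grain(rows):
    """Aggregate transactions so each output row is one (date, item) pair."""
    totals = defaultdict(float)
    for date, item, amount in rows:
        totals[(date, item)] += amount
    return dict(totals)


daily_item_sales = to_daily_item_grain(transactions)
```

Once the grain is declared as daily item totals, individual transactions are no longer distinguishable in the fact table, which is why the choice has to be made early.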
Designing a Star Schema
Five initial design steps
Based on Kimball's six steps
Start designing in order
Re-visit and adjust over project life
Step One
Step Two
Step Three
3. Identify dimensions
Step Four
4. Select facts
Step Five
5. Identify dimensional
attributes
Fact Table Details
Example Fact Table
Sales Facts
model_key
dealer_key
time_key
revenue
quantity
Facts
Fully additive
Can be summed across any and all dimensions
Stored in fact table
Examples: revenue, quantity
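A small sketch of what "fully additive" means in practice: revenue summed by any single dimension always reconciles to the same grand total. The fact rows and the helper total_by are hypothetical.

```python
from collections import defaultdict

# Hypothetical fact rows: (model_key, dealer_key, time_key, revenue, quantity)
facts = [
    (1, 1, 1, 100.0, 1),
    (1, 2, 1, 200.0, 2),
    (2, 1, 2, 300.0, 3),
]


def total_by(rows, dim_index, measure_index):
    """Sum one measure grouped by one dimension key column."""
    out = defaultdict(float)
    for row in rows:
        out[row[dim_index]] += row[measure_index]
    return dict(out)


revenue_by_model = total_by(facts, 0, 3)
revenue_by_dealer = total_by(facts, 1, 3)
revenue_by_time = total_by(facts, 2, 3)
grand_total = sum(row[3] for row in facts)
```

Whichever dimension is used for the breakdown, the parts add back up to grand_total.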
Facts
Semi-additive
Can be summed across most dimensions but not
all
Anything that measures a “level” (e.g., inventory on hand, account balance)
Must be careful with ad-hoc reporting
Often aggregated across the “forbidden
dimension” by averaging
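The sketch below illustrates a semi-additive measure using hypothetical inventory snapshots, assuming time is the dimension across which the measure cannot be summed. Units on hand are summed across products but averaged across days.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical inventory snapshots: (date, product, units_on_hand)
snapshots = [
    ("2024-01-01", "widget", 10),
    ("2024-01-01", "gadget", 4),
    ("2024-01-02", "widget", 8),
    ("2024-01-02", "gadget", 2),
]

# Summing across products within one day is meaningful.
on_hand_per_day = defaultdict(int)
for date, _product, units in snapshots:
    on_hand_per_day[date] += units

# Summing across days would count the same stock twice.
naive_sum_over_time = sum(on_hand_per_day.values())

# Averaging across the time dimension gives a usable figure instead.
average_on_hand = mean(on_hand_per_day.values())
```

An ad-hoc report that sums this measure over a date range would show 24 units, although at most 14 were ever on hand; the average of 12 is the figure that makes sense.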
Facts
Non-Additive
Cannot be summed across any dimension
All ratios are non-additive
Break down to fully additive components, store
them in fact table
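To show why ratios are stored as their additive components, this sketch computes a margin ratio from hypothetical revenue and cost figures. Summing the per-row ratios gives a meaningless number, while summing the components first gives the correct overall ratio.

```python
# Hypothetical fact rows: (dealer, revenue, cost)
rows = [
    ("d1", 100.0, 80.0),
    ("d2", 300.0, 150.0),
]


def margin(revenue, cost):
    """Margin as a fraction of revenue."""
    return (revenue - cost) / revenue


# Per-row ratios cannot be added together.
per_row_margins = [margin(revenue, cost) for _dealer, revenue, cost in rows]
summed_ratios = sum(per_row_margins)

# Store revenue and cost in the fact table, sum them, then derive the ratio.
total_revenue = sum(revenue for _dealer, revenue, _cost in rows)
total_cost = sum(cost for _dealer, _revenue, cost in rows)
overall_margin = margin(total_revenue, total_cost)
```

The overall margin is 170 / 400, which differs from both the sum and the simple average of the per-row ratios.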
Factless Fact Table
A fact table with no measures in it
Nothing to measure...
…Except the convergence of dimensional
attributes
Sometimes store a “1” for convenience
Examples: Attendance, Customer
Assignments, Coverage
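A factless fact table for the attendance example might look like the sketch below. The key values are hypothetical; each row records only that a student, a class, and a date came together, and analysis is done by counting rows.

```python
from collections import Counter

# Hypothetical attendance events: (student_key, class_key, date_key)
# There is no measure column; the row itself is the fact.
attendance = [
    (1, 10, 20240101),
    (2, 10, 20240101),
    (1, 11, 20240101),
    (1, 10, 20240102),
]

# Counting rows answers "how many attendances per class?"
attendance_per_class = Counter(class_key for _student, class_key, _date in attendance)

# Storing a constant 1 per row would let the same question be answered with a sum.
attendance_with_one = [row + (1,) for row in attendance]
total_attendances = sum(row[3] for row in attendance_with_one)
```

The optional constant "1" adds no information; it only lets reporting tools use SUM in place of COUNT.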
Dimension Table
Details
Example Dimension Tables
Time
time_key
year
quarter
month
date
Model
model_key
brand
category
line
model
Dealer
dealer_key
region
state
city
dealer
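The example tables can be sketched as a runnable star schema using Python's built-in sqlite3 module. The column names come from the examples; the table names (time_dim, model_dim, dealer_dim, sales_facts), the column types, and all sample rows are assumptions made for illustration. The fact table uses a multi-part primary key made of the three dimension keys.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE time_dim   (time_key INTEGER PRIMARY KEY, year INTEGER, quarter TEXT, month TEXT, date TEXT);
CREATE TABLE model_dim  (model_key INTEGER PRIMARY KEY, brand TEXT, category TEXT, line TEXT, model TEXT);
CREATE TABLE dealer_dim (dealer_key INTEGER PRIMARY KEY, region TEXT, state TEXT, city TEXT, dealer TEXT);
CREATE TABLE sales_facts (
    model_key  INTEGER REFERENCES model_dim(model_key),
    dealer_key INTEGER REFERENCES dealer_dim(dealer_key),
    time_key   INTEGER REFERENCES time_dim(time_key),
    revenue    REAL,
    quantity   INTEGER,
    PRIMARY KEY (model_key, dealer_key, time_key)
);
INSERT INTO time_dim VALUES (1, 2024, 'Q1', 'Jan', '2024-01-15'), (2, 2024, 'Q2', 'Apr', '2024-04-10');
INSERT INTO model_dim VALUES (1, 'Acme', 'Sedan', 'Comfort', 'C100');
INSERT INTO dealer_dim VALUES (1, 'West', 'CA', 'Fresno', 'Valley Motors'), (2, 'East', 'NY', 'Albany', 'Capital Cars');
INSERT INTO sales_facts VALUES (1, 1, 1, 30000.0, 1), (1, 2, 1, 62000.0, 2), (1, 1, 2, 31000.0, 1);
""")

# "Show me revenue by region and quarter": measures come from the fact table,
# the "by" attributes come from the dimension tables.
revenue_by_region_quarter = conn.execute("""
    SELECT d.region, t.quarter, SUM(f.revenue)
    FROM sales_facts f
    JOIN dealer_dim d ON d.dealer_key = f.dealer_key
    JOIN time_dim t   ON t.time_key = f.time_key
    GROUP BY d.region, t.quarter
    ORDER BY d.region, t.quarter
""").fetchall()
```

Each query joins the fact table directly to the dimensions it needs, which is the single-hop join pattern that gives the star schema its name.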
Dimension Tables
Characteristics
Hold the dimensional attributes
Usually have a large number of attributes (“wide”)
Add flags and indicators that make it easy to
run specific types of reports
Have a small number of rows in comparison to fact
tables (most of the time)
Don’t Normalize Dimensions
Saves very little space
Impacts performance
Can confuse matters when multiple
hierarchies exist
A star schema with normalized dimensions is
called a "snowflake schema"
Usually advocated by software vendors
whose products require snowflake for
performance
Slowly Changing Dimensions
Dimension source data may change over time
Relative to fact tables, dimension records
change slowly
Allows dimensions to have multiple 'profiles'
over time to maintain history
Each profile is a separate record in a
dimension table
Slowly Changing Dimension
Example
Example: A woman gets married
Possible changes to customer dimension
• Last Name
• Marital Status
• Address
• Household Income
Existing facts need to remain associated with her
single profile
New facts need to be associated with her married
profile
Slowly Changing Dimension
Types
Three types of slowly changing dimensions
Type 1
• Updates existing record with modifications
• Does not maintain history
Type 2
• Adds new record
• Does maintain history
• Maintains old record
Type 3
• Keeps old and new values in the existing row
• Requires a design change
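The three types can be contrasted with a small sketch applied to the same starting customer row. The column names (customer_key, customer_id, last_name, is_current) and the "previous_" prefix used for Type 3 are assumptions for this example, not a prescribed layout.

```python
import copy


def apply_type1(dim, customer_id, changes):
    """Type 1: overwrite the existing record; no history is kept."""
    for row in dim:
        if row["customer_id"] == customer_id:
            row.update(changes)


def apply_type2(dim, customer_id, changes):
    """Type 2: keep the old record and add a new one with a new synthetic key."""
    current = next(r for r in dim if r["customer_id"] == customer_id and r["is_current"])
    current["is_current"] = False
    new_key = max(r["customer_key"] for r in dim) + 1
    dim.append({**current, **changes, "is_current": True, "customer_key": new_key})
    return new_key


def apply_type3(dim, customer_id, column, new_value):
    """Type 3: keep old and new values side by side in the same row."""
    for row in dim:
        if row["customer_id"] == customer_id:
            row["previous_" + column] = row[column]  # the extra column is the design change
            row[column] = new_value


base = [{"customer_key": 1, "customer_id": "C42", "last_name": "Smith", "is_current": True}]

type1_dim = copy.deepcopy(base)
apply_type1(type1_dim, "C42", {"last_name": "Jones"})

type2_dim = copy.deepcopy(base)
married_key = apply_type2(type2_dim, "C42", {"last_name": "Jones"})

type3_dim = copy.deepcopy(base)
apply_type3(type3_dim, "C42", "last_name", "Jones")
```

With Type 2, existing facts keep pointing at customer_key 1 (the single profile) while new facts are loaded with married_key (the married profile).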
Designing Loads to Handle SCD
Design and implementation guidelines
Gather SCD requirements when designing data
mapping and loading
SCD needs to be defined and implemented at the
dimensional attribute level
Each column in a dimension table needs to be
identified as a Type 1 or a Type 2 SCD
If one Type 1 column changes, then all Type 1
columns will be updated
If one Type 2 column changes, then a new record
will be inserted into the dimension table
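A rough sketch of a load routine that applies these column-level rules. The classification of address as Type 1 and of last_name and marital_status as Type 2 is an assumption for this example, as are the function name load_customer and the is_current flag. Overwriting Type 1 columns on every profile of the customer is one possible design choice.

```python
TYPE1_COLUMNS = {"address"}                      # assumed: overwrite, no history
TYPE2_COLUMNS = {"last_name", "marital_status"}  # assumed: new record on change


def load_customer(dim, incoming):
    """Apply one incoming source record to the dimension; return the actions taken."""
    rows = [r for r in dim if r["customer_id"] == incoming["customer_id"]]
    current = next(r for r in rows if r["is_current"])
    actions = []

    # If one Type 1 column changed, update all Type 1 columns.
    if any(current[c] != incoming[c] for c in TYPE1_COLUMNS):
        for row in rows:
            for c in TYPE1_COLUMNS:
                row[c] = incoming[c]
        actions.append("type1_update")

    # If one Type 2 column changed, insert a new record.
    if any(current[c] != incoming[c] for c in TYPE2_COLUMNS):
        current["is_current"] = False
        new_key = max(r["customer_key"] for r in dim) + 1
        new_row = dict(current, is_current=True, customer_key=new_key)
        for c in TYPE2_COLUMNS:
            new_row[c] = incoming[c]
        dim.append(new_row)
        actions.append("type2_insert")

    return actions


customer_dim = [{
    "customer_key": 1, "customer_id": "C42", "last_name": "Smith",
    "marital_status": "single", "address": "1 Elm St", "is_current": True,
}]

moved = load_customer(customer_dim, {
    "customer_id": "C42", "last_name": "Smith",
    "marital_status": "single", "address": "9 Oak Ave",
})
married = load_customer(customer_dim, {
    "customer_id": "C42", "last_name": "Jones",
    "marital_status": "married", "address": "9 Oak Ave",
})
```

The address change only overwrites the existing row, while the name and marital status change adds a second profile.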
For large dimension tables, change data capture
techniques may be used to minimize the data
volume
For smaller dimension tables, compare all OLTP
records with dimension table records
Balance data volume with change data capture
logic complexities
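For the smaller-dimension case, a full comparison of source records against dimension records can be sketched as follows. The function name detect_changes and the sample rows are hypothetical; a change data capture approach would instead feed only the changed source rows into the load.

```python
def detect_changes(source_rows, dim_rows, key, columns):
    """Compare every source record with its dimension record.

    Returns (new_keys, changed_keys) based on the natural key `key`.
    """
    dim_by_key = {row[key]: row for row in dim_rows}
    new_keys, changed_keys = [], []
    for src in source_rows:
        existing = dim_by_key.get(src[key])
        if existing is None:
            new_keys.append(src[key])
        elif any(src[c] != existing[c] for c in columns):
            changed_keys.append(src[key])
    return new_keys, changed_keys


oltp_dealers = [
    {"dealer_id": "D1", "city": "Fresno", "state": "CA"},
    {"dealer_id": "D2", "city": "Troy", "state": "NY"},
    {"dealer_id": "D3", "city": "Austin", "state": "TX"},
]
dealer_dim = [
    {"dealer_id": "D1", "city": "Fresno", "state": "CA"},
    {"dealer_id": "D2", "city": "Albany", "state": "NY"},
]

new_dealers, changed_dealers = detect_changes(
    oltp_dealers, dealer_dim, key="dealer_id", columns=["city", "state"]
)
```

The comparison touches every source row on every load, which is simple to reason about but grows with table size; that is the trade-off against change data capture logic.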
Conformed Dimensions
Conformed dimensions mean the exact same
thing with every possible fact table to which
they are joined.
E.g., the date dimension table connected to
the sales facts is identical to the date
dimension connected to the inventory facts.
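A sketch of why this matters: because both fact tables use the same date dimension, their results can be lined up on a shared attribute. The data is made up, and the inventory facts hold one snapshot per quarter so that summing them stays valid in this example.

```python
# One shared (conformed) date dimension: date_key -> attributes
date_dim = {
    1: {"date": "2024-01-01", "quarter": "Q1"},
    2: {"date": "2024-04-01", "quarter": "Q2"},
}

# Two hypothetical fact tables keyed by the same date_key.
sales_facts = [(1, 100.0), (1, 50.0), (2, 70.0)]  # (date_key, revenue)
inventory_facts = [(1, 20), (2, 35)]              # (date_key, units_on_hand)


def by_quarter(facts):
    """Roll a fact table up to quarter using the shared date dimension."""
    out = {}
    for date_key, value in facts:
        quarter = date_dim[date_key]["quarter"]
        out[quarter] = out.get(quarter, 0) + value
    return out


sales_by_quarter = by_quarter(sales_facts)
inventory_by_quarter = by_quarter(inventory_facts)

# Combine both results on the common "quarter" attribute.
report = {
    quarter: (sales_by_quarter.get(quarter), inventory_by_quarter.get(quarter))
    for quarter in sorted(set(sales_by_quarter) | set(inventory_by_quarter))
}
```

If the two fact tables used date dimensions with different keys or quarter definitions, the rows of this combined report would not be comparable.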
Degenerate Dimensions
Dimensions with no other place to go
Stored in the fact table
Are not facts
Common examples include invoice numbers
or order numbers
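A brief sketch of a degenerate dimension, using hypothetical order line facts. The order number sits in the fact row without a dimension table behind it, yet it can still be used for grouping.

```python
from collections import defaultdict

# Hypothetical fact rows: (order_number, product_key, date_key, revenue)
# order_number has no dimension table of its own.
order_line_facts = [
    ("SO-1001", 1, 20240101, 20.0),
    ("SO-1001", 2, 20240101, 5.0),
    ("SO-1002", 1, 20240102, 20.0),
]

# The degenerate dimension groups the lines that belong to one order.
order_totals = defaultdict(float)
for order_number, _product_key, _date_key, revenue in order_line_facts:
    order_totals[order_number] += revenue
```

The order number is not summed or averaged, which is why it is not a fact even though it lives in the fact table.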
Junk Dimensions
A junk dimension is a collection of random
transactional codes, flags, and/or text
attributes that are unrelated to any particular
dimension. The junk dimension is simply a
structure that provides a convenient place to
store the junk attributes.
E.g., gender dimension
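One common way to build such a structure is to enumerate every combination of the miscellaneous flags and give each combination a single key. The attributes below (payment_type, gift_wrap, order_channel) are hypothetical and stand in for whatever leftover codes a source system has.

```python
from itertools import product

# Hypothetical low-cardinality attributes with no natural home.
payment_types = ["cash", "card"]
gift_wrap = [False, True]
order_channels = ["web", "store"]

attribute_names = ("payment_type", "gift_wrap", "order_channel")

junk_dim = {}  # junk_key -> attribute values
lookup = {}    # attribute combination -> junk_key
for key, combo in enumerate(product(payment_types, gift_wrap, order_channels), start=1):
    junk_dim[key] = dict(zip(attribute_names, combo))
    lookup[combo] = key

# During the fact load, one key replaces three separate flag columns.
junk_key = lookup[("card", False, "store")]
```

The fact table then carries one junk_key column where it would otherwise carry three flag columns or three tiny dimension keys.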