
Dimensional Design

Details


Star Schema Dimension Tables
• Dimension tables
  - Store dimension values
  - Textual content
• Dimension tables usually referred to simply as 'dimensions'
• Spend extra effort to add dimensional attributes

Dimension Keys
• Synthetic keys
  - Each table assigned a unique primary key, specifically generated for the data warehouse
  - Primary keys from source systems may be present in the dimension, but are not used as primary keys in the star schema
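
As an illustration of warehouse-generated keys, here is a minimal sketch in Python. The `SurrogateKeyGenerator` class, the dealer names and the `source_dealer_id` column are hypothetical. It assumes one dimension row per source key, which no longer holds once history is kept with multiple rows per source record.

```python
from itertools import count


class SurrogateKeyGenerator:
    """Assigns warehouse-generated integer keys to source-system keys."""

    def __init__(self):
        self._next_key = count(start=1)
        self._key_map = {}  # source-system key -> synthetic key

    def key_for(self, source_key):
        # Reuse the existing synthetic key if this source key was seen before.
        if source_key not in self._key_map:
            self._key_map[source_key] = next(self._next_key)
        return self._key_map[source_key]


dealer_keys = SurrogateKeyGenerator()

# Hypothetical source-system dealer IDs. They stay in the dimension as an
# ordinary attribute; dealer_key is the primary key used by the star schema.
dealer_dimension = [
    {"dealer_key": dealer_keys.key_for(source_id),
     "source_dealer_id": source_id,
     "dealer": name}
    for source_id, name in [("D-0042", "Northside Motors"),
                            ("D-0007", "Lakeview Auto")]
]
```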

Dimension Columns
• Dimension attributes
  - Specify the way in which measures are viewed: rolled up, broken out or summarized
  - Often follow the word “by” as in “Show me Sales by Region and Quarter”
  - Frequently referred to as 'Dimensions'

Star Schema Fact Table
• Process measures
  - Start by assigning one fact table per business subject area
  - Fact tables store the process measures (aka Facts)
  - Compared to dimension tables, fact tables usually have a very large number of rows

Fact Table Primary Key
• Every fact table has a multi-part primary key
  - Made up of foreign keys referencing dimensions
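
A multi-part primary key can be sketched with Python's built-in `sqlite3` module. The column names follow the Sales Facts example in this deck; the table names (`time_dim`, `model_dim`, `dealer_dim`, `sales_facts`), the sample values and the `try_insert` helper are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE time_dim   (time_key   INTEGER PRIMARY KEY, date   TEXT);
    CREATE TABLE model_dim  (model_key  INTEGER PRIMARY KEY, model  TEXT);
    CREATE TABLE dealer_dim (dealer_key INTEGER PRIMARY KEY, dealer TEXT);

    CREATE TABLE sales_facts (
        model_key  INTEGER NOT NULL REFERENCES model_dim(model_key),
        dealer_key INTEGER NOT NULL REFERENCES dealer_dim(dealer_key),
        time_key   INTEGER NOT NULL REFERENCES time_dim(time_key),
        revenue    REAL,
        quantity   INTEGER,
        -- Multi-part primary key made up of the dimension foreign keys.
        PRIMARY KEY (model_key, dealer_key, time_key)
    );
""")
conn.execute("INSERT INTO time_dim VALUES (1, '2024-01-15')")
conn.execute("INSERT INTO model_dim VALUES (1, 'Model A')")
conn.execute("INSERT INTO dealer_dim VALUES (1, 'Dealer X')")
conn.execute("INSERT INTO sales_facts VALUES (1, 1, 1, 25000.0, 1)")


def try_insert(row):
    """Return True if the fact row is accepted, False if a key constraint rejects it."""
    try:
        conn.execute("INSERT INTO sales_facts VALUES (?, ?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


duplicate_accepted = try_insert((1, 1, 1, 100.0, 1))  # same key combination as above
orphan_accepted = try_insert((1, 99, 1, 100.0, 1))    # dealer_key 99 is not in dealer_dim
```

Both inserts are rejected: the first repeats an existing key combination, the second references a dealer that does not exist in the dimension.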

Fact Table Sparsity
• Sparsity
  - Term used to describe the very common situation where a fact table does not contain a row for every combination of every dimension table row for a given time period
  - Because fact tables contain a very small percentage of all possible combinations, they are said to be "sparsely populated" or "sparse"
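
To make the "very small percentage" concrete, the sketch below compares the number of fact rows with the number of possible dimension combinations. The function name and all row counts are made-up numbers for illustration.

```python
def fact_table_density(fact_row_count, dimension_row_counts):
    """Fraction of all possible dimension-key combinations that have a fact row."""
    possible_combinations = 1
    for row_count in dimension_row_counts:
        possible_combinations *= row_count
    return fact_row_count / possible_combinations


# Hypothetical sizes for one day of sales: 500 models, 2,000 dealers, 1 day.
density = fact_table_density(
    fact_row_count=30_000,
    dimension_row_counts=[500, 2_000, 1],
)
```

With these numbers, 30,000 rows out of 1,000,000 possible combinations are present, so the table is 3% populated.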

Fact Table Grain
• Grain
  - The level of detail represented by a row in the fact table
  - Must be identified early
  - Cause of greatest confusion during design process
• Example
  - Each row in the fact table represents the daily item sales total
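
The example grain can be sketched as follows: source transactions, which are finer than the declared grain, are collapsed to one row per item per day. The transaction data and the `to_daily_item_grain` helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical point-of-sale transactions, finer than the declared grain.
transactions = [
    {"date": "2024-01-15", "item": "widget", "amount": 4.0},
    {"date": "2024-01-15", "item": "widget", "amount": 6.0},
    {"date": "2024-01-15", "item": "gadget", "amount": 9.0},
    {"date": "2024-01-16", "item": "widget", "amount": 5.0},
]


def to_daily_item_grain(rows):
    """Collapse transactions to one row per (date, item): the daily item sales total."""
    totals = defaultdict(float)
    for row in rows:
        totals[(row["date"], row["item"])] += row["amount"]
    return [
        {"date": date, "item": item, "sales_total": total}
        for (date, item), total in sorted(totals.items())
    ]


daily_item_sales = to_daily_item_grain(transactions)
```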

Designing a Star Schema
• Five initial design steps
  - Based on Kimball's six steps
  - Start designing in order
  - Re-visit and adjust over project life

Step One
1. Identify fact table
   Start by naming the fact table with the name of the business subject area

Step Two
2. Identify fact table grain
   Describe what a row in the fact table represents in business terms

Step Three
3. Identify dimensions

Step Four
4. Select facts

Step Five
5. Identify dimensional attributes

Fact Table Details

Example Fact Table

Sales Facts
  model_key
  dealer_key
  time_key
  revenue
  quantity

Facts
• Fully additive
  - Can be summed across any and all dimensions
  - Stored in fact table
  - Examples: revenue, quantity

Facts
• Semi-additive
  - Can be summed across most dimensions but not all
  - Anything that measures a “level”
  - Must be careful with ad-hoc reporting
  - Often aggregated across the “forbidden dimension” by averaging
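
A small sketch of a semi-additive "level" measure, assuming a hypothetical inventory table where time is the forbidden dimension. The product names and quantities are invented.

```python
from collections import defaultdict

# Hypothetical daily inventory levels (units on hand) for two products.
inventory = [
    {"date": "2024-01-01", "product": "A", "on_hand": 10},
    {"date": "2024-01-01", "product": "B", "on_hand": 20},
    {"date": "2024-01-02", "product": "A", "on_hand": 12},
    {"date": "2024-01-02", "product": "B", "on_hand": 18},
]

# Summing across products within one day is meaningful.
on_hand_by_date = defaultdict(int)
for row in inventory:
    on_hand_by_date[row["date"]] += row["on_hand"]

# Summing across dates is not: this many units were never in stock at once.
misleading_total = sum(row["on_hand"] for row in inventory)

# Across the time dimension, average the daily totals instead.
average_on_hand = sum(on_hand_by_date.values()) / len(on_hand_by_date)
```

Here each day has 30 units on hand, so the average level is 30, while the naive sum across both days reports 60.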

Facts
• Non-additive
  - Cannot be summed across any dimension
  - All ratios are non-additive
  - Break down to fully additive components, store them in fact table
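
The advice to break a ratio into additive components can be sketched with a margin percentage. The `revenue` and `cost` values and the `margin_pct` helper are hypothetical.

```python
# Hypothetical fact rows that store the additive components of a margin ratio.
rows = [
    {"revenue": 100.0, "cost": 50.0},    # row-level margin of about 50%
    {"revenue": 1000.0, "cost": 900.0},  # row-level margin of about 10%
]


def margin_pct(revenue, cost):
    """Margin as a percentage of revenue."""
    return (revenue - cost) / revenue * 100


# Wrong: adding up the per-row ratios.
sum_of_ratios = sum(margin_pct(r["revenue"], r["cost"]) for r in rows)

# Right: sum the additive components first, then compute the ratio once.
total_revenue = sum(r["revenue"] for r in rows)
total_cost = sum(r["cost"] for r in rows)
overall_margin_pct = margin_pct(total_revenue, total_cost)
```

The summed ratios give roughly 60, and averaging them would give roughly 30, while the margin computed from the summed components is about 13.6%.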

Factless Fact Table
• A fact table with no measures in it
• Nothing to measure...
• ...Except the convergence of dimensional attributes
• Sometimes store a “1” for convenience
• Examples: Attendance, Customer Assignments, Coverage
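
An attendance table is a convenient way to picture a factless fact table. In this sketch the key names (`date_key`, `student_key`, `class_key`) and the `attendance_count` column are assumptions.

```python
from collections import Counter

# Hypothetical attendance fact rows: only dimension keys, no measures.
attendance_facts = [
    {"date_key": 1, "student_key": 101, "class_key": 7},
    {"date_key": 1, "student_key": 102, "class_key": 7},
    {"date_key": 2, "student_key": 101, "class_key": 7},
]

# Analysis is done by counting rows per dimension value.
attendance_by_date = Counter(row["date_key"] for row in attendance_facts)

# Variant with a constant 1 stored for convenience, so a sum works like a count.
with_count_column = [{**row, "attendance_count": 1} for row in attendance_facts]
total_attendance = sum(row["attendance_count"] for row in with_count_column)
```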

Dimension Table Details

Example Dimension Tables

Time        Model       Dealer
time_key    model_key   dealer_key
year        brand       region
quarter     category    state
month       line        city
date        model       dealer
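
The example tables can be queried as a star, for instance "revenue by region and quarter". The sketch below uses Python's built-in `sqlite3` module. The table names and all data values are invented, and the Model dimension is left out to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE time_dim (
        time_key INTEGER PRIMARY KEY, year INTEGER, quarter TEXT, month TEXT, date TEXT);
    CREATE TABLE dealer_dim (
        dealer_key INTEGER PRIMARY KEY, region TEXT, state TEXT, city TEXT, dealer TEXT);
    CREATE TABLE sales_facts (
        model_key INTEGER, dealer_key INTEGER, time_key INTEGER,
        revenue REAL, quantity INTEGER);

    INSERT INTO time_dim VALUES
        (1, 2024, 'Q1', 'Jan', '2024-01-15'),
        (2, 2024, 'Q2', 'Apr', '2024-04-10');
    INSERT INTO dealer_dim VALUES
        (1, 'East', 'NY', 'Albany', 'Dealer X'),
        (2, 'West', 'CA', 'Fresno', 'Dealer Y');
    INSERT INTO sales_facts VALUES
        (1, 1, 1, 100.0, 1),
        (2, 1, 1,  50.0, 1),
        (1, 1, 2,  70.0, 1),
        (1, 2, 1,  30.0, 1);
""")

# Join the fact table to the dimensions it references, then group by the
# dimensional attributes that follow the word "by".
revenue_by_region_and_quarter = conn.execute("""
    SELECT d.region, t.quarter, SUM(f.revenue) AS revenue
    FROM sales_facts AS f
    JOIN dealer_dim AS d ON d.dealer_key = f.dealer_key
    JOIN time_dim   AS t ON t.time_key   = f.time_key
    GROUP BY d.region, t.quarter
    ORDER BY d.region, t.quarter
""").fetchall()
```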

Dimension Tables
• Characteristics
  - Hold the dimensional attributes
  - Usually have a large number of attributes (“wide”)
  - Add flags and indicators that make it easy to perform specific types of reports
  - Have small number of rows in comparison to fact tables (most of the time)

Don’t Normalize Dimensions
• Saves very little space
• Impacts performance
• Can confuse matters when multiple hierarchies exist
• A star schema with normalized dimensions is called a "snowflake schema"
• Usually advocated by software vendors whose products require snowflake for performance

Slowly Changing Dimensions
• Dimension source data may change over time
• Relative to fact tables, dimension records change slowly
• Allows dimensions to have multiple 'profiles' over time to maintain history
• Each profile is a separate record in a dimension table

Slowly Changing Dimension Example
• Example: A woman gets married
• Possible changes to customer dimension
  - Last Name
  - Marriage Status
  - Address
  - Household Income
• Existing facts need to remain associated with her single profile
• New facts need to be associated with her married profile

Slowly Changing Dimension Types
• Three types of slowly changing dimensions
• Type 1
  - Updates existing record with modifications
  - Does not maintain history
• Type 2
  - Adds new record
  - Does maintain history
  - Maintains old record
• Type 3
  - Keep old and new values in the existing row
  - Requires a design change
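
The three types can be compared side by side on a last-name change. This is a minimal sketch: the column names (`customer_key`, `customer_id`, `previous_last_name`), the sample values and the three helper functions are all hypothetical.

```python
import copy

# Hypothetical customer dimension with one row; customer_key is the synthetic key.
base_dimension = [
    {"customer_key": 1, "customer_id": "C-100", "last_name": "Smith"},
]


def apply_type_1(dimension, customer_id, new_last_name):
    """Type 1: overwrite in place; the old value is lost."""
    for row in dimension:
        if row["customer_id"] == customer_id:
            row["last_name"] = new_last_name


def apply_type_2(dimension, customer_id, new_last_name):
    """Type 2: keep the old row and add a new row with a new synthetic key."""
    current = [r for r in dimension if r["customer_id"] == customer_id][-1]
    new_key = max(r["customer_key"] for r in dimension) + 1
    dimension.append({**current, "customer_key": new_key, "last_name": new_last_name})


def apply_type_3(dimension, customer_id, new_last_name):
    """Type 3: keep old and new values in the same row; needs an extra column."""
    for row in dimension:
        if row["customer_id"] == customer_id:
            row["previous_last_name"] = row["last_name"]
            row["last_name"] = new_last_name


type_1, type_2, type_3 = (copy.deepcopy(base_dimension) for _ in range(3))
apply_type_1(type_1, "C-100", "Jones")
apply_type_2(type_2, "C-100", "Jones")
apply_type_3(type_3, "C-100", "Jones")
```

After the change, `type_1` holds one row with no trace of "Smith", `type_2` holds two rows (existing facts keep pointing at key 1, new facts use key 2), and `type_3` holds one row with both names.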

Designing Loads to Handle SCD
• Design and implementation guidelines
  - Gather SCD requirements when designing data mapping and loading
  - SCD needs to be defined and implemented at the dimensional attribute level
  - Each column in a dimension table needs to be identified as a Type 1 or a Type 2 SCD
  - If one Type 1 column changes, then all Type 1 columns will be updated
  - If one Type 2 column changes, then a new record will be inserted into the dimension table
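
The column-level rules above might be sketched as a load routine like the following. The column classification, the `load_customer` function and the sample records are assumptions. As a simplification, a Type 1 change updates only the newest version of a customer.

```python
# Hypothetical column classification for a customer dimension.
TYPE_1_COLUMNS = {"phone", "email"}
TYPE_2_COLUMNS = {"last_name", "marital_status"}


def load_customer(dimension, incoming):
    """Apply one incoming source record to the dimension and report what happened."""
    versions = [r for r in dimension if r["customer_id"] == incoming["customer_id"]]
    if not versions:
        new_key = max((r["customer_key"] for r in dimension), default=0) + 1
        dimension.append({"customer_key": new_key, **incoming})
        return "inserted"
    current = versions[-1]  # assumes the newest version was appended last
    if any(current[c] != incoming[c] for c in TYPE_2_COLUMNS):
        # Type 2 change: insert a new record with a new key, keep the old record.
        new_key = max(r["customer_key"] for r in dimension) + 1
        dimension.append({"customer_key": new_key, **incoming})
        return "new version"
    if any(current[c] != incoming[c] for c in TYPE_1_COLUMNS):
        # Type 1 change: update all Type 1 columns in place.
        for c in TYPE_1_COLUMNS:
            current[c] = incoming[c]
        return "updated"
    return "unchanged"


customers = []
first = {"customer_id": "C-100", "last_name": "Smith", "marital_status": "single",
         "phone": "555-0100", "email": "a@example.com"}
second = {**first, "phone": "555-0199"}
third = {**second, "last_name": "Jones", "marital_status": "married"}
results = [load_customer(customers, record) for record in (first, second, third)]
```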

Designing Loads to Handle SCD
• Design and implementation guidelines
  - For large dimension tables, change data capture techniques may be used to minimize the data volume
  - For smaller dimension tables, compare all OLTP records with dimension table records
  - Balance data volume with change data capture logic complexities
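
The two approaches could look roughly like this. The full comparison follows the guideline directly. The timestamp filter is only one simple form of change data capture and assumes the source keeps a hypothetical `updated_at` column. All names and data are invented.

```python
def full_compare(source_rows, dimension_rows, key, columns):
    """Small dimensions: compare every source record with its dimension record."""
    existing = {row[key]: row for row in dimension_rows}
    new, changed = [], []
    for row in source_rows:
        match = existing.get(row[key])
        if match is None:
            new.append(row)
        elif any(row[c] != match[c] for c in columns):
            changed.append(row)
    return new, changed


def changed_since(source_rows, last_load_time):
    """Large dimensions: only look at records touched since the last load.

    Assumes a hypothetical `updated_at` ISO-8601 timestamp on each source record.
    """
    return [row for row in source_rows if row["updated_at"] > last_load_time]


dimension_rows = [
    {"customer_id": "C-1", "city": "Albany"},
    {"customer_id": "C-2", "city": "Fresno"},
]
source_rows = [
    {"customer_id": "C-1", "city": "Albany", "updated_at": "2024-01-01T00:00:00"},
    {"customer_id": "C-2", "city": "Reno", "updated_at": "2024-03-05T09:30:00"},
    {"customer_id": "C-3", "city": "Boise", "updated_at": "2024-03-06T12:00:00"},
]

new, changed = full_compare(source_rows, dimension_rows, key="customer_id", columns=["city"])
recent = changed_since(source_rows, last_load_time="2024-03-01T00:00:00")
```

The full comparison reads every record but needs nothing from the source system. The timestamp filter reads less data but depends on the source maintaining that column reliably.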

Conformed Dimensions
• Conformed dimensions mean exactly the same thing in every fact table to which they are joined.
• E.g. the date dimension table connected to the sales facts is identical to the date dimension connected to the inventory facts.
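
One practical consequence is that results from different fact tables line up on the shared dimension. The sketch below assumes a hypothetical date dimension and two small fact tables; `units_received` is an invented additive measure.

```python
# One shared (conformed) date dimension, keyed by date_key.
date_dim = {
    1: {"date": "2024-01-15", "quarter": "Q1"},
    2: {"date": "2024-04-10", "quarter": "Q2"},
}

# Hypothetical rows from two different fact tables that use the same date keys.
sales_facts = [
    {"date_key": 1, "revenue": 100.0},
    {"date_key": 1, "revenue": 50.0},
    {"date_key": 2, "revenue": 70.0},
]
inventory_facts = [
    {"date_key": 1, "units_received": 40},
    {"date_key": 2, "units_received": 35},
]


def total_by_quarter(facts, measure):
    """Sum a measure by the quarter attribute of the shared date dimension."""
    totals = {}
    for row in facts:
        quarter = date_dim[row["date_key"]]["quarter"]
        totals[quarter] = totals.get(quarter, 0) + row[measure]
    return totals


revenue = total_by_quarter(sales_facts, "revenue")
received = total_by_quarter(inventory_facts, "units_received")

# Both results use the same quarter labels, so they can be combined row for row.
combined = {
    quarter: {"revenue": revenue.get(quarter, 0),
              "units_received": received.get(quarter, 0)}
    for quarter in sorted(set(revenue) | set(received))
}
```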

Degenerate Dimensions
• Dimensions with no other place to go
• Stored in the fact table
• Are not facts
• Common examples include invoice numbers or order numbers

Junk Dimensions
• A junk dimension is a collection of random transactional codes, flags and/or text attributes that are unrelated to any particular dimension. The junk dimension is simply a structure that provides a convenient place to store the junk attributes.
• E.g. Gender dimension
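
One common way to build such a structure is to enumerate every combination of the flags once and give each combination a key. The flags below (`payment_type`, `gift_wrap`, `order_channel`) and the `junk_key` column are hypothetical.

```python
from itertools import product

# Hypothetical low-cardinality transactional flags with no dimension of their own.
payment_types = ["cash", "credit"]
gift_wrap_flags = ["Y", "N"]
order_channels = ["web", "store", "phone"]

# One junk dimension row per combination of flag values.
junk_dimension = [
    {"junk_key": key, "payment_type": p, "gift_wrap": g, "order_channel": c}
    for key, (p, g, c) in enumerate(
        product(payment_types, gift_wrap_flags, order_channels), start=1)
]

# During fact loading, the combination of flags is swapped for a single key.
junk_key_lookup = {
    (row["payment_type"], row["gift_wrap"], row["order_channel"]): row["junk_key"]
    for row in junk_dimension
}
```

The fact table then carries one `junk_key` column instead of three separate flag columns.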