COMP 430 Intro. To Database Systems: Denormalization & Dimensional Modeling

COMP 430
Intro. to Database Systems

Denormalization & Dimensional Modeling
Some consequences of normalization
• Data redundancy is reduced or eliminated.
• Relations are broken into smaller, related tables.
• Using all the attributes from the original relation requires joining
these smaller tables.
Denormalization
Deliberately reintroducing some redundancy, so that we can access
data faster.
DENORMALIZED
DATA AHEAD
Example
Technique:
Add duplicate fields
Technique:
Add computed fields
Example
Technique:
Join tables
Less common denormalization techniques
• Duplicating a commonly-used subset of table fields
• Splitting some table rows into different tables
• Frequently- vs. rarely-used data
• Data for different regions
• Common subclasses of data
When to denormalize?
Typically used when some or all of the following apply:

• Many queries need to join the data
• Joining the data is expensive – uses scans, rather than indices
• Computing derived data is expensive – complex queries or complex functions
Dealing with redundant data
Still want data consistency, but now it requires work.
Is full consistency required at all times?
Techniques:
• Stored procedures act as API for updating DB. They add the redundant data.
• Triggers check (and fix?) consistency.
• Application code carefully maintains consistency during updates.
• Reconcile data as background process.
• Reconcile data during system maintenance.
Dimensional modeling
In a tiny, brief nutshell
An alternative to ER modeling
Only a brief overview.

So, we’ll view it through our lens of ER modeling + denormalization.
Emphasizes decision making & use of historical data

• Fast retrieval & aggregation of data
• Less concern with updating data & maintaining consistency while updating
DB design often resembles multiple
starflakes
Starflake = tree of junction table & child tables in 1-to-many relationships
College(collegeid, …)
Course(crn, …) Student(sid, …, collegeid, …, home_stateid)

Enrollment(crn, sid, …) State(stateid, …, country)
Teach(crn, instr_id, …)
Typical:
• Few cycles.
Instructor(instr_id, …) • Super/sub classes implemented
only with superclass table.
Starflakes can be compressed to stars
Star = junction table & one level of child tables in 1-to-many relationships
Course(crn, …) Student(sid, …, college, …, home_state)

Enrollment(crn, sid, …)
Teach(crn, instr_id, …)
Denormalize by joining each child’s tree.
Instructor(instr_id, …)
Fact & dimension tables
Facts: The junction tables are the most important data – the facts
• E.g.: store purchases, class enrollments, click data
• Generally the largest tables
• Key data often numeric & additive – e.g., quantity bought, cost per unit,
advertisement views
Dimensions: The child tables in 1-to-many relationship with facts

• E.g., stores, customers, sales people, sales period
Dimensional modeling process
• Centered on identifying the business model
• Identifying the potential queries
• Identifying the facts – the data used in such queries
• Each query should use only one fact table & its dimension tables.
• Possibly start with an ER model

• Identify which junction tables serve as fact tables
• Use only surrogate keys
• Often add time dimension
• Denormalize into starflake or star schema
Customer Product
customer_ID sku
customer_name
purchase_profile
ER model description
brand
credit_profile category
address
OrderLine
Store Order
order_id
store_id order_id
sku
store_name customer_id
promotion_id
address store_id
dollars_sold
district clerk_id
units_sold
floor_type date
dollars_cost
Clerk Promotion
clerk_id promotion_id
clerk_name promotion_name
clerk_grade price_type
ad_type
Customer Product
customer_key
customer_name
Dimensional product_key
sku
purchase_profile
credit_profile model description
brand
address category
Order
time_key
Store
store_key
store_key Promotion
clerk_key
store_id promotion_key
product_key
store_name promotion_name
customer_key
address price_type
promotion_key
district ad_type
dollars_sold
floor_type
units_sold
dollars_cost
Time
Clerk time_key
clerk_key SQL_date
clerk_id day_of_week
clerk_name month
clerk_grade
View fact table as -dimensional data cube
Facts are the data in Each dimension table

the cube. represents a dimension
of the cube.
Facts might be pre-

aggregated along each
dimension or combination
of dimensions.
Sometimes things are still messy
Not all data fit nicely into facts + 1-to-many dimensions.
Leads to exceptions from this simple presentation.

COMP 430 Intro. To Database Systems: Denormalization & Dimensional Modeling

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

COMP 430 Intro. To Database Systems: Denormalization & Dimensional Modeling

Uploaded by

Copyright:

Available Formats

COMP 430

Intro. to Database Systems

Typically used when some or all of the following apply:

Only a brief overview.

Emphasizes decision making & use of historical data

Course(crn, …) Student(sid, …, collegeid, …, home_stateid)

Course(crn, …) Student(sid, …, college, …, home_state)

Dimensions: The child tables in 1-to-many relationship with facts

• Possibly start with an ER model

Facts are the data in Each dimension table

Facts might be pre-

Not all data fit nicely into facts + 1-to-many dimensions.

Leads to exceptions from this simple presentation.

You might also like