Crack-Smoking Data Models Data Models

Theory Meets Reality

Rules of Normalization (Simplified) (Simplified)
Make sure every table has a primary key (best to have a single value) 1NF - Put any repeating groups into their own table

Don’t have a column with a comma-separated list of values or multiple columns to represent the same value multiple times

2NF/3NF - Put any information that is not dependent on the primary key in its own table If you have a table of employees (id, position, salary, name) and their salary is fully determined by their position, move that into a separate table of position/salary.

4NF/5NF - Make sure all three-way and above joins are indeed valid for 3-way, and don’t need further separation.

Rules of Normalization (Simplified) (Simplified)
• •

ORNF - The data model contains only elemental facts DKNF - The data model fully defines all constraints and is free from “update anomalies” (the data model prevents any logical inconsistencies)

Key - uniquely defines a tuple Constraint - rule governing values of attributes

In DKNF keys fully define tuples, and constraints fully define logical relationships / allowed values, including multi-table constraints

Rules of Normalization (Simplified) (Simplified)
JBNF - Don’t Smoke Crack While Doing Data Models! JBNF2 - Don’t Do Data Models For Customers Who Smoke Crack!
• •

Okay, I break this one a lot But seriously, most business rules/decisions are not based on whether or not it contributes to a logical data model And why should it?

On the other end, when making a data model, we need to realize the flexibility that businesses require, so they don’t have to re-make the data model after every business decision Nor should they have to think too much about the data model when making business decisions

The Goal

Normalized Data Models Everything can be managed via standard operations

No Redundant Data / No Calculated Columns

Strong locality -- don’t have to worry if you forgot to set something

Strict Constraints Enforced

Pushes the business logic back to the data model so it can be easily managed outside of the application logic - no “update anomalies”

Data is modeled as data - fewer text fields

All data can be managed and verified

The Reality

No system can contain all data points This by itself leads to conflicts with theory
• •

Not all information is available Summary columns have to be managed which summarizes data inside and outside the database Some business rules are too complicated/flexible to be modeled, and must be abbreviated with flags and/or text fields

The Reality

System performance demands redundant data Many summaries are too complicated to be recalculated each time

Some of this can be mitigated with functional indices

Some summaries may need to be altered based on data exterior to the database

The Reality

Some data is best stored de-normalized

Management Issues- Do we really need a table for that? Performance Issues- Do we really want to query for that?

Time Issues - Do we really want to build the interface to manage that? In some cases, maybe we can de-normalize our database to save some sanity.

The Reality

Some data models look whacked, because the data they are modeling is whacked. In an ideal world, we would encourage the customer to come up with a more consistent way to manage themselves. But usually we just model what they have because it’s easier than changing 20 years of tradition and infrastructure

Case Study Homebuilder

Builder has several divisions, each division is responsible for an area (kind of - some are on top of each other) Builder has several brands

Builder categorizes houses and communities by lifestyle Builder also needs to track home plans and inventory

JB’s Rules of Practicalization

1PF - Design a well-normalized database (the level is up to you) which describes their data as you understand it. 2PF - If they way that the customer talks about their data is inconsistent, develop a vocabulary to use when talking to them about their project which matches the data model. Be sure to clarify any unclear statements they make using the new vocabulary.

JB’s Rules of Practicalization

3PF - Determine which business rules are too fungible to be implemented well by the database, and instead make manual processes for dealing with exceptions using flags and text fields 4PF - Determine if some values may have exceptions based on incomplete information in the database, and create user-maintainable columns or tables for them

JB’s Rules for Practicalization

5PF - Rejigger your data model so that it matches your development platform nicely. 6PF - Create calculated columns based on real or anticipated performance problems. Be sure there are application-level measures taken to keep these mostly consistent.

7PF - ?