You are on page 1of 32

Part 6

Normalization
Normal Form Overview
Universe of All Data
Relations (normalized / unnormalized
1st Normal Form
2nd Normal Form
3rd Normal Form
Boyce-Codd Normal Form (BCNF)
4th Normal Form
5th Normal Form (PJ/NF)
Domain/Key
Normal Form
(DK/NF)

Copyright  1971-2002 Thomas P. Sturm Normalization Part 6, Page 2


Universe of Relations
Any sequential file is a relation
Not all relations are “well formed”
Normalization provides a set of criteria to evaluate the
“well formedness” of a relation
Normal form is only one criterion for determining a
“good” model
In general, a sequential file may have repeating groups

Example 1 - suppliers:
part suppliers
diode (GE, TRW, Mot)
bulb (GE, Syl)

Implemented as:
part supplier1 supplier2 supplier3
diode GE TRW Mot
bulb GE Syl

Copyright  1971-2002 Thomas P. Sturm Normalization Part 6, Page 3


Problem with a Relation Not in First
Normal Form

Retrieval:
• which supplier of a part should be
retrieved?
• in which order should suppliers be
retrieved?
Storage:
• how many suppliers do you allow for?
• which spaces are kept blank, and how?
Insert:
• to add a supplier, need to retrieve all
suppliers, add the new supplier to an empty
slot, and replace the record
Delete:
• to delete a supplier, need to “adjust” the
vector (read, move around, erase, re-write)
Update:
• to update a supplier name, need to retrieve
all suppliers, find the one to alter, and re-
write entire set of suppliers

Copyright  1971-2002 Thomas P. Sturm Normalization Part 6, Page 4


Solution to Repeating Group Problem

Eliminate repeating groups by repeating the key

Example 1 - suppliers:
part supplier
diode GE
diode TRW
diode Mot
bulb GE
bulb Syl

This new table has a different key than the


old one. It is part plus supplier.

Copyright  1971-2002 Thomas P. Sturm Normalization Part 6, Page 5


First Normal Form
All underlying domains contain atomic values only (no
vectors / repeating groups)

Example 2 - inventory:
part # warehouse # wh_address quantity
100 05 Mpls 200
100 08 StPaul 300
200 05 Mpls 250
200 10 Madison 400
300 08 StPaul 350

Update Anomalies:
UPDATE
• address of warehouse stored in many rows
• if address changes, must change all rows
DELETE
• if the last row for a warehouse is deleted, the address is
lost
INSERT
• to insert a new row, warehouse address must be known

The problem occurs because this table is not focused on


one primary key - it is “about” two things -
warehouses and parts in warehouses.

Copyright  1971-2002 Thomas P. Sturm Normalization Part 6, Page 6


Solution to Multiple Focus Problems

A relation that is in 1NF but not in a higher normal


form has a composite key (more than one
attribute in the key)

Establish 2 relations via projection

Example 2 - inventory:
One table about warehouses:
warehouse# wh_address
05 Mpls
08 StPaul
10 Madison

One table about inventory with a composite key:


part# warehouse# quantity
100 05 200
100 08 300
200 05 250
200 10 400
300 08 350

The original table in 1NF can be reconstructed by a join

Copyright  1971-2002 Thomas P. Sturm Normalization Part 6, Page 7


Second Normal Form

1NF + every non-key attribute is fully functionally


dependent on the primary key

Example 3 - departments:
name dept dept_loc
smith 402 100
jones 401 200
king 402 100
turner 400 200
olson 401 200

Problem: Functional dependency is transitive


• The primary key is name
• dept is functionally dependent on name
• dept_loc is also functionally dependent on name, but it is
transitive because dept functionally determines
dept_loc

Copyright  1971-2002 Thomas P. Sturm Normalization Part 6, Page 8


Problems with 2NF Relations

Update Anomalies:

UPDATE - location appears many times - if location of


a department changes, must fetch and change all
rows containing that location

DELETE - if the last row for a department is deleted,


the department location information is lost

INSERT - to insert a new row, department location


must be known

Solution: Establish 2 relations via projection


Example 3 - departments:
name dept and dept dept_loc
smith 402 400 200
jones 401 401 200
king 402 402 100
turner 400
olson 401

Copyright  1971-2002 Thomas P. Sturm Normalization Part 6, Page 9


Third Normal Form

2NF + every non-key attribute is non-transitively


functionally dependent on the primary key

OR

Every non-key attribute is


• mutually independent (none is functionally dependent
on any of the others)
• fully functionally dependent on the primary key

OR

(Kent) Each attribute in the relation is


functionally dependent on the key, the whole
key, and nothing but the key

A relation that is 2NF but not 3NF


• can be split into a collection of 3NF relations by
projection
• can be reconstructed by join

Copyright  1971-2002 Thomas P. Sturm Normalization Part 6, Page 10


3NF Examples

Example 4 - locations:
dept# dept_name dept_loc
400 programming 200
401 financial 200
402 academic 100
403 support 300

• dept# and dept_name are candidate keys


• dept_loc is the only non-key attribute, and is, by
default, non-transitively functionally dependent on
the primary key
• This table is fine - it is only about departments

Example 5 - stock:
s# sname p# qty
10 GE 102 1000
10 GE 103 625
10 GE 104 2000
20 TRW 102 500
20 TRW 105 1200
30 Syl 103 1300

• technically in 3NF
• qty is the only non-key attribute (like example 1)
• candidate keys are (s#, p#) and (sname, p#)
• didn't require components of an alternate key to be fully
functionally dependent on the primary key

Copyright  1971-2002 Thomas P. Sturm Normalization Part 6, Page 11


Problems with 3NF Relations

The problems associated with alternate key


components were not recognized in the early
formulations of the relational model.

Have the same update anomalies as second normal


form

Solution: Establish 2 relations via projection


Example 5 - stock:
s# sname and s# p# qty
10 GE 10 102 1000
20 TRW 10 103 625
30 Syl 10 104 2000
20 102 500
20 105 1200
30 103 1300
or [s#, sname] and [sname, p#, qty]

Because of this problem, 3NF (as we have


described it) is sometimes referred to as “early
3rd Normal Form”

Copyright  1971-2002 Thomas P. Sturm Normalization Part 6, Page 12


BCNF
Boyce-Codd Normal Form

3NF + every determinant is a candidate key


(Determinant: any attribute on which some other
attribute is fully functionally dependent)

• In example 4, dept# determined dept_name; in


example 5, s# determined sname

• In example 4, dept# was a candidate key

• In example 5, s# (by itself) was not a candidate key

A relation that is 3NF but not BCNF


• Can be split into a collection of BCNF relations by
projection

• Can be reconstructed by join

BCNF is sometimes referred to as late 3rd Normal


Form, or even just as 3rd Normal Form

Copyright  1971-2002 Thomas P. Sturm Normalization Part 6, Page 13


3NF to BCNF Example

Example 6 - enrollment
Rules:
1. For each subject, each student is taught by 1 teacher
2. Each teacher teaches only 1 subject (don't I wish)
3. Each subject is taught by several teachers

Student Subject Teacher


Smith Math Dr. White
Smith English Dr. Brown
Jones Math Dr. White
Jones English Dr. Brown
Doe Math Dr. Green

a. Teacher dependent on Student + Subject


b. Subject dependent on Teacher
c. Teacher not dependent on Subject
d. (Student, Subject) is a candidate key
e. (Student, Teacher) is also a candidate key

Update anomalies, e.g., Dr. White changes name


Relation in 3NF, but not in BCNF
Teacher is a determinant (b.), but not a candidate key (d.
and e.)

Copyright  1971-2002 Thomas P. Sturm Normalization Part 6, Page 14


Solution to Example 6

Solution: Form two relations:


Student Teacher and Teacher Subject
Smith Dr. White Dr. White Math
Smith Dr. Brown Dr. Brown English
Jones Dr. White Dr. Green Math
Jones Dr. Brown
Doe Dr. Green

Question: How did we know to break it up this way?

Answer: The rules help us make this decision. In this


case, rule 2 gives us the crucial information - once you know
the teacher, you know the subject. Therefore, we need two
tables to enforce the rule.
• The [Teacher, Subject] table tells us which one subject
each teacher teaches.
• Students, in general, need both a subject and a teacher
• If we specify only subject, we don't know the teacher
• If we specify teacher, however, we do know the subject
because of the rule and the first table

Copyright  1971-2002 Thomas P. Sturm Normalization Part 6, Page 15


Fourth Normal Form

4NF and 5NF are relevant only when all attributes


in the relation are parts of the key
• if in BCNF and have a non-key attribute,
also in 5NF

Example 7 - skills:
Suppose we wish to store employee job skills and
language skills. (An employee may have many of each.)
employee skill language
Jones electrical French
Jones electrical German
Jones mechanical French
Jones mechanical German
Smith plumbing Spanish

In general: if Jones x A
and Jones y B

then Jones x B
and Jones y A

The relation is in BCNF - because it is all key ...


but there is redundancy

Copyright  1971-2002 Thomas P. Sturm Normalization Part 6, Page 16


Converting to 4NF

Ask the following questions:


• Could the relation have non-key attributes?
• Could any combination be missing?
If either answer is NO, need to break up relation to
achieve 4NF
Example 7 - skills:
employee skill language

should be broken up into two relations:

employee skill and employee language


Jones electrical Jones French
Jones mechanical Jones German
Smith plumbing Smith Spanish

if job skill and language are independent

Copyright  1971-2002 Thomas P. Sturm Normalization Part 6, Page 17


Problems without 4NF

Problem occurs when dealing with multiple,


independent facts
How do we represent them in a single relation?
Disjoint: Jones electrical
Jones mechanical
Jones French
Jones German
Smith plumbing
Smith Spanish

Random mix: Jones electrical French


Jones mechanical German
Smith plumbing Spanish

(do extras - repeat, - blank, - anything?)

Cross product: Jones electrical French


Jones mechanical French
Jones electrical German
Jones mechanical German
Smith plumbing Spanish

Check for independence!

Copyright  1971-2002 Thomas P. Sturm Normalization Part 6, Page 18


Fifth Normal Form

PJ/NF or Projection-Join Normal Form


(Kent) - Deals with cases where information can be
reconstructed from smaller pieces of information which can be
maintained with less redundancy

Example 8 - dealerships:
1. Agents represent Companies
2. Companies make Products
3. Agents sell Products

Which Agent sells which Product for which Company?


Agent Company Product
smith ford car
smith gm truck
jones ford car

• this form is necessary in the general case


BUT if we put a rule into effect that reads:
4. if an agent sells a product, and an agent represents a
company, then the agent must sell the product made by
the company

So, to obey the rule, we must add


smith ford truck
smith gm car

NOW, with the rule and the new rows, we have


REDUNDANCY

Copyright  1971-2002 Thomas P. Sturm Normalization Part 6, Page 19


Converting to 5NF

This time, we must break the relation into three


parts (will not break in two)
Example 8 - dealerships:
Agent Company Product
smith ford car
smith gm truck
jones ford car
smith ford truck
smith gm car

BREAK INTO 3

Agent Company Agent Product Company Product


smith ford smith car ford car
smith gm smith truck ford truck
jones ford jones car gm car
gm truck

A relation is already in 5NF if its information


content cannot be reconstructed from several
smaller record types (having different keys)
• Only have 5NF problems if there are symmetry
constraints (a pair of rows requires the existence of
one or more additional rows)

Copyright  1971-2002 Thomas P. Sturm Normalization Part 6, Page 20


Domain/Key Normal Form

No insertion/deletion anomalies

Impossible to make an insertion/deletion that


violates a constraint

Constraint types:
• domain constraints
• key constraints

Example 9 - customers

cust# branch
1234 west
1325 south
1421 east
1511 south

where valid branches are west, east, and south

Copyright  1971-2002 Thomas P. Sturm Normalization Part 6, Page 21


Enforcing Domain Integrity in DK/NF
Example 9 - customers:
cust# branch
1234 west
1325 south
1421 east
1511 south
1600 north

If this update is possible, not in DK/NF


One possibility for prohibiting this update is to maintain a
table of legal branches and write code to prohibit
the entry of a branch not in the table
legal branch
west
south
east

Problem: What's to stop someone from placing north in


the legal branch table?

Possible partial solution: Restrict access to the legal


branch table

Copyright  1971-2002 Thomas P. Sturm Normalization Part 6, Page 22


Normalization Example:
AutoCAD Database

Manufacturing plant electrical wiring


specifications

Blueprints contain:
• parts at locations
• wired connections
• attributes for each wire and location

AutoCAD transmits variable-length records


• only transmits data for “smart parts”
• one record per part
• all data must be related to one or more parts

Objectives:
• number of wires from any source to any destination
• sub-classified by voltage, shielding, and intrinsic
safety characteristics
• obtain conduit count from wire count (by hand)

Copyright  1971-2002 Thomas P. Sturm Normalization Part 6, Page 23


Wiring Problem Statement

Part: Wire 52 Part:


AS AS
303 404

Wire 53 Wire

Part:
Panel (3) AS
405

Panel (4)

Problem:
Count the wires going from panel (3) to
panel (4)

Copyright  1971-2002 Thomas P. Sturm Normalization Part 6, Page 24


Wiring Database Normalization
0th Normal Form
Part# Loc Loc_Desc Wire1 Volt IS Wire2 Volt IS Wire3 ...
AS303 (3) Panel 3 52 240 53 24 IS 54
AS404 (4) Panel 4 52 240 55 120
AS405 (4) Panel 4 53 24 IS

1st Normal Form (no repeating groups)


– but the 2-part key creates partial
dependencies
Part# Loc Loc_Desc Wire# Volt IS
AS303 (3) Panel 3 52 240
AS303 (3) Panel 3 53 24 IS
AS303 (3) Panel 3 54 240
AS404 (4) Panel 4 52 240
AS404 (4) Panel 4 55 120
AS405 (4) Panel 4 53 24 IS

2nd Normal Form (no partial dependencies)


Part# Loc Loc_Desc Wire#
AS303 (3) Panel 3 52
AS303 (3) Panel 3 53
AS303 (3) Panel 3 54
AS404 (4) Panel 4 52
AS404 (4) Panel 4 55
AS405 (4) Panel 4 53
Wire# Volt IS
52 240
53 24 IS
54 240
55 120

Copyright  1971-2002 Thomas P. Sturm Normalization Part 6, Page 25


Wiring Database Normalization
(Continued)
2nd Normal Form (apply rule again)
– with the step-by-step approach, you only
eliminate one partial dependency at a time
Part# Wire#
AS303 52
AS303 53
AS303 54
AS404 52
AS404 55
AS405 53
Wire# Volt IS
52 240
53 24 IS
54 240
55 120
Part# Loc Loc_Desc
AS303 (3) Panel 3
AS404 (4) Panel 4
AS405 (4) Panel 4

Copyright  1971-2002 Thomas P. Sturm Normalization Part 6, Page 26


Wiring Database Normalization
(Concluded)
3rd Normal Form (no transitive dependencies)
Part# Wire#
AS303 52
AS303 53
AS303 54
AS404 52
AS404 55
AS405 53
Wire# Volt IS
52 240
53 24 IS
54 240
55 120
Part# Loc
AS303 (3)
AS404 (4)
AS405 (4)
Loc Loc_Desc
(3) Panel 3
(4) Panel 4

Copyright  1971-2002 Thomas P. Sturm Normalization Part 6, Page 27


A Case Study (March)

Population:
• MBA students
• MIS majors
• last quarter in the program

Language:
• Nomad 2 4GL

Case:
• Customer, with attributes
• Dealer, with attributes
• Manufacturer, with attributes
• Contracts - customer, dealer, manufacturer, with
symmetry constraints

Task:
• Given case description fully analyzed
• Use existing Nomad database
• Perform 8 queries
• Perform 11 updates

Copyright  1971-2002 Thomas P. Sturm Normalization Part 6, Page 28


Relations Given in Case Study (March)

Design:
• 3 groups of 14 students
• same case, same queries, same updates
• different schemas (1NF, 3NF, and 5NF)

1NF Schema:
(d#, m#, c#, mfgr_attr, cust_attr, dealer_attr)

3NF Schema:
(c#, cust_attr)
(d#, dealer_attr)
(m#, mfgr_attr)
(d#, c#, m#)

5NF Schema:
(c#, cust_attr)
(d#, dealer_attr)
(m#, mfgr_attr)
(d#, c#)
(d#, m#)
(c#, m#)

Copyright  1971-2002 Thomas P. Sturm Normalization Part 6, Page 29


Preliminary Case Results (March)

Tasks Correctly Performed

Normal Form: Queries (8) Updates (11)

First 7.21 5.07


(90%) (46%)

Third 4.50 3.64


(56%) (33%)

Fifth 4.42 3.21


(55%) (29%)

Copyright  1971-2002 Thomas P. Sturm Normalization Part 6, Page 30


Database Design Issues Revisited

Ease of query formulation


Ease of enforcing referential integrity constraints
Ease of avoiding update anomalies

Normalization focuses only on avoiding update


anomalies

“Being normal is not enough”

Possible solutions:
1. Don't normalize
2. Don't normalize beyond BCNF
3. Normalize to 5NF, but back off
Problems with 1-3: update anomalies, bad data,
knowledge of database storage needed
4. Don't let users at base tables
5. Create views that are in low normal forms
6. Pre-define joins that give users the data they need
Solutions 4-6 are more work, but generally worth the
effort

Copyright  1971-2002 Thomas P. Sturm Normalization Part 6, Page 31


Copyright  1971-2002 Thomas P. Sturm Normalization Part 6, Page 32

You might also like