Normalization
• Part 1: A Simple Example
• Part 2: Another Example & The
Formal Stuff
A Problem: Keeping Track of Invoices (cont’d)
Fig. 9.1
Could store the invoices in an Excel file but, as seen, this causes problems when
asking complex questions of the data:
1. How many 4” bolts did Frankenstein Parts order in 2002?
2. What items were sold on a certain date?
Solution: NF1 Cont’d
Fig. 9.2
But we were trying to reduce & simplify, and have now introduced more data!
No matter, this will be addressed later (with NF3).
Solution: NF1 Cont’d
• The underlying structure of the orders table can be represented as Fig. 9.4.
• Identify the columns that make up the primary key with the PK notation.
• Fig. 9.4 begins the Entity Relationship Diagram (or ERD).
• DB schema now satisfies the 2 requirements of NF1: atomicity
& uniqueness. Thus it meets the most basic criteria of a relational db.
Fig. 9.4
orders: order_id(PK), order_date, customer_id, customer_name, customer_address, customer_city, customer_state, item_id(PK), item_description, item_qty, item_price, item_total_price, order_total_price
Solution: NF2
• Second Normal Form (NF2):
No Partial Dependencies on a Concatenated Key
• Next have to test each table for partial dependencies on a
concatenated key
• Means that for a table with a concatenated primary key, each
column that is not part of the primary key must depend upon the
entire concatenated key for its existence.
• If a column depends upon only 1 part of the concatenated key,
then entire table has failed NF2 & must create another table to fix
it.
• For each column must ask the question:
– Can this column exist without one or the other part of the
concatenated primary key?
– If answer is “yes” – even once – table fails NF2
Solution: NF2 Cont’d
• Refer to Fig. 9.4 again to recall orders table structure.
• Recall the meaning of the two columns in the primary key:
– order_id identifies the invoice this item comes from.
– item_id is the inventory item's unique identifier.
Solution: NF2 Cont’d
• item_description is the next column that is not itself part of the PK. It is
the plain-language description of the inventory item.
– relies on item_id, but can it exist without an order_id?
– Yes! An inventory item (& its "description") could sit on a shelf and
never be purchased... It can exist independent of an order.
– item_description fails the test. ✗
• item_qty is the no. of items purchased on a particular
invoice.
– can it exist without an item_id? No: can't have an "amount of
nothing".
– can it exist without an order_id? No: a quantity purchased with
an invoice is meaningless without an invoice.
– So this column does not violate NF2.
– item_qty depends on both parts of our concatenated PK. ✓
Solution: NF2 Cont’d
Fig. 9.5
orders: order_id(PK), order_date, customer_id, customer_name, customer_address, customer_city, customer_state
order_items: order_id(PK), item_id(PK), item_description, item_qty, item_price
Solution: NF2 Cont’d
• Things to notice about Fig. 9.5:
1. Have brought a copy of order_id to the order_items table to
allow each order_item to "remember" which order it is a part
of.
2. orders table has fewer rows than before & no longer has a
concatenated PK. PK consists of a single column, order_id.
3. order_items table does have a concatenated primary key.
• The crow's feet in Fig. 9.5 mean:
– each order can be associated with any number of order-items, but at
least one;
– each order-item is associated with one order, and only one.
Solution: NF2 Phase II Cont’d
Fig. 9.6
order_items: order_id(PK), item_id(PK), item_description ✗, item_qty ✓, item_price ✗
• On first pass through the NF2 test, lost all fields relying on item_id & put
them into a new table. This time, only taking the fields failing the test:
i.e. item_qty stays. What's different this time?
• First pass, removed the item_id key from orders altogether because of
the 1:M relationship between orders & order-items.
– Therefore the item_qty field had to follow item_id into the new table.
• Second pass, item_id wasn't taken from the order-items table because of
the M:1 relationship between order-items & items.
– Therefore, since item_qty does not violate NF2 this time, it is
permitted to stay in the table with the two PK parts that it relies on.
Fig. 9.7
orders: order_id(PK), order_date, customer_id, customer_name, customer_address, customer_city, customer_state
order_items: order_id(PK), item_id(PK), item_qty
items: item_id(PK), item_description, item_price
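The three tables of Fig. 9.7 can be sketched with Python's built-in sqlite3 module. This is only an illustration: the column types and the sample rows are assumptions, not part of the slides. It shows the concatenated key on order_items, and that an item can now exist without any order.

```python
import sqlite3

# In-memory database. Table and column names follow Fig. 9.7;
# the column types and sample rows are assumptions for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (
    order_id         INTEGER PRIMARY KEY,
    order_date       TEXT,
    customer_id      INTEGER,
    customer_name    TEXT,
    customer_address TEXT,
    customer_city    TEXT,
    customer_state   TEXT
);
CREATE TABLE items (
    item_id          INTEGER PRIMARY KEY,
    item_description TEXT,
    item_price       REAL
);
CREATE TABLE order_items (
    order_id INTEGER,
    item_id  INTEGER,
    item_qty INTEGER,
    PRIMARY KEY (order_id, item_id)   -- concatenated key
);
""")

# An item can sit on the shelf without ever being ordered.
conn.execute("INSERT INTO items VALUES (1, '4-inch bolt', 0.25)")
conn.execute("INSERT INTO items VALUES (2, 'hex nut', 0.10)")
conn.execute("INSERT INTO orders (order_id, order_date) VALUES (100, '2002-03-01')")
conn.execute("INSERT INTO order_items VALUES (100, 1, 40)")

# The concatenated key rejects a second row for the same (order, item) pair.
try:
    conn.execute("INSERT INTO order_items VALUES (100, 1, 5)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Items that appear on no order at all.
never_ordered = conn.execute("""
    SELECT item_description FROM items
    WHERE item_id NOT IN (SELECT item_id FROM order_items)
""").fetchall()
```

Here item_description lives only in items, so it no longer needs an order_id to exist, while item_qty stays with both key parts it relies on.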
Solution: NF3
• Third Normal Form (NF3):
No Dependencies on Non-Key Attributes
• Can now return to the problem of repeating customer info. As the db stands, if a
customer places >1 order, have to input the customer's contact info
again, because there are columns in orders that rely on "non-key
attributes".
• To understand this, consider order_date. Can it exist independent
of order_id?
– No!: an "order date" is meaningless without an order.
– order_date depends on a key attribute (order_id is "key attribute"
because it is table’s PK).
• What about customer_name — can it exist on its own, outside of
the orders table?
– Yes. It is meaningful to talk about a customer name without referring
to an order or invoice.
Fig. 9.8
customers
customer_id(PK)
customer_name
customer_address
customer_city
customer_state
Solution: NF3 Cont’d
• Restore the relationship by creating a foreign key (indicated by (FK))
in orders
– Recall that a FK is a column that points to the PK in another table.
– Fig. 9.9 describes this relationship, and shows our completed ERD.
• Relationship between orders & customers may be expressed in
this way:
– each order is made by one, and only one, customer;
– each customer can make any number of orders, including zero.
Fig. 9.9
orders: order_id(PK), customer_id(FK), order_date
order_items: order_id(PK), item_id(PK), item_qty
items: item_id(PK), item_description, item_price
customers: customer_id(PK), customer_name, customer_address, customer_city, customer_state
Fig. 9.10
customers
customer_id(PK)
customer_name
customer_address
customer_city
customer_state
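A minimal sketch of the orders/customers relationship, again using sqlite3. The column types, the second customer name and the order rows are assumptions for illustration. Note that SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE customers (
    customer_id      INTEGER PRIMARY KEY,
    customer_name    TEXT,
    customer_address TEXT,
    customer_city    TEXT,
    customer_state   TEXT
);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(customer_id),  -- FK
    order_date  TEXT
);
""")

# Contact info is entered once per customer (sample rows are hypothetical).
conn.execute("INSERT INTO customers (customer_id, customer_name) VALUES (1, 'Frankenstein Parts')")
conn.execute("INSERT INTO customers (customer_id, customer_name) VALUES (2, 'Example Supplies')")
conn.execute("INSERT INTO orders VALUES (100, 1, '2002-03-01')")
conn.execute("INSERT INTO orders VALUES (101, 1, '2002-07-15')")

# Each order must be made by one existing customer.
try:
    conn.execute("INSERT INTO orders VALUES (102, 99, '2002-08-01')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# Each customer can make any number of orders, including zero.
orders_per_customer = conn.execute("""
    SELECT c.customer_name, COUNT(o.order_id)
    FROM customers c
    LEFT JOIN orders o ON o.customer_id = c.customer_id
    GROUP BY c.customer_id
    ORDER BY c.customer_id
""").fetchall()
```

The second order for customer 1 needed only the customer_id, not the name and address again, which is the repetition NF3 was meant to remove.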
Normalisation cont’d
• Normalisation theory is a practical application
Sample data from the slide figure (three tables; column headings not recoverable):
S1 Smith 20 Paris
S2 Jones 10 Paris
S3 Blake 30 Rome

P1 Nut Red 12 London
P2 Bolt Green 17 Paris
P3 Screw Blue 27 Rome
P4 Screw Red 14 London

S1 P1 300
S1 P2 200
S1 P3 400
S2 P1 300
S2 P2 400
Intro to Database Design Cont’d
• Normalisation theory is built around normal forms - each normal
form has a set of satisfiable criteria.
• Normal forms exist in a hierarchy:
– 1NF -> 2NF -> 3NF -> BCNF -> 4NF -> PJ/NF (5NF)
• Codd defined 1NF, 2NF, 3NF in 1972.
• 3NF had inadequacies so revised in ‘74 by Boyce/Codd (BCNF).
• 1977 Fagin defined 4NF, 1979 defined 5NF.
• 6NF, 7NF...? Dependency theory suggests there may be higher
NFs, but they are not practicable in a database environment.
• DB designers should aim for higher NFs, but this is not law: it is just
recommended, as normalisation simply provides guidelines for
database design.
• There are often good reasons for not using normalisation theory.
Introduction to Database Design Cont’d
• There is no requirement in the definition of functional
dependence that R.X be a candidate key, thus:
R.X -> R.Y iff whenever 2 tuples of R have the same X value, they
also have the same Y value.
– R.Y is fully functionally dependent on R.X ….
– …. iff it is functionally dependent on R.X & not functionally
dependent on any proper subset of R.X
– Example:
S.(S#,STATUS) -> S.CITY is true but is not a full functional
dependence, as S.S# -> S.CITY
– If R.X -> R.Y but not fully, then R.X must be composite
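The two definitions above can be checked mechanically against sample rows. The sketch below is an illustration only: the function names are hypothetical and the three S rows are assumed sample values. A check on sample data can refute a dependency, but it cannot prove that the dependency holds for every legal state of the table.

```python
from itertools import combinations

def fd_holds(rows, lhs, rhs):
    """True if rows that agree on every lhs attribute also agree on every rhs attribute."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        # Remember the first rhs value seen for this lhs value; any later mismatch breaks the FD.
        if seen.setdefault(key, val) != val:
            return False
    return True

def fully_dependent(rows, lhs, rhs):
    """True if lhs -> rhs holds and no non-empty proper subset of lhs determines rhs."""
    if not fd_holds(rows, lhs, rhs):
        return False
    for size in range(1, len(lhs)):
        for subset in combinations(lhs, size):
            if fd_holds(rows, subset, rhs):
                return False
    return True

# Assumed sample rows for table S.
S = [
    {"S#": "S1", "STATUS": 20, "CITY": "Paris"},
    {"S#": "S2", "STATUS": 10, "CITY": "Paris"},
    {"S#": "S3", "STATUS": 30, "CITY": "Rome"},
]

holds = fd_holds(S, ["S#", "STATUS"], ["CITY"])        # the FD is true...
full = fully_dependent(S, ["S#", "STATUS"], ["CITY"])  # ...but not full, since S# alone suffices
single = fully_dependent(S, ["S#"], ["CITY"])          # a single-attribute determinant is trivially full
city_to_status = fd_holds(S, ["CITY"], ["STATUS"])     # Paris maps to both 20 and 10
```

With these rows, S.(S#,STATUS) -> S.CITY holds but is not full, matching the example in the slide.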
Normalisation: Example 2
• Given the report in Fig 9.11, need to put it in a tidy DB.
• Problems with current form:
– PROJ_NUM is supposed to be PK or part of PK but contains nulls.
Maybe PROJ_NUM+EMP_NUM will define each row.
– The table entries contain inconsistencies (e.g. JOB_CLASS
“Elect. Engineer” could be “EE” or “E. Eng” or others)
Fig. 9.11
Normalisation: Example 2 Cont’d
• Further problems with current form:
– The table has data redundancies leading to the following
anomalies:
1. Update Anomalies: Modifying (e.g.) JOB_CLASS for Employee 105 requires
lots of alterations (one for each row containing employee 105).
2. Insertion Anomalies: To complete a row definition, a new employee must
be given a project; if not yet assigned, this must be assumed to complete
the employee tuple.
3. Deletion Anomalies: If employee 103 quits, every row with EMP_NUM=103
must be deleted with the potential loss of other data.
– Inefficiency: If a large number of new employees are hired, a
lot of redundant/unassigned data must be assumed and input.
– Integrity: Possible data integrity problems may arise out of the
above.
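The update and deletion anomalies can be made concrete with a few flat report rows. This is a small sketch: the rows and helper names below are hypothetical, and only the column names, employee numbers and JOB_CLASS spellings come from the slides.

```python
# Flat report rows (hypothetical values), one row per project/employee pair.
report = [
    {"PROJ_NUM": 1, "EMP_NUM": 105, "JOB_CLASS": "Elect. Engineer"},
    {"PROJ_NUM": 2, "EMP_NUM": 105, "JOB_CLASS": "Elect. Engineer"},
    {"PROJ_NUM": 3, "EMP_NUM": 103, "JOB_CLASS": "Programmer"},
]

def job_classes(rows, emp_num):
    """All JOB_CLASS spellings recorded for one employee."""
    return {r["JOB_CLASS"] for r in rows if r["EMP_NUM"] == emp_num}

def projects(rows):
    """All project numbers that appear in the rows."""
    return {r["PROJ_NUM"] for r in rows}

# Update anomaly: changing only one of employee 105's rows
# leaves two different job classes on file for the same person.
report[0]["JOB_CLASS"] = "EE"
after_partial_update = job_classes(report, 105)

# Deletion anomaly: deleting every row for employee 103
# also deletes the only record that project 3 exists.
remaining = [r for r in report if r["EMP_NUM"] != 103]
lost_projects = projects(report) - projects(remaining)
```

Both problems come from storing the same fact in several rows, which is what the later conversion steps remove.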
Fig. 9.12
Example 2: Conversion to NF1 Cont’d
• Step 2. Identify the Primary Key
– Layout in Fig. 9.12 is only a cosmetic change – need a PK to
uniquely identify all tuples.
– This may be seen to be PROJ_NUM+EMP_NUM
• Step 3. Identify all dependencies
– The identification of the PK means we already have the following:
PROJ_NUM, EMP_NUM -> PROJ_NAME, EMP_NAME, JOB_CLASS, CHG_HOUR, HOURS
Fig. 8.12
Example 2: Conversion to NF1 Cont’d
• Looking at Fig. 9.13, can see that:
1. PK attributes are bold, underlined and a different colour.
2. Arrows above (blue) denote desirable FDs (those based on PK)
3. Arrows below the diagram (red and green) are less desirable:
a) Partial Dependencies: dependencies based on part of composite PK
– Need only know PROJ_NUM to know PROJ_NAME, so PROJ_NAME is only
dependent on part of the PK.
– Need only know EMP_NUM to find the EMP_NAME, JOB_CLASS,
CHG_HOUR.
b) Transitive Dependencies: Dependency of 1 non-prime attribute on another
– From Fig. 9.13, can see that CHG_HOUR is dependent on JOB_CLASS
– Neither of these is part of the PK (i.e. neither is a prime attribute).
Fig. 9.13: dependency diagram over PROJ_NUM, PROJ_NAME, EMP_NUM, EMP_NAME, JOB_CLASS, CHG_HOUR, HOURS, with arrows labelled Normal, Partial and Transitive.
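The three kinds of arrow can be told apart by comparing each determinant with the PK. The sketch below is a simplification under stated assumptions: the dependency list is read off the slide text, the function name is hypothetical, and any determinant made up only of non-prime attributes is treated as transitive.

```python
KEY = frozenset({"PROJ_NUM", "EMP_NUM"})

# (determinant, dependent attributes), as read off the slide text.
DEPENDENCIES = [
    (frozenset({"PROJ_NUM", "EMP_NUM"}),
     ["PROJ_NAME", "EMP_NAME", "JOB_CLASS", "CHG_HOUR", "HOURS"]),
    (frozenset({"PROJ_NUM"}), ["PROJ_NAME"]),
    (frozenset({"EMP_NUM"}), ["EMP_NAME", "JOB_CLASS", "CHG_HOUR"]),
    (frozenset({"JOB_CLASS"}), ["CHG_HOUR"]),
]

def classify(determinant, key=KEY):
    """Label a dependency by how its determinant relates to the primary key."""
    if determinant == key:
        return "normal"        # based on the whole PK
    if determinant < key:      # proper subset of the PK
        return "partial"
    if determinant.isdisjoint(key):
        return "transitive"    # determinant is itself non-prime
    return "unclassified"

labels = [classify(det) for det, _ in DEPENDENCIES]
```

The partial dependencies are the ones removed in the conversion to NF2, and the transitive one is removed in the conversion to NF3.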
Example 2: Conversion to NF2
• Step 1. Identify all key components:
PROJ_NUM
EMP_NUM
PROJ_NUM, EMP_NUM
– Each component becomes the key of a new table.
– Three new tables project, employee, assign
• Step 2. Identify the dependent attributes
– Use Fig. 9.13 to determine which attributes are dependent on
which others, using the arrows in the dependency diagram
project(PROJ_NUM, PROJ_NAME)
employee(EMP_NUM, EMP_NAME, JOB_CLASS, CHG_HOUR)
assign(PROJ_NUM, EMP_NUM, ASSIGN_HOURS)
– Results are shown in Fig. 9.14
Fig. 9.14: project, employee, assign
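The split into project, employee and assign amounts to projecting the flat table onto each group of columns and dropping duplicate tuples. A minimal sketch, assuming hypothetical sample rows and a hypothetical helper name:

```python
def project_rows(rows, columns):
    """Project rows onto the given columns, dropping duplicates but keeping first-seen order."""
    return list(dict.fromkeys(tuple(row[c] for c in columns) for row in rows))

# Flat NF1 rows keyed by PROJ_NUM + EMP_NUM (values are hypothetical).
flat = [
    {"PROJ_NUM": 1, "PROJ_NAME": "Alpha", "EMP_NUM": 103, "EMP_NAME": "Ann",
     "JOB_CLASS": "Elect. Engineer", "CHG_HOUR": 80.0, "HOURS": 10.0},
    {"PROJ_NUM": 1, "PROJ_NAME": "Alpha", "EMP_NUM": 105, "EMP_NAME": "Bob",
     "JOB_CLASS": "Programmer", "CHG_HOUR": 40.0, "HOURS": 5.0},
    {"PROJ_NUM": 2, "PROJ_NAME": "Beta", "EMP_NUM": 105, "EMP_NAME": "Bob",
     "JOB_CLASS": "Programmer", "CHG_HOUR": 40.0, "HOURS": 7.5},
]

# One table per key component; HOURS is stored as ASSIGN_HOURS in assign.
project = project_rows(flat, ["PROJ_NUM", "PROJ_NAME"])
employee = project_rows(flat, ["EMP_NUM", "EMP_NAME", "JOB_CLASS", "CHG_HOUR"])
assign = project_rows(flat, ["PROJ_NUM", "EMP_NUM", "HOURS"])
```

Project "Alpha" and employee "Bob" each appeared twice in the flat rows but only once in their own tables, so the partial dependencies no longer cause repetition.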
Example 2: Conversion to NF3
• Step 1. Identify each new determinant
– For each transitive dependency, write its determinant as a PK for
a new table (recall: determinant is any attribute whose value
determines other values within a row).
– If have 3 transitive dependencies, have 3 different determinants
– Here only have one: JOB_CLASS
• Step 2. Identify the dependent attributes
– Identify the attributes dependent on each determinant identified
in Step 1. Here, have
JOB_CLASS -> CHG_HOUR
– Name the table to reflect its contents & function, here JOB is ok
• Step 3. Remove dependent attributes from transitive
dependencies
– In each table that has a transitive dependency, remove the
attributes that depend on the new determinant
– JOB_CLASS remains in the employee table as FK
• This gives 4 tables:
project(PROJ_NUM, PROJ_NAME)
assign(EMP_NUM, PROJ_NUM, ASSIGN_HOURS)
employee(EMP_NUM, EMP_NAME, JOB_CLASS)
job(JOB_CLASS, CHG_HOUR)
• A table is in NF3 iff
– It is in NF2
And
– It contains no transitive dependencies.
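A sketch of the four NF3 tables in sqlite3, to show what the split buys. The column types, the sample rows and the charge query are assumptions for illustration and do not come from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE job (
    JOB_CLASS TEXT PRIMARY KEY,
    CHG_HOUR  REAL
);
CREATE TABLE employee (
    EMP_NUM   INTEGER PRIMARY KEY,
    EMP_NAME  TEXT,
    JOB_CLASS TEXT REFERENCES job(JOB_CLASS)      -- FK kept in employee
);
CREATE TABLE project (
    PROJ_NUM  INTEGER PRIMARY KEY,
    PROJ_NAME TEXT
);
CREATE TABLE assign (
    EMP_NUM      INTEGER REFERENCES employee(EMP_NUM),
    PROJ_NUM     INTEGER REFERENCES project(PROJ_NUM),
    ASSIGN_HOURS REAL,
    PRIMARY KEY (EMP_NUM, PROJ_NUM)
);
""")

# Hypothetical sample rows; parents are inserted before the rows that reference them.
conn.executemany("INSERT INTO job VALUES (?, ?)",
                 [("Programmer", 40.0), ("Elect. Engineer", 80.0)])
conn.executemany("INSERT INTO employee VALUES (?, ?, ?)",
                 [(103, "Ann", "Elect. Engineer"), (105, "Bob", "Programmer"),
                  (106, "Cy", "Programmer")])
conn.execute("INSERT INTO project VALUES (1, 'Alpha')")
conn.executemany("INSERT INTO assign VALUES (?, ?, ?)",
                 [(105, 1, 5.0), (106, 1, 2.0)])

# The hourly charge is stored once per job class, so one UPDATE reaches every programmer.
conn.execute("UPDATE job SET CHG_HOUR = 45.0 WHERE JOB_CLASS = 'Programmer'")

charges = conn.execute("""
    SELECT e.EMP_NAME, a.ASSIGN_HOURS * j.CHG_HOUR
    FROM assign a
    JOIN employee e ON e.EMP_NUM = a.EMP_NUM
    JOIN job j ON j.JOB_CLASS = e.JOB_CLASS
    ORDER BY e.EMP_NUM
""").fetchall()
```

Because CHG_HOUR depends only on JOB_CLASS and now lives in job, the update anomaly described for the original report cannot occur for this column.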