
Lecture 17: 4th Normal Form
Prof. Sin-Min Lee
Department of Computer Science
Functional Dependencies
• Dependencies for this relation:
A → B
A → D
BC → EF
• Do they all hold in this instance of the relation R?
R
A   B   C   D   E   F
a1  b1  c1  d1  e1  f1
a1  b1  c2  d1  e2  f3
a2  b1  c2  d3  e2  f3
a3  b2  c3  d4  e3  f2
a2  b1  c3  d3  e4  f4
a4  b1  c1  d5  e1  f1
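One way to answer this mechanically is to test each FD against the instance. The Python sketch below is illustrative: the `satisfies` helper and the dict-per-tuple encoding are assumptions, not part of the lecture. It reports whether any two tuples agree on the left side but differ on the right side. An instance can only refute an FD; passing the check does not prove the FD holds for every legal instance.

```python
# Relation instance R from the slide, one dict per tuple.
COLUMNS = ("A", "B", "C", "D", "E", "F")
R = [
    dict(zip(COLUMNS, row))
    for row in [
        ("a1", "b1", "c1", "d1", "e1", "f1"),
        ("a1", "b1", "c2", "d1", "e2", "f3"),
        ("a2", "b1", "c2", "d3", "e2", "f3"),
        ("a3", "b2", "c3", "d4", "e3", "f2"),
        ("a2", "b1", "c3", "d3", "e4", "f4"),
        ("a4", "b1", "c1", "d5", "e1", "f1"),
    ]
]


def satisfies(instance, lhs, rhs):
    """True if no two tuples agree on every attribute in lhs but differ on rhs."""
    first_seen = {}  # lhs values -> rhs values of the first tuple carrying them
    for t in instance:
        key = tuple(t[a] for a in lhs)
        value = tuple(t[a] for a in rhs)
        if first_seen.setdefault(key, value) != value:
            return False
    return True


for lhs, rhs in [("A", "B"), ("A", "D"), ("BC", "EF")]:
    print(f"{lhs} -> {rhs}: {satisfies(R, lhs, rhs)}")
```

All three FDs pass on these six tuples, while a reversed FD such as B → A fails (b1 appears with a1, a2 and a4).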

• Functional dependencies are specified by the database
programmer based on the intended meaning of the
attributes.
Armstrong’s Axioms
• Armstrong’s Axioms: Let X, Y, Z be sets of attributes from a relation T.
[1] Inclusion rule: If Y ⊆ X, then X → Y.
[2] Transitivity rule: If X → Y and Y → Z, then X → Z.
[3] Augmentation rule: If X → Y, then XZ → YZ.
• Other derived rules:
[1] Union rule: If X → Y and X → Z, then X → YZ
[2] Decomposition rule: If X → YZ, then X → Y and X → Z
[3] Pseudotransitivity: If X → Y and WY → Z, then XW → Z
[4] Accumulation rule: If X → YZ and Z → BW, then X → YZB
Closure
• Let F be a set of functional dependencies.
• We say F implies a functional dependency g if g can be derived from the FDs in F using the axioms.
• The closure of F, written as $F^+$, is the set of all FDs that are implied by F.
• Example: Given FDs { A → BC, C → D, AD → B }, what is the closure?
• The closure is a potentially exponential set:
Trivial dependencies: {A → A, B → B, …, ABC → ABC, …}, other dependencies obtained by augmentation {AB → ABC, BC → BD, …}, dependencies obtained by other rules (or multiple rules) {A → BC, C → D, AD → B, A → D}
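Since X → Y is in $F^+$ exactly when Y ⊆ $X^+$, one way to get a feel for $F^+$ is to print X → ($X^+$ − X) for every non-empty X. A minimal Python sketch for the example F (the `closure` helper is an illustrative fixpoint loop; enumerating all 2ⁿ − 1 subsets is exactly why $F^+$ can be exponential):

```python
from itertools import combinations

# F from the slide; each FD is (lhs, rhs) written as attribute strings.
F = [("A", "BC"), ("C", "D"), ("AD", "B")]
ATTRS = "ABCD"


def closure(attrs, fds):
    """Attribute closure: keep firing FDs whose left side is already covered."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


# X -> Y is in F+ exactly when Y is a subset of X+, so printing
# X -> (X+ - X) for every non-empty X summarises all non-trivial FDs in F+.
for size in range(1, len(ATTRS) + 1):
    for lhs in combinations(ATTRS, size):
        extra = closure(lhs, F) - set(lhs)
        if extra:
            print("".join(lhs), "->", "".join(sorted(extra)))
```

The first line printed is `A -> BCD`, which contains the derived dependency A → D.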
Closure
• Given a set F of functional dependencies, a functional dependency X → Y is said to be entailed (implied) by F if X → Y is in $F^+$.
• Two sets of functional dependencies F1, F2 are said to be equivalent if their closures are equal, i.e. $F_1^+ = F_2^+$.
Checking Entailment
• To find whether a functional dependency X → Y is implied by a set of functional dependencies F:
– We can apply the rules of Armstrong’s Axioms to find whether we can obtain this dependency from the dependencies in F
OR
– We can determine the closure of the attribute set X, denoted by $X^+$, with respect to F, and check whether $Y \subseteq X^+$

Closure of a set of attributes
• Given a set F of functional dependencies, the closure of a set of attributes X, denoted by $X^+$, is the largest set of attributes Y such that X → Y is in $F^+$.
Algorithm Closure(X, F)
    X[0] = X; I = 0;
    repeat
        I = I + 1;
        X[I] = X[I-1];
        FOR ALL Z → W in F
            IF Z ⊆ X[I] THEN X[I] = X[I] ∪ W
        END FOR
    until X[I] = X[I-1];
    RETURN X+ = X[I]
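A direct Python transcription of Closure(X, F), assuming FDs are encoded as (lhs, rhs) pairs of attribute-name strings. The comments map each line back to the pseudocode. This is a sketch rather than an optimized implementation, since it rescans all of F on every round.

```python
def attribute_closure(x, fds):
    """Closure(X, F) as a fixpoint loop.

    x   -- iterable of attribute names, e.g. "GB"
    fds -- list of (lhs, rhs) pairs, e.g. [("C", "DE"), ("G", "A")]
    """
    current = set(x)                    # X[0] = X
    while True:
        previous = set(current)         # X[I] starts as a copy of X[I-1]
        for lhs, rhs in fds:            # FOR ALL Z -> W in F
            if set(lhs) <= current:     #   IF Z is a subset of X[I]
                current |= set(rhs)     #     X[I] = X[I] union W
        if current == previous:         # until X[I] = X[I-1]
            return current              # RETURN X+ = X[I]


F = [("A", "BC"), ("C", "D"), ("AD", "B")]
print(sorted(attribute_closure("A", F)))   # ['A', 'B', 'C', 'D']
print(sorted(attribute_closure("C", F)))   # ['C', 'D']
```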
Entailment
• Given F = { C → DE, AB → CE, EB → CF, G → A }
• Find: $GB^+$
– Initialize: $GB^+$ = {G, B}
– Use G → A, add A: $GB^+$ = {A, B, G}
– Use AB → CE, add C, E: $GB^+$ = {A, B, C, E, G}
– Use C → DE, add D: $GB^+$ = {A, B, C, D, E, G}
– Use EB → CF, add F: $GB^+$ = {A, B, C, D, E, F, G}
– Incidentally, GB is a superkey. Is it also a key?
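A key is a minimal superkey, so the question reduces to checking that no proper subset of GB is a superkey. The sketch assumes the relation's attributes are exactly A through G (the slide does not list the schema). Testing only the subsets one attribute smaller is enough, because every superset of a superkey is itself a superkey. `is_superkey` and `is_key` are illustrative helper names.

```python
F = [("C", "DE"), ("AB", "CE"), ("EB", "CF"), ("G", "A")]
R = set("ABCDEFG")   # assumed schema of the relation


def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_superkey(x, fds, attrs):
    return closure(x, fds) >= attrs


def is_key(x, fds, attrs):
    """Superkey, and dropping any single attribute loses the superkey property."""
    x = set(x)
    return is_superkey(x, fds, attrs) and not any(
        is_superkey(x - {a}, fds, attrs) for a in x
    )


print(sorted(closure("GB", F)))   # ['A', 'B', 'C', 'D', 'E', 'F', 'G']
print(is_key("GB", F, R))         # True
```

Here G⁺ = {A, G} and B⁺ = {B}, so neither proper subset is a superkey and GB is a key.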
Closure of a set of attributes
• Given a set of functional dependencies F for a relation R, X is said to be a superkey if $X^+$ contains all the attributes in R.
– In other words, X implies all other attributes.
– Alternatively, if two tuples agree on X, then they must agree on all other attributes.
Boyce-Codd Normal Form
• A table T is said to be in Boyce-Codd Normal Form (BCNF) with respect to a given set of functional dependencies F if for all functional dependencies of the form X → A entailed by F the following is true:
– If A is not a subset of X, then X is a superkey; equivalently, X contains all the attributes of some key.
• Given {AB → C, AB → D, AE → D, C → F} with key {A, B, E}:
– not in BCNF, since AB → C is non-trivial (C is not in AB) but AB is not a superkey.
Boyce-Codd Normal Form
Given head(T) = {A,B,C,D,E,F} with functional dependencies {AC → D, AC → E, AF → B, AD → F, BC → A, ABC → F} and keys {A, C}, {B, C}, is this relation T in BCNF?
No. It is sufficient to find one violation!
– AF → B violates BCNF since B is not in AF and AF is not a superkey.
– AD → F violates BCNF since F is not in AD and AD is not a superkey.
Note: ABC → F does not violate BCNF since ABC is a superkey.
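A sketch that lists every violating FD in the given set. It relies on the standard shortcut that, for the original relation, testing only the FDs in F is enough to decide BCNF (the shortcut does not carry over to relations produced by a decomposition). `bcnf_violations` is a hypothetical helper name.

```python
HEAD = set("ABCDEF")
F = [("AC", "D"), ("AC", "E"), ("AF", "B"), ("AD", "F"), ("BC", "A"), ("ABC", "F")]


def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(fds, attrs):
    """Given FDs X -> Y that are non-trivial (Y not a subset of X) and whose
    left side is not a superkey of the relation."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not set(rhs) <= set(lhs) and not closure(lhs, fds) >= attrs
    ]


print(bcnf_violations(F, HEAD))   # [('AF', 'B'), ('AD', 'F')]
```

It returns exactly the two violations named on the slide: AF⁺ = {A, B, F} and AD⁺ = {A, B, D, F}.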
Decomposition
• A decomposition of a relation R with functional dependency set F is a sequence of pairs of the form
– (R1, F1), …, (Rn, Fn) such that
– the union of the attributes in R1, …, Rn equals the attributes in R, and
– all functional dependencies in F1, …, Fn are entailed by F.
• A decomposition is obtained by a simple projection of R onto the attributes in the decomposed relations.
– For example, given R1 with schema R1(A1), then $R_1 = \pi_{A_1}(R)$

Lossless Decompositions
• A decomposition of R into (R1, F1) and (R2, F2) is said to be lossless if joining R1 and R2 on their common attributes is guaranteed to give back R.
• In other words, given R1(A1) and R2(A2), we need to make sure that it is always the case that
$R = R_1 \bowtie_{A_1 \cap A_2} R_2$
where $\bowtie_{A_1 \cap A_2}$ means equi-join on the common attributes in A1 and A2.
R
A   B   C   D
a1  b1  c1  d1
a1  b1  c2  d1
a2  b1  c2  d3
a3  b2  c3  d4
a4  b1  c1  d1

R1
A   B
a1  b1
a2  b1
a3  b2
a4  b1

R2
B   C   D
b1  c1  d1
b1  c2  d1
b1  c2  d3
b2  c3  d4

R1 join R2
A   B   C   D
a1  b1  c1  d1
a1  b1  c2  d1
a1  b1  c2  d3
a2  b1  c1  d1
a2  b1  c2  d1
a2  b1  c2  d3
a3  b2  c3  d4
a4  b1  c1  d1
a4  b1  c2  d1
a4  b1  c2  d3

{R1, R2} is not a lossless decomposition of relation R: the join with respect to R1.B = R2.B is not equal to R.
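The spurious tuples can be reproduced with set-based projection and natural join. In this Python sketch the helper names `project` and `natural_join` are illustrative; relations are sets of tuples, so duplicates disappear on projection just as in relational algebra.

```python
R_COLS = ("A", "B", "C", "D")
R = {
    ("a1", "b1", "c1", "d1"),
    ("a1", "b1", "c2", "d1"),
    ("a2", "b1", "c2", "d3"),
    ("a3", "b2", "c3", "d4"),
    ("a4", "b1", "c1", "d1"),
}


def project(rows, cols, keep):
    """pi_keep(rows): keep only the listed columns; the set drops duplicates."""
    idx = [cols.index(c) for c in keep]
    return {tuple(row[i] for i in idx) for row in rows}


def natural_join(r1, cols1, r2, cols2):
    """Equi-join on the common columns; result columns are cols1 + rest of cols2."""
    common = [c for c in cols1 if c in cols2]
    rest = [c for c in cols2 if c not in cols1]
    out = set()
    for t1 in r1:
        for t2 in r2:
            if all(t1[cols1.index(c)] == t2[cols2.index(c)] for c in common):
                out.add(t1 + tuple(t2[cols2.index(c)] for c in rest))
    return out, tuple(cols1) + tuple(rest)


R1 = project(R, R_COLS, ("A", "B"))
R2 = project(R, R_COLS, ("B", "C", "D"))
joined, cols = natural_join(R1, ("A", "B"), R2, ("B", "C", "D"))
print(len(R), len(joined))   # 5 10
print(sorted(joined - R))    # the spurious tuples
```

The join has 10 tuples against 5 in R; every original tuple is still there, and the 5 extras are the spurious ones.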
Lossless Decomposition
Given a relation R with a set F of functional dependencies, a decomposition of R into (R1, F1) and (R2, F2) with R1(A1) and R2(A2) is lossless iff either

$A_1 \cap A_2 \to A_1$ or $A_1 \cap A_2 \to A_2$

is entailed by F.
Lossless Decompositions
Let F = {AB → C, CD → E, AB → E, DE → AF}
and
R1(A,B,C,D), F1 = {AB → C}
R2(C,D,E,F), F2 = {CD → E, DE → F}

Is this lossless?
ABCD ∩ CDEF = CD
Is CD → CDEF entailed by F?
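Applying the binary lossless-join test to this example means computing (ABCD ∩ CDEF)⁺ = CD⁺ under F. A short sketch; the `lossless_binary` name is an assumption.

```python
F = [("AB", "C"), ("CD", "E"), ("AB", "E"), ("DE", "AF")]
A1, A2 = set("ABCD"), set("CDEF")


def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def lossless_binary(a1, a2, fds):
    """Binary test: the closure of the shared attributes must cover A1 or A2."""
    common_plus = closure(a1 & a2, fds)
    return a1 <= common_plus or a2 <= common_plus


print(sorted(closure(A1 & A2, F)))   # ['A', 'C', 'D', 'E', 'F']
print(lossless_binary(A1, A2, F))    # True
```

CD⁺ = {A, C, D, E, F} contains CDEF, so CD → CDEF is entailed and the decomposition is lossless.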
Dependency preservation
• A decomposition of relation (R, F) into (R1(A1), F1) and (R2(A2), F2) is said to be dependency preserving iff F1 ∪ F2 is equivalent to F.
• To check whether F1 ∪ F2 is equivalent to F, we need to check
1. whether all functional dependencies in F1 ∪ F2 are entailed by F, and
2. whether all functional dependencies in F are entailed by F1 ∪ F2.
Dependency preservation
• Given a decomposition R1(A1) of a relation R with functional dependency set F, the only dependencies that can be preserved in R1 are the dependencies in $F^+$ of the form B → C such that $B \cup C \subseteq A_1$.
Let F = {AB → C, CD → E, AB → E, DE → AF}
Find all the functional dependencies that can be
preserved in:
R1(A,B,C,D)
R2(C,D,E,F)
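One brute-force way to answer this exercise is to project $F^+$ onto each schema by computing X⁺ ∩ Ri for every X ⊆ Ri, then test whether the union of the projections still entails every FD in F. The sketch produces a (non-minimal) cover of each projection; `project_fds` is a hypothetical helper.

```python
from itertools import combinations

F = [("AB", "C"), ("CD", "E"), ("AB", "E"), ("DE", "AF")]


def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def project_fds(fds, attrs):
    """Cover of the FDs in F+ that mention only attributes in attrs:
    X -> (X+ ∩ attrs) - X for every non-empty X ⊆ attrs (not minimized)."""
    attrs = sorted(attrs)
    projected = []
    for size in range(1, len(attrs) + 1):
        for x in combinations(attrs, size):
            extra = (closure(x, fds) & set(attrs)) - set(x)
            if extra:
                projected.append(("".join(x), "".join(sorted(extra))))
    return projected


G1 = project_fds(F, "ABCD")
G2 = project_fds(F, "CDEF")
# Dependencies of F that the union of the projections no longer entails.
lost = [(lhs, rhs) for lhs, rhs in F if not set(rhs) <= closure(lhs, G1 + G2)]
print("R1:", G1)
print("R2:", G2)
print("lost:", lost)
```

R1 gets AB → C and CD → A (plus the redundant ABD → C, BCD → A); R2 gets CD → EF and DE → F (plus redundant ones). AB → E and DE → A (hence DE → AF) are not entailed by the union, so this decomposition is not dependency preserving even though it is lossless.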
Dependency preservation
• When a decomposition is lossy, the information connecting tuples is lost: joining the pieces can produce spurious tuples, i.e. erroneous information.
• When dependencies are lost in a decomposition, they cannot be enforced as table constraints. They have to be enforced as additional constraints.
• It is vital that decompositions are lossless. It is important but not vital that decompositions are dependency preserving.
Normal Forms
• If a relation is not in BCNF, then it can be decomposed by lossless decompositions into smaller relations that are in BCNF.
• BCNF decomposition:
– Until all relations are in BCNF:
• Find a dependency X → Y in R(A) that violates BCNF
• Replace R with R1(X ∪ Y) and R2((A − Y) ∪ X)
• This algorithm may cause many dependencies to be lost.
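A Python sketch of the BCNF decomposition loop. Two assumptions to flag: violations are found by brute force over subsets X of each relation (exponential, fine for classroom-sized schemas), and, as a common variant, Y is taken to be all of (X⁺ ∩ A) − X rather than only the right side of one FD. It is run here on the FDs of the 3NF conversion example.

```python
from itertools import combinations

# Example FDs taken from the 3NF conversion slide; attributes are single letters.
F = [("ABC", "D"), ("ABC", "E"), ("BD", "E"), ("E", "C"), ("E", "F")]


def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def find_violation(attrs, fds):
    """Return (X, Y): X -> Y follows from F, X and Y lie in attrs, Y is non-empty
    and disjoint from X, and X is not a superkey of attrs.  None means BCNF."""
    for size in range(1, len(attrs)):
        for x in combinations(sorted(attrs), size):
            x = set(x)
            x_plus = closure(x, fds) & attrs
            if x_plus != x and x_plus != attrs:
                return x, x_plus - x
    return None


def bcnf_decompose(attrs, fds):
    todo, done = [set(attrs)], []
    while todo:
        r = todo.pop()
        violation = find_violation(r, fds)
        if violation is None:
            done.append(r)
        else:
            x, y = violation
            todo += [x | y, r - y]      # R1(X ∪ Y) and R2((A - Y) ∪ X)
    return done


print(sorted("".join(sorted(r)) for r in bcnf_decompose("ABCDEF", F)))
```

The result is ABD, BDE, CEF. ABC → D is no longer contained in any single relation and is not implied by the FDs that are, which is the dependency loss the last bullet warns about.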
3NF Conversion
• Given R(A,B,C,D,E,F) and
F = {ABC → D, ABC → E, BD → E, E → C, E → F}
• Keys: ABC, ABD, and ABE.
• Not in BCNF (violations: BD → E, E → C, E → F).
• Not in 3NF (violation: E → F, since F is not part of any key).
• Convert to 3NF using the algorithm:
– First combine functional dependencies with a common left side, to get F = {ABC → DE, BD → E, E → CF}
– Create relations R1(A,B,C,D,E), R2(B,D,E), R3(E,C,F)
– Since R1 already contains a key (it contains ABC, ABD, and ABE), we are done.
– Note, however, that R1 is in 3NF but not in BCNF: BD → E holds in R1, yet BD is not a superkey of R1. R2 and R3 are in BCNF.
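A brute-force check of the candidate keys of R and of whether BD is a superkey of R1(A,B,C,D,E). `candidate_keys` is a hypothetical helper: it tests subsets in order of size and keeps each superkey that contains no smaller key already found.

```python
from itertools import combinations

F = [("ABC", "D"), ("ABC", "E"), ("BD", "E"), ("E", "C"), ("E", "F")]
R = set("ABCDEF")


def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(attrs, fds):
    keys = []
    for size in range(1, len(attrs) + 1):
        for x in combinations(sorted(attrs), size):
            if closure(x, fds) >= attrs and not any(set(k) <= set(x) for k in keys):
                keys.append("".join(x))
    return keys


print(candidate_keys(R, F))   # ['ABC', 'ABD', 'ABE']

# Inside R1(A,B,C,D,E): is BD a superkey of R1?
R1 = set("ABCDE")
bd_plus = closure("BD", F) & R1
print(sorted(bd_plus), bd_plus >= R1)   # ['B', 'C', 'D', 'E'] False
```

The check reports three keys, ABC, ABD, and ABE. Since BD⁺ ∩ R1 misses A, BD → E holds in R1 without BD being a superkey there; R1 is still in 3NF because E belongs to the key ABE, but it is not in BCNF.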
Multivalued Dependencies (MVDs)
• Let R be a relation schema and let $\alpha \subseteq R$ and $\beta \subseteq R$. The multivalued dependency
$\alpha \twoheadrightarrow \beta$
holds on R if in any legal relation r(R), for all pairs of tuples $t_1$ and $t_2$ in r such that $t_1[\alpha] = t_2[\alpha]$, there exist tuples $t_3$ and $t_4$ in r such that:
$t_1[\alpha] = t_2[\alpha] = t_3[\alpha] = t_4[\alpha]$
$t_3[\beta] = t_1[\beta]$
$t_3[R - \beta] = t_2[R - \beta]$
$t_4[\beta] = t_2[\beta]$
$t_4[R - \beta] = t_1[R - \beta]$
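The definition can be tested directly on an instance. In this sketch (the `holds_mvd` name and the tuple encoding are assumptions), each ordered pair t1, t2 that agrees on α must be accompanied by the t3 built from β of t1 and R − β of t2; t4 is simply t3 with the roles of t1 and t2 exchanged, so looping over ordered pairs covers it. The data is the six-tuple Stars relation used in the following slides. As with FDs, an instance can refute an MVD but not prove it.

```python
COLS = ("name", "street", "city", "title", "year")
STARS = [
    ("C. Fisher", "123 Maple Str.", "Hollywood", "Star Wars", 1977),
    ("C. Fisher", "5 Locust Ln.", "Malibu", "Star Wars", 1977),
    ("C. Fisher", "123 Maple Str.", "Hollywood", "Empire Strikes Back", 1980),
    ("C. Fisher", "5 Locust Ln.", "Malibu", "Empire Strikes Back", 1980),
    ("C. Fisher", "123 Maple Str.", "Hollywood", "Return of the Jedi", 1983),
    ("C. Fisher", "5 Locust Ln.", "Malibu", "Return of the Jedi", 1983),
]


def holds_mvd(rows, cols, alpha, beta):
    """Check alpha ->> beta on one instance (alpha, beta: lists of column names)."""
    alpha, beta = set(alpha), set(beta)
    present = set(rows)
    for t1 in rows:
        for t2 in rows:
            if all(t1[cols.index(a)] == t2[cols.index(a)] for a in alpha):
                # t3: beta columns from t1, every other column (R - beta) from t2
                t3 = tuple(t1[i] if c in beta else t2[i] for i, c in enumerate(cols))
                if t3 not in present:
                    return False
    return True


print(holds_mvd(STARS, COLS, ["name"], ["street", "city"]))       # True
print(holds_mvd(STARS[:5], COLS, ["name"], ["street", "city"]))   # False
```

Dropping a single tuple (here the last one) makes the check fail, which reflects the tuple-generating nature of MVDs.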

Motivation
• There are schemas that are in BCNF that do not
seem to be sufficiently normalized
Stars
name       street          city       title                 year
C. Fisher  123 Maple Str.  Hollywood  Star Wars             1977
C. Fisher  5 Locust Ln.    Malibu     Star Wars             1977
C. Fisher  123 Maple Str.  Hollywood  Empire Strikes Back   1980
C. Fisher  5 Locust Ln.    Malibu     Empire Strikes Back   1980
C. Fisher  123 Maple Str.  Hollywood  Return of the Jedi    1983
C. Fisher  5 Locust Ln.    Malibu     Return of the Jedi    1983
Attribute Independence
• No reason to associate address with one movie
and not another
• When we repeat address and movie facts in all
combinations, there is obvious redundancy
• However, NO BCNF violation in Stars
relation
– There are no non-trivial FD’s at all, all five attributes
form the only superkey
– Why?
Multi-valued Dependency
Definition: Multivalued dependency (MVD):
$A_1A_2\dots A_n \twoheadrightarrow B_1B_2\dots B_m$
holds for relation R if:
For all tuples t, u in R:
If $t[A_1A_2\dots A_n] = u[A_1A_2\dots A_n]$, then there exists a v in R such that:
(1) $v[A_1A_2\dots A_n] = t[A_1A_2\dots A_n] = u[A_1A_2\dots A_n]$
(2) $v[B_1B_2\dots B_m] = t[B_1B_2\dots B_m]$
(3) $v[C_1C_2\dots C_k] = u[C_1C_2\dots C_k]$, where $C_1C_2\dots C_k$ are all the attributes in R except $(A_1A_2\dots A_n B_1B_2\dots B_m)$
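Example: name ↠ street city
Stars
name       street          city       title                 year
C. Fisher  123 Maple Str.  Hollywood  Star Wars             1977
C. Fisher  123 Maple Str.  Hollywood  Empire Strikes Back   1980
C. Fisher  5 Locust Ln.    Malibu     Empire Strikes Back   1980
C. Fisher  5 Locust Ln.    Malibu     Star Wars             1977
C. Fisher  123 Maple Str.  Hollywood  Return of the Jedi    1983
C. Fisher  5 Locust Ln.    Malibu     Return of the Jedi    1983
With t = (C. Fisher, 123 Maple Str., Hollywood, Star Wars, 1977) and u = (C. Fisher, 5 Locust Ln., Malibu, Empire Strikes Back, 1980), which agree on name, the MVD requires v = (C. Fisher, 123 Maple Str., Hollywood, Empire Strikes Back, 1980): street and city from t, title and year from u.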
Example: name ↠ street city
Stars
name       street          city       title                 year
C. Fisher  123 Maple Str.  Hollywood  Star Wars             1977
C. Fisher  5 Locust Ln.    Malibu     Star Wars             1977
C. Fisher  123 Maple Str.  Hollywood  Empire Strikes Back   1980
C. Fisher  5 Locust Ln.    Malibu     Empire Strikes Back   1980
C. Fisher  123 Maple Str.  Hollywood  Return of the Jedi    1983
C. Fisher  5 Locust Ln.    Malibu     Return of the Jedi    1983
Exchanging the roles of t and u, the MVD also requires w = (C. Fisher, 5 Locust Ln., Malibu, Star Wars, 1977): street and city from u, title and year from t.
More on MVDs
• Intuitively, $A_1A_2\dots A_n \twoheadrightarrow B_1B_2\dots B_m$ says that the relationship between $A_1A_2\dots A_n$ and $B_1B_2\dots B_m$ is independent of the relationship between $A_1A_2\dots A_n$ and $R - \{B_1B_2\dots B_m\}$
– MVD's uncover situations where independent facts related to a certain
object are being squished together in one relation
• Functional dependencies rule out certain tuples from being in
a relation
– How?
• Multivalued dependencies require that other tuples of a certain
form be present in the relation
– a.k.a. tuple-generating dependencies
Let’s Illustrate
• In Stars, we must repeat the movie (title, year) once for
each address (street, city) a movie star has
– Alternatively, we must repeat the address for each movie a star has
made
• Example: Stars with name ↠ street city
name       street          city       title                 year
C. Fisher  123 Maple Str.  Hollywood  Star Wars             1977
C. Fisher  5 Locust Ln.    Malibu     Empire Strikes Back   1980
C. Fisher  123 Maple Str.  Hollywood  Return of the Jedi    1983
• This is an incomplete extent of Stars
– Infer the existence of a fourth tuple under the given MVD (repeating the argument eventually forces all six tuples of the full Stars relation)
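The inference can be mechanized by repeatedly adding the tuples the MVD definition demands until nothing new appears, a chase-style fixpoint. The `mvd_closure` name is an assumption; the starting data is the three tuples shown.

```python
COLS = ("name", "street", "city", "title", "year")
PARTIAL = {
    ("C. Fisher", "123 Maple Str.", "Hollywood", "Star Wars", 1977),
    ("C. Fisher", "5 Locust Ln.", "Malibu", "Empire Strikes Back", 1980),
    ("C. Fisher", "123 Maple Str.", "Hollywood", "Return of the Jedi", 1983),
}


def mvd_closure(rows, cols, alpha, beta):
    """Add the tuples required by alpha ->> beta until a fixpoint is reached."""
    rows = set(rows)
    while True:
        # For every pair agreeing on alpha: beta from t1, everything else from t2.
        required = {
            tuple(t1[i] if c in beta else t2[i] for i, c in enumerate(cols))
            for t1 in rows
            for t2 in rows
            if all(t1[cols.index(a)] == t2[cols.index(a)] for a in alpha)
        }
        if required <= rows:
            return rows
        rows |= required


full = mvd_closure(PARTIAL, COLS, {"name"}, {"street", "city"})
for t in sorted(full - PARTIAL):
    print(t)
```

The fixpoint adds three tuples, giving the six-tuple Stars relation.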
Trivial MVDs
• Trivial MVD
$A_1A_2\dots A_n \twoheadrightarrow B_1B_2\dots B_m$
where $B_1B_2\dots B_m$ is a subset of $A_1A_2\dots A_n$, or $(A_1A_2\dots A_n B_1B_2\dots B_m)$ contains all attributes of R

Reasoning About MVDs
• FD-IS-AN-MVD Rule (Replication)
If $A_1A_2\dots A_n \to B_1B_2\dots B_m$, then $A_1A_2\dots A_n \twoheadrightarrow B_1B_2\dots B_m$ holds

Reasoning About MVDs
• COMPLEMENTATION Rule
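If $A_1A_2\dots A_n \twoheadrightarrow B_1B_2\dots B_m$, then $A_1A_2\dots A_n \twoheadrightarrow C_1C_2\dots C_k$, where $C_1C_2\dots C_k$ are all attributes in R except $(A_1A_2\dots A_n B_1B_2\dots B_m)$
• AUGMENTATION Rule
If $X \twoheadrightarrow Y$ and $Z \subseteq W$, then $WX \twoheadrightarrow YZ$
• TRANSITIVITY Rule
If $X \twoheadrightarrow Y$ and $Y \twoheadrightarrow Z$, then $X \twoheadrightarrow (Z - Y)$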

Coalescence Rule for MVD
If: $X \twoheadrightarrow Y$, $Z \subseteq Y$, and there exists W such that $W \to Z$
Then: $X \to Z$
Remark: Y and W have to be disjoint, and Z has to be a subset of or equal to Y
Definition 4NF
• Given: relation R and set of MVD's for R
• Definition: R is in 4NF with respect to its MVDs if for every non-trivial MVD $A_1A_2\dots A_n \twoheadrightarrow B_1B_2\dots B_m$, $A_1A_2\dots A_n$ is a superkey
• Note: Since every FD is also an MVD, 4NF
implies BCNF
• Example: Stars is not in 4NF
Decomposition Algorithm
(1) apply closure to the user-specified FD's and MVD's**:
(2) repeat until no more 4NF violations:
if R with AA ->> BB violates 4NF then:
(2a) decompose R into R1(AA,BB) and
R2(AA,CC), where CC is all
attributes in R except (AA BB)
(2b) assign FD's and MVD's to the new relations**

** MVD's: hard problem!
• No simple test analogous to computing the attribute closure for FDs exists for MVDs. You are stuck with the 5 inference rules for MVDs when computing the closure!
Exercise
• Decompose Stars into a set of relations
that are in 4NF.
• name ↠ street city is a 4NF violation
• Apply decomposition:
R(name, street, city)
S(name, title, year)
• What about name ↠ street city in R and name ↠ title year in S?
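A small check of the exercise, assuming the six-tuple Stars instance: project onto R and S, rejoin on name, and test the two MVDs with the trivial-MVD condition (Y ⊆ X, or X ∪ Y covers every attribute of the relation). `is_trivial_mvd` is an illustrative helper.

```python
STARS = {
    ("C. Fisher", "123 Maple Str.", "Hollywood", "Star Wars", 1977),
    ("C. Fisher", "5 Locust Ln.", "Malibu", "Star Wars", 1977),
    ("C. Fisher", "123 Maple Str.", "Hollywood", "Empire Strikes Back", 1980),
    ("C. Fisher", "5 Locust Ln.", "Malibu", "Empire Strikes Back", 1980),
    ("C. Fisher", "123 Maple Str.", "Hollywood", "Return of the Jedi", 1983),
    ("C. Fisher", "5 Locust Ln.", "Malibu", "Return of the Jedi", 1983),
}

# Project onto R(name, street, city) and S(name, title, year).
R = {(name, street, city) for name, street, city, _, _ in STARS}
S = {(name, title, year) for name, _, _, title, year in STARS}

# Natural join on the only shared attribute, name.
joined = {
    (n1, street, city, title, year)
    for n1, street, city in R
    for n2, title, year in S
    if n1 == n2
}


def is_trivial_mvd(lhs, rhs, attrs):
    """X ->> Y is trivial if Y ⊆ X or X ∪ Y covers every attribute."""
    lhs, rhs = set(lhs), set(rhs)
    return rhs <= lhs or lhs | rhs == set(attrs)


print(len(R), len(S), joined == STARS)     # 2 3 True
print(is_trivial_mvd(["name"], ["street", "city"], ["name", "street", "city"]))  # True
print(is_trivial_mvd(["name"], ["title", "year"], ["name", "title", "year"]))    # True
```

The join gives back Stars exactly, and both MVDs are trivial in their relations, so neither causes a 4NF violation in R or S.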
MVD (Cont.)
• [Figure: tabular representation of α ↠ β]
X ↠ Y is trivial if
(a) Y ⊆ X or
(b) Y ∪ X = R
Multivalued Dependencies
• There are database schemas in BCNF that do not
seem to be sufficiently normalized
• Consider a database
classes(course, teacher, book)
such that (c, t, b) ∈ classes means that t is qualified to teach c, and b is a required textbook for c
• The database is supposed to list for each course
the set of teachers any one of which can be the
course’s instructor, and the set of books, all of
which are required for the course (no matter who
teaches it).
• There are no non-trivial functional dependencies and
therefore the relation is in BCNF
• Insertion anomalies – i.e., if Sara is a new teacher that can
teach database, two tuples need to be inserted
(database, Sara, DB Concepts)
(database, Sara, Ullman)
classes
course             teacher    book
database           Avi        DB Concepts
database           Avi        Ullman
database           Hank       DB Concepts
database           Hank       Ullman
database           Sudarshan  DB Concepts
database           Sudarshan  Ullman
operating systems  Avi        OS Concepts
operating systems  Avi        Shaw
operating systems  Jim        OS Concepts
operating systems  Jim        Shaw
Multivalued Dependencies
• Therefore, it is better to decompose classes into:
teaches
course             teacher
database           Avi
database           Hank
database           Sudarshan
operating systems  Avi
operating systems  Jim

text
course             book
database           DB Concepts
database           Ullman
operating systems  OS Concepts
operating systems  Shaw

We shall see that these two relations are in Fourth Normal Form (4NF)
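The claim that teaches and text lose nothing can be checked on this instance: rejoining them on course should give back classes exactly. The sketch also shows the insertion anomaly going away, since one tuple for Sara in teaches yields both of her classes tuples after the join. The `join` helper is illustrative.

```python
CLASSES = {
    ("database", "Avi", "DB Concepts"),
    ("database", "Avi", "Ullman"),
    ("database", "Hank", "DB Concepts"),
    ("database", "Hank", "Ullman"),
    ("database", "Sudarshan", "DB Concepts"),
    ("database", "Sudarshan", "Ullman"),
    ("operating systems", "Avi", "OS Concepts"),
    ("operating systems", "Avi", "Shaw"),
    ("operating systems", "Jim", "OS Concepts"),
    ("operating systems", "Jim", "Shaw"),
}

teaches = {(course, teacher) for course, teacher, _ in CLASSES}
text = {(course, book) for course, _, book in CLASSES}


def join(teaches, text):
    """Natural join of teaches(course, teacher) and text(course, book) on course."""
    return {(c1, t, b) for c1, t in teaches for c2, b in text if c1 == c2}


print(len(teaches), len(text), join(teaches, text) == CLASSES)   # 5 4 True

# A single new row in teaches covers every required book for Sara after the join.
with_sara = join(teaches | {("database", "Sara")}, text)
print(sorted(t for t in with_sara if t[1] == "Sara"))
```

The rejoin reproduces all 10 tuples of classes, and adding ("database", "Sara") to teaches gives 12 tuples, including the two Sara tuples that previously had to be inserted by hand.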
4th Normal Form
No non-trivial multivalued dependencies (except those whose left side is a superkey)
4th Normal Form
Note: 4th Normal Form violations occur
when a triple (or higher) concatenated key
represents a pair of double keys
4th Normal Form
Multivalued dependencies
Instructor  Book            Class
Price       Intro Comp      MIS 2003
Parker      Intro Comp      MIS 2003
Kemp        Data in Action  MIS 4533
Kemp        ORACLE Tricks   MIS 4533
Warner      Data in Action  MIS 4533
Warner      ORACLE Tricks   MIS 4533
4th Normal Form
INSTR-BOOK-COURSE(InstrID, Book, CourseID)
is decomposed into:
COURSE-BOOK(CourseID, Book)
COURSE-INSTR(CourseID, InstrID)
4NF (no non-trivial multivalued dependencies)
Independent repeating groups have been treated as a complex relationship.
Example
• Let R be a relation schema with a set of attributes that are partitioned into 3 nonempty subsets Y, Z, W.
• We say that Y ↠ Z (Y multidetermines Z) if and only if for all possible relations r(R):
if $\langle y_1, z_1, w_1 \rangle \in r$ and $\langle y_1, z_2, w_2 \rangle \in r$
then $\langle y_1, z_1, w_2 \rangle \in r$ and $\langle y_1, z_2, w_1 \rangle \in r$
• Note that since the behavior of Z and W is identical, it follows that Y ↠ Z if and only if Y ↠ W.
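A sketch of this definition for a relation stored as ⟨y, z, w⟩ triples, applied to the classes relation with Y = course, Z = teacher, W = book. The `multidetermines` and `swap_zw` helpers are assumptions. Swapping the Z and W positions turns the Y ↠ W test into the same check, which is the symmetry noted above.

```python
CLASSES = {
    ("database", "Avi", "DB Concepts"),
    ("database", "Avi", "Ullman"),
    ("database", "Hank", "DB Concepts"),
    ("database", "Hank", "Ullman"),
    ("database", "Sudarshan", "DB Concepts"),
    ("database", "Sudarshan", "Ullman"),
    ("operating systems", "Avi", "OS Concepts"),
    ("operating systems", "Avi", "Shaw"),
    ("operating systems", "Jim", "OS Concepts"),
    ("operating systems", "Jim", "Shaw"),
}


def multidetermines(r):
    """Y ->> Z on a set of (y, z, w) triples: whenever (y, z1, w1) and
    (y, z2, w2) are both in r, (y, z1, w2) must be in r as well.  The partner
    (y, z2, w1) is covered when the loop visits the pair in the other order."""
    return all(
        (y1, z1, w2) in r
        for (y1, z1, _w1) in r
        for (y2, _z2, w2) in r
        if y1 == y2
    )


def swap_zw(r):
    """Reorder each triple to (y, w, z) so that the same test checks Y ->> W."""
    return {(y, w, z) for y, z, w in r}


broken = CLASSES - {("database", "Hank", "Ullman")}
for label, rel in [("classes", CLASSES), ("broken", broken)]:
    print(label, multidetermines(rel), multidetermines(swap_zw(rel)))
```

Both tests pass on classes, and both fail once (database, Hank, Ullman) is removed.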

Theory of MVDs
• From the definition of multivalued dependency, we can derive the following rule:
– If α → β, then α ↠ β
That is, every functional dependency is also a multivalued dependency.
• The closure $D^+$ of D is the set of all functional and multivalued dependencies logically implied by D.
– We can compute $D^+$ from D, using the formal definitions of functional dependencies and multivalued dependencies.
– We can manage with such reasoning for very simple multivalued dependencies, which seem to be most common in practice.
– For complex dependencies, it is better to reason about sets of dependencies using a system of inference rules.
Fourth Normal Form
• A relation schema R is in 4NF with respect to a set D of functional and multivalued dependencies if for all multivalued dependencies in $D^+$ of the form $\alpha \twoheadrightarrow \beta$, where $\alpha \subseteq R$ and $\beta \subseteq R$, at least one of the following holds:
– $\alpha \twoheadrightarrow \beta$ is trivial (i.e., $\beta \subseteq \alpha$ or $\alpha \cup \beta = R$)
– α is a superkey for schema R
• If a relation is in 4NF, it is in BCNF
Restriction of Multivalued
Dependencies
• The restriction of D to $R_i$ is the set $D_i$ consisting of
– All functional dependencies in $D^+$ that include only attributes of $R_i$
– All multivalued dependencies of the form $\alpha \twoheadrightarrow (\beta \cap R_i)$ where $\alpha \subseteq R_i$ and $\alpha \twoheadrightarrow \beta$ is in $D^+$

4NF Decomposition Algorithm
result := {R};
done := false;
compute D+;
Let Di denote the restriction of D+ to Ri
while (not done)
    if (there is a schema Ri in result that is not in 4NF) then
    begin
        let α ↠ β be a nontrivial multivalued dependency that holds
        on Ri such that α → Ri is not in Di, and α ∩ β = ∅;
        result := (result − Ri) ∪ (Ri − β) ∪ (α, β);
    end
    else done := true;
Note: each Ri is in 4NF, and the decomposition is lossless-join
Example
• R = (A, B, C, G, H, I)
F = { A ↠ B, B ↠ HI, CG ↠ H }
• R is not in 4NF since A ↠ B and A is not a superkey for R
• Decomposition
a) R1 = (A, B) (R1 is in 4NF)
b) R2 = (A, C, G, H, I) (R2 is not in 4NF)
c) R3 = (C, G, H) (R3 is in 4NF)
d) R4 = (A, C, G, I) (R4 is not in 4NF)
• Since A ↠ B and B ↠ HI, transitivity gives A ↠ HI, whose restriction to R4 is A ↠ I
e) R5 = (A, I) (R5 is in 4NF)
f) R6 = (A, C, G) (R6 is in 4NF)
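A replay of this example in Python, under loud assumptions. Instead of computing D⁺, the sketch uses only the three listed MVDs plus A ↠ HI (derived on the slide by MVD transitivity). There are no FDs, so α is a superkey of Ri only when it contains all of Ri. Each MVD is restricted to Ri as α ↠ (β ∩ Ri). This is not a general 4NF algorithm, because it performs no further inference.

```python
from collections import deque

# Hypothetical input: the listed MVDs plus A ->> HI, which the slide derives
# by transitivity from A ->> B and B ->> HI.  No other inference is attempted.
MVDS = [("A", "B"), ("B", "HI"), ("CG", "H"), ("A", "HI")]


def violation(ri):
    """First MVD whose restriction to ri is non-trivial; None if none is."""
    for alpha, beta in MVDS:
        a = set(alpha)
        if not a <= ri:
            continue
        b = (set(beta) & ri) - a          # restriction alpha ->> (beta ∩ Ri)
        # Non-empty b also means alpha is not all of ri (the only superkey
        # possible without FDs); a | b != ri means the MVD is non-trivial.
        if b and a | b != ri:
            return a, b
    return None


def decompose_4nf(r):
    queue, result = deque([set(r)]), []
    while queue:
        ri = queue.popleft()
        v = violation(ri)
        if v is None:
            result.append("".join(sorted(ri)))
        else:
            a, b = v
            queue.extend([a | b, ri - b])   # (alpha, beta) and (Ri - beta)
    return result


print(decompose_4nf("ABCGHI"))
```

With the MVDs tried in the listed order, the result is AB, CGH, AI, ACG, matching R1, R3, R5, and R6.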