
# Lecture #7-2

Normalization

## CPSC 608: Distributed Database System

February 17, 2000

Hoh In
Texas A&M University
Need for Normalization
• Repetition anomaly
– Certain information may be repeated unnecessarily
• Update anomaly
– Every repeated copy of the data must be updated; missing one leaves the data inconsistent
• Insertion anomaly
– Some facts cannot be inserted without also supplying values for other, unrelated attributes
• Deletion anomaly
– Deleting some information causes other information to be lost as well
Definitions
• Attributes: e.g., ss#, name, id
• Domain: a set of potential values for an attribute (e.g., 11, 12 for ss#)

• Key: a minimal nonempty subset of a relation's attributes that uniquely identifies each tuple (e.g., ss#)
• Superkey: any superset of a key (e.g., {ss#, name})
• Candidate keys: when a relation has more than one key, each is a candidate key
– (e.g., id or ss#)
• Primary key: the candidate key selected to identify tuples
• Prime (primary) attributes: attributes that belong to some candidate key
• Degree: # of attributes
• Cardinality: # of tuples
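As a rough sketch (not part of the lecture), the key definitions can be tested against a single relation instance. The tuple values and the helper names `is_superkey` and `candidate_keys` are hypothetical. Note that an instance can only show that a set is *not* a key; whether a set is a key is really a property of the schema and its dependencies.

```python
from itertools import combinations

# Hypothetical instance over the slide's attributes (ss#, name, id);
# the values are invented for illustration only.
ATTRS = ("ss#", "name", "id")
ROWS = [
    {"ss#": 11, "name": "Kim", "id": "a1"},
    {"ss#": 12, "name": "Lee", "id": "a2"},
    {"ss#": 13, "name": "Kim", "id": "a3"},
]


def is_superkey(attrs, rows):
    """True if no two rows agree on every attribute in attrs."""
    seen = set()
    for row in rows:
        value = tuple(row[a] for a in attrs)
        if value in seen:
            return False
        seen.add(value)
    return True


def candidate_keys(all_attrs, rows):
    """Minimal nonempty attribute sets that uniquely identify rows in this instance."""
    keys = []
    for size in range(1, len(all_attrs) + 1):
        for combo in combinations(all_attrs, size):
            # A superset of a key already found is a superkey, not a key.
            if any(set(k) <= set(combo) for k in keys):
                continue
            if is_superkey(combo, rows):
                keys.append(combo)
    return keys


print(candidate_keys(ATTRS, ROWS))         # [('ss#',), ('id',)]
print(is_superkey(("ss#", "name"), ROWS))  # True, but not minimal
```

Here name alone is not a key because "Kim" repeats, so {ss#, name} is a superkey but not a candidate key.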
Functional Dependency (FD)
• Definition: X-> Y
– “X functionally determines Y”
– “Y is functionally dependent on X”
– for each value of X in R, there is exactly one associated value of Y
– where A = {A1, A2, …, An} is the set of attributes of R, X ⊆ A, Y ⊆ A

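X->Y:

| X | Y |
|---|---|
| a | 2 |
| b | 1 |
| a | 2 |

Not (X->Y), since X = a is associated with both 2 and 4:

| X | Y |
|---|---|
| a | 2 |
| b | 1 |
| a | 4 |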
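The FD definition can be checked mechanically on an instance: remember the first Y value seen for each X value and fail on any conflict. This is a minimal sketch; `fd_holds` is a hypothetical helper name, and the two tables above are encoded as lists of dicts.

```python
def fd_holds(rows, x, y):
    """Check whether X -> Y holds in a relation instance (a list of dicts)."""
    seen = {}  # maps each X value to the first Y value observed with it
    for row in rows:
        x_val = tuple(row[a] for a in x)
        y_val = tuple(row[a] for a in y)
        if seen.setdefault(x_val, y_val) != y_val:
            return False  # same X value, different Y value
    return True


left = [{"X": "a", "Y": 2}, {"X": "b", "Y": 1}, {"X": "a", "Y": 2}]
right = [{"X": "a", "Y": 2}, {"X": "b", "Y": 1}, {"X": "a", "Y": 4}]

print(fd_holds(left, ["X"], ["Y"]))   # True
print(fd_holds(right, ["X"], ["Y"]))  # False
```

As with keys, an instance can refute an FD but cannot prove that it holds for every legal state of R.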

1 NF
• No duplicated tuples in a table

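Not 1NF (the tuple (1, 3, a) appears twice):

| X | Y | Z |
|---|---|---|
| 1 | 3 | a |
| 3 | 1 | b |
| 2 | 2 | c |
| 1 | 3 | a |

1NF:

| X | Y | Z |
|---|---|---|
| 1 | 3 | a |
| 3 | 1 | b |
| 2 | 2 | c |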
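Under the slide's definition (no duplicate tuples), detecting and removing repeats is a simple counting task. This sketch uses the hypothetical helpers `duplicate_tuples` and `drop_duplicates` on the "Not 1NF" table.

```python
from collections import Counter


def duplicate_tuples(rows):
    """Tuples that occur more than once in a table (a list of tuples)."""
    return [t for t, n in Counter(rows).items() if n > 1]


def drop_duplicates(rows):
    """Keep the first occurrence of each tuple, preserving order."""
    return list(dict.fromkeys(rows))


NOT_1NF = [(1, 3, "a"), (3, 1, "b"), (2, 2, "c"), (1, 3, "a")]

print(duplicate_tuples(NOT_1NF))  # [(1, 3, 'a')]
print(drop_duplicates(NOT_1NF))   # [(1, 3, 'a'), (3, 1, 'b'), (2, 2, 'c')]
```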

3 NF
• For each FD X->Y, one of the following must hold:
– Rule 1: Y is in X (Y ⊆ X), i.e., the FD is trivial, or
– Rule 2: Y is not in X (Y ⊄ X), and
• either X is a superkey of R, or
• Y is prime (i.e., each attribute in Y-X is contained in a candidate key for R)
• Examples
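– EMP relation
• EMP(ENO, ENAME, TITLE, SAL, PNO, RESP, DUR) is not in 3NF because of the FD TITLE -> SAL: TITLE is not a superkey, and SAL is not a prime attribute
– Decomposition into 3NF
• EMP(ENO, ENAME, TITLE)
• PAY(TITLE, SAL)
• ASG(ENO, PNO, RESP, DUR)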
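A sketch of a 3NF test using attribute closure: compute the candidate keys by brute force, collect the prime attributes, and report every FD that satisfies neither Rule 1 nor Rule 2. Only TITLE -> SAL comes from the slide; the FDs ENO -> ENAME TITLE SAL and (ENO, PNO) -> RESP DUR are assumptions added so that EMP has a key. The helper names are hypothetical.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under FDs given as (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(schema, fds):
    """Minimal attribute sets whose closure is the whole schema (brute force)."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            attrs = set(combo)
            if any(k <= attrs for k in keys):
                continue  # supersets of a key are not minimal
            if closure(attrs, fds) == schema:
                keys.append(attrs)
    return keys


def violations_3nf(schema, fds):
    """FDs that satisfy neither Rule 1 nor Rule 2 of the 3NF definition."""
    prime = set().union(*candidate_keys(schema, fds))
    bad = []
    for lhs, rhs in fds:
        if rhs <= lhs:
            continue  # Rule 1: trivial FD
        if closure(lhs, fds) == schema:
            continue  # Rule 2: X is a superkey
        if rhs - lhs <= prime:
            continue  # Rule 2: every attribute of Y - X is prime
        bad.append((lhs, rhs))
    return bad


EMP = {"ENO", "ENAME", "TITLE", "SAL", "PNO", "RESP", "DUR"}
EMP_FDS = [
    ({"ENO"}, {"ENAME", "TITLE", "SAL"}),  # assumed
    ({"TITLE"}, {"SAL"}),                   # from the slide
    ({"ENO", "PNO"}, {"RESP", "DUR"}),      # assumed
]

for lhs, rhs in violations_3nf(EMP, EMP_FDS):
    print(sorted(lhs), "->", sorted(rhs))
# ['ENO'] -> ['ENAME', 'SAL', 'TITLE']
# ['TITLE'] -> ['SAL']

# FDs restricted by hand to two of the decomposed schemas:
print(violations_3nf({"ENO", "ENAME", "TITLE"}, [({"ENO"}, {"ENAME", "TITLE"})]))  # []
print(violations_3nf({"TITLE", "SAL"}, [({"TITLE"}, {"SAL"})]))  # []
```

With these assumed FDs the only candidate key is {ENO, PNO}, so the assumed ENO FD is also reported (it depends on only part of the key), alongside the slide's TITLE -> SAL.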
Boyce-Codd Normal Form (BCNF)
• For each FD X->Y, one of the following must hold:
– Rule 1: Y is in X (Y ⊆ X), or
– Rule 2: Y is not in X (Y ⊄ X), and X is a superkey of R

• Examples
– Customer-schema = (c_name, c_street, c_city)
c_name -> c_street c_city
• Is in BCNF because of Rule 2: c_name is a superkey
– Loan-schema = (branch_name, c_name, loan_no, amount)
• Is not in BCNF because an FD satisfies neither Rule 1 nor Rule 2
• e.g., loan_no -> amount branch_name
– loan_no is not a superkey because a loan can be held jointly (e.g., by a wife and husband), so the same loan_no can appear with different c_name values
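The BCNF rules translate into a short check: an FD is a violation when it is nontrivial and the closure of its left side does not cover the schema. This minimal sketch encodes the two example schemas; `bcnf_violations` is a hypothetical helper, and the decomposed schema at the end is one possible illustration rather than something taken from the slide.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under FDs given as (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(schema, fds):
    """FDs X -> Y that fail Rule 1 (nontrivial) and Rule 2 (X not a superkey)."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not rhs <= lhs and not closure(lhs, fds) >= schema
    ]


CUSTOMER = {"c_name", "c_street", "c_city"}
CUSTOMER_FDS = [({"c_name"}, {"c_street", "c_city"})]

LOAN = {"branch_name", "c_name", "loan_no", "amount"}
LOAN_FDS = [({"loan_no"}, {"amount", "branch_name"})]

print(bcnf_violations(CUSTOMER, CUSTOMER_FDS))  # []
print(len(bcnf_violations(LOAN, LOAN_FDS)))     # 1: loan_no -> amount branch_name

# One possible decomposition: move the FD into its own schema.
LOAN_INFO = {"loan_no", "amount", "branch_name"}
print(bcnf_violations(LOAN_INFO, LOAN_FDS))  # []
```

The remaining schema (c_name, loan_no) has no nontrivial FDs in this example, so it passes the check as well.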
Multivalued Dependency (MVD)
• Definition: X->-> Y
– “X multidetermines Y”
– “Y is multidependent on X”
– if X->Y, then X->-> Y
– in any legal relation r(R),
• for all pairs of tuples t1 and t2 in r such that t1[X] = t2[X],
• there exist tuples t3 and t4 in r such that
– t1[X] = t2[X] = t3[X] = t4[X]
– t1[Y] = t3[Y]
– t2[Y] = t4[Y]
– t1[R-Y] = t4[R-Y]
– t2[R-Y] = t3[R-Y]

Example with columns X, Y, and Z = R-X-Y, in which X->->Y holds:

|    | X | Y | Z |
|----|---|---|---|
| t1 | 1 | a | f |
| t2 | 1 | b | g |
| t3 | 1 | a | g |
| t4 | 1 | b | f |
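The MVD condition can be tested directly on an instance: for every ordered pair (t1, t2) agreeing on X, look for a t3 that takes Y from t1 and R-Y from t2. Looping over all ordered pairs also covers t4, because t4 plays the role of t3 for the swapped pair (t2, t1). This is a sketch; `mvd_holds` is a hypothetical helper.

```python
def mvd_holds(rows, attrs, x, y):
    """Check X ->-> Y on an instance: rows is a set of tuples over attrs."""
    pos = {a: i for i, a in enumerate(attrs)}
    rest = [a for a in attrs if a not in y]  # the attributes R - Y

    def proj(t, names):
        return tuple(t[pos[a]] for a in names)

    for t1 in rows:
        for t2 in rows:
            if proj(t1, x) != proj(t2, x):
                continue
            # Need t3 with t3[Y] = t1[Y] and t3[R-Y] = t2[R-Y].
            if not any(
                proj(t3, y) == proj(t1, y) and proj(t3, rest) == proj(t2, rest)
                for t3 in rows
            ):
                return False
    return True


ATTRS = ["X", "Y", "Z"]
R = {(1, "a", "f"), (1, "b", "g"), (1, "a", "g"), (1, "b", "f")}  # t1..t4

print(mvd_holds(R, ATTRS, ["X"], ["Y"]))                    # True
print(mvd_holds(R - {(1, "b", "f")}, ATTRS, ["X"], ["Y"]))  # False: t4 missing
```

In this table X->->Y holds even though X->Y does not (X = 1 appears with both a and b), which is why MVDs are strictly more general than FDs.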
4 NF
• For each MVD of the type X->->Y in R, either
– the MVD is trivial (Y ⊆ X or X ∪ Y = R), or
– X is a superkey for schema R
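A 4NF check can reuse attribute closure: a listed MVD violates 4NF when it is nontrivial and its left side is not a superkey. This sketch checks only the MVDs it is given, not every MVD they imply. The course/teacher/book schema is a hypothetical example (not from the slide) in which a course's teachers and books vary independently, so course ->-> teacher holds with no FDs.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under FDs given as (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def violations_4nf(schema, fds, mvds):
    """Listed MVDs X ->-> Y that are nontrivial and whose X is not a superkey."""
    bad = []
    for lhs, rhs in mvds:
        trivial = rhs <= lhs or (lhs | rhs) == schema
        if trivial or closure(lhs, fds) >= schema:
            continue
        bad.append((lhs, rhs))
    return bad


# Hypothetical schema with no FDs and one MVD.
CTB = {"course", "teacher", "book"}
MVDS = [({"course"}, {"teacher"})]

print(violations_4nf(CTB, [], MVDS))                    # [({'course'}, {'teacher'})]
print(violations_4nf({"course", "teacher"}, [], MVDS))  # []: the MVD is trivial here
```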
Projection-Join Dependency (PJD)
• Definition: *(X, Y, Z)
– R is equal to the join of its projections onto X, Y, and Z (lossless-join decomposition)
– where A = {A1, A2, …, An} is the set of attributes of R, X ⊆ A, Y ⊆ A, Z ⊆ A, and X ∪ Y ∪ Z = A

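Example: R satisfies *(XY, XZ), because joining ΠXY(R) and ΠXZ(R) on X gives back R

R:

| X | Y | Z |
|---|---|---|
| 1 | 3 | a |
| 3 | 1 | b |
| 2 | 2 | c |

ΠXY(R):

| X | Y |
|---|---|
| 1 | 3 |
| 3 | 1 |
| 2 | 2 |

ΠXZ(R):

| X | Z |
|---|---|
| 1 | a |
| 3 | b |
| 2 | c |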
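The join dependency can be checked by projecting R onto each component, natural-joining the projections, and comparing the result with R. This is a minimal sketch with hypothetical helper names; it assumes the components together cover every attribute of R (otherwise the final column reordering fails).

```python
def project(rows, attrs, keep):
    """Projection of a set of tuples over attrs onto the columns in keep."""
    idx = [attrs.index(a) for a in keep]
    return {tuple(r[i] for i in idx) for r in rows}


def natural_join(left, left_attrs, right, right_attrs):
    """Natural join of two sets of tuples; returns (tuples, column names)."""
    common = [a for a in left_attrs if a in right_attrs]
    out_attrs = left_attrs + [a for a in right_attrs if a not in left_attrs]
    result = set()
    for lt in left:
        lrow = dict(zip(left_attrs, lt))
        for rt in right:
            rrow = dict(zip(right_attrs, rt))
            if all(lrow[a] == rrow[a] for a in common):
                merged = {**lrow, **rrow}
                result.add(tuple(merged[a] for a in out_attrs))
    return result, out_attrs


def jd_holds(rows, attrs, components):
    """True if rows equals the natural join of its projections onto components."""
    joined, cols = project(rows, attrs, components[0]), list(components[0])
    for comp in components[1:]:
        joined, cols = natural_join(joined, cols, project(rows, attrs, comp), list(comp))
    order = [cols.index(a) for a in attrs]  # reorder columns to match R
    return {tuple(t[i] for i in order) for t in joined} == set(rows)


ATTRS = ["X", "Y", "Z"]
R = {(1, 3, "a"), (3, 1, "b"), (2, 2, "c")}
print(jd_holds(R, ATTRS, [["X", "Y"], ["X", "Z"]]))  # True
```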
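Counter-example of PJD

R:

|    | X | Y | Z |
|----|---|---|---|
| t1 | 1 | a | f |
| t2 | 1 | b | g |

ΠXY(R):

| X | Y |
|---|---|
| 1 | a |
| 1 | b |

ΠXZ(R):

| X | Z |
|---|---|
| 1 | f |
| 1 | g |

ΠXY(R) ⋈ ΠXZ(R):

|    | X | Y | Z |
|----|---|---|---|
| t1 | 1 | a | f |
| t2 | 1 | b | g |
| t3 | 1 | a | g |
| t4 | 1 | b | f |

The join contains the spurious tuples t3 and t4, so R ≠ ΠXY(R) ⋈ ΠXZ(R), and R does not satisfy *(XY, XZ).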
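The counter-example can be reproduced in a few lines: joining the two projections of {t1, t2} on X yields two extra tuples. Adding those tuples back (which makes X->->Y hold, as in the MVD example) turns the decomposition lossless. The helpers `project` and `join_on_first` are hypothetical and specialized to this two-component case.

```python
def project(rows, cols):
    """Projection of a set of tuples onto the given column positions."""
    return {tuple(r[c] for c in cols) for r in rows}


def join_on_first(xy, xz):
    """Natural join of the XY and XZ projections on their shared first column X."""
    return {(x, y, z) for (x, y) in xy for (x2, z) in xz if x == x2}


R = {(1, "a", "f"), (1, "b", "g")}  # t1, t2 over columns (X, Y, Z)
joined = join_on_first(project(R, (0, 1)), project(R, (0, 2)))
print(sorted(joined - R))  # [(1, 'a', 'g'), (1, 'b', 'f')]: spurious t3, t4

# Adding t3 and t4 makes the decomposition lossless (X ->-> Y now holds).
R4 = R | joined
print(join_on_first(project(R4, (0, 1)), project(R4, (0, 2))) == R4)  # True
```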
5 NF
• Every join dependency is implied by the candidate keys of R (i.e., for every nontrivial JD *(R1, …, Rn), every Ri is a superkey for R)
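A sketch of the slide's parenthetical condition: a listed JD is acceptable if it is trivial (some Ri is all of R) or every Ri is a superkey, tested with attribute closure. It checks only the JDs it is given and follows the slide's formulation rather than a general test of whether a JD is implied by the keys. Both schemas below are hypothetical examples.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under FDs given as (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def pjnf_violations(schema, fds, jds):
    """Listed JDs that are nontrivial and have some component that is not a superkey."""
    bad = []
    for components in jds:
        if any(c == schema for c in components):
            continue  # trivial JD: some Ri is all of R
        if all(closure(c, fds) >= schema for c in components):
            continue  # every Ri is a superkey
        bad.append(components)
    return bad


# Hypothetical example 1: supplier-part-project, no FDs, a three-way JD.
SPJ = {"s", "p", "j"}
print(len(pjnf_violations(SPJ, [], [[{"s", "p"}, {"p", "j"}, {"s", "j"}]])))  # 1

# Hypothetical example 2: A and B are both keys of R(A, B, C).
R = {"A", "B", "C"}
FDS = [({"A"}, {"B", "C"}), ({"B"}, {"A"})]
print(pjnf_violations(R, FDS, [[{"A", "B"}, {"A", "C"}]]))  # []
```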