Normalization

February 17, 2000

Hoh In

Texas A&M University

Need for Normalization

• Repetition anomaly

– Certain information may be repeated unnecessarily

• Update anomaly

– All repeated copies of the data must be updated together, or the copies become inconsistent

• Insertion anomaly

– Cannot insert a certain set of attributes without also supplying values for unrelated attributes

• Deletion anomaly

– Deleting some information causes other, unrelated information to be lost

Definitions

(Figure: example relation with attributes ss#, name, id; sample ss# values 11 and 12)

• Domain: a set of potential values for an attribute

• Key: a minimal set of attributes that uniquely identifies each tuple (e.g., ss#)

• Super Key: any superset of a key (e.g., {ss#, name})

• Candidate Keys: all the potential keys of a relation (there may be more than one)

– e.g., id or ss#

• Primary Key: the candidate key selected to identify tuples

• Prime attributes: attributes that make up a candidate key

• Degree: # of attributes

• Cardinality: # of tuples
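Once the functional dependencies (FDs, introduced next) of a relation are known, the key definitions above can be checked mechanically. The Python sketch below computes the attribute closure X+ (every attribute determined by X), tests for a superkey, and enumerates candidate keys by trying smaller attribute sets first so that only minimal ones are kept. The helper names (`fd`, `closure`, `is_superkey`, `candidate_keys`) are illustrative, and the FD set for the ss#/name/id example is an assumption: ss# and id are each taken to determine every other attribute.

```python
from itertools import combinations


def fd(lhs, rhs):
    """Hypothetical helper: fd("A B", "C") builds the FD {A, B} -> {C}."""
    return frozenset(lhs.split()), frozenset(rhs.split())


def closure(attrs, fds):
    """Return X+, the set of attributes functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Once X+ contains the left side of an FD, it also gets the right side.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_superkey(attrs, schema, fds):
    """A superkey determines every attribute of the schema."""
    return closure(attrs, fds) >= schema


def candidate_keys(schema, fds):
    """Return the minimal superkeys, trying smaller attribute sets first."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            candidate = set(combo)
            # A superset of a key already found is a superkey but not minimal.
            if any(key <= candidate for key in keys):
                continue
            if is_superkey(candidate, schema, fds):
                keys.append(candidate)
    return keys


# Assumed FDs for the ss#/name/id example: ss# and id each identify a person.
SCHEMA = {"ss#", "name", "id"}
FDS = [fd("ss#", "name id"), fd("id", "ss# name")]

print(candidate_keys(SCHEMA, FDS))                # [{'id'}, {'ss#'}]
print(is_superkey({"ss#", "name"}, SCHEMA, FDS))  # True
print(is_superkey({"name"}, SCHEMA, FDS))         # False
```

Under these assumed FDs, {ss#, name} is a super key but not a candidate key, because dropping name still leaves a key.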

Functional Dependency (FD)

• Definition: X-> Y

– “X functionally determines Y”

– “Y is functionally dependent on X”

– for each value of X in R, there is exactly one associated Y value

– where A = {A1, A2, …, An} is the set of attributes of R, X ⊂ A, Y ⊂ A.

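Example: X -> Y holds in the left table but not in the right one, where X = a is associated with both Y = 2 and Y = 4

X Y      X Y
a 2      a 2
b 1      b 1
a 2      a 4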
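The FD definition can be tested directly on a relation instance: remember the first Y value seen for each X value and fail as soon as a later tuple disagrees. This is a minimal Python sketch with a hypothetical `fd_holds` helper, run on the two example tables. An instance can only refute an FD; a dependency that happens to hold in one table is not thereby a constraint on the schema.

```python
def fd_holds(rows, attrs, X, Y):
    """True if each X value in rows is associated with exactly one Y value."""
    x_idx = [attrs.index(a) for a in X]
    y_idx = [attrs.index(a) for a in Y]
    first_y = {}
    for row in rows:
        x_val = tuple(row[i] for i in x_idx)
        y_val = tuple(row[i] for i in y_idx)
        # setdefault records the first Y seen for this X and returns it afterwards.
        if first_y.setdefault(x_val, y_val) != y_val:
            return False
    return True


ATTRS = ["X", "Y"]
left = [("a", 2), ("b", 1), ("a", 2)]
right = [("a", 2), ("b", 1), ("a", 4)]

print(fd_holds(left, ATTRS, ["X"], ["Y"]))   # True
print(fd_holds(right, ATTRS, ["X"], ["Y"]))  # False
```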

1 NF

• No duplicated tuples in a table (in the usual textbook definition, every attribute value must also be atomic)

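Example: the left table violates this because (1, 3, a) appears twice; the right table, with the duplicate removed, does not

X Y Z      X Y Z
1 3 a      1 3 a
3 1 b      3 1 b
2 2 c      2 2 c
1 3 a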
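Checking a table for duplicated tuples amounts to comparing its size with the size of its set of tuples, and removing them means keeping the first occurrence of each. A minimal Python sketch (the function names are hypothetical), run on the example table:

```python
def has_duplicate_tuples(rows):
    """True if some tuple occurs more than once in rows."""
    return len(set(rows)) != len(rows)


def remove_duplicates(rows):
    """Keep the first occurrence of each tuple; dicts preserve insertion order."""
    return list(dict.fromkeys(rows))


table = [(1, 3, "a"), (3, 1, "b"), (2, 2, "c"), (1, 3, "a")]
print(has_duplicate_tuples(table))  # True
print(remove_duplicates(table))     # [(1, 3, 'a'), (3, 1, 'b'), (2, 2, 'c')]
```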

3 NF

• For each FD X->Y,

– Rule 1: where Y is in X (Y ⊆ X) or

– Rule 2: where Y is not in X (Y ⊄ X),

• either X is a superkey of R or

• Y is a prime attribute (i.e., each attribute in Y-X is contained in a candidate key for R)

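• Example

– EMP(ENO, ENAME, TITLE, SAL, PNO, RESP, DUR)

• Is not in 3 NF because of the FD TITLE -> SAL: TITLE is not a superkey, and SAL is not part of any candidate key

– Decomposition into 3 NF:

• EMP (ENO, ENAME, TITLE)

• PAY (TITLE, SAL)

• ASG (ENO, PNO, RESP, DUR)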
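The 3 NF rules translate into a short check per FD: skip trivial FDs (Rule 1), then accept the FD if X is a superkey or every attribute of Y-X is prime (Rule 2). The Python sketch below is illustrative only. It assumes EMP also contains SAL and that its FDs are ENO -> ENAME TITLE, TITLE -> SAL, and ENO PNO -> RESP DUR, and it checks only the FDs it is given, not every FD they imply. Under these assumptions the only candidate key is {ENO, PNO}, and the check flags TITLE -> SAL (and also the partial dependency ENO -> ENAME TITLE), while EMP (ENO, ENAME, TITLE), ASG (ENO, PNO, RESP, DUR), and a separate (TITLE, SAL) relation all pass.

```python
from itertools import combinations


def fd(lhs, rhs):
    """Hypothetical helper: fd("A B", "C") builds the FD {A, B} -> {C}."""
    return frozenset(lhs.split()), frozenset(rhs.split())


def closure(attrs, fds):
    """Attributes functionally determined by attrs (X+)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(schema, fds):
    """Minimal superkeys, found by trying smaller attribute sets first."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            cand = set(combo)
            if not any(k <= cand for k in keys) and closure(cand, fds) >= schema:
                keys.append(cand)
    return keys


def third_nf_violations(schema, fds):
    """Return the given FDs that break both 3 NF rules."""
    prime = set().union(*candidate_keys(schema, fds))
    bad = []
    for lhs, rhs in fds:
        if rhs <= lhs:                   # Rule 1: Y is in X (trivial FD)
            continue
        if closure(lhs, fds) >= schema:  # Rule 2: X is a superkey ...
            continue
        if rhs - lhs <= prime:           # ... or every attribute of Y - X is prime
            continue
        bad.append((lhs, rhs))
    return bad


# Assumed schema and FDs for the EMP example.
EMP = {"ENO", "ENAME", "TITLE", "SAL", "PNO", "RESP", "DUR"}
EMP_FDS = [fd("ENO", "ENAME TITLE"), fd("TITLE", "SAL"), fd("ENO PNO", "RESP DUR")]

for lhs, rhs in third_nf_violations(EMP, EMP_FDS):
    print(sorted(lhs), "->", sorted(rhs))
# ['ENO'] -> ['ENAME', 'TITLE']
# ['TITLE'] -> ['SAL']
```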

Boyce-Codd Normal Form (BCNF)

• For each FD X->Y,

– Rule 1: where Y is in X (Y ⊆ X) or

– Rule 2: where Y is not in X (Y ⊄ X), X has to be a superkey of R

• Example of BCNF

– Customer-schema = (c_name, c_street, c_city)

c_name -> c_street c_city

• Is in BCNF because of Rule 2 (c_name is a superkey)

– Loan-schema = (branch_name, c_name, loan_no, amount)

• Is not in BCNF because neither Rule 1 nor Rule 2 is satisfied

• e.g., loan_no -> amount branch_name

– loan_no is not a superkey because a loan can be taken out jointly (e.g., by a husband and wife), so the same loan_no appears with different c_name values
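BCNF drops the prime-attribute escape clause, so the check only asks whether each nontrivial FD has a superkey on its left side. A minimal Python sketch with hypothetical helper names, applied to the Customer-schema and Loan-schema examples using the FDs listed on the slide; it checks only the FDs it is given.

```python
def fd(lhs, rhs):
    """Hypothetical helper: fd("A B", "C") builds the FD {A, B} -> {C}."""
    return frozenset(lhs.split()), frozenset(rhs.split())


def closure(attrs, fds):
    """Attributes functionally determined by attrs (X+)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(schema, fds):
    """Return the given FDs that satisfy neither BCNF rule."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        # Rule 1: trivial FD; Rule 2: X is a superkey of the schema.
        if not rhs <= lhs and not closure(lhs, fds) >= schema
    ]


CUSTOMER = {"c_name", "c_street", "c_city"}
LOAN = {"branch_name", "c_name", "loan_no", "amount"}

print(bcnf_violations(CUSTOMER, [fd("c_name", "c_street c_city")]))  # []
loan_bad = bcnf_violations(LOAN, [fd("loan_no", "amount branch_name")])
print(len(loan_bad))  # 1
```

The Loan-schema FD fails because the closure of loan_no is {loan_no, amount, branch_name}, which lacks c_name.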

Multivalued Dependency (MVD)

• Definition: X->-> Y

– “X multidetermines Y”

– “Y is multidependent on X”

– if X->Y, then X->-> Y

– in any legal relation r(R),

• for all pairs of tuples t1 and t2 in r such that t1[X] = t2[X],

• there exist tuples t3 and t4 such that

– t1[X] = t2[X] = t3[X] = t4[X]

– t1[Y] = t3[Y]

– t2[Y] = t4[Y]

– t1[R-Y] = t4[R-Y]

– t2[R-Y] = t3[R-Y]

Example: X ->-> Y holds in this relation, where Z = R-X-Y

    X Y Z
t1: 1 a f
t2: 1 b g
t3: 1 a g
t4: 1 b f
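The MVD definition can be checked on an instance by brute force: for every ordered pair (t1, t2) that agrees on X, build the tuple that takes Y from t1 and R-Y from t2 and confirm it is present. Letting the pair run in both orders covers both t3 and t4. A minimal Python sketch (hypothetical helper names), run on the four-tuple example; as with FDs, an instance can refute an MVD but cannot prove it holds on the schema.

```python
from itertools import product


def rel(attrs, *tuples):
    """Hypothetical helper: represent each tuple as a dict attribute -> value."""
    return [dict(zip(attrs, t)) for t in tuples]


def mvd_holds(rows, X, Y):
    """Check X ->-> Y on one relation instance, following the t1..t4 definition."""
    present = {frozenset(row.items()) for row in rows}
    for t1, t2 in product(rows, repeat=2):
        if any(t1[a] != t2[a] for a in X):
            continue
        # Required tuple: Y values from t1, R-Y values from t2 (the t3 above).
        # With the pair reversed, the same line produces the t4 of the definition.
        needed = {a: (t1[a] if a in Y else t2[a]) for a in t1}
        if frozenset(needed.items()) not in present:
            return False
    return True


R = rel(["X", "Y", "Z"], (1, "a", "f"), (1, "b", "g"), (1, "a", "g"), (1, "b", "f"))
print(mvd_holds(R, ["X"], ["Y"]))      # True
print(mvd_holds(R[:3], ["X"], ["Y"]))  # False: (1, b, f) is missing
```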

4 NF

• For each MVD of the type X->->Y in R, either

– Rule 1: X->->Y is trivial (Y ⊆ X or X ∪ Y = R), or

– Rule 2: X is a superkey for schema R
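The 4 NF condition can be checked per MVD, much like the BCNF rules for FDs. In this illustrative Python version, X->->Y passes if it is trivial or if X is a superkey (its closure under the FDs covers R); implied MVDs are not derived, so only the listed dependencies are checked. As an assumed example, the relation R(X, Y, Z) from the MVD slide, with X ->-> Y and no FDs, is not in 4 NF, while its projection onto (X, Y) is, because there X ∪ Y = R makes the MVD trivial.

```python
def closure(attrs, fds):
    """Attributes functionally determined by attrs (X+) under the FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def fourth_nf_violations(schema, fds, mvds):
    """Return the listed MVDs X ->-> Y that are neither trivial nor superkey-based."""
    bad = []
    for lhs, rhs in mvds:
        if rhs <= lhs or lhs | rhs >= schema:  # trivial MVD
            continue
        if closure(lhs, fds) >= schema:        # X is a superkey
            continue
        bad.append((lhs, rhs))
    return bad


X_MULTI_Y = (frozenset({"X"}), frozenset({"Y"}))  # X ->-> Y
print(len(fourth_nf_violations({"X", "Y", "Z"}, [], [X_MULTI_Y])))  # 1
print(fourth_nf_violations({"X", "Y"}, [], [X_MULTI_Y]))            # []
```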

Projection-Join Dependency (PJD)

• Definition: *(X, Y, Z)

– R is equal to the join of its projections on X, Y, and Z (lossless-join decomposition)

– where A = {A1, A2, …, An} is the set of attributes of R, X ⊂ A, Y ⊂ A, Z ⊂ A, and X ∪ Y ∪ Z = A.

Example: R satisfies *(XY, XZ), since ΠXY R ⋈ ΠXZ R = R

R          ΠXY R    ΠXZ R
X Y Z      X Y      X Z
1 3 a      1 3      1 a
3 1 b      3 1      3 b
2 2 c      2 2      2 c

Counter-example of PJD

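R does not satisfy *(XY, XZ): joining its projections produces the spurious tuples t3 and t4

R            ΠXY R    ΠXZ R
    X Y Z    X Y      X Z
t1: 1 a f    1 a      1 f
t2: 1 b g    1 b      1 g

ΠXY R ⋈ ΠXZ R
    X Y Z
t1: 1 a f
t2: 1 b g
t3: 1 a g
t4: 1 b f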
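Whether a relation instance satisfies a join dependency can be tested by projecting it onto each component, joining the projections back together, and comparing the result with R; since that join always contains R, any extra tuple is spurious. The Python sketch below uses hypothetical helper names and stores tuples as dicts. It reproduces both the PJD example and the counter-example: the three-tuple R satisfies *(XY, XZ), while joining the projections of {t1, t2} yields four tuples.

```python
def rel(attrs, *tuples):
    """Hypothetical helper: represent each tuple as a dict attribute -> value."""
    return [dict(zip(attrs, t)) for t in tuples]


def as_set(rows):
    """Order-insensitive, duplicate-free view of a list of dict tuples."""
    return {frozenset(row.items()) for row in rows}


def project(rows, attrs):
    """Projection onto attrs, with duplicate tuples removed."""
    return [dict(t) for t in {frozenset((a, row[a]) for a in attrs) for row in rows}]


def natural_join(left, right):
    """Combine every pair of tuples that agree on their shared attributes."""
    out = []
    for left_row in left:
        for right_row in right:
            shared = left_row.keys() & right_row.keys()
            if all(left_row[a] == right_row[a] for a in shared):
                out.append({**left_row, **right_row})
    return out


def satisfies_jd(rows, components):
    """True if R equals the join of its projections onto the components."""
    joined = project(rows, components[0])
    for comp in components[1:]:
        joined = natural_join(joined, project(rows, comp))
    return as_set(joined) == as_set(rows)


XY, XZ = ["X", "Y"], ["X", "Z"]
example = rel(["X", "Y", "Z"], (1, 3, "a"), (3, 1, "b"), (2, 2, "c"))
counter = rel(["X", "Y", "Z"], (1, "a", "f"), (1, "b", "g"))

print(satisfies_jd(example, [XY, XZ]))  # True
print(satisfies_jd(counter, [XY, XZ]))  # False
print(len(natural_join(project(counter, XY), project(counter, XZ))))  # 4
```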

5 NF

• Every nontrivial join dependency *(R1, …, Rn) of R is implied by the candidate keys of R (i.e., every Ri is a superkey for R)
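The parenthetical condition on this slide is easy to test once the FDs are known: compute the closure of each component Ri and check that it covers R. The Python sketch below does only that; it does not derive which join dependencies hold, and the schemas and FDs in the usage lines are hypothetical. A join dependency whose components each contain the key ENO passes, while *(XY, XZ) on R(X, Y, Z) with no FDs does not.

```python
def fd(lhs, rhs):
    """Hypothetical helper: fd("A B", "C") builds the FD {A, B} -> {C}."""
    return frozenset(lhs.split()), frozenset(rhs.split())


def closure(attrs, fds):
    """Attributes functionally determined by attrs (X+)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def every_component_is_superkey(schema, fds, components):
    """Slide's condition for a JD *(R1, ..., Rn): each Ri is a superkey of R."""
    return all(closure(set(c), fds) >= schema for c in components)


# Hypothetical schema: ENO is a key, so both components below are superkeys.
EMP = {"ENO", "ENAME", "TITLE"}
EMP_FDS = [fd("ENO", "ENAME TITLE")]
EMP_JD = [["ENO", "ENAME"], ["ENO", "TITLE"]]
print(every_component_is_superkey(EMP, EMP_FDS, EMP_JD))  # True

# R(X, Y, Z) with no FDs: neither XY nor XZ determines all of R.
print(every_component_is_superkey({"X", "Y", "Z"}, [], [["X", "Y"], ["X", "Z"]]))  # False
```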

