Lecture #7-2 RDB Basic Concept: Normalization

CPSC 608: Distributed Database System
February 17, 2000

Hoh In, Texas A&M University

Needs for Normalization
• Repetition anomaly
– Certain information may be repeated unnecessarily

• Update anomaly
– Every copy of the repeated data must be updated, or the copies become inconsistent

• Insertion anomaly
– Some attributes cannot be inserted until values for other attributes are known

• Deletion anomaly
– Deleting some information causes other information to be lost

Definitions
• Attributes: the columns of a relation (e.g., ss#, name, id)

• Domain: a set of potential values

• Key: a minimal nonempty subset of a relation's attributes
– uniquely identifies each tuple (e.g., ss#)

• Super Key: any superset of a key (e.g., {ss#, name})

• Candidate Keys: when a relation has more than one key, each one is a potential (candidate) key
– (e.g., id or ss#)

• Primary Key: the candidate key selected to identify tuples

• Prime attributes: attributes that make up a key

• Degree: # of attributes

• Cardinality: # of tuples
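
The key definitions above can be tried out on a small table. The following is a minimal Python sketch, not part of the original slides: the sample rows and the function names `is_superkey` and `is_key` are hypothetical. Note that a key is a property of the schema, so a check on one instance can only refute a key, never prove it.

```python
from itertools import combinations

# Hypothetical sample relation: each tuple is a dict from attribute to value.
ROWS = [
    {"ss#": "111", "name": "Ann", "id": 1},
    {"ss#": "222", "name": "Bob", "id": 2},
    {"ss#": "333", "name": "Ann", "id": 3},
]

def is_superkey(rows, attrs):
    """True if no two rows agree on all attributes in attrs."""
    seen = set()
    for row in rows:
        value = tuple(row[a] for a in sorted(attrs))
        if value in seen:
            return False
        seen.add(value)
    return True

def is_key(rows, attrs):
    """True if attrs is a superkey and no proper nonempty subset is one."""
    attrs = set(attrs)
    if not attrs or not is_superkey(rows, attrs):
        return False
    return not any(
        is_superkey(rows, subset)
        for size in range(1, len(attrs))
        for subset in combinations(attrs, size)
    )
```

On these rows, {ss#, name} is a superkey but not a key, because ss# alone already identifies every tuple; ss# and id are both candidate keys.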

Functional Dependency (FD)
• Definition: X->Y
– “X functionally determines Y”
– “Y is functionally dependent on X”
– for each value of X in R, there is only one associated Y value
– where A = {A1, A2, …, An} is the set of attributes of R, X ⊂ A, Y ⊂ A.

X->Y holds:
X   Y
a   2
b   1
a   2

Not (X->Y):
X   Y
a   2
b   1
a   4
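
The definition can be checked mechanically on a relation instance: X->Y fails as soon as two tuples agree on X but differ on Y. A minimal Python sketch follows; the function name `fd_holds` and the list-of-dicts representation are assumptions made for illustration.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if the FD lhs -> rhs holds in this relation instance."""
    seen = {}  # maps each X value to the first Y value seen with it
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        if seen.setdefault(x, y) != y:
            return False  # same X value, different Y value
    return True

# The two tables from the slide.
holds = [{"X": "a", "Y": 2}, {"X": "b", "Y": 1}, {"X": "a", "Y": 2}]
fails = [{"X": "a", "Y": 2}, {"X": "b", "Y": 1}, {"X": "a", "Y": 4}]

print(fd_holds(holds, ["X"], ["Y"]))  # True
print(fd_holds(fails, ["X"], ["Y"]))  # False
```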

1 NF
• No duplicated tuples in a table

Not “1NF” (the tuple (1, 3, a) appears twice):
X   Y   Z
1   3   a
3   1   b
2   2   c
1   3   a

1NF:
X   Y   Z
1   3   a
3   1   b
2   2   c
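
The duplicate-tuple test used on this slide can be sketched in a few lines of Python. The function name `has_duplicate_tuples` is hypothetical, and tuples are written as plain Python tuples in (X, Y, Z) order.

```python
def has_duplicate_tuples(rows):
    """True if some tuple occurs more than once in rows."""
    return len(set(rows)) != len(rows)

not_1nf = [(1, 3, "a"), (3, 1, "b"), (2, 2, "c"), (1, 3, "a")]

# Dropping the repeated tuple (keeping first occurrences, in order)
# gives the second table of the slide.
in_1nf = list(dict.fromkeys(not_1nf))

print(has_duplicate_tuples(not_1nf))  # True
print(has_duplicate_tuples(in_1nf))   # False
```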

3 NF
• For each FD X->Y, one of the following holds:
– Rule 1: Y is in X (Y ⊆ X), or
– Rule 2: Y is not in X (Y ⊄ X), and
• either X is a superkey of R, or
• Y is a prime attribute (i.e., each attribute in Y-X is contained in a candidate key for R)

• Examples
– EMP relation
• Is not in 3 NF because of the FD TITLE -> SAL (TITLE is not a superkey and SAL is not a prime attribute)

– EMP(ENO, ENAME, TITLE, PNO, RESP, DUR) is decomposed into
• EMP (ENO, ENAME, TITLE)
• ASG (ENO, PNO, RESP, DUR)
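
The two 3 NF rules can be checked with attribute closures. The sketch below is an illustration under stated assumptions, not the lecture's own method: the slide gives only the FD TITLE -> SAL, so the full FD set, the presence of SAL in the original EMP schema, and all function names are assumed. Candidate keys are found by brute force, which is only practical for small schemas, and only the listed FDs are tested.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_keys(schema, fds):
    """Brute-force search for minimal attribute sets whose closure is schema."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            combo = set(combo)
            if closure(combo, fds) == schema and not any(k <= combo for k in keys):
                keys.append(combo)
    return keys

def third_nf_violations(schema, fds):
    """Return the listed FDs X -> Y that satisfy neither Rule 1 nor Rule 2."""
    prime = set().union(*candidate_keys(schema, fds))
    bad = []
    for lhs, rhs in fds:
        trivial = rhs <= lhs                      # Rule 1
        superkey = closure(lhs, fds) == schema    # Rule 2, first case
        rhs_prime = (rhs - lhs) <= prime          # Rule 2, second case
        if not (trivial or superkey or rhs_prime):
            bad.append((lhs, rhs))
    return bad

# ASSUMED schema and FDs for the original EMP relation (not given in full on the slide).
EMP_ALL = {"ENO", "ENAME", "TITLE", "SAL", "PNO", "RESP", "DUR"}
EMP_ALL_FDS = [
    ({"ENO"}, {"ENAME", "TITLE"}),
    ({"TITLE"}, {"SAL"}),
    ({"ENO", "PNO"}, {"RESP", "DUR"}),
]

# The decomposed relations from the slide, with the assumed FDs that apply to each.
EMP = {"ENO", "ENAME", "TITLE"}
EMP_FDS = [({"ENO"}, {"ENAME", "TITLE"})]
ASG = {"ENO", "PNO", "RESP", "DUR"}
ASG_FDS = [({"ENO", "PNO"}, {"RESP", "DUR"})]
```

Under these assumptions `third_nf_violations(EMP_ALL, EMP_ALL_FDS)` reports TITLE -> SAL among the violating FDs, while the decomposed EMP and ASG relations report none.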

Boyce-Codd Normal Form (BCNF)
• For each FD X->Y,
– Rule 1: Y is in X (Y ⊆ X), or
– Rule 2: Y is not in X (Y ⊄ X), and X is a superkey of R

• Example of BCNF
– Customer-schema = (c_name, c_street, c_city)
• c_name -> c_street c_city
• Is BCNF because of Rule 2: c_name is a superkey

– Loan-schema = (branch_name, c_name, loan_no, amount)
• Is not BCNF because neither Rule 1 nor Rule 2 is satisfied
• e.g., loan_no -> amount branch_name
– loan_no is not a superkey because several customers (e.g., a wife and a husband) can share the same loan_no
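
The BCNF test on the two example schemas can be sketched in Python as follows. The function names are hypothetical, the FD sets contain only the dependencies stated on the slide, and only those listed FDs are tested.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def bcnf_violations(schema, fds):
    """Return the listed FDs X -> Y that are nontrivial and whose X is not a superkey."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not rhs <= lhs and closure(lhs, fds) != schema
    ]

customer = {"c_name", "c_street", "c_city"}
customer_fds = [({"c_name"}, {"c_street", "c_city"})]

loan = {"branch_name", "c_name", "loan_no", "amount"}
loan_fds = [({"loan_no"}, {"amount", "branch_name"})]

print(bcnf_violations(customer, customer_fds))   # []
print(len(bcnf_violations(loan, loan_fds)))      # 1
```

The closure of loan_no does not contain c_name, so loan_no is not a superkey and the Loan-schema FD is reported as a violation.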

Multivalued Dependency (MVD)
• Definition: X->-> Y
– “X multidetermines Y”
– “Y is multidependent on X”
– if X->Y, then X->->Y
– X->->Y holds if, in any legal relation r(R),
• for all pairs of tuples t1 and t2 in r such that t1[X] = t2[X],
• there exist tuples t3 and t4 in r such that
– t1[X] = t2[X] = t3[X] = t4[X]
– t1[Y] = t3[Y]
– t2[Y] = t4[Y]
– t1[R-Y] = t4[R-Y]
– t2[R-Y] = t3[R-Y]

      X   Y   Z (Z = R-X-Y)
t1:   1   a   f
t2:   1   b   g
t3:   1   a   g
t4:   1   b   f
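
The MVD condition can be tested on an instance by building, for every pair of tuples that agree on X, the tuple that takes X and Y from the first and the remaining attributes from the second, and checking that it is present. A minimal Python sketch follows; the function name `mvd_holds` and the list-of-dicts representation are assumptions.

```python
def mvd_holds(rows, schema, x, y):
    """Check X ->-> Y on a relation instance given as a list of dicts."""
    rest = [a for a in schema if a not in x and a not in y]
    present = {tuple(row[a] for a in schema) for row in rows}
    for t1 in rows:
        for t2 in rows:
            if all(t1[a] == t2[a] for a in x):
                # t3 takes X and Y from t1 and the remaining attributes from t2.
                # t4 is covered when the loop visits the pair in the other order.
                t3 = {a: t1[a] for a in list(x) + list(y)}
                t3.update({a: t2[a] for a in rest})
                if tuple(t3[a] for a in schema) not in present:
                    return False
    return True

SCHEMA = ["X", "Y", "Z"]
r = [
    {"X": 1, "Y": "a", "Z": "f"},  # t1
    {"X": 1, "Y": "b", "Z": "g"},  # t2
    {"X": 1, "Y": "a", "Z": "g"},  # t3
    {"X": 1, "Y": "b", "Z": "f"},  # t4
]

print(mvd_holds(r, SCHEMA, ["X"], ["Y"]))      # True
print(mvd_holds(r[:2], SCHEMA, ["X"], ["Y"]))  # False: t3 and t4 are missing
```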

4 NF
• For each nontrivial MVD X->->Y in R,
• X is a superkey for schema R

Projection-Join Dependency (PJD)
• Definition: * (X, Y, Z)
– R is equal to the join of its projections on X, Y, and Z (lossless-join decomposition)
– where A = {A1, A2, …, An}, X ⊂ A, Y ⊂ A, Z ⊂ A.

R
X   Y   Z
1   3   a
3   1   b
2   2   c

ΠXY(R)
X   Y
1   3
3   1
2   2

ΠXZ(R)
X   Z
1   a
3   b
2   c

Counter-example of PJD
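R
      X   Y   Z
t1:   1   a   f
t2:   1   b   g
t3:   1   a   g
t4:   1   b   f

ΠXY(R)
X   Y
1   a
1   b

ΠXZ(R)
X   Z
1   f
1   g

• Note: joining ΠXY(R) and ΠXZ(R) yields the four tuples t1 to t4. If R held only t1 and t2, the join would add t3 and t4 as spurious tuples, and the decomposition would not be lossless.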
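
Whether a decomposition is lossless on a given instance can be checked by projecting, joining, and comparing with R. The following Python sketch assumes a list-of-dicts representation; the function names `project`, `natural_join`, and `is_lossless` are hypothetical.

```python
def project(rows, attrs):
    """Projection of rows (a list of dicts) onto attrs, duplicates removed."""
    return {tuple((a, row[a]) for a in attrs) for row in rows}

def natural_join(left, right):
    """Natural join of two projections produced by project()."""
    joined = set()
    for l in left:
        for r in right:
            ld, rd = dict(l), dict(r)
            # Keep the pair only if it agrees on every shared attribute.
            if all(ld[a] == rd[a] for a in ld.keys() & rd.keys()):
                joined.add(tuple(sorted({**ld, **rd}.items())))
    return joined

def is_lossless(rows, attrs1, attrs2):
    """True if joining the two projections gives back exactly rows."""
    original = {tuple(sorted(row.items())) for row in rows}
    return natural_join(project(rows, attrs1), project(rows, attrs2)) == original

r1 = [
    {"X": 1, "Y": 3, "Z": "a"},
    {"X": 3, "Y": 1, "Z": "b"},
    {"X": 2, "Y": 2, "Z": "c"},
]
r_two = [
    {"X": 1, "Y": "a", "Z": "f"},  # t1
    {"X": 1, "Y": "b", "Z": "g"},  # t2
]

print(is_lossless(r1, ["X", "Y"], ["X", "Z"]))     # True
print(is_lossless(r_two, ["X", "Y"], ["X", "Z"]))  # False
```

For `r1` the join returns R exactly. For a relation holding only t1 and t2, the join also produces (1, a, g) and (1, b, f), so that decomposition is not lossless.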

5 NF
• Every join dependency is implied by the candidate keys of R (i.e., every Ri is a superkey for R)
• e.g., (PNO, PNAME) ⋈ (PNO, BUDGET)
