NORMALIZATION

Functional Dependencies
Using FDs to determine global ICs:

Step 1: Given schema R = {A1, ..., An}, use key constraints, laws of physics, trial and error, etc. to determine an initial FD set, F.
Step 2: Use FD elimination techniques to generate an alternative (but equivalent) FD set, F'.
Step 3: Write assertions for each f in F' (for now).
Issues:
(1) How do we guarantee that F' is equivalent to F? Ans: closures
(2) How do we find a minimal F' equivalent to F? Ans: minimal (canonical) cover algorithm
Example: suppose R = {A, B, C, D, E, H} and we determine that:
  F = {A -> BC, B -> CE, A -> E, AC -> H, D -> B}
Then we determine the minimal cover of F:
  Fc = {A -> BH, B -> CE, D -> B}
ensuring that F and Fc are equivalent.
Note: F requires 5 assertions; Fc requires 3 assertions.
Equivalence of FD sets: FD sets F and G are equivalent if they imply the same set of FDs.
e.g. A -> B and B -> C imply A -> C
Equivalence is usually expressed in terms of closures.
Closures: For any FD set F, F+ is the set of all FDs implied by F. It can be calculated in 2 ways:
(1) Attribute closure
(2) Armstrong's axioms
Both techniques are tedious; we will do them only for toy examples.
F is equivalent to G iff F+ = G+
Attribute Closures
Given: R = {A, B, C, D, E, H} and F = {A -> BC, B -> CE, A -> E, AC -> H, D -> B}
What is the closure of CD (CD+)?
Algorithm att-closure (X: set of attributes)
  Result := X
  repeat until stable
    for each FD in F, Y -> Z, do
      if Y is a subset of Result then Result := Result U Z
Attribute closure of CD:
  Iteration   Result
  ---------   ------
  0           CD
  1           CDB
  2           CDBE
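The att-closure pseudocode can be sketched in Python roughly as follows. The function name attribute_closure and the encoding of each FD as a (lhs, rhs) pair of attribute strings are assumptions made for this illustration; they do not come from the slides.

```python
def attribute_closure(attrs, fds):
    """Return the closure of `attrs` under `fds`, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:                      # "repeat until stable"
        changed = False
        for lhs, rhs in fds:
            # Apply Y -> Z when Y is inside Result and Z adds something new.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


F = [("A", "BC"), ("B", "CE"), ("A", "E"), ("AC", "H"), ("D", "B")]
print("".join(sorted(attribute_closure("CD", F))))  # BCDE
```

The result is the same set as CDBE in the iteration table; only the printing order differs.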
Attribute Closures
Q: What is ACD+? Ans: ACD+ = R
Q: How do you determine if ACD is a superkey? Ans: it is if ACD+ = R
Q: How can you determine if ACD is a candidate key? Ans: it is if ACD+ = R and no proper subset of ACD is a superkey:
  AC+ != R
  AD+ != R   (not true, since AD+ = R => ACD is not a candidate key; AD is a candidate key)
  CD+ != R
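The superkey and candidate-key questions reduce to attribute closures, which a short Python sketch can check. The helper names closure, is_superkey and is_candidate_key are hypothetical, and FDs are encoded as (lhs, rhs) string pairs purely for illustration.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds` (list of (lhs, rhs) pairs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_superkey(attrs, schema, fds):
    # X is a superkey iff X+ contains every attribute of R.
    return closure(attrs, fds) >= set(schema)


def is_candidate_key(attrs, schema, fds):
    # A candidate key is a superkey from which no attribute can be dropped.
    # Checking single-attribute removals is enough, because any superset
    # of a superkey is itself a superkey.
    if not is_superkey(attrs, schema, fds):
        return False
    return all(not is_superkey(set(attrs) - {a}, schema, fds) for a in attrs)


R = "ABCDEH"
F = [("A", "BC"), ("B", "CE"), ("A", "E"), ("AC", "H"), ("D", "B")]
print(is_superkey("ACD", R, F))       # True
print(is_candidate_key("ACD", R, F))  # False
print(is_candidate_key("AD", R, F))   # True
```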
Attribute Closures to determine FD closures

Given: R = {A, B, C, D, E, H} and F = {A -> BC, B -> CE, A -> E, AC -> H, D -> B}
F+ = {A -> A+, B -> B+, C -> C+, D -> D+, E -> E+, H -> H+, AB -> AB+, AC -> AC+, AD -> AD+, ...}
To decide if F and G are equivalent:
(1) Compute F+
(2) Compute G+
(3) Is (1) = (2)?
Expensive: F+ has 63 rules (in general: 2^|R| - 1 rules)
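For toy inputs, the full comparison of F+ and G+ can be avoided. A common shortcut, which is not the procedure listed above, is to check that every FD of G follows from F and every FD of F follows from G, using attribute closures only. The sketch below assumes the same (lhs, rhs) string-pair encoding; closure, covers and equivalent are illustrative names.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds` (list of (lhs, rhs) pairs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def covers(f, g):
    # F covers G when each X -> Y in G satisfies Y subset of X+ under F.
    return all(set(rhs) <= closure(lhs, f) for lhs, rhs in g)


def equivalent(f, g):
    return covers(f, g) and covers(g, f)


F = [("A", "BC"), ("B", "CE"), ("A", "E"), ("AC", "H"), ("D", "B")]
Fc = [("A", "BH"), ("B", "CE"), ("D", "B")]
print(equivalent(F, Fc))             # True
print(equivalent(F, [("A", "B")]))   # False
print(2 ** len("ABCDEH") - 1)        # 63 left-hand sides in the full F+
```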
FD Closures Using Armstrong's Axioms

A. Fundamental rules (W, X, Y, Z: sets of attributes)
1. Reflexivity: If Y is a subset of X then X -> Y
2. Augmentation: If X -> Y then WX -> WY
3. Transitivity: If X -> Y and Y -> Z then X -> Z
B. Additional rules (can be proved from A)
4. Union: If X -> Y and X -> Z then X -> YZ
5. Decomposition: If X -> YZ then X -> Y, X -> Z
6. Pseudotransitivity: If X -> Y and WY -> Z then WX -> Z
FD Closures Using Armstrong's Axioms

Given: F = {A -> BC (1), B -> CE (2), A -> E (3), AC -> H (4), D -> B (5)}
Exhaustively apply Armstrong's axioms to generate F+
F+ = F U
1. {A -> B, A -> C}: decomposition on (1)
2. {A -> CE}: transitivity on 1.1 (A -> B) and (2)
3. {B -> C, B -> E}: decomposition on (2)
4. {A -> C, A -> E}: decomposition on 2
5. {A -> H}: pseudotransitivity on 1.2 (A -> C) and (4)
Functional dependencies
Our goal: given an FD set F, find an alternative FD set G that is:
  smaller
  equivalent
Bad news: testing F = G (F+ = G+) is computationally expensive
Good news: Minimal Cover (or Canonical Cover) algorithm: given an FD set F, finds a minimal FD set equivalent to F
Minimal: can't find another equivalent FD set with fewer FDs
Minimal Cover Algorithm

Given: F = {A -> BC, B -> CE, A -> E, AC -> H, D -> B}
Determines minimal cover of F: Fc = {A -> BH, B -> CE, D -> B}
Another example:
  F = {A -> BC, B -> C, A -> B, AB -> C, AC -> D}
  MC Algorithm: F => Fc = {A -> BD, B -> C}
No G that is equivalent to F is smaller than Fc

Basic Algorithm
ALGORITHM MinimalCover (X: FD set)
BEGIN
  REPEAT UNTIL STABLE
    (1) Where possible, apply UNION rule (Armstrong's axioms)
        (e.g., A -> BC, A -> CD becomes A -> BCD)
    (2) Remove extraneous attributes from each FD
        (e.g., AB -> C, A -> B becomes A -> C, A -> B; i.e., B is extraneous in AB -> C)
Extraneous Attributes
(1) Extraneous in RHS? e.g.: can we replace A -> BC with A -> C? (i.e., is B extraneous in A -> BC?)
(2) Extraneous in LHS? e.g.: can we replace AB -> C with A -> C? (i.e., is B extraneous in AB -> C?)
Simple but expensive test:
1. Replace A -> BC (or AB -> C) with A -> C in F:
   F2 = F - {A -> BC} U {A -> C}, or F2 = F - {AB -> C} U {A -> C}
2. Test if F2+ = F+. If yes, then B is extraneous.
A. RHS: Is B extraneous in A -> BC?
  step 1: F2 = F - {A -> BC} U {A -> C}
  step 2: F+ = F2+ ?
  To simplify step 2, observe that F2+ is a subset of F+ (i.e., no new FDs in F2+).
  Why? We have effectively removed A -> B from F.
  When is F+ = F2+? Ans: when (A -> B) is in F2+
  Idea: if F2+ includes A -> B and A -> C, then it includes A -> BC
B. LHS: Is B extraneous in AB -> C?
  step 1: F2 = F - {AB -> C} U {A -> C}
  step 2: F+ = F2+ ?
  To simplify step 2, observe that F+ is a subset of F2+ (i.e., there may be new FDs in F2+).
  Why? A -> C implies AB -> C, therefore all FDs in F+ are also in F2+. But AB -> C does not imply A -> C.
  When is F+ = F2+? Ans: when (A -> C) is in F+
  Idea: if F+ includes A -> C, then it will include all the FDs of F2+
Extraneous attributes
A. RHS: Given F = {A -> BC, B -> C}, is C extraneous in A -> BC? Why or why not?
Ans: yes, because A -> C is in {A -> B, B -> C}+
Proof.
  1. A -> B
  2. B -> C
  3. A -> C   (transitivity using Armstrong's axioms)
Extraneous attributes
B. LHS: Given F = {A -> B, AB -> C}, is B extraneous in AB -> C? Why or why not?
Ans: yes, because A -> C is in F+
Proof.
  1. A -> B
  2. AB -> C
  3. A -> C   (using pseudotransitivity on 1 and 2)
Actually, we have AA -> C, but {A, A} = {A}

ALGORITHM MinimalCover (F: set of FDs)
BEGIN
  REPEAT UNTIL STABLE
    (1) Where possible, apply UNION rule (Armstrong's axioms)
    (2) Remove all extraneous attributes:
        a. Test if B is extraneous in A -> BC
           (B is extraneous if (A -> B) is in (F - {A -> BC} U {A -> C})+)
        b. Test if B is extraneous in AB -> C
           (B is extraneous in AB -> C if (A -> C) is in F+)
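Tests (2a) and (2b) can each be phrased as a single attribute-closure check, since X -> Y is in F+ exactly when Y is a subset of X+. The sketch below is one way to write them; rhs_extraneous and lhs_extraneous are hypothetical names, and FDs are (lhs, rhs) string pairs with single-letter attributes.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds` (list of (lhs, rhs) pairs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def rhs_extraneous(b, fd, fds):
    """Test (2a): is attribute b extraneous on the right-hand side of fd?"""
    lhs, rhs = fd
    # F2 = F - {lhs -> rhs} U {lhs -> rhs without b}
    f2 = [x for x in fds if x != fd] + [(lhs, rhs.replace(b, ""))]
    return b in closure(lhs, f2)      # is lhs -> b in F2+ ?


def lhs_extraneous(b, fd, fds):
    """Test (2b): is attribute b extraneous on the left-hand side of fd?"""
    lhs, rhs = fd
    # Is (lhs without b) -> rhs already in F+ ?
    return set(rhs) <= closure(lhs.replace(b, ""), fds)


F_a = [("A", "BC"), ("B", "C")]
print(rhs_extraneous("C", ("A", "BC"), F_a))   # True
print(rhs_extraneous("B", ("A", "BC"), F_a))   # False

F_b = [("A", "B"), ("AB", "C")]
print(lhs_extraneous("B", ("AB", "C"), F_b))   # True
print(lhs_extraneous("A", ("AB", "C"), F_b))   # False
```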

Example: determine the minimal cover of F = {A -> BC, B -> CE, A -> E}
Iteration 1:
  a. F = {A -> BCE, B -> CE}
  b. Must check for up to 5 extraneous attributes
     - B extraneous in A -> BCE? No
     - C extraneous in A -> BCE? Yes: (A -> C) is in {A -> BE, B -> CE}+
         1. A -> BE
         2. A -> B
         3. A -> CE
         4. A -> C
     - E extraneous in A -> BE? Yes
         1. A -> B
         2. A -> CE, hence A -> E
     - E extraneous in B -> CE? No
     - C extraneous in B -> CE? No
Iteration 2:
  a. F = {A -> B, B -> CE}
  b. Extraneous attributes:
     - C extraneous in B -> CE? No
     - E extraneous in B -> CE? No
DONE
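The whole loop can be sketched in Python as below. This is an illustrative reading of the algorithm, not a reference implementation: the names union_rule, remove_one_extraneous and minimal_cover are made up here, attributes are assumed to be single letters, and the cover returned may depend on the order in which attributes are tested.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds` (pairs of frozensets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def union_rule(fds):
    """Step (1): merge FDs that share a left-hand side; drop empty ones."""
    merged = {}
    for lhs, rhs in fds:
        merged[lhs] = merged.get(lhs, frozenset()) | rhs
    return [(lhs, rhs) for lhs, rhs in merged.items() if rhs]


def remove_one_extraneous(fds):
    """Step (2): remove one extraneous attribute if there is one."""
    for i, (lhs, rhs) in enumerate(fds):
        rest = fds[:i] + fds[i + 1:]
        for b in sorted(rhs):                  # RHS test (2a)
            reduced = (lhs, rhs - {b})
            if b in closure(lhs, rest + [reduced]):
                return rest + [reduced], True
        if len(lhs) > 1:
            for b in sorted(lhs):              # LHS test (2b)
                if rhs <= closure(lhs - {b}, fds):
                    return rest + [(lhs - {b}, rhs)], True
    return fds, False


def minimal_cover(fds):
    fds = [(frozenset(lhs), frozenset(rhs)) for lhs, rhs in fds]
    while True:                                # "repeat until stable"
        fds = union_rule(fds)
        fds, changed = remove_one_extraneous(fds)
        if not changed:
            return {("".join(sorted(l)), "".join(sorted(r))) for l, r in fds}


print(minimal_cover([("A", "BC"), ("B", "CE"), ("A", "E")])
      == {("A", "B"), ("B", "CE")})            # True
```

Each pass removes at most one attribute before re-applying the union rule, so the total number of attributes shrinks on every pass and the loop terminates.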

Find the minimal cover of F = {A -> BC, B -> CE, A -> E, AC -> H, D -> B}
Ans: Fc = {A -> BH, B -> CE, D -> B}

Find two different minimal covers of F = {A -> BC, B -> CA, C -> AB}
Ans: Fc1 = {A -> B, B -> C, C -> A} and Fc2 = {A -> C, B -> A, C -> B}
FD so far...
1. Minimal Cover algorithm: result (Fc) guaranteed to be a minimal FD set equivalent to F
2. Closure algorithms
   a. Armstrong's axioms: more common use: test for extraneous attributes in the canonical cover algorithm
   b. Attribute closure: more common use: test for superkeys
3. Purposes
   a. Minimize the cost of global integrity constraints
      so far: min gics = |Fc|
      in fact: min gics = 0 (FDs for normalization)
Another use of FDs: Schema Design

Example:
R =
  bname     bcity  assets  cname    lno   amt
  Downtown  Bkln   9M      Jones    L-17  1000
  Downtown  Bkln   9M      Johnson  L-23  2000
  Mianus    Horse  1.7M    Jones    L-93  500
  Downtown  Bkln   9M      Hayes    L-17  1000
R: Universal relation
Tuple meaning: Jones has a loan (L-17) for $1000 taken out at the Downtown branch in Bkln, which has assets of $9M
Design:
  +: fast queries (no need for joins!)
  -: redundancy: update anomalies (examples?), deletion anomalies
Decomposition
1. Decomposing the schema
   R = (bname, bcity, assets, cname, lno, amt)
   R = R1 U R2
   R1 = (bname, bcity, assets, cname)
   R2 = (cname, lno, amt)
2. Decomposing the instance
R =
  bname     bcity  assets  cname    lno   amt
  Downtown  Bkln   9M      Jones    L-17  1000
  Downtown  Bkln   9M      Johnson  L-23  2000
  Mianus    Horse  1.7M    Jones    L-93  500
  Downtown  Bkln   9M      Hayes    L-17  1000
R1 =
  bname     bcity  assets  cname
  Downtown  Bkln   9M      Jones
  Downtown  Bkln   9M      Johnson
  Mianus    Horse  1.7M    Jones
  Downtown  Bkln   9M      Hayes
R2 =
  cname    lno   amt
  Jones    L-17  1000
  Johnson  L-23  2000
  Jones    L-93  500
  Hayes    L-17  1000
Goals of Decomposition
1. Lossless joins
   Want to be able to reconstruct the big (e.g. universal) relation by joining smaller ones (using natural joins) (i.e. R1 ⋈ R2 = R)
2. Dependency preservation
   Want to minimize the cost of global integrity constraints based on FDs (i.e. avoid big joins in assertions)
3. Redundancy avoidance
   Avoid unnecessary data duplication (the motivation for decomposition)
Why important?
  LJ: information loss
  DP: efficiency (time)
  RA: efficiency (space), update anomalies
Decomposition Goal #1: lossless joins

A bad decomposition:
  bname     bcity  assets  cname
  Downtown  Bkln   9M      Jones
  Downtown  Bkln   9M      Johnson
  Mianus    Horse  1.7M    Jones
  Downtown  Bkln   9M      Hayes
⋈
  cname    lno   amt
  Jones    L-17  1000
  Johnson  L-23  2000
  Jones    L-93  500
  Hayes    L-17  1000
=
  bname     bcity  assets  cname    lno   amt
  Downtown  Bkln   9M      Jones    L-17  1000
  Downtown  Bkln   9M      Jones    L-93  500
  Downtown  Bkln   9M      Johnson  L-23  2000
  Mianus    Horse  1.7M    Jones    L-17  1000
  Mianus    Horse  1.7M    Jones    L-93  500
  Downtown  Bkln   9M      Hayes    L-17  1000
Problem: join adds meaningless tuples
Lossy join: by adding noise, we have lost meaningful information as a result of the decomposition
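The spurious tuples can be reproduced with a few lines of Python. This is only a sketch: relations are modeled as sets of tuples, and the natural join is hand-written for the single shared attribute cname rather than taken from any database library.

```python
# Universal relation R = (bname, bcity, assets, cname, lno, amt)
R = {
    ("Downtown", "Bkln", "9M", "Jones", "L-17", 1000),
    ("Downtown", "Bkln", "9M", "Johnson", "L-23", 2000),
    ("Mianus", "Horse", "1.7M", "Jones", "L-93", 500),
    ("Downtown", "Bkln", "9M", "Hayes", "L-17", 1000),
}

# Project onto R1 = (bname, bcity, assets, cname) and R2 = (cname, lno, amt).
R1 = {row[:4] for row in R}
R2 = {row[3:] for row in R}

# Natural join on the only shared attribute, cname.
joined = {r1 + r2[1:] for r1 in R1 for r2 in R2 if r1[3] == r2[0]}

spurious = joined - R
print(len(joined))    # 6
print(len(spurious))  # 2
for row in sorted(spurious):
    print(row)
```

Every original tuple is still in the join, but the two extra Jones tuples make it impossible to tell which branch issued which of his loans.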
Decomposition Goal #1: lossless joins

Is the following decomposition lossless or lossy?
R1 =
  bname     assets  cname    lno
  Downtown  9M      Jones    L-17
  Downtown  9M      Johnson  L-23
  Mianus    1.7M    Jones    L-93
  Downtown  9M      Hayes    L-17
R2 =
  lno   bcity  amt
  L-17  Bkln   1000
  L-23  Bkln   2000
  L-93  Horse  500
Ans: Lossless: R = R1 ⋈ R2, it has 4 tuples
Ensuring Lossless Joins

A decomposition of R, R = R1 U R2, is lossless iff
  R1 ∩ R2 -> R1, or
  R1 ∩ R2 -> R2
(i.e., intersecting attributes must be a superkey for one of the resulting smaller relations)
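This condition is a one-line check once attribute closure is available. The sketch below applies it to the two bank decompositions, using the FDs bname -> bcity assets and lno -> amt bname that the slides state for this schema; is_lossless and the set-based encoding are assumptions of the example.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds` (pairs of attribute sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_lossless(r1, r2, fds):
    # Lossless iff the shared attributes determine all of R1 or all of R2.
    common_closure = closure(r1 & r2, fds)
    return r1 <= common_closure or r2 <= common_closure


FDS = [
    ({"bname"}, {"bcity", "assets"}),
    ({"lno"}, {"amt", "bname"}),
]
bad = ({"bname", "bcity", "assets", "cname"}, {"cname", "lno", "amt"})
good = ({"bname", "assets", "cname", "lno"}, {"lno", "bcity", "amt"})

print(is_lossless(*bad, FDS))   # False: cname determines neither side
print(is_lossless(*good, FDS))  # True: lno determines (lno, bcity, amt)
```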
Decomposition Goal #2: Dependency preservation

Goal: efficient integrity checks of FDs
An example with no DP:
  R = (bname, bcity, assets, cname, lno, amt)
  bname -> bcity assets
  lno -> amt bname
Decomposition: R = R1 U R2
  R1 = (bname, assets, cname, lno)
  R2 = (lno, bcity, amt)
Lossless but not DP. Why?
Ans: bname -> bcity assets crosses 2 tables

To ensure the best possible efficiency of FD checks, ensure that only a SINGLE table is needed in order to check each FD,
i.e. ensure that A1 A2 ... An -> B1 B2 ... Bm can be checked by examining Ri = (..., A1, A2, ..., An, ..., B1, ..., Bm, ...)
To test if the decomposition R = R1 U R2 U ... U Rn is DP:
(1) see which FDs of R are covered by R1, R2, ..., Rn
(2) compare the closure of (1) with the closure of FDs of R

Example: Given F = {A -> B, AB -> D, C -> D}
consider R = R1 U R2 s.t. R1 = (A, B, C), R2 = (C, D). Is it DP?
(1) F+ = {A -> BD, C -> D}+
(2) G+ = {A -> B, C -> D}+
(3) F+ = G+ ? No, because (A -> D) is not in G+
Decomposition is not DP

Example: Given F = {A -> B, AB -> D, C -> D}
consider R = R1 U R2 s.t. R1 = (A, B, D), R2 = (C, D)
(1) F+ = {A -> BD, C -> D}+
(2) G+ = {A -> BD, C -> D, ...}+
(3) F+ = G+
    note: G+ cannot introduce new FDs not in F+
Decomposition is DP
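Both dependency-preservation examples can be checked mechanically. The sketch below builds G by projecting F+ onto each Ri (enumerating every subset of Ri, so it is exponential and only suitable for toy schemas) and then tests whether every FD of F follows from G. The function names are hypothetical.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds` (list of (lhs, rhs) pairs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def projected_fds(ri, fds):
    """Non-trivial FDs of F+ whose attributes all lie inside relation ri."""
    ri = set(ri)
    out = []
    for k in range(1, len(ri) + 1):
        for x in combinations(sorted(ri), k):
            rhs = (closure(x, fds) & ri) - set(x)
            if rhs:
                out.append((set(x), rhs))
    return out


def is_dependency_preserving(decomposition, fds):
    # G = all FDs that can be checked inside a single Ri.
    g = [fd for ri in decomposition for fd in projected_fds(ri, fds)]
    # G+ is always inside F+, so it suffices to test that G implies F.
    return all(set(rhs) <= closure(lhs, g) for lhs, rhs in fds)


F = [("A", "B"), ("AB", "D"), ("C", "D")]
print(is_dependency_preserving(["ABC", "CD"], F))  # False
print(is_dependency_preserving(["ABD", "CD"], F))  # True
```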
Decomposition Goal #3: Redundancy Avoidance

Example:
  A  B  C
  a  x  1
  e  x  1
  g  y  2
  h  y  2
  m  y  2
  n  z  1
  p  z  1
Redundancy for B = x, y and z
(1) An FD that exists in the above relation is: B -> C
(2) A superkey in the above relation is A (or any set containing A)
When do you have redundancy?
Ans: when there is some FD, X -> Y, covered by a relation and X is not a superkey
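Whether an FD holds in a particular instance can be tested by grouping rows on the left-hand side. The sketch below does this for the table above; fd_holds is an illustrative helper. Passing the test on one instance only shows that the data does not contradict the FD, not that the FD is a constraint of the schema.

```python
def fd_holds(rows, lhs, rhs):
    """True if no two rows agree on `lhs` but differ on `rhs`."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        # setdefault stores val for a new key, else returns the earlier one.
        if seen.setdefault(key, val) != val:
            return False
    return True


ROWS = [dict(zip("ABC", r)) for r in [
    ("a", "x", 1), ("e", "x", 1),
    ("g", "y", 2), ("h", "y", 2), ("m", "y", 2),
    ("n", "z", 1), ("p", "z", 1),
]]

print(fd_holds(ROWS, "B", "C"))    # True:  B -> C holds
print(fd_holds(ROWS, "A", "BC"))   # True:  A is a superkey
print(fd_holds(ROWS, "B", "A"))    # False: B is not a superkey
```

B -> C holds while B is not a superkey, which is exactly the condition for redundancy: the C value is repeated for every row sharing the same B.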
Normalization
Decomposition techniques for ensuring:
  Lossless joins
  Dependency preservation
  Redundancy avoidance
We will look at some normal forms:
  Boyce-Codd Normal Form (BCNF)
  3rd Normal Form (3NF)
