Database Design: Theory & Normalization

Uploaded by

Kamalesh Pantra

Chapter - 6

DATABASE DESIGN THEORY AND NORMALIZATION
Outline
 Informal Design Guidelines for Relational Databases
 Semantics of the Relation Attributes
 Redundant Information in Tuples and Update Anomalies
 Null Values in Tuples
 Spurious Tuples
 Functional Dependencies (FDs)
 Definition of FD
 Inference Rules for FDs
 Equivalence of Sets of FDs
 Minimal Sets of FDs

2
Outline

 Normal Forms Based on Primary Keys


 Normalization of Relations
 Practical Use of Normal Forms
 Definitions of Keys and Attributes Participating in Keys
 First Normal Form
 Second Normal Form
 Third Normal Form
 General Normal Form Definitions (For Multiple Keys)
 BCNF (Boyce-Codd Normal Form)
 Multivalued Dependency and Fourth Normal Form
 Join Dependencies and Fifth Normal Form

3
Review : Database Design
 Requirements Analysis
 user needs; what must database do?
 Conceptual Design
 high level description (often done w/ER model)
 Logical Design
 translate ER into DBMS data model
 Schema Refinement
 consistency, normalization

 Physical Design - indexes, disk layout


 Security Design - who accesses what
4
Design Steps
 Step (3) to step (4) is based on a “design theory” for
relations and is called “normalization”

 It is important for two reasons


 Automatic mappings from ER to relations may not produce the
best relational design possible
 Database designers may go directly from (1) to (3), in which
case, the relational design can be really bad

5
Relational Database Design
 Relational database design ultimately produces a set of relations
 The implicit goals of the design activity are
 Information preservation
 Maintaining all concepts, including attribute types, entity
types, and relationship types, as well as
generalization/specialization relationships, which are
described using a model such as the EER model
 Minimum redundancy
 Minimizing redundant storage of the same information and
reducing the need for multiple updates to maintain
consistency across multiple copies of the same information in
response to real-world events that require making an update

6
Informal Design Guidelines
for Relational Databases
 Four informal guidelines that may be used as measures
to determine the quality of relation schema design
 Making sure that the semantics of the attributes is clear in the
schema
 Reducing the redundant information in tuples
 Reducing the NULL values in tuples
 Disallowing the possibility of generating spurious tuples

7
Semantics of the Relation
Attributes
 GUIDELINE 1 :
 Informally, each tuple in a relation should represent one entity or
relationship instance (applies to individual relations and their
attributes)
 Attributes of different entities (EMPLOYEEs, DEPARTMENTs,
PROJECTs) should not be mixed in the same relation
 If the relation corresponds to a mixture of multiple entities and
relationships, semantic ambiguities will result and the relation cannot
be easily explained
 Only foreign keys should be used to refer to other entities
 Entity and relationship attributes should be kept apart as much as
possible
 Bottom Line: Design a schema that can be explained easily relation
by relation. The semantics of attributes should be easy to interpret
8
Informal Design Guidelines for
Relational Databases

Figure : A simplified COMPANY relational database schema

9
Examples of Violating Guideline 1
 Violation of Guideline 1 by mixing attributes from distinct real-world
entities
 EMP_DEPT mixes attributes of employees and departments, and
EMP_PROJ mixes attributes of employees and projects and the
WORKS_ON relationship
 Hence, they violate the above measure of design quality

10
Sample Database State

11
Redundant Information in
Tuples and Update Anomalies
 One goal of schema design is to minimize the storage space used
by the base relations
 Grouping attributes into relation schemas has a significant effect on
storage space
 Storing natural joins of base relations leads to an additional problem
referred to as update anomalies
 Information is stored redundantly
 Wastes storage

 Causes problems with update anomalies

 Insertion anomalies

 Deletion anomalies

 Modification anomalies

12
Example of a Modification
Anomaly
 Consider the relation
EMP_PROJ(Emp#, Proj#, Ename, Pname, No_hours)

 Update Anomaly
Changing the name of project number P1 from “Billing” to
“Customer-Accounting” requires this update to be made in the tuples of all
100 employees working on project P1; missing any one of them leaves the data inconsistent

13
Example of an Insert Anomaly

 Consider the relation


EMP_PROJ(Emp#, Proj#, Ename, Pname, No_hours)

 Insert Anomaly
Cannot insert a project unless an employee is assigned to it
 Conversely
Cannot insert an employee unless he/she is assigned to a
project

14
Example of a Delete Anomaly

 Consider the relation


EMP_PROJ(Emp#, Proj#, Ename, Pname, No_hours)

 Delete Anomaly
When a project is deleted, it will result in deleting all the employees
who work on that project
Alternatively, if an employee is the sole employee on a project,
deleting that employee would result in deleting the corresponding
project

15
Update Anomalies
 Example 2
 Insertion anomaly: How to add a new major?
 Modification anomaly: What would happen if we change office of
Smith in the first tuple?
 Deletion anomaly: What would happen if Scott is deleted?

Student_Advisor
SID   Name   Major  GPA   Advisor  Office
2011  John   CS     3.4   Smith    3345
1235  Carl   CS     3.2   Smith    3345
1003  Ken    Math   3.5   Johnson  1120
1034  Bill   Math   2.5   Johnson  1120
2005  Mary   CS     2.9   Smith    3345
2078  Frank  Math   4.0   Johnson  1120
1922  Scott  Chem   3.45  Ford     2525
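
The modification and deletion questions for Student_Advisor can be tried out on the rows of the table. This is a minimal sketch: the helper name `offices_by_advisor` and the replacement office number 9999 are assumptions made for illustration only.

```python
# Rows of the Student_Advisor table: (SID, Name, Major, GPA, Advisor, Office)
student_advisor = [
    (2011, "John", "CS", 3.4, "Smith", 3345),
    (1235, "Carl", "CS", 3.2, "Smith", 3345),
    (1003, "Ken", "Math", 3.5, "Johnson", 1120),
    (1034, "Bill", "Math", 2.5, "Johnson", 1120),
    (2005, "Mary", "CS", 2.9, "Smith", 3345),
    (2078, "Frank", "Math", 4.0, "Johnson", 1120),
    (1922, "Scott", "Chem", 3.45, "Ford", 2525),
]


def offices_by_advisor(rows):
    """Map each advisor to the set of offices recorded for that advisor."""
    offices = {}
    for _sid, _name, _major, _gpa, advisor, office in rows:
        offices.setdefault(advisor, set()).add(office)
    return offices


# Modification anomaly: change Smith's office in the first tuple only
# (9999 is a made-up office number).
modified = [student_advisor[0][:5] + (9999,)] + student_advisor[1:]
inconsistent = {
    advisor: offices
    for advisor, offices in offices_by_advisor(modified).items()
    if len(offices) > 1
}

# Deletion anomaly: deleting Scott removes the only record of Ford's office.
after_delete = [row for row in student_advisor if row[1] != "Scott"]
ford_known = "Ford" in offices_by_advisor(after_delete)

print(inconsistent)
print(ford_known)
```

Smith ends up with two different offices after the partial update, and Ford disappears entirely once Scott is deleted, which is what storing advisor data redundantly in every student tuple risks.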

16
Guideline to Redundant
Information in Tuples and
Update Anomalies
 GUIDELINE 2
 Design a schema that does not suffer from the insertion, deletion
and update anomalies
 If there are any anomalies present, then note them so that
applications can be made to take them into account

17
Null Values in Tuples
 Fat Relations: A relation in which too many attributes are grouped.
 If many of the attributes do not apply to all tuples in the relation, we
end up with many NULLs in those tuples
 This can waste space at the storage level and may also lead to problems
with understanding the meaning of the attributes and with specifying
JOIN operations at the logical level
 Another problem with NULLs - How to account for them when
aggregate operations such as COUNT or SUM are applied

 Reasons for nulls


 Attribute not applicable or invalid
 Attribute value unknown (may exist)
 Value known to exist, but unavailable
18
Null Values in Tuples

 GUIDELINE 3
 Relations should be designed such that their tuples will have as
few NULL values as possible
 Attributes that are NULL frequently could be placed in separate
relations (with the primary key)

19
Spurious Tuples
 Bad designs for a relational database may result in erroneous results for
certain join operations
 Additional tuples that were not in the original relation are called spurious
tuples because they represent spurious or wrong information that is not
valid
 Spurious tuples are created when two tables are joined on attributes that are
neither primary keys nor foreign keys

 GUIDELINE 4
 Design relation schemas so that they can be joined with equality conditions on
attributes that are appropriately related (primary key, foreign key) pairs in a
way that guarantees that no spurious tuples are generated
 Avoid relations that contain matching attributes that are not (foreign key,
primary key) combinations because joining on such attributes may produce
spurious tuples
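
A small sketch of how spurious tuples arise. The relation EMP_PROJ(Ename, Pname, Plocation) and its three tuples are hypothetical data invented for this illustration; the point is only that Plocation is neither a primary key nor a foreign key in either projection.

```python
# Hypothetical EMP_PROJ(Ename, Pname, Plocation) tuples, invented for illustration.
original = {
    ("Smith", "ProductX", "Houston"),
    ("Wong", "ProductY", "Houston"),
    ("Zelaya", "ProductZ", "Stafford"),
}

# Decompose by projection. Plocation is not a key of either projection.
emp_locs = {(ename, ploc) for ename, _pname, ploc in original}
proj_locs = {(pname, ploc) for _ename, pname, ploc in original}

# Natural join of the two projections on Plocation.
joined = {
    (ename, pname, ploc)
    for ename, ploc in emp_locs
    for pname, ploc2 in proj_locs
    if ploc == ploc2
}

# Tuples that the join produces but that were never in the original relation.
spurious = joined - original
print(sorted(spurious))
```

The join returns every original tuple plus two extra ones pairing Smith with ProductY and Wong with ProductX, so the decomposition loses the information about who works on which project.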

20
Spurious Tuples – Example 1

21
Spurious Tuples – Example 2

22
The Evils of Redundancy
 Redundancy is at the root of several problems associated with
relational schemas
 redundant storage, update anomalies

 Integrity constraints, in particular functional dependencies, can be


used to identify schemas with such problems and to suggest
refinements

 Main refinement technique: decomposition


 Decomposition should be used judiciously:
 Is there a reason to decompose a relation?
 What problems does decomposition cause?

23
Functional Dependencies
 A formal tool for analysis of relational schemas that enables us to
detect and describe some of the above-mentioned problems in
precise terms
 Most important concept in relational schema design theory is that of
a functional dependency

 Functional dependencies (FDs) are used to specify formal


measures of the "goodness" of relational designs
 FDs and keys are used to define normal forms for relations

 FDs are constraints that are derived from the meaning and
interrelationships of the data attributes

24
Functional Dependencies
 Relational Schema R = {A1, A2, ... , An}
 Definition
 A functional dependency(FD or f.d), denoted by X → Y, between two
sets of attributes X and Y that are subsets of R specifies a constraint on
the possible tuples that can form a relation state r of R
 The constraint is that, for any two tuples t1 and t2 in r that have t1[X] =
t2[X], they must also have t1[Y] = t2[Y]
 X → Y in R specifies a constraint on all relation instances r(R)
 Example : Consider r(A,B ) with the following instance of r

 On this instance, A -> B does NOT hold, but B -> A does hold
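
The definition translates directly into a check on a single instance. The sketch below is an illustration only: the function name `fd_holds` and the three sample tuples are assumptions (the instance shown on the original slide did not survive extraction), chosen so that A -> B fails while B -> A holds. Such a check can refute an FD on a schema but cannot prove it, since an FD must hold on every legal instance.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if lhs -> rhs holds in this instance.

    rows is a list of dicts; lhs and rhs are lists of attribute names.
    """
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # Two tuples agreeing on lhs must also agree on rhs.
        if seen.setdefault(x, y) != y:
            return False
    return True


# Assumed sample instance r(A, B); not taken from the slides.
r = [
    {"A": 1, "B": 4},
    {"A": 1, "B": 5},
    {"A": 3, "B": 7},
]

print(fd_holds(r, ["A"], ["B"]))  # False: A = 1 appears with B = 4 and B = 5
print(fd_holds(r, ["B"], ["A"]))  # True: every B value has exactly one A value
```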
25
Functional Dependencies
 FD means that the values of the Y component of a tuple in r depend
on, or are determined by, the values of the X component
 Alternatively, the values of the X component (left-hand side) of a
tuple uniquely (or functionally) determine the values of the Y
component (right-hand side)
 We also say that there is a functional dependency from X to Y, or
that Y is functionally dependent on X

 In X->Y,
 X – Determinant Set

 Y – Dependent attribute

 FDs are derived from the real-world constraints on the attributes


26
Functional Dependencies
 For any relation R, attribute Y is functionally dependent on
attribute X (usually the PK), if for every valid instance of X, that
value of X uniquely determines the value of Y (X->Y)
 Examples
 SSN -> ENAME
 PNUMBER -> {PNAME, PLOCATION}
 {SSN, PNUMBER} -> HOURS

 The constraint must hold on every relation instance r(R)


 If K is a key of R, then K functionally determines all attributes in R
(since we never have two distinct tuples with t1[K]=t2[K])
 If X→Y in R, this does not say whether or not Y→X in R

27
Functional Dependencies
 Consider the following table of data r(R) of the relation schema R(A,
B, C,D, E)

 What kind of dependencies can we observe among the attributes in


Table R?

28
Functional Dependencies

 Since the values of A are unique (a1, a2, a3, etc.), it follows from the
FD definition that: A → B, A → C, A → D, A → E
 It also follows that A →BC (or any other subset of ABCDE)
 This can be summarized as A →BCDE
 From our understanding of primary keys, A is a primary key
 Since the values of E are always the same (all e1), it follows that:
A → E, B → E, C → E, D → E

29
Functional Dependencies

 Other observations
 Combinations of BC are unique, therefore BC → ADE
 Combinations of BD are unique, therefore BD → ACE
 If C values match, so do D values.
 Therefore, C → D

 However, D values don’t determine C values

 So C determines D, but D does not determine C

30
Functional Dependencies
 Consider a relation R (A, B, C, D) with its extension

 FDs of R
 B → C; C → B; {A, B} → C; {A, B} → D; and {C, D} → B

 FDs do not hold


 A → B (tuples 1 and 2 violate this constraint)
 B → A (tuples 2 and 3 violate this constraint)
 D→C (tuples 3 and 4 violate it)
31
Functional Dependencies
 Consider a relation R (A, B, C, D) with the following instance. Which
of the following functional dependencies are satisfied by this
relation?
 (a) A → B
 (b) A → CD
 (c) AB → CD
 (d) C → D
 (e) B → A
 (f) BD → AC
 (g) AD → BC
 (h) D → B
 (i) D → C
 (j) C → A
 Recall: X → Y with X, Y ⊆ R holds if t1[X] = t2[X] ⇒ t1[Y] = t2[Y]
32
Inference Rules for FDs
 Given a set of FDs F, we can infer additional FDs that hold whenever
the FDs in F hold
 Armstrong's inference rules
 Armstrong’s axioms are a set of inference rules used to infer all
the functional dependencies on a relational database
 They were developed by William W. Armstrong

 IR1. (Reflexive) If X ⊇ Y, then X -> Y


 IR2. (Augmentation) If X -> Y, then XZ -> YZ
 (Notation: XZ stands for X U Z)
 IR3. (Transitive) If X -> Y and Y -> Z, then X -> Z

 IR1, IR2, IR3 form a sound and complete set of inference rules
 These rules hold, and all other rules that hold can be deduced from
them

34
Inference Rules for FDs
 Sound
 Given a set of FDs specified on a relation schema R, any
dependency that we can infer from F (set of functional
dependencies that are specified on relation schema R) by using
IR1 through IR3 holds in every relation state r of R that satisfies the
dependencies in F
 i.e., all dependencies generated by the axioms are correct

 Complete
 Using IR1 through IR3 repeatedly to infer dependencies until no
more dependencies can be inferred results in the complete set of
all possible dependencies that can be inferred from F
 i.e., repeatedly applying these rules generates all correct
dependencies
35
Inference Rules for FDs
 Some additional inference rules that are useful

 IR4. (Decomposition or Projective) If X -> YZ, then X -> Y and


X -> Z
 IR5. (Union or additive) If X -> Y and X -> Z, then X -> YZ
 IR6. (Pseudotransitivity) If X -> Y and WY -> Z, then WX -> Z

 Note: The last three inference rules (IR4, IR5, IR6) as well as any
other inference rules, can be deduced from IR1, IR2, and IR3
(completeness property)

36
Inference Rules for FDs
 IR1. (Reflexive) If X ⊇ Y, then X -> Y
 If X is a set of attributes and Y is the subset of X, then X functionally
determines Y
 The reflexive rule (IR1) states that a set of attributes always determines itself or
any of its subsets, which is obvious. Because IR1 generates dependencies that
are always true, such dependencies are called trivial

 Formally, a functional dependency X→Y is trivial if X ⊇ Y; otherwise, it is


nontrivial
 Note : The reflexive rule can also be stated as X → X; that is, any set of
attributes functionally determines itself

 Example
 {Lastname} ⊆ {Firstname, Lastname}, so
{Firstname, Lastname} → Lastname

37
Proof of IR1
 Reflexive : If X ⊇ Y, then X -> Y

 Suppose that X ⊇ Y and that two tuples t1 and t2 exist in some


relation instance r of R such that t1 [X] = t2 [X]
 Then t1[Y] = t2[Y] because X ⊇ Y; hence, X→Y must hold in r

 Example
 {Lastname} ⊆ {Firstname, Lastname}, so
{Firstname, Lastname} → Lastname

38
Inference Rules for FDs
 IR2. (Augmentation) If X -> Y, then XZ -> YZ
 If X determines Y and Z is any attribute set, then XZ determines
YZ


 Adding the same set of attributes to both the left- and right-hand
sides of a dependency results in another valid dependency
 Example
 Regno → Firstname, Lastname, then
Regno, address → Firstname, Lastname, address

 Note
 The augmentation rule can also be stated as: if X → Y, then XZ → Y; that is,
augmenting the left-hand side attributes of an FD produces another valid FD

39
Proof of IR2
 Augmentation : If X -> Y, then XZ -> YZ
 Assume that X → Y holds in a relation instance r of R but that
XZ → YZ does not hold
 Then there must exist two tuples t1 and t2 in r such that
 (1) t1[X] = t2[X]
 (2) t1[Y] = t2[Y]
 (3) t1[XZ] = t2[XZ]
 (4) t1[YZ] ≠ t2[YZ]

 This is not possible because


(5) t1[Z] = t2[Z] from (1) and (3),
(6) t1[YZ] = t2[YZ] from (2) and (5) contradicting (4)

40
Inference Rules for FDs
 IR3. (Transitive) If X -> Y and Y -> Z, then X -> Z
 IR3 says functional dependencies are transitive
 if X determines Y and Y determines Z, then X also determines Z

 Example
 Emp_id → Dept_id and
Dept_id → Dept_Name, then
Emp_id → Dept_Name

41
Proof of IR3
 Transitive: If X -> Y and Y -> Z, then X -> Z

 Assume that
(1) X → Y and
(2) Y → Z both hold in a relation r
 Then for any two tuples t1 and t2 in r such that t1 [X] = t2 [X], we
must have
(3) t1 [Y] = t2 [Y], from assumption (1);
 Hence we must also have
(4) t1 [Z] = t2 [Z] from (3) and assumption (2)

 Thus, X→Z must hold in r

42
Proof of IR4
 Decomposition or Projective : If X -> YZ, then X -> Y and X -> Z

 1. X→YZ (given)
 2. YZ→Y (using IR1 and knowing that YZ ⊇ Y)
 3. X→Y (using IR3 on 1 and 2)
 4. X→Z follows in the same way, using YZ→Z in step 2

 Example
Rollno → Firstname, Lastname then
Rollno → Firstname and Rollno → Lastname

43
Proof of IR5
 Union or additive : If X -> Y and X -> Z, then X -> YZ

 1. X→Y (given)
 2. X→Z (given)
 3. X→XY (using IR2 on 1 by augmenting with X; notice that XX = X)
 4. XY→YZ (using IR2 on 2 by augmenting with Y)
 5. X→YZ (using IR3 on 3 and 4)

 Example
Rollno → name and Rollno → address then
Rollno → name, address

44
Proof of IR6
 Pseudotransitivity : If X -> Y and WY -> Z, then WX -> Z

 1. X→Y (given)
 2. WY→Z (given)
 3. WX→WY (using IR2 on 1 by augmenting with W)
 4. WX→Z (using IR3 on 3 and 2)

 Example
Rollno → name and name,marks →percentage, then
Rollno,marks → percentage

45
Functional Dependencies
 Suppose we are given a relation scheme R = (A,B,C,G,H,I), and the
set of functional dependencies, F provided below:

A→B
A→C
CG → H
CG → I
B→H

 Is A → H logically implied by F? ==>Yes (by using IR3)


 Is AG → I logically implied by F? ==> Yes (by using IR2 and IR3)
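
The two answers can be spelled out as short derivations from Armstrong's rules. This is a worked sketch of one possible derivation for each; other derivations exist.

```latex
\begin{align*}
&\textbf{Derivation of } A \to H \\
&\quad A \to B   && \text{given} \\
&\quad B \to H   && \text{given} \\
&\quad A \to H   && \text{IR3 (transitivity)} \\[6pt]
&\textbf{Derivation of } AG \to I \\
&\quad A \to C   && \text{given} \\
&\quad AG \to CG && \text{IR2 (augmentation with } G\text{)} \\
&\quad CG \to I  && \text{given} \\
&\quad AG \to I  && \text{IR3 (transitivity)}
\end{align*}
```

The second derivation is an instance of pseudotransitivity (IR6) with X = A, Y = C, W = G, and Z = I.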

46
Closure
 Closure of a set F of FDs is the set F+ of all FDs that can be
inferred from F

 Closure of a set of attributes X with respect to F is the set X+ of all


attributes that are functionally determined by X

 X+ can be calculated by repeatedly applying IR1, IR2, IR3 using the


FDs in F

 Definition
 For each such set of attributes X, we determine the set X+ of

attributes that are functionally determined by X based on F; X+ is


called the closure of X under F

47
Closure of a set F of FDs
(F+)
 Assume that there are 4 attributes A, B, C, D in a relation R and that
F = {A → B, B → C}
 Then, F+ includes all the following FDs (listed with single-attribute right-hand sides)
 A → A, A → B, A → C, B → B, B → C, C → C, D → D, AB → A, AB
→ B, AB → C, AC → A, AC → B, AC → C, AD → A, AD → B, AD →
C, AD → D, BC → B, BC → C, BD → B, BD → C, BD → D, CD →
C, CD → D, ABC → A, ABC → B, ABC → C, ABD → A, ABD → B,
ABD → C, ABD → D, ACD → A, ACD → B, ACD → C, ACD → D,
BCD → B, BCD → C, BCD → D, ABCD → A,
ABCD → B, ABCD → C, ABCD → D

Note : Computing the closure of a set of FDs (F+) can be


expensive. (Size of closure is exponential in # attributes!)
48
Closure of an Attribute X

Computing the closure of a set of FDs (F+) can be expensive
(Size of closure is exponential in # attributes!)

 Typically, we just want to check if a given FD X -> Y is in the closure
of a set of FDs F - an efficient check
 Compute the attribute closure of X (denoted X+) wrt F:
 Set of all attributes A such that X -> A is in F+
 There is a linear time algorithm to compute this
 Start with X+ = X and repeat until X+ stops changing: for each FD Y -> Z in F, if X+ is a superset of Y then add Z to X+

49
Algorithm to Calculate X+
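
The algorithm figure for this slide did not survive extraction. A minimal sketch of the usual fixed-point computation of X+ follows, assuming single-character attribute names and FDs written as (lhs, rhs) pairs of strings; the name `closure` and this representation are assumptions, not taken from the slides. The loop is the simple repeated-scan version, not the linear-time algorithm.

```python
def closure(attrs, fds):
    """Return the attribute closure X+ of attrs under fds.

    attrs is an iterable of single-character attribute names;
    fds is an iterable of (lhs, rhs) pairs, e.g. ("CD", "E") for CD -> E.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If X+ already contains the left side, absorb the right side.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


F = [("A", "B"), ("B", "C"), ("CD", "E")]
print(sorted(closure("A", F)))   # ['A', 'B', 'C']
print(sorted(closure("AD", F)))  # ['A', 'B', 'C', 'D', 'E']
```

To test whether X -> Y follows from F, compute `closure(X, F)` and check that it contains every attribute of Y.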

50
Example 1- Closure of
Attributes
 Consider the following relation

 Closure sets with respect to F

51
Example 2- Closure of
Attributes
 Does F = {A ->B, B->C, CD -> E } imply A ->E?
 i.e, Is A -> E in the closure F+? Equivalently, is E in A+?

 Compute A+
 Initialize A+ to {A} : A+ = {A}
 From A -> B, we can add B to A+ : A+ = {A, B}
 From B -> C, we can add C to A+ : A+ = {A, B, C}
 We can not add any more attributes, and A+ does not contain
E therefore A -> E does not hold

52
Example 3 -Closure of
Attributes
 For example, consider the following relation schema about classes
held at a university in a given academic year
CLASS ( Classid, Course#, Instr_name, Credit_hrs, Text, Publisher,
Classroom, Capacity)
 Let F, the set of functional dependencies for the above relation
include the following FDs:

53
Example 3 -Closure of
Attributes
 The closures of attributes or sets of attributes for some example sets

 Note that each closure above has an interpretation that is revealing


about the attribute(s) on the left-hand-side.
 The closure { Classid }+ is the entire relation CLASS, indicating that
all attributes of the relation can be determined from Classid and
hence it is a key
54
In-class Exercise
 R = ABCDE, F = {A -> BE, C -> BE, B -> D}
 Compute A+, B+, C+, D+, E+

 Answer
 A+ = ABDE, B+ = BD, C+ = BCDE, D+ = D, E+ = E

55
Use of X+ 1: Checking if
X → Y
 Steps of checking if an FD X → Y is in the closure of a set of FDs F
 Compute X+ wrt F
 Check if Y is in X+
 Y ⊆ X+ ⇔ X → Y is in F+
 Does F = {A → B, B → C, CD → E} imply A → E?
 i.e., Is A → E in F+? Equivalently, is E in A+?
 A+ (w.r.t. F) = {A, B, C}
 E is not in A+, thus, A → E is not in F+

56
In-Class Exercise
 Let R(ABCDEFGH) satisfy the following functional dependencies F: {A->B,
CH->A, B->E, BD->C, EG->H, DE->F}

 Which of the following FDs is also guaranteed to be satisfied by R?


 BFG ->AE

 ACG ->DH

 CEG-> AB

 Hint: Compute the closure of the LHS of each FD that you get
as a choice. If the RHS of the candidate FD is contained in
the closure, then the candidate follows from the given FDs,
otherwise not

57
In-Class Exercise
 Solution
 FDs: {A->B, CH->A, B->E, BD->C, EG->H, DE->F}

 BFG ->AE ???


 Incorrect: BFG+ = BFGEH, which includes E, but not A
 ACG ->DH ???
 Incorrect: ACG+ = ACGBEH, which includes H (via A->B, B->E, EG->H) but not D
 CEG->AB ???
 Correct: CEG+ = CEGHAB, which contains AB

58
Use of X+ 2: Finding a key K
 Let R be the set of attributes for a schema and F be its functional
dependency set

 Algorithm
Set K:=R
For each attribute A in K
Compute (K-A)+ w.r.t. F
If (K-A)+ contains all the attributes in R then set K:= K-A

 This algorithm returns only one key out of the possible candidate
keys for R
 The key returned depends on the order in which attributes are
removed from R
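
The attribute-removal algorithm can be sketched as follows. Single-character attribute names, (lhs, rhs) string pairs for FDs, and the names `closure` and `find_key` are assumptions made for this illustration. The FD set used in the demonstration is the one from the key-finding exercise R(A,B,C,D,E,F,G).

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, given as (lhs, rhs) string pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def find_key(attributes, fds):
    """Return one candidate key, trying to drop attributes in the given order."""
    key = list(attributes)
    for a in attributes:
        candidate = [b for b in key if b != a]
        # Drop a only if the remaining attributes still determine all of R.
        if closure(candidate, fds) == set(attributes):
            key = candidate
    return set(key)


F = [("AB", "C"), ("CD", "E"), ("EF", "G"), ("FG", "E"), ("DE", "C"), ("BC", "A")]
print(sorted(find_key("ABCDEFG", F)))  # ['B', 'D', 'F', 'G']
print(sorted(find_key("GFEDCBA", F)))  # ['A', 'B', 'D', 'F']
```

Processing the attributes in a different order yields a different candidate key, which illustrates the remark that the result depends on the removal order.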

59
Use of X+ 2: Finding a key K
 Which of the following could be a key for R(A,B,C,D,E,F,G) with
functional dependencies {AB ->C, CD->E, EF->G, FG->E, DE->C,
and BC ->A}?

 1. BDF
 2. ACDF
 3. ABDFG
 4. BDFG

60
Use of X+ 2: Finding a key K
 Solution
 R(A,B,C,D,E,F,G)
 FDs : {AB->C, CD->E, EF->G, FG->E, DE->C, and BC->A}
 1. BDF ???
 No. BDF + = BDF
 2. ACDF ???
 No. ACDF + = ACDFEG (The closure does not include B)
 3. ABDFG ???
 No. This choice is a super key, but it has proper subsets that are also keys
(e.g. BDFG + = BDFGECA)
 4. BDFG ???
 BDFG + = ABCDEFG
 Check if any subset of BDFG is a key:
 Since B, D, F never appear on the RHS of the FDs, they must form part of the key
 BDF + = BDF, so BDF is not a key. So, BDFG is a minimal key, hence a candidate key

61
Finding Keys using FDs
 Tricks for finding the key
 If an attribute never appears on the RHS of any FD, it must be
part of the key
 If an attribute never appears on the LHS of any FD, but appears
on the RHS of some FD, it must not be part of any key

 Example
 Consider R = {A, B, C, D, E, F, G, H} with a set of FDs
F = {CD →A, EC →H, GHB →AB, C →D, EG →A, H →B, BE →CD,
EC →B}
 Find all the candidate keys of R

62
Finding Keys using FDs
 R = {A, B, C, D, E, F, G, H}
 F = {CD →A, EC →H, GHB →AB, C →D, EG →A, H →B, BE →CD,
EC →B}
 Find all the candidate keys of R
 Note
 EFG never appear on RHS of any FD. So, EFG must be part of ANY key
of R
 A never appears on LHS of any FD, but appears on RHS of some FD.
So, A is not part of ANY key of R
 We now see if EFG is itself a key…
 EFG+ = EFGA ≠ R; So, EFG alone is not key

63
Finding Keys using FDs
 R = {A, B, C, D, E, F, G, H}
 F = {CD →A, EC →H, GHB →AB, C →D, EG →A, H →B, BE →CD,
EC →B}
 Checking by adding single attribute with EFG (except A):
 BEFG+ = ABCDEFGH = R; it’s a key [BE →CD, EG →A, EC →H]
 CEFG+ = ABCDEFGH = R; it’s a key [EG →A, EC →H, H →B, BE
→CD]
 DEFG+ = ADEFG ≠ R; it’s not a key [EG →A]
 EFGH+ = ABCDEFGH = R; it’s a key [EG →A, H →B, BE →CD]

 If we add any further attribute(s) to one of these keys, the result is a superkey, not a candidate key
 Therefore, we can stop here searching for candidate key(s)
 Therefore, candidate keys are: {BEFG, CEFG, EFGH}
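
The search above can be cross-checked by brute force: enumerate attribute subsets by increasing size and keep those whose closure is R and that contain no smaller key. This is a sketch under the assumption of single-character attribute names; `closure` and `candidate_keys` are illustrative names, and the enumeration is exponential in the number of attributes, so it only suits small schemas.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, given as (lhs, rhs) string pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(attributes, fds):
    """Return all candidate keys as a list of sets, smallest first."""
    attrs = sorted(attributes)
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(attrs, size):
            subset = set(combo)
            if any(key <= subset for key in keys):
                continue  # contains a smaller key: a superkey, not minimal
            if closure(subset, fds) == set(attrs):
                keys.append(subset)
    return keys


F = [("CD", "A"), ("EC", "H"), ("GHB", "AB"), ("C", "D"),
     ("EG", "A"), ("H", "B"), ("BE", "CD"), ("EC", "B")]
keys = candidate_keys("ABCDEFGH", F)
print(sorted("".join(sorted(k)) for k in keys))  # ['BEFG', 'CEFG', 'EFGH']
```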
64
In-Class Exercise
 Consider R = {A, B, C, D, E, F, G} with a set of FDs
F = {ABC →DE, AB →D, DE →ABCF, E →C}
 Find all the candidate keys of R

 Note
 G never appears on RHS of any FD. So, G must be part of ANY key of R
 F never appears on LHS of any FD, but appears on RHS of some FD
So, F is not part of ANY key of R
 G+ = G ≠ R So, G alone is not a key!

65
In-Class Exercise - Solution
 Try to find keys by adding more attributes (except F) to G
 Add LHS of FDs that have only one attribute (E in E →C):
 GE+ = GEC ≠ R
 Add LHS of FDs that have two attributes (AB in AB →D and DE in DE
→ABCF):
 GAB+ = GABD
 GDE+ = ABCDEFG = R; [DE →ABCF] It’s a key!
 Add LHS of FDs that have three attributes (ABC in ABC →DE), but not
taking super set of GDE:
 GABC+ = ABCDEFG = R; [ABC →DE, DE →ABCF] It’s a key!
 GABE+ = ABCDEFG = R; [AB →D, DE →ABCF] It’s a key!
 If we add any further attribute(s) to one of these keys, the result is a superkey. Therefore,
we can stop here
 The candidate key(s) are {GDE, GABC, GABE}

66
Use of X+ 3: Compute F+
 Given a set of functional dependencies F, we define F+ to be the set
of all functional dependencies that can be inferred from F

1. F+ = {};
2. For each attribute set A in R, compute A+
3. For each FD X → Y implied by A+, add X → Y to F+
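
The three steps can be sketched as below. To keep the output small, this version records only FDs with a single attribute on the right-hand side; that restriction, the single-character attribute names, and the names `closure` and `fd_closure` are assumptions of this sketch. The demonstration uses R = ABCD with F = {A → B, B → C}.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, given as (lhs, rhs) string pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def fd_closure(attributes, fds):
    """Return all FDs X -> A in F+ as (X, A) pairs, A a single attribute."""
    attrs = sorted(attributes)
    result = set()
    for size in range(1, len(attrs) + 1):
        for combo in combinations(attrs, size):
            # Every attribute in X+ gives one FD X -> A.
            for a in closure(combo, fds):
                result.add(("".join(combo), a))
    return result


F = [("A", "B"), ("B", "C")]
f_plus = fd_closure("ABCD", F)
print(len(f_plus))  # 42
```

Even this tiny schema yields 42 single-attribute FDs, which shows why testing membership through an attribute closure is preferred over materializing F+.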

67
Equivalence of Sets of
Functional Dependencies
 Let E and F be two sets of functional dependencies
• F covers E if E ⊆ F+
• E and F are equivalent if E+ = F+
• E+ = F+ iff E covers F and F covers E

 Note: Equivalence means that every FD in E can be


inferred from F, and every FD in F can be inferred from E

 Determine whether F covers E


 For each FD X → Y in E, calculate X+ with respect to F,
then check whether Y ⊆ X+
68
Equivalence of Sets of
Functional Dependencies
 Example : Check whether or not F is equivalent to G
 F = {A → C, AC → D, E → AD, E → H}
 G = {A → CD, E → AH}

 Prove that F is covered by G
 {A}+ = {A, C, D} (wrt G). Since {C} ⊆ {A}+, A → C can be inferred from G
 {AC}+ = {A, C, D} (wrt G). Since {D} ⊆ {AC}+, AC → D is covered by G
 {E}+ = {E, A, H, C, D} (wrt G). Since {A, D} ⊆ {E}+, E → AD is covered by G
 Since {H} ⊆ {E}+, E → H is covered by G

69
Equivalence of Sets of
Functional Dependencies
 F = {A → C, AC → D, E → AD, E → H}
 G = {A → CD, E → AH}

 Prove that G is covered by F
 {A}+ = {A, C, D} (wrt F). Since {C, D} ⊆ {A}+, A → CD is covered by F
 {E}+ = {E, A, D, H, C} (wrt F). Since {A, H} ⊆ {E}+, E → AH is covered by F

 Since F covers G and G covers F, F and G are equivalent

70
Equivalence of Sets of
Functional Dependencies –
In-class Exercise (Example 2)
 Let F1 = {A → C, AC → D, E → AD} and F2 = {A → CD, E → AH}. Are
F1 and F2 equivalent?

 Check if F1 ⊆ F2+:
 A+ = ACD wrt F2

 AC+ = ACD wrt F2

 E+ = ACDEH (from E → AH, A → CD) wrt F2

 i.e., if you know E then you know ACDEH according to the FDs of F2. Hence,
the functional dependency E → AD of F1 is in F2+
 We are able to derive all the FDs of F1 from F2
Hence, F1 ⊆ F2+ is TRUE

71
Equivalence of Sets of
Functional Dependencies –
In-class Exercise
 Check if F2 ⊆ F1+:
 A+ = ACD wrt F1. So, A → CD of F2 is in F1+
 E+ = ACDE wrt F1. So, E → AH of F2 is not in F1+ (H is missing)

 Hence, F2 ⊆ F1+ is FALSE

 From the above, it is proved that F1 and F2 are not equivalent
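
The cover checks worked through above can be sketched in a few lines. Single-character attribute names, (lhs, rhs) string pairs, and the function names `closure`, `covers`, and `equivalent` are assumptions of this sketch.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, given as (lhs, rhs) string pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def covers(f, e):
    """True if every FD in e can be inferred from f (f covers e)."""
    return all(set(rhs) <= closure(lhs, f) for lhs, rhs in e)


def equivalent(e, f):
    """True if e and f have the same closure."""
    return covers(e, f) and covers(f, e)


F = [("A", "C"), ("AC", "D"), ("E", "AD"), ("E", "H")]
G = [("A", "CD"), ("E", "AH")]
F1 = [("A", "C"), ("AC", "D"), ("E", "AD")]
F2 = [("A", "CD"), ("E", "AH")]

print(equivalent(F, G))    # True
print(covers(F2, F1))      # True: F1 is contained in F2+
print(covers(F1, F2))      # False: E -> AH needs H, which F1 never derives
print(equivalent(F1, F2))  # False
```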

72
Minimal Sets of Functional
Dependencies
 A set of functional dependencies F is minimal if it satisfies the
following three conditions:

 1. Every FD in F has a single attribute for its right-hand side


(This is a standard form, not a requirement)
 2. We cannot replace any dependency X → A in F with a
dependency Y → A, where Y is a proper subset of X, and still
have a set of dependencies that is equivalent to F
 3. We cannot remove any dependency from F and still have a
set of dependencies that is equivalent to F

 Note: There can be several minimal covers for a set of functional


dependencies

73
Minimal Sets of FDs
 We can think of a minimal set of dependencies as being a set of
dependencies in a standard or canonical form and with no redundancies
 Definition
 A minimal cover of a set of functional dependencies E is a minimal set of
dependencies (in the standard canonical form and without redundancy)
that is equivalent to E. We can always find at least one minimal cover F
for any set of dependencies E

74
Minimal Cover
 Example 1
 Find the minimal cover of the set of functional dependencies given
below.
F= {A → C, AB → C, C → DI, CD → I, EC → AB, EI → C}
 Steps of minimal cover
 1. Right Hand Side (RHS) of all FDs should be single attribute -
every dependency in a canonical form with a single attribute on the right-
hand side
 2. Remove extraneous attributes
 3. Eliminate redundant functional dependencies
 Steps 2 and 3 ensure that there are no redundancies in the dependencies either
by having redundant attributes on the left-hand side of a dependency (Step 2) or
by having a dependency that can be inferred from the remaining FDs in F (Step
3).
75
Minimal Cover
 F = {A → C, AB → C, C → DI, CD → I, EC → AB, EI → C}
 Step 1:
 Right Hand Side (RHS) of all FDs should be single attribute. So we write F as F1,
as follows:
 F1 = {A → C, AB → C, C → D, C → I, CD → I, EC → A, EC → B, EI → C}

 Step 2 : Remove extraneous attributes


 Extraneous attribute is a redundant attribute on the LHS of the functional
dependency
 In F1, the FDs AB → C, CD → I, EC → A, EC → B, and EI → C have more than
one attribute in the LHS. Hence, we check whether any of these LHS attributes is
extraneous
 To check, we compute the closure of each attribute on the LHS

76
Minimal Cover
 F1 = {A → C, AB → C, C → D, C → I, CD → I, EC → A, EC → B, EI → C}
 (i) A+ = ACDI
 (ii) B+ = B
 (iii) C+ = CDI
 (iv) D+ = D
 (v) E+ = E
 (vi) I+ = I
 From (i), the closure of A includes the attribute C. So, B is extraneous in AB
→ C, and B can be removed
 From (iii), the closure of C includes the attribute I. So, D is extraneous in
CD → I, and D can be removed
 No more extraneous attributes are found

77
Minimal Cover
 We write F1 as F2 after removing extraneous attributes from F1
 F2 = {A → C, C → D, C → I, EC → A, EC → B, EI → C}

 Step 3: Eliminate redundant functional dependency


 None of the FDs in F2 is redundant. Hence, F2 is minimal cover

78
Minimal Cover
 Example 2 : Find the minimal cover of the set of functional dependencies
given; {A → BC, B → C, AB → D}
 Step 1. Right Hand Side (RHS) of all FDs should have single attribute. So
we write F as F1, as follows;
 F1 = {A → B, A → C, B → C, AB → D}
 Step 2: Remove extraneous attributes
 B+ = BC
 A+ = ABCD
 A+ contains D, so B is extraneous in AB → D, i.e., we can determine D without B on the LHS. Now, we can write
the new set of FDs, F2 as follows;
 F2 = {A → B, A → C, B → C, A → D}
 Step 3: Eliminate redundant functional dependency
 If A → B, and B → C, then A → C is true (according to transitive rule). Hence, the FD A →
C is redundant. We can eliminate this and we get final set of FDs F3 as follows:

F3 = {A → B, B → C, A → D} - minimal cover of F

79
Minimal Cover – In-Class
Exercise
 F= {A → BC, B → C, A → B, AB → C}

 Step 1: Put FDs in standard form

 M = {A → B, A → C, B → C, AB → C}

 Step 2: Minimize left side of each FD

 B is extraneous in AB → C because A → C already holds; the reduced FD duplicates A → C
 M = {A → B, A → C, B → C}

 Step 3: Delete redundant FDs

 A → C follows from A → B and B → C, so it is redundant
 M = {A → B, B → C} – Minimal Cover of F

 Note : Closure of M = Closure of F
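
The three steps can be sketched in Python as follows. This is a minimal illustration under assumptions: the names closure, minimal_cover, and show, and the encoding of FDs as (lhs, rhs) string pairs, are hypothetical. A set of FDs may have several minimal covers; the one returned depends on the order in which attributes and FDs are examined.

```python
def closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def minimal_cover(fds):
    # Step 1: standard form, one attribute on each RHS
    cover = []
    for lhs, rhs in fds:
        for a in rhs:
            fd = (frozenset(lhs), a)
            if fd not in cover:
                cover.append(fd)
    # Step 2: remove extraneous attributes from each LHS
    for i in range(len(cover)):
        rhs = cover[i][1]
        for a in sorted(cover[i][0]):
            reduced = cover[i][0] - {a}
            if reduced and rhs in closure(reduced, cover):
                cover[i] = (reduced, rhs)
    cover = list(dict.fromkeys(cover))  # drop duplicates, keep order
    # Step 3: remove FDs implied by the remaining ones
    for fd in list(cover):
        rest = [g for g in cover if g != fd]
        if fd[1] in closure(fd[0], rest):
            cover = rest
    return cover


def show(cover):
    """Render a cover as a set of strings such as 'A->B'."""
    return {"".join(sorted(lhs)) + "->" + rhs for lhs, rhs in cover}


F = [("A", "BC"), ("B", "C"), ("A", "B"), ("AB", "C")]
print(sorted(show(minimal_cover(F))))
```

Applied to the in-class exercise, the sketch yields the same two FDs as the slide, A → B and B → C.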


80
Decompositions
 Definition
 The decomposition of a schema R = A1…An is its replacement by a
collection DR = {R1, R2, …, Rm} of subsets of R such that R =
R1 ∪ R2 ∪ … ∪ Rm
 Note: the schemas Ri do not have to be disjoint!

 Example 1
 Assume the schema R=ABCD. The following are possible
decompositions of R.
 D1 = {AB, CD}
 D2 = {AB, ACD}
 D3 = {A, BCD}
 D4 = {AB, BC, CD, AD}
81
Properties of Relational
Decompositions
 The relational database schema must possess certain additional
properties to ensure a good design
 Two of these properties
 The dependency preservation property and
 The nonadditive (or lossless) join property

 Relation Decomposition
 Universal Relation Schema
 A relation schema R = {A1, A2, …, An} that includes all the attributes
of the database.
 Universal relation assumption : Every attribute name is unique

82
Properties of Relational
Decompositions
 The set F of functional dependencies that should hold on the attributes
of R is specified by the database designers and is made available to
the design algorithms
 Using F, the algorithms decompose the universal relation schema R
into a set of relation schemas D = {R1, R2, ..., Rm} that will become the
relational database schema; D is called a decomposition of R
 Each attribute in R will appear in at least one relation schema Ri in the
decomposition so that no attributes are lost
 This is the attribute preservation condition of a decomposition
 Another goal of decomposition is to have each individual relation Ri in
the decomposition D be in BCNF or 3NF
 Additional properties of decomposition are needed to prevent the
generation of spurious tuples
83
1. Dependency Preservation
Property of a Decomposition
 Definition
 Given a set of dependencies F on R, the projection of F on Ri, denoted by
πRi(F) where Ri is a subset of R, is the set of dependencies X → Y in F+
such that the attributes in X ∪ Y are all contained in Ri
 A decomposition D = {R1, R2, ..., Rm} of R is dependency-preserving
with respect to F if the union of the projections of F on
each Ri in D is equivalent to F; that is
(πR1(F) ∪ . . . ∪ πRm(F))+ = F+

 Claim
 It is always possible to find a dependency-preserving decomposition D
with respect to F such that each relation is in 3NF (to be discussed later)
84
1. Dependency Preservation
Property of a Decomposition
 Example 1
 Assume R = ABCD and F = {A → B, C → D}
 The decomposition D1 = {AB, CD} is clearly dependency preserving.
Observe that the first FD is kept in R1 = AB, while the second is preserved
in R2 = CD. Also notice that the schemas AB and CD are in 3NF

85
1. Dependency Preservation
Property of a Decomposition
 Example 2
 Assume R = ABCD and F = {A → B, B → C, C → D}. Evaluate the
decomposition D = {ABC, CD}
 D is clearly F-preserving. Observe that A → B and B → C are contained in ABC,
and C → D is contained in CD

86
1. Dependency Preservation
Property of a Decomposition
 Algorithm: Testing Preservation of Functional Dependencies

87
Example - 1
 Given the following:
R(A,B,C,D,E)
F = {AB → C, C → E, B → D, E → A}
R1(B,C,D) R2(A,C,E)
 Is this decomposition dependency preserving?

 Check AB → C, starting with Z = AB
 For R1: Z ∩ R1 = AB ∩ BCD = B
 {B}+ = BD
 {B}+ ∩ R1 = BD ∩ BCD = BD
 Update Z = AB ∪ BD = ABD, continue

88
Example -1

 Z = ABD
 For R2: Z ∩ R2 = ABD ∩ ACE = A
 {A}+ = A
 {A}+ ∩ R2 = A ∩ ACE = A
 Update Z: Z is still ABD
 Repeat the check for R1 and then R2

89
Example -1

 Z = ABD
 For R1: Z ∩ R1 = ABD ∩ BCD = BD
 {BD}+ = BD
 {BD}+ ∩ R1 = BD ∩ BCD = BD
 Update Z = ABD ∪ BD = ABD
 Z hasn’t changed, but the pass still has to continue with R2

90
Example -1

 Z = ABD, and after checking R2, Z is still ABD
 Since Z hasn’t changed over a full pass and C is not in Z, we conclude that AB → C is not preserved
 Let’s practice with another functional dependency, B → D
 Z = X = B
 For R1: Z ∩ R1 = B ∩ BCD = B
 {B}+ = BD
 {B}+ ∩ R1 = BD ∩ BCD = BD
 Update Z = B ∪ BD = BD
 Since Y = D is a subset of Z = BD, B → D is preserved
91
Example - 1

 Let’s check C → E
 Z = X = C
 For R2: Z ∩ R2 = C ∩ ACE = C
 {C}+ = CEA
 {C}+ ∩ R2 = CEA ∩ ACE = ACE
 Update Z = C ∪ ACE = ACE
 Since Y = E is a subset of Z = ACE, C → E is preserved

92
Example -1

 Let’s check E → A
 Z = X = E
 For R2: Z ∩ R2 = E ∩ ACE = E
 {E}+ = EA
 {E}+ ∩ R2 = EA ∩ ACE = EA
 Update Z = E ∪ EA = EA
 Since Y = A is a subset of Z = EA, E → A is preserved

93
Example -1

 Shortcut
 For any functional dependency, if its LHS and RHS together
are contained in one of the subschemas Ri, then this functional dependency
is preserved
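
The Z-update procedure used in this example can be sketched in Python. This is an illustrative sketch, not the slide's own algorithm listing: the names closure and is_preserved and the string encoding of schemas and FDs are assumptions.

```python
def closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (lhs, rhs) string pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_preserved(lhs, rhs, fds, decomposition):
    """Test whether lhs -> rhs is preserved by the decomposition."""
    z = set(lhs)
    changed = True
    while changed:  # repeat full passes until Z stops growing
        changed = False
        for ri in decomposition:
            # Z := Z U ((Z n Ri)+ n Ri)
            gained = closure(z & set(ri), fds) & set(ri)
            if not gained <= z:
                z |= gained
                changed = True
    return set(rhs) <= z


F = [("AB", "C"), ("C", "E"), ("B", "D"), ("E", "A")]
D = ["BCD", "ACE"]
for lhs, rhs in F:
    print(lhs, "->", rhs, "preserved:", is_preserved(lhs, rhs, F, D))
```

A decomposition is dependency preserving only if every FD in F passes this test, so the single failure of AB → C is enough to reject it here.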

94
Example - 2
 Let R{A,B,C,D} and F = {A → B, B → C, C → D, D → A}
 Let’s decompose R into R1 = AB, R2 = BC, and R3 = CD
 Is this a dependency preserving decomposition?

 A → B, B → C, and C → D are preserved in R1, R2, and R3, respectively
 The key is to check whether D → A is preserved using the algorithm
 This is also preserved: starting from Z = D, the algorithm grows Z to CD, then BCD, then ABCD

95
Example – 3 In-class Exercise
 R{A,B,C,D,E}
 F = {A → BD, B → E}
 Decomposition: R1{A,B,C} R2{A,D} R3{B,D,E}
 Is this a dependency preserving decomposition?

 Solution
 A → BD - Preserved (A → B is contained in R1 and A → D in R2)
 B → E - Preserved (contained in R3)

96
2. Nonadditive (Lossless) Join
Property of a Decomposition
 Observation
 Another property that a decomposition D should possess is the nonadditive join
property, which ensures that no spurious (phantom) tuples are generated when a
NATURAL JOIN operation is applied to the relations in the decomposition

 Note:
 The word loss in lossless refers to loss of information, not to loss of tuples. If
a decomposition does not have the lossless join property, we may get additional
spurious tuples
 The nonadditive join property ensures that no spurious tuples result after the
application of PROJECT and JOIN operations
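
A small Python sketch can make spurious tuples concrete. The relation state below is hypothetical, and the helper names project and natural_join are assumptions introduced only for this illustration.

```python
def project(rows, attrs):
    """Project a list of dict rows onto `attrs`, removing duplicates."""
    return {tuple(sorted((a, row[a]) for a in attrs)) for row in rows}


def natural_join(r, s):
    """Natural join of two sets of rows; a row is a tuple of (attr, value) pairs."""
    joined = set()
    for t in r:
        for u in s:
            dt, du = dict(t), dict(u)
            shared = dt.keys() & du.keys()
            # Combine the rows only when they agree on all shared attributes
            if all(dt[a] == du[a] for a in shared):
                joined.add(tuple(sorted({**dt, **du}.items())))
    return joined


# Hypothetical relation state over R = ABC
rows = [
    {"A": 1, "B": "x", "C": "p"},
    {"A": 2, "B": "x", "C": "q"},
]
original = project(rows, "ABC")

# Decomposing into AB and BC joins on B, which does not identify a tuple
lossy = natural_join(project(rows, "AB"), project(rows, "BC"))
# Decomposing into AB and AC joins on A, which is unique in this state
lossless = natural_join(project(rows, "AB"), project(rows, "AC"))

print(len(original), len(lossy), len(lossless))
print(sorted(lossy - original))  # the spurious tuples
```

The join never loses original tuples; a lossy decomposition shows up as extra tuples, which is why the property is called nonadditive.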

97
Nonadditive Lossless-Join
Decomposition
 Example 1

98
Nonadditive Lossless-Join
Decomposition
 Example 2

99
Testing Lossless-Join (or Non-
Additive) Decomposition
 Definition (good only on binary partition)
 If D = {R1, R2} is a decomposition of R and F is a set of FDs on R, then D
has a lossless join with respect to F if and only if F+ contains either
(R1 ∩ R2) → (R1 − R2) or (R1 ∩ R2) → (R2 − R1)

 Example
 Consider R = ABC and F = {A → B}
 Let's assess the partition D1 = {AB, AC}. Here R1 = AB and R2 = AC
 Therefore R1 ∩ R2 = A
R1 − R2 = B
R2 − R1 = C
 The question F ⇒ (R1 ∩ R2) → (R1 − R2) is equivalent to F ⇒ A → B, and we
know this is true because F contains exactly this dependency
 We conclude that the decomposition D1 is lossless with respect to F
100
Testing Lossless-Join (or Non-
Additive) Decomposition
 Consider the previous problem where R = ABC and F = {A → B}
 Let’s now evaluate the partition D2 = {AB, BC}. Here R1 = AB and R2 = BC
 Therefore
R1 ∩ R2 = B
R1 − R2 = A
R2 − R1 = C

 The question F ⇒ (R1 ∩ R2) → (R1 − R2) is equivalent to F ⇒ B → A, and
F ⇒ (R1 ∩ R2) → (R2 − R1) is equivalent to F ⇒ B → C. Neither dependency is
derivable from F (they are not in F+)
 We conclude the decomposition D2 is NOT lossless with respect to F (we
will call it a lossy decomposition)
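
The binary test can be sketched in Python by checking whether the closure of the common attributes covers one of the two differences. The function names closure and is_lossless_binary and the string encoding are assumptions made for illustration.

```python
def closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (lhs, rhs) string pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_lossless_binary(r1, r2, fds):
    """Binary lossless-join test for the decomposition {r1, r2}."""
    r1, r2 = set(r1), set(r2)
    common_closure = closure(r1 & r2, fds)
    # (R1 n R2) -> (R1 - R2)  or  (R1 n R2) -> (R2 - R1)
    return (r1 - r2) <= common_closure or (r2 - r1) <= common_closure


F = [("A", "B")]
print(is_lossless_binary("AB", "AC", F))  # D1 from the slides
print(is_lossless_binary("AB", "BC", F))  # D2 from the slides
```

Testing membership in the closure of R1 ∩ R2 is equivalent to asking whether the corresponding FD is in F+.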

101
Testing Lossless-Join (or Non-
Additive) Decomposition
 Consider the schema R = ABCD and F = {A → B, C → D}
 Let’s now evaluate the following binary partitions

102
Successive (Non-Additive)
Lossless-Join Decompositions

103
Algorithm - Testing for
Nonadditive Join Property

104
Testing for Nonadditive Join
Property
 Example 1

105
Testing for Nonadditive Join
Property
 Example 2

106
Testing for Nonadditive Join
Property
 Example 3

107
Testing for Nonadditive Join
Property
In-class Exercise

108
Prime, Non-Prime and Key
Attributes
 Key Attributes
 A set X of attributes in the schema R is a key for R under the
dependencies F if X → R (X determines every attribute of R) and no proper
subset Y of X (Y ⊂ X) has the same property

 Prime Attributes
 An attribute A in a relation schema R is prime when it is part of some
candidate key of the relation

 If A is not included in any candidate key of R, A is called non-prime
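
Candidate keys and prime attributes can be found by brute force for small schemas. The sketch below is illustrative only: the names closure and candidate_keys are assumptions, the search is exponential in the number of attributes, and the schema reuses R(A,B,C,D,E) with F = {AB → C, C → E, B → D, E → A} from the dependency preservation example.

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (lhs, rhs) string pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(attrs, fds):
    """Return all minimal attribute sets whose closure is the whole schema."""
    keys = []
    # Smaller sets are tried first, so any set containing a found key is not minimal
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue
            if closure(candidate, fds) == set(attrs):
                keys.append(candidate)
    return keys


R = "ABCDE"
F = [("AB", "C"), ("C", "E"), ("B", "D"), ("E", "A")]
keys = candidate_keys(R, F)
prime = set().union(*keys)
print(sorted("".join(sorted(k)) for k in keys))
print("prime:", sorted(prime), "non-prime:", sorted(set(R) - prime))
```

For this schema the candidate keys are AB, BC, and BE, so D is the only non-prime attribute.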

109
Prime, Non-Prime and Key
Attributes
 Example

110
Prime, Non-Prime and Key
Attributes

111
Normalization of Relations
 Normalization of data is a process of analyzing the given relation
schemas based on their FDs and primary keys to achieve the
desirable properties of
 minimizing redundancy and
 minimizing the insertion, deletion, and update anomalies
 It can be considered as a “filtering” or “purification” process to make
the design have successively better quality
 Unsatisfactory (“bad”) relation schemas that do not meet certain
conditions (the normal form tests) are decomposed into smaller
relation schemas that meet the tests and hence possess the
desirable properties

112
Normalization of Relations
 Normalization procedure provides database designers with the
following
 A formal framework for analyzing relation schemas based on their keys
and on the functional dependencies among their attributes
 A series of normal form tests that can be carried out on individual relation
schemas so that the relational database can be normalized to any
desired degree

 Definition. The normal form of a relation refers to the highest
normal form condition that it meets, and hence indicates the degree
to which it has been normalized

113
Normalization of Relations

 1NF, 2NF, 3NF, BCNF: based on keys and FDs of a relation schema
 4NF: based on keys, multi-valued dependencies
 5NF: based on keys, join dependencies

114
Normalization of Relations
 The process of normalization through decomposition must also
confirm the existence of the following additional properties to ensure
a good relational design
 Nonadditive join or lossless join property
 guarantees that the spurious tuple generation problem does not
occur with respect to the relation schemas created after
decomposition
 extremely important and cannot be sacrificed
 Dependency preservation property
 ensures that each functional dependency is represented in some
individual relation resulting after decomposition
 less stringent and may be sacrificed

115
Practical Use of Normal
Forms
 Normalization is carried out in practice so that the resulting designs are of
high quality and meet the desirable properties
 The practical utility of these normal forms becomes questionable when the
constraints on which they are based are hard to understand or to detect
 The database designers need not normalize to the highest possible normal
form
 Usually normalization goes up to 3NF or BCNF; 4NF is rarely used in practice
 Relations may be left in a lower normalization status, such as 2NF, for
performance reasons
 Denormalization
 The process of storing the join of higher normal form relations as
a base relation, which is in a lower normal form


116
Definitions of Keys and
Attributes Participating in
Keys
 A superkey of a relation schema R = {A1, A2, ...., An} is a set of
attributes S ⊆ R with the property that no two distinct tuples t1 and
t2 in any legal relation state r of R will have t1[S] = t2[S]
 A key K is a superkey with the additional property that removal of
any attribute from K will cause K not to be a superkey any more
 If a relation schema has more than one key, each is called a
candidate key
 One of the candidate keys is arbitrarily designated to be the primary key,
and the others are called secondary keys
 A prime attribute must be a member of some candidate key
 A nonprime attribute is not a prime attribute; that is, it is not a
member of any candidate key

117
First Normal Form (1NF)
 1NF – states that the domain of an attribute must include only
atomic (simple, indivisible) values and that the value of any attribute
in a tuple must be a single value from the domain of that attribute
 It disallows multivalued attributes, composite attributes, and their
combinations
 In other words, 1NF disallows relations within relations or relations as
attribute values within tuples
 Each column must contain only a single atomic (indivisible)
value
 Repeating groups of records (redundancy) must be eliminated
 Eliminate duplicative columns from the same table

118
First Normal Form (1NF)
Figure
(a) A relation schema that is not in 1NF
(b) Sample state of relation DEPARTMENT
(c) 1NF version of the same relation with redundancy

119
First Normal Form (1NF)
 There are three main techniques to achieve first normal form for
the DEPARTMENT relation
 Remove the attribute Dlocations that violates 1NF and place it in a
separate relation DEPT_LOCATIONS along with the primary key
Dnumber of DEPARTMENT. The primary key of this relation is the
combination {Dnumber, Dlocation}
 Expand the key so that there will be a separate tuple in the original
DEPARTMENT relation for each location of a department
 If a maximum number of values is known for the attribute (for example,
if it is known that at most three locations can exist for a department),
replace the Dlocations attribute by three atomic attributes: Dlocation1,
Dlocation2, and Dlocation3
 This solution has the disadvantage of introducing NULL values if most departments have
fewer than three locations. Querying on this attribute also becomes more difficult; for
example, consider how to write the query
 "List the departments that have ‘Bellaire’ as one of their locations" in this design
120
First Normal Form (1NF)
 First normal form also disallows multivalued attributes that are
themselves composite
 These are called nested relations because each tuple can have a
relation within it

121
Second Normal Form – 2NF
 Second normal form is based on the concept of full functional
dependency
 A functional dependency X → Y is a full functional dependency if
removal of any attribute A from X means that the dependency does
not hold any more
 i.e., for any attribute A ∈ X, (X – {A}) does not functionally determine Y
 A functional dependency X → Y is a partial dependency if some
attribute A ∈ X can be removed from X and the dependency still
holds; that is, for some A ∈ X, (X – {A}) → Y
 Examples:
 {SSN, PNUMBER} → HOURS is a full FD since neither SSN → HOURS nor
PNUMBER → HOURS holds
 {SSN, PNUMBER} → ENAME is not a full FD (it is called a partial dependency)
since SSN → ENAME also holds

122
Second Normal Form – 2NF
 Definition : A relation schema R is in second normal form (2NF)
with respect to a set of FDs F if it is in 1NF and every nonprime
attribute is fully dependent on every key of R
 The test for 2NF involves testing for functional dependencies whose
left-hand side attributes are part of the primary key
 If the primary key contains a single attribute, the test need not be applied
at all
 Example
 Let R=ABCD and F = { AB → C, B → D }.
 Here, AB is a key. C and D are non-prime.
 C is fully dependent on the entire key AB
 D functionally depends on just part of the key (B →D). This is called
a partial dependency
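
The 2NF test for this example can be sketched in Python by looking for nonprime attributes determined by a proper subset of a candidate key. The function names closure, candidate_keys, and partial_dependencies are assumptions for illustration, and the key search is brute force.

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (lhs, rhs) string pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(attrs, fds):
    """All minimal attribute sets whose closure is the whole schema."""
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue
            if closure(candidate, fds) == set(attrs):
                keys.append(candidate)
    return keys


def partial_dependencies(attrs, fds):
    """Pairs (part_of_key, nonprime_attribute) that violate 2NF."""
    keys = candidate_keys(attrs, fds)
    prime = set().union(*keys)
    found = set()
    for key in keys:
        for size in range(1, len(key)):  # proper, non-empty subsets
            for part in combinations(sorted(key), size):
                for attr in closure(part, fds) - prime:
                    found.add(("".join(part), attr))
    return found


F = [("AB", "C"), ("B", "D")]
print(partial_dependencies("ABCD", F))
```

An empty result would mean the schema is in 2NF; here the pair (B, D) reports that D depends on only part of the key AB.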
123
Second Normal Form – 2NF
 The EMP_PROJ relation is in 1NF but is not in 2NF
 The nonprime attribute Ename violates 2NF because of FD2, as do the
nonprime attributes Pname and Plocation because of FD3.
 The functional dependencies FD2 and FD3 make Ename, Pname, and
Plocation partially dependent on the primary key {Ssn, Pnumber} of
EMP_PROJ, thus violating the 2NF test

124
Third Normal Form – 3NF
 Third normal form is based on the concept of transitive
dependency
 A functional dependency X→Y in a relation schema R is a
transitive dependency if there exists a set of attributes Z in R that
is neither a candidate key nor a subset of any key of R and both
X→Z and Z→Y hold
 Definition. A relation schema R is in 3NF if it satisfies 2NF and no
nonprime attribute of R is transitively dependent on the primary key
 Note :
 In X → Y and Y → Z, with X as the primary key, we consider this a problem only if
Y is not a candidate key.
 When Y is a candidate key, there is no problem with the transitive dependency.
For example : Consider EMP (SSN, Emp#, Salary )
Here, SSN → Emp# → Salary and Emp# is a candidate key

125
Third Normal Form – 3NF

126
Third Normal Form – 3NF
 Key : {SID}
 Building → Fee
 Building → Manager
 Fee and Manager are transitively dependent on SID via the non-prime
attribute Building. Therefore, the relation is not in 3NF

127
Normal Forms Defined
Informally
 1st normal form
 All attributes depend on the key
 2nd normal form
 All attributes depend on the whole key
 3rd normal form
 All attributes depend on nothing but the key

128
Summary of Normal Forms
Based on Primary Keys

129
In-Class Exercise
 INVOICE (Pine Valley Furniture Company)

130
In-Class Exercise
 Table with multivalued attributes, not in 1st normal form

131
In-Class Exercise
 INVOICE relation (1NF) - Table with no multivalued attributes and
unique rows

132
In-Class Exercise
 Anomalies in this Table

 Insertion
 if new product is ordered for order 1007 of existing customer, customer
data must be re-entered, causing duplication
 Deletion
 if we delete the Dining Table from Order 1006, we lose information
concerning this item's finish and price
 Update
 changing the price of product ID 4 requires update in several records

133
In-Class Exercise
Therefore, NOT in 2nd Normal Form

134
In-Class Exercise
 2NF - Partial Dependencies are Removed

Partial dependencies are removed, but there are still transitive
dependencies

135
In-Class Exercise
 3NF - Transitive Dependencies are Removed

136
Approaches to Normalization

 There are two principal approaches to normalization

 Decomposition
 Break larger relations into smaller ones

 Synthesis
 Begin with a set of dependencies (usually FDs), and construct a
corresponding relational schema

137
Algorithms for Relational
Database Design

 1. Non-Additive Decomposition into 3NF Schemas


 Non-Additive Decomposition into BCNF Schemas
 2. Dependency Preserving Decompositions into 3NF Schemas
a) Synthesis Method
 3. Dependency-Preserving and Non-additive (Lossless)
Decompositions into 3NF Schemas

138
Non-Additive Decomposition
into 3NF Schemas
 Algorithm : 3NF Decomposition Method

139
Non-Additive Decomposition
into 3NF Schemas

140
Non-Additive Decomposition
into 3NF Schemas
 Shortcomings of 3NF Decomposition
 1. Time consuming - testing whether an attribute is prime is an NP-complete problem
 2. It may produce too many tables (more than we need for 3NF)

141
Non-Additive Decomposition
into 3NF Schemas
 Shortcomings of 3NF Decomposition
 A serious problem with the 3NF-Decomposition methods is that
dependencies in F may not be enforced on the decomposition

142
Algorithms for Relational
Database Design

 1. Non-Additive Decomposition into 3NF Schemas


 Non-Additive Decomposition into BCNF Schemas
 2. Dependency Preserving Decompositions into 3NF Schemas
a) Synthesis Method
 3. Dependency-Preserving and Non-additive (Lossless)
Decompositions into 3NF Schemas

143
Dependency Preserving
Decompositions into 3NF
Schemas

144
Dependency Preserving
Decompositions into 3NF
Schemas

145
Dependency Preserving
Decompositions into 3NF
Schemas

146
Dependency Preserving
Decompositions into 3NF
Schemas

147
Dependency-Preserving and Non-
additive (Lossless) Join
Decomposition into 3NF Schemas

148
Dependency-Preserving and
Nonadditive (Lossless) Join
Decomposition into 3NF Schemas

149
Boyce-Codd normal form
(BCNF)
 Each normal form is strictly stronger than the previous one
 Every 2NF relation is in 1NF
 Every 3NF relation is in 2NF
 Every BCNF relation is in 3NF
 There exist relations that are in 3NF but not in BCNF
 The goal is to have each relation in BCNF (or 3NF)

 Partial and transitive dependencies are the most important causes
of update anomalies. 3NF disallows partial and transitive
dependencies on the primary key, so if all database relations are in
3NF the design is usually acceptable

150
Boyce-Codd normal form
(BCNF)
 BCNF- was proposed as a simpler form of 3NF, but it was found to be
stricter than 3NF
 ie., every relation in BCNF is also in 3NF; however, a relation in 3NF is not
necessarily in BCNF
 Definition. A relation schema R is in BCNF if whenever a nontrivial
functional dependency X→A holds in R, then X is a superkey of R

Figure
Boyce-Codd normal form. (a) BCNF normalization of LOTS1A with the
functional dependency FD2 being lost in the decomposition. (b) A schematic
relation with FDs; it is in 3NF, but not in BCNF due to the f.d. C → B.

151
Boyce-Codd normal form
(BCNF)

Figure
A relation TEACH that is in
3NF but not BCNF

152
Boyce-Codd normal form
(BCNF)
 Two FDs exist in the relation TEACH:
 fd1: {student, course} → instructor
 fd2: instructor → course
 {student, course} is a candidate key for this relation
 fd2 violates BCNF because instructor is not a superkey, but 3NF allows it
because course is a prime attribute
 So this relation is in 3NF but not in BCNF
 A relation NOT in BCNF should be decomposed so as to meet this
property, while possibly forgoing the preservation of all functional
dependencies in the decomposed relations
 Non-additive decomposition is a must during normalization
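
The 3NF and BCNF conditions for TEACH can be checked with a short Python sketch. It is illustrative only: the function names are assumptions, and the single letters S, C, and I are shorthand introduced here for student, course, and instructor.

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (lhs, rhs) string pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(attrs, fds):
    """All minimal attribute sets whose closure is the whole schema."""
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue
            if closure(candidate, fds) == set(attrs):
                keys.append(candidate)
    return keys


def normal_form_violations(attrs, fds):
    """Return (BCNF violations, 3NF violations) among the given FDs."""
    prime = set().union(*candidate_keys(attrs, fds))
    bcnf, third = [], []
    for lhs, rhs in fds:
        if closure(lhs, fds) == set(attrs):
            continue  # LHS is a superkey: allowed by both forms
        for attr in set(rhs) - set(lhs):  # ignore trivial parts
            bcnf.append((lhs, attr))
            if attr not in prime:  # 3NF tolerates a prime RHS
                third.append((lhs, attr))
    return bcnf, third


# S = student, C = course, I = instructor (shorthand assumed here)
TEACH_FDS = [("SC", "I"), ("I", "C")]
bcnf, third = normal_form_violations("SCI", TEACH_FDS)
print("BCNF violations:", bcnf)
print("3NF violations:", third)
```

The sketch also shows that {student, instructor} is a second candidate key, which is why every attribute of TEACH is prime.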

153
Boyce-Codd normal form
(BCNF)
 TEACH is decomposed into one of the three following possible pairs
 1. {Student, Instructor} and {Student, Course}
 2. {Course, Instructor} and {Course, Student}
 3. {Instructor, Course} and {Instructor, Student}

 All three decompositions lose the functional dependency fd1
 The desirable decomposition of those just shown is 3 because it will
not generate spurious tuples after a join
 Hence, BCNF of TEACH relation is
(Instructor, Course) and (Instructor, Student)
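
The choice of decomposition 3 can be confirmed with the binary lossless-join test. This is a sketch under assumptions: the function names are hypothetical, and S, C, and I are shorthand introduced here for student, course, and instructor.

```python
def closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (lhs, rhs) string pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_lossless_binary(r1, r2, fds):
    """Binary lossless-join test for the decomposition {r1, r2}."""
    r1, r2 = set(r1), set(r2)
    common_closure = closure(r1 & r2, fds)
    return (r1 - r2) <= common_closure or (r2 - r1) <= common_closure


# S = student, C = course, I = instructor (shorthand assumed here)
TEACH_FDS = [("SC", "I"), ("I", "C")]
options = {
    1: ("SI", "SC"),  # {Student, Instructor} and {Student, Course}
    2: ("CI", "CS"),  # {Course, Instructor} and {Course, Student}
    3: ("IC", "IS"),  # {Instructor, Course} and {Instructor, Student}
}
results = {n: is_lossless_binary(r1, r2, TEACH_FDS)
           for n, (r1, r2) in options.items()}
print(results)
```

Only option 3 passes, because the common attribute Instructor determines Course through fd2.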

154
Boyce-Codd normal form
(BCNF)
 The difference between 3NF and BCNF is that, for a functional
dependency A → B, 3NF allows this dependency in a relation if B is
a prime attribute and A is not a candidate key
 BCNF, in contrast, insists that for this dependency to remain in a
relation, A must be a superkey
 Every relation in BCNF is also in 3NF. However, a relation in 3NF may
not be in BCNF. The relation below is in 3NF but not in BCNF

155
Non-Additive Decomposition
into BCNF Schemas

156
Non-Additive Decomposition
into BCNF Schemas

157
Non-Additive Decomposition
into BCNF Schemas

158
Non-Additive Decomposition into
BCNF Schemas

159
Review of Normalization (1NF
to BCNF)

160
Review of Normalization (1NF
to BCNF)

161
Review of Normalization (1NF
to BCNF)

162
Review of Normalization (1NF
to BCNF)
StaffPropertyInspection (propertyNo, iDate, iTime, pAddress, comments,
staffNo, sName, carReg)

Property (propertyNo, pAddress)


PropertyInspect (propertyNo, iDate, iTime, comments, staffNo, sName,
carReg)

Property (propertyNo, pAddress)
Staff (staffNo, sName)
PropertyInspect (propertyNo, iDate, iTime, comments, staffNo, carReg)

163
Review of Normalization
(1NF to BCNF)
Property (propertyNo, pAddress)
Staff (staffNo, sName)

Inspection (propertyNo, iDate, iTime, comments, staffNo)

StaffCar (staffNo, iDate, carReg)

PropertyInspect (propertyNo, iDate, iTime, comments, staffNo, carReg)


fd1’: propertyNo, iDate → iTime, comments, staffNo, carReg
fd4: staffNo, iDate → carReg
fd5’: carReg, iDate, iTime → propertyNo, comments, staffNo
fd6’: staffNo, iDate, iTime → propertyNo, comments

164
Review of Normalization (1NF
to BCNF)

165
Multivalued Dependencies and
Fourth Normal Form (4NF)
 Definition
 A multivalued dependency X →→ Y specified on relation schema R, where
X and Y are both subsets of R, specifies the following constraint on any
relation state r of R:
 If two tuples t1 and t2 exist in r such that t1[X] = t2[X], then two tuples t3
and t4 should also exist in r with the following properties, where Z = R − (X ∪ Y):
 t3[X] = t4[X] = t1[X] = t2[X]
 t3[Y] = t1[Y] and t4[Y] = t2[Y]
 t3[Z] = t2[Z] and t4[Z] = t1[Z]

166
Multivalued Dependencies and
Fourth Normal Form (4NF)

167
Multivalued Dependencies and
Fourth Normal Form (4NF)

168
Multivalued Dependencies and
Fourth Normal Form (4NF)
 Three conditions for a multivalued dependency
 A →→ B: for a single value of A, more than one value of B exists
 The table should have at least three columns
 For a table with columns A, B, and C, B and C should be independent of each other

 Example : Enrolment
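
A multivalued dependency can be checked against a relation state with a short Python sketch. The rows below are hypothetical enrolment data made up for this illustration, and the function name satisfies_mvd is an assumption.

```python
def satisfies_mvd(rows, x, y):
    """Check whether the MVD x ->> y holds in this relation state.

    `rows` is a non-empty list of dicts; `x` and `y` are lists of attribute names.
    """
    attrs = list(rows[0])
    z = [a for a in attrs if a not in x and a not in y]
    present = {tuple(r[a] for a in attrs) for r in rows}
    for t1 in rows:
        for t2 in rows:
            if all(t1[a] == t2[a] for a in x):
                # The required tuple takes X and Y from t1 and Z from t2
                t3 = {a: (t2[a] if a in z else t1[a]) for a in attrs}
                if tuple(t3[a] for a in attrs) not in present:
                    return False
    return True


# Hypothetical enrolment state: course and hobby are independent of each other
full = [
    {"sid": 1, "course": "Science", "hobby": "Cricket"},
    {"sid": 1, "course": "Science", "hobby": "Hockey"},
    {"sid": 1, "course": "Maths", "hobby": "Cricket"},
    {"sid": 1, "course": "Maths", "hobby": "Hockey"},
]
partial = full[:3]  # the (Maths, Hockey) combination is missing

print(satisfies_mvd(full, ["sid"], ["course"]))
print(satisfies_mvd(partial, ["sid"], ["course"]))
```

The check passes only when every course of a student is paired with every hobby of that student, which is the redundancy that 4NF removes by splitting the table.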

169
Multivalued Dependencies and
Fourth Normal Form (4NF)
 Problem???
course and hobby are
independent

 Solution

170
Join Dependencies and Fifth
Normal Form (5NF)

171
Join Dependencies and Fifth
Normal Form (5NF)

172
Join Dependencies and Fifth
Normal Form (5NF)

173
Join Dependencies and Fifth
Normal Form (5NF)

174
Normal Forms and ER
Modeling
 Normalization and ER modeling are two independent concepts

 You can use ER modeling to produce an initial relational schema and then
use normalization to remove any remaining redundancies
 If you are a good ER modeler, it is rare that much normalization will be
required
 In theory, you can use normalization by itself. This would involve identifying
all attributes, giving them unique names, discovering all FDs and MVDs,
then applying the normalization algorithms
 Since this is a lot harder than ER modeling, most people do not do it

175
