
Unit 4 : Introduction to Schema Refinement

March 27, 2024 Department of CSE EID 301 & Database Management Systems 1
Module IV

Normalization of database tables: Schema Refinement and Normal Forms

Introduction to Schema Refinement, Functional Dependencies, Reasoning
about Functional Dependencies, Normal Forms, Properties of Decomposition,
Normalization, different types of dependencies.

Schema Refinement
• Conceptual database design gives us a set of relation schemas and integrity constraints (ICs) that can be
regarded as a good starting point for the final database design.

• A database design based on the ER model may still suffer from:
o Inconsistency
o Ambiguity
o Redundancy

To resolve these problems, we refine the logical database tables by applying normalization.
This refinement process is called Normalization.

• Normalization builds table structures starting from the columns (attributes) that belong
together in each table; this is called the bottom-up approach.

• Normalization eliminates duplicate data. Each fact is stored once, which reduces the space
required to store the data and makes insert, delete and update operations simpler to keep
consistent.

• Tables are normalized (refined) before they are physically stored in the database.

Problems caused by redundancy:

Storing the same information redundantly, i.e., in more than one place within a
database, may lead to several problems:
1. Redundant Storage: The same information is stored repeatedly.
2. Update Anomalies: If one copy of such repeated data is updated, an
inconsistency is created unless all copies are similarly updated.
3. Insertion Anomalies: It may not be possible to store certain information
unless some other, unrelated, information is stored as well.
4. Deletion Anomalies: It may not be possible to delete certain information
without losing some other, unrelated, information as well.

• Example: consider a relation
Hourly_Emps(ssn, name, lot, rating, hrly_wages, hrs_worked);

Suppose the hrly_wages attribute is determined by the rating attribute
(hrly_wages is dependent on rating). This IC is an
example of a Functional Dependency.
SSN  Name   Lot  Rating  Hrly_wages  Hrs_worked
123  Ram    48   8       10          40
231  Raj    22   8       10          30
131  Ravi   35   5       7           30
434  Vijay  35   5       7           32
612  Sam    35   8       10          40

Redundant storage: In the above table the rating value 8 with hrly_wages 10 is repeated 3
times. This repetition wastes storage space and opens the door to inconsistency, which in turn
causes problems on insertion, deletion and update.
Update Anomalies: The hrly_wages in the 1st tuple could be updated without making a
similar change in the 2nd tuple, leaving two different wages for the same rating.
Insertion Anomalies: It may not be possible to store certain information unless some other,
unrelated, information is stored as well (we cannot insert a tuple for an employee unless we
know the hrly_wages for the employee's rating value).
Deletion Anomalies: It may not be possible to delete certain information without losing some
other, unrelated, information as well (if we delete all tuples with a given rating value, we lose
the association between that rating and its hrly_wages).

So we want schemas that do not permit redundancy, inconsistency and ambiguity.


Decomposition (replacing the original relation with smaller relations)
• Organized collection of data in the database.
• Redundancy arises when a relational schema forces an association between attributes
that is not natural.
• FDs can be used to identify such situations and to suggest refinements to the schema.
• Once the FDs are identified, the original relation is replaced with smaller relations.
• Each smaller relation contains a subset of the attributes of the original relation. This process is
known as Decomposition.
• Original table: Hrly_Emps1(SSN, name, lot, rating, hrly_wages, Hrs_worked)
• Smaller relations: Hrly_Emps2(SSN, name, lot, rating, Hrs_worked)
Wages(rating, hrly_wages)

Hrly_Emps2:
SSN  Name   Lot  Rating  Hrs_worked
123  Ram    48   8       40
231  Raj    22   8       30
131  Ravi   35   5       30
434  Vijay  35   5       32
612  Sam    35   8       40

Wages:
Rating  Hrly_wages
8       10
5       7
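The decomposition can be sketched in a few lines of Python. This is only an illustration: the rows are held as plain tuples and the helper name project is an assumption, not something defined in these slides.

```python
# Rows of the original relation as
# (ssn, name, lot, rating, hrly_wages, hrs_worked), taken from the example.
hourly_emps = [
    (123, "Ram", 48, 8, 10, 40),
    (231, "Raj", 22, 8, 10, 30),
    (131, "Ravi", 35, 5, 7, 30),
    (434, "Vijay", 35, 5, 7, 32),
    (612, "Sam", 35, 8, 10, 40),
]

def project(rows, positions):
    """Keep only the given column positions and drop duplicate rows."""
    result = []
    for row in rows:
        small = tuple(row[i] for i in positions)
        if small not in result:
            result.append(small)
    return result

# Hrly_Emps2 drops hrly_wages; Wages keeps one row per rating.
hrly_emps2 = project(hourly_emps, [0, 1, 2, 3, 5])
wages = project(hourly_emps, [3, 4])

print(len(hrly_emps2))  # 5
print(wages)            # [(8, 10), (5, 7)]
```

The pair (8, 10) that appeared three times in the original table now appears once in Wages, so a wage change for a rating touches a single row.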

Problems related to Decomposition
• Decomposition may solve the problems of redundancy and inconsistency,
but special care should be taken while decomposing relations
because a careless decomposition can create new problems.
Two important questions must be asked repeatedly:
1. Do we need to decompose a relation?
Ans: Several normal forms have been proposed for relations. If a
relation schema is in one of these normal forms, we know that certain
kinds of problems cannot arise, so the normal form tells us whether further
decomposition is needed.
2. What problems does a given decomposition cause?
Ans: With respect to this question, two properties of decompositions are of
interest: the lossless-join property and the dependency-preservation property.

Decomposition is a tool that allows us to eliminate redundancy.
We must check whether a decomposition allows us to recover the original relation, and
whether it allows us to check ICs efficiently. Two properties capture this:
1. The lossless-join property (a decomposition without it is a lossy join)
2. The dependency-preservation property

• The 1st property enables us to recover any instance of the decomposed
relation from the corresponding instances of the smaller relations.
• The 2nd property enables us to enforce any constraint on the original relation
by simply enforcing some constraints on each of the smaller relations, i.e., we
need not perform joins of the smaller relations to check whether a constraint on the
original relation is violated.
Functional Dependencies
A functional dependency (FD) is a kind of IC that generalizes the concept of a
key.
Definition: Let R be a relation schema and let X and Y be nonempty sets of
attributes in R. We say that an instance r of R satisfies the FD X->Y
(X determines Y) if the following holds for every pair of tuples t1 and t2 in r:
If t1.X = t2.X, then t1.Y = t2.Y
The notation t1.X refers to the projection of tuple t1 onto the
attributes in X.
An FD X->Y says that if two tuples agree on the values in attributes X, they
must also agree on the values in attributes Y.
Functional Dependency
Example of an FD:
A   B   C   D
a1  b1  c1  d1
a1  b1  c1  d2
a1  b2  c2  d1
a2  b1  c3  d1

• The instance above satisfies the FD AB->C.
• The 1st two tuples show that an FD is not the same as a key constraint: although the FD is not
violated, AB is clearly not a key for the relation.
• The 3rd and 4th tuples illustrate that if two tuples differ in either the A field or the B field, they
can differ in the C field without violating the FD.
• If we add a tuple <a1, b1, c2, d1> to the instance, the resulting instance would
violate the FD; to see the violation, compare the 1st tuple with the new tuple.
Note: Recall that a legal instance of a relation must satisfy all specified ICs, including all specified
FDs. An instance can show that an FD does not hold, but we can never deduce that an FD does
hold by looking at one or more instances of the relation, because, like other ICs, an FD is a
statement about all possible legal instances of the relation.
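The pairwise test in the FD definition can be written directly as a short Python sketch. The function name satisfies_fd and the dict-per-tuple representation are assumptions made for illustration.

```python
from itertools import combinations

def satisfies_fd(rows, lhs, rhs):
    """True if every pair of rows that agrees on lhs also agrees on rhs.

    rows is a list of dicts; lhs and rhs are lists of attribute names.
    """
    for t1, t2 in combinations(rows, 2):
        agree_lhs = all(t1[a] == t2[a] for a in lhs)
        agree_rhs = all(t1[a] == t2[a] for a in rhs)
        if agree_lhs and not agree_rhs:
            return False
    return True

# The four-tuple instance from the example above.
instance = [
    {"A": "a1", "B": "b1", "C": "c1", "D": "d1"},
    {"A": "a1", "B": "b1", "C": "c1", "D": "d2"},
    {"A": "a1", "B": "b2", "C": "c2", "D": "d1"},
    {"A": "a2", "B": "b1", "C": "c3", "D": "d1"},
]

print(satisfies_fd(instance, ["A", "B"], ["C"]))  # True
# AB is not a key: the first two tuples agree on AB but differ on D.
print(satisfies_fd(instance, ["A", "B"], ["D"]))  # False

# Adding <a1, b1, c2, d1> breaks AB -> C.
extended = instance + [{"A": "a1", "B": "b1", "C": "c2", "D": "d1"}]
print(satisfies_fd(extended, ["A", "B"], ["C"]))  # False
```

A True result only says that this one instance does not violate the FD; as the note explains, it cannot establish that the FD holds for the schema.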
March 27, 2024 Department of Biotechnology, GIT Course Code and 20
Course Title:
Closure of a set of FDs
The set of all FDs implied by a given set F of FDs is called the closure of F, denoted F+.
Q) How can we infer, or compute, the closure of a given set F of FDs?
A) By using three rules called Armstrong's Axioms, which can be applied repeatedly to infer all FDs implied by
a set F of FDs.
We use X, Y, Z to denote sets of attributes over a relation schema R.

• Armstrong's Axioms:
– Reflexivity: If X ⊇ Y, then X->Y
– Augmentation: If X->Y, then XZ->YZ for any Z
– Transitivity: If X->Y and Y->Z, then X->Z
• These are sound and complete inference rules for FDs!

Armstrong's Axioms are sound, in that they generate only FDs in F+ when
applied to a set F of FDs. They are also complete, in that repeated
application of these rules will generate all FDs in the closure F+.

Some additional rules are useful while reasoning about F+:

Union: If X->Y and X->Z, then X->YZ.
Decomposition: If X->YZ, then X->Y and X->Z.
Pseudo Transitivity: If X->Y and ZY->W, then ZX->W.
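These additional rules add no new power; each follows from Armstrong's Axioms. One possible derivation of each is sketched below (other derivations exist).

```latex
\begin{align*}
\textbf{Union:}\quad
 & X \to Y            && \text{given} \\
 & X \to XY           && \text{augmentation with } X \text{ (since } XX = X\text{)} \\
 & X \to Z            && \text{given} \\
 & XY \to YZ          && \text{augmentation with } Y \\
 & X \to YZ           && \text{transitivity} \\[6pt]
\textbf{Decomposition:}\quad
 & X \to YZ           && \text{given} \\
 & YZ \to Y,\ YZ \to Z && \text{reflexivity} \\
 & X \to Y,\ X \to Z  && \text{transitivity} \\[6pt]
\textbf{Pseudo transitivity:}\quad
 & X \to Y            && \text{given} \\
 & ZX \to ZY          && \text{augmentation with } Z \\
 & ZY \to W           && \text{given} \\
 & ZX \to W           && \text{transitivity}
\end{align*}
```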

Trivial Functional Dependency:
An FD is said to be a trivial functional dependency (TFD) if it is satisfied by every relation instance.
Example:
Consider a relation schema ABC with FDs A->B and B->C.
• In a trivial FD, the right side contains only attributes that also appear on the left
side; such dependencies always hold due to reflexivity.
• Using reflexivity, we can generate all trivial dependencies, which are of the form
X->Y, where Y ⊆ X (every element of Y is in X), X ⊆ ABC, and Y ⊆ ABC.
• From transitivity, we get A->C.
• From augmentation, we get the non-trivial dependencies
AC->BC, AB->AC, AB->BC.
Example1:
Suppose we are given a relation with attributes A, B, C, D, E, F and the functional
dependencies (FDs)
A → BC
B → E
CD → EF
Show that AD → E and AD → F are in the closure.
Solution:
A → BC (given)
A → B and A → C (decomposition)
AD → CD (augmentation of A → C with D)
CD → EF (given)
AD → EF (transitivity)
AD → E, AD → F (decomposition)
Closure of a set of FDs contd.

Example2:

As another example, we use a more elaborate version of Contracts:

Contracts(contractid, supplierid, projectid, deptid, partid, qty, value)
We denote the schema for Contracts as CSJDPQV. The meaning of a tuple is that
the contract with contractid C is an agreement that supplier S (supplierid) will
supply Q items of part P (partid) to project J (projectid) associated with
department D (deptid); the value V of this contract is equal to value.

March 27, 2024 Department of computer science and engineering, GST Course Code 19ECS333 Course Title: Database management systems
Closure of a set of FDs contd.
The following ICs are known to hold:
• The contract id C is a key: C → CSJDPQV.
• A project purchases a given part using a single contract: JP → C.
• A department purchases at most one part from a supplier: SD → P.

Several additional FDs hold in the closure of the set of given FDs.
• From JP → C, C → CSJDPQV and transitivity, we infer JP → CSJDPQV.
• From SD → P and augmentation (with J), we infer SDJ → JP.
• From SDJ → JP, JP → CSJDPQV, and transitivity, we infer SDJ → CSJDPQV.
We can infer several additional FDs that are in the closure by using augmentation or
decomposition.
• For example, from C → CSJDPQV, using decomposition, we can infer
C → C, C → S, C → J, C → D, and so forth.
• Finally, we have a number of trivial FDs from the reflexivity rule.
Attribute Closure

• If we just want to check whether a given dependency, say X → Y, is in the closure F+ of
a set F of FDs, we can do so efficiently without computing F+.
• We first compute the attribute closure X+ with respect to F, which is the set of
attributes A such that X → A can be inferred using Armstrong's Axioms. X → Y is in F+
exactly when Y ⊆ X+.

The algorithm for computing the attribute closure of a set X of attributes is shown
below:
closure = X;
repeat until there is no change: {
    if there is an FD U → V in F such that U ⊆ closure,
    then set closure = closure ∪ V
}
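The loop above translates almost line for line into Python. The sketch below assumes single-letter attribute names, so that an attribute set can be written as a string; the function name attribute_closure is illustrative.

```python
def attribute_closure(attrs, fds):
    """Return the closure of attrs under fds.

    attrs is a string of single-letter attributes; fds is a list of
    (lhs, rhs) pairs written the same way.
    """
    closure = set(attrs)
    changed = True
    while changed:                      # repeat until there is no change
        changed = False
        for lhs, rhs in fds:
            # U -> V applies when U is inside the closure and V adds something
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure

# FDs of the Contracts example: C -> CSJDPQV, JP -> C, SD -> P.
contracts_fds = [("C", "CSJDPQV"), ("JP", "C"), ("SD", "P")]

# SDJ -> CSJDPQV is in the closure because (SDJ)+ covers every attribute.
print(set("CSJDPQV") <= attribute_closure("SDJ", contracts_fds))  # True
# SD alone only reaches P.
print(sorted(attribute_closure("SD", contracts_fds)))  # ['D', 'P', 'S']
```

To test whether X → Y is implied, compute the closure of X once and check that every attribute of Y is in it.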
Closure of a set of FDs:
Example1: Find the closure of

Example2:
A → BC
E → CF
B → E
CD → EF
Find the closure of {A, B}, i.e., (AB)+.
Solution:
(AB)+ = {A, B}
= {A, B} U {C} = {A, B, C} [A → BC]
= {A, B, C} U {E} = {A, B, C, E} [B → E]
= {A, B, C, E} U {F} = {A, B, C, E, F} [E → CF]

CD → EF cannot be applied because D is not in the closure.
Therefore, the closure of {A, B} is (AB)+ = {A, B, C, E, F}.

• Insertion Anomaly

• Update anomaly

Second Normal Form:

• A table is said to be in 2NF iff
i) It is in 1NF.
ii) It has no partial dependencies, i.e., no non-prime attribute depends on a proper
subset of a candidate key.

Third Normal Form:
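• A table is said to be in 3NF iff
i) It is in 2NF.
ii) It has no transitive dependency of a non-prime attribute on a candidate key.
Note:
Equivalently, for every FD X → A that holds, at least one of the following conditions must be satisfied:
i) X → A is a trivial functional dependency.
ii) The LHS X is a super key (every candidate key is also a super key).
iii) The RHS A is part of some candidate key.

BCNF:
For every FD X → A that holds, at least one of the following conditions must be satisfied:
i) X → A is a trivial functional dependency.
ii) The LHS X is a super key.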
Third Normal Form:

Boyce-Codd normal form (BCNF) :

Fourth Normal Form:
• A table is said to be in 4NF iff
i) It satisfies BCNF.
ii) It has no multi-valued dependency, other than trivial ones or those whose LHS is a super key.

• In 2NF, we removed partial dependencies.
• In 3NF, we removed transitive dependencies.

Fifth Normal Form:
A table is said to be in 5NF iff
i) It is in 4NF.
ii) It has no join dependency, other than those implied by its candidate keys.

Finding Candidate Key

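Example2:

R(A, B, C, D, E)
FDs:
A -> B
D -> E
Find all the possible candidate keys.
A, C and D do not appear on the RHS of any FD, so they must be part of every candidate key.
(ACD)+ = {A, C, D, B, E}
So ACD is the only candidate key.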

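Problems on Normal Forms
• R(A, B, C, D, E)
• F: A -> B
• BC -> E
• ED -> A
In which normal form is the table?
First find the candidate keys (CK).
Start from all attributes and discard those that can be derived from the rest:
ABCDE: discard B (A -> B) to get ACDE; discard A (ED -> A) to get CDE.
(CDE)+ = {A, B, C, D, E}, so CDE is a candidate key. (ACD)+ and (BCD)+ also contain every attribute.
Candidate keys: ACD, ECD, BCD.
Every attribute is part of some candidate key, so each FD satisfies the 3NF condition.
A -> B violates BCNF because A is not a super key. The table is in 3NF but not in BCNF.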

3NF: for every FD X → A, at least one of the following conditions must be satisfied:
i) X → A is a trivial functional dependency.
ii) The LHS X is a super key (every candidate key is also a super key).
iii) The RHS A is part of some candidate key.

BCNF: for every FD X → A, at least one of the following conditions must be satisfied:
i) X → A is a trivial functional dependency.
ii) The LHS X is a super key.
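The 3NF and BCNF conditions can be checked mechanically once the candidate keys are known. The sketch below is illustrative only: it assumes single-letter attributes, takes the candidate keys as input rather than computing them, and relies on the standard result that testing the given FDs (not all of F+) is enough for these two checks. The function names are not from the slides.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds (strings of single letters)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def check_3nf_bcnf(schema, fds, candidate_keys):
    """Return (in_3nf, in_bcnf) for the given schema, FDs and candidate keys."""
    prime = set("".join(candidate_keys))   # attributes that occur in some key
    in_3nf = in_bcnf = True
    for lhs, rhs in fds:
        lhs_is_superkey = closure(lhs, fds) >= set(schema)
        # Attributes of rhs that already sit in lhs form the trivial part.
        for a in set(rhs) - set(lhs):
            if lhs_is_superkey:
                continue
            in_bcnf = False                # non-trivial FD, LHS not a super key
            if a not in prime:
                in_3nf = False             # and the RHS attribute is not prime
    return in_3nf, in_bcnf

fds = [("A", "B"), ("BC", "E"), ("ED", "A")]
print(check_3nf_bcnf("ABCDE", fds, ["ACD", "ECD", "BCD"]))  # (True, False)
```

The result agrees with the worked answer: every right-hand attribute is prime, so 3NF holds, while A -> B has a left side that is not a super key, so BCNF fails.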
Problems on Normal Forms: In which normal form is the table?
• R(ABCDE)
• FDs: {A->BCDE, BC->ACE, D->E}

• R(ABCDE)
• FDs: {AB->CDE, D->A}

Properties of Decomposition
• 1) Lossless join decomposition
• 2) Dependency preserving

1) Lossless join decomposition: Let R be a relation decomposed into R1 and R2. The decomposition is said to
be a lossless join decomposition if joining the decomposed tables gives back exactly the original instance of R.
Three conditions need to be satisfied for a lossless join decomposition:
i) attribute(R1) U attribute(R2) = attribute(R), i.e., the decomposed relations together contain exactly the
attributes of R.
ii) attribute(R1) ∩ attribute(R2) ≠ Φ, i.e., the decomposed relations must have at least one attribute in common.
iii) attribute(R1) ∩ attribute(R2) -> attribute(R1)
or
attribute(R1) ∩ attribute(R2) -> attribute(R2)
or both,
i.e., the common attributes should form a super key (or candidate key) of R1 or R2 or both.
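The three conditions can be tested with an attribute closure. The sketch below is a minimal illustration for a decomposition into two relations; the function names are assumptions, and the only FD used is rating -> hrly_wages from the Hourly_Emps example.

```python
def closure(attrs, fds):
    """Attribute closure; fds is a list of (lhs, rhs) pairs of attribute sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def is_lossless_binary(r, r1, r2, fds):
    """Check conditions i) to iii) for a decomposition of r into r1 and r2."""
    r, r1, r2 = set(r), set(r1), set(r2)
    if r1 | r2 != r:                 # i) no attribute lost or invented
        return False
    common = r1 & r2
    if not common:                   # ii) at least one shared attribute
        return False
    reach = closure(common, fds)     # iii) common attributes determine R1 or R2
    return r1 <= reach or r2 <= reach

R = {"ssn", "name", "lot", "rating", "hrly_wages", "hrs_worked"}
FDS = [({"rating"}, {"hrly_wages"})]

emps2 = R - {"hrly_wages"}
wages = {"rating", "hrly_wages"}
print(is_lossless_binary(R, emps2, wages, FDS))  # True

# Splitting with no shared attribute fails condition ii).
left = {"ssn", "name", "lot"}
right = {"rating", "hrly_wages", "hrs_worked"}
print(is_lossless_binary(R, left, right, FDS))  # False
```

In the first call the common attribute rating determines every attribute of Wages, which is why the Hourly_Emps decomposition shown earlier is lossless.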

Properties of Decomposition
• 2) Dependency Preserving

Let R be a relation schema that is decomposed into two schemas with attribute sets
X and Y, and let F be a set of FDs over R. The projection of F on X is the set of
FDs in the closure F+ (not just F!) that involve only attributes in X. We will denote
the projection of F on attributes X as F_X. Note that a dependency U → V in F+ is
in F_X only if all the attributes in U and V are in X.
The decomposition of relation schema R with FDs F into schemas with attribute sets
X and Y is dependency-preserving if (F_X ∪ F_Y)+ = F+.

Properties of Decomposition
• 2) Dependency Preserving

Whatever dependencies existed in R before decomposition must still be enforceable within the
decomposed relations; then we say the dependency preserving property is maintained.
Let R be the original relation, and F be the set of FDs over R.
R is decomposed into R1 with FDs F1 and R2 with FDs F2.
The decomposition is said to be dependency preserving iff the following holds:
F1 U F2 ≡ F
Let F1 U F2 = G.
Then G ≡ F.
We say that G is equivalent to F iff
G covers F (every FD in F can be inferred from G)
and
F covers G (every FD in G can be inferred from F).
Problems on Lossless join decomposition
1) R(ABCDE)
FDs: AB->CDE, D->A
2) R(ABCDE)
FDs: A->BCDE, BC->ACE, D->E
3) Consider the following instance of R(ABCDE):
A  B  C  D  E
1  1  2  1  3
2  2  2  1  3
3  1  6  3  6
4  2  8  5  7
5  3  9  5  7
Decompositions of this instance:
• 1) R1(ABC), R2(DE): lossy decomposition (no common attribute)
• 2) R1(ABC), R2(CD): lossy decomposition (attribute E is lost)
• 3) R1(ABC), R2(CDE): lossless join decomposition
• 4) R1(ABC), R2(ABDE): lossless join decomposition
• 5) R1(AB), R2(BCDE): lossy decomposition
• 6) R1(AB), R2(CD), R3(DE): lossy decomposition
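The six answers can be reproduced by projecting the five-tuple instance and joining the pieces back together. The sketch below reads each five-digit row as one value per attribute A to E, which is an interpretation of the slide; the helper names are illustrative.

```python
ATTRS = "ABCDE"
ROWS = [(1, 1, 2, 1, 3), (2, 2, 2, 1, 3), (3, 1, 6, 3, 6),
        (4, 2, 8, 5, 7), (5, 3, 9, 5, 7)]
INSTANCE = [dict(zip(ATTRS, row)) for row in ROWS]

def project(relation, attrs):
    """Project a list of dict tuples onto attrs; a set drops duplicates."""
    return {frozenset((a, t[a]) for a in attrs) for t in relation}

def natural_join(rel1, rel2):
    """Natural join of two sets of tuples stored as frozensets of (attr, value)."""
    joined = set()
    for t1 in rel1:
        d1 = dict(t1)
        for t2 in rel2:
            d2 = dict(t2)
            # Tuples combine when they agree on every shared attribute.
            if all(d1[a] == d2[a] for a in d1.keys() & d2.keys()):
                joined.add(frozenset({**d1, **d2}.items()))
    return joined

def recovers_instance(relation, parts):
    """True if joining the projections gives back exactly the original tuples."""
    joined = project(relation, parts[0])
    for part in parts[1:]:
        joined = natural_join(joined, project(relation, part))
    return joined == project(relation, ATTRS)

DECOMPOSITIONS = [["ABC", "DE"], ["ABC", "CD"], ["ABC", "CDE"],
                  ["ABC", "ABDE"], ["AB", "BCDE"], ["AB", "CD", "DE"]]
results = [recovers_instance(INSTANCE, parts) for parts in DECOMPOSITIONS]
print(results)  # [False, False, True, True, False, False]
```

Recovering this one instance does not prove a decomposition lossless in general; that requires the FD-based conditions. A failure on any legal instance, however, is enough to show that a decomposition is lossy.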
Problems on Dependency Preserving decomposition
1) R(ABCDE)
• FDs: {A->B, B->C, C->D, D->A}
• R is decomposed into R1(ABC) and R2(CDE).
• Is the decomposition dependency preserving?
ANSWER: It is a dependency preserving decomposition (already discussed in class). A->B and B->C
lie in R1 and C->D lies in R2; D->A follows from D->C (in R2) and C->A (in R1).

2) R(ABCDE)
• FDs: {A->BCD, B->AE, BC->AED, D->E, C->DE}
• R is decomposed into R1(AB), R2(BC), R3(CDE).
• ANSWER: It is a dependency preserving decomposition (already discussed in class).
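Both answers can be checked without listing the projected FD sets explicitly. The sketch below uses a common textbook procedure, which is an assumption here rather than the method used in class: for each FD X -> Y, grow X using only what each sub-relation can enforce locally, then test whether Y is reached. Attribute names are single letters and the function names are illustrative.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds (strings of single letters)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def closure_within_parts(attrs, fds, parts):
    """Closure of attrs using only FDs that are checkable inside a single part."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for part in parts:
            p = set(part)
            # What the projection of F+ on this part derives from result.
            gained = closure(result & p, fds) & p
            if not gained <= result:
                result |= gained
                changed = True
    return result

def is_dependency_preserving(fds, parts):
    return all(set(rhs) <= closure_within_parts(lhs, fds, parts)
               for lhs, rhs in fds)

fds1 = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]
print(is_dependency_preserving(fds1, ["ABC", "CDE"]))  # True

fds2 = [("A", "BCD"), ("B", "AE"), ("BC", "AED"), ("D", "E"), ("C", "DE")]
print(is_dependency_preserving(fds2, ["AB", "BC", "CDE"]))  # True

# A made-up counterexample (not from the slides): B -> C is lost.
print(is_dependency_preserving([("A", "B"), ("B", "C")], ["AB", "AC"]))  # False
```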

Example3:

R(A, B, C, D)
FDs:
A -> B
B -> C
C -> A
Find all the possible candidate keys.

Search for the attributes that are not present on the RHS of any FD.
D is not present, so D must be in every candidate key.
We therefore start with D and find its closure:
(D)+ = {D}
D alone is not a key, so we take combinations of the other attributes with D:
(AD)+ = {D, A, B, C} → Candidate key
(BD)+ = {D, B, C, A} → Candidate key
(CD)+ = {D, C, A, B} → Candidate key

Therefore the candidate keys are {AD, BD, CD}.

Number of candidate keys = 3
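The search used in these examples can be sketched as follows: attributes that never appear on a right-hand side go into every key, and the remaining attributes are added in growing combinations until the closure covers the schema. This is a brute-force illustration for small schemas, assuming single-letter attributes; the function names are not from the slides.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under fds (strings of single letters)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def candidate_keys(schema, fds):
    """Return all candidate keys as sorted strings."""
    rhs_attrs = set().union(*(set(rhs) for _, rhs in fds))
    core = set(schema) - rhs_attrs          # must be in every candidate key
    rest = sorted(set(schema) - core)
    keys = []
    for size in range(len(rest) + 1):       # smallest combinations first
        for extra in combinations(rest, size):
            candidate = core | set(extra)
            if any(key <= candidate for key in keys):
                continue                    # contains a smaller key: not minimal
            if closure(candidate, fds) >= set(schema):
                keys.append(candidate)
    return ["".join(sorted(key)) for key in keys]

# Example3: R(A, B, C, D) with A -> B, B -> C, C -> A.
print(candidate_keys("ABCD", [("A", "B"), ("B", "C"), ("C", "A")]))
# ['AD', 'BD', 'CD']

# Example2: R(A, B, C, D, E) with A -> B, D -> E.
print(candidate_keys("ABCDE", [("A", "B"), ("D", "E")]))
# ['ACD']
```

Trying combinations in order of size and skipping any that contain an already found key guarantees that only minimal super keys are reported.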

Example 4:

Therefore the candidate keys are {DA, DB, DE, DF}.
Number of candidate keys = 4