
20CS07 Data Base Management Systems
Program & Semester: B.Tech & III SEM

AI&DS
Academic Year: 2023 - 24

UNIT III
Normalization
The following guidelines determine the quality of a relation schema design:
• Making sure that the semantics of the attributes is clear in the schema.
• Reducing the redundant information in tuples.
• Reducing the NULL values in tuples.
• Disallowing the possibility of generating spurious tuples.

Functional Dependency
A functional dependency, denoted by X → Y, between two sets of attributes X and Y
that are subsets of R specifies a constraint on the possible tuples that can form a
relation state r of R. The constraint is that, for any two tuples t1 and t2 in r that have
t1[X] = t2[X], they must also have t1[Y] = t2[Y].

A functional dependency (FD) is a constraint between two sets of attributes in a relation.
This means that the values of the Y component of a tuple in r depend on, or are
determined by, the values of the X component; alternatively, the values of the X
component of a tuple uniquely (or functionally) determine the values of the Y
component.
We also say that there is a functional dependency from X to Y, or that Y is
functionally dependent on X. The abbreviation for functional dependency is FD
or f.d. The set of attributes X is called the Determinant (left-hand side) of the FD,
and Y is called the dependent (right-hand side).

Ex: Zip_code → Area_code
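The definition can be tried out on a small relation instance. The sketch below is only an illustration: the function name fd_holds and the sample values are assumptions, not part of these notes. Note that an instance can only show that an FD is violated; whether an FD holds for the schema is a design decision.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}  # maps each X-value to the Y-value first seen with it
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        if seen.setdefault(x, y) != y:
            return False
    return True


# Hypothetical sample tuples (placeholder values, not real codes).
rows = [
    {"Zip_code": "Z1", "Area_code": "A1"},
    {"Zip_code": "Z2", "Area_code": "A1"},
    {"Zip_code": "Z1", "Area_code": "A1"},
]

print(fd_holds(rows, ["Zip_code"], ["Area_code"]))  # True
print(fd_holds(rows, ["Area_code"], ["Zip_code"]))  # False
```

In this sample, Zip_code → Area_code is not violated, while Area_code → Zip_code is violated because the same area code appears with two different zip codes.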

Consider the relation schema R = (A, B, C, G, H, I) and the following set of functional dependencies:
A → B
A → C
CG → H
CG → I
B → H

Let F be a set of functional dependencies. The closure of F, denoted by F+, is the set of all functional dependencies logically implied by F. Given F, we can compute F+ directly from the formal definition of functional dependency.
Armstrong's Axioms
Axioms, or rules of inference, provide a simpler technique for reasoning
about functional dependencies.

Reflexivity rule
If α is a set of attributes and β ⊆ α, then α → β holds.
Augmentation rule
If α → β holds and γ is a set of attributes, then γα → γβ holds.
Transitivity rule
If α → β holds and β → γ holds, then α → γ holds.
Union rule
If α → β holds and α → γ holds, then α → βγ holds.
Decomposition rule
If α → βγ holds, then α → β holds and α → γ holds.
Pseudotransitivity rule
If α → β holds and γβ → δ holds, then αγ → δ holds.
Let us apply our rules to the example of schema R = (A, B, C, G, H, I) and the set F of functional dependencies {A → B, A → C, CG → H, CG → I, B → H}.

We can list several members of F+ here:
A → H. Since A → B and B → H hold, we apply the transitivity rule. Observe that it is much easier to use Armstrong's axioms to show that A → H holds than it would be to argue directly from the definitions.
CG → HI. Since CG → H and CG → I, the union rule implies that CG → HI.
AG → I. Since A → C and CG → I, the pseudotransitivity rule implies that AG → I holds.
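These memberships can also be checked mechanically. A minimal sketch, assuming the attribute-closure test described later in these notes (X → Y is in F+ exactly when Y ⊆ X+); the function names closure and implies are illustrative, and each single-letter attribute set is written as a string.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def implies(fds, lhs, rhs):
    """True if lhs -> rhs is a member of F+."""
    return set(rhs) <= closure(lhs, fds)


# F from the example: each string is a set of single-letter attributes.
F = [("A", "B"), ("A", "C"), ("CG", "H"), ("CG", "I"), ("B", "H")]

print(implies(F, "A", "H"))    # True  (transitivity)
print(implies(F, "CG", "HI"))  # True  (union)
print(implies(F, "AG", "I"))   # True  (pseudotransitivity)
print(implies(F, "B", "A"))    # False (not implied by F)
```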

Types of Functional Dependencies
• Trivial functional dependency
• Non-trivial functional dependency
• Multivalued dependency
• Transitive dependency
Trivial functional dependency
The dependency of an attribute on a set of attributes is known as a trivial functional dependency if the set of determining attributes includes the dependent attribute.
Symbolically:
A -> B is a trivial functional dependency if B is a subset of A.
The following dependencies are also trivial: A -> A and B -> B.

For example:
Consider a table with two columns Student_Id and Student_Name.
{Student_Id, Student_Name} -> Student_Id is a trivial functional dependency, as Student_Id is a subset of {Student_Id, Student_Name}.

That makes sense, because if we know the values of Student_Id and Student_Name then the value of Student_Id can be uniquely determined.

Also, Student_Id -> Student_Id and Student_Name -> Student_Name are trivial dependencies.
Non-trivial functional dependency
If a functional dependency X -> Y holds where Y is not a subset of X, then this dependency is called a non-trivial functional dependency.
For example:
Consider an employee table with three attributes: emp_id, emp_name, emp_address. The following functional dependencies are non-trivial:
emp_id -> emp_name (emp_name is not a subset of emp_id)
emp_id -> emp_address (emp_address is not a subset of emp_id)

On the other hand, the following dependency is trivial:
{emp_id, emp_name} -> emp_name (emp_name is a subset of {emp_id, emp_name})

Multivalued dependency:
MVD or multivalued dependency means that for a single value of attribute ‘a’ multiple values of attribute ‘b’ exist. A multivalued dependency occurs when there is more than one independent multivalued attribute in a table.
For example:
Consider a bike manufacturing company that produces two colors (Black and Red) of each model every year.

bike_model  manuf_year  color
M1001       2007        Black
M1001       2007        Red
M2012       2008        Black
M2012       2008        Red
M2222       2009        Black
M2222       2009        Red

Here the columns manuf_year and color are independent of each other and dependent on bike_model. In this case these two columns are said to be multivalued dependent on bike_model. These dependencies can be represented like this:
bike_model ->-> manuf_year
bike_model ->-> color
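A multivalued dependency X ->-> Y can be tested on a relation instance using the usual tuple-swapping definition: whenever two tuples agree on X, the tuple that takes Y from the first and the remaining attributes from the second must also be present. The sketch below applies this to the bike table; the function name mvd_holds and the extra row used as a counter-example are assumptions made for illustration.

```python
def mvd_holds(rows, attrs, x, y):
    """Check X ->-> Y on an instance given as a list of dicts."""
    present = {tuple(r[a] for a in attrs) for r in rows}
    for t1 in rows:
        for t2 in rows:
            if all(t1[a] == t2[a] for a in x):
                # X is shared, Y comes from t1, everything else from t2.
                t3 = {**t2, **{a: t1[a] for a in y}}
                if tuple(t3[a] for a in attrs) not in present:
                    return False
    return True


attrs = ["bike_model", "manuf_year", "color"]
bikes = [
    dict(zip(attrs, values))
    for values in [
        ("M1001", 2007, "Black"), ("M1001", 2007, "Red"),
        ("M2012", 2008, "Black"), ("M2012", 2008, "Red"),
        ("M2222", 2009, "Black"), ("M2222", 2009, "Red"),
    ]
]

print(mvd_holds(bikes, attrs, ["bike_model"], ["color"]))       # True
print(mvd_holds(bikes, attrs, ["bike_model"], ["manuf_year"]))  # True

# Hypothetical extra row: M1001 in 2008 exists in Black only, so the
# year and color are no longer independent for that model.
broken = bikes + [dict(zip(attrs, ("M1001", 2008, "Black")))]
print(mvd_holds(broken, attrs, ["bike_model"], ["color"]))      # False
```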
Transitive dependency:
A functional dependency is said to be transitive if it is indirectly formed by two functional dependencies.
X -> Z is a transitive dependency if the following three conditions hold true:
⚫ X -> Y
⚫ Y does not -> X
⚫ Y -> Z
Example:

Book                Author               Author_age
Game of Thrones     George R. R. Martin  66
Harry Potter        J. K. Rowling        49
Dying of the Light  George R. R. Martin  66

{Book} -> {Author} (if we know the book, we know the author's name)
{Author} does not -> {Book}
{Author} -> {Author_age}

Therefore, as per the rule of transitive dependency, {Book} -> {Author_age} should hold. That makes sense, because if we know the book name we can know the author's age.
Anomalies in DBMS
There are three types of anomalies that occur when the database is not normalized: insertion, update and deletion anomalies.

Update anomaly:
Consider an employee table that has two rows for employee Rick, as he belongs to two departments of the company. If we want to update the address of Rick, we have to update it in both rows or the data will become inconsistent. If the correct address gets updated in one department's row but not in the other, then as per the database Rick would have two different addresses, which is not correct.
Insert anomaly:
Suppose a new employee joins the company who is under training and not currently assigned to any department. We would not be able to insert the data into the table if the emp_dept field does not allow nulls.
Delete anomaly:
Suppose the company at some point closes the department D890. Deleting the rows that have emp_dept as D890 would also delete the information of employee Maggie, since she is assigned only to this department.
To overcome these anomalies, we need to normalize the data.

Normalization
Normalization is the process of organizing the data in a database to avoid data redundancy, which solves the problems of insertion, update and deletion anomalies. The most commonly used normal forms are:

• First normal form (1NF)
• Second normal form (2NF)
• Third normal form (3NF)
• Boyce-Codd normal form (BCNF)
First Normal form (1NF)
It states that the domain of an attribute must include only atomic
(simple, indivisible) values and the value of any attribute in a tuple must
be a single value from the domain of that attribute.
As per the rule of first normal form, an attribute (column) of a table cannot hold
multiple values. The only attribute values permitted by 1NF are single atomic
(or indivisible) values.

After applying 1NF


Second Normal Form (2NF)
Second normal form (2NF) is based on the concept of full functional dependency.

A functional dependency X → Y is a full functional dependency if removal of any attribute A from X means that the dependency does not hold any more; that is, for any attribute A ∈ X, (X - {A}) does not functionally determine Y.

A functional dependency X → Y is a partial dependency if some attribute A ∈ X can be removed from X and the dependency still holds; that is, for some A ∈ X, (X - {A}) → Y.
Partial Dependency
If a proper subset of a candidate key determines a non-prime attribute, it is called a partial dependency.

Prime attribute: An attribute that is part of a candidate key is known as a prime attribute.

Non-prime attribute: An attribute that is not part of any candidate key is said to be a non-prime attribute.
Definition
A relation schema R is said to be in 2NF if it is in 1NF and every non-prime attribute of R is fully functionally dependent on the primary key of R (there should be no partial dependencies).

Candidate key: {teacher_id, subject}
Primary key: {teacher_id, subject}
Non-prime attribute: teacher_age

The table is in 1NF because each attribute has atomic values. However, it is not in 2NF because the non-prime attribute teacher_age depends on teacher_id alone, which is a proper subset of the candidate key, i.e., teacher_id → teacher_age. This violates the rule for 2NF, which says “no non-prime attribute is dependent on a proper subset of any candidate key of the table”.
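The 2NF test for the teacher example can be sketched in code: for every proper subset of a candidate key, compute its closure and report any non-prime attribute it determines. The helper names closure and partial_dependencies are illustrative assumptions; the candidate key and the FD are the ones stated above.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def partial_dependencies(candidate_keys, fds):
    """List (key_part, attribute) pairs that violate 2NF."""
    prime = set().union(*candidate_keys)
    found = []
    for key in candidate_keys:
        for size in range(1, len(key)):
            for part in combinations(sorted(key), size):
                determined = closure(part, fds) - set(part) - prime
                for attr in sorted(determined):
                    found.append((set(part), attr))
    return found


keys = [{"teacher_id", "subject"}]
fds = [({"teacher_id"}, {"teacher_age"})]

print(partial_dependencies(keys, fds))
# [({'teacher_id'}, 'teacher_age')]
```

An empty result would mean that no partial dependency was found, i.e. the relation satisfies the 2NF condition with respect to the given keys and FDs.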
Third Normal Form (3NF)
According to Codd's original definition, a relation schema R is in 3NF if it satisfies 2NF and no non-prime attribute of R is transitively dependent on the primary key (it should not have transitive dependencies).

A table design is said to be in 3NF if both of the following conditions hold:
• The table must be in 2NF.
• There must be no transitive functional dependency of a non-prime attribute on any super key.

Equivalently, a relation is in 3NF if at least one of the following conditions holds for every non-trivial functional dependency X -> Y:
• X is a super key.
• Y is a prime attribute (each element of Y is part of some candidate key).
emp_id  emp_name  emp_zip  emp_state  emp_city  emp_district
1001    John      282005   UP         Agra      Dayal Bagh
1002    Ajeet     222008   TN         Chennai   M-City
1006    Lora      282007   TN         Chennai   Urrapakkam
1101    Lilly     292008   UK         Pauri     Bhagwan
1201    Steve     222999   MP         Gwalior   Ratan

Here, emp_state, emp_city and emp_district are dependent on emp_zip, and emp_zip is dependent on emp_id. That makes the non-prime attributes (emp_state, emp_city and emp_district) transitively dependent on the super key (emp_id). This violates the rule of 3NF.

To make this table comply with 3NF, we have to break it into two tables to remove the transitive dependency:
Employee table
emp_id  emp_name  emp_zip
1001    John      282005
1002    Ajeet     222008
1006    Lora      282007
1101    Lilly     292008
1201    Steve     222999

Employee_zip table
emp_zip  emp_state  emp_city  emp_district
282005   UP         Agra      Dayal Bagh
222008   TN         Chennai   M-City
282007   TN         Chennai   Urrapakkam
292008   UK         Pauri     Bhagwan
222999   MP         Gwalior   Ratan


Boyce-Codd normal form (BCNF)
It is an advanced version of 3NF, which is why it is also referred to as 3.5NF. BCNF is stricter than 3NF. A table complies with BCNF if it is in 3NF and, for every non-trivial functional dependency X -> Y, X is a super key of the table.

EMP_ID  EMP_COUNTRY  EMP_DEPT_NO  DEPT_TYPE  EMP_DEPT
264     India        283          D394       Designing
264     India        300          D394       Testing
364     UK           232          D283       Stores
364     UK           549          D283       Developing

Functional dependencies in the table above:
emp_id -> emp_country
emp_dept_no -> {dept_type, emp_dept}
Candidate key: {emp_id, emp_dept_no}
The table is not in BCNF, as neither emp_id nor emp_dept_no alone is a key. To make the table comply with BCNF we can break it into three tables like this:
EMP_ID  EMP_COUNTRY
264     India
364     UK

EMP_DEPT    DEPT_TYPE  EMP_DEPT_NO
Designing   D394       283
Testing     D394       300
Stores      D283       232
Developing  D283       549

EMP_ID  EMP_DEPT_NO
264     283
264     300
364     232
364     549
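The BCNF condition for this example can be checked with a short sketch: an FD violates BCNF when it is non-trivial and the closure of its left-hand side is not the whole attribute set. The function names and the way each decomposed table is paired with its FDs are assumptions for illustration; in general the FDs of a decomposed table come from projecting F+, which this sketch does not do.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(attributes, fds):
    """Return the non-trivial FDs whose determinant is not a super key."""
    all_attrs = set(attributes)
    return [
        (lhs, rhs) for lhs, rhs in fds
        if not set(rhs) <= set(lhs) and closure(lhs, fds) != all_attrs
    ]


fd_country = ({"emp_id"}, {"emp_country"})
fd_dept = ({"emp_dept_no"}, {"dept_type", "emp_dept"})

original = {"emp_id", "emp_country", "emp_dept_no", "dept_type", "emp_dept"}
print(len(bcnf_violations(original, [fd_country, fd_dept])))  # 2

# The three decomposed tables, each with the FDs that apply to it.
print(bcnf_violations({"emp_id", "emp_country"}, [fd_country]))             # []
print(bcnf_violations({"emp_dept", "dept_type", "emp_dept_no"}, [fd_dept])) # []
print(bcnf_violations({"emp_id", "emp_dept_no"}, []))                       # []
```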

Closure of Functional Dependencies


The set of all functional dependencies that can be derived from a given set F is called the closure of F, denoted by F+. To compute the closure, we repeatedly apply the rules called Armstrong's axioms to the given FDs.
Consider a relation R(A, B, C, D) and F = {A → BC, B → CD}.
Let us consider A → BC. From this functional dependency we can derive A → B and A → C by using the decomposition rule.

From B → CD we can derive B → C and B → D by using the decomposition rule.
Let us consider A → B and B → C. From these two functional dependencies we can derive A → C by applying the transitivity rule.
From A → B and B → D we can derive A → D.
So the non-trivial dependencies derived so far in F+ are {A → BC, B → CD, A → B, A → C, B → C, B → D, A → D}. The complete F+ also contains the trivial dependencies and those obtained by further applications of the union and augmentation rules (for example A → BCD).

Attribute Closure
Let XY be the Functional Dependency. Attribute closure of X denoted
with X+ with respect to F is the set of all attributes A such that XA can
be inferred using Armstrong Axioms.

Algorithm
Closure= X;
Repeat until there is no change in closure
{
If there is an FD UV in F
Such that U ⊆ Closure,
Then set Closure=
Closure 𝖴 V.
}
ABC, BCD.
Super Key
Let R be a relational schema and X be a set of attributes of R. If X+ contains all the attributes of R, then X is said to be a super key of R.

To identify super keys, we follow these steps:

• Compute the closure of the attributes or combinations of attributes on the LHS of each functional dependency.
• If any closure includes all the attributes, then that attribute set can be declared a key for the table.

Let R(ABCDE) be a relational schema with the following functional dependencies:

AB → C
DE → B
CD → E
Step 1:
Identify the closure of the LHS of each FD:
(AB)+ = ABC
(DE)+ = BDE
(CD)+ = BCDE
No super key is found in Step 1.
Step 2:
If no super key is found in Step 1, find a new key by applying the augmentation rule until all attributes appear in the closure. We choose (CD)+ because it contains more attributes than the others, i.e. CDEB.
(ACD)+ = ABCDE {by the augmentation rule}
Hence ACD determines all the attributes of R, so ACD is a super key.

As ACD is a super key, every combination of ACD with the remaining attributes is also a super key. So the super keys are:
ACDB,
ACDE,
ACDBE.
Step 3:
Identify more super keys from the new super key (ACD) by applying the pseudotransitivity rule: check the other functional dependencies whose RHS contains an attribute of the new super key, and replace that attribute by the LHS of the dependency.
There is only one, i.e. AB → C. Replacing C by AB gives a key that is certainly a super key, but not necessarily an irreducible one:
A(AB)D = ABD: super key
Other super keys will be
ABDE,
ABDC, (already found)
ABDCE. (already found)
Repeat the procedure for the new super key (ABD) until we get all super keys.
Step 3 continued (to find super keys from ABD):
For ABD we again have a functional dependency, i.e. {DE → B}.
So A(DE)D = ADE: super key
Other super keys will be
ADEB, (already found)
ADEC, (already found)
ADEBC. (already found)
After finding all super keys we need to find the candidate keys. A candidate key is a super key none of whose proper subsets is a super key. Here the candidate keys are ACD, ABD and ADE.
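For a small schema the super keys and candidate keys can also be found by brute force: test every attribute subset with the closure algorithm and keep the minimal ones. This is a different, exhaustive technique from the step-by-step procedure above, sketched here for R(ABCDE) with the same FDs; the function names are illustrative assumptions.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def super_keys(attributes, fds):
    """Every attribute subset whose closure is the whole schema."""
    attrs = sorted(attributes)
    return [
        set(combo)
        for size in range(1, len(attrs) + 1)
        for combo in combinations(attrs, size)
        if closure(combo, fds) == set(attrs)
    ]


def candidate_keys(attributes, fds):
    """Super keys with no proper subset that is also a super key."""
    keys = super_keys(attributes, fds)
    return [k for k in keys if not any(other < k for other in keys)]


F = [("AB", "C"), ("DE", "B"), ("CD", "E")]

print(len(super_keys("ABCDE", F)))  # 7
print(sorted("".join(sorted(k)) for k in candidate_keys("ABCDE", F)))
# ['ABD', 'ACD', 'ADE']
```

The exhaustive search examines 2^n subsets, so it is only practical for schemas with a handful of attributes.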

Problem 1
Given a relation R(A, B, C, D) and functional dependency set FD = {AB → CD, B → C}, determine whether R is in 2NF. If not, convert it into 2NF.
Problem 2
Given a relation R(P, Q, R, S, T) and functional dependency set FD = {PQ → R, S → T}, determine whether R is in 2NF. If not, convert it into 2NF.
Problem 3
Data Redundancy
Data redundancy in a database means that some data fields are repeated in the database. This repetition may occur either if a field is repeated in two or more tables or if the field is repeated within a table.
Disadvantages of data redundancy
1. Increases the size of the database unnecessarily.
2. Causes data inconsistency.
3. Decreases the efficiency of the database.
4. May cause data corruption.

Decomposition:
Decomposition is the process of dividing a relation into tables in order to remove anomalies. It makes data easier to find in the database and removes inconsistency and duplication. Decomposition means replacing a relation with a collection of smaller relations.
Properties of Decomposition: While decomposing a relation R into R1, R2, …, Rn, the decomposition must satisfy the following properties:

• Lossless Join Decomposition.
• Dependency Preserving Decomposition.
Lossless Join Decomposition
Assume a relation R with a set of functional dependencies F. If R is decomposed into relations R1 and R2, then this decomposition is said to be a lossless decomposition (or lossless-join decomposition) if and only if at least one of the following functional dependencies holds in F+, the closure of F:
R1 ∩ R2 → R1
or
R1 ∩ R2 → R2
R1 ∩ R2 gives the attribute or set of attributes used in joining R1 and R2. The above functional dependencies ensure that the attributes involved in the natural join of R1 and R2 form a super key of at least one of the relations R1 and R2.
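The test above reduces to one closure computation: take the common attributes of R1 and R2 and check whether their closure covers R1 or R2. A minimal sketch, borrowing the relation R(A, B, C, D) with F = {A -> B, A -> C, C -> D} that these notes use in the dependency-preservation example; the function names and the second, lossy split are assumptions for illustration.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_lossless_binary(r1, r2, fds):
    """True if (R1 ∩ R2) -> R1 or (R1 ∩ R2) -> R2 is in F+."""
    common_closure = closure(set(r1) & set(r2), fds)
    return set(r1) <= common_closure or set(r2) <= common_closure


F = [("A", "B"), ("A", "C"), ("C", "D")]

# R1 = (A, B, C), R2 = (C, D): common attribute C, and C+ = CD covers R2.
print(is_lossless_binary("ABC", "CD", F))  # True

# Hypothetical alternative split: common attribute B, and B+ = B only.
print(is_lossless_binary("ABD", "BC", F))  # False
```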
Consider the following relation R(A, B, C):

R(A, B, C)
A  B  C
1  2  1
2  5  3
3  3  3

Suppose this relation is decomposed into two sub-relations R1(A, B) and R2(B, C):

R1(A, B)
A  B
1  2
2  5
3  3

R2(B, C)
B  C
2  1
5  3
3  3

The natural join of R1 and R2 on B gives back exactly the three tuples of R, so this decomposition is lossless.
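The decomposition above can be checked directly on the data by projecting R onto the two schemas and joining the projections again. The sketch below does this; the helper names are illustrative, and the second split, into (A, C) and (B, C), is a hypothetical alternative added only to show how spurious tuples arise.

```python
def project(rows, attrs):
    """Projection onto attrs, with duplicate rows removed."""
    distinct = {tuple(row[a] for a in attrs) for row in rows}
    return [dict(zip(attrs, values)) for values in distinct]


def natural_join(r1, r2):
    """Join two lists of dict rows on all attribute names they share."""
    joined = []
    for t1 in r1:
        for t2 in r2:
            shared = set(t1) & set(t2)
            if all(t1[a] == t2[a] for a in shared):
                joined.append({**t1, **t2})
    return joined


def as_tuples(rows, attrs):
    """Order-independent view of a relation, for comparison."""
    return {tuple(row[a] for a in attrs) for row in rows}


ABC = ["A", "B", "C"]
R = [dict(zip(ABC, values)) for values in [(1, 2, 1), (2, 5, 3), (3, 3, 3)]]

good = natural_join(project(R, ["A", "B"]), project(R, ["B", "C"]))
print(as_tuples(good, ABC) == as_tuples(R, ABC))  # True: lossless

# Hypothetical alternative split on C, which is not unique in R.
bad = natural_join(project(R, ["A", "C"]), project(R, ["B", "C"]))
print(sorted(as_tuples(bad, ABC) - as_tuples(R, ABC)))
# [(2, 3, 3), (3, 5, 3)]  <- spurious tuples
```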
Dependency Preserving Decomposition
Dependency preservation is another property of a decomposed relational database schema D: each functional dependency X -> Y specified in F either appears directly in one of the relation schemas Ri of D or can be inferred from the dependencies that appear in the Ri.

The dependencies should be preserved because each dependency in F represents a constraint on the database. If a decomposition is not dependency-preserving, some dependency is lost in the decomposition.
Example:
Let a relation R(A, B, C, D) and a set of FDs F = {A -> B, A -> C, C -> D} be given.
R is decomposed into
R1 = (A, B, C) with FDs F1 = {A -> B, A -> C},
and R2 = (C, D) with FDs F2 = {C -> D}.
F' = F1 ∪ F2 = {A -> B, A -> C, C -> D}, so F' = F.
And so F'+ = F+, i.e. the decomposition is dependency preserving.
Algorithm
Input: X → Y in F and a decomposition {R1, R2, …, Rn} of R
Output: return true if X → Y is in G+ (G being the union of the projections of F onto the Ri), i.e., Y is a subset of Z; else return false
Begin
    Z := X;
    while changes to Z occur do
        for i := 1 to n do
            Z := Z ∪ ((Z ∩ Ri)+ ∩ Ri)    (closure taken w.r.t. F);
    if Y is a subset of Z then return true
    else return false;
End;
Example
R(ABCDEF) has the following FDs: F = {A → BCD, A → EF, BC → AD, BC → E, BC → F, B → F, D → E}, and the decomposition D = {ABCD, BF, DE}. Check whether the decomposition is dependency preserving or not.

Check whether BC → E is preserved in the decomposition or not.
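The algorithm can be sketched in Python and applied to this example. The function names closure and preserves are illustrative assumptions; Z is grown by repeatedly adding (Z ∩ Ri)+ ∩ Ri for each schema Ri until nothing changes.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def preserves(lhs, rhs, decomposition, fds):
    """True if lhs -> rhs can be enforced using the decomposed schemas."""
    z = set(lhs)
    changed = True
    while changed:
        changed = False
        for schema in decomposition:
            ri = set(schema)
            gained = closure(z & ri, fds) & ri  # (Z ∩ Ri)+ ∩ Ri w.r.t. F
            if not gained <= z:
                z |= gained
                changed = True
    return set(rhs) <= z


F = [("A", "BCD"), ("A", "EF"), ("BC", "AD"), ("BC", "E"),
     ("BC", "F"), ("B", "F"), ("D", "E")]
D = ["ABCD", "BF", "DE"]

print(preserves("BC", "E", D, F))                  # True
print(all(preserves(l, r, D, F) for l, r in F))    # True
```

For BC → E the set Z starts as BC, grows to ABCD through the schema ABCD, gains F through BF and finally E through DE, so the dependency is preserved.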


How to find the highest normal form of a relation
Steps to find the highest normal form of a relation:
1. Find all possible candidate keys of the relation.
2. Divide all attributes into two categories: prime attributes and non-prime attributes.
3. Check for 1st normal form, then 2nd, and so on. If the relation fails to satisfy the nth normal form condition, its highest normal form is n-1.
Example
Find the highest normal form of a relation R(A, B, C, D, E) with FD set {BC -> D, AC -> BE, B -> E}.
Step 1.
(AC)+ = {A, C, B, E, D}, but none of its proper subsets can determine all attributes of the relation, so AC is a candidate key. A and C cannot be derived from any other attribute of the relation, so there is only one candidate key, {AC}.

Step 2.
Prime attributes are those that are part of a candidate key, {A, C} in this example; the others, {B, D, E}, are non-prime.
Step 3.
The relation R is in 1st normal form, as a relational DBMS does not allow multi-valued or composite attributes.

The relation is in 2nd normal form because no FD makes a non-prime attribute depend on a proper subset of the candidate key: in BC -> D, BC is not a proper subset of AC; in AC -> BE, AC is the candidate key itself; and in B -> E, B is not a proper subset of AC.

The relation is not in 3rd normal form because in BC -> D neither is BC a super key nor is D a prime attribute, and in B -> E neither is B a super key nor is E a prime attribute. To satisfy 3rd normal form, either the LHS of each FD must be a super key or the RHS must be a prime attribute.

So the highest normal form of the relation is 2nd normal form.
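The three steps can be put together in one sketch. It assumes the relation is already in 1NF, finds candidate keys by exhaustive search, and then applies the 2NF, 3NF and BCNF conditions stated in these notes to the given FDs; all function names are illustrative assumptions.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(attributes, fds):
    """Minimal attribute sets whose closure is the whole schema."""
    attrs, keys = sorted(attributes), []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(attrs, size):
            s = set(combo)
            # Smaller keys were found first, so skip their supersets.
            if closure(s, fds) == set(attrs) and not any(k <= s for k in keys):
                keys.append(s)
    return keys


def highest_normal_form(attributes, fds):
    """Return '1NF', '2NF', '3NF' or 'BCNF' (1NF is assumed to hold)."""
    all_attrs = set(attributes)
    keys = candidate_keys(attributes, fds)
    prime = set().union(*keys)
    # 2NF: no proper subset of a key determines a non-prime attribute.
    for key in keys:
        for size in range(1, len(key)):
            for part in combinations(sorted(key), size):
                if closure(part, fds) - set(part) - prime:
                    return "1NF"
    nontrivial = [(set(l), set(r)) for l, r in fds if not set(r) <= set(l)]
    # 3NF: the LHS is a super key or the RHS attributes are prime.
    for lhs, rhs in nontrivial:
        if closure(lhs, fds) != all_attrs and not (rhs - lhs) <= prime:
            return "2NF"
    # BCNF: the LHS of every non-trivial FD is a super key.
    if any(closure(lhs, fds) != all_attrs for lhs, _ in nontrivial):
        return "3NF"
    return "BCNF"


F = [("BC", "D"), ("AC", "BE"), ("B", "E")]
print(candidate_keys("ABCDE", F))       # [{'A', 'C'}] (set order may vary)
print(highest_normal_form("ABCDE", F))  # 2NF
```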


Find the highest normal form of a relation R(A, B, C, D, E) with FD set {A -> D, B -> A, BC -> D, AC -> BE}.

Find the highest normal form of a relation R(P, Q, R, S, T) with functional dependency set {Q -> P, P -> R, QR -> S, PR -> QT}.
Fourth Normal Form (4NF):
A relation R is in Fourth Normal Form (4NF) if and only if the following conditions are satisfied simultaneously:
• R is already in 3NF or BCNF.
• R contains no non-trivial multi-valued dependencies other than those whose determinant is a super key.

Multi-Valued Dependency (MVD):


MVD is the dependency where one attribute value is potentially a 'multi-valued fact' about another. An MVD occurs when there is more than one independent multivalued attribute in a table.
To understand it clearly, consider a table with car_model, manf_year and color:

car_model  manf_year  color
H001       2017       Metallic
H001       2017       Green
H005       2018       Metallic
H005       2018       Blue
H010       2015       Metallic
H033       2012       Gray

In this example, manf_year and color are independent of each other but dependent on car_model. These two columns are said to be multivalue dependent on car_model.

This dependence can be represented like this:
car_model ->-> manf_year
car_model ->-> color

To remove the MVD and bring the above table into 4NF, we divide it into two tables, with (car_model, manf_year) as one table and (car_model, color) as the other.
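The split into two tables can be sketched with plain projections, followed by a check that joining them on car_model gives back the original rows. The variable names are illustrative assumptions; the data is the table above.

```python
# The original table as a set of (car_model, manf_year, color) tuples.
cars = {
    ("H001", 2017, "Metallic"), ("H001", 2017, "Green"),
    ("H005", 2018, "Metallic"), ("H005", 2018, "Blue"),
    ("H010", 2015, "Metallic"), ("H033", 2012, "Gray"),
}

# Projections: one table per independent multivalued fact.
model_year = {(model, year) for model, year, _ in cars}
model_color = {(model, color) for model, _, color in cars}

# Natural join of the two projections on car_model.
rejoined = {
    (model, year, color)
    for model, year in model_year
    for other_model, color in model_color
    if model == other_model
}

print(len(model_year), len(model_color))  # 4 6
print(rejoined == cars)                   # True: no rows lost or invented
```

The year of each model is now stored once instead of once per color, and the join still reproduces the original table.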
Fifth Normal Form (5NF)
A relation R is said to be in fifth normal form if and only if it satisfies the following conditions:
• R is already in 4NF.
• R has no join dependency other than those implied by its candidate keys, i.e. it cannot be decomposed any further into smaller relations without loss.
In other words, if we decompose the table further to eliminate redundancy and anomalies, then re-joining the decomposed tables by means of candidate keys must neither lose original data nor create new records.

Example

Consider the below schema, “if a company makes a product and an agent is an agent
for that company, then he always sells that product for the company”. Under these
circumstances, the ACP table is shown as:
The relation ACP is again decomposed into 3 relations. Now, the natural join of all the three relations will be shown as:
Result of the natural join of R1 and R3 over ‘Company’, and then the natural join of R13 and R2 over ‘Agent’ and ‘Product’, is given below.

Hence, in this example, all the redundancies are eliminated, and the
decomposition of ACP is a lossless join decomposition. Hence the relation is in
5NF as it does not violate the property of lossless join.
Inclusion Dependencies
An inclusion dependency is a statement that some columns of a relation are contained in other columns (usually of a second relation).
Example: a foreign key.
• We should not split groups of attributes that participate in an inclusion dependency. For example, if AB is inclusion dependent on CD, that is AB ⊆ CD, then while decomposing the schema that contains AB we should ensure that at least one of the schemas obtained in that decomposition contains both A and B.
• Most inclusion dependencies are key-based.
• In an E-R diagram, IS A hierarchies also lead to key-based inclusion dependencies.
