Professional Documents
Culture Documents
Normalization
Ahmed M. Zeki
ITIS 213
Objectives
The purpose of normalization.
How normalization can be used when designing a relational database.
The potential problems associated with redundant data in base relations.
The concept of functional dependency, which describes the relationship
between attributes.
The characteristics of functional dependencies used in normalization.
How to identify functional dependencies for a given relation.
How functional dependencies identify the primary key for a relation.
How to undertake the process of normalization.
How normalization uses functional dependencies to group attributes into
relations that are in a known normal form.
How to identify the most commonly used normal forms, namely First Normal
Form (1NF), Second Normal Form (2NF), and Third Normal Form (3NF).
The problems associated with relations that break the rules of 1NF, 2NF, or
3NF.
How to represent attributes shown on a form as 3NF relations using
normalization.
2 / 56
Introduction
The main objective of DB design is to create
an accurate representation of:
◦ data
◦ relationships between the data
◦ constraints on the data that is pertinent to the
enterprise
3 / 56
Introduction
Attributes describe some property of the data
or of the relationships between the data that
is important to the enterprise.
Normalization examines the relationships
Purpose:
◦ To identify a suitable set of relations that support
the data requirements of an enterprise.
5 / 56
The Purpose of Normalization
Characteristics of a suitable set of relations:
6 / 56
The Purpose of Normalization
Benefit of a DB that has a suitable set of
relations:
◦ The DB will be easier for the user to access
◦ Easier to maintain the data
◦ Take up minimal storage space
7 / 56
Data Redundancy and Update
Anomalies
Major aim of RDB design is to group
attributes into relations to minimize data
redundancy.
benefit:
◦ Updates to the data stored in the DB are achieved
with a minimal number of operations reducing
the opportunities for data inconsistencies.
8 / 56
Data Redundancy and Update
Anomalies
RDB relies on the existence of a certain
amount of data redundancy.
This redundancy is in the form of copies of
StaffBranch is the
alternative format of the
Staff and Branch
relations, which contains
redundancy.
9 / 56
Problems Associated with Data
Redundancy
Relations that have redundant data may have
problems called update anomalies, which are
classified as:
◦ Insertion anomalies
◦ Deletion anomalies
◦ Modification anomalies
10 / 56
Insertion Anomalies
Types:
1. Every time we add a new staff into the
StaffBranch, we must include the correct branch.
May lead to inconsistency !
While Staff and Branch relations don’t suffer from this
potential inconsistency
2. To insert details of a new branch that currently
has no staff into the StaffBranch relation, it is
necessary to enter nulls into the attributes for
staff, such as staffNo. But staffNo is the PK
filling it with nulls will violates entity integrity,
which is not allowed!
While Staff and Branch relations don’t suffer from this
problem
11 / 56
Deletion Anomalies
Deleting a tuple from StaffBranch which is the
last member of staff located at a branch
the details about that branch are lost from DB
◦ While Staff and Branch relations don’t suffer from
this problem
12 / 56
Modification Anomalies
Updating the address of a branch we must
update the tuples of all staff located at that
branch, if not inconsistency
13 / 56
Modification Anomalies
Prosperities of decomposition:
◦ Loosless-join property: ensures that any instance of
the original relation can be identified from
corresponding instances in the smaller relations
14 / 56
Functional Dependencies (FD)
Describe the relationship between attributes
in a relation.
◦ Ex: If A and B are attributes of relation R, B is
functionally dependent on A (represented as A
B), if each value of A is associated with exactly one
value of B.
15 / 56
Functional Dependencies (FD)
FD is a property of the meaning (semantics)
of the attributes in a relation. The semantics
indicate how attributes relate to one another,
and specify the FDs between attributes.
16 / 56
Functional Dependencies (FD)
A B means
◦ for a given value of A we find only one value of B
(i.e. when two tuples have the same value of A they
must have the same value of B)
◦ but for a given value of B there may be several
values of A
17 / 56
Ex1: FD
For a given staffNo we can
determine the position of that staff
◦ Ex: SL21 Manager but Manager ?
i.e staffNo functionally determines position (1:1)
but not the opposite (1:*)
In Normalization we are
interested to identify FDs
between attributes of a
relation that have a 1:1
relationship between the
attributes that makes up the
determinant (left) and the
attributes (right). 18 / 56
Ex2 FD that holds for all time
19 / 56
Ex2: FD that Holds for All Time
So we have to clearly understand the purpose
of each attribute in that relation.
◦ Ex: the purpose of staffNo is to uniqely identify
each staff, but the purpose of sName is to hold the
names of staff.
20 / 56
Full FD
The determinants should have the minimal
number of attributes necessary to maintain
the FD with the attributes on the right hand-
side.
If A and B are attributes in a relation, B is fully
22 / 56
Summary: Characteristics of FDs we
use in Normalization
There is a 1:1 relationship between the
attribute(s) on the left side and those on the
right side of the FD (not the other direction).
They hold for all time
The determinant has the minimal number of
23 / 56
Transitive Dependency (TD)
Its existence in a relation can potentially
cause the types of update anomaly.
24 / 56
Ex4: of TD
staffNo sName, position, salary, branchNo, bAddress
branchNo bAddress
25 / 56
Identifying FDs
If the meaning of the attributes and
relationships are well understood the
identification of FD should be simple.
26 / 56
EX5: Identify FDs
Examine the semantic of the attributes in the
StaffBranch
Assume that the position held and the branch
27 / 56
Ex6: absence of Info, using sample
data
28 / 56
Ex6: absence of Info, using sample
data
Assume that the data values in the table are
representative of all possible values that can
be held by attributes A, B, C, D, and E.
Examine the Sample relation and identify
30 / 56
Identifying the PK for a Relation
using FD
Main purpose of identifying a set of FDs for a
relation is to specify the set of integrity
constraints that must hold on a relation.
◦ Most important one is the PK
31 / 56
Ex7: Identifying the PK for a Relation
using FD
Refer to Ex5, we have found 5 FDs
◦ Identify attributes (or group of them) that uniquely
identifies each tuple in this relation.
◦ The only candidate key of the StaffBranch relation
and therefore the PK is staffNo, as all other
attributes are FD on staffNo.
◦ If a relation has more than one candidate key, we
identify the one that is to act as the PK for the
relation.
◦ All attributes that
are not part of the
PK should be FD on
the key.
32 / 56
Ex8: Identifying PK for Sample
Relation
Refer to Ex6. We have found 4 FDs
Examine each determinant for each FD to
relations.
When requirement is not met, the relation
34 / 56
The Process of Normalization
All NF are based on the concept of FDs
except 1NF.
NF beyond 3NF are very rare.
As normalization proceeds, the relations
35 / 56
The Process of Normalization
It is a bottom-up approach
36 / 56
Unnormalized Form (UNF)
A table that contains one or more repeating
groups.
37 / 56
First Normal Form (1NF)
A relation in which the intersection of each
row and column contains one and only one
value.
38 / 56
UNF to 1NF
Identify and remove repeating groups.
◦ Repeating groups: is an attribute, or group of
attributes, within a table that occurs with multiple
values for a single occurrence.
39 / 56
UNF to 1NF
Two approaches to remove repeating groups:
◦ Flattening the table: entering appropriate data in
the empty columns of rows containing the
repeating data. i.e we fill in the blanks by
duplicating the non-repeating data, where
required.
40 / 56
UNF to 1NF
Both approaches are correct and in 1NF, but
the 1st introduces more redundancy than the
original UNF. While the 2nd produces more
tables with less redundancy than in the
original UNF table.
41 / 56
Ex9: 1NF
John Kay is leasing a property
Assume that a client rents a
given property only once and
can’t rent more than one
property at any one time.
Unnormalized Table
1. Identify the key attribute 2. Identify the repeating group
42 / 56
Ex9: 1NF
UNF 1NF using approach 1:
43 / 56
Ex9: 1NF
rentFinish is not appropriate as
a component of a candidate
Identify the FDs key as it may contain nulls
45 / 56
Second Normal Form (2NF)
The 1NF produced (from both approaches)
contains significant amount of redundancy.
2NF is based on the concept of full FD.
2NF applies to relations with composite keys.
A relation with a single-attribute PK is already
in at least 2NF.
A relation not in 2NF may suffer from the
update anomalies.
2NF is a relation that is in 1NF and every non
46 / 56
2NF
Normalization of 1NF relation to 2NF involves
the removal of PD by placing them in a new
relation along with a copy of their
determinant.
47 / 56
Ex10: 2NF
Rewriting the FDs
48 / 56
Ex10: 2NF
Use those FDs
to test whether
the ClientRental relation is in 2NF by
identifying the presence of any PDs on the
PK.
Note that:
fd2 clientNo cName
(PD)
fd3 propertyNo pAddress, rent, ownerNo, oName
(PD)
i.e it is not in 2NF
49 / 56
Ex10: 2NF
To transform it
to 2NF:
◦ Create new
tables so that the
non-PK
attributes are
removed along with a copy of the part of the PK on
which they are fully FD.
50 / 56
Third Normal Form (3NF)
2NF still suffers from update
anomalies.
◦ Eg. Updating the name of Tony Shaw, we need to
do it twice in the PropertyOwner table
This problem is because the TD
3NF is a relation that is in the 1NF and 2NF
From Ex10:
Client
fd2 clientNo cName
(PK)
Client & Rental relations have no TD,
hence already in 3NF
Rental
fd1 clientNo, propertyNo rentStart, rentFinish
(PK)
Fd5’ clientNo, rentStart PropertyNo, rentFinish
(CK)
Fd6’ prepertyNo, rentStart clientNo, rentFinish
(CK)
53 / 56
Summary
The ClientRental table has been
transformed by normalization into
4 relations in 3NF
54 / 56
General Definitions of 2NF & 3NF
Those definitions take into account other
candidate keys of a relation if exist.
2NF: a relation that is in 1NF and every non-
56 / 56