Chapter (14) - Normalization

Chapter 14
Normalization
Ahmed M. Zeki
ITIS 213
Objectives
 The purpose of normalization.
 How normalization can be used when designing a relational database.
 The potential problems associated with redundant data in base relations.
 The concept of functional dependency, which describes the relationship
between attributes.
 The characteristics of functional dependencies used in normalization.
 How to identify functional dependencies for a given relation.
 How functional dependencies identify the primary key for a relation.
 How to undertake the process of normalization.
 How normalization uses functional dependencies to group attributes into
relations that are in a known normal form.
 How to identify the most commonly used normal forms, namely First Normal
Form (1NF), Second Normal Form (2NF), and Third Normal Form (3NF).
 The problems associated with relations that break the rules of 1NF, 2NF, or
3NF.
 How to represent attributes shown on a form as 3NF relations using
normalization.
2 / 56
Introduction
 The main objective of DB design is to create
an accurate representation of:
◦ data
◦ relationships between the data
◦ constraints on the data that is pertinent to the
enterprise
 To do so we used DB design techniques:

◦ ER modeling
◦ Normalization
3 / 56
Introduction
 Attributes describe some property of the data
or of the relationships between the data that
is important to the enterprise.
 Normalization examines the relationships
(called functional dependencies) between

attributes.
 It uses a series of tests (normal forms) to help
identify the optimal grouping for these

attributes to ultimately identify a set of
suitable relations that supports the data
requirements of the enterprise.
4 / 56
The Purpose of Normalization
 Normalization is a technique for producing a
set of relations with describe properties,
given the data requirements of an enterprise.
 Purpose:
◦ To identify a suitable set of relations that support
the data requirements of an enterprise.
5 / 56
 Characteristics of a suitable set of relations:
◦ The minimal number of attributes necessary to

support the data requirements of the enterprise;
◦ Attributes with a close logical relationship are

found in the same relation;
◦ Minimal redundancy with each attribute

represented only once with the important exception
of attributes that form all or part of foreign keys
6 / 56
  Benefit of a DB that has a suitable set of
relations:
◦ The DB will be easier for the user to access
◦ Easier to maintain the data
◦ Take up minimal storage space
7 / 56
Data Redundancy and Update
Anomalies
 Major aim of RDB design is to group
attributes into relations to minimize data
redundancy.
  benefit:
◦ Updates to the data stored in the DB are achieved
with a minimal number of operations  reducing
the opportunities for data inconsistencies.
◦ Reduction in the file storage space required by the

base relations thus minimizing costs.
8 / 56
Data Redundancy and Update
Anomalies
 RDB relies on the existence of a certain
amount of data redundancy.
 This redundancy is in the form of copies of
PKs acting as FKs in related relations.
StaffBranch is the
alternative format of the
Staff and Branch
relations, which contains
redundancy.
9 / 56
Problems Associated with Data
Redundancy
 Relations that have redundant data may have
problems called update anomalies, which are
classified as:
◦ Insertion anomalies
◦ Deletion anomalies
◦ Modification anomalies
10 / 56
Insertion Anomalies
 Types:
1. Every time we add a new staff into the
StaffBranch, we must include the correct branch.
 May lead to inconsistency !
 While Staff and Branch relations don’t suffer from this
potential inconsistency
2. To insert details of a new branch that currently
has no staff into the StaffBranch relation, it is
necessary to enter nulls into the attributes for
staff, such as staffNo. But staffNo is the PK 
filling it with nulls will violates entity integrity,
which is not allowed!
 While Staff and Branch relations don’t suffer from this
problem
11 / 56
Deletion Anomalies
 Deleting a tuple from StaffBranch which is the
last member of staff located at a branch 
the details about that branch are lost from DB
◦ While Staff and Branch relations don’t suffer from
this problem
12 / 56
Modification Anomalies
 Updating the address of a branch  we must
update the tuples of all staff located at that
branch, if not  inconsistency
 Using Staff and Branch instead of StaffBranch

is called decomposition of a larger relation
into smaller relations
13 / 56
Modification Anomalies
 Prosperities of decomposition:
◦ Loosless-join property: ensures that any instance of
the original relation can be identified from
corresponding instances in the smaller relations
◦ Dependency preservation property: ensures that a

constraint on the original relation can be
maintained by simply enforcing some constraint on
each of the smaller relations.
14 / 56
Functional Dependencies (FD)
 Describe the relationship between attributes
in a relation.
◦ Ex: If A and B are attributes of relation R, B is
functionally dependent on A (represented as A 
B), if each value of A is associated with exactly one
value of B.
 The reverse of FD is determinant, i.e “A is a

determinant of B”
15 / 56
 FD is a property of the meaning (semantics)
of the attributes in a relation. The semantics
indicate how attributes relate to one another,
and specify the FDs between attributes.
 When a FD is present, the dependency is

specified as a constraint between the
attributes.
16 / 56
 A  B means
◦ for a given value of A we find only one value of B
 (i.e. when two tuples have the same value of A they
must have the same value of B)
◦ but for a given value of B there may be several
values of A
17 / 56
Ex1: FD
 For a given staffNo we can
determine the position of that staff
◦ Ex: SL21  Manager but Manager  ?
 i.e staffNo functionally determines position (1:1)
but not the opposite (1:*)
 In Normalization we are
interested to identify FDs
between attributes of a
relation that have a 1:1
relationship between the
attributes that makes up the
determinant (left) and the
attributes (right). 18 / 56
Ex2 FD that holds for all time
 Given one staffNo, we can

determine the sName of the staff
◦ The relationship between staffNo and sNAme is 1:1
i.e for each staffNo there is only one name
 Also given one staff name we can determine

the staffNo, but this is true in this example
but may not be true all the time.
◦ The relationship between sNAme and staffNo is 1:*
i.e there can be several staffNo associated with a
name
19 / 56
Ex2: FD that Holds for All Time
 So we have to clearly understand the purpose
of each attribute in that relation.
◦ Ex: the purpose of staffNo is to uniqely identify
each staff, but the purpose of sName is to hold the
names of staff.
 FD is a property of a relational schema

(intension) and not a property of a particular
instance of the schema (extension)
20 / 56
Full FD
 The determinants should have the minimal
number of attributes necessary to maintain
the FD with the attributes on the right hand-
side.
 If A and B are attributes in a relation, B is fully
FD on A if B is FD on A but not on any proper

set of A.
◦ a FD A  B is a full FD if removal of any attribute
from A result in the dependency no longer existing.
◦ a FD A  B is a partially dependency if there is
some attribute that can be removed from A and yet
the dependency still holds.
21 / 56
Ex3: Full FD
 Consider the following FD
that exists in the Staff relation
staffNo, sName  branchNo
 It is partial dependency because each value of
(staffNo, sName) is associated with a single

value of branchNo, but it is not a full FD
because branchNo is also FD on a subset of
(Staff, sName), namely staffNo.
 So staffNo  branchNo is a full FD
22 / 56
Summary: Characteristics of FDs we
use in Normalization
 There is a 1:1 relationship between the
attribute(s) on the left side and those on the
right side of the FD (not the other direction).
 They hold for all time
 The determinant has the minimal number of
attributes necessary to maintain the

dependency with the attribute(s) on the right
side. i.e there must be a full FD between the
attribute(s) on the left and right sides of the
dependency.
23 / 56
Transitive Dependency (TD)
 Its existence in a relation can potentially
cause the types of update anomaly.
 Describes a condition where A, B, and C are

attributes of a relation such that if A → B and
B → C, then C is transitively dependent on A
via B (provided that A is not functionally
dependent on B or C).
24 / 56
Ex4: of TD
 staffNo  sName, position, salary, branchNo, bAddress
 branchNo  bAddress
 The TD branchNo  bAddress exists on

staffNo via branchNo.
 i.e staffNo FD the bAddress via branchNo and
neither branchNo nor bAddress FDs staffNo.
25 / 56
Identifying FDs
 If the meaning of the attributes and
relationships are well understood  the
identification of FD should be simple.
 If information is not available  use common

sense.
26 / 56
EX5: Identify FDs
 Examine the semantic of the attributes in the
StaffBranch
 Assume that the position held and the branch
determine the salary.

 staffNo  sName, postion, salary, branchNo, bAddress
 branchNo  bAddress
 bAddress  branchNo
 branchNo, position  salary
 bAddress, position  salary
27 / 56
Ex6: absence of Info, using sample
data
28 / 56
data
 Assume that the data values in the table are
representative of all possible values that can
be held by attributes A, B, C, D, and E.
 Examine the Sample relation and identify
when values in one column are consistent

with the presence of particular values in other
columns. Go from left to right.
 Then look at combinations of columns, i.e
when values in 2 or more columns are

consistent with the appearance of values in
other columns.
29 / 56
data
 When a appears in A, z appears in C, and
when e appears in A, r appears in C.
Hence there is 1:1 relationship between A & C
i.e A functionally determines C (fd1)
:
:
 A  C (fd1)
 C  A (fd2)
 B  D (fd3)
 A, B  E (fd4)
30 / 56
Identifying the PK for a Relation
using FD
 Main purpose of identifying a set of FDs for a
relation is to specify the set of integrity
constraints that must hold on a relation.
◦ Most important one is the PK
31 / 56
Ex7: Identifying the PK for a Relation
using FD
 Refer to Ex5, we have found 5 FDs
◦ Identify attributes (or group of them) that uniquely
identifies each tuple in this relation.
◦ The only candidate key of the StaffBranch relation
and therefore the PK is staffNo, as all other
attributes are FD on staffNo.
◦ If a relation has more than one candidate key, we
identify the one that is to act as the PK for the
relation.
◦ All attributes that
are not part of the
PK should be FD on
the key.
32 / 56
Ex8: Identifying PK for Sample
Relation
 Refer to Ex6. We have found 4 FDs
 Examine each determinant for each FD to
identify the candidate key.

 A suitable determinant must functionally
determine the other attributes. The only one

that functionally determine all the other
attributes is (A,B).
 i.e. the attributes that make
up the determinant (A,B) can

determine all the other
attributes in the relation either separately as
A or B or together as (A,B). 33 / 56
The Process of Normalization
 Normalization is a formal technique of
analyzing relations based on their PK (or
candidate keys) and FDs.
 Involves a series of rules to test individual
relations.
 When requirement is not met, the relation
violating the requirement must be

decomposed into relations that individually
meet the requirement of normalization.
34 / 56
 All NF are based on the concept of FDs
except 1NF.
 NF beyond 3NF are very rare.
 As normalization proceeds, the relations
become progressively more restricted

(stronger) in format and also less vulnerable
to update anomalies.
35 / 56
 It is a bottom-up approach
The output of 1NF in

some cases is already
in 2NF.
36 / 56
Unnormalized Form (UNF)
 A table that contains one or more repeating
groups.
37 / 56
First Normal Form (1NF)
 A relation in which the intersection of each
row and column contains one and only one
value.
38 / 56
UNF to 1NF
 Identify and remove repeating groups.
◦ Repeating groups: is an attribute, or group of
attributes, within a table that occurs with multiple
values for a single occurrence.
39 / 56
UNF to 1NF
 Two approaches to remove repeating groups:
◦ Flattening the table: entering appropriate data in
the empty columns of rows containing the
repeating data. i.e we fill in the blanks by
duplicating the non-repeating data, where
required.
◦ Placing the repeating data, along with a copy of the

original key attribute(s), in a separate relation.
 Sometimes the unnormalized table may contain more
than one repeating group, or repeating groups within
repeating groups. In such cases, this approach is
applied repeatedly until no repeating groups remain.
40 / 56
UNF to 1NF
 Both approaches are correct and in 1NF, but
the 1st introduces more redundancy than the
original UNF. While the 2nd produces more
tables with less redundancy than in the
original UNF table.
 i.e. approach 2 moves the original UNF table

further along the normalization process than
approach 1.
41 / 56
Ex9: 1NF
 John Kay is leasing a property
 Assume that a client rents a
given property only once and
can’t rent more than one
property at any one time.
 Unnormalized Table 
1. Identify the key attribute 2. Identify the repeating group
42 / 56
Ex9: 1NF
 UNF  1NF using approach 1:
43 / 56
Ex9: 1NF
rentFinish is not appropriate as
a component of a candidate
 Identify the FDs key as it may contain nulls
 Use FDs to identify candidate keys:

◦ clientNo, propertyNo  chose this as PK)
◦ clientNo, rentStart
◦ propertyNo, rentStart 44 / 56
Ex9: 1NF
 UNF  1NF using approach 2:
 Remove repeating group by placing them
along with the original key attribute (clientNo)

in a separate table.
With the help of the FDs, identify
the PK for both relations.
Now both tables are in 1NF.
45 / 56
Second Normal Form (2NF)
 The 1NF produced (from both approaches)
contains significant amount of redundancy.
 2NF is based on the concept of full FD.
 2NF applies to relations with composite keys.
 A relation with a single-attribute PK is already
in at least 2NF.
 A relation not in 2NF may suffer from the
update anomalies.
 2NF is a relation that is in 1NF and every non
PK attribute is fully FD on the PK, i.e. no PDs.
46 / 56
2NF
 Normalization of 1NF relation to 2NF involves
the removal of PD by placing them in a new
relation along with a copy of their
determinant.
47 / 56
Ex10: 2NF
 Rewriting the FDs
fd1 clientNo, propertyNo  rentStart, rentFinish

(PK)
fd2 clientNo  cName
(PD)
fd3 propertyNo  pAddress, rent, ownerNo, oName
(PD)
fd4 ownerNo  oName
(TD)
fd5 clientNo, rentStart  propertyNo, pAddress, rentFinish,
rent, ownerNo, oName (CK)
fd6 prepertyNo, rentStart  clientNo, cName, rentFinish
(CK)
48 / 56
Ex10: 2NF
 Use those FDs
to test whether
the ClientRental relation is in 2NF by
identifying the presence of any PDs on the
PK.
 Note that:
(PD)
fd3 propertyNo  pAddress, rent, ownerNo, oName
(PD)
 i.e it is not in 2NF
49 / 56
Ex10: 2NF
 To transform it
to 2NF:
◦ Create new
tables so that the
non-PK
attributes are
removed along with a copy of the part of the PK on
which they are fully FD.
Client (clientNo, cName)

Rental (clientNo, propertyNo, rentStart, rentFinish)
PropertyOwner (propertyNo, pAddress, rent, ownerNo, oName)
50 / 56
Third Normal Form (3NF)
 2NF still suffers from update
anomalies.
◦ Eg. Updating the name of Tony Shaw, we need to
do it twice in the PropertyOwner table
 This problem is because the TD
 3NF is a relation that is in the 1NF and 2NF
and in which no non-PK attribute is TD on the

PK.
 To transform 2NF to 3NF remove the TD by
placing the attributes in a new relation along

with a copy of the determinant.
51 / 56
Ex11. 3NF
 From Ex10:
Client
(PK)
Client & Rental relations have no TD,
hence already in 3NF
Rental
fd1 clientNo, propertyNo  rentStart, rentFinish
(PK)
Fd5’ clientNo, rentStart  PropertyNo, rentFinish
(CK)
Fd6’ prepertyNo, rentStart  clientNo, rentFinish
(CK)
Remove the TD by creating two new

PropertyOwner relations PropertyForRent & Owner
fd3 propertyNo  pAddress, rent, ownerNo, oName 52 / 56
Ex11. 3NF
 Both tables are in 3NF because no more TD
53 / 56
Summary
The ClientRental table has been
transformed by normalization into
4 relations in 3NF
Client (clientNo, cName)

Rental (clientNo, propertyNo, rentStart, rentFinish)
PropertyForRent (propertyNo, pAddress, rent, ownerNo)
Owner (ownerNo, oName)
54 / 56
General Definitions of 2NF & 3NF
 Those definitions take into account other
candidate keys of a relation if exist.
 2NF: a relation that is in 1NF and every non-
candidate key attribute is fully FD on any

candidate key.
 3NF: a relation that is in 1NF and 2NF and in
which no non-candidate-key attribute is TD

on any candidate key.
 i.e when using those definitions, we must be
aware of PDs and TDs on all candidate keys

and not just the PK.
55 / 56
General Definitions of 2NF & 3NF
 The general definitions may reveal some
hidden redundancy.
 Following the general definitions may
increase the complexity.

 Tradeoff is possible.
 In most of the cases the same decomposition
of relations will be produced in both cases.
56 / 56

Chapter (14) - Normalization

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Chapter (14) - Normalization

Uploaded by

Copyright:

Available Formats

Chapter 14

 To do so we used DB design techniques:

(called functional dependencies) between

identify the optimal grouping for these

◦ The minimal number of attributes necessary to

◦ Attributes with a close logical relationship are

◦ Minimal redundancy with each attribute

◦ Reduction in the file storage space required by the

PKs acting as FKs in related relations.

 Using Staff and Branch instead of StaffBranch

◦ Dependency preservation property: ensures that a

 The reverse of FD is determinant, i.e “A is a

 When a FD is present, the dependency is

 Given one staffNo, we can

 Also given one staff name we can determine

 FD is a property of a relational schema

FD on A if B is FD on A but not on any proper

(staffNo, sName) is associated with a single

attributes necessary to maintain the

 Describes a condition where A, B, and C are

 The TD branchNo  bAddress exists on

neither branchNo nor bAddress FDs staffNo.

 If information is not available  use common

determine the salary.

when values in one column are consistent

when values in 2 or more columns are

identify the candidate key.

determine the other attributes. The only one

up the determinant (A,B) can

violating the requirement must be

become progressively more restricted

The output of 1NF in

◦ Placing the repeating data, along with a copy of the

 i.e. approach 2 moves the original UNF table

 Use FDs to identify candidate keys:

along with the original key attribute (clientNo)

PK attribute is fully FD on the PK, i.e. no PDs.

fd1 clientNo, propertyNo  rentStart, rentFinish

Client (clientNo, cName)

and in which no non-PK attribute is TD on the

placing the attributes in a new relation along

Remove the TD by creating two new

 Both tables are in 3NF because no more TD

Client (clientNo, cName)

candidate key attribute is fully FD on any

which no non-candidate-key attribute is TD

aware of PDs and TDs on all candidate keys

increase the complexity.

of relations will be produced in both cases.

You might also like