You are on page 1of 56

Chapter 14

Normalization

Ahmed M. Zeki

ITIS 213
Objectives
 The purpose of normalization.
 How normalization can be used when designing a relational database.
 The potential problems associated with redundant data in base relations.
 The concept of functional dependency, which describes the relationship
between attributes.
 The characteristics of functional dependencies used in normalization.
 How to identify functional dependencies for a given relation.
 How functional dependencies identify the primary key for a relation.
 How to undertake the process of normalization.
 How normalization uses functional dependencies to group attributes into
relations that are in a known normal form.
 How to identify the most commonly used normal forms, namely First Normal
Form (1NF), Second Normal Form (2NF), and Third Normal Form (3NF).
 The problems associated with relations that break the rules of 1NF, 2NF, or
3NF.
 How to represent attributes shown on a form as 3NF relations using
normalization.

2 / 56
Introduction
 The main objective of DB design is to create
an accurate representation of:
◦ data
◦ relationships between the data
◦ constraints on the data that is pertinent to the
enterprise

 To do so we used DB design techniques:


◦ ER modeling
◦ Normalization

3 / 56
Introduction
 Attributes describe some property of the data
or of the relationships between the data that
is important to the enterprise.
 Normalization examines the relationships

(called functional dependencies) between


attributes.
 It uses a series of tests (normal forms) to help

identify the optimal grouping for these


attributes to ultimately identify a set of
suitable relations that supports the data
requirements of the enterprise.
4 / 56
The Purpose of Normalization
 Normalization is a technique for producing a
set of relations with describe properties,
given the data requirements of an enterprise.

 Purpose:
◦ To identify a suitable set of relations that support
the data requirements of an enterprise.

5 / 56
The Purpose of Normalization
 Characteristics of a suitable set of relations:

◦ The minimal number of attributes necessary to


support the data requirements of the enterprise;

◦ Attributes with a close logical relationship are


found in the same relation;

◦ Minimal redundancy with each attribute


represented only once with the important exception
of attributes that form all or part of foreign keys

6 / 56
The Purpose of Normalization
  Benefit of a DB that has a suitable set of
relations:
◦ The DB will be easier for the user to access
◦ Easier to maintain the data
◦ Take up minimal storage space

7 / 56
Data Redundancy and Update
Anomalies
 Major aim of RDB design is to group
attributes into relations to minimize data
redundancy.

  benefit:
◦ Updates to the data stored in the DB are achieved
with a minimal number of operations  reducing
the opportunities for data inconsistencies.

◦ Reduction in the file storage space required by the


base relations thus minimizing costs.

8 / 56
Data Redundancy and Update
Anomalies
 RDB relies on the existence of a certain
amount of data redundancy.
 This redundancy is in the form of copies of

PKs acting as FKs in related relations.

StaffBranch is the
alternative format of the
Staff and Branch
relations, which contains
redundancy.

9 / 56
Problems Associated with Data
Redundancy
 Relations that have redundant data may have
problems called update anomalies, which are
classified as:
◦ Insertion anomalies
◦ Deletion anomalies
◦ Modification anomalies

10 / 56
Insertion Anomalies
 Types:
1. Every time we add a new staff into the
StaffBranch, we must include the correct branch.
 May lead to inconsistency !
 While Staff and Branch relations don’t suffer from this
potential inconsistency
2. To insert details of a new branch that currently
has no staff into the StaffBranch relation, it is
necessary to enter nulls into the attributes for
staff, such as staffNo. But staffNo is the PK 
filling it with nulls will violates entity integrity,
which is not allowed!
 While Staff and Branch relations don’t suffer from this
problem
11 / 56
Deletion Anomalies
 Deleting a tuple from StaffBranch which is the
last member of staff located at a branch 
the details about that branch are lost from DB
◦ While Staff and Branch relations don’t suffer from
this problem

12 / 56
Modification Anomalies
 Updating the address of a branch  we must
update the tuples of all staff located at that
branch, if not  inconsistency

 Using Staff and Branch instead of StaffBranch


is called decomposition of a larger relation
into smaller relations

13 / 56
Modification Anomalies
 Prosperities of decomposition:
◦ Loosless-join property: ensures that any instance of
the original relation can be identified from
corresponding instances in the smaller relations

◦ Dependency preservation property: ensures that a


constraint on the original relation can be
maintained by simply enforcing some constraint on
each of the smaller relations.

14 / 56
Functional Dependencies (FD)
 Describe the relationship between attributes
in a relation.
◦ Ex: If A and B are attributes of relation R, B is
functionally dependent on A (represented as A 
B), if each value of A is associated with exactly one
value of B.

 The reverse of FD is determinant, i.e “A is a


determinant of B”

15 / 56
Functional Dependencies (FD)
 FD is a property of the meaning (semantics)
of the attributes in a relation. The semantics
indicate how attributes relate to one another,
and specify the FDs between attributes.

 When a FD is present, the dependency is


specified as a constraint between the
attributes.

16 / 56
Functional Dependencies (FD)
 A  B means
◦ for a given value of A we find only one value of B
 (i.e. when two tuples have the same value of A they
must have the same value of B)
◦ but for a given value of B there may be several
values of A

17 / 56
Ex1: FD
 For a given staffNo we can
determine the position of that staff
◦ Ex: SL21  Manager but Manager  ?
 i.e staffNo functionally determines position (1:1)
but not the opposite (1:*)
 In Normalization we are
interested to identify FDs
between attributes of a
relation that have a 1:1
relationship between the
attributes that makes up the
determinant (left) and the
attributes (right). 18 / 56
Ex2 FD that holds for all time

 Given one staffNo, we can


determine the sName of the staff
◦ The relationship between staffNo and sNAme is 1:1
i.e for each staffNo there is only one name

 Also given one staff name we can determine


the staffNo, but this is true in this example
but may not be true all the time.
◦ The relationship between sNAme and staffNo is 1:*
i.e there can be several staffNo associated with a
name

19 / 56
Ex2: FD that Holds for All Time
 So we have to clearly understand the purpose
of each attribute in that relation.
◦ Ex: the purpose of staffNo is to uniqely identify
each staff, but the purpose of sName is to hold the
names of staff.

 FD is a property of a relational schema


(intension) and not a property of a particular
instance of the schema (extension)

20 / 56
Full FD
 The determinants should have the minimal
number of attributes necessary to maintain
the FD with the attributes on the right hand-
side.
 If A and B are attributes in a relation, B is fully

FD on A if B is FD on A but not on any proper


set of A.
◦ a FD A  B is a full FD if removal of any attribute
from A result in the dependency no longer existing.
◦ a FD A  B is a partially dependency if there is
some attribute that can be removed from A and yet
the dependency still holds.
21 / 56
Ex3: Full FD
 Consider the following FD
that exists in the Staff relation
staffNo, sName  branchNo
 It is partial dependency because each value of

(staffNo, sName) is associated with a single


value of branchNo, but it is not a full FD
because branchNo is also FD on a subset of
(Staff, sName), namely staffNo.
 So staffNo  branchNo is a full FD

22 / 56
Summary: Characteristics of FDs we
use in Normalization
 There is a 1:1 relationship between the
attribute(s) on the left side and those on the
right side of the FD (not the other direction).
 They hold for all time
 The determinant has the minimal number of

attributes necessary to maintain the


dependency with the attribute(s) on the right
side. i.e there must be a full FD between the
attribute(s) on the left and right sides of the
dependency.

23 / 56
Transitive Dependency (TD)
 Its existence in a relation can potentially
cause the types of update anomaly.

 Describes a condition where A, B, and C are


attributes of a relation such that if A → B and
B → C, then C is transitively dependent on A
via B (provided that A is not functionally
dependent on B or C).

24 / 56
Ex4: of TD
 staffNo  sName, position, salary, branchNo, bAddress
 branchNo  bAddress

 The TD branchNo  bAddress exists on


staffNo via branchNo.
 i.e staffNo FD the bAddress via branchNo and

neither branchNo nor bAddress FDs staffNo.

25 / 56
Identifying FDs
 If the meaning of the attributes and
relationships are well understood  the
identification of FD should be simple.

 If information is not available  use common


sense.

26 / 56
EX5: Identify FDs
 Examine the semantic of the attributes in the
StaffBranch
 Assume that the position held and the branch

determine the salary.


 staffNo  sName, postion, salary, branchNo, bAddress
 branchNo  bAddress
 bAddress  branchNo
 branchNo, position  salary
 bAddress, position  salary

27 / 56
Ex6: absence of Info, using sample
data

28 / 56
Ex6: absence of Info, using sample
data
 Assume that the data values in the table are
representative of all possible values that can
be held by attributes A, B, C, D, and E.
 Examine the Sample relation and identify

when values in one column are consistent


with the presence of particular values in other
columns. Go from left to right.
 Then look at combinations of columns, i.e

when values in 2 or more columns are


consistent with the appearance of values in
other columns.
29 / 56
Ex6: absence of Info, using sample
data
 When a appears in A, z appears in C, and
when e appears in A, r appears in C.
Hence there is 1:1 relationship between A & C
i.e A functionally determines C (fd1)
:
:
 A  C (fd1)
 C  A (fd2)
 B  D (fd3)
 A, B  E (fd4)

30 / 56
Identifying the PK for a Relation
using FD
 Main purpose of identifying a set of FDs for a
relation is to specify the set of integrity
constraints that must hold on a relation.
◦ Most important one is the PK

31 / 56
Ex7: Identifying the PK for a Relation
using FD
 Refer to Ex5, we have found 5 FDs
◦ Identify attributes (or group of them) that uniquely
identifies each tuple in this relation.
◦ The only candidate key of the StaffBranch relation
and therefore the PK is staffNo, as all other
attributes are FD on staffNo.
◦ If a relation has more than one candidate key, we
identify the one that is to act as the PK for the
relation.
◦ All attributes that
are not part of the
PK should be FD on
the key.
32 / 56
Ex8: Identifying PK for Sample
Relation
 Refer to Ex6. We have found 4 FDs
 Examine each determinant for each FD to

identify the candidate key.


 A suitable determinant must functionally

determine the other attributes. The only one


that functionally determine all the other
attributes is (A,B).
 i.e. the attributes that make

up the determinant (A,B) can


determine all the other
attributes in the relation either separately as
A or B or together as (A,B). 33 / 56
The Process of Normalization
 Normalization is a formal technique of
analyzing relations based on their PK (or
candidate keys) and FDs.
 Involves a series of rules to test individual

relations.
 When requirement is not met, the relation

violating the requirement must be


decomposed into relations that individually
meet the requirement of normalization.

34 / 56
The Process of Normalization
 All NF are based on the concept of FDs
except 1NF.
 NF beyond 3NF are very rare.
 As normalization proceeds, the relations

become progressively more restricted


(stronger) in format and also less vulnerable
to update anomalies.

35 / 56
The Process of Normalization
 It is a bottom-up approach

The output of 1NF in


some cases is already
in 2NF.

36 / 56
Unnormalized Form (UNF)
 A table that contains one or more repeating
groups.

37 / 56
First Normal Form (1NF)
 A relation in which the intersection of each
row and column contains one and only one
value.

38 / 56
UNF to 1NF
 Identify and remove repeating groups.
◦ Repeating groups: is an attribute, or group of
attributes, within a table that occurs with multiple
values for a single occurrence.

39 / 56
UNF to 1NF
 Two approaches to remove repeating groups:
◦ Flattening the table: entering appropriate data in
the empty columns of rows containing the
repeating data. i.e we fill in the blanks by
duplicating the non-repeating data, where
required.

◦ Placing the repeating data, along with a copy of the


original key attribute(s), in a separate relation.
 Sometimes the unnormalized table may contain more
than one repeating group, or repeating groups within
repeating groups. In such cases, this approach is
applied repeatedly until no repeating groups remain.

40 / 56
UNF to 1NF
 Both approaches are correct and in 1NF, but
the 1st introduces more redundancy than the
original UNF. While the 2nd produces more
tables with less redundancy than in the
original UNF table.

 i.e. approach 2 moves the original UNF table


further along the normalization process than
approach 1.

41 / 56
Ex9: 1NF
 John Kay is leasing a property
 Assume that a client rents a
given property only once and
can’t rent more than one
property at any one time.
 Unnormalized Table 
1. Identify the key attribute 2. Identify the repeating group

42 / 56
Ex9: 1NF
 UNF  1NF using approach 1:

43 / 56
Ex9: 1NF
rentFinish is not appropriate as
a component of a candidate
 Identify the FDs key as it may contain nulls

 Use FDs to identify candidate keys:


◦ clientNo, propertyNo  chose this as PK)
◦ clientNo, rentStart
◦ propertyNo, rentStart 44 / 56
Ex9: 1NF
 UNF  1NF using approach 2:
 Remove repeating group by placing them

along with the original key attribute (clientNo)


in a separate table.
With the help of the FDs, identify
the PK for both relations.
Now both tables are in 1NF.

45 / 56
Second Normal Form (2NF)
 The 1NF produced (from both approaches)
contains significant amount of redundancy.
 2NF is based on the concept of full FD.
 2NF applies to relations with composite keys.
 A relation with a single-attribute PK is already

in at least 2NF.
 A relation not in 2NF may suffer from the

update anomalies.
 2NF is a relation that is in 1NF and every non

PK attribute is fully FD on the PK, i.e. no PDs.

46 / 56
2NF
 Normalization of 1NF relation to 2NF involves
the removal of PD by placing them in a new
relation along with a copy of their
determinant.

47 / 56
Ex10: 2NF
 Rewriting the FDs

fd1 clientNo, propertyNo  rentStart, rentFinish


(PK)
fd2 clientNo  cName
(PD)
fd3 propertyNo  pAddress, rent, ownerNo, oName
(PD)
fd4 ownerNo  oName
(TD)
fd5 clientNo, rentStart  propertyNo, pAddress, rentFinish,
rent, ownerNo, oName (CK)
fd6 prepertyNo, rentStart  clientNo, cName, rentFinish
(CK)

48 / 56
Ex10: 2NF
 Use those FDs
to test whether
the ClientRental relation is in 2NF by
identifying the presence of any PDs on the
PK.
 Note that:
fd2 clientNo  cName
(PD)
fd3 propertyNo  pAddress, rent, ownerNo, oName
(PD)
 i.e it is not in 2NF

49 / 56
Ex10: 2NF
 To transform it
to 2NF:
◦ Create new
tables so that the
non-PK
attributes are
removed along with a copy of the part of the PK on
which they are fully FD.

Client (clientNo, cName)


Rental (clientNo, propertyNo, rentStart, rentFinish)
PropertyOwner (propertyNo, pAddress, rent, ownerNo, oName)

50 / 56
Third Normal Form (3NF)
 2NF still suffers from update
anomalies.
◦ Eg. Updating the name of Tony Shaw, we need to
do it twice in the PropertyOwner table
 This problem is because the TD
 3NF is a relation that is in the 1NF and 2NF

and in which no non-PK attribute is TD on the


PK.
 To transform 2NF to 3NF remove the TD by

placing the attributes in a new relation along


with a copy of the determinant.
51 / 56
Ex11. 3NF

 From Ex10:
Client
fd2 clientNo  cName
(PK)
Client & Rental relations have no TD,
hence already in 3NF
Rental
fd1 clientNo, propertyNo  rentStart, rentFinish
(PK)
Fd5’ clientNo, rentStart  PropertyNo, rentFinish
(CK)
Fd6’ prepertyNo, rentStart  clientNo, rentFinish
(CK)

Remove the TD by creating two new


PropertyOwner relations PropertyForRent & Owner
fd3 propertyNo  pAddress, rent, ownerNo, oName 52 / 56
Ex11. 3NF

 Both tables are in 3NF because no more TD

53 / 56
Summary
The ClientRental table has been
transformed by normalization into
4 relations in 3NF

Client (clientNo, cName)


Rental (clientNo, propertyNo, rentStart, rentFinish)
PropertyForRent (propertyNo, pAddress, rent, ownerNo)
Owner (ownerNo, oName)

54 / 56
General Definitions of 2NF & 3NF
 Those definitions take into account other
candidate keys of a relation if exist.
 2NF: a relation that is in 1NF and every non-

candidate key attribute is fully FD on any


candidate key.
 3NF: a relation that is in 1NF and 2NF and in

which no non-candidate-key attribute is TD


on any candidate key.
 i.e when using those definitions, we must be

aware of PDs and TDs on all candidate keys


and not just the PK.
55 / 56
General Definitions of 2NF & 3NF
 The general definitions may reveal some
hidden redundancy.
 Following the general definitions may

increase the complexity.


 Tradeoff is possible.
 In most of the cases the same decomposition

of relations will be produced in both cases.

56 / 56

You might also like