Topic 3

Topic 3:Data normalization
B Y: H A R U N K A M A U
By: Harun kamau-maseno university CCS 319:Database Administration

What is Normalization?
Normalization is the process of removing redundant data from

your tables to improve storage efficiency, data integrity, and
scalability.
Normalization generally involves splitting existing tables into
multiple ones, which must be re-joined or linked each time a
query is issued.
Why normalization?
 The main goal of Database Normalization is to restructure the logical data
model of a database to:
 Eliminate Redundancy
 Organize Data Efficiently
 Reduce the potential for Data Anomalies.

Redundancy issues
 Remove Unnecessary data,

 Remove Duplicate data,
 Remove surplus data and
 Remove unneeded data
Duplication Vs Redundant Data
 Duplicated Data: When an attribute has two or more identical values
 Redundant Data: If you can delete data with a loss of information
Data Efficiency
 Data should be Competently,
 Data should be Resourcefully and
 Data should be Satisfactory.

Data Anomalies
Data anomalies are inconsistencies in the data stored in

a database as a result of an operation such as update,
insertion, and/or deletion.
Such inconsistencies may arise when have a particular
record stored in multiple locations and not all of the
copies are updated.
We can prevent such anomalies by implementing 7
different level of normalization called Normal Forms
(NF)

Functional Dependency
Functional dependency FD is a set of constraints between two attributes in a relation. Functional dependency says that
if two tuples have same values for attributes A1, A2,..., An, then those two tuples must have to have same values for
attributes B1, B2, ..., Bn.
Functional dependency is represented by an arrow sign → that is, X→Y, where X functionally determines Y. The left-
hand side attributes determine the values of attributes on the right-hand side.
Armstrong's Axioms
If F is a set of functional dependencies then the closure of F, denoted as F+ , is the set of all functional dependencies
logically implied by F. Armstrong's Axioms are a set of rules, that when applied repeatedly, generates a closure of
functional dependencies.
Reflexive rule - If alpha is a set of attributes and beta is subset of alpha, then alpha holds beta.
Augmentation rule - If a → b holds and y is attribute set, then ay → by also holds. That is adding attributes in
dependencies, does not change the basic dependencies.
Transitivity rule - Same as transitive rule in algebra, if a → b holds and b → c holds, then a → c also holds. a →
b is called as a functionally that determines b.
Trivial Functional Dependency
Trivial - If a functional dependency FD X → Y holds, where Y is a subset of X, then it is called a trivial FD. Trivial
FDs always hold.
Non-trivial - If an FD X → Y holds, where Y is not a subset of X, then it is called a non-trivial FD.
Completely non-trivial - If an FD X → Y holds, where x intersect Y = F, it is said to be a completely non-trivial
FD.

Why Normalization?
If a database design is not perfect, it may contain anomalies, which are like a
bad dream for any database administrator. Managing a database with anomalies
is next to impossible.
Update anomalies - If data items are scattered and are not linked to each
other properly, then it could lead to strange situations. For example, when we
try to update one data item having its copies scattered over several places, a
few instances get updated properly while a few others are left with old values.
Such instances leave the database in an inconsistent state.
Deletion anomalies - We tried to delete a record, but parts of it was left
undeleted because of unawareness, the data is also saved somewhere else.
Insert anomalies - We tried to insert data in a record that does not exist at
all.
Normalization is a method to remove all these anomalies and bring the
database to a consistent state.

Normalization types
Different Type of Normalization

In database design, we start with one single table, with all possible columns. A lot of
redundant data would be present since it’s a single table.
The process of removing the redundant data, by splitting up the table in a well defined
fashion is called normalization.
Levels of Normalization
First Normal Form (1NF)
Second Normal Form (2NF)
Third Normal Form (3NF)
Boyce-Codd Normal Form (BCNF)
Fourth Normal Form (4NF)
Fifth Normal Form (5NF)
Domain Key Normal Form (DKNF)
Most databases should be 3NF or BCNF in order to avoid the database anomalies.

Graphical representation
Each higher level is a subset of the lower level

Normalization in action
Database Normalization was first proposed by Edgar F.

Codd.
Codd defined the first three Normal Forms, which we’ll look
into, of the 7 known NormalForms.
In order to do normalization we must know what the
requirements are for each of the three Normal Forms that
we’ll go over.
One of the key requirements to remember is that Normal
Forms are progressive. That is, in order to have 3rd NF we
must have 2nd NF and in order to have 2nd NF we must have
1st NF.
First normal form
 A table is in its first normal form if it contains no repeating attributes or

groups of attributes
Non-Normalised Table

Converting to 1NF
 To convert data for unnormalized form to 1NF, simply convert

any repeated attributes into part of the candidate key
 STUDENT(Number, Name, Classes)=>classes
 Normalized table

First Normal Form (1NF) confirmation
The requirements to satisfy the 1stNF:

 Each table has a primary key: minimal set of attributes which can uniquely
identify a record
 The values in each column of a table are atomic (No multivalue attributes
allowed).
 There are no repeating groups: two columns do not store similar
information in the same table.
 A relation is said to be in first normal form if and only if all underlying
domains contain atomic values only.
 After 1NF, we can still have redundant data.

Exercise
Convert the following table into 1NF

Second Normal Form(2NF)
 A table is in the second normal form if it's in the first normal form
AND no column that is not part of the primary key is dependent only
a portion of the primary key.
 Functional Dependency
 The concept of functional dependency in central to normalisation
and, in particular, strongly related to 2NF.
 If ‘X’ is a set of attributes within a relation, then we say ‘A’ (an
attribute or set of attributes), is functionally dependent on X, if and
only if, for every combination of X, there is only one corresponding
value of A
 We write this as :
X -> A

Converting 1NF to 2NF
A table in 1NF
Functional Dependency
It is clear that :
RefNo -> Name, Address, Status or, most correctly,
AccNo, RefNo -> Name, Address, Status
So split the table to remove dependency

Table splitting
Splitting the tables we get

2NF Confirmation
– All requirements for 1stNF must be met.

– Redundant data across multiple rows of a table must be
moved to a separate table.
The resulting tables must be related to each other by use
of foreign key.
A relation is said to be in 2NF if and only if it is in1NF
and every non key attribute is fully dependent on the
primary key.
After 2NF, we can still have redundant data.

Exercise
Convert the following table to 2NF

Third Normal form(3NF)
A table is in the third normal form if it is the second

normal form and there are no non-key columns dependent
on other non-key columns that could not act as the
primary key.
A table in 2NF

Converting to third normal form
 After converting to third normal form

3NF confirmation
 • The requirements to satisfy the 3rd NF:

 – All requirements for 2nd NF must be met.
 – Eliminate fields that do not depend on the primary key;
 That is, any field that is dependent not only on the primary key but also on
another field must be moved to another table.
 A relation is said to be in 3NF, if and only if it is in 2NF and every non key
attribute is non-transitively dependent on the primary key.

Boyce-Codd Normal Form (BCNF)
All attributes in a relation should be dependent upon

the key, the whole key and nothing but the key.
A table in 3NF

Converting to BCNF
1) Identify Primary Key

2) Identify Composite Key
3) Identify Candidate Key
4) Which Columns are having repetition of data?
1.Primary Key:
 – CourseNo
2. Composite Key:
 – Lecturer + Time
 – Time + Room
3. Candidate Key:
 – CourseNo
 – Time
4. Columns are having repetition of data

 Lecturer
 Room

Cont’d

Is there redundancy?
Redundancy in 3NF
The combination of ROOM, TIME is unique to each tuple,
no room is used twice at the same time (thus it is in 3NF).
 But, we know there is a redundancy in that ROOM depends
LECTURER, therefore, we split the table...
IS there difference between 3NF and BCNF
Most relations in 3NF are also in BCNF, the only time this
may not be true is when there is more than one candidate
key for a relation and at least one of is composite.

After conversion

Topic 3

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Topic 3

Uploaded by

Copyright:

Available Formats

Topic 3:Data normalization

By: Harun kamau-maseno university CCS 319:Database Administration

Normalization is the process of removing redundant data from

 Organize Data Efficiently

 Reduce the potential for Data Anomalies.

By: Harun kamau-maseno university CCS 319:Database Administration

 Remove Unnecessary data,

By: Harun kamau-maseno university CCS 319:Database Administration

Data anomalies are inconsistencies in the data stored in

By: Harun kamau-maseno university CCS 319:Database Administration

By: Harun kamau-maseno university CCS 319:Database Administration

By: Harun kamau-maseno university CCS 319:Database Administration

Different Type of Normalization

By: Harun kamau-maseno university CCS 319:Database Administration

Each higher level is a subset of the lower level

By: Harun kamau-maseno university CCS 319:Database Administration

Database Normalization was first proposed by Edgar F.

 A table is in its first normal form if it contains no repeating attributes or

By: Harun kamau-maseno university CCS 319:Database Administration

 To convert data for unnormalized form to 1NF, simply convert

By: Harun kamau-maseno university CCS 319:Database Administration

The requirements to satisfy the 1stNF:

By: Harun kamau-maseno university CCS 319:Database Administration

Convert the following table into 1NF

By: Harun kamau-maseno university CCS 319:Database Administration

By: Harun kamau-maseno university CCS 319:Database Administration

By: Harun kamau-maseno university CCS 319:Database Administration

Splitting the tables we get

By: Harun kamau-maseno university CCS 319:Database Administration

– All requirements for 1stNF must be met.

By: Harun kamau-maseno university CCS 319:Database Administration

Convert the following table to 2NF

By: Harun kamau-maseno university CCS 319:Database Administration

A table is in the third normal form if it is the second

By: Harun kamau-maseno university CCS 319:Database Administration

 After converting to third normal form

By: Harun kamau-maseno university CCS 319:Database Administration

 • The requirements to satisfy the 3rd NF:

By: Harun kamau-maseno university CCS 319:Database Administration

All attributes in a relation should be dependent upon

By: Harun kamau-maseno university CCS 319:Database Administration

1) Identify Primary Key

4. Columns are having repetition of data

By: Harun kamau-maseno university CCS 319:Database Administration

By: Harun kamau-maseno university CCS 319:Database Administration

By: Harun kamau-maseno university CCS 319:Database Administration

By: Harun kamau-maseno university CCS 319:Database Administration

You might also like