You are on page 1of 26

Topic 3:Data normalization

B Y: H A R U N K A M A U

By: Harun kamau-maseno university CCS 319:Database Administration


What is Normalization?

Normalization is the process of removing redundant data from


your tables to improve storage efficiency, data integrity, and
scalability.
Normalization generally involves splitting existing tables into
multiple ones, which must be re-joined or linked each time a
query is issued.
Why normalization?
 The main goal of Database Normalization is to restructure the logical data
model of a database to:
 Eliminate Redundancy

 Organize Data Efficiently

 Reduce the potential for Data Anomalies.

By: Harun kamau-maseno university CCS 319:Database Administration


Redundancy issues

 Remove Unnecessary data,


 Remove Duplicate data,
 Remove surplus data and
 Remove unneeded data
Duplication Vs Redundant Data
 Duplicated Data: When an attribute has two or more identical values
 Redundant Data: If you can delete data with a loss of information
Data Efficiency
 Data should be Competently,
 Data should be Resourcefully and
 Data should be Satisfactory.

By: Harun kamau-maseno university CCS 319:Database Administration


Data Anomalies

Data anomalies are inconsistencies in the data stored in


a database as a result of an operation such as update,
insertion, and/or deletion.
Such inconsistencies may arise when have a particular
record stored in multiple locations and not all of the
copies are updated.
We can prevent such anomalies by implementing 7
different level of normalization called Normal Forms
(NF)

By: Harun kamau-maseno university CCS 319:Database Administration


Functional Dependency

Functional dependency FD is a set of constraints between two attributes in a relation. Functional dependency says that
if two tuples have same values for attributes A1, A2,..., An, then those two tuples must have to have same values for
attributes B1, B2, ..., Bn.
Functional dependency is represented by an arrow sign → that is, X→Y, where X functionally determines Y. The left-
hand side attributes determine the values of attributes on the right-hand side.
Armstrong's Axioms
If F is a set of functional dependencies then the closure of F, denoted as F+ , is the set of all functional dependencies
logically implied by F. Armstrong's Axioms are a set of rules, that when applied repeatedly, generates a closure of
functional dependencies.
Reflexive rule - If alpha is a set of attributes and beta is subset of alpha, then alpha holds beta.
Augmentation rule - If a → b holds and y is attribute set, then ay → by also holds. That is adding attributes in
dependencies, does not change the basic dependencies.
Transitivity rule - Same as transitive rule in algebra, if a → b holds and b → c holds, then a → c also holds. a →
b is called as a functionally that determines b.
Trivial Functional Dependency
Trivial - If a functional dependency FD X → Y holds, where Y is a subset of X, then it is called a trivial FD. Trivial
FDs always hold.
Non-trivial - If an FD X → Y holds, where Y is not a subset of X, then it is called a non-trivial FD.
Completely non-trivial - If an FD X → Y holds, where x intersect Y = F, it is said to be a completely non-trivial
FD.

By: Harun kamau-maseno university CCS 319:Database Administration


Why Normalization?

If a database design is not perfect, it may contain anomalies, which are like a
bad dream for any database administrator. Managing a database with anomalies
is next to impossible.
Update anomalies - If data items are scattered and are not linked to each
other properly, then it could lead to strange situations. For example, when we
try to update one data item having its copies scattered over several places, a
few instances get updated properly while a few others are left with old values.
Such instances leave the database in an inconsistent state.
Deletion anomalies - We tried to delete a record, but parts of it was left
undeleted because of unawareness, the data is also saved somewhere else.
Insert anomalies - We tried to insert data in a record that does not exist at
all.
Normalization is a method to remove all these anomalies and bring the
database to a consistent state.

By: Harun kamau-maseno university CCS 319:Database Administration


Normalization types

Different Type of Normalization


In database design, we start with one single table, with all possible columns. A lot of
redundant data would be present since it’s a single table.
The process of removing the redundant data, by splitting up the table in a well defined
fashion is called normalization.
Levels of Normalization
First Normal Form (1NF)
Second Normal Form (2NF)
Third Normal Form (3NF)
Boyce-Codd Normal Form (BCNF)
Fourth Normal Form (4NF)
Fifth Normal Form (5NF)
Domain Key Normal Form (DKNF)

Most databases should be 3NF or BCNF in order to avoid the database anomalies.

By: Harun kamau-maseno university CCS 319:Database Administration


Graphical representation

Each higher level is a subset of the lower level

By: Harun kamau-maseno university CCS 319:Database Administration


Normalization in action

Database Normalization was first proposed by Edgar F.


Codd.
Codd defined the first three Normal Forms, which we’ll look
into, of the 7 known NormalForms.
In order to do normalization we must know what the
requirements are for each of the three Normal Forms that
we’ll go over.
One of the key requirements to remember is that Normal
Forms are progressive. That is, in order to have 3rd NF we
must have 2nd NF and in order to have 2nd NF we must have
1st NF.
By: Harun kamau-maseno university CCS 319:Database Administration
First normal form

 A table is in its first normal form if it contains no repeating attributes or


groups of attributes
Non-Normalised Table

By: Harun kamau-maseno university CCS 319:Database Administration


Converting to 1NF

 To convert data for unnormalized form to 1NF, simply convert


any repeated attributes into part of the candidate key
 STUDENT(Number, Name, Classes)=>classes
 Normalized table

By: Harun kamau-maseno university CCS 319:Database Administration


First Normal Form (1NF) confirmation

The requirements to satisfy the 1stNF:


 Each table has a primary key: minimal set of attributes which can uniquely
identify a record
 The values in each column of a table are atomic (No multivalue attributes
allowed).
 There are no repeating groups: two columns do not store similar
information in the same table.
 A relation is said to be in first normal form if and only if all underlying
domains contain atomic values only.
 After 1NF, we can still have redundant data.

By: Harun kamau-maseno university CCS 319:Database Administration


Exercise

Convert the following table into 1NF

By: Harun kamau-maseno university CCS 319:Database Administration


Second Normal Form(2NF)

 A table is in the second normal form if it's in the first normal form
AND no column that is not part of the primary key is dependent only
a portion of the primary key.
 Functional Dependency
 The concept of functional dependency in central to normalisation
and, in particular, strongly related to 2NF.
 If ‘X’ is a set of attributes within a relation, then we say ‘A’ (an
attribute or set of attributes), is functionally dependent on X, if and
only if, for every combination of X, there is only one corresponding
value of A
 We write this as :
X -> A

By: Harun kamau-maseno university CCS 319:Database Administration


Converting 1NF to 2NF

A table in 1NF

Functional Dependency
It is clear that :
RefNo -> Name, Address, Status or, most correctly,
AccNo, RefNo -> Name, Address, Status
So split the table to remove dependency

By: Harun kamau-maseno university CCS 319:Database Administration


Table splitting

Splitting the tables we get

By: Harun kamau-maseno university CCS 319:Database Administration


2NF Confirmation

– All requirements for 1stNF must be met.


– Redundant data across multiple rows of a table must be
moved to a separate table.
The resulting tables must be related to each other by use
of foreign key.
A relation is said to be in 2NF if and only if it is in1NF
and every non key attribute is fully dependent on the
primary key.
After 2NF, we can still have redundant data.

By: Harun kamau-maseno university CCS 319:Database Administration


Exercise

Convert the following table to 2NF

By: Harun kamau-maseno university CCS 319:Database Administration


Third Normal form(3NF)

A table is in the third normal form if it is the second


normal form and there are no non-key columns dependent
on other non-key columns that could not act as the
primary key.
A table in 2NF

By: Harun kamau-maseno university CCS 319:Database Administration


Converting to third normal form

 After converting to third normal form

By: Harun kamau-maseno university CCS 319:Database Administration


3NF confirmation

 • The requirements to satisfy the 3rd NF:


 – All requirements for 2nd NF must be met.
 – Eliminate fields that do not depend on the primary key;
 That is, any field that is dependent not only on the primary key but also on
another field must be moved to another table.
 A relation is said to be in 3NF, if and only if it is in 2NF and every non key
attribute is non-transitively dependent on the primary key.

By: Harun kamau-maseno university CCS 319:Database Administration


Boyce-Codd Normal Form (BCNF)

All attributes in a relation should be dependent upon


the key, the whole key and nothing but the key.
A table in 3NF

By: Harun kamau-maseno university CCS 319:Database Administration


Converting to BCNF

1) Identify Primary Key


2) Identify Composite Key
3) Identify Candidate Key
4) Which Columns are having repetition of data?

1.Primary Key:
 – CourseNo

2. Composite Key:
 – Lecturer + Time

 – Time + Room

3. Candidate Key:
 – CourseNo

 – Time

4. Columns are having repetition of data


 Lecturer
 Room

By: Harun kamau-maseno university CCS 319:Database Administration


Cont’d

By: Harun kamau-maseno university CCS 319:Database Administration


Is there redundancy?

Redundancy in 3NF
The combination of ROOM, TIME is unique to each tuple,
no room is used twice at the same time (thus it is in 3NF).
 But, we know there is a redundancy in that ROOM depends
LECTURER, therefore, we split the table...
IS there difference between 3NF and BCNF
Most relations in 3NF are also in BCNF, the only time this
may not be true is when there is more than one candidate
key for a relation and at least one of is composite.

By: Harun kamau-maseno university CCS 319:Database Administration


After conversion

By: Harun kamau-maseno university CCS 319:Database Administration

You might also like