You are on page 1of 34

Fundamentals of Database

Systems(CoSc2041)
Chapter Four
Normalization
Chapter Outline
• Introduction
• Types of normalization
• 1NF
• 2nd NF
• 3rd NF
Problem with un-normalized data
• If a table is not properly normalized and have data redundancy then it will not only eat up
extra memory space but will also make it difficult to handle and update the database,
without facing data loss.
• Insert, update and delete anomalies are very frequent if database is not normalized.
Update anomalies − If data items are scattered and are not linked to each other
properly, then it could lead to strange situations.
• For example, when we try to update one data item having its copies scattered over several
places, a few instances get updated properly while a few others are left with old values.
Such instances leave the database in an inconsistent state.
Deletion anomalies − We tried to delete a record, but parts of it was left undeleted
because of unawareness, the data is also saved somewhere else.
Insert anomalies − We tried to insert data in a record that does not exist at all.
• Normalization is a method to remove all these anomalies and bring the database to a
consistent state.
Normalization
• Normalization is a mathematically rich and scientific process that reduces data redundancy.

• It is a process that “improves” a database design by generating relations that are of higher
normal forms.

• The objective of normalization is “to create relations where every dependency is on the key,
the whole key, and nothing but the key”.

• We discuss four normal forms: 1NF, 2NF, 3NF, and BCNF

• Database Normalization is a technique of organizing the data in the database.


• Normalization is a systematic approach of decomposing tables to eliminate data
redundancy(repetition) and undesirable characteristics like Insertion, update and deletion
anomalies.
• It is a multi-step process that puts data into tabular form, removing duplicated data from the
relation tables.
• Normalization of data can be considered as a “filtering” or “purification” process to make
the design have successively better quality.
• Unsatisfactory relation schemas that do not meet certain conditions—the normal form
tests—are decomposed into smaller relation schemas that meet the tests and hence
possess the desirable properties.
• Definition: The normal form of a relation refers to the highest normal form condition
that it meets, and hence indicates the degree to which it has been normalized.
• Normalization is used for mainly two purposes,
• Eliminating redundant(useless) data.
• Ensuring data dependencies make sense i.e data is logically stored.
• There is a sequence to normal forms:

• 1NF is considered the weakest,

• 2NF is stronger than 1NF,

• 3NF is stronger than 2NF, and

• BCNF is considered the strongest

 Also,

• any relation that is in BCNF, is in 3NF;

• any relation in 3NF is in 2NF; and

• any relation in 2NF is in 1NF.


Levels of Normalization
• Levels of normalization based on the amount of redundancy in the database.
• Various levels of normalization are:
• First Normal Form (1NF)
• Second Normal Form (2NF)
• Third Normal Form (3NF)

Redundancy
• Boyce-Codd Normal Form (BCNF)

Number of Tables

Number of Tables
• Fourth Normal Form (4NF)
• Fifth Normal Form (5NF)
• Domain Key Normal Form (DKNF)

Most
Mostdatabases
databasesshould
shouldbe
be3NF
3NFororBCNF
BCNFin inorder
orderto
toavoid
avoidthe
the
database
databaseanomalies.
anomalies.
Level of Normalization
• We consider a relation in BCNF to be fully normalized.

• The benefit of higher normal forms is that update semantics for the affected data

are simplified.
• This means that applications required to maintain the database are simpler.

• A design that has a lower normal form than another design has more redundancy.

Uncontrolled redundancy can lead to data integrity problems.


• First we introduce the concept of functional dependency
Functional Dependencies

• We say an attribute, B, has a functional dependency on another attribute, A, if for any two

records, which have the same value for A, then the values for B in these two records must

be the same. We illustrate this as:

• Example: Suppose we keep track of employee email addresses, and we only track one

email address for each employee. Suppose each employee is identified by their unique

employee number.

• We say there is a functional dependency of email address on employee number:

• employee number  email address


Functional Dependencies
EmpNum EmpEmail EmpFname EmpLname
123 jdoe@abc.com John Doe
456 psmith@abc.com Peter Smith
555 alee1@abc.com Alan Lee
633 pdoe@abc.com Peter Doe
787 alee2@abc.com Alan Lee

If EmpNum is the PK then the FDs:


EmpNum  EmpEmail
EmpNum  EmpFname
EmpNum  EmpLname
• Must exist
• Functional Dependencies
EmpNum  EmpEmail 3 different ways you might see
EmpNum  EmpFname FDs depicted
EmpNum  EmpLname
Determinant

Transitive dependency

• Consider attributes A, B, and C, and where

• A  B and B  C.

• Functional dependencies are transitive, which means that we also have the functional
dependency A  C

• We say that C is transitively dependent on A through B.


First Normal Form
• Tt was defined to disallow multivalued attributes, composite attributes, and their
combinations.
• It states that the domain of an attribute must include only atomic (simple, indivisible)
values and that the value of any attribute in a tuple must be a single value from the
domain of that attribute.
• Hence, 1NF disallows having a set of values, a tuple of values, or a combination of both as
an attribute value for a single tuple.

• In other words, 1NF disallows relations within relations or relations as attribute values
within tuples.

• 1NF places restrictions on the structure of relations.

• Values must be simple.

• There are no repeating groups in the table.

• All attributes are dependent on the primary key.


First Normal Form
First Normal Form
Unnormalized data
• 1 NF
Second Normal Form

• (2NF) is based on the concept of full functional dependency.


• Definition: A relation schema R is in 2NF if every nonprime attribute A in R is fully
functionally dependent on the primary key of R.
• A functional dependency X → Y is a full functional dependency if removal of any
attribute A from X means that the dependency does not hold any more;
• i.e. for any attribute A ε X, (X – {A}) does not functionally determine Y. A functional
dependency X → Y is a partial dependency if some attribute A ε X can be removed
from X and the dependency still holds; that is, for some A ε X, (X – {A}) → Y.
• The test for 2NF involves testing for functional dependencies whose left-hand side
attributes are part of the primary key.
• If the primary key contains a single attribute, the test need not be applied at all.
 A relation is in 2NF if it is in 1NF, and every non-prime attribute is fully dependent on
each candidate key. (That is, we don’t have any partial functional dependency.)
• 2NF (and 3NF) both involve the concepts of prime and non-prime attributes.

• Definition: An attribute of relation schema R is called a prime attribute of R if it is a


member of some candidate key of R.

• An attribute is called non-prime if it is not a prime attribute—that is, if it is not a


member of any candidate key.

• Relations that are not in BCNF have data redundancies

• A relation in 2NF will not have any partial dependencies


Third Normal Form
(3NF) is based on the concept of transitive dependency.

A relation is in 3NF if the relation is in 2NF and all determinants of prime attributes are
candidate keys.

That is, for any functional dependency: X  Y, where Y is a non-prime attribute ,X is a


candidate key.

This definition of 3NF differs from BCNF only in the specification of prime attributes -
3NF is weaker than BCNF. (BCNF requires all determinants to be candidate keys.)

a relation schema R is in 3NF if it satisfies 2NF and no nonprime attribute of R is


transitively dependent on the primary key.
Boyce-Codd Normal Form
• Definition: A relation schema R is in BCNF if whenever a nontrivial functional
dependency X → A holds in R, then X is a super key of R.

• Every relation in BCNF is also in 3NF; however, a relation in 3NF is not necessarily in
BCNF.

• BCNF is defined very simply:


• A relation is in BCNF if it is in 3NF and
• If every determinant is a candidate key.
If our database will be used for OLTP (on line transaction processing), then BCNF is
our target. Usually, we meet this objective. However, we might denormalize (3NF,
2NF, or 1NF) for performance reasons.

You might also like