
Normalization

Introduction
• Normalization is the process of
organizing the data in a
database to avoid data
redundancy and the insertion,
update and deletion
anomalies.
Anomalies in DBMS
• Three types of anomalies can occur when a
database is not normalized:
• Insertion anomaly
• Update anomaly
• Deletion anomaly
Non-normalized table
Insertion anomaly

• Suppose a new employee joins
the company, who is under
training and not currently
assigned to any department. We
would not be able to insert
that employee’s data into the table if
the Emp_Dept field doesn’t allow null values.
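The insertion anomaly can be reproduced with a minimal sketch using Python's built-in sqlite3 module. Only the Emp_Dept column comes from the slide; the table name, the other columns and the sample values are assumptions made for illustration.

```python
import sqlite3

# Hypothetical single-table design; only the department column is named on the slide.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee ("
    " emp_id INTEGER NOT NULL,"
    " emp_name TEXT NOT NULL,"
    " emp_address TEXT NOT NULL,"
    " emp_dept TEXT NOT NULL)"  # a department is mandatory in this design
)


def try_insert_trainee(connection):
    """Try to insert an employee who has no department yet; return True on success."""
    try:
        connection.execute(
            "INSERT INTO employee VALUES (?, ?, ?, ?)",
            (201, "New Trainee", "Pune", None),  # None is stored as SQL NULL
        )
        return True
    except sqlite3.IntegrityError:
        # The NOT NULL constraint on emp_dept rejects the row.
        return False


inserted = try_insert_trainee(conn)
print("trainee inserted:", inserted)
```

With employees and department assignments kept in separate tables, the trainee could be stored without any department row.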
Deletion anomaly
• Suppose the company closes one of its
departments. Deleting the rows for that
department also deletes all information
about any employee who is assigned only
to that department.
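A deletion anomaly, where removing one fact silently removes another, can be sketched as follows. The table, employee names and department codes are hypothetical and chosen only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (emp_id INTEGER, emp_name TEXT, emp_dept TEXT)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [
        (1, "Asha", "D1"),
        (1, "Asha", "D2"),
        (2, "Ben", "D3"),  # Ben belongs only to department D3
    ],
)

# Department D3 is closed, so its rows are deleted.
conn.execute("DELETE FROM employee WHERE emp_dept = 'D3'")

# Ben's record disappears together with the department.
remaining = {row[0] for row in conn.execute("SELECT emp_name FROM employee")}
print(remaining)
```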
Update anomaly
• In the above table we have two rows for
employee Rick, as he belongs to two
departments of the company.
• If we want to update the address of Rick, we
have to update it in both rows or the data
will become inconsistent.
• If the correct address gets updated for one
department but not for the other, then as per
the database Rick would have two different
addresses.
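The update anomaly can be sketched in the same way. Rick and his two department rows come from the slide; the employee id, addresses and department codes below are assumed values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (emp_id INTEGER, emp_name TEXT, emp_address TEXT, emp_dept TEXT)"
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?)",
    [
        (101, "Rick", "Old Street", "D001"),
        (101, "Rick", "Old Street", "D002"),
    ],
)

# The address is corrected in only one of Rick's two rows.
conn.execute(
    "UPDATE employee SET emp_address = 'New Street' "
    "WHERE emp_id = 101 AND emp_dept = 'D001'"
)

# The database now holds two different addresses for the same employee.
addresses = {
    row[0]
    for row in conn.execute("SELECT emp_address FROM employee WHERE emp_id = 101")
}
print(addresses)
```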
Types
• First Normal Form (1NF)
• Second Normal Form (2NF)
• Third Normal Form (3NF)
• Boyce-Codd Normal Form (BCNF)
• Fourth Normal Form (4NF)
• Fifth Normal Form (5NF)
First Normal Form (1NF)

A table is in 1NF if it follows four rules:
• It should only have single-valued (atomic)
attributes/columns.
• Values stored in a column should be of the same
domain.
• All the columns in a table should have unique
names.
• The order in which data is stored does not
matter.
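As a rough illustration of the first rule, the sketch below splits a column that holds several comma-separated values into one row per atomic value. The student/subject data and the function name are hypothetical.

```python
def to_first_normal_form(rows, column):
    """Return new rows in which `column` holds exactly one value per row."""
    atomic_rows = []
    for row in rows:
        for value in row[column].split(","):
            new_row = dict(row)          # copy the other columns unchanged
            new_row[column] = value.strip()
            atomic_rows.append(new_row)
    return atomic_rows


# Hypothetical table that violates 1NF: "subject" holds a list of values.
students = [
    {"student_id": 1, "subject": "Math, Physics"},
    {"student_id": 2, "subject": "Chemistry"},
]

atomic = to_first_normal_form(students, "subject")
for row in atomic:
    print(row)
```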
Second Normal Form (2NF)

• It should be in the First Normal Form.
• And, it should not have any Partial Dependency, i.e. no
non-prime attribute may depend on only a part of a
composite candidate key.
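One way to spot a partial dependency is to list the functional dependencies and flag those whose left side is only a part of the key. This is a minimal sketch that assumes a single composite candidate key; the attribute names and dependencies are hypothetical.

```python
def partial_dependencies(candidate_key, fds):
    """Return (lhs, non_prime_rhs) pairs where lhs is a proper subset of the key."""
    key = frozenset(candidate_key)
    violations = []
    for lhs, rhs in fds:
        lhs = frozenset(lhs)
        non_prime = set(rhs) - key       # attributes that are not part of the key
        if lhs < key and non_prime:      # "<" means proper subset
            violations.append((set(lhs), non_prime))
    return violations


# Hypothetical relation: score(student_id, subject_id, marks, teacher)
key = {"student_id", "subject_id"}
fds = [
    ({"student_id", "subject_id"}, {"marks"}),  # depends on the whole key
    ({"subject_id"}, {"teacher"}),              # depends on part of the key
]

violations = partial_dependencies(key, fds)
print(violations)
```

Moving `teacher` into a separate table keyed by `subject_id` would remove the flagged dependency.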
Third Normal Form (3NF)

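• It should be in the Second Normal Form.
• And, it should not have any Transitive Dependency, i.e. no
non-prime attribute may depend on the key only through
another non-prime attribute.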
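Removing a transitive dependency can be sketched by splitting one table into two. The example assumes emp_id → emp_zip and emp_zip → emp_city; the column names and rows are illustrative and not taken from the slides.

```python
# Hypothetical table where emp_city depends on emp_id only through emp_zip.
employees = [
    {"emp_id": 1, "emp_name": "Asha", "emp_zip": "10001", "emp_city": "Springfield"},
    {"emp_id": 2, "emp_name": "Ben", "emp_zip": "10001", "emp_city": "Springfield"},
    {"emp_id": 3, "emp_name": "Chen", "emp_zip": "20002", "emp_city": "Riverton"},
]

# Decompose: keep the zip code with the employee, move the city to its own table.
employee_table = [
    {k: row[k] for k in ("emp_id", "emp_name", "emp_zip")} for row in employees
]
zip_table = {row["emp_zip"]: row["emp_city"] for row in employees}

# Joining the two tables on emp_zip gives back the original rows.
rejoined = [dict(row, emp_city=zip_table[row["emp_zip"]]) for row in employee_table]

print(len(zip_table), "zip rows instead of", len(employees), "repeated city values")
```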
Boyce and Codd Normal Form (BCNF)

• It is a stricter version of the Third Normal Form.
This form deals with a certain type of anomaly that
is not handled by 3NF. A 3NF table that does
not have multiple overlapping candidate keys is
guaranteed to be in BCNF. For a table R to be in BCNF,
the following conditions must be satisfied:
• R must be in the Third Normal Form
• and, for each non-trivial functional dependency ( X → Y ), X
should be a super key.
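The super key condition can be checked mechanically with an attribute closure. The following is a minimal sketch; the relation and its functional dependencies are a hypothetical example, and the function names are not from any library.

```python
def closure(attributes, fds):
    """Return every attribute determined by `attributes` under the given FDs."""
    result = set(attributes)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(relation, fds):
    """Return the non-trivial FDs X -> Y whose left side X is not a super key."""
    relation = set(relation)
    violations = []
    for lhs, rhs in fds:
        trivial = set(rhs) <= set(lhs)
        is_super_key = relation <= closure(lhs, fds)
        if not trivial and not is_super_key:
            violations.append((lhs, rhs))
    return violations


# Hypothetical relation R(student, subject, professor)
relation = ("student", "subject", "professor")
fds = [
    (("student", "subject"), ("professor",)),
    (("professor",), ("subject",)),  # professor alone is not a super key
]

violations = bcnf_violations(relation, fds)
print(violations)
```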
Fourth Normal Form (4NF)
• It should be in the Boyce-Codd Normal Form
(BCNF).
• And, the table should not have any Multi-valued
Dependency.
Fifth Normal Form (5NF)
• It should be in the Fourth Normal Form (4NF).
• It should not contain any join dependency that is not
implied by its candidate keys, so it cannot be usefully
decomposed any further into smaller tables.
• 5NF is satisfied when the tables are broken
into as many tables as possible in order to
avoid redundancy.
• 5NF is also known as Project-join normal form
(PJ/NF).
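A join dependency can be tested by projecting a table onto smaller tables and joining them back. The sketch below does this for a three-column table and its three two-column projections; the supplier/part/project values are hypothetical.

```python
def join_of_projections(rows):
    """Project (a, b, c) rows onto (a, b), (b, c), (a, c) and join them again."""
    ab = {(a, b) for a, b, _ in rows}
    bc = {(b, c) for _, b, c in rows}
    ac = {(a, c) for a, _, c in rows}
    return {
        (a, b, c)
        for a, b in ab
        for b2, c in bc
        if b == b2 and (a, c) in ac
    }


# Hypothetical supplier / part / project table.
full = {
    ("s1", "p1", "j2"),
    ("s1", "p2", "j1"),
    ("s2", "p1", "j1"),
    ("s1", "p1", "j1"),
}
partial = full - {("s1", "p1", "j1")}

# The join dependency holds for `full`: the three projections lose nothing.
print(join_of_projections(full) == full)

# For `partial`, joining the projections invents a row that was never stored.
print(join_of_projections(partial) - partial)
```

When the join dependency holds, as for `full`, storing the three smaller tables instead of the original one is the decomposition that 5NF asks for.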
3NF example

Super keys: {Emp_Id}, {Emp_Id, Emp_Name}, {Emp_Id, Emp_Name, Emp_Zip}, and so on
Candidate keys: {Emp_Id}
Non-prime attributes: all attributes except Emp_Id are non-prime, as they are not part of any candidate key.
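The super keys, candidate keys and non-prime attributes listed above can be derived by brute force for a small relation. This sketch uses only the three attributes named on the slide and assumes a single functional dependency, Emp_Id → {Emp_Name, Emp_Zip}; the dependency and the function names are assumptions.

```python
from itertools import combinations


def closure(attributes, fds):
    """Return every attribute determined by `attributes` under the given FDs."""
    result = set(attributes)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def super_keys(attributes, fds):
    """Every attribute subset whose closure covers the whole relation."""
    attrs = sorted(attributes)
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(attrs, size):
            if closure(combo, fds) >= set(attrs):
                keys.append(frozenset(combo))
    return keys


def candidate_keys(attributes, fds):
    """Super keys that contain no smaller super key."""
    keys = super_keys(attributes, fds)
    return [k for k in keys if not any(other < k for other in keys)]


attributes = {"Emp_Id", "Emp_Name", "Emp_Zip"}
fds = [(("Emp_Id",), ("Emp_Name", "Emp_Zip"))]  # assumed dependency

candidates = candidate_keys(attributes, fds)
non_prime = attributes - set().union(*candidates)
print(candidates, non_prime)
```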
BCNF example

Candidate keys:
• For the first table: EMP_ID
• For the second table: EMP_DEPT
• For the third table: {EMP_ID, EMP_DEPT}
4NF
What is Multi-valued Dependency?

• For a dependency A → B, if for a single value of
A multiple values of B exist, then the table may
have a multi-valued dependency.
• Also, a table should have at least 3 columns for
it to have a multi-valued dependency.
• And, for a relation R(A,B,C), if there is a multi-valued
dependency between A and B, then B
and C should be independent of each other.
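The independence of B and C can be checked directly: whenever two rows share the same A, every combination of their B and C values must also be present. A minimal sketch for a hypothetical R(student, course, hobby) table, followed by the 4NF decomposition into two tables:

```python
def holds_mvd(rows):
    """Check the multi-valued dependency A ->> B on a set of (a, b, c) rows."""
    rows = set(rows)
    for a1, b1, _ in rows:
        for a2, _, c2 in rows:
            # For the same A, every B must appear together with every C.
            if a1 == a2 and (a1, b1, c2) not in rows:
                return False
    return True


# Hypothetical data: courses and hobbies of a student are unrelated facts.
enrolment = {
    ("s1", "Math", "Chess"),
    ("s1", "Math", "Tennis"),
    ("s1", "Physics", "Chess"),
    ("s1", "Physics", "Tennis"),
}

# 4NF decomposition: one table per independent fact.
courses = {(a, b) for a, b, _ in enrolment}
hobbies = {(a, c) for a, _, c in enrolment}

# Joining the two tables on the student restores the original rows.
rejoined = {(a, b, c) for a, b in courses for a2, c in hobbies if a == a2}

print(holds_mvd(enrolment), len(courses) + len(hobbies), "rows instead of", len(enrolment))
```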
Table with multi-valued dependency
1NF: A relation is in 1NF if every attribute contains only atomic values.

2NF: A relation is in 2NF if it is in 1NF and all non-key attributes are fully functionally dependent on the primary key.

3NF: A relation is in 3NF if it is in 2NF and no transitive dependency exists.

BCNF: A stronger definition of 3NF, known as Boyce-Codd normal form. For every functional dependency a → b, a is a super key.

4NF: A relation is in 4NF if it is in Boyce-Codd normal form and has no multi-valued dependency.

5NF: A relation is in 5NF if it is in 4NF, does not contain any join dependency, and joining is lossless.
Advantages of Normalization
• Normalization helps to minimize data
redundancy.
• Greater overall database organization.
• Data consistency within the database.
• Much more flexible database design.
• Enforces the concept of relational integrity.
Disadvantages of Normalization
• You cannot start building the database before
knowing what the user needs.
• Performance can degrade when relations are
normalized to higher normal forms, i.e., 4NF and
5NF, because queries must join more tables.
• It is very time-consuming and difficult to
normalize relations of a higher degree.
• Careless decomposition may lead to a bad
database design, leading to serious problems.
Denormalization in Databases
• When we normalize tables, we break them into multiple
smaller tables.
• To retrieve data from multiple tables, we need to
perform some kind of join operation on them, which can be costly.
• In that case, we use the denormalization
technique, which deliberately adds redundant data to
reduce this drawback of normalization.
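The trade-off can be sketched with sqlite3: the normalized design needs a join to produce a report, while a denormalized copy answers the same query from one table at the cost of storing the department name repeatedly. All table and column names here are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE department (dept_id TEXT PRIMARY KEY, dept_name TEXT);
    CREATE TABLE employee (
        emp_id INTEGER PRIMARY KEY,
        emp_name TEXT,
        dept_id TEXT REFERENCES department(dept_id)
    );
    INSERT INTO department VALUES ('D1', 'Sales'), ('D2', 'Support');
    INSERT INTO employee VALUES (1, 'Asha', 'D1'), (2, 'Ben', 'D1'), (3, 'Chen', 'D2');

    -- Denormalized copy: dept_name is stored redundantly next to each employee.
    CREATE TABLE employee_report AS
        SELECT e.emp_id, e.emp_name, d.dept_name
        FROM employee e JOIN department d ON d.dept_id = e.dept_id;
    """
)

# Normalized design: the report needs a join.
joined = conn.execute(
    "SELECT e.emp_name, d.dept_name "
    "FROM employee e JOIN department d ON d.dept_id = e.dept_id "
    "ORDER BY e.emp_id"
).fetchall()

# Denormalized design: the same report is a single-table read.
flat = conn.execute(
    "SELECT emp_name, dept_name FROM employee_report ORDER BY emp_id"
).fetchall()

print(joined == flat)
```

Renaming a department now means updating every matching row in `employee_report` as well as the `department` table.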
Pros of Denormalization

1. Enhances query performance
2. Makes the database more convenient to manage
3. Facilitates and accelerates reporting
Cons of Denormalization
• It takes more storage due to data
redundancy.
• It makes updates and inserts
more expensive.
• It makes update and insert code harder
to write.
• Since the same data can be modified in several
places, the data can become inconsistent.
Difference
• Denormalization differs from normalization
in the following manner:
• Denormalization: merges data from multiple tables
into a single table that can be queried quickly.
• Normalization: removes redundant data from a
database and replaces it with non-redundant and
reliable data.
Difference: when to use
• Denormalization: when joins are costly and queries are run regularly.
• Normalization: when a large number of insert/update/delete
operations are performed and
joins between the tables are not expensive.
