
DSC: Database Management System (DBMS)-NEP

UNIT-4 Data Normalization

Data Normalization: Anomalies in relational database design. Decomposition.
Functional dependencies. Normalization. First normal form, Second normal form,
Third normal form. Boyce-Codd normal form.

In a Database Management System (DBMS), an anomaly is an inconsistency that arises
in a relational table when operations (insert, update, delete) are performed on it.

Reasons for anomalies

o A lot of redundant data is present in the database.
o The table is poorly designed, which increases the chance of a database anomaly.
o All the data is stored in a single table.

Database anomalies affect the insertion, deletion, and modification of data in the
relations, and the integrity of the database suffers as a result.

Types of Anomalies in DBMS

Insert Anomaly: Occurs when a new row cannot be inserted without also supplying
unrelated data, or when inserting it makes the table inconsistent.

Update Anomaly: Occurs when the same fact is stored in several rows, so updating it
requires changing all of those rows; if some copies are missed, the table becomes
inconsistent.

Delete Anomaly: Occurs when deleting some rows also removes other information that
is still required from the database.
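To make the update and delete anomalies concrete, here is a minimal Python sketch on a hypothetical single table that stores each employee's department name in every row. The table, its columns, and the helper functions are illustrative assumptions, not part of these notes.

```python
# Hypothetical single-table design: dept_name is repeated in every row.
rows = [
    {"emp_id": 1, "emp_name": "Asha", "dept_id": 10, "dept_name": "Sales"},
    {"emp_id": 2, "emp_name": "Ravi", "dept_id": 10, "dept_name": "Sales"},
    {"emp_id": 3, "emp_name": "Meena", "dept_id": 20, "dept_name": "HR"},
]


def dept_names(rows, dept_id):
    """Return every distinct name stored for one department."""
    return {row["dept_name"] for row in rows if row["dept_id"] == dept_id}


def rename_first_match(rows, dept_id, new_name):
    """Update only the first matching row, as a careless program might."""
    for row in rows:
        if row["dept_id"] == dept_id:
            row["dept_name"] = new_name
            break  # the other copies of the department name are not updated


# Update anomaly: department 10 now has two different names.
rename_first_match(rows, 10, "Marketing")
print(sorted(dept_names(rows, 10)))  # ['Marketing', 'Sales']

# Delete anomaly: removing the only HR employee also loses department 20's name.
rows = [row for row in rows if row["emp_id"] != 3]
print(dept_names(rows, 20))  # set()
```

Storing department names in a separate table keyed by dept_id would avoid both problems, which is what decomposition aims for.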

Decomposition

Decomposition is the process of breaking an original relation into multiple
sub-relations. Decomposition helps to remove anomalies, redundancy, and other
problems in a DBMS.

There are two types of decomposition as shown below:


1. Lossless decomposition

A lossless decomposition of a relation ensures that:

a) No information is lost during decomposition; this is why it is called lossless.

b) If a relation R is divided into two relations R1 and R2 using lossless decomposition,
then the natural join of R1 and R2 returns exactly the original relation R.

Rules of lossless decomposition: For these rules, we assume that a relation R
is divided into two relations R1 and R2.

1. Every attribute of R must appear in R1 or R2, so that the natural join of R1 and
R2 can return the original relation R.

R1 ∪ R2 = R
2. The intersection of R1 and R2 must not be empty, i.e., R1 and R2 must have some
common attributes.

R1 ∩ R2 ≠ ∅
3. The intersection of R1 and R2 must be a super key of R1, of R2, or of both.

R1 ∩ R2 → R1 or R1 ∩ R2 → R2
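Rules 1 to 3 can be checked mechanically when the functional dependencies of R are known. The sketch below is a minimal Python version, assuming FDs are given as (left side, right side) pairs of attribute sets; it tests rule 3 by computing the attribute closure of R1 ∩ R2. The function names and the relation R(A, B, C) with A → B are hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_lossless_binary(r, r1, r2, fds):
    """Check rules 1-3 for decomposing R into R1 and R2 (attribute sets)."""
    if r1 | r2 != r:  # rule 1: every attribute of R appears in R1 or R2
        return False
    common = r1 & r2
    if not common:  # rule 2: R1 and R2 share at least one attribute
        return False
    # rule 3: R1 ∩ R2 must be a super key of R1 or of R2
    common_closure = closure(common, fds)
    return r1 <= common_closure or r2 <= common_closure


R = {"A", "B", "C"}
fds = [({"A"}, {"B"})]  # hypothetical FD: A -> B
print(is_lossless_binary(R, {"A", "B"}, {"A", "C"}, fds))  # True
print(is_lossless_binary(R, {"A", "B"}, {"B", "C"}, fds))  # False
```

In the second decomposition the common attribute B determines nothing else, so rule 3 fails and the decomposition is lossy.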


Example

KOSHYS INSTITUTE OF MANAGEMENT STUDIES- BCA DEPARTMENT 2



2. Lossy Decomposition

Just like the name suggests, whenever we decompose a relation into multiple
relational schemas, then the loss of data/information is unavoidable i.e we will not be
able to recover Complete information as present in the original relation.

In a lossy decomposition, one or more of the above rules fail.


Example
Relational Schema = X (P, Q, R)
Decompositions,
X1 (P, Q)
X2 (P, R)

Thus, X1 ⨝ X2 will be equal to


Here, since X ⊂ X1 ⨝ X2 (the join contains extra, spurious tuples that were not in X),
this is a lossy join decomposition.
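Because the original table for X is not reproduced here, the following Python sketch uses a small hypothetical instance of X(P, Q, R) to show how joining X1 and X2 on P creates spurious tuples. In this data P does not determine Q or R, so the common attribute is not a super key of either projection.

```python
# Hypothetical instance of X(P, Q, R); each tuple is (P, Q, R).
X = {("p1", "q1", "r1"), ("p1", "q2", "r2"), ("p2", "q3", "r3")}

X1 = {(p, q) for p, _, _ in [(t[0], t[1], t[2]) for t in X] for q in [_]}  if False else {(p, q) for p, q, _ in X}  # projection onto (P, Q)
X2 = {(p, r) for p, _, r in X}  # projection onto (P, R)

# Natural join of X1 and X2 on their common attribute P
joined = {(p1, q, r) for p1, q in X1 for p2, r in X2 if p1 == p2}

spurious = joined - X
print(X < joined)  # True: X is a proper subset of X1 ⨝ X2
print(sorted(spurious))  # [('p1', 'q1', 'r2'), ('p1', 'q2', 'r1')]
```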
Functional Dependency (FD)
A Functional Dependency (FD) is a constraint between two sets of attributes in a
relation: X → Y holds if any two tuples that agree on the values of X also agree on
the values of Y.

It typically exists between the primary key and the non-key attributes of a table.

X→Y

The left side of an FD is known as the determinant, and the right side is known as
the dependent.

Example: Emp_Id → Emp_Name
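An FD is a statement about all valid states of a table, so a particular table instance can only show that an FD is violated, never prove that it always holds. With that caveat, a minimal Python sketch for testing X → Y against one instance might look like this; the employees data and the fd_holds helper are hypothetical.

```python
def fd_holds(rows, lhs, rhs):
    """True if, in this instance, rows that agree on lhs also agree on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # The first value seen for this key is stored; later ones must match it.
        if seen.setdefault(key, value) != value:
            return False
    return True


employees = [
    {"Emp_Id": 101, "Emp_Name": "Asha", "Dept": "Sales"},
    {"Emp_Id": 102, "Emp_Name": "Ravi", "Dept": "Sales"},
    {"Emp_Id": 103, "Emp_Name": "Asha", "Dept": "HR"},
]
print(fd_holds(employees, ["Emp_Id"], ["Emp_Name"]))  # True
print(fd_holds(employees, ["Emp_Name"], ["Dept"]))  # False: "Asha" has two depts
```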

Types of functional dependencies in DBMS:

1. Trivial functional dependency
2. Non-trivial functional dependency

Trivial functional dependency

A → B is a trivial functional dependency if B is a subset of A.

Dependencies such as A → A and B → B are also trivial.

Example: { DeptId, DeptName } → DeptId

Non-Trivial functional dependency

o A → B is a non-trivial functional dependency if B is not a subset of A.
o When A ∩ B is empty, A → B is called completely non-trivial.

Example: DeptId → DeptName
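The three definitions above reduce to simple set comparisons between the left side A and the right side B. A minimal Python sketch, where classify_fd is a hypothetical helper, using the DeptId/DeptName examples:

```python
def classify_fd(lhs, rhs):
    """Classify A -> B by comparing the attribute sets A and B."""
    lhs, rhs = set(lhs), set(rhs)
    if rhs <= lhs:
        return "trivial"  # B is a subset of A
    if not (lhs & rhs):
        return "completely non-trivial"  # A and B share no attributes
    return "non-trivial"  # B is not a subset of A, but they overlap


print(classify_fd({"DeptId", "DeptName"}, {"DeptId"}))  # trivial
print(classify_fd({"DeptId"}, {"DeptName"}))  # completely non-trivial
print(classify_fd({"DeptId"}, {"DeptId", "DeptName"}))  # non-trivial
```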

Normalization

o Normalization is the process of organizing the data in the database.
o Normalization is used to minimize redundancy in a relation or set of relations. It
is also used to eliminate undesirable characteristics such as insertion, update, and
deletion anomalies.
o Normalization divides larger tables into smaller ones and links them using
relationships.
o Normal forms are used to reduce redundancy in database tables.

The database normalization process is further categorized into the following types:

1. First Normal Form (1NF)
2. Second Normal Form (2NF)
3. Third Normal Form (3NF)
4. Boyce-Codd Normal Form (BCNF)
5. Fourth Normal Form (4NF)
6. Fifth Normal Form (5NF)
7. Sixth Normal Form (6NF)

First Normal Form (1 NF)

A relation is in 1NF if every attribute has an atomic (indivisible) domain, i.e., no
attribute holds multiple values or repeating groups.
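A common way to bring a table with a multi-valued attribute into 1NF is to store one value per row. The Python sketch below assumes a hypothetical employee table whose Phones column holds a list; to_1nf is an illustrative helper, not a standard function.

```python
# Hypothetical unnormalized rows: Phones holds several values in one cell.
unnormalized = [
    {"Emp_Id": 101, "Emp_Name": "Asha", "Phones": ["98450", "99001"]},
    {"Emp_Id": 102, "Emp_Name": "Ravi", "Phones": ["97400"]},
]


def to_1nf(rows, multi_attr, new_attr):
    """Repeat the other attributes once per value so every cell is atomic."""
    flat = []
    for row in rows:
        for value in row[multi_attr]:
            new_row = {k: v for k, v in row.items() if k != multi_attr}
            new_row[new_attr] = value
            flat.append(new_row)
    return flat


first_nf = to_1nf(unnormalized, "Phones", "Phone")
for r in first_nf:
    print(r)
```

The flattened rows are in 1NF (every cell is atomic), though they still repeat Emp_Name; removing that kind of redundancy is the job of the higher normal forms.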


Second Normal Form (2 NF)

For a relational table to be in second normal form, it must satisfy the following rules:

1. The table must be in first normal form.
2. It must not contain any partial dependency, i.e., every non-prime attribute must be
fully functionally dependent on the primary key (not on just part of a composite key).

In the above table, the prime attributes of the table are Employee Code and
Project ID. The table has partial dependencies because Employee Name can be
determined by Employee Code alone, and Project Name can be determined by Project ID
alone. Thus, the above relational table violates the rule of 2NF.
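Partial dependencies can be found by checking what each proper subset of the composite primary key determines on its own. The minimal Python sketch below assumes the table contains only the four attributes named in the text (Employee Code, Project ID, Employee Name, Project Name) and the two FDs just described; the helper names are hypothetical.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def partial_dependencies(key, non_prime, fds):
    """For each proper subset of the key, list the non-prime attributes it determines."""
    found = {}
    for size in range(1, len(key)):  # proper, non-empty subsets only
        for subset in combinations(sorted(key), size):
            determined = closure(subset, fds) & non_prime
            if determined:
                found[subset] = determined
    return found


key = {"Employee Code", "Project ID"}
non_prime = {"Employee Name", "Project Name"}
fds = [
    ({"Employee Code"}, {"Employee Name"}),
    ({"Project ID"}, {"Project Name"}),
]
partials = partial_dependencies(key, non_prime, fds)
print(partials)  # each key attribute alone determines one non-prime attribute
```

A non-empty result means the table is not in 2NF.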

To remove partial dependencies from this table and normalize it into second
normal form, we can decompose the <EmployeeProjectDetail> table into the
following two tables:


Third Normal Form (3NF)

For a relational table to be in third normal form, it must satisfy the following rules:

1. The table must be in second normal form.
2. No non-prime attribute is transitively dependent on the primary key.
3. For each non-trivial functional dependency X → Z, at least one of the following
conditions holds:

o X is a super key of the table.
o Z is a prime attribute of the table.

Example

The above table is not in 3NF because it has the transitive dependency
Employee Code → Employee City:

NARASIMHA MURTHY GK ,KRUPANIDHI DEGREE COLLEGE 9


o Employee Code → Employee Zipcode
o Employee Zipcode → Employee City

Also, Employee Zipcode is not a super key, and Employee City is not a prime attribute.
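The 3NF condition can be checked FD by FD once the prime attributes are known. The following minimal Python sketch assumes <EmployeeDetail> is reduced to the three attributes named above, with Employee Code as its only candidate key; any other columns of the original table are omitted, and the helper names are hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def violations_3nf(attributes, fds, prime):
    """Return FDs X -> Z where X is not a super key and Z - X has a non-prime attribute."""
    bad = []
    for lhs, rhs in fds:
        extra = rhs - lhs  # the trivial part of an FD never causes a violation
        if not extra:
            continue
        is_super_key = closure(lhs, fds) >= attributes
        if not is_super_key and not extra <= prime:
            bad.append((lhs, rhs))
    return bad


attributes = {"Employee Code", "Employee Zipcode", "Employee City"}
prime = {"Employee Code"}  # assumed: Employee Code is the only candidate key
fds = [
    ({"Employee Code"}, {"Employee Zipcode"}),
    ({"Employee Zipcode"}, {"Employee City"}),
]
violations = violations_3nf(attributes, fds, prime)
print(violations)  # only Employee Zipcode -> Employee City is reported
```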

To remove transitive dependency from this table and normalize it into the third
normal form, we can decompose the <EmployeeDetail> table into the following two
tables:


Boyce-Codd Normal Form (BCNF)

o BCNF is an advanced version of 3NF; it is stricter than 3NF.
o A table is in BCNF if, for every non-trivial functional dependency X → Y, X is a
super key of the table.
o Equivalently, the table must be in 3NF, and the left-hand side (LHS) of every
non-trivial FD must be a super key.
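Since the BCNF test only asks whether the left side of each non-trivial FD is a super key, it can be sketched with the attribute-closure idea. The schema below, (Student, Course, Instructor) with {Student, Course} → Instructor and Instructor → Course, is a hypothetical textbook-style example, not taken from these notes. Its candidate keys are {Student, Course} and {Student, Instructor}, so Course is prime and the table satisfies 3NF, yet Instructor → Course breaks BCNF.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(attributes, fds):
    """Return the non-trivial FDs whose left side is not a super key."""
    bad = []
    for lhs, rhs in fds:
        if rhs <= lhs:
            continue  # trivial FDs never violate BCNF
        if not closure(lhs, fds) >= attributes:
            bad.append((lhs, rhs))
    return bad


attributes = {"Student", "Course", "Instructor"}
fds = [
    ({"Student", "Course"}, {"Instructor"}),
    ({"Instructor"}, {"Course"}),
]
bcnf_bad = bcnf_violations(attributes, fds)
print(bcnf_bad)  # only Instructor -> Course is reported
```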


Key in DBMS

A key is an attribute or a set of attributes that helps us identify a row (or tuple)
uniquely in a table (or relation). Keys are also used to establish relationships
between the tables of a relational database.

Super Key

A super key is a single attribute or a set of attributes that can uniquely identify
tuples in a table.

o A super key can contain attributes that cannot identify tuples on their own; when
grouped with a key attribute, the set can still identify tuples uniquely.

For instance, (employee_Id, Employee_Name), (employee_Id, Passport_number),
(employee_Id, SSN), etc. can all be super keys, because each of them can uniquely
identify the tuples of the table. This is due to the presence of the employee_Id
attribute, which alone can uniquely identify the tuples.
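Whether a set of attributes is a super key is a property of the schema, but a table instance can at least reveal attribute sets that are not unique. A minimal Python sketch with hypothetical employee rows using the attribute names from the example above; uniquely_identifies is an illustrative helper.

```python
def uniquely_identifies(rows, attrs):
    """True if no two rows in this instance share the same values for attrs."""
    values = [tuple(row[a] for a in attrs) for row in rows]
    return len(values) == len(set(values))


employees = [
    {"employee_Id": 1, "Employee_Name": "Asha", "Passport_number": "P100", "SSN": "S900"},
    {"employee_Id": 2, "Employee_Name": "Asha", "Passport_number": "P200", "SSN": "S800"},
    {"employee_Id": 3, "Employee_Name": "Ravi", "Passport_number": "P300", "SSN": "S700"},
]
print(uniquely_identifies(employees, ["employee_Id", "Employee_Name"]))  # True
print(uniquely_identifies(employees, ["Employee_Name"]))  # False: two "Asha" rows
```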


Candidate key

A candidate key is a minimal super key: a single attribute or a set of attributes
that uniquely identifies rows in a table, such that no proper subset of it does so.

The value of a candidate key is unique and non-null for all tuples. Every table must
have at least one candidate key, and there can be more than one.

In the above example, employee_Id, Passport_number, and SSN can each act as a
candidate key for the table, as they contain unique and non-null values.
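Minimality can be checked mechanically: a candidate key is a super key (its closure covers every attribute) that contains no smaller candidate key. The Python sketch below assumes hypothetical FDs in which each of employee_Id, Passport_number, and SSN determines all other attributes; the helpers are illustrative.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(attributes, fds):
    """Return the minimal attribute sets whose closure is all of attributes."""
    keys = []
    for size in range(1, len(attributes) + 1):  # smallest sets first
        for combo in combinations(sorted(attributes), size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(candidate, fds) >= attributes:
                keys.append(candidate)
    return keys


attributes = {"employee_Id", "Employee_Name", "Passport_number", "SSN"}
# Hypothetical FDs: each of these three attributes determines all the others.
fds = [({k}, attributes - {k}) for k in ("employee_Id", "Passport_number", "SSN")]
keys = candidate_keys(attributes, fds)
print(keys)  # three single-attribute candidate keys
```

Supersets such as (employee_Id, Employee_Name) are skipped because they contain a smaller key: they are super keys but not candidate keys.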

Primary key

A primary key is the candidate key selected by the database administrator to
uniquely identify tuples in a table.

Out of all the possible candidate keys for a table, only one is chosen to retrieve
unique tuples from the table. This candidate key is called the primary key.
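In SQL, the chosen candidate key is usually declared as PRIMARY KEY, and the remaining candidate keys can be declared UNIQUE NOT NULL. The sketch below uses Python's built-in sqlite3 module with an in-memory database; the employee table definition and the try_insert helper are hypothetical examples built from the attribute names above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE employee (
           employee_Id     INTEGER PRIMARY KEY,   -- chosen candidate key
           Employee_Name   TEXT NOT NULL,
           Passport_number TEXT NOT NULL UNIQUE,  -- alternate candidate key
           SSN             TEXT NOT NULL UNIQUE   -- alternate candidate key
       )"""
)
conn.execute("INSERT INTO employee VALUES (1, 'Asha', 'P100', 'S900')")


def try_insert(row):
    """Return None if the row is accepted, else the constraint error message."""
    try:
        conn.execute("INSERT INTO employee VALUES (?, ?, ?, ?)", row)
        return None
    except sqlite3.IntegrityError as exc:
        return str(exc)


print(try_insert((1, "Ravi", "P200", "S800")))  # rejected: duplicate primary key
print(try_insert((2, "Ravi", "P100", "S800")))  # rejected: duplicate Passport_number
print(try_insert((2, "Ravi", "P200", "S800")))  # None: accepted
```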

FOREIGN KEY-refer unit 3
