Normalization Data Anomalies

Data Anomalies
Anomalies are problems that can occur in poorly planned, un-normalized databases.
Or
An anomaly is an inconsistency between one part of the data and another part.
Insertion Anomaly - The nature of a database may be such that it is not possible to
add a required piece of data unless another piece of unavailable data is also added.
E.g. A library database that cannot store the details of a new member until that
member has taken out a book.
Deletion Anomaly - A record of data can legitimately be deleted from a database,
and the deletion can result in the deletion of the only instance of other, required
data. E.g. Deleting a book loan from a library member can remove all details of the
particular book from the database, such as the author, book title etc.
Modification/Update Anomaly - Incorrect data may have to be changed, which
could involve many records having to be changed, leading to the possibility of some
changes being made incorrectly.
Example:
Suppose a manufacturing company stores employee details in a table
named employee that has four attributes.
emp_id  emp_name  emp_address  emp_dept
101     Rick      Delhi        D001
101     Rick      Delhi        D002
123     Maggie    Agra         D890
166     Glenn     Chennai      D900
166     Glenn     Chennai      D004

The above table is not normalized. We will see the problems that we face
when a table is not normalized.
Update anomaly: In the above table we have two rows for employee Rick
as he belongs to two departments of the company. If we want to update
the address of Rick then we have to update it in both rows or the
data will become inconsistent. If the correct address gets
updated in one row but not in the other, then as per the database
Rick would have two different addresses, which is not correct.
Insert anomaly: Suppose a new employee joins the company who is
under training and currently not assigned to any department. We
would not be able to insert the data into the table if the emp_dept field
doesn’t allow nulls.
Delete anomaly: Suppose at some point the company closes
department D890. Deleting the rows that have emp_dept as
D890 would also delete the information of employee Maggie, since she is
assigned only to this department.
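The three anomalies can be reproduced with a small script. This is only a sketch using Python's built-in sqlite3 module and an in-memory database. The NOT NULL constraint on emp_dept, the new address 'Mumbai' and the trainee row (200, 'Trainee', 'Pune') are assumptions made for illustration; they are not part of the example above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee ("
    " emp_id INTEGER NOT NULL,"
    " emp_name TEXT NOT NULL,"
    " emp_address TEXT NOT NULL,"
    " emp_dept TEXT NOT NULL)"  # assumption: emp_dept does not allow NULLs
)
rows = [
    (101, "Rick", "Delhi", "D001"),
    (101, "Rick", "Delhi", "D002"),
    (123, "Maggie", "Agra", "D890"),
    (166, "Glenn", "Chennai", "D900"),
    (166, "Glenn", "Chennai", "D004"),
]
conn.executemany("INSERT INTO employee VALUES (?, ?, ?, ?)", rows)

# Update anomaly: only one of Rick's two rows gets the new (hypothetical) address.
conn.execute(
    "UPDATE employee SET emp_address = 'Mumbai' "
    "WHERE emp_id = 101 AND emp_dept = 'D001'"
)
rick_addresses = sorted(
    r[0]
    for r in conn.execute(
        "SELECT DISTINCT emp_address FROM employee WHERE emp_id = 101"
    )
)

# Insert anomaly: a (hypothetical) trainee without a department is rejected.
try:
    conn.execute("INSERT INTO employee VALUES (200, 'Trainee', 'Pune', NULL)")
    insert_rejected = False
except sqlite3.IntegrityError:
    insert_rejected = True

# Delete anomaly: closing D890 also removes every trace of Maggie.
conn.execute("DELETE FROM employee WHERE emp_dept = 'D890'")
maggie_rows = conn.execute(
    "SELECT COUNT(*) FROM employee WHERE emp_name = 'Maggie'"
).fetchone()[0]

print(rick_addresses, insert_rejected, maggie_rows)
```

After the script runs, Rick has two different addresses, the trainee could not be stored, and no row for Maggie is left.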
To overcome these anomalies, we need to normalize the data. Normalization
is discussed in the sections that follow.
Functional dependencies:
A dependency is a constraint that defines the relationship between
attributes.
Functional dependency (FD):
A functional dependency is a constraint between any two attributes.
If there is a dependency in a database such that attribute B is dependent
upon attribute A, it is represented as:
A → B
Here the value of attribute B is determined solely by attribute A.
For example, in an employee table consider two attributes, Social Security
number (SSN) and name.
It can be said that name is dependent upon SSN (SSN → name) because
an employee's name can be uniquely determined from an SSN.
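One way to make this concrete is to test a candidate dependency against sample rows. The sketch below is an illustration only: the function name fd_holds and the SSN values are made up. Note that sample data can only disprove a dependency; a real functional dependency is a rule about every possible row.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if every value of `lhs` maps to exactly one value of `rhs`."""
    seen = {}
    for row in rows:
        key, value = row[lhs], row[rhs]
        if seen.setdefault(key, value) != value:
            return False  # same lhs value, two different rhs values
    return True


# Hypothetical sample rows; the SSN values and names are made up.
employees = [
    {"ssn": "111-11-1111", "name": "Rick"},
    {"ssn": "222-22-2222", "name": "Maggie"},
    {"ssn": "333-33-3333", "name": "Rick"},
]

print(fd_holds(employees, "ssn", "name"))  # SSN -> name
print(fd_holds(employees, "name", "ssn"))  # name -> SSN
```

The first check succeeds because each SSN appears with one name. The second fails because two different employees are both called Rick, so name does not determine SSN.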
Transitive dependency:
A transitive dependency requires three or more attributes that have a
functional dependency between them.
That is, A → C is a transitive dependency when it is true only because both
A → B and B → C are true.
For example, consider the AUTHORS table:
• Book → Author: Here, the Book attribute determines the Author attribute.
• Author → Author_Nationality: Likewise, the Author attribute
determines the Author_Nationality.
• Book → Author_Nationality: If we know the book name, we can
determine the nationality via the Author column.
Multivalued Dependencies:
Multivalued dependencies occur when the presence of one or more rows in
a table implies the presence of one or more other rows in that same table.
A multivalued dependency is written X ->-> Y.

For example, consider a car company that manufactures many models of car, but
always makes both red and blue colors of each model.
In a table that contains the model name, color, and year of each car, there is
a multivalued dependency.
If there is a row for a certain model name and year in blue, there must
also be a similar row corresponding to the red version of that same car.
For example, in the Students table below, the Student_Name determines
the Major and Student_Name determines Sport.
Student_Name  Major        Sport
Ravi          Art History  Soccer
Ravi          Art History  Volleyball
Ravi          Art History  Tennis
Beth          Chemistry    Tennis
Beth          Chemistry    Soccer
The problem here is that both Ravi and Beth play several sports.
It is necessary to add a new row for every additional sport.
This table has introduced a multivalued dependency because
Student_Name ->-> Major
Student_Name ->-> Sport
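A multivalued dependency X ->-> Y in a three-column table can be checked on sample rows: for every value of X, each Y value must appear together with each value of the remaining column. The sketch below assumes that reading. The function name mvd_holds and the extra "Music" row are hypothetical and only there to show a violation.

```python
from itertools import product


def mvd_holds(rows, x, y, z):
    """Check X ->-> Y in a table whose columns are x, y and z."""
    groups = {}
    for row in rows:
        groups.setdefault(row[x], set()).add((row[y], row[z]))
    for pairs in groups.values():
        ys = {p[0] for p in pairs}
        zs = {p[1] for p in pairs}
        # Every Y value must be paired with every Z value for this X.
        if pairs != set(product(ys, zs)):
            return False
    return True


students = [
    {"name": "Ravi", "major": "Art History", "sport": "Soccer"},
    {"name": "Ravi", "major": "Art History", "sport": "Volleyball"},
    {"name": "Ravi", "major": "Art History", "sport": "Tennis"},
    {"name": "Beth", "major": "Chemistry", "sport": "Tennis"},
    {"name": "Beth", "major": "Chemistry", "sport": "Soccer"},
]

# Hypothetical extra row: a second major recorded for only one of Ravi's sports.
broken = students + [{"name": "Ravi", "major": "Music", "sport": "Soccer"}]

print(mvd_holds(students, "name", "sport", "major"))
print(mvd_holds(broken, "name", "sport", "major"))
```

The Students table passes the check. The table with the extra row fails it, because the new major would need one row for each of Ravi's three sports.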
What is Join Dependency?
If a table can be recreated by joining multiple tables, each of which
has a subset of the attributes of the table, then the table has a Join
Dependency.
It is a generalization of Multivalued Dependency.
Example:
<Employee>
EmpName  EmpSkills        EmpJob (Assigned Work)
Tom      Networking       EJ001
Harry    Web Development  EJ002
Katie    Programming      EJ002
The above table can be decomposed into the following three tables;
therefore it is not in 5NF:
<EmployeeSkills>
EmpName  EmpSkills
Tom      Networking
Harry    Web Development
Katie    Programming
<EmployeeJob>
EmpName  EmpJob
Tom      EJ001
Harry    EJ002
Katie    EJ002
<JobSkills>
EmpSkills        EmpJob
Networking       EJ001
Web Development  EJ002
Programming      EJ002
Our Join Dependency:
{(EmpName, EmpSkills ), ( EmpName, EmpJob), (EmpSkills, EmpJob)}
The <Employee> relation has this join dependency, so it is not in 5NF.
That means the natural join of the above three relations is equal
to our original relation <Employee>.
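The claim that the three smaller relations join back to the original can be checked directly. This is a small sketch using Python sets of tuples; the variable names are chosen for illustration.

```python
employee = {
    ("Tom", "Networking", "EJ001"),
    ("Harry", "Web Development", "EJ002"),
    ("Katie", "Programming", "EJ002"),
}

# Project the original relation onto each pair of attributes.
employee_skills = {(name, skill) for name, skill, job in employee}
employee_job = {(name, job) for name, skill, job in employee}
job_skills = {(skill, job) for name, skill, job in employee}

# Natural join of the three projections.
rejoined = {
    (name, skill, job)
    for name, skill in employee_skills
    for name2, job in employee_job
    if name == name2 and (skill, job) in job_skills
}

print(rejoined == employee)
```

The comparison holds for this data, so no rows are lost and no extra rows appear when the three tables are joined again.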
Normalization in DBMS:
Normalization is a database design technique that begins by examining
the relationships (called functional dependencies) between attributes.
Normalization uses a series of tests (described as normal forms) to help
identify the optimal grouping of attributes.
Definition:
Normalization is the process of organizing the data in the database.
Or
Normalization is the step-by-step decomposition of large tables into
smaller ones to eliminate data redundancy.
The Purpose of Normalization

The purpose of normalization is to identify:
• the minimal number of attributes necessary.
• attributes with a close logical relationship (functional dependency).
• minimal redundancy.
Types of Normalization:
The database normalization process is divided into the following normal forms:
• First Normal Form (1NF)
• Second Normal Form (2NF)
• Third Normal Form (3NF)
• Boyce-Codd Normal Form (BCNF)
• Fourth Normal Form (4NF)
• Fifth Normal Form (5NF)
First Normal Form (1NF):
A table is said to be in 1NF:
If each column holds a single (atomic) value and there are no repeating groups.
Example:
Sample Employee table; it shows employees working in multiple
departments.
Employee  Age  Department
Melvin    32   Marketing, Sales
Edward    45   Quality Assurance
Alex      36   Human Resource
In the above table the Department column holds multiple values.
Employee table after conversion to 1NF:
Employee  Age  Department
Melvin    32   Marketing
Melvin    32   Sales
Edward    45   Quality Assurance
Alex      36   Human Resource
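The conversion to 1NF can be sketched in a few lines of Python. The function name to_1nf is hypothetical, and the sketch assumes the multiple departments are stored as one comma-separated string.

```python
unnormalized = [
    ("Melvin", 32, "Marketing, Sales"),
    ("Edward", 45, "Quality Assurance"),
    ("Alex", 36, "Human Resource"),
]


def to_1nf(rows):
    """Emit one row per (employee, department) pair."""
    result = []
    for employee, age, departments in rows:
        for department in departments.split(","):
            result.append((employee, age, department.strip()))
    return result


for row in to_1nf(unnormalized):
    print(row)
```

Melvin now appears in two rows, one per department, and every Department value is atomic.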
Second Normal Form (2NF):

A table is said to be in 2NF:
1. It is in 1NF.
2. All attributes depend solely on the unique
identifier/PK (no partial dependency, no other FDs).
Example: Sample Products table:
productID  product     Brand
1          Monitor     Apple
2          Monitor     Samsung
3          Scanner     HP
4          Head phone  JBL
In the above table, productID is the PK and Brand is dependent on product.

The Products table converted to 2NF:
Products Category table:
productID  product
1          Monitor
2          Scanner
3          Head phone
Brand table:
brandID  brand
1        Apple
2        Samsung
3        HP
4        JBL
Products Brand table:
pbID  productID  brandID
1     1          1
2     1          2
3     2          3
4     3          4
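The three tables can be joined to get back the product and brand pairs of the original table. The sketch below uses Python's built-in sqlite3 module; the SQL table names product, brand and product_brand are chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product (productID INTEGER PRIMARY KEY, product TEXT NOT NULL);
CREATE TABLE brand (brandID INTEGER PRIMARY KEY, brand TEXT NOT NULL);
CREATE TABLE product_brand (
    pbID INTEGER PRIMARY KEY,
    productID INTEGER NOT NULL REFERENCES product(productID),
    brandID INTEGER NOT NULL REFERENCES brand(brandID)
);
INSERT INTO product VALUES (1, 'Monitor'), (2, 'Scanner'), (3, 'Head phone');
INSERT INTO brand VALUES (1, 'Apple'), (2, 'Samsung'), (3, 'HP'), (4, 'JBL');
INSERT INTO product_brand VALUES (1, 1, 1), (2, 1, 2), (3, 2, 3), (4, 3, 4);
""")

# Join the link table to both lookup tables to rebuild the original pairs.
recombined = conn.execute("""
    SELECT p.product, b.brand
    FROM product_brand AS pb
    JOIN product AS p ON p.productID = pb.productID
    JOIN brand AS b ON b.brandID = pb.brandID
    ORDER BY pb.pbID
""").fetchall()
print(recombined)
```

Each product name and each brand name is now stored once, and the Products Brand table only records which product goes with which brand.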
Third Normal Form (3NF):

A table is said to be in 3NF:
1. It is in 2NF.
2. There is no transitive dependency.
Example:
We have 3 tables: Student, Subject and Score.
Student Table
student_id  name  reg_no  branch  address
10          Akon  07-WY   CSE     Kerala
11          Akon  08-WY   IT      Gujarat
12          Bkon  09-WY   IT      Rajasthan
Subject Table
subject_id  subject_name  teacher
1           Java          Java Teacher
2           C++           C++ Teacher
3           Php           Php Teacher
Score Table
score_id  student_id  subject_id  marks
1         10          1           70
2         10          2           75
3         11          1           80
In the Score table, we need to store some more information, which is the
exam name and total marks, so let's add 2 more columns to the Score
table.
score_id student_id subject_id marks exam_name total_marks
What is Transitive Dependency?

With exam_name and total_marks added to our Score table, it stores more
data now. The primary key for our Score table is a composite key, which
means it is made up of two attributes or columns
→ student_id + subject_id.
Our new column exam_name depends on both student and subject:
for some subjects you have Practical exams and for some you don't. So
we can say that exam_name is dependent on
both student_id and subject_id.
The other new column, total_marks, depends on exam_name, because the
total changes with the type of exam. But exam_name is not part of the
primary key; it is just another column in the Score table.
This is a transitive dependency: a non-prime attribute depends on
other non-prime attributes rather than on the prime
attributes or primary key.
How to remove Transitive Dependency?

The solution is simple. Take the
columns exam_name and total_marks out of the Score table, put them in
an Exam table, and use the exam_id wherever required.
Score Table: In 3rd Normal Form
score_id  student_id  subject_id  marks  exam_id
The new Exam table
exam_id  exam_name   total_marks
1        Workshop    200
2        Mains       70
3        Practicals  30
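The benefit of moving exam_name and total_marks into their own table can be sketched with Python's built-in sqlite3 module. The exam_id values assigned to the score rows and the new total of 150 are assumptions made for illustration; the source does not give them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE exam (
    exam_id INTEGER PRIMARY KEY,
    exam_name TEXT NOT NULL,
    total_marks INTEGER NOT NULL
);
CREATE TABLE score (
    score_id INTEGER PRIMARY KEY,
    student_id INTEGER NOT NULL,
    subject_id INTEGER NOT NULL,
    marks INTEGER NOT NULL,
    exam_id INTEGER NOT NULL REFERENCES exam(exam_id)
);
INSERT INTO exam VALUES (1, 'Workshop', 200), (2, 'Mains', 70), (3, 'Practicals', 30);
-- The exam_id values below are assumed for illustration only.
INSERT INTO score VALUES (1, 10, 1, 70, 2), (2, 10, 2, 75, 1), (3, 11, 1, 80, 1);
""")

# total_marks is stored once per exam, so one UPDATE covers every score row.
conn.execute("UPDATE exam SET total_marks = 150 WHERE exam_name = 'Workshop'")

report = conn.execute("""
    SELECT s.score_id, s.marks, e.exam_name, e.total_marks
    FROM score AS s
    JOIN exam AS e ON e.exam_id = s.exam_id
    ORDER BY s.score_id
""").fetchall()
print(report)
```

Because total_marks lives only in the Exam table, changing it in one row is enough; both Workshop scores show the new total, and no update anomaly is possible.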
Boyce-Codd Normal Form (BCNF)

A table is said to be in BCNF:
1. It is in 3NF.
2. Every determinant is a candidate key.
Example:
exam_id Htno Sub_id
