
Data Anomalies

Anomalies are problems that can occur in poorly planned, un-normalized databases.
Or
An anomaly is an inconsistency between one part of the data and another part. 

Insertion Anomaly - The nature of a database may be such that it is not possible to
add a required piece of data unless another piece of unavailable data is also added.
E.g. A library database that cannot store the details of a new member until that
member has taken out a book.
Deletion Anomaly - A record can legitimately be deleted from a database,
and the deletion can remove the only instance of other, required data.
E.g. Deleting a library member's book loan can remove all details of that
particular book from the database, such as the author and book title.
Modification/Update Anomaly - Incorrect data may have to be changed, which
could involve many records having to be changed, leading to the possibility of some
changes being made incorrectly.

Example:
Suppose a manufacturing company stores the employee details in a table
named employee that has four attributes.

emp_id   emp_name   emp_address   emp_dept
101      Rick       Delhi         D001
101      Rick       Delhi         D002
123      Maggie     Agra          D890
166      Glenn      Chennai       D900
166      Glenn      Chennai       D004


The above table is not normalized. We will see the problems that we face
when a table is not normalized.

Update anomaly: In the above table we have two rows for employee Rick,
as he belongs to two departments of the company. If we want to update
Rick's address, we have to update it in both rows or the data will
become inconsistent. If the correct address is updated in one row but
not in the other, then as per the database Rick would have two different
addresses, which is incorrect.

Insert anomaly: Suppose a new employee joins the company who is under
training and not yet assigned to any department. We would not be able
to insert the employee's data into the table if the emp_dept field
doesn't allow nulls.

Delete anomaly: Suppose that at some point the company closes
department D890. Deleting the rows that have emp_dept as D890 would
also delete the information of employee Maggie, since she is assigned
only to this department.
To overcome these anomalies, we need to normalize the data. The
following sections discuss normalization.
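
The update and delete anomalies described above can be reproduced on the example rows. This is only a minimal sketch: the list-of-dicts representation, the helper name addresses_of, and the replacement address "Mumbai" are assumptions made for illustration, not part of the original example.

```python
# Rows of the un-normalized employee table from the example above.
employee = [
    {"emp_id": 101, "emp_name": "Rick", "emp_address": "Delhi", "emp_dept": "D001"},
    {"emp_id": 101, "emp_name": "Rick", "emp_address": "Delhi", "emp_dept": "D002"},
    {"emp_id": 123, "emp_name": "Maggie", "emp_address": "Agra", "emp_dept": "D890"},
    {"emp_id": 166, "emp_name": "Glenn", "emp_address": "Chennai", "emp_dept": "D900"},
    {"emp_id": 166, "emp_name": "Glenn", "emp_address": "Chennai", "emp_dept": "D004"},
]


def addresses_of(rows, emp_id):
    """Return the set of distinct addresses stored for one employee."""
    return {r["emp_address"] for r in rows if r["emp_id"] == emp_id}


# Update anomaly: change Rick's address in only one of his two rows.
# "Mumbai" is a made-up value used only for illustration.
partially_updated = [dict(r) for r in employee]
partially_updated[0]["emp_address"] = "Mumbai"
rick_addresses = addresses_of(partially_updated, 101)

# Delete anomaly: closing department D890 removes every trace of Maggie.
after_closing_d890 = [r for r in employee if r["emp_dept"] != "D890"]
maggie_still_known = any(r["emp_id"] == 123 for r in after_closing_d890)

print(sorted(rick_addresses))  # ['Delhi', 'Mumbai']
print(maggie_still_known)      # False
```

Rick ends up with two conflicting addresses, and Maggie disappears entirely once her only department row is deleted.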
Functional dependencies:
A dependency is a constraint that describes a relationship between
attributes.
Functional dependency (FD):
A functional dependency is a constraint between two attributes (or sets
of attributes).
If there is a dependency in a database such that attribute B is dependent
upon attribute A, it is represented as:

A → B
Here the value of attribute B is determined by attribute A alone.
For example, in an employee table consider 2 attributes: Social Security
number (SSN) and name.
It can be said that name is dependent upon SSN (SSN → name) because
an employee's name can be uniquely determined from an SSN.
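
As a rough illustration of what SSN → name means for the stored rows, the sketch below checks whether every value of one column maps to a single value of another. The function name fd_holds and the sample SSNs and names are hypothetical, invented only for this example.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if every value of `lhs` maps to exactly one value of `rhs`."""
    seen = {}
    for row in rows:
        key = row[lhs]
        if key in seen and seen[key] != row[rhs]:
            return False
        seen[key] = row[rhs]
    return True


# Hypothetical sample rows; the SSN values and names are invented.
employees = [
    {"ssn": "111-11-1111", "name": "Rick"},
    {"ssn": "222-22-2222", "name": "Maggie"},
    {"ssn": "333-33-3333", "name": "Rick"},
]

print(fd_holds(employees, "ssn", "name"))  # True
print(fd_holds(employees, "name", "ssn"))  # False
```

Note that a functional dependency is a rule about all valid data, so a check like this on sample rows can only disprove a dependency, never prove it.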

Transitive dependency:
A transitive dependency requires three or more attributes that have a
functional dependency between them.
That is, A → C is a transitive dependency when it is true only because both
A → B and B → C are true.
For example, consider the AUTHORS table:
• Book → Author: Here, the Book attribute determines the Author attribute.
• Author → Author_Nationality: Likewise, the Author attribute
determines the Author_Nationality.
• Book → Author_Nationality: If we know the book name, we can
determine the nationality via the Author column.
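
The way Book → Author_Nationality follows from the other two dependencies can be sketched with an attribute-closure computation. This is an illustrative sketch, not part of the original notes; the function name closure and the frozenset encoding of dependencies are assumptions.

```python
def closure(attributes, fds):
    """Return every attribute determined by `attributes` under `fds`.

    `fds` is a list of (left_side, right_side) pairs of frozensets.
    """
    result = set(attributes)
    changed = True
    while changed:
        changed = False
        for left, right in fds:
            # If we already know the left side, we also learn the right side.
            if left <= result and not right <= result:
                result |= right
                changed = True
    return result


# Only the two direct dependencies from the AUTHORS example are listed.
fds = [
    (frozenset({"Book"}), frozenset({"Author"})),
    (frozenset({"Author"}), frozenset({"Author_Nationality"})),
]

book_closure = closure({"Book"}, fds)
print(sorted(book_closure))  # ['Author', 'Author_Nationality', 'Book']
```

Author_Nationality appears in the closure of Book even though no direct Book → Author_Nationality rule was listed, which is exactly the transitive dependency.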
Multivalued Dependencies:
Multivalued dependencies occur when the presence of one or more rows in
a table implies the presence of one or more other rows in that same table.

A multivalued dependency is written X ->-> Y.

For example, consider a car company that manufactures many models of car
but always makes both red and blue colors of each model.
In a table that contains the model name, color, and year of each car, there
is a multivalued dependency.
If there is a row for a certain model name and year in blue, there must
also be a similar row corresponding to the red version of that same car.

For example, in the Students table below, the Student_Name determines
the Major and Student_Name determines Sport.

Student_Name   Major         Sport
Ravi           Art History   Soccer
Ravi           Art History   Volleyball
Ravi           Art History   Tennis
Beth           Chemistry     Tennis
Beth           Chemistry     Soccer

The problem here is that both Ravi and Beth play several sports.
It is necessary to add a new row for every additional sport.
This table has introduced a multivalued dependency because
Student_Name ->-> Major
Student_Name ->-> Sport
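
One way to test a multivalued dependency X ->-> Y on actual rows is to check that, for each X value, every Y value is paired with every combination of the remaining attributes. The sketch below does this for the Students table; the function name mvd_holds and the extra "Music" row are hypothetical and serve only to show a failing case.

```python
from itertools import product


def mvd_holds(rows, x, y):
    """Check X ->-> Y on dict rows; Z is every attribute other than X and Y."""
    z = [a for a in rows[0] if a != x and a != y]
    groups = {}
    for row in rows:
        groups.setdefault(row[x], []).append(row)
    for group in groups.values():
        y_values = {r[y] for r in group}
        z_values = {tuple(r[a] for a in z) for r in group}
        present = {(r[y], tuple(r[a] for a in z)) for r in group}
        # The MVD requires every Y value to be paired with every Z value.
        if present != set(product(y_values, z_values)):
            return False
    return True


students = [
    {"Student_Name": "Ravi", "Major": "Art History", "Sport": "Soccer"},
    {"Student_Name": "Ravi", "Major": "Art History", "Sport": "Volleyball"},
    {"Student_Name": "Ravi", "Major": "Art History", "Sport": "Tennis"},
    {"Student_Name": "Beth", "Major": "Chemistry", "Sport": "Tennis"},
    {"Student_Name": "Beth", "Major": "Chemistry", "Sport": "Soccer"},
]

print(mvd_holds(students, "Student_Name", "Sport"))  # True

# Hypothetical extra row: a second major for Ravi recorded with one sport only.
broken = students + [
    {"Student_Name": "Ravi", "Major": "Music", "Sport": "Soccer"},
]
print(mvd_holds(broken, "Student_Name", "Sport"))  # False
```

The second check fails because the new major would have to be repeated for each of Ravi's three sports, which is the row multiplication the text describes.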
What is Join Dependency?
If a table can be recreated by joining multiple tables, each of which
has a subset of the attributes of the table, then the table has a join
dependency.
It is a generalization of multivalued dependency.
Example:

<Employee>
EmpName EmpSkills EmpJob (Assigned Work)
Tom Networking EJ001
Harry Web Development EJ002
Katie Programming EJ002

The above table can be decomposed into the following three tables;
therefore it is not in 5NF:

<EmployeeSkills>
EmpName EmpSkills
Tom Networking
Harry Web Development
Katie Programming
<EmployeeJob>
EmpName EmpJob
Tom EJ001
Harry EJ002
Katie EJ002

<JobSkills>
EmpSkills EmpJob
Networking EJ001
Web Development EJ002
Programming EJ002

Our Join Dependency:

{(EmpName, EmpSkills ), ( EmpName, EmpJob), (EmpSkills, EmpJob)}

The <Employee> relation has this join dependency, so it is not in 5NF.
This means that joining the three relations above gives back our
original relation <Employee>.
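
The claim that the three projections join back to <Employee> can be checked directly. The following is a small sketch under stated assumptions: rows are modelled as frozensets of (attribute, value) pairs, and the helper names row, project and natural_join are made up for this illustration.

```python
def row(**values):
    """Build one immutable row from attribute=value pairs."""
    return frozenset(values.items())


def project(rows, attrs):
    """Keep only the attributes in `attrs`; duplicates collapse in the set."""
    return {frozenset((a, v) for a, v in r if a in attrs) for r in rows}


def natural_join(left, right):
    """Combine rows that agree on every attribute they share."""
    joined = set()
    for l in left:
        for r in right:
            ld, rd = dict(l), dict(r)
            if all(ld[a] == rd[a] for a in ld.keys() & rd.keys()):
                joined.add(frozenset({**ld, **rd}.items()))
    return joined


employee = {
    row(EmpName="Tom", EmpSkills="Networking", EmpJob="EJ001"),
    row(EmpName="Harry", EmpSkills="Web Development", EmpJob="EJ002"),
    row(EmpName="Katie", EmpSkills="Programming", EmpJob="EJ002"),
}

employee_skills = project(employee, {"EmpName", "EmpSkills"})
employee_job = project(employee, {"EmpName", "EmpJob"})
job_skills = project(employee, {"EmpSkills", "EmpJob"})

rejoined = natural_join(natural_join(employee_skills, employee_job), job_skills)
print(rejoined == employee)  # True
```

For this data the join of the three projections reproduces exactly the original three rows, with no extra (spurious) rows.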
Normalization in DBMS:
Normalization is a database design technique that begins by examining
the relationships (called functional dependencies) between attributes.
Normalization uses a series of tests (described as normal forms) to help
identify the optimal grouping of attributes.
Definition:
Normalization is the process of organizing the data in the database.
Or
Normalization is the step-by-step decomposition of large tables into
smaller ones in order to eliminate data redundancy.

The Purpose of Normalization


The purpose of normalization is to identify:
• the minimal number of attributes necessary.
• attributes with a close logical relationship (functional dependency).
• minimal redundancy.

Types of Normalization:
The database normalization process is divided into the following normal forms:
• First Normal Form (1NF)
• Second Normal Form (2NF)
• Third Normal Form (3NF)
• Boyce-Codd Normal Form (BCNF)
• Fourth Normal Form (4NF)
• Fifth Normal Form (5NF)
First Normal Form (1NF):
A table is said to be in 1NF:
If each column holds a single (atomic) value and there are no repeating
groups.
Example:
A sample Employee table in which some employees work in multiple
departments.

Employee   Age   Department
Melvin     32    Marketing, Sales
Edward     45    Quality Assurance
Alex       36    Human Resource

In the above table, the Department column holds multiple values.

Employee table after conversion to 1NF:

Employee   Age   Department
Melvin     32    Marketing
Melvin     32    Sales
Edward     45    Quality Assurance
Alex       36    Human Resource
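
The conversion to 1NF shown above amounts to emitting one row per employee and department pair. A minimal sketch, assuming the multi-valued cell is stored as a comma-separated string (the function name to_1nf is hypothetical):

```python
# The un-normalized Employee rows: (Employee, Age, Department).
employees = [
    ("Melvin", 32, "Marketing, Sales"),
    ("Edward", 45, "Quality Assurance"),
    ("Alex", 36, "Human Resource"),
]


def to_1nf(rows):
    """Emit one row per (employee, department) pair so every cell is atomic."""
    result = []
    for name, age, departments in rows:
        for dept in departments.split(","):
            result.append((name, age, dept.strip()))
    return result


for r in to_1nf(employees):
    print(r)
```

Melvin's single row becomes two rows, one for Marketing and one for Sales, while the other employees are unchanged.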

Second Normal Form (2NF):


A table is said to be in 2NF:
1. It is in 1NF.
2. All non-key attributes depend on the whole unique identifier/PK
(no partial dependency).
Example: Sample Products table:

productID   product      Brand
1           Monitor      Apple
2           Monitor      Samsung
3           Scanner      HP
4           Head phone   JBL

In the above table, productID is the PK and Brand is dependent on Product.

The Products table converted to 2NF:

Products Category table:

productID   product
1           Monitor
2           Scanner
3           Head phone

Brand table:

brandID   brand
1         Apple
2         Samsung
3         HP
4         JBL

Products Brand table:

pbID   productID   brandID
1      1           1
2      1           2
3      2           3
4      3           4
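
To see that nothing is lost by this decomposition, the three tables can be joined back together. The sketch below uses plain dictionaries as lookup tables, which is an assumption made for brevity; the variable names are hypothetical.

```python
# The three tables after decomposition, keyed by their ID columns.
products = {1: "Monitor", 2: "Scanner", 3: "Head phone"}
brands = {1: "Apple", 2: "Samsung", 3: "HP", 4: "JBL"}
product_brand = [(1, 1, 1), (2, 1, 2), (3, 2, 3), (4, 3, 4)]  # (pbID, productID, brandID)

# Join: replace each ID with the value it refers to.
rejoined = [(pb_id, products[p], brands[b]) for pb_id, p, b in product_brand]

for r in rejoined:
    print(r)
```

The result lists the same four product and brand pairs as the original Products table, while each product name and brand name is now stored only once.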

Third Normal Form (3NF):


A table is said to be in 3NF:
1. It is in 2NF.
2. There is no transitive dependency.
Example:
We have 3 tables: Student, Subject and Score.
Student Table

student_id   name   reg_no   branch   address
10           Akon   07-WY    CSE      Kerala
11           Akon   08-WY    IT       Gujarat
12           Bkon   09-WY    IT       Rajasthan

Subject Table

subject_id   subject_name   teacher
1            Java           Java Teacher
2            C++            C++ Teacher
3            Php            Php Teacher

Score Table

score_id   student_id   subject_id   marks
1          10           1            70
2          10           2            75
3          11           1            80

In the Score table, we need to store some more information, which is the
exam name and total marks, so let's add 2 more columns to the Score
table.

score_id student_id subject_id marks exam_name total_marks

What is Transitive Dependency?

With exam_name and total_marks added, our Score table stores more data.
The primary key of the Score table is a composite key made up of two
columns → student_id + subject_id.
The new column exam_name depends on both student and subject: some
subjects have Practical exams and some don't. So we can say that
exam_name is dependent on both student_id and subject_id.
The column total_marks, however, depends on exam_name (each exam has
its own total), and exam_name is not part of the primary key.
This is a transitive dependency: a non-prime attribute depends on
another non-prime attribute rather than on the prime attributes or
primary key.

How to remove Transitive Dependency?


Again the solution is very simple. Take out the
columns exam_name and total_marks from Score table and put them in
an Exam table and use the exam_id wherever required.
Score Table: In 3rd Normal Form

score_id student_id subject_id marks exam_id

The new Exam table

exam_id   exam_name    total_marks
1         Workshop     200
2         Mains        70
3         Practicals   30
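
The move from the wide Score table to the Score plus Exam pair can be sketched as follows. The first four columns of each score row come from the Score table above, but which exam each score belongs to is an assumption invented for this illustration, as are the variable names.

```python
# The Exam table from the text: (exam_id, exam_name, total_marks).
exam = [(1, "Workshop", 200), (2, "Mains", 70), (3, "Practicals", 30)]

# Wide Score rows: (score_id, student_id, subject_id, marks, exam_name, total_marks).
# The exam assigned to each score is hypothetical.
wide_scores = [
    (1, 10, 1, 70, "Mains", 70),
    (2, 10, 2, 75, "Workshop", 200),
    (3, 11, 1, 80, "Workshop", 200),
]

# Replace the (exam_name, total_marks) pair with a reference to the Exam table.
exam_id_by_name = {name: exam_id for exam_id, name, _ in exam}
scores = [
    (score_id, student_id, subject_id, marks, exam_id_by_name[exam_name])
    for score_id, student_id, subject_id, marks, exam_name, _ in wide_scores
]

# total_marks is now stored once per exam and looked up when needed.
total_by_exam_id = {exam_id: total for exam_id, _, total in exam}

print(scores)  # [(1, 10, 1, 70, 2), (2, 10, 2, 75, 1), (3, 11, 1, 80, 1)]
print(total_by_exam_id[scores[0][4]])  # 70
```

Changing the total marks of an exam now means editing a single Exam row instead of every Score row that mentions it.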

Boyce-Codd Normal Form (BCNF)


A table is said to be in BCNF:
1. It is in 3NF.
2. Every determinant is a candidate key.
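
The second condition can be checked mechanically when the functional dependencies are known. The sketch below reuses the AUTHORS dependencies from the transitive dependency section as test data; the function names closure and bcnf_violations are hypothetical, and the check tests whether each determinant is a superkey (that is, contains a candidate key), which is the usual way the condition is stated.

```python
def closure(attributes, fds):
    """Return every attribute determined by `attributes` under `fds`."""
    result = set(attributes)
    changed = True
    while changed:
        changed = False
        for left, right in fds:
            if left <= result and not right <= result:
                result |= right
                changed = True
    return result


def bcnf_violations(all_attributes, fds):
    """Return the non-trivial FDs whose left side does not determine every attribute."""
    return [
        (left, right)
        for left, right in fds
        if not right <= left and closure(left, fds) != set(all_attributes)
    ]


attributes = {"Book", "Author", "Author_Nationality"}
fds = [
    (frozenset({"Book"}), frozenset({"Author"})),
    (frozenset({"Author"}), frozenset({"Author_Nationality"})),
]

violations = bcnf_violations(attributes, fds)
for left, right in violations:
    print(sorted(left), "->", sorted(right))  # ['Author'] -> ['Author_Nationality']
```

Author is a determinant but does not determine Book, so it is not a key of the AUTHORS table and the table is not in BCNF.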

Example:

exam_id Htno Sub_id
