
NORMALIZATION

LECTURE 5
Normalization
Normalization is the process of efficiently organizing data in a database. There are
two goals of the normalization process:
■ Eliminating redundant data (for example, storing the same data in more than one
table)
■ Ensuring data dependencies make sense (only storing related data in a table).
Both of these are worthy goals as they reduce the amount of space a database
consumes and ensure that data is logically stored.
Problem without Normalization
■ Without normalization, it becomes difficult to handle and update the database
without risking data loss.
■ Insertion, update, and deletion anomalies are frequent if the database is not
normalized.
Data redundancy
Database Anomalies
Database anomalies are the problems in relations that occur due to redundancy
in the relations. These anomalies affect the process of inserting, deleting and
modifying data in the relations.
Important data may be lost when a relation that contains anomalies is
updated. These anomalies must be removed so that the relations can be
processed without problems.
Anomalies
Example: consider the table below

■ Update anomaly: To change the address of branch B003 from 163 Main St,
Glasgow to 164 Main St, Glasgow, every row for that branch has to be changed;
otherwise the data becomes inconsistent.

■ Insertion anomaly: To insert details of a new branch that currently has no
members of staff into the StaffBranch relation, it is necessary to enter nulls
into the attributes for staff, such as staffNo. However, as staffNo is the
primary key for the StaffBranch relation, attempting to enter nulls for staffNo
violates entity integrity.

■ Deletion anomaly: If we delete the tuple for staff number SA9 (Mary Howe)
from the StaffBranch relation, the details relating to branch number B007
are also lost from the database, because no other tuple records that branch.
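The three anomalies can be reproduced in a few lines. The following is only a minimal sketch using Python's built-in `sqlite3` module. Because the original table is not reproduced here, the rows are assumptions: only B003, its address, SA9, Mary Howe and B007 come from the text, while staff S1 and S2, branch B010 and the placeholder addresses are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# NOT NULL is spelled out because SQLite otherwise tolerates NULL in a
# non-integer primary key; other systems enforce entity integrity by default.
conn.execute(
    "CREATE TABLE StaffBranch ("
    " staffNo TEXT PRIMARY KEY NOT NULL,"
    " sName TEXT, branchNo TEXT, bAddress TEXT)"
)
conn.executemany(
    "INSERT INTO StaffBranch VALUES (?, ?, ?, ?)",
    [
        ("S1", "Staff One", "B003", "163 Main St, Glasgow"),   # hypothetical staff
        ("S2", "Staff Two", "B003", "163 Main St, Glasgow"),   # hypothetical staff
        ("SA9", "Mary Howe", "B007", "placeholder address"),   # address is assumed
    ],
)

# Update anomaly: changing the address in only one B003 row leaves the
# relation holding two different addresses for the same branch.
conn.execute(
    "UPDATE StaffBranch SET bAddress = '164 Main St, Glasgow' WHERE staffNo = 'S1'"
)
b003_addresses = {
    row[0]
    for row in conn.execute(
        "SELECT bAddress FROM StaffBranch WHERE branchNo = 'B003'"
    )
}

# Insertion anomaly: a branch without staff would need a NULL staffNo,
# which the primary key rejects.
try:
    conn.execute(
        "INSERT INTO StaffBranch VALUES (NULL, NULL, 'B010', 'placeholder address')"
    )
    insert_rejected = False
except sqlite3.IntegrityError:
    insert_rejected = True

# Deletion anomaly: removing the only staff row for B007 removes the branch too.
conn.execute("DELETE FROM StaffBranch WHERE staffNo = 'SA9'")
b007_rows = conn.execute(
    "SELECT COUNT(*) FROM StaffBranch WHERE branchNo = 'B007'"
).fetchone()[0]

print(b003_addresses, insert_rejected, b007_rows)
```

After the run, `b003_addresses` holds two conflicting addresses, the insert of the staff-less branch has been rejected, and no row mentioning B007 remains.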
Normalization rules are divided into the
following normal forms:
■ First Normal Form
■ Second Normal Form
■ Third Normal Form
First Normal Form (1NF)
First normal form (1NF) sets the very basic rules for an organized database:
■ Eliminate duplicative columns from the same table.
■ A column should hold values of the same type
■ Each column should have a unique name
Second Normal Form (2NF)
Second normal form (2NF) further addresses the concept of removing
duplicative data:
■ Meet all the requirements of the first normal form.
■ Remove subsets of data that apply to multiple rows of a table and place them
in separate tables.
■ Create relationships between these new tables and their predecessors
through the use of foreign keys.
Third Normal Form (3NF)
Third normal form (3NF) goes one large step further:
■ Meet all the requirements of the second normal form.
■ Remove columns that are not dependent upon the primary key.
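As an illustration of these rules, the sketch below splits the StaffBranch relation into two tables linked by a foreign key, again using Python's `sqlite3` module. The table names `Staff` and `Branch`, the column names `sName`, `branchNo` and `bAddress`, the staff S1 and S2, and the placeholder address are assumptions made for the example, not a schema prescribed by the lecture.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when asked
conn.executescript(
    """
    CREATE TABLE Branch (
        branchNo TEXT PRIMARY KEY NOT NULL,
        bAddress TEXT NOT NULL
    );
    CREATE TABLE Staff (
        staffNo  TEXT PRIMARY KEY NOT NULL,
        sName    TEXT,
        branchNo TEXT NOT NULL REFERENCES Branch(branchNo)
    );
    """
)
conn.executemany(
    "INSERT INTO Branch VALUES (?, ?)",
    [
        ("B003", "163 Main St, Glasgow"),
        ("B007", "placeholder address"),  # assumed value
    ],
)
conn.executemany(
    "INSERT INTO Staff VALUES (?, ?, ?)",
    [
        ("S1", "Staff One", "B003"),   # hypothetical staff
        ("S2", "Staff Two", "B003"),   # hypothetical staff
        ("SA9", "Mary Howe", "B007"),
    ],
)

# The branch address is stored once, so the update touches exactly one row.
rows_updated = conn.execute(
    "UPDATE Branch SET bAddress = '164 Main St, Glasgow' WHERE branchNo = 'B003'"
).rowcount

# Deleting a staff member no longer deletes the branch.
conn.execute("DELETE FROM Staff WHERE staffNo = 'SA9'")
b007_rows = conn.execute(
    "SELECT COUNT(*) FROM Branch WHERE branchNo = 'B007'"
).fetchone()[0]

# The original view of the data is recovered with a join.
staff_with_address = conn.execute(
    "SELECT s.staffNo, b.bAddress FROM Staff s "
    "JOIN Branch b ON b.branchNo = s.branchNo ORDER BY s.staffNo"
).fetchall()
print(rows_updated, b007_rows, staff_with_address)
```

With this design, a branch that has no staff yet can also be inserted into `Branch` on its own, without any nulls in a primary key.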
Dependencies
Functional dependencies
■ An important concept associated with normalization is functional
dependency, which describes the relationship between attributes
■ The attribute B is functionally dependent on the attribute A (written A → B)
if each value of A determines one and only one value of B. When A is a
composite of several attributes, B is fully functionally dependent on A if it
depends on the whole of A and not on any proper subset of A
■ Consider a relation with attributes A and B, where attribute B is functionally
dependent on attribute A. If we know the value of A and we examine the
relation that holds this dependency, we find only one value of B in all the
tuples that have a given value of A, at any moment in time
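The definition above can be checked mechanically against a set of tuples. The following is a minimal sketch in Python; the function name `holds` and the sample rows are assumptions made for illustration. Note that such a check can only show that a dependency is violated by the data at hand. Whether a dependency holds in general is a property of the meaning of the attributes, not of one snapshot.

```python
def holds(rows, lhs, rhs):
    """Return True if every value of the lhs attributes is paired with
    exactly one value of the rhs attributes in the given rows."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for this key and returns
        # the stored one on later visits, so any mismatch is a violation.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical sample relation with attributes A and B.
sample = [
    {"A": 1, "B": "x"},
    {"A": 1, "B": "x"},
    {"A": 2, "B": "y"},
    {"A": 3, "B": "y"},
]

a_determines_b = holds(sample, ["A"], ["B"])  # each A value has a single B value
b_determines_a = holds(sample, ["B"], ["A"])  # "y" is paired with both 2 and 3
print(a_determines_b, b_determines_a)
```

In this sample, A → B holds while B → A does not, which shows that functional dependency is not symmetric.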
Transitive dependence
■ A condition where A, B, and C are attributes of a relation such that if A → B
and B → C, then C is transitively dependent on A via B (provided that A is not
functionally dependent on B or C)
■ Consider the following functional dependencies within the StaffBranch
relation:
staffNo → sName, position, salary, branchNo, bAddress
branchNo → bAddress
Because staffNo → branchNo and branchNo → bAddress, bAddress is transitively
dependent on staffNo via branchNo.
In other words, the staffNo attribute functionally determines the bAddress via
the branchNo attribute and neither branchNo nor bAddress functionally
determines staffNo.
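One way to see the transitive step is to compute which attributes can be reached from staffNo by chaining dependencies. The sketch below uses the standard attribute-closure procedure in Python; the function name `closure` is an assumption, and bAddress is deliberately left out of the first dependency so that it can only be reached through branchNo.

```python
def closure(attrs, fds):
    """Return every attribute determined by attrs under the given
    functional dependencies, each written as a (lhs, rhs) pair of tuples."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply a dependency once its whole left side is known.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


# Dependencies from the StaffBranch relation, with the direct
# staffNo -> bAddress part omitted on purpose (see lead-in).
fds = [
    (("staffNo",), ("sName", "position", "salary", "branchNo")),
    (("branchNo",), ("bAddress",)),
]

staff_closure = closure({"staffNo"}, fds)
branch_closure = closure({"branchNo"}, fds)
print(sorted(staff_closure))
print(sorted(branch_closure))
```

The closure of staffNo contains bAddress even though no dependency states it directly, while the closure of branchNo does not contain staffNo. That matches the condition for a transitive dependency described above.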
Example

■ To get a better idea of the normalization process, consider the simplified
database activities of a construction company that manages several building
projects. Each project has its own project number, name, employees
assigned to it, and so on. Each employee has an employee number, name, and
job classification, such as engineer or computer technician.
■ The company charges its clients by billing the hours spent on each contract.
The hourly billing rate is dependent on the employee’s position. For example,
one hour of computer technician time is billed at a different rate than one
hour of engineer time.
Consider the table below
■ Step 1: Eliminate the Repeating Groups
Start by presenting the data in a tabular format, where each cell has a single
value and there are no repeating groups. To eliminate the repeating groups,
eliminate the nulls by making sure that each repeating group attribute
contains an appropriate data value.
■ Step 2: Identify the Primary Key
Even a casual observer will note that PROJ_NUM is not an adequate primary key
because the project number does not uniquely identify all of the remaining
entity (row) attributes. For example, the PROJ_NUM value 15 can identify any
one of five employees. To maintain a proper primary key that will uniquely
identify any attribute value, the new key must be composed of a combination of
PROJ_NUM and EMP_NUM.
■ Step 3: Identify All Dependencies
The identification of the PK in Step 2 means that you have already identified
the following dependency:
PROJ_NUM, EMP_NUM → PROJ_NAME, EMP_NAME, JOB_CLASS, CHG_HOUR, HOURS
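The reasoning in Step 2 can be sketched as a uniqueness test in Python. The helper name `is_unique` is an assumption, and since the original table is not reproduced here, the rows, including all EMP_NUM and HOURS values and project 18, are hypothetical. Only the attribute names and the fact that project 15 has five employees come from the text.

```python
def is_unique(rows, attrs):
    """Return True if no two rows share the same values for attrs."""
    keys = [tuple(row[a] for a in attrs) for row in rows]
    return len(keys) == len(set(keys))


# Hypothetical assignment rows: project 15 has five employees, and
# employee 1 also works on a second project.
rows = [
    {"PROJ_NUM": 15, "EMP_NUM": 1, "HOURS": 10.0},
    {"PROJ_NUM": 15, "EMP_NUM": 2, "HOURS": 12.5},
    {"PROJ_NUM": 15, "EMP_NUM": 3, "HOURS": 8.0},
    {"PROJ_NUM": 15, "EMP_NUM": 4, "HOURS": 20.0},
    {"PROJ_NUM": 15, "EMP_NUM": 5, "HOURS": 4.5},
    {"PROJ_NUM": 18, "EMP_NUM": 1, "HOURS": 6.0},
]

proj_alone = is_unique(rows, ["PROJ_NUM"])           # 15 appears five times
emp_alone = is_unique(rows, ["EMP_NUM"])             # employee 1 appears twice
combined = is_unique(rows, ["PROJ_NUM", "EMP_NUM"])  # each pair appears once
print(proj_alone, emp_alone, combined)
```

Neither attribute identifies a row on its own in this sample, but the combination does, which is why the composite key is chosen.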
Dependencies
2NF
3NF
Example 2
Normalize the table below
1NF
2NF
3NF
Denormalization
■ Although normalization is a very important database design ingredient, you
should not assume that the highest level of normalization is always the most
desirable. Generally, the higher the normal form, the more relational join
operations required to produce a specified output and the more resources
required by the database system to respond to end-user queries.
■ A successful design must also consider end-user demand for fast performance.
Therefore, you will occasionally be expected to denormalize some portions of a
database design in order to meet performance requirements.
■ Denormalization produces a lower normal form; for example, a table in 3NF may
be converted to 2NF through denormalization. However, the price you pay for
increased performance through denormalization is greater data redundancy.
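The trade-off can be illustrated with a small sketch using Python's `sqlite3` module. The table names `JOB`, `EMPLOYEE` and `EMPLOYEE_DENORM`, the employee names and the billing rates are all hypothetical; only the attribute names come from the construction company example. No timing is measured here; the sketch only shows that the denormalized table answers the same question without a join while storing the rate more than once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Normalized design: each billing rate is stored once, in JOB.
    CREATE TABLE JOB (JOB_CLASS TEXT PRIMARY KEY, CHG_HOUR REAL);
    CREATE TABLE EMPLOYEE (
        EMP_NUM   INTEGER PRIMARY KEY,
        EMP_NAME  TEXT,
        JOB_CLASS TEXT REFERENCES JOB(JOB_CLASS)
    );
    INSERT INTO JOB VALUES ('Engineer', 80.0), ('Computer Technician', 60.0);
    INSERT INTO EMPLOYEE VALUES
        (1, 'Employee One', 'Engineer'),
        (2, 'Employee Two', 'Engineer'),
        (3, 'Employee Three', 'Computer Technician');

    -- Denormalized design: CHG_HOUR is copied onto every employee row.
    CREATE TABLE EMPLOYEE_DENORM AS
        SELECT e.EMP_NUM, e.EMP_NAME, e.JOB_CLASS, j.CHG_HOUR
        FROM EMPLOYEE e JOIN JOB j ON j.JOB_CLASS = e.JOB_CLASS;
    """
)

# Normalized: a join is needed to find each employee's billing rate.
with_join = conn.execute(
    "SELECT e.EMP_NUM, j.CHG_HOUR FROM EMPLOYEE e "
    "JOIN JOB j ON j.JOB_CLASS = e.JOB_CLASS ORDER BY e.EMP_NUM"
).fetchall()

# Denormalized: the same answer comes from a single table.
without_join = conn.execute(
    "SELECT EMP_NUM, CHG_HOUR FROM EMPLOYEE_DENORM ORDER BY EMP_NUM"
).fetchall()

# Redundancy: how many rows store the engineer rate in each design.
copies_normalized = conn.execute(
    "SELECT COUNT(*) FROM JOB WHERE JOB_CLASS = 'Engineer'"
).fetchone()[0]
copies_denormalized = conn.execute(
    "SELECT COUNT(*) FROM EMPLOYEE_DENORM WHERE JOB_CLASS = 'Engineer'"
).fetchone()[0]
print(with_join == without_join, copies_normalized, copies_denormalized)
```

Both queries return the same rows, but the engineer rate now lives in two places, so a rate change must update every copy to avoid the update anomaly described earlier.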
NORMALIZE THE TABLE BELOW
