
Data Normalization

Normalization
Database normalization is a technique for designing relational database tables to minimize duplication of information and to safeguard the database against certain types of logical or structural problems, namely data anomalies.

Objectives

Data normalization aims to arrive at records that avoid:

Repetition of data
Update anomalies
Insert anomalies
Delete anomalies

Student

Stud_Id  Stud_name   Address          Module_id  Module  Instructor  Dept
S1001    Smith Tom   Main Street      1          DBT     Williams    10
S1001    Smith Tom   Main Street      2          DCN     Wilson      10
S1001    Smith Tom   Main Street      3          SE      Sam         20
S1060    Jones Mary  Mountain Shadow  1          DBT     Williams    10
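The update anomaly in the Student table can be illustrated with a minimal sketch. It assumes the rows pair up as (Stud_Id, Stud_name, Address, Module_id, Module, Instructor, Dept); the helper name `addresses_per_student` and the new address "High Street" are made up for illustration.

```python
# Rows of the Student table: (stud_id, stud_name, address, module_id, module, instructor, dept).
# The pairing of values into rows is an assumption read off the slide.
student = [
    ("S1001", "Smith Tom", "Main Street", 1, "DBT", "Williams", 10),
    ("S1001", "Smith Tom", "Main Street", 2, "DCN", "Wilson", 10),
    ("S1001", "Smith Tom", "Main Street", 3, "SE", "Sam", 20),
    ("S1060", "Jones Mary", "Mountain Shadow", 1, "DBT", "Williams", 10),
]


def addresses_per_student(rows):
    """Map each stud_id to the set of addresses recorded for it."""
    seen = {}
    for stud_id, _name, address, *_rest in rows:
        seen.setdefault(stud_id, set()).add(address)
    return seen


# Update anomaly: the address is changed in only one of the three S1001 rows.
# "High Street" is a hypothetical value.
updated = list(student)
updated[0] = ("S1001", "Smith Tom", "High Street", 1, "DBT", "Williams", 10)

# Students that now have more than one address on record.
inconsistent = {
    sid for sid, addrs in addresses_per_student(updated).items() if len(addrs) > 1
}
print(inconsistent)
```

Because the address is stored once per module row, a partial update leaves the student with two conflicting addresses.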

The Process of Normalization

Usually three steps (in industry), giving rise to:

First Normal Form (1NF)
Second Normal Form (2NF)
Third Normal Form (3NF)

In academia:

Boyce-Codd Normal Form (BCNF)
Fourth Normal Form (4NF)

Steps in Data Normalization


UNNORMALISED ENTITY
step 1 ... eliminate repeating groups
1st NORMAL FORM
step 2 ... eliminate partial dependencies
2nd NORMAL FORM
step 3 ... eliminate transitive dependencies
3rd NORMAL FORM
step 4 ... make every determinant a candidate key
BOYCE-CODD NORMAL FORM
step 5 ... remove multivalued dependencies
4th NORMAL FORM

Attributes - Repeating Groups

When a group of attributes has multiple values for a single occurrence of the entity, we say there is a repeating group of attributes in the entity.

COMPANY  NAME     ADDRESS      BRANCH NAME  BRANCH ADDRESS
A123     ABC Ltd  100 High St  ABC1         Manchester
                               ABC2         London
                               ABC3         Glasgow

(BRANCH_NAME, BRANCH_ADDRESS) is a repeating group

Functional Dependency

Consider a relation R that has two attributes A and B. The attribute B of the relation is functionally dependent on the attribute A if and only if each value of A is associated with no more than one value of B.

In other words, the value of attribute A uniquely determines the value of B.

Stud_id -> stud_name
Stud_id -> Date of Birth
Module_id -> Module name
Marks -> grade
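The definition can be checked against sample rows with a short sketch. The function name `holds` and the sample rows are assumptions for illustration. Note that sample data can only refute a functional dependency; whether it holds in general is a statement about the business rules, not about one set of rows.

```python
def holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for this key and returns it.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical sample rows.
rows = [
    {"stud_id": "S1001", "stud_name": "Smith Tom", "module_id": 1},
    {"stud_id": "S1001", "stud_name": "Smith Tom", "module_id": 2},
    {"stud_id": "S1060", "stud_name": "Jones Mary", "module_id": 1},
]

print(holds(rows, ["stud_id"], ["stud_name"]))  # True
print(holds(rows, ["stud_id"], ["module_id"]))  # False: S1001 maps to modules 1 and 2
```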

Full Functional Dependency

Let A and B be distinct collections of attributes from a relation R. B is then fully functionally dependent on A if B is functionally dependent on A but not on any proper subset of A.

(Stud_id, Module_id) -> marks
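A minimal sketch of the same test on sample rows follows. The names `holds` and `fully_dependent` and the marks values are made up for illustration, and only non-empty proper subsets of the determinant are tried.

```python
from itertools import combinations


def holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        if seen.setdefault(key, value) != value:
            return False
    return True


def fully_dependent(rows, lhs, rhs):
    """True if lhs -> rhs holds and no non-empty proper subset of lhs determines rhs."""
    if not holds(rows, lhs, rhs):
        return False
    for size in range(1, len(lhs)):
        for subset in combinations(lhs, size):
            if holds(rows, subset, rhs):
                return False
    return True


# Hypothetical sample rows; the marks are invented.
rows = [
    {"stud_id": "S1001", "module_id": 1, "stud_name": "Smith Tom", "marks": 70},
    {"stud_id": "S1001", "module_id": 2, "stud_name": "Smith Tom", "marks": 55},
    {"stud_id": "S1060", "module_id": 1, "stud_name": "Jones Mary", "marks": 62},
]

key = ("stud_id", "module_id")
print(fully_dependent(rows, key, ("marks",)))      # True
print(fully_dependent(rows, key, ("stud_name",)))  # False: stud_id alone determines it
```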

First Normal Form

A relation is in 1NF if and only if every attribute is single valued for each tuple or row.

Equivalently, a relation is in 1NF if and only if there are no repeating groups of attribute values.

Example of First Normal Form


ORDER NUMBER     1023
SUPPLIER NUMBER  500028
ORDER DATE       09/05/88
DELIVERY DATE    25/07/88

PART NO.  PART-DESC  QTY-ORD  PRICE
O463      Hook       150      15.00
1492      Bolt       1000     10.00
3164      Spanner    10       5.00
                TOTAL PRICE   30.00

UN-NORMALISED ENTITY TYPE
PURCHASE-ORDER (ORDER#, SUPPLIER#, ORDER-DATE, DELIVERY-DATE, (PART#, PART-DESCRIPTION, QUANTITY-ORDERED, PRICE), TOTAL-PRICE)

Example in 1NF

ENTITY TYPES IN 1NF
PURCHASE-ORDER (ORDER#, SUPPLIER#, ORDER-DATE, DELIVERY-DATE, TOTAL-PRICE)
PURCHASE-ITEM-1 (ORDER#, PART#, PART-DESCRIPTION, QUANTITY-ORDERED, PRICE)
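Removing the repeating group can be sketched as follows. The dictionary layout, the field names and the function `to_1nf` are assumptions for illustration; the values come from the purchase order form, with the part lines paired in the order they appear.

```python
# The un-normalised purchase order: the part lines form a repeating group.
order = {
    "order_no": 1023,
    "supplier_no": 500028,
    "order_date": "09/05/88",
    "delivery_date": "25/07/88",
    "total_price": 30.00,
    "items": [
        {"part_no": "O463", "part_desc": "Hook", "qty_ordered": 150, "price": 15.00},
        {"part_no": "1492", "part_desc": "Bolt", "qty_ordered": 1000, "price": 10.00},
        {"part_no": "3164", "part_desc": "Spanner", "qty_ordered": 10, "price": 5.00},
    ],
}


def to_1nf(order):
    """Split one order into a PURCHASE-ORDER row and PURCHASE-ITEM-1 rows."""
    purchase_order = {k: v for k, v in order.items() if k != "items"}
    # Each item row carries the order number so that (order_no, part_no) identifies it.
    purchase_items = [
        {"order_no": order["order_no"], **item} for item in order["items"]
    ]
    return purchase_order, purchase_items


purchase_order, purchase_items = to_1nf(order)
for row in purchase_items:
    print(row)
```

Every value in the resulting rows is single valued, and the order number is copied into each item row to keep the link to its order.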

Example
REGISTRATION FORM

STUDENT NUMBER   S0843215
STUDENT NAME     P. Smith
STUDENT ADDRESS  1, South Downs Hale

COURSE NO  COURSE     TUTOR NAME  TUTOR NO
PM951      Computing  T. Long     037428
S212       Biology    S. Short    096524

STUDENT (Student#, Student-Name, Student-Address)
ENROLMENT (Student#, Course#, Course-Title, Tutor-Name, Tutor-Staff#)

1st Normal Form

The process results in separation of different objects, BUT anomalies may still exist.

PURCHASE-ITEM-1 (ORDER#, PART#, PART-DESCRIPTION, QUANTITY-ORDERED, PRICE)

PART-DESCRIPTION appears on every PURCHASE-ITEM occurrence. This may result in anomalies when updating or deleting records.

The problem in the example is that PART-DESCRIPTION is functionally dependent only on PART#, which is just part of the identifier.

Second Normal Form

A relation is in 2NF if and only if it is in 1NF and all the non-key attributes are fully functionally dependent on the key.

Any entity type in 1NF is transformed to 2NF as follows:

Identify functional dependencies
Re-write entity types so that each non-identifying attribute is functionally dependent on the whole of the identifier

Example

ENTITY TYPES IN 1NF
PURCHASE-ORDER (ORDER#, SUPPLIER#, ORDER-DATE, DELIVERY-DATE, TOTAL-PRICE)
PURCHASE-ITEM-1 (ORDER#, PART#, PART-DESCRIPTION, QUANTITY-ORDERED, PRICE)

Functional Dependencies

PURCHASE-ORDER (ORDER#, SUPPLIER#, ORDER-DATE, DELIVERY-DATE, TOTAL-PRICE)
PURCHASE-ITEM-1 (ORDER#, PART#, PART-DESCRIPTION, QUANTITY-ORDERED, PRICE)

(ORDER#, PART#) -> QUANTITY-ORDERED, PRICE
PART# -> PART-DESCRIPTION

In 2nd Normal Form

Decompose PURCHASE-ITEM-1 into two entity types:

PURCHASE-ITEM (Order#, Part#, Quantity-Ordered, Price)
PART (Part#, Part-Description)

The original entity types are decomposed into three entity types in 2nd normal form:

PURCHASE-ORDER (Order#, Supplier#, Order-Date, Delivery-Date, Total-Price)
PURCHASE-ITEM (Order#, Part#, Quantity-Ordered, Price)
PART (Part#, Part-Description)
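The decomposition can be sketched on sample rows. The function `to_2nf` and the field names are assumptions, and the second order (1024) is a made-up row added only to show a part that appears on more than one order.

```python
# PURCHASE-ITEM-1 rows. Order 1023 comes from the form; order 1024 is hypothetical.
purchase_item_1 = [
    {"order_no": 1023, "part_no": "O463", "part_desc": "Hook", "qty_ordered": 150, "price": 15.00},
    {"order_no": 1023, "part_no": "1492", "part_desc": "Bolt", "qty_ordered": 1000, "price": 10.00},
    {"order_no": 1023, "part_no": "3164", "part_desc": "Spanner", "qty_ordered": 10, "price": 5.00},
    {"order_no": 1024, "part_no": "O463", "part_desc": "Hook", "qty_ordered": 20, "price": 2.00},
]


def to_2nf(rows):
    """Move the attribute that depends only on part_no into its own PART table."""
    item_columns = ("order_no", "part_no", "qty_ordered", "price")
    purchase_item = [{c: row[c] for c in item_columns} for row in rows]
    # One entry per part number: the description is now stored once per part.
    part = {row["part_no"]: row["part_desc"] for row in rows}
    return purchase_item, part


purchase_item, part = to_2nf(purchase_item_1)
print(len(purchase_item), "item rows,", len(part), "parts")
```

After the split, renaming a part means changing one entry in `part` instead of every item row that mentions it.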

Example in 2NF

ENTITY TYPES IN 2NF
STUDENT (Student#, Student-Name, Student-Address)
ENROLMENT (Student#, Course#, Tutor-Name, Tutor-Staff#)
COURSE (Course#, Course-Title)

Third Normal Form

A relation is in 3NF if and only if it is in 2NF and no non-key attribute is transitively dependent on the key.

Any entity type in 2NF is transformed to 3NF as follows:

Determine functional dependencies between non-identifying attributes
Decompose the entity into new entities

Example

ENTITY TYPES IN 2NF
STUDENT (Student#, Student-Name, Student-Address)
ENROLMENT (Student#, Course#, Tutor-Name, Tutor-Staff#)
COURSE (Course#, Course-Title)

Functional Dependencies

STUDENT (Student#, Student-Name, Student-Address)
ENROLMENT (Student#, Course#, Tutor-Name, Tutor-Staff#)
COURSE (Course#, Course-Title)

(Student#, Course#) -> Tutor-Staff#
Tutor-Staff# -> Tutor-Name

Example in 3NF

ENTITY TYPES IN 3NF
STUDENT (Student#, Student-Name, Student-Address)
ENROLMENT (Student#, Course#, Tutor-Staff#)
COURSE (Course#, Course-Title)
TUTOR (Tutor-Staff#, Tutor-Name)
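One way to see that moving Tutor-Name into TUTOR loses no information is to split the 2NF ENROLMENT rows and join them back. This is a minimal sketch; the function names `split_tutor` and `rejoin` and the field names are assumptions, and the rows are the two enrolments on the registration form.

```python
# ENROLMENT rows in 2NF, taken from the registration form.
enrolment_2nf = [
    {"student_no": "S0843215", "course_no": "PM951",
     "tutor_name": "T. Long", "tutor_staff_no": "037428"},
    {"student_no": "S0843215", "course_no": "S212",
     "tutor_name": "S. Short", "tutor_staff_no": "096524"},
]


def split_tutor(rows):
    """Remove the transitively dependent tutor_name into a TUTOR table."""
    enrolment = [{k: v for k, v in row.items() if k != "tutor_name"} for row in rows]
    tutor = {row["tutor_staff_no"]: row["tutor_name"] for row in rows}
    return enrolment, tutor


def rejoin(enrolment, tutor):
    """Join ENROLMENT with TUTOR on tutor_staff_no."""
    return [{**row, "tutor_name": tutor[row["tutor_staff_no"]]} for row in enrolment]


enrolment, tutor = split_tutor(enrolment_2nf)
print(rejoin(enrolment, tutor) == enrolment_2nf)  # True
```

The join reproduces the original rows because Tutor-Staff# determines Tutor-Name, so the lookup in `tutor` is unambiguous.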

Example of Normal Forms

Let Work be a relation scheme that stores information about projects in a large business organization.

Work (PROJNAME, PROJMGR, STARTDATE, EMPID, HOURS, EMPNAME, BUDGET, SALARY, EMPDEPT, EMPMGR, RATING)

Assumptions:
1. Each project has a unique name, but names of employees are not unique.
2. Each project has one manager, whose name is stored in PROJMGR.
3. Many employees may be assigned to work on each project, and an employee may be assigned to more than one project. HOURS tells the number of hours per week that a particular employee is assigned to work on a particular project.
4. BUDGET stores the amount budgeted for a project, and STARTDATE gives the starting date for a project.
5. SALARY gives the annual salary of an employee.
6. EMPMGR gives the name of the employee's manager, who is not the same as the project manager.
7. EMPDEPT gives the employee's department. Department names are unique. The employee's manager is the manager of the employee's department.
8. RATING gives the employee's rating for a particular project. The project manager assigns the rating at the end of the employee's work on that project.

Functional Dependencies

PROJNAME -> PROJMGR, BUDGET, STARTDATE
EMPID -> EMPNAME, SALARY, EMPMGR, EMPDEPT
PROJNAME, EMPID -> HOURS, RATING

Analysis of the sample FDs

1NF: With PROJNAME and EMPID as the composite key, each cell contains a single value, so Work is in 1NF.

The partial dependencies are:

PROJNAME -> PROJMGR, BUDGET, STARTDATE
EMPID -> EMPNAME, SALARY, EMPMGR, EMPDEPT

Transform the relation Work into an equivalent collection of 2NF relations. The relation schemes are:

PROJ (PROJNAME, PROJMGR, BUDGET, STARTDATE)
EMP (EMPID, EMPNAME, SALARY, EMPMGR, EMPDEPT)
WORK1 (PROJNAME, EMPID, HOURS, RATING)

The above relation schemes are in 2NF.
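The key and the partial dependencies of Work can be derived mechanically from the functional dependencies with an attribute closure. The sketch below is one way to do it; the function names `closure` and `partial_dependencies` are assumptions for illustration.

```python
from itertools import combinations

# Functional dependencies of Work as (determinant, dependents) pairs.
FDS = [
    ({"PROJNAME"}, {"PROJMGR", "BUDGET", "STARTDATE"}),
    ({"EMPID"}, {"EMPNAME", "SALARY", "EMPMGR", "EMPDEPT"}),
    ({"PROJNAME", "EMPID"}, {"HOURS", "RATING"}),
]
ALL_ATTRS = {
    "PROJNAME", "PROJMGR", "STARTDATE", "EMPID", "HOURS", "EMPNAME",
    "BUDGET", "SALARY", "EMPDEPT", "EMPMGR", "RATING",
}


def closure(attrs, fds):
    """Return every attribute determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def partial_dependencies(key, fds):
    """Map each non-empty proper subset of key to the non-key attributes it determines."""
    found = {}
    for size in range(1, len(key)):
        for subset in combinations(sorted(key), size):
            dependents = closure(subset, fds) - set(key)
            if dependents:
                found[subset] = dependents
    return found


KEY = {"PROJNAME", "EMPID"}
print(closure(KEY, FDS) == ALL_ATTRS)  # True: (PROJNAME, EMPID) determines everything
for subset, dependents in partial_dependencies(KEY, FDS).items():
    print(subset, "->", sorted(dependents))
```

Each partial dependency found this way becomes its own relation (PROJ and EMP), and the attributes that depend on the whole key stay in WORK1.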

3rd Normal Form


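By assumption 7, EMPDEPT -> EMPMGR, so EMPMGR is transitively dependent on EMPID in EMP. Moving it to a separate DEPT relation gives:

PROJ (PROJNAME, PROJMGR, BUDGET, STARTDATE)
EMP1 (EMPID, EMPNAME, SALARY, EMPDEPT)
DEPT (EMPDEPT, EMPMGR)
WORK1 (PROJNAME, EMPID, HOURS, RATING)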
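The transitive dependency in EMP can be detected with a simplified check: a dependency violates 3NF when its determinant is not a superkey and it determines a non-key attribute. This sketch assumes each relation has a single candidate key; the function names and the dependency EMPDEPT -> EMPMGR (read from assumption 7) are not stated as such on the slides.

```python
def closure(attrs, fds):
    """Return every attribute determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def violations_3nf(attrs, key, fds):
    """List (determinant, dependents) pairs that break 3NF.

    Simplification: key is assumed to be the only candidate key.
    """
    found = []
    for lhs, rhs in fds:
        is_superkey = closure(lhs, fds) >= attrs
        non_key = rhs - key - lhs
        if not is_superkey and non_key:
            found.append((lhs, non_key))
    return found


# EMP in 2NF, with EMPDEPT -> EMPMGR taken from assumption 7.
emp_attrs = {"EMPID", "EMPNAME", "SALARY", "EMPMGR", "EMPDEPT"}
emp_fds = [
    ({"EMPID"}, {"EMPNAME", "SALARY", "EMPMGR", "EMPDEPT"}),
    ({"EMPDEPT"}, {"EMPMGR"}),
]
print(violations_3nf(emp_attrs, {"EMPID"}, emp_fds))

# After the split, neither EMP1 nor DEPT has a violation.
emp1_attrs = {"EMPID", "EMPNAME", "SALARY", "EMPDEPT"}
emp1_fds = [({"EMPID"}, {"EMPNAME", "SALARY", "EMPDEPT"})]
dept_attrs = {"EMPDEPT", "EMPMGR"}
dept_fds = [({"EMPDEPT"}, {"EMPMGR"})]
print(violations_3nf(emp1_attrs, {"EMPID"}, emp1_fds))
print(violations_3nf(dept_attrs, {"EMPDEPT"}, dept_fds))
```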