Chapter 9
NORMALIZATION
INTRODUCTION
The normalization process was first proposed by Codd in 1972. Initially Codd proposed three normal forms, which he called first, second and third normal form. A stronger definition of third normal form was later proposed by Boyce and Codd, and is known as Boyce-Codd Normal Form (BCNF). All these forms are based on the functional dependencies among the attributes of a relation. Later, fourth and fifth normal forms were proposed, based on the concepts of multi-valued dependencies and join dependencies respectively.
Normalization is a design technique that is widely used in designing relational models. In order to design a relational database system, we have to decide the logical structure of the database. The logical structure is designed so that basic operations on the database, such as Insert, Update, Delete or Retrieve, can be performed without any problems.
Definition
Normalization of data can be defined as a process during which redundant relation schemas are decomposed by breaking up their attributes into smaller relation schemas that possess desirable properties.
Objectives of Normalization
The objectives of the normalization process are:
To create a formal framework for analyzing relation schemas based on their keys and on the functional dependencies among their attributes.
To obtain powerful relational retrieval algorithms based on a collection of primitive relational operators.
To free relations from undesirable insertion, update and deletion anomalies.
To reduce the need for restructuring the relations as new data are introduced.
To carry out a series of tests on individual relation schemas so that the relational database can be normalized to some degree. When a test fails, the relation violating that test must be decomposed into relations that individually meet the normalization tests.
The entire normalization process is based upon the analysis of relations, their schemas, their primary keys and their functional dependencies. E.F. Codd proposed three normal forms, known as first, second and third normal form.
9.1 FUNDAMENTALS OF NORMALIZATION
Relational database tables, whether they are derived from ER models or from some other design method, sometimes suffer from serious problems in terms of performance, integrity and maintainability. For example, when the entire database is defined as a single large table, it can result in a large amount of redundant data and lengthy searches for just a small number of target rows. It can also result in long and expensive updates, and deletions in particular can result in the elimination of useful data as an unwanted side effect.
Consider a relation SALES in which product name, order no., customer name, customer address, credit, date and sales person name are all stored in a single table.
SALES
Product Name     Order No.  Cust Name  Cust Address  Credit  Date      Sale Person Name
Vacuum cleaner   1458       Ekem       Patiala               5.8.2007  Baljit
Computer         1492       Srithi     Samana                8.8.2007  Simar
Vacuum cleaner   1492       Hiten      Bathinda      8       8.8.2008  Ekemijot
Computer         1499       Mayank     Patiala       9       9.1.2008  Harsimran
Calculator       1503       Hiten      Nabha         6       9.2.2008  Govind
In this table we see that certain products are stored redundantly, which leads to wasted storage space.
A query such as "Which customers ordered the vacuum cleaner in the month of August?" would require a search of the entire table.
It is very difficult to update the table, for example to update the address of Hiten, because the address is stored in more than one row.
If we want to delete the item purchased by Hiten, which order no. is to be deleted, 1492 or 1503? It is very confusing to handle this type of problem.
To handle these types of problems, the concept of normalization came into existence.
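The update problem described above can be made concrete with a small sketch. It assumes the SALES rows are held as Python dictionaries, with only four of the columns copied from the table; the helper names `addresses_of` and `update_address` are illustrative and not part of the text.

```python
# Four columns of the SALES table, one dictionary per row.
sales = [
    {"product": "Vacuum cleaner", "order_no": 1458, "cust_name": "Ekem", "cust_address": "Patiala"},
    {"product": "Computer", "order_no": 1492, "cust_name": "Srithi", "cust_address": "Samana"},
    {"product": "Vacuum cleaner", "order_no": 1492, "cust_name": "Hiten", "cust_address": "Bathinda"},
    {"product": "Computer", "order_no": 1499, "cust_name": "Mayank", "cust_address": "Patiala"},
    {"product": "Calculator", "order_no": 1503, "cust_name": "Hiten", "cust_address": "Nabha"},
]


def addresses_of(rows, customer):
    """Return every distinct address stored for one customer."""
    return {r["cust_address"] for r in rows if r["cust_name"] == customer}


def update_address(rows, customer, new_address):
    """Change the address in every row of the customer; return rows touched."""
    touched = 0
    for r in rows:
        if r["cust_name"] == customer:
            r["cust_address"] = new_address
            touched += 1
    return touched


# The same customer is stored with two different addresses.
hiten_addresses = addresses_of(sales, "Hiten")

# A single logical change has to be applied to every matching row.
rows_touched = update_address(sales, "Hiten", "Nabha")
```

Because the address is repeated in every order row, one change of address becomes several physical updates, and missing any of them leaves the table inconsistent.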
Normalization yields relations in which a change to an attribute can take place very easily, without affecting the other attributes of the relation.
Normalization provides the database designer with:
1. A formal framework for analyzing relation schemas based on their keys and on the functional dependencies among their attributes.
2. A series of tests that can be carried out on individual relation schemas so that the relational database can be normalized to any degree. When a test fails, the relation violating that test must be decomposed into relations that individually meet the normalization tests.
3. Normal forms, when considered in isolation from other factors, do not guarantee a good database design.
Before proceeding further with the normal forms, we will discuss some important concepts which will be used in normalization.
1. Prime Attribute: The prime attribute is the attribute which acts as a primary key. With the help of the primary key we can obtain the information of all other non primary key attributes. Consider the relation STUDENT, in which Roll No. is the prime attribute.
STUDENT
Name   Roll No. (P.K.)   Address   Number
STUDENT is a relation with the attributes Name, Roll No., Address and Number. Roll No. is the primary key. By knowing the Roll No., we can get all other information of the students.
2. Candidate Key: When a relation schema has more than one "minimal" key, each is called a candidate key.
3. Non Prime Attribute: An attribute is called a non prime attribute if it is not a primary key attribute in a table.
4. Super Key: If we add an additional attribute to a primary key, the resulting set of attributes still identifies each row uniquely; such a type of key is called a super key.
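The key definitions above can be explored with a short sketch. The STUDENT rows below are invented sample data, and the function names are illustrative. Note the limitation: a key is a property of the schema, so uniqueness in a sample of rows can only rule a key out, never prove it.

```python
from itertools import combinations


def is_superkey(rows, attrs):
    """True if no two rows agree on all of the attributes in attrs."""
    seen = set()
    for row in rows:
        value = tuple(row[a] for a in attrs)
        if value in seen:
            return False
        seen.add(value)
    return True


def candidate_keys(rows, attributes):
    """Minimal attribute sets that are unique in this sample of rows."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(attributes, size):
            if any(set(k) <= set(combo) for k in keys):
                continue  # contains a smaller key, so it is not minimal
            if is_superkey(rows, combo):
                keys.append(combo)
    return keys


# Hypothetical STUDENT rows (not taken from the text).
students = [
    {"Name": "Ram", "Roll No.": 1, "Address": "Patiala", "Number": "111"},
    {"Name": "Sham", "Roll No.": 2, "Address": "Patiala", "Number": "222"},
    {"Name": "Ram", "Roll No.": 3, "Address": "Nabha", "Number": "333"},
]

keys = candidate_keys(students, ["Name", "Roll No.", "Address", "Number"])
```

Here `("Roll No.",)` is reported as a minimal key, while `("Roll No.", "Name")` is unique but not minimal, so it is a super key only.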
9.2 FUNCTIONAL DEPENDENCIES
Functional dependence plays a very important role in DBMS. Functional dependence is a relationship that exists between any two fields: one field is called the determinant and the other is called the determined. For each value of the determinant there is associated one and only one value of the determined.
If X is the determinant and Y is the determined, then we say that X functionally determines Y, which is graphically represented as X → Y. Equivalently, Y is functionally determined by X.
X     Y
10    2
15    3
20    4
25    5
30    6
For each value of X there is associated one and only one value of Y.
Example: The following table illustrates that Y is not functionally dependent on X.
X     Y
5
10    2
5     3
5     4
10    20
This is because for X = 5 there is more than one value of Y, and for X = 10 there is more than one value of Y.
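The "one and only one value" test can be written as a few lines of code. This is a minimal sketch that takes a table as a list of (X, Y) pairs, using some of the rows from the two tables above; the name `fd_holds` is illustrative.

```python
def fd_holds(pairs):
    """True if every X value is paired with exactly one Y value."""
    seen = {}
    for x, y in pairs:
        if x in seen and seen[x] != y:
            return False  # the same X appears with two different Y values
        seen[x] = y
    return True


# Rows taken from the first table: each X has a single Y.
first_table = [(10, 2), (15, 3), (25, 5), (30, 6)]

# Rows taken from the second table: X = 5 and X = 10 each have several Y values.
second_table = [(10, 2), (5, 3), (5, 4), (10, 20)]

first_result = fd_holds(first_table)
second_result = fd_holds(second_table)
```

A check of this kind can only show that a dependency is violated by the data at hand; whether a dependency should hold in general is a design decision about the schema.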
=
hema a se of
explain the functional
ed eet set of functional
functional dependencies F. Supposedependence F on Ris
R is logica
Adependencies
10
B R =(A, B, we
C, G, H,are) give
siven a relatior
A C
CG H NAME
CG I
B H ROLL NO. SEX
The functional dependency
ically
A H ADDRESS
implied. That is we can show
onal dependencies
functional
Z.
y and X
>>
1 DBMS WITH MS
ACC
8
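One common way to test whether a dependency is logically implied is to compute the closure of its left-hand side. The sketch below uses the standard attribute-closure procedure, which is swapped in here as an illustration and is not claimed to be the method of the text. Attributes are single letters, so a string such as "CG" stands for the set {C, G}.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under a list of (lhs, rhs) dependencies."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the whole left side is already known, the right side follows.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def implies(fds, lhs, rhs):
    """True if lhs -> rhs is logically implied by fds."""
    return set(rhs) <= closure(lhs, fds)


# The dependencies on R = (A, B, C, G, H, I).
F = [("A", "B"), ("A", "C"), ("CG", "H"), ("CG", "I"), ("B", "H")]

closure_of_a = closure("A", F)
a_implies_h = implies(F, "A", "H")
a_implies_i = implies(F, "A", "I")
```

The closure of A contains H, so A → H is implied, whereas A → I is not, because I is reached only through CG and G is not determined by A.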
[Table R1, with columns X, Y and Z]
The fifth and sixth rows of R1 (when the X value is 2) satisfy the interchange conditions in the preceding definition. In both rows the value of Y is 2, so interchanging the Y values leaves the rows unchanged. The seventh row (3, 3, 3) satisfies the definition trivially.
[Table R2, with columns X, Y and Z]
In table R2, the Y values of the fifth and sixth rows are different, and interchanging the 1 and 2 values for Y results in a row (2, 2, 2) that does not appear in the table. Thus in R2 there is no MVD between X and Y or between X and Z, even though the first four rows satisfy the MVD conditions.
[Table R3, with columns X, Y and Z]
In table R3, the first three rows do not satisfy the criterion for an MVD, since changing Y from 1 to 2 in the second row results in a row that does not appear in the table. Similarly, changing Z from 1 to 2 in the third row results in a nonappearing row. Therefore, R3 does not have any MVD between X and Y or between X and Z.
The requirements that B ↛ A (A is not functionally dependent on B) and C ↛ A (A is not functionally dependent on C) are necessary to ensure that A and B are nonprime attributes.
Figure 9.1
9.5 FORMS OF NORMALIZATION
9.5.1 First Normalization Form (1NF)
A relation is said to be in 1NF if and only if every entry of the relation has at most a single value. In other words, a relation is said to be in 1NF if and only if all the underlying domains contain atomic (single) values only.
Consider a relation Salary.
Salary
Name        Basic Pay   DA          HRA   Total
Ram         8000        5000        200   13200
            8000        3000              11000
Ram, Gita   6000        2000,2000   200   8200
In this relation some of the cells are empty, i.e. there is no value in them, and in some cells more than one value exists. So it is not in 1NF.
To be in 1NF this relation must look like
Salary
Name        Basic Pay   DA          HRA   Total
            6000
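The 1NF condition can be tested mechanically. The sketch below assumes the Salary rows are represented as dictionaries, with `None` standing for an empty cell and a list standing for a cell that holds more than one value; both conventions, and the function names, are assumptions made for the illustration.

```python
def is_atomic(value):
    """A cell is atomic if it is non-empty and holds a single value."""
    if value is None or value == "":
        return False
    return not isinstance(value, (list, tuple, set))


def violations(rows):
    """Return (row index, column) for every cell that breaks 1NF."""
    return [
        (i, column)
        for i, row in enumerate(rows)
        for column, value in row.items()
        if not is_atomic(value)
    ]


def in_1nf(rows):
    """True if every cell of every row is atomic."""
    return not violations(rows)


# The three rows of the un-normalized Salary relation.
salary = [
    {"Name": "Ram", "Basic Pay": 8000, "DA": 5000, "HRA": 200, "Total": 13200},
    {"Name": None, "Basic Pay": 8000, "DA": 3000, "HRA": None, "Total": 11000},
    {"Name": ["Ram", "Gita"], "Basic Pay": 6000, "DA": [2000, 2000], "HRA": 200, "Total": 8200},
]

bad_cells = violations(salary)
```

Only the first row passes; the second row has empty cells and the third has cells with more than one value.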
Anomalies in 1NF
There is redundancy in a 1NF relation, which will lead to a variety of data anomalies.
(i) Insertion is very difficult.
(ii) Deletion cannot be done safely. If we want to delete the record of Ram, then two records will be deleted.
(iii) If we want to update the record of Ram, it is very difficult to decide which record is to be updated.
9.5.2 Second Normal Form (2NF)
A relation is said to be in 2NF if:
(i) It is in 1NF.
(ii) Every non key attribute is fully dependent on the primary key.
Consider a relation with the attributes NO., PROJ#, HOURS, PHANDLER, PNAME and LOCATION, in which:
fd1: (NO., PROJ#) → HOURS
fd2: NO. → PHANDLER
fd3: PROJ# → PNAME, LOCATION
CONVERTED INTO 2NF
(i) (NO., PROJ#, HOURS)
(ii) (NO., PHANDLER)
(iii) (PROJ#, PNAME, LOCATION)
Now every non key attribute is fully functionally dependent on the primary key. This overcomes the problem of updation, but we still have problems in second normal form.
Problems in 2NF
(i) Insertion. We cannot insert the fact that a particular project has a particular project handler.
(ii) Deletion. If we delete the tuple which contains the No., it will delete not only the information for the concerned project but also the information of the handler in Table I.
(iii) Updation. To change a value is very difficult. To overcome this problem we move to the next normal form, i.e. 3NF.
9.5.3 Third Normal Form (3NF)
A relation is said to be in third normal form (3NF) if and only if:
(i) It is in 2NF.
(ii) Every non key attribute is non-transitively dependent on the primary key.
In the relation, X and Y have the problem of transitivity. Thus by splitting it into the relations (X, City) and (City, Status) we have removed the transitivity.
X    City
X1   Ambala
X2   Patiala
X3   Malerkotla
X4   Sangrur
X5   Bathinda
City       Status
Ambala     30
Patiala    40
Bathinda   35
Sangrur    45
9.5.3.1 Rule to transform a relation into Third Normal Form
Third Normal Form implies that every non-prime attribute of a table must be dependent on the primary key. In other words, there should not be a case in which a non-prime attribute is determined by another non-prime attribute. Such a transitive functional dependency should be removed from the table, and the table must also be in Second Normal Form. For example, consider a table with the following fields.
[Figure 9.2: the table (A*, B, C) is converted to the tables (A, B) and (B, C)]
Figure 9.2.
Student_Detail Table
Student_id   Student_name   DOB   Street   City   State   Zip
In this table Student_id is the primary key, but street, city and state depend upon Zip. The dependency between Zip and the other fields is called transitive dependency. Hence to apply 3NF, we need to move the street, city and state to a new table, with Zip as primary key.
New Student_Detail Table:
Student_id   Student_name   DOB   Zip
Address Table:
Zip   Street   City   State
The advantages of removing transitive dependency are:
Amount of data duplication is reduced.
Data integrity achieved.
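The split into a student table and an address table can be sketched with two small relational helpers, a projection and a natural join. The rows below are invented sample data that follow the assumption stated in the text (street, city and state depend on Zip); the helper names are illustrative.

```python
def project(rows, attrs):
    """Project rows onto attrs, dropping duplicate tuples."""
    seen = []
    for row in rows:
        item = {a: row[a] for a in attrs}
        if item not in seen:
            seen.append(item)
    return seen


def natural_join(left, right):
    """Join two lists of dictionary rows on the attributes they share."""
    out = []
    for l in left:
        for r in right:
            shared = set(l) & set(r)
            if all(l[a] == r[a] for a in shared):
                out.append({**l, **r})
    return out


# Hypothetical Student_Detail rows; two students share one Zip.
student_detail = [
    {"Student_id": 1, "Student_name": "Ram", "DOB": "2000-01-01",
     "Street": "Mall Road", "City": "Patiala", "State": "Punjab", "Zip": "147001"},
    {"Student_id": 2, "Student_name": "Sham", "DOB": "2000-02-02",
     "Street": "Mall Road", "City": "Patiala", "State": "Punjab", "Zip": "147001"},
    {"Student_id": 3, "Student_name": "Gita", "DOB": "2000-03-03",
     "Street": "GT Road", "City": "Ludhiana", "State": "Punjab", "Zip": "141001"},
]

students = project(student_detail, ["Student_id", "Student_name", "DOB", "Zip"])
addresses = project(student_detail, ["Zip", "Street", "City", "State"])

# Joining the two tables on Zip gives back the original rows.
rejoined = natural_join(students, addresses)
```

The address of the shared Zip is now stored once instead of twice, and the join on Zip recovers the original table, so nothing is lost by the decomposition.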
9.5.3.2 Data Anomalies in 3NF Relations
The 3NF helped us to get rid of the data anomalies caused either by transitive dependencies on the primary key or by dependencies of a nonprime attribute on another nonprime attribute.
Relations in 3NF are still susceptible to data anomalies, particularly when the relations have two overlapping candidate keys or when a nonprime attribute functionally determines a prime attribute. The following example will illustrate this.
Example: Consider the Manufacturer relation shown below, where each manufacturer has a unique ID and name. Manufacturers produce items (identified by their unique item numbers) in the amounts indicated. Manufacturers may produce more than one item, and different manufacturers may produce the same items.
Manufacturer (Id_No, Name, Item_No, Quantity)
Manufacturer
Id_No   Name               Item_No   Quantity
M101    Electronics USA    H3772     1000
M101    Electronics USA    J08732    700
M101    Electronics USA    Y23490    200
M322    Electronics-R-Us   H3772     900
This Manufacturer relation has two candidate keys, (Id_No, Item_No) and (Name, Item_No), that overlap on the attribute Item_No. The relation is in 3NF because there is only one nonprime attribute, and therefore it is impossible for this attribute to determine another nonprime attribute.
The relation Manufacturer is susceptible to update anomalies. Consider, for example, the case in which one of the manufacturers changes its name. If the value of this attribute is not changed in all of the corresponding tuples, there is the possibility of having an inconsistent database.
9.5.4 BCNF
BCNF is stricter than 3NF. A relation that is in BCNF must be in 3NF, but the converse is not true: a relation in 3NF is not necessarily in BCNF.
BCNF states that:
A relation R is in BCNF if and only if every determinant is a candidate key.
Here a determinant is a simple attribute or composite attribute on which some other attribute is fully functionally dependent (FFD).
For example, Qty is FFD on (Sno, Pno):
(Sno, Pno) → Qty
(Sno, Pno) is a composite determinant.
Sno → Sname
Here Sno is a simple attribute determinant.
In order to show the difference between 3NF and BCNF we consider the overlapping of candidate keys. Two candidate keys overlap if they each involve two or more attributes and have an attribute in common.
For example, in relation Invoice
Invoice
Id_No   Name            Item_No   Qty
101     Patiala Steel   3275      100
101     Patiala Steel   3371      20
101     Patiala Steel   7312      500
102     Cotton Mi       1274      600
Here Name is unique for each Id_No.
FDs of the relation Invoice are:
(Id_No, Item_No) → Quantity
(Name, Item_No) → Quantity
Id_No → Name
Name → Id_No
This relation has two overlapping candidate keys, because there are two composite candidate keys, (Id_No, Item_No) and (Name, Item_No), out of which Item_No is the common attribute in both candidate keys, so this is a case of overlapping candidate keys.
Possible FD diagram of this case is:
[Figure 9.3: FD diagram over the attributes Id_No, Name, Item_No and Quantity]
Figure 9.3.
Here both the relations are in 3NF, because every non-key attribute is non-transitively fully functionally dependent on the primary key.
In the above relation, there is only one non-key attribute, i.e. Quantity, and it is FFD and non-transitively dependent on the primary key. Name and Id_No are not non-key attributes because they can participate in the primary key, as shown in the FD diagram.
But the Manufacturer relation is not in BCNF because this relation has four determinants:
(Id_No, Item_No)
(Name, Item_No)
Id_No
Name
Example
For example, consider a relation
SSP (Sno, Sname, Pno, Qty)
Here, Sname is considered unique for each Sno.
FDs of the above relation are:
(Sno, Pno) → Qty
(Sname, Pno) → Qty
Sno → Sname
Sname → Sno
This relation has two overlapping candidate keys, because there are two composite candidate keys, (Sno, Pno) and (Sname, Pno), out of which Pno is the common attribute in both candidate keys, so this is a case of overlapping candidate keys.
Possible FD diagram of this case is:
[Figure 9.4: FD diagram over the attributes Sno, Sname, Pno and Qty]
Figure 9.4
Here, both the relations are in 3NF, because every non-key attribute is non-transitively fully functionally dependent on the primary key.
In the above relation, there is only one non-key attribute, i.e. Qty, and it is FFD and non-transitively dependent on the primary key. Sname and Sno are not non-key attributes because they can participate in the primary key, as shown in the FD diagram.
But the SSP relation is not in BCNF because this relation has four determinants:
(Sno, Pno)
(Sname, Pno)
(Sno)
(Sname)
Out of these four determinants, the two determinants (Sno, Pno) and (Sname, Pno) are unique, but the Sno and Sname determinants are not candidate keys.
In order to bring this relation into BCNF we non-loss decompose it into two projections, SN (Sno, Sname) and SP (Sno, Pno, Qty).
The SN relation has two determinants, Sno and Sname, and both are unique.
SP has one determinant, (Sno, Pno), and it is also unique.
These two relations (SN, SP) remove all anomalies of the SSP relation.
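The BCNF test for SSP and its two projections can be checked with a short sketch. It uses the standard attribute-closure procedure to decide whether a determinant identifies every attribute of the relation; the function names and the encoding of dependencies as pairs of tuples are assumptions made for the illustration.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under a list of (lhs, rhs) dependencies."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(attributes, fds):
    """Return the determinants that do not determine every attribute."""
    all_attrs = set(attributes)
    return [lhs for lhs, _ in fds if closure(lhs, fds) != all_attrs]


# SSP (Sno, Sname, Pno, Qty) with its four dependencies.
ssp_fds = [
    (("Sno", "Pno"), ("Qty",)),
    (("Sname", "Pno"), ("Qty",)),
    (("Sno",), ("Sname",)),
    (("Sname",), ("Sno",)),
]
ssp_bad = bcnf_violations(["Sno", "Sname", "Pno", "Qty"], ssp_fds)

# The two projections SN and SP, each with the dependencies that apply to it.
sn_bad = bcnf_violations(["Sno", "Sname"], [(("Sno",), ("Sname",)), (("Sname",), ("Sno",))])
sp_bad = bcnf_violations(["Sno", "Pno", "Qty"], [(("Sno", "Pno"), ("Qty",))])
```

For SSP the determinants Sno and Sname are reported, since neither determines Pno or Qty; for SN and SP the lists are empty.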
5NF is of little practical use to the database designer, but it is of interest from a theoretical point of view, and a discussion of it is included here to complete the picture of the further normal forms.
In all of the further normal forms discussed so far, no-loss decomposition was achieved by decomposing a single table into two separate tables. No-loss decomposition is possible because of the availability of the join operator as part of the relational model. In considering 5NF, consideration must be given to tables where this non-loss decomposition can only be achieved by decomposition into three or more separate tables. Such decomposition is not always possible, as is shown by the following example.
Consider the table
AGENT_COMPANY_PRODUCT (Agent, Company, Product_Name)
This table lists agents, the companies they work for and the products they sell for those companies. The agents do not necessarily sell the complete range of products supplied by the companies they do business with. An example of this table might be:
Agent   Company   Product_Name
Savy    ABC       Nut
Savy    ABC       Screw
Savy    CDE       Bolt
Vicky   ABC       Bolt
The table is necessary in order to show all the information required. Savy, for example, sells ABC's Nuts and Screws, but not ABC's Bolts. Vicky is not an agent for CDE and does not sell ABC's Nuts or Screws. The table is in 4NF because it contains no multi-valued dependency. It does, however, contain an element of redundancy in that it records the fact that Savy is an agent for ABC twice. But there is no way of eliminating this redundancy without losing information. Suppose that the table is decomposed into its two projections, P1 and P2.
P1
Agent   Company
Savy    ABC
Savy    CDE
Vicky   ABC
P2
Agent   Product_Name
Savy    Nut
Savy    Screw
Savy    Bolt
Vicky   Bolt
The redundancy has been eliminated, but the information about which companies make which products, and which of these products they supply to which agents, has been lost. The natural join of these projections over the 'agent' columns is:
Agent   Company   Product_Name
Savy    ABC       Nut
Savy    ABC       Screw
Savy    ABC       Bolt *
Savy    CDE       Nut *
Savy    CDE       Screw *
Savy    CDE       Bolt
Vicky   ABC       Bolt
The table resulting from this join is spurious, since the asterisked rows of the table contain incorrect information. Now suppose that the original table were to be decomposed into three tables: the two projections P1 and P2, which have already been shown, and the final possible projection, P3.
P3
Company   Product_Name
ABC       Nut
ABC       Screw
ABC       Bolt
CDE       Bolt
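If a join is taken of all three projections, first of P1 and P2 (with the spurious result shown above), and then of this result with P3, the row (Savy, ABC, Bolt) still appears, because (Savy, ABC) is in P1, (Savy, Bolt) is in P2 and (ABC, Bolt) is in P3. The table shown above therefore cannot be decomposed into these three projections without loss.
A table that can be recomposed correctly from its three projections behaves differently: a correct recomposition of the original is achieved, and again the order in which the joins are performed does not affect the final result. Such a table violates 5NF simply because it is non-loss decomposable into its three projections.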
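The joins discussed in this section can be reproduced with Python sets. This is a minimal sketch in which each row of AGENT_COMPANY_PRODUCT is a tuple (agent, company, product); the variable names are illustrative.

```python
# The four rows of AGENT_COMPANY_PRODUCT as (agent, company, product).
original = {
    ("Savy", "ABC", "Nut"),
    ("Savy", "ABC", "Screw"),
    ("Savy", "CDE", "Bolt"),
    ("Vicky", "ABC", "Bolt"),
}

# The three binary projections.
p1 = {(a, c) for a, c, p in original}   # Agent, Company
p2 = {(a, p) for a, c, p in original}   # Agent, Product_Name
p3 = {(c, p) for a, c, p in original}   # Company, Product_Name

# Natural join of P1 and P2 over the agent column.
joined = {(a, c, p) for a, c in p1 for a2, p in p2 if a == a2}
spurious = joined - original

# Joining that result with P3 keeps only (company, product) pairs found in P3.
three_way = {(a, c, p) for a, c, p in joined if (c, p) in p3}
still_spurious = three_way - original
```

The two-way join produces seven rows, three of them spurious. For this particular data the three-way join removes two of the spurious rows but keeps (Savy, ABC, Bolt), so the four-row table is not recovered from its three projections.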
STUDY OF DATA
Normalization: Process of efficiently organizing data.
RELATIONS (attributes grouped together)
Goal:
Eliminate redundant data.
Ensure that data and the relationships of data are represented in a database.
Normalization: Series of tests on a relation to determine whether it satisfies or violates the requirements of a normal form.
Validate: meet practical business requirements.
Normalization: A technique for producing a set of relations with desirable properties, given the data requirements of an enterprise.
Redundant Relation
Staff Relation
Staff No.   S Name   S Address   Position   Salary   Branch No.
101         Ram      Ludhiana    Clerk      30000    5000
102         Sham     Patiala     Manager    50000    5001
101         Ram      Ludhiana    Clerk      30000    5000
Branch Relation
Branch No.   Branch Address   Telephone No.
5000         Ludhiana         367546
5001         Patiala          234569
5002         Rajpura          456780
First normal form (1NF): A relation in which the intersection of each row and column contains one and only one value.
UNF → 1NF:
Remove repeating groups by either:
Entering appropriate data in the empty columns of rows.
Placing repeating data, along with a copy of the original key attribute(s), in a separate relation.
Customer_No   CName           Property_No   PAddress                     RentStart      RentFinish     Rent   Owner_No   OName
CR76          John Kay        PG4           6 Lawrence Street, Glasgow   1993           31 Aug. 1995   350    CO40       Tina Murphy
CR76          John Kay        PG16          5 Novar Drive, Glasgow       1 Sept. 1995   1 Sept. 1996   450    CO93       Tony Shaw
CR56          Aline Stewart   PG4           6 Lawrence Street, Glasgow   1 Sept. 1992   10 June 1993   350    CO40       Tina Murphy
CR56          Aline Stewart   PG36          2 Manor Road, Glasgow        10 Oct. 1993   1 Dec. 1994    375    CO93       Tony Shaw
CR56          Aline Stewart   PG16          5 Novar Drive, Glasgow       1 Jan. 1995    10 Aug. 1995   450    CO93       Tony Shaw
Functional Dependencies in Customer_Rental Relation
Customer_No, Property_No → RentStart, RentFinish
Customer_No → CName
Property_No → PAddress, Rent, Owner_No, OName
Owner_No → OName
1NF → 2NF:
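Moving from 1NF to 2NF starts with finding the partial dependencies, that is, non-key attributes determined by only part of the primary key. The sketch below does this for Customer_Rental, assuming (Customer_No, Property_No) is the primary key and using the standard attribute-closure procedure; the function names are illustrative.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under a list of (lhs, rhs) dependencies."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def partial_dependencies(key, fds, attributes):
    """Map each proper subset of the key to the non-key attributes it determines."""
    non_key = set(attributes) - set(key)
    found = {}
    for size in range(1, len(key)):
        for subset in combinations(sorted(key), size):
            determined = closure(subset, fds) & non_key
            if determined:
                found[subset] = determined
    return found


attributes = ["Customer_No", "CName", "Property_No", "PAddress",
              "RentStart", "RentFinish", "Rent", "Owner_No", "OName"]
key = ("Customer_No", "Property_No")
fds = [
    (("Customer_No", "Property_No"), ("RentStart", "RentFinish")),
    (("Customer_No",), ("CName",)),
    (("Property_No",), ("PAddress", "Rent", "Owner_No", "OName")),
    (("Owner_No",), ("OName",)),
]

partial = partial_dependencies(key, fds, attributes)
```

Each entry of `partial` names a part of the key and the attributes that depend on it alone; those attributes are the ones to move into a separate relation together with a copy of that part of the key.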