
Normalization: 1NF, 2NF, 3NF, BCNF, 4NF, and 5NF

Normalization refers to a set of rigorous, formally defined standards for good design, together with methods for testing a database design against them. It is the process by which an existing relational schema is modified to bring its component tables into compliance with a series of progressively stricter normal forms, described below.
Normalization is a process of looking at the tables (relation schemas) in the RDBMS to test whether they pass a series of tests. The tests are used to remove update anomalies, redundancy, and ambiguity from the database schema. If a table (a bad relation) fails a test, the solution is to decompose its attributes into smaller tables.
The goal of database normalization is to ensure that every non-key column in every table is directly dependent on the key, the whole key, and nothing but the key. With this goal come benefits in the form of reduced redundancy, fewer anomalies, and improved efficiency. (While normalization tends to increase the duplication of data, it does not introduce redundancy, which is unnecessary duplication.)
Normalization up to Boyce-Codd normal form (BCNF) is based on functional dependencies and keys.
About the Key:
Column(s) C is a primary key for table T if:
Property 1: All columns in T are functionally dependent on C.
Property 2: No sub-collection of the columns in C (assuming C is a collection of columns and not just a single column) has Property 1.
Any combination of attributes that uniquely identifies each tuple in a relation is a superkey.
Candidate Key(s):
Any minimal combination of attributes that uniquely identifies each tuple in a relation: you can't remove any attribute and still have unique identification.
Alternate Key(s) / Secondary Key(s):
Candidate keys not chosen as the primary key are called alternate keys.
Primary Key:
The candidate key chosen to be the unique identifier for a relation is called the primary key.
Prime Attribute:
An attribute that is a member of a candidate key is called a prime attribute.
Non-Prime Attribute:
An attribute that is not a member of any candidate key is called a non-prime attribute.
1. First Normal Form:
The first normal form (1NF) requires that the values in each column of a table are atomic. By atomic we mean that there are no multiple values in a column.
1NF isn't very interesting in itself; it is a stepping stone to the others. For instance, if we have (s#, name, city), where each supplier may have several branch locations in different cities, then the relation has a domain that allows sets of cities as values, so the values aren't atomic.
Not in 1NF
Normalized to 1NF.
The solution is to decompose the table into 2 tables: (s#, name) and (s#, location).
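The decomposition into (s#, name) and (s#, location) can be sketched in Python as follows. This is only an illustration: the supplier rows, the names "Acme" and "Bolt", and the helper `to_1nf` are made up for the example and do not come from the original tables.

```python
# Hypothetical un-normalized rows: the "cities" column holds a set of values,
# so it is not atomic and the relation is not in 1NF.
suppliers = [
    {"s#": "s1", "name": "Acme", "cities": {"Pune", "Delhi"}},
    {"s#": "s2", "name": "Bolt", "cities": {"Delhi"}},
]


def to_1nf(rows):
    """Split rows into (s#, name) and (s#, location), one atomic value per column."""
    names = sorted({(r["s#"], r["name"]) for r in rows})
    locations = sorted({(r["s#"], city) for r in rows for city in r["cities"]})
    return names, locations


names, locations = to_1nf(suppliers)
print(names)      # one row per supplier
print(locations)  # one row per (supplier, city) pair
```

Each supplier with several cities now contributes several rows to the location table instead of one row holding a set.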
2. Second Normal Form:
A relation is in 2NF if and only if it is in 1NF and every nonprime attribute is fully functionally dependent on the primary (composite) key.
Example 1:
Here the underline represents the primary key attributes and the arrows represent the functional dependencies of the relation R1.
The primary key is (s#, p#), and the FDs are
city -> tax
s# -> tax, city
(s#, p#) -> qty
where tax is determined by the city, tax and city are determined by s#, and qty is determined by s# and p# together. Note that this structure will have update anomalies, as described below.
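One way to see the partial dependency mechanically is to compute attribute closures from the functional dependencies listed above. The sketch below is an illustration under assumptions: the helper names `closure` and `partial_dependencies` are hypothetical, and the FDs are simply transcribed from the example.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


# FDs of the example, written as (determinant, dependent) pairs.
FDS = [({"city"}, {"tax"}), ({"s#"}, {"tax", "city"}), ({"s#", "p#"}, {"qty"})]
KEY = ("s#", "p#")


def partial_dependencies(key, fds):
    """Map each non-key attribute to the proper subsets of the key that determine it."""
    found = {}
    for size in range(1, len(key)):
        for subset in combinations(sorted(key), size):
            for attr in sorted(closure(subset, fds) - set(key)):
                found.setdefault(attr, []).append(subset)
    return found


print(closure(KEY, FDS))             # the whole relation, so (s#, p#) is a key
print(partial_dependencies(KEY, FDS))  # city and tax depend on s# alone
```

Any attribute reported here depends on only part of the key, which is exactly what 2NF forbids.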
Anomalies because of Partial Dependency:
Insert. You cannot enter the existence of a new supplier and its city unless that supplier is shipping a part. This is because of the integrity rule that all fields in the key must have values, and p# is part of the key.
Delete. If a supplier has only one shipment and it gets deleted, you also delete all knowledge of that supplier, such as the city.
Update. Because of the redundancy, if a supplier moves to another city, you must either find all occurrences of the supplier and change the city, or change one occurrence and have an inconsistent DB.
The solution is to divide this into two tables, where the key of the new table is the attribute that was independently determining the values of some attributes. So now we have R2a (s#, city, tax) and R2b (s#, p#, qty).
The functional dependency diagram shows that each of them now contains attributes that are fully dependent on the primary key of each relation. Insert: we can now insert the existence of a supplier into R2a without a shipment. Delete: we can now delete a shipment from R2b without losing information about the supplier. Update: we can now update the city in only one place.
3. Third Normal Form:
A relation is in 3NF if and only if it is in 2NF and every nonprime attribute is non-transitively dependent on the primary key.
Third Normal Form (3NF) requires that all columns depend directly on the primary key. Tables violate the Third Normal Form when one column depends on another column, which in turn depends on the primary key (a transitive dependency).
One way to identify transitive dependencies is to look at your table and see if any columns would require updating if another column in the table was updated. If such a column exists, it probably violates 3NF.
A transitive dependency is when A -> B and B -> C hold. Therefore, the transitive dependency A -> C also holds. This can be seen in the functional dependencies for R2a.

Tax rate is dependent on city. City is dependent on s#. Therefore, tax rate is dependent on s# through city. This shows in the fact that there are arrows that originate from places other than the key. This also gives rise to anomalies.
Anomalies because of transitive dependency:
Insert: we cannot enter that a city has a tax rate unless we have a supplier there. Again, this is because we cannot have a null primary key.
Delete: if there is only one supplier in a city, when we delete the supplier, we delete the tax information for that city.
Update: if we change the tax rate for a city, we must either 1. find all suppliers in that city and change the tax rate for each, or 2. change only one and have an inconsistent DB.
The solution is to break the relation into two relations. The point here is to get rid of the extra arrows and make simple functional dependencies. So the two new relations are (s#, city) and (city, tax).
Now the functional dependency diagrams are simple, there are no transitive dependencies, all attributes are fully dependent on the key, and the relations are in 3NF.
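The effect of this split can be illustrated on a small instance. The rows below are invented for the sketch, and `project` and `natural_join` are hypothetical helper names; the point is only that the tax rate of a city is stored once after the decomposition, and that joining the two projections gives back the original rows.

```python
# Hypothetical rows of the supplier relation (s#, city, tax); values are made up.
rows = [
    {"s#": "s1", "city": "Pune", "tax": 5},
    {"s#": "s2", "city": "Pune", "tax": 5},
    {"s#": "s3", "city": "Delhi", "tax": 8},
]


def project(rows, attrs):
    """Project rows onto attrs, dropping duplicate tuples."""
    seen = {tuple(r[a] for a in attrs) for r in rows}
    return [dict(zip(attrs, t)) for t in sorted(seen)]


def natural_join(left, right):
    """Join two lists of dict rows on the attributes they share."""
    out = []
    for a in left:
        for b in right:
            if all(a[k] == b[k] for k in set(a) & set(b)):
                out.append({**a, **b})
    return out


supplier_city = project(rows, ["s#", "city"])
city_tax = project(rows, ["city", "tax"])
rejoined = natural_join(supplier_city, city_tax)

print(city_tax)  # each city's tax rate appears exactly once
print(sorted(rejoined, key=lambda r: r["s#"]) == rows)
```

Changing a city's tax rate now touches a single row of the (city, tax) table.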
4. Boyce-Codd Normal Form (BCNF):
A relation is in BCNF if and only if every determinant is a candidate key. A determinant is any attribute on which another attribute is functionally dependent.
This normal form was developed to deal with relations that have multiple candidate keys, where the candidate keys are composite and overlap. When a relation has more than one candidate key, anomalies may occur even though the relation is in 3NF.
BCNF is based on the concept of a determinant. A determinant is any attribute (simple or composite) on which some other attribute is fully functionally dependent. A relation is in BCNF if, and only if, every determinant is a candidate key.
Here sname is also unique. The candidate keys are (s#, p#) and (sname, p#). s# determines sname, and sname determines s#, so they are both determinants. But they are not candidate keys; they are parts of different candidate keys.
Update anomaly because of multiple determinants:
If you update the sname in one tuple, you must either update it in all tuples with the same s#, or be inconsistent. Ditto with p#.
The solution is to make two projections of the relation.
Now each determinant in each relation is also a candidate key. You can update sname or s# in one place (taking into account the issues of foreign keys).
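The BCNF condition, that every determinant must be a candidate key, can be checked from the functional dependencies. The sketch below makes several assumptions: the original relation diagram is not reproduced here, so the schema (s#, sname, p#, qty), the attribute qty, and the two projections (s#, sname) and (s#, p#, qty) are illustrative guesses, and `closure` and `bcnf_violations` are hypothetical helper names. The check tests whether each determinant is a superkey, which is enough to flag violations.

```python
def closure(attrs, fds):
    """Return every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(all_attrs, fds):
    """Return the FDs whose determinant does not determine the whole relation."""
    return [(lhs, rhs) for lhs, rhs in fds if closure(lhs, fds) != set(all_attrs)]


# Assumed schema and FDs for the example (qty is an assumption).
ORIGINAL = {"s#", "sname", "p#", "qty"}
ORIGINAL_FDS = [({"s#"}, {"sname"}), ({"sname"}, {"s#"}), ({"s#", "p#"}, {"qty"})]

# Assumed projections after decomposition.
NAMES = {"s#", "sname"}
NAMES_FDS = [({"s#"}, {"sname"}), ({"sname"}, {"s#"})]
SHIPMENTS = {"s#", "p#", "qty"}
SHIPMENTS_FDS = [({"s#", "p#"}, {"qty"})]

print(bcnf_violations(ORIGINAL, ORIGINAL_FDS))    # s# -> sname and sname -> s#
print(bcnf_violations(NAMES, NAMES_FDS))          # no violations
print(bcnf_violations(SHIPMENTS, SHIPMENTS_FDS))  # no violations
```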
Complexity of more tables:
A complete normalization of tables is desirable, but you may find in practice that full normalization can introduce complexity to your design and application. More tables often lead to more JOIN operations, and in most database management systems (DBMSs) such JOIN operations can be costly, leading to decreased performance.
The key lies in finding a balance where the first three normal forms are generally met without creating an exceedingly complicated schema.
5. Multivalued Dependencies & 4NF:
A problem with multi-valued dependencies (MVDs) occurs when you are trying to express two independent 1:N relations, or multi-valued attributes, in the same relation. For example, in your initial design process, you may have seen something like:
Here the manager is associated with (multi-determines) a set of phone numbers and also a set of employees, but the phone numbers and the employees have nothing to do with each other. Of course, you can't have a relation that looks like the ones above; it is excluded by 1NF.
You are trying to express the idea that the manager is associated with a set of employees and a set of phone numbers, but that the employees and the phone numbers are independent of one another. So you might design a relation that looks like:
But that implies a relationship (connection) between 555-1212 and George. To avoid that appearance, you would have to store all combinations of phone# and employee.
Two 1:N relations (or multivalued attributes), A:B and A:C, where B and C are independent of each other.
A ->> B: A determines a set of (multiple) values B
A ->> C: A determines a set of (multiple) values C
The only time a multi-valued dependency (mvd) is a problem is when you have more than one mvd, and the B and C values are independent.
A trivial mvd is one where:
1. The B attribute(s) are a subset of the A attributes. That is, if you made them distinct from each other, there would no longer be an mvd. E.g., ABC ->> B.
2. The union of A and B makes up the entire relation; there are no other attributes in the table.
Otherwise, you have a nontrivial mvd, and these are the potential problems.
Such relations contain a lot of redundancy, which leaves room for anomalies. Note that relations with nontrivial mvds tend to be all-key relations, where the key is the entire relation.
The cure: 4NF
A relation is in 4NF if, for every nontrivial mvd A ->> B, A is a superkey of the relation (a superkey is any combination of attributes, not necessarily minimal, that is a unique ID in the relation). The manager table used as an example above is not in 4NF. mgr ->> phone# is a nontrivial mvd because phone# is not a subset of mgr, and there is also employee. Similarly, mgr ->> employee is a nontrivial mvd. As usual, the way to get a relation into 4NF is to decompose it, to get the mvds into separate relations: (mgr, phone#) and (mgr, employee).
Each of these now holds a trivial mvd, because the mvd makes up the entire table. This decomposition will have the lossless join property.
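A small instance makes the redundancy and the cure concrete. In this sketch the manager "M1", the second phone number, and the employee "Mary" are invented for illustration; the relation stores every phone/employee combination, and projecting it onto (mgr, phone#) and (mgr, employee) loses nothing.

```python
from itertools import product

# Hypothetical data: one manager with two phone numbers and two employees.
phones = {"M1": ["555-1212", "555-3434"]}
employees = {"M1": ["George", "Mary"]}

# The all-key relation must hold every phone/employee combination per manager.
all_key = {
    (m, p, e) for m in phones for p, e in product(phones[m], employees[m])
}

# 4NF decomposition: one relation per multivalued dependency.
mgr_phone = {(m, p) for m, p, _ in all_key}
mgr_employee = {(m, e) for m, _, e in all_key}

# Natural join on mgr reconstructs the original relation (lossless join).
rejoined = {
    (m, p, e) for m, p in mgr_phone for m2, e in mgr_employee if m == m2
}

print(len(all_key), len(mgr_phone) + len(mgr_employee))
print(rejoined == all_key)
```

With k phone numbers and n employees the single table needs k * n rows per manager, while the two projections need only k + n.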
The Overall Idea
Remember that the goal here is to get a good design. Starting from an ER diagram is one way, although you still might want to check normalization of the tables. Starting with a bunch of tables and then normalizing them (or starting with one enormous table) is another approach. We have been talking about normalization as something that you do regarding just one table in the database. It is also important to look at your DB design in terms of how the tables relate to each other, and how you can combine them. Merely having a bunch of tables in 3NF or BCNF is not enough.
Some definitions:
In a database design, we have a decomposition D of the universal relation R. This is the way that all of the attributes have been decomposed into tables. There is a set of functional dependencies F that hold over the attributes of R; this depends on the semantics of the DB and how things work in the world it models.
1. Dependency Preserving Decomposition:
In decomposition, it is possible to lose a functional dependency. This is undesirable, so a good decomposition will preserve dependencies. There are two ways of storing functional dependencies: they can be in the same table, or they can be inferred from different tables.
2. Lossless (Non-Additive) Joins:
Another important feature of a good decomposition is that it gives lossless joins. This is the problem of spurious tuples. The term "lossless" refers to the problem of losing information; the way that you lose information here is by getting noise (spurious tuples) into your table.
Properties of lossless join decomposition:
1. For 2-relation DB schemas: the attributes in both relations must functionally determine either those attributes that appear in only the first relation, or those that appear in only the second relation.
2. Once you have established a decomposition with the lossless join property, you can further decompose one of its tables without losing the property. So, to decompose and maintain lossless joins:
For each table in the DB that isn't in BCNF, find the functional dependency that is in violation (that is, one whose determinant is not a candidate key), and break the relation into two. One relation contains the A and B attributes (AB) from the functional dependency A -> B. The other contains the rest of the attributes together with A (i.e. A, C).
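The first property gives a simple test for a two-relation decomposition: compute the closure of the shared attributes and see whether it covers one side's private attributes. The sketch below assumes the supplier FDs s# -> city and city -> tax from the 3NF example; `closure` and `is_lossless` are hypothetical helper names.

```python
def closure(attrs, fds):
    """Return every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_lossless(r1, r2, fds):
    """Binary lossless-join test: the shared attributes must determine
    everything that appears only in r1 or everything that appears only in r2."""
    r1, r2 = set(r1), set(r2)
    reach = closure(r1 & r2, fds)
    return (r1 - r2) <= reach or (r2 - r1) <= reach


FDS = [({"s#"}, {"city"}), ({"city"}, {"tax"})]

# Splitting on city works: city determines tax.
print(is_lossless({"s#", "city"}, {"city", "tax"}, FDS))
# Splitting on tax does not: tax determines nothing else.
print(is_lossless({"s#", "tax"}, {"city", "tax"}, FDS))
```

The second decomposition would produce spurious tuples on rejoining, because two cities with the same tax rate become indistinguishable.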
You can't always perform the "ideal" decomposition that is in BCNF and preserves dependencies. You may only be able to get to 3NF. You must then decide whether to leave it there and build in protection against update anomalies, or to decompose even further, with the resulting loss of performance.
In terms of design, remember that it isn't a good idea to design a table that will get too many nulls. It is better to break it up into another table. However, this could also result in the problem of dangling tuples. The representation of a "thing" is broken up into 2 tables. To get the full information on the "thing", you join the tables together. However, if some tuples either have a null value on a join attribute or don't appear at all in one table, they won't appear in the result, unless you know in advance that you should do an outer join.
6. Fifth Normal Form (5NF) and Join Dependencies:
5NF, also known as project-join normal form (PJ/NF), is based on the idea of a lossless JOIN, or the lack of a join-projection anomaly. This problem occurs when you have an n-way relationship, where n > 2. A quick sufficient check for 5NF is to see if the table is in 3NF and all the candidate keys are single columns.
Join dependency
A join dependency (JD), denoted by $JD(R_1, R_2, \ldots, R_n)$, specified on relation schema $R$, specifies a constraint on the states $r$ of $R$. The constraint states that every legal state $r$ of $R$ should have a non-additive join decomposition into $R_1, R_2, \ldots, R_n$; that is, for every such $r$ we have
$$\ast\big(\pi_{R_1}(r), \pi_{R_2}(r), \ldots, \pi_{R_n}(r)\big) = r$$
Note that an MVD is a special case of a JD where $n = 2$.
The lossless-join property means that when we decompose a relation into two relations, we can rejoin the resulting relations to produce the original relation.
Consider a SUPPLY table. With no MVDs it is in 4NF, but it is not in 5NF if it has the join dependency JD(R1, R2, R3).
The SUPPLY relation with the join dependency is decomposed into three relations R1, R2, and R3 that are each in 5NF.
Applying a natural join to any two of these relations produces spurious tuples, but applying a natural join to all three together does not.
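The behaviour described above can be reproduced on a tiny instance. The four tuples below are invented for illustration (they are not the SUPPLY table from the original figure) and are chosen so that the three-way join dependency holds; the column order (supplier, part, project) is likewise an assumption.

```python
# Hypothetical SUPPLY instance as (supplier, part, project) tuples.
supply = {
    ("S1", "P1", "J2"),
    ("S1", "P2", "J1"),
    ("S2", "P1", "J1"),
    ("S1", "P1", "J1"),
}

# The three binary projections R1, R2, R3.
r1 = {(s, p) for s, p, j in supply}  # supplier-part
r2 = {(p, j) for s, p, j in supply}  # part-project
r3 = {(s, j) for s, p, j in supply}  # supplier-project

# Joining only R1 and R2 (on part) introduces a spurious tuple.
two_way = {(s, p, j) for s, p in r1 for p2, j in r2 if p == p2}
spurious = two_way - supply

# Joining the result with R3 as well removes it again.
three_way = {t for t in two_way if (t[0], t[2]) in r3}

print(spurious)
print(three_way == supply)
```

The spurious tuple says that S2 supplies P1 to J2, which no original row states; only the third projection rules it out.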
Basic Definitions in Normalization
What are update Anomalies?
The problems resulting from data redundancy in an un-normalized database table are collectively known as update anomalies. Any database insertion, deletion or modification that leaves the database in an inconsistent state is said to have caused an update anomaly. They are classified as
Insertion anomalies
Deletion anomalies
Modification anomalies
Insertion anomalies: To insert the details of a new member of staff located at branch B1 into the Tbl_Staff_Branch table shown above, we must enter the correct details of branch number B1 so that the branch details are consistent with the values for branch B1 in other rows.
To insert the details of a new branch that currently has no members of staff into the Tbl_Staff_Branch table, it is necessary to enter nulls for the staff details, which is not allowed because StaffID is the primary key. But if you normalize Tbl_Staff_Branch, which is in Second Normal Form (2NF), to Third Normal Form (3NF), you end up with Tbl_Staff and Tbl_Branch and you shouldn't have the problems mentioned above.
Deletion anomalies: If we delete a row from the Tbl_Staff_Branch table that represents the last member of staff located at that branch (e.g. a row with branch number BJ, B3 or B4), the details about that branch are also lost from the database.
Modification anomalies: Should we need to change the address of a particular branch in the Tbl_Staff_Branch table, we must update the rows of all staff located at that branch. If this modification is not carried out on all the relevant rows, the database will become inconsistent.
What's a spurious tuple?
A spurious tuple is, basically, a record in a database that gets created when two tables are joined badly. In database-ese, spurious tuples are created when two tables are joined on attributes that are neither primary keys nor foreign keys.
What is Functional Dependency? What are the different types of Functional Dependencies?
A functional dependency is a constraint between two sets of attributes from the database. It describes the relationship between attributes (columns) in a table. Functional dependencies are fundamental to the process of normalization.
For example, if A and B are attributes of a table, B is functionally dependent on A if each value of A is associated with exactly one value of B (so you can say "A functionally determines B").
Functional dependency between A and B: A -> B
The attribute or group of attributes on the left-hand side of the arrow of a functional dependency is referred to as the "determinant".
The attribute or group of attributes on the right-hand side of the arrow of a functional dependency is referred to as the "dependent".
A simple example would be that StaffID functionally determines Position in the above tables.
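The definition ("each value of A is associated with exactly one value of B") can be checked against the rows of a table. The staff rows below are invented for illustration, and `fd_holds` is a hypothetical helper; note that such a check can only show that a dependency is violated by the current data, since whether it truly holds depends on the meaning of the attributes.

```python
def fd_holds(rows, lhs, rhs):
    """True if every lhs value is paired with exactly one rhs value in rows."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for this key and returns it.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical staff rows; the values are made up.
staff = [
    {"StaffID": "S1", "Position": "Manager", "BranchID": "B1"},
    {"StaffID": "S2", "Position": "Clerk", "BranchID": "B1"},
    {"StaffID": "S3", "Position": "Clerk", "BranchID": "B2"},
]

print(fd_holds(staff, ["StaffID"], ["Position"]))  # StaffID -> Position
print(fd_holds(staff, ["Position"], ["StaffID"]))  # two clerks, so violated
```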
Functional dependencies can be classified as follows:
Full Functional Dependency: If A and B are attributes (columns) of a table, B is fully functionally dependent on A if B is functionally dependent on A, but not on any proper subset of A.
E.g. StaffID ----> BranchID
Partial Functional Dependency: If A and B are attributes of a table, B is partially dependent on A if there is some attribute that can be removed from A and yet the dependency still holds.
For example, consider the following functional dependency that exists in the Tbl_Staff table:
StaffID, Name -------> BranchID
BranchID is functionally dependent on a subset of A (StaffID, Name), namely StaffID.
Transitive Functional Dependency: A condition where A, B and C are attributes of a table such that if B is functionally dependent on A and C is functionally dependent on B, then C is transitively dependent on A via B.
For example, consider the following functional dependencies that exist in the Tbl_Staff_Branch table: StaffID -> BranchID and BranchID -> Br_Address.
So the StaffID attribute functionally determines Br_Address via the BranchID attribute.
What is the closure of a set of FDs?
If $F$ is a set of FDs on a relation schema $R$, then $F^+$, the closure of $F$, is the smallest set of FDs such that $F^+ \supseteq F$ and no FD that is not contained in $F^+$ can be derived from $F$ by using the inference axioms. If $R$ is not specified, it is assumed to contain all the attributes that appear in $F$.