
The concept of normalization, which rests on functional dependency, was
introduced by Professor Codd in the early 1970s when he defined the first three
normal forms (first, second and third normal forms). Normalization is used to avoid
or eliminate the three types of anomalies (insertion, deletion and update anomalies)
from which a database may suffer. These concepts will be clarified soon, but first
let us define the first three normal forms.
First Normal Form: A relation is in first normal form if all its attributes are
simple. In other words, none of the attributes of the relation is itself a relation.
Notice that relation here means a two-dimensional table.
 
Example-1. Assume the following relation:

Student (Sid: pk, Sname, Phone, Courses-taken)

where attribute Sid is the primary key, Sname is the student's name, Phone is the
student's phone number and Courses-taken is a table that contains the course-id,
course-description, credit hours and grade for each course taken by the student.
A more precise definition of table Course-taken is:

Course-taken (Course-id: pk, Course-description, Credit-hours, Grade)
According to the definition of first normal form, relation Student is not in first
normal form because one of its attributes, Courses-taken, is itself a table. To
clarify this further, assume the above tables contain the data shown below:

 

 
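Student
Sid   Sname     Phone      Courses-taken
100   John      487 2454   St-100-courses-taken
200   Smith     671 8120   St-200-courses-taken
300   Russell   871 2356   St-300-courses-taken

St-100-courses-taken
Course-id   Course-description      Credit-hours   Grade
IS380       Database Concepts       3              A
IS416       Unix Operating System   3              B

St-200-courses-taken
Course-id   Course-description      Credit-hours   Grade
IS380       Database Concepts       3              B
IS416       Unix Operating System   3              B
IS420       Data Net Work           3              C

St-300-courses-taken
Course-id   Course-description      Credit-hours   Grade
IS417       System Analysis         3              A
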
Insertion anomaly: means that some data cannot be inserted into the database.
For example, we cannot add a new course to the database of Example-1 unless we
insert a student who has taken that course.

Update anomaly: means that we have data redundancy in the database, and to make
any modification we have to change all copies of the redundant data or else the
database will contain incorrect data. For example, in our database the course
description "Database Concepts" for IS380 appears in both the St-100-courses-taken
and St-200-courses-taken tables. To change its description to "New Database
Concepts" we have to change it in all places. Indeed, one of the purposes of
normalization is to eliminate data redundancy in the database.

Deletion anomaly: means that deleting some data causes other information to be
lost. For example, if student Russell is deleted together with his
St-300-courses-taken table, we also lose the information that we had a course
called IS417 with description System Analysis.
Thus the Student table suffers from all three anomalies. It must therefore be
converted to first normal form. To do that, a new relation is created by combining
each row of Student with all rows of the corresponding course table taken by that
specific student. Following is the Student table in first normal form.

 
Student (Sid: pk1, Sname, Phone, Course-id: pk2, Course-description, Credit-hours, Grade)

Notice that the primary key of this table is a composite key made up of two parts:
Sid and Course-id. Note that pk1 following an attribute indicates that the attribute
is the first part of the primary key, and pk2 indicates that the attribute is the
second part of the primary key.


 

 
Sid   Sname     Phone      Course-id   Course-description      Credit-hours   Grade
100   John      487 2454   IS380       Database Concepts       3              A
100   John      487 2454   IS416       Unix Operating System   3              B
200   Smith     671 8120   IS380       Database Concepts       3              B
200   Smith     671 8120   IS416       Unix Operating System   3              B
200   Smith     671 8120   IS420       Data Net Work           3              C
300   Russell   871 2356   IS417       System Analysis         3              A
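
The construction described above (combining each student row with every row of its course table) might be sketched in Python as follows. This is only an illustration under assumptions: the dictionary layout, the key name Courses_taken and the function name to_first_normal_form are hypothetical and not part of the original example; the data values are the ones from Example-1.

```python
# Nested (non-1NF) rows from Example-1: each student carries a table of courses.
# The dictionary layout and the key name "Courses_taken" are illustrative assumptions.
nested_students = [
    {"Sid": 100, "Sname": "John", "Phone": "487 2454",
     "Courses_taken": [("IS380", "Database Concepts", 3, "A"),
                       ("IS416", "Unix Operating System", 3, "B")]},
    {"Sid": 200, "Sname": "Smith", "Phone": "671 8120",
     "Courses_taken": [("IS380", "Database Concepts", 3, "B"),
                       ("IS416", "Unix Operating System", 3, "B"),
                       ("IS420", "Data Net Work", 3, "C")]},
    {"Sid": 300, "Sname": "Russell", "Phone": "871 2356",
     "Courses_taken": [("IS417", "System Analysis", 3, "A")]},
]


def to_first_normal_form(nested_rows):
    """Combine each student row with every row of its nested course table."""
    flat = []
    for student in nested_rows:
        for course_id, description, credit_hours, grade in student["Courses_taken"]:
            flat.append((student["Sid"], student["Sname"], student["Phone"],
                         course_id, description, credit_hours, grade))
    return flat


flat_rows = to_first_normal_form(nested_students)
for row in flat_rows:
    print(row)
```

Every attribute of a flattened row is a simple value, and the pair (Sid, Course-id) is different in each of the six rows.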

Examination of the above Student relation reveals that Sid does not uniquely
identify a row (tuple) in the relation and hence cannot be the primary key. For
the same reason Course-id cannot be the primary key. However, the combination
of Sid and Course-id uniquely identifies a row in Student. Therefore
(Sid, Course-id) is the primary key of the above relation.
The primary key determines every attribute. For example, if you know both Sid and
Course-id for any student you will be able to retrieve Sname, Phone,
Course-description, Credit-hours and Grade, because these attributes are dependent
on the primary key. Figure-1 below is the graphical representation of the functional
dependency between the primary key and the attributes of the above relation.
[Figure-1: functional dependencies between the primary key (Sid, Course-id) and the attributes of the Student relation]

Note that the attribute to the right of the arrow is functionally dependent on the
attribute to the left of the arrow. Thus the combination (Sid, Course-id) is the
determinant (that determines other attributes) and attributes Sname, Phone,
Course-description, Credit-hours and Grade are dependent attributes.
Formally speaking, a determinant is an attribute or a group of attributes that
determines the value of other attributes. In addition to (Sid, Course-id) there are
two other determinants in the above Student relation. These are the Sid and
Course-id attributes. Note that Sid alone determines both Sname and Phone, and
attribute Course-id alone determines both the Credit-hours and Course-description
attributes.

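Attribute Grade is fully functionally dependent on the primary key
(Sid, Course-id) because both parts of the primary key are needed to determine
Grade. On the other hand, both the Sname and Phone attributes are not fully
functionally dependent on the primary key, because only a part of the primary key,
namely Sid, is needed to determine both Sname and Phone. Also, attributes
Credit-hours and Course-description are not fully functionally dependent on the
primary key because only Course-id is needed to determine their values.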
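
The dependencies discussed above can be checked against the sample rows with a short Python sketch. The helper names determines and fully_dependent and the underscore attribute names are assumptions made for this illustration. Note that sample data can only refute a functional dependency, never prove it: in this data every course happens to have 3 credit hours, so Sid also appears to determine Credit_hours. The check of "no single part of the key" is sufficient here because the key has exactly two parts.

```python
ATTRS = ("Sid", "Sname", "Phone", "Course_id",
         "Course_description", "Credit_hours", "Grade")
DATA = [
    (100, "John", "487 2454", "IS380", "Database Concepts", 3, "A"),
    (100, "John", "487 2454", "IS416", "Unix Operating System", 3, "B"),
    (200, "Smith", "671 8120", "IS380", "Database Concepts", 3, "B"),
    (200, "Smith", "671 8120", "IS416", "Unix Operating System", 3, "B"),
    (200, "Smith", "671 8120", "IS420", "Data Net Work", 3, "C"),
    (300, "Russell", "871 2356", "IS417", "System Analysis", 3, "A"),
]
rows = [dict(zip(ATTRS, values)) for values in DATA]


def determines(rows, lhs, rhs):
    """True if, in these rows, equal lhs values always go with equal rhs values."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for this key and returns it.
        if seen.setdefault(key, value) != value:
            return False
    return True


def fully_dependent(rows, key, attribute):
    """True if the whole key determines the attribute but no single part does."""
    return determines(rows, key, [attribute]) and not any(
        determines(rows, [part], [attribute]) for part in key)


key = ["Sid", "Course_id"]
print(determines(rows, ["Sid"], ["Sname", "Phone"]))  # Sid alone is enough
print(determines(rows, ["Course_id"], ["Course_description", "Credit_hours"]))
for attribute in ("Grade", "Sname", "Course_description"):
    print(attribute, fully_dependent(rows, key, attribute))
```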
The new relation Student still suffers from all three anomalies for the
following reasons:
1. The relation contains redundant data (note that Database Concepts as the
course description for IS380 appears in more than one place).
2. The relation contains information about two entities, Student and Course.
Following is a detailed description of the anomalies that relation Student
suffers from.
1. Insertion anomaly: We cannot add a new course such as IS247 with course
description Programming Techniques to the database unless we add a student
who takes the course.
2. Update anomaly: If we change the course description for IS380 from
Database Concepts to New Database Concepts we have to make the change in
more than one place or else the database will be inconsistent. In other words,
in some places the course description will be New Database Concepts and
in any place where we forgot to make the change the description will still be
Database Concepts.
3. Deletion anomaly: If student Russell is deleted from the database we also
lose the information that we had on course IS417 with description
System Analysis.
The above discussion indicates that having a single table Student for our
database causes problems (anomalies). Therefore we break the table into smaller
tables to get relations in a higher normal form. Before doing that, let us define
the second normal form.

 
Second Normal Form: A first normal form relation is in second normal form if
all its non-primary attributes are fully functionally dependent on the primary key.
Note that primary attributes are those attributes which are parts of the primary
key, and non-primary attributes do not participate in the primary key. In the
Student relation both Sid and Course-id are primary attributes because they are
components of the primary key. However, attributes Sname, Phone,
Course-description, Credit-hours and Grade are all non-primary attributes because
none of them is a component of the primary key.
To convert Student to second normal form relations we have to make all
non-primary attributes fully functionally dependent on the primary key. To do that
we need to project the Student table into two or more tables (that is, break it
down into two or more relations). However, projections may cause problems.
To avoid such problems it is important to keep attributes which are dependent on
each other in the same table when a relation is projected into smaller relations.
Following this principle, examination of Figure-1 indicates that we should
divide the Student relation into the following three relations:
PROJECT Student ON (Sid, Sname, Phone) creates a table; call it Student.
The relation Student will be
Student (Sid: pk, Sname, Phone)
PROJECT Student ON (Sid, Course-id, Grade) creates a table; call it
Student-grade. The relation Student-grade will be
Student-grade (Sid: pk1 fk Student, Course-id: pk2 fk Courses, Grade)
PROJECT Student ON (Course-id, Course-Description, Credit-hours) creates a
table; call it Courses. Following are these three relations and their
contents:


Student (Sid: pk, Sname, Phone)
Sid   Sname     Phone
100   John      487 2454
200   Smith     671 8120
300   Russell   871 2356

Courses (Course-id: pk, Course-Description, Credit-hours)
Course-id   Course-Description      Credit-hours
IS380       Database Concepts       3
IS416       Unix Operating System   3
IS420       Data Net Work           3
IS417       System Analysis         3

Student-grade (Sid: pk1 fk Student, Course-id: pk2 fk Courses, Grade)
Sid   Course-id   Grade
100   IS380       A
100   IS416       B
200   IS380       B
200   IS416       B
200   IS420       C
300   IS417       A
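
The three projections can be reproduced from the first normal form rows with a small Python sketch. The function name project and the underscore attribute names are assumptions made for this illustration; the important detail is that a relational projection drops duplicate rows, which is what removes the repeated student and course data.

```python
ATTRS = ("Sid", "Sname", "Phone", "Course_id",
         "Course_description", "Credit_hours", "Grade")
DATA = [
    (100, "John", "487 2454", "IS380", "Database Concepts", 3, "A"),
    (100, "John", "487 2454", "IS416", "Unix Operating System", 3, "B"),
    (200, "Smith", "671 8120", "IS380", "Database Concepts", 3, "B"),
    (200, "Smith", "671 8120", "IS416", "Unix Operating System", 3, "B"),
    (200, "Smith", "671 8120", "IS420", "Data Net Work", 3, "C"),
    (300, "Russell", "871 2356", "IS417", "System Analysis", 3, "A"),
]


def project(data, attrs, wanted):
    """Relational projection: keep only the wanted columns, drop duplicate rows."""
    positions = [attrs.index(name) for name in wanted]
    result = []
    for values in data:
        projected = tuple(values[p] for p in positions)
        if projected not in result:
            result.append(projected)
    return result


student = project(DATA, ATTRS, ("Sid", "Sname", "Phone"))
courses = project(DATA, ATTRS, ("Course_id", "Course_description", "Credit_hours"))
student_grade = project(DATA, ATTRS, ("Sid", "Course_id", "Grade"))

for name, relation in (("Student", student), ("Courses", courses),
                       ("Student-grade", student_grade)):
    print(name, len(relation), "rows")
```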

All three of these relations are in second normal form. Examination of these
relations shows that we have eliminated the redundancy in the database. Now relation
Student contains information related only to the entity student, relation Courses
contains information related only to the entity course, and relation Student-grade
contains information related to the relationship between these two entities.
Further, these three relations are free from all anomalies. Let us clarify this in
more detail.
Insertion anomaly: Now a new course with course-id IS247 and its
course-description can be inserted into the table Courses. Equally, we can add any
new student to the database by adding his or her id, name and phone to the Student
table. Therefore our database, which is made up of these three tables, does not
suffer from the insertion anomaly.
Update anomaly: Since the redundancy of the data was eliminated, no update anomaly
can occur. To change the course-description for IS380 only one change is needed,
in table Courses.
Deletion anomaly: The deletion of student Russell from the database is achieved
by deleting Russell's records from both the Student and Student-grade relations,
and this does not have any side effect because the course IS417 remains untouched
in the table Courses.
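
The deletion case can be illustrated with a short Python sketch that deletes student 300 (Russell) once from the single first normal form table and once from the three second normal form relations. The variable and function names are assumptions made for this illustration; the rows are the ones listed in the tables above.

```python
# Single 1NF table:
# (Sid, Sname, Phone, Course_id, Course_description, Credit_hours, Grade)
single_table = [
    (100, "John", "487 2454", "IS380", "Database Concepts", 3, "A"),
    (100, "John", "487 2454", "IS416", "Unix Operating System", 3, "B"),
    (200, "Smith", "671 8120", "IS380", "Database Concepts", 3, "B"),
    (200, "Smith", "671 8120", "IS416", "Unix Operating System", 3, "B"),
    (200, "Smith", "671 8120", "IS420", "Data Net Work", 3, "C"),
    (300, "Russell", "871 2356", "IS417", "System Analysis", 3, "A"),
]

# The same data as three 2NF relations.
student = {(100, "John", "487 2454"), (200, "Smith", "671 8120"),
           (300, "Russell", "871 2356")}
courses = {("IS380", "Database Concepts", 3), ("IS416", "Unix Operating System", 3),
           ("IS420", "Data Net Work", 3), ("IS417", "System Analysis", 3)}
student_grade = {(100, "IS380", "A"), (100, "IS416", "B"), (200, "IS380", "B"),
                 (200, "IS416", "B"), (200, "IS420", "C"), (300, "IS417", "A")}


def delete_student_single(rows, sid):
    """Delete a student from the single table: every row with that Sid goes."""
    return [r for r in rows if r[0] != sid]


def delete_student_split(student, student_grade, sid):
    """Delete a student from Student and Student-grade; Courses is not touched."""
    return ({s for s in student if s[0] != sid},
            {g for g in student_grade if g[0] != sid})


single_after = delete_student_single(single_table, 300)
courses_known_single = {r[3] for r in single_after}

student_after, grade_after = delete_student_split(student, student_grade, 300)
courses_known_split = {c[0] for c in courses}

print("IS417" in courses_known_single)  # False: the course was lost with Russell
print("IS417" in courses_known_split)   # True: Courses still lists IS417
```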

Third Normal Form: A second normal form relation is in third normal form if all
non-primary attributes (that is, attributes that are not parts of the primary key
or of any candidate key) are non-transitively dependent on the primary key (that
is, none of them depends on the primary key only through another non-primary
attribute).
Assume the relation:

STUDENT (Sid: pk, Activity, Fee)

Further, Activity ------------> Fee, that is, the Activity determines the Fee.

Sid   Activity   Fee
100   Swimming   100
200   Tennis     100
300   Golf       300
400   Swimming   100

Table STUDENT is in first normal form because all its attributes are simple.
STUDENT is also in second normal form because all its non-primary attributes are
fully functionally dependent on the primary key (Sid). Notice that a first normal
form relation with a non-composite (that is, simple) primary key will automatically
be in second normal form because all its non-primary attributes will be fully
functionally dependent on the primary key. STUDENT is not in third normal form,
however, because Fee depends on Sid only transitively, through Activity.
Table STUDENT suffers from all three anomalies: a new student cannot be added to
the database unless he/she takes an activity, and no activity can be inserted into
the database unless we get a student to take that activity. There is redundancy in
the table (see Swimming), therefore to change the fee for Swimming we must make
changes in more than one place, and that causes the update anomaly. If student 300
is deleted from the table we also lose the fact that we had a Golf activity with a
fee of 300. To overcome these anomalies the STUDENT table should be converted into
smaller tables. Consider the following three projections of the STUDENT relation:
PROJECT STUDENT on [Sid, Activity] and we get a relation; name it
STUD-AVT (Sid: pk, Activity), with the following data:

STUD-AVT
Sid   Activity
100   Swimming
200   Tennis
300   Golf
400   Swimming

PROJECT STUDENT on [Activity, Fee] and we get a relation; name it
AVT-Fee (Activity: pk, Fee), with the following data:

AVT-Fee
Activity   Fee
Swimming   100
Tennis     100
Golf       300

PROJECT STUDENT on [Sid, Fee] and we get a relation; name it
Sid-Fee (Sid: pk, Fee), with the following data:

Sid-Fee
Sid   Fee
100   100
200   100
300   300
400   100

The question is which pair of these projections should we choose? The answer is
to choose the pair STUD-AVT and AVT-Fee because the join of these two projections
on their common attribute Activity recreates the original STUDENT table. Such
projections are called non-loss projections. On the other hand, as shown below,
the join of projections Sid-Fee and AVT-Fee on their common attribute Fee
generates erroneous data that were not in the original STUDENT table, and such
projections are called loss projections. Following is the join of projections
Sid-Fee and AVT-Fee on their common attribute Fee:

Sid   Activity   Fee
100   Swimming   100
100   Tennis     100   *
200   Tennis     100
200   Swimming   100   *
300   Golf       300
400   Swimming   100
400   Tennis     100   *

The three rows marked with * were not in the original STUDENT table. Thus
we have erroneous data in the database.
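
The difference between the non-loss and the loss projections can be checked with a small Python sketch. The helper names project, natural_join and as_set are assumptions made for this illustration; the natural join here matches rows on whatever attribute names the two relations share (Activity in the first join, Fee in the second).

```python
student = [
    {"Sid": 100, "Activity": "Swimming", "Fee": 100},
    {"Sid": 200, "Activity": "Tennis", "Fee": 100},
    {"Sid": 300, "Activity": "Golf", "Fee": 300},
    {"Sid": 400, "Activity": "Swimming", "Fee": 100},
]


def project(rows, attrs):
    """Relational projection: keep the given attributes, drop duplicate rows."""
    result = []
    for row in rows:
        projected = {a: row[a] for a in attrs}
        if projected not in result:
            result.append(projected)
    return result


def natural_join(left, right):
    """Join two relations on all attribute names they have in common."""
    result = []
    for left_row in left:
        for right_row in right:
            common = left_row.keys() & right_row.keys()
            if all(left_row[a] == right_row[a] for a in common):
                merged = {**left_row, **right_row}
                if merged not in result:
                    result.append(merged)
    return result


def as_set(rows):
    """Turn rows into a set of (Sid, Activity, Fee) tuples for easy comparison."""
    return {(r["Sid"], r["Activity"], r["Fee"]) for r in rows}


stud_avt = project(student, ["Sid", "Activity"])
avt_fee = project(student, ["Activity", "Fee"])
sid_fee = project(student, ["Sid", "Fee"])

good = as_set(natural_join(stud_avt, avt_fee))   # joined on Activity
bad = as_set(natural_join(sid_fee, avt_fee))     # joined on Fee
spurious = bad - as_set(student)

print(good == as_set(student))
print(sorted(spurious))
```

The first join gives back exactly the four original rows, while the second produces seven rows, three of which were never in STUDENT.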
Both projections STUD-AVT and AVT-Fee are in third normal form and they do
not suffer from any anomalies.

Boyce-Codd Normal Form (BCNF): A relation is in BCNF if every determinant is a
candidate key. This is a stronger form of third normal form.
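
One way to test this condition mechanically is to compute, for each determinant, the set of attributes it determines (its closure) and check whether that set covers the whole relation. The Python sketch below is only an illustration: the function names closure and bcnf_violations are assumptions, the dependencies Sid -> Activity and Activity -> Fee are the ones stated for the STUDENT example, and the check tests whether each determinant is a superkey (it does not additionally test that the key is minimal).

```python
def closure(attributes, fds):
    """Attribute closure: everything determined by `attributes` under the FDs."""
    result = set(attributes)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(all_attributes, fds):
    """Return determinants of non-trivial FDs that do not determine every attribute."""
    return [lhs for lhs, rhs in fds
            if not set(rhs) <= set(lhs)
            and closure(lhs, fds) != set(all_attributes)]


# STUDENT (Sid, Activity, Fee) with Sid -> Activity and Activity -> Fee.
student_fds = [(("Sid",), ("Activity",)), (("Activity",), ("Fee",))]
print(bcnf_violations(("Sid", "Activity", "Fee"), student_fds))  # [('Activity',)]

# The two projections chosen in the text.
print(bcnf_violations(("Sid", "Activity"), [(("Sid",), ("Activity",))]))  # []
print(bcnf_violations(("Activity", "Fee"), [(("Activity",), ("Fee",))]))  # []
```

Under these assumed dependencies Activity is a determinant that is not a key of STUDENT, while each determinant of STUD-AVT and AVT-Fee is the key of its relation.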

Fourth Normal Form: A Boyce-Codd normal form relation is in fourth normal
form if there is no multivalued dependency in the relation, or there are
multivalued dependencies but the attributes which are multivalue dependent on a
specific attribute are dependent on each other. This is best discussed through
mathematical notation. Assume the following relation:

R (a: pk1, b: pk2, c: pk3)

Recall that a relation is in BCNF if all its determinants are candidate keys;
in other words, each determinant can be used as a primary key. Because relation R
has only one determinant (a, b, c), which is the composite primary key, and since
the primary key is a candidate key, R is in BCNF.

Now R may or may not be in fourth normal form.

1. If R contains no multivalued dependencies then R will be in fourth
normal form.

2. Assume R has the following two multivalued dependencies:

a ->> b and a ->> c

In this case R will be in fourth normal form if b and c are dependent on each
other. However, if b and c are independent of each other then R is not in fourth
normal form and the relation has to be projected into two non-loss projections,
one on (a, b) and one on (a, c). These non-loss projections will be in fourth
normal form.
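
The case of independent b and c can be made concrete with a small Python sketch. Everything in it is hypothetical: the relation r, its values and the function names are invented for illustration and are not taken from the original text. The check used is that, for every value of a, the (b, c) pairs present are exactly every b value combined with every c value; when that holds, projecting on (a, b) and (a, c) and joining back on a returns the original rows.

```python
from itertools import product


def independent_mvds_hold(rows):
    """True if a ->> b and a ->> c hold with b and c independent of each other."""
    groups = {}
    for a, b, c in rows:
        groups.setdefault(a, set()).add((b, c))
    for pairs in groups.values():
        b_values = {b for b, _ in pairs}
        c_values = {c for _, c in pairs}
        # Independence means every b value appears with every c value.
        if pairs != set(product(b_values, c_values)):
            return False
    return True


def project_ab_ac(rows):
    """Project R(a, b, c) on (a, b) and on (a, c)."""
    return {(a, b) for a, b, _ in rows}, {(a, c) for a, _, c in rows}


def join_on_a(r1, r2):
    """Join the two projections back together on their common attribute a."""
    return {(a, b, c) for a, b in r1 for a2, c in r2 if a == a2}


# Hypothetical relation R(a, b, c): a = 1 has two b values and two c values.
r = {(1, "x", "p"), (1, "x", "q"), (1, "y", "p"), (1, "y", "q"), (2, "z", "r")}
r1, r2 = project_ab_ac(r)

print(independent_mvds_hold(r))   # True
print(join_on_a(r1, r2) == r)     # True: the projections are non-loss
print(len(r), len(r1), len(r2))   # 5 3 3
```

The projections store each fact about a only once (three rows each instead of five), which is the redundancy that fourth normal form removes.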

 


!

 

"/-

 

  3- -
 '  (

 ,



 

"
1" "


'
 

"/-
/ ,



 

 

  3- -
 '  (
 223- -
    22 
3- -
  




+ 
 


 
,
- -
 
,
 
4/
,
"
 
"- -
 
/
  


  3- -
  
%% - +
-
%%    $
%% c
  -
'%% -  -
'%% # --

+ 
 

%%/

5



 -# /

 

 c
+
" "  
 ""

" 


(
 

"/-
/ ,



 

 

  3- -
 '  (
 223- -
    22 
3- -
  





  3- -
  
%% - +
-
%%    $
%% - 
%%    +
-
'%% # --
  


+
" "  ""
"


"


)
 +
/'%%-! -/
,

/

/'%%# - '%%---


/


#
/#



#
/#
"/

  3- -
  
%% - +
-
%%    $
%% - 
%%    +
-
'%% # --
'%% - ! -
'%% # ! -
'%% - --



 )"
 

%% 
  /
,




//
%%  $ %%-$
/

 #
/#




* 
 )"
 

'%%-
 "-- -/

,

-


 


+

 6


"/-/ 6
/

""

 

73- -
  3- -
 '

  3- -

%% -
%%   
'%% #

 

73- -
    '

   
%% +
-
%% $
'%% --