
UNIT-VI

Chapter 10

Database Design: Functional
Dependencies and Normalization
for Relational Databases

10.1 Informal Design Guidelines for
Relational Schemas
Informal measures of quality of relational
schemas:
¦ Semantics of the attributes
¦ Reducing the redundant information in tuples
¦ Reducing the NULL values in tuples
¦ Disallowing the possibility of generating
spurious tuples

1. Imparting Clear Semantics to
Attributes in Relations
¦ Semantics refers to the interpretation of the attribute
values in a tuple
¦ Meaning of the Employee table: each tuple
represents an employee, with values for Ename, SSN, Bdate, Address, and Dno
¦ The Department and Project tables are also
straightforward
¦ The semantics of Dept_locations and Works_on are
more complex
¦ Dept_locations represents a multivalued attribute of Department, and
Works_on represents an M:N relationship between Employee and
Project
Employee(Ename, SSN, Bdate, Address, Dno)
Department(Dname, Dnumber, Dmgr_SSN)
Dept_Loc(Dnumber, Dlocation)
Project(Pname, Pnumber, Plocation, Dnum)
Works_on(SSN, Pnumber, Hours)

Guideline 1
¦ Design a relational schema so that it is easy to
explain its meaning
¦ Do not combine attributes from multiple entity types
and relationship types into a single relation
¦ Emp_dept mixes attributes of employees and
departments
¦ Emp_proj mixes attributes of employees and
projects
¦ Nothing is logically wrong with these relations, but they are
considered poor designs

Emp_dept(Ename, SSN, Bdate, Add, Dnum, Dname, Dmgr_SSN)
Emp_proj(SSN, Pnum, Hours, Ename, Pname, Plocation)

2. Redundant Information in Tuples
and Update Anomalies
¦ Minimize redundancy so that storage space
is not wasted
¦ In emp_dept, the attribute values pertaining to a
particular department are repeated for every
employee who works for that department
¦ In contrast, each department's information appears only
once in the department relation
¦ This redundancy may lead to insertion, deletion, and update
anomalies

Insert anomaly:
¦ To enter a new tuple for an employee who works in
department 5, we must also enter the attribute values of
department 5, and they must agree with the other tuples for
that department; otherwise the data becomes inconsistent
¦ To enter a new department that has
no employees yet, we would have to place NULLs in the
employee attributes, but SSN is the primary key and
cannot be NULL
Delete anomaly:
¦ If we delete the employee tuple that represents the last
employee of a department, the information about that
department is lost from the database


Modification anomaly:
¦ If we change an attribute value of a particular
department, we have to update the tuples
of all employees who work in that
department
¦ If we fail to update some of them, the database becomes
inconsistent
Guideline 2
¦ Design the base relation schemas so that
no insertion, deletion, or modification
anomalies are present
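
The modification and deletion anomalies can be seen on a tiny in-memory copy of emp_dept. This is only a sketch: the rows and the new department name are made-up sample values, not data from the slides.

```python
# Hypothetical emp_dept state: department data is repeated per employee.
emp_dept = [
    {"SSN": "1", "Ename": "Smith", "Dnum": 5, "Dname": "Research"},
    {"SSN": "2", "Ename": "Wong", "Dnum": 5, "Dname": "Research"},
]

# Modification anomaly: the department is renamed in only one tuple.
emp_dept[0]["Dname"] = "R&D"
names_for_dept5 = {t["Dname"] for t in emp_dept if t["Dnum"] == 5}
inconsistent = len(names_for_dept5) > 1

# Deletion anomaly: deleting the last employees of department 5
# also deletes everything we knew about department 5.
remaining = [t for t in emp_dept if t["Dnum"] != 5]
dept5_known = any(t["Dnum"] == 5 for t in remaining)

print(inconsistent, dept5_known)
```

With a separate department relation, the name would be stored once, so neither problem could arise.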

3. NULL Values in Tuples
¦ NULLs waste storage space
¦ It is unclear how to account for them in aggregate
functions
¦ A NULL is ambiguous: the value may be inapplicable, unknown, or absent

Guideline 3
¦ As far as possible, avoid placing attributes in a base
relation whose values may frequently be NULL
¦ If NULLs are unavoidable, make sure
that they apply in exceptional cases
only and not to the majority of
tuples

4. Generation of Spurious Tuples
¦ Suppose emp_proj1 and emp_locs are used as
the base relations instead of emp_proj
¦ We cannot recover the information that was originally
in emp_proj by joining emp_proj1 and
emp_locs: the join produces additional, spurious tuples
¦ This happens because the join attribute Ploc
is neither a primary key nor a foreign
key in either relation

Emp_proj(SSN, Pnum, Hours, Ename, Pname, Ploc)
Emp_locs(Ename, Ploc)
Emp_proj1(SSN, Pnum, Hours, Pname, Ploc)

Guideline 4
¦ Design relations so that they can be joined
with equality conditions on primary keys and foreign keys in a way
that guarantees that no spurious tuples are generated
¦ Avoid relations that contain matching
attributes that are not primary key or
foreign key combinations, because joining on these attributes may
produce spurious tuples
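
The spurious-tuple problem can be reproduced with a small sketch. The two sample rows are hypothetical (they are not from the slides), and `natural_join` and `project` are illustrative helpers written for this example, not library functions.

```python
def natural_join(r, s):
    """Natural join of two non-empty relations given as lists of dicts."""
    common = set(r[0]) & set(s[0])
    return [{**t1, **t2} for t1 in r for t2 in s
            if all(t1[a] == t2[a] for a in common)]

def project(r, attrs):
    """Projection onto attrs with duplicate elimination."""
    result = []
    for t in r:
        row = {a: t[a] for a in attrs}
        if row not in result:
            result.append(row)
    return result

# Hypothetical emp_proj state: two employees on different projects
# that happen to be in the same location.
emp_proj = [
    {"SSN": "1", "Pnum": 10, "Hours": 20, "Ename": "Smith",
     "Pname": "X", "Ploc": "Houston"},
    {"SSN": "2", "Pnum": 20, "Hours": 15, "Ename": "Wong",
     "Pname": "Y", "Ploc": "Houston"},
]

emp_locs = project(emp_proj, ["Ename", "Ploc"])
emp_proj1 = project(emp_proj, ["SSN", "Pnum", "Hours", "Pname", "Ploc"])

# The only common attribute is Ploc, which is not a key of either relation.
rejoined = natural_join(emp_proj1, emp_locs)
spurious = [t for t in rejoined if t not in emp_proj]
print(len(emp_proj), len(rejoined), len(spurious))  # 2 4 2
```

The join returns the two original tuples plus two tuples that pair each employee's work record with the other employee's name.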


Functional Dependencies
¦ A functional dependency, denoted by X → Y,
between two sets of attributes X and Y that are
subsets of R specifies a constraint on the
possible tuples that can form a relation state r of R
¦ The constraint is that for any two tuples t1 and
t2 in r that have t1[X] = t2[X], we must also have t1[Y] = t2[Y]
¦ The values of the X component determine the values
of the Y component; equivalently, Y is functionally dependent
on X
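
The definition translates directly into a check on one relation state. The sketch below assumes tuples are stored as Python dicts; `fd_holds` and the sample rows are hypothetical names and values chosen for illustration.

```python
def fd_holds(rows, X, Y):
    """True if no two tuples in rows agree on X but differ on Y."""
    seen = {}
    for t in rows:
        x_val = tuple(t[a] for a in X)
        y_val = tuple(t[a] for a in Y)
        # Remember the first Y value seen for this X value and compare.
        if seen.setdefault(x_val, y_val) != y_val:
            return False
    return True

# Hypothetical relation state.
rows = [
    {"SSN": "1", "Ename": "Smith", "Pnum": 10, "Hours": 20},
    {"SSN": "1", "Ename": "Smith", "Pnum": 20, "Hours": 5},
    {"SSN": "2", "Ename": "Wong", "Pnum": 10, "Hours": 12},
]

print(fd_holds(rows, ["SSN"], ["Ename"]))          # True
print(fd_holds(rows, ["Pnum"], ["Hours"]))         # False
print(fd_holds(rows, ["SSN", "Pnum"], ["Hours"]))  # True
```

A `True` result only says that this state does not violate the dependency. It does not establish that the FD holds for every legal state, whereas a `False` result does rule the FD out.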


¦ Consider the relation
emp_proj(SSN, Ename, Pnum, Pname,
Ploc, Hours)
¦ From the semantics of the attributes:
¦ SSN → Ename
¦ Pnum → {Pname, Ploc}
¦ {SSN, Pnum} → Hours


¦ An FD cannot be inferred from
a particular relation state
¦ An FD must be defined explicitly by someone
who knows the semantics of the attributes
of the relation
¦ E.g., Course → Teacher may appear to hold in one state,
but that does not show it is true for all legal states
¦ A single counterexample is enough to disprove an FD:
if a teacher teaches two courses, then
Teacher → Course does not hold


Inference Rules for FDs
¦ The set of all dependencies that includes F as
well as all dependencies that can be inferred
from F is called the closure of F, denoted by F+
¦ E.g., if Deptno → Mgrssn
and Mgrssn → Mgrphone, then
Deptno → Mgrphone
¦ To infer dependencies systematically from a given set of dependencies,
we use a set of inference rules
¦ IR1 (reflexive rule): if X ⊇ Y, then X → Y
¦ IR2 (augmentation rule): if X → Y, then XZ → YZ
¦ IR2 holds because t1[XZ] = t2[XZ] implies t1[X] = t2[X] and t1[Z] = t2[Z];
X → Y then gives t1[Y] = t2[Y], so t1[YZ] = t2[YZ]
¦ IR3 (transitive rule): if X → Y and Y → Z, then X → Z
¦ IR4 (decomposition or projection rule): if X → YZ, then X → Y
¦ IR5 (union or additive rule): if X → Y and X → Z, then X → YZ
¦ IR6 (pseudotransitive rule): if X → Y and WY → Z, then WX → Z
¦ IR1 generates dependencies that are always true;
such dependencies are called trivial, and all others
are called nontrivial

¦ Proofs
¦ IR4: X → YZ (given) and YZ → Y (IR1, since YZ ⊇ Y), so X → Y (IR3)
¦ IR5: X → Y and X → Z (given); then X → XY (IR2, augmenting X → Y with X)
and XY → YZ (IR2, augmenting X → Z with Y), so X → YZ (IR3)
¦ IR6: X → Y and WY → Z (given); then WX → WY (IR2) and
WX → Z (IR3)
¦ A set of functional dependencies F is said to
cover another set of functional dependencies E if every
dependency in E can be inferred from F
¦ Two sets E and F are equivalent if E+ = F+
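
Computing F+ by applying the inference rules exhaustively is expensive. A common shortcut, which is the standard textbook attribute-closure algorithm rather than something spelled out in these slides, is to compute the closure of an attribute set X under F and use it to test whether X → Y can be inferred. The sketch below assumes FDs are represented as (lhs, rhs) pairs of Python sets; the function names are illustrative.

```python
def closure(attrs, fds):
    """Attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def covers(F, E):
    """F covers E if every X -> Y in E can be inferred from F."""
    return all(set(rhs) <= closure(lhs, F) for lhs, rhs in E)

def equivalent(E, F):
    """E and F are equivalent if each covers the other."""
    return covers(E, F) and covers(F, E)

F = [({"Deptno"}, {"Mgrssn"}), ({"Mgrssn"}, {"Mgrphone"})]
E = F + [({"Deptno"}, {"Mgrphone"})]  # adds the inferred dependency

print(sorted(closure({"Deptno"}, F)))  # ['Deptno', 'Mgrphone', 'Mgrssn']
print(equivalent(E, F))                # True
```

Adding Deptno → Mgrphone to F does not change the closure, which is why E and F come out equivalent.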


Normal Forms Based on Primary
Keys
¦ The normalization process, first proposed by
Codd (1972), takes a relation schema through a
series of tests to certify whether it
satisfies a certain normal form
¦ The process proceeds in a top-down fashion
¦ Codd proposed three normal forms, all
based on a single
analytical tool: functional dependencies

¦ Later, a fourth normal form (4NF) and a fifth normal form (5NF) were
proposed, based on multivalued and join
dependencies respectively
¦ This top-down approach of testing and decomposing relations is called
relational design by analysis
¦ Normalization is the process of analyzing the
given relation schemas based on their FDs and primary
keys to achieve:
1) minimizing redundancy
2) minimizing insertion, deletion, and update
anomalies
¦ Denormalization is the process of storing the join of higher normal
form relations as a base relation, which is in a
lower normal form

¦ Normalization through decomposition
should also confirm two additional properties:
1) Lossless join or nonadditive join property:
guarantees that no spurious tuples are generated
2) Dependency preservation property: ensures
that each FD is represented in some individual
relation after decomposition
¦ Superkey: a superkey of a relation schema
R = {A1, A2, ..., An} is a set of attributes S that is a
subset of R with the property that no two
tuples t1 and t2 in any legal relation state will have
t1[S] = t2[S]


¦ A key K is a superkey with the additional property that
removal of any attribute from K causes K to no longer
be a superkey
¦ If a relation has more than one key, each
is called a candidate key
¦ One of the candidate keys is designated the primary key;
the others are called secondary keys
¦ An attribute of a relation is called prime if it is a
member of some candidate key; all other attributes are
called nonprime
¦ A unique key is an attribute that uniquely identifies
each row in a table but may allow NULL values


¦ A primary key is an attribute or combination of
attributes that uniquely identifies each row in a
table and must not be NULL
¦ A foreign key is an attribute or combination of
attributes whose values match a primary key in
another table
¦ A composite or compound key consists of two or
more attributes
¦ A candidate key is any minimal key of a table
that is eligible to become the primary key
¦ An alternate or secondary key is a candidate key
that was not chosen as the primary key
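
Given a set of FDs, superkeys and candidate keys can be found mechanically. The following is a brute-force sketch that tries attribute subsets in order of increasing size, using the FDs stated earlier for emp_proj. The helper names are hypothetical, and the exhaustive search is only practical for small schemas.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def candidate_keys(attributes, fds):
    """All minimal attribute sets whose closure is the whole schema."""
    attributes = set(attributes)
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            s = set(combo)
            if any(k <= s for k in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(s, fds) == attributes:  # s is a superkey
                keys.append(s)
    return keys

emp_proj = {"SSN", "Ename", "Pnum", "Pname", "Ploc", "Hours"}
fds = [
    ({"SSN"}, {"Ename"}),
    ({"Pnum"}, {"Pname", "Ploc"}),
    ({"SSN", "Pnum"}, {"Hours"}),
]

keys = candidate_keys(emp_proj, fds)
prime = set().union(*keys)
print(keys, sorted(prime))
```

For these FDs the only candidate key is {SSN, Pnum}, so SSN and Pnum are the prime attributes and the rest are nonprime.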


First Normal Form (1NF)
¦ 1NF states that the domain of an attribute must
include only atomic values and that the value of
any attribute in a tuple must be a single value
from the domain of that attribute
¦ 1NF disallows a set of values, a tuple of
values, or a combination of both as an attribute value
¦ The department relation shown below is not in 1NF
because Dloc is not atomic
¦ Dnum → Dloc holds if each set of locations is treated as a single
value, because Dnum is the primary key.
There are three main techniques to bring the relation into
1NF:


1) Remove Dloc and place it in a separate relation
Dept_locations whose primary key is the combination {Dnum, Dloc};
this yields two 1NF relations
2) Expand the key so that there is a separate tuple in
dept for each location; the primary key becomes the combination
{Dnum, Dloc}
3) If the maximum number of values in Dloc is known, say three, replace
Dloc with three atomic attributes and place NULL where a department has
fewer locations; this introduces NULL values and a spurious ordering
among the locations

Department (not in 1NF)
Dname         Dnum  Dmgrssn  Dloc
Research      5     3334445  {Bellaire, Sugarland, Houston}
Adminis       4     9876543  {Stafford}
Headquarters  1     8886677  {Houston}

Department (technique 1)
Dname         Dnum  Dmgrssn
Research      5     3334445
Adminis       4     9876543
Headquarters  1     8886677

Dept_locations (technique 1)
Dnum  Dloc
5     Bellaire
5     Sugarland
5     Houston
4     Stafford
1     Houston

Department (technique 2)
Dname         Dnum  Dmgrssn  Dloc
Research      5     3334445  Bellaire
Research      5     3334445  Sugarland
Research      5     3334445  Houston
Adminis       4     9876543  Stafford
Headquarters  1     8886677  Houston
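
The first two techniques can be sketched in a few lines. The example assumes the non-1NF department relation is held as a list of dicts with Dloc stored as a Python set; `split_multivalued` and `expand_key` are hypothetical helper names, not part of any library.

```python
def split_multivalued(rows, key, attr):
    """Technique 1: move a set-valued attribute into its own relation."""
    base = [{a: v for a, v in t.items() if a != attr} for t in rows]
    detail = [{key: t[key], attr: v} for t in rows for v in sorted(t[attr])]
    return base, detail

def expand_key(rows, attr):
    """Technique 2: one tuple per value of the set-valued attribute."""
    return [{**t, attr: v} for t in rows for v in sorted(t[attr])]

dept = [
    {"Dname": "Research", "Dnum": 5, "Dmgrssn": "3334445",
     "Dloc": {"Bellaire", "Sugarland", "Houston"}},
    {"Dname": "Adminis", "Dnum": 4, "Dmgrssn": "9876543",
     "Dloc": {"Stafford"}},
    {"Dname": "Headquarters", "Dnum": 1, "Dmgrssn": "8886677",
     "Dloc": {"Houston"}},
]

department, dept_locations = split_multivalued(dept, "Dnum", "Dloc")
expanded = expand_key(dept, "Dloc")
print(len(department), len(dept_locations), len(expanded))  # 3 5 5
```

Technique 1 keeps each department's data in one tuple, while technique 2 repeats Dname and Dmgrssn once per location, which reintroduces redundancy.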

SSN   Ename           Pnumber  Hours
1234  Smith, John     1, 2     3, 43
5678  Narayan, Joyce  3        40
9123  Ramesh, Rakesh  4, 5     0, 10
4567  Wong, Franklin  6, 7     35, 10
Second Normal Form (2NF)
¦ 2NF is based on the concept of full functional dependency. A relation
schema R is in 2NF
if it is in 1NF and every nonprime attribute A in R is fully
functionally dependent on the primary key of R
¦ X → Y is a full functional dependency if removal of any
attribute A from X means that the dependency does not
hold any more
¦ X → Y is a partial dependency if some attribute A belonging
to X can be removed from X and the dependency still
holds
¦ SSN → Ename is an example of a full functional dependency in the
emp_dept relation
¦ {SSN, Pnum} → Ename is a partial dependency in the
emp_proj relation, because SSN → Ename also holds
¦ emp_proj, whose primary key is the combination {SSN, Pnumber},
is in 1NF but not in 2NF


¦ These are the given FDs:
¦ {SSN, Pnumber} → Hours
¦ {SSN, Pnumber} → Ename
¦ {SSN, Pnumber} → {Pname, Plocations}
¦ All three hold, but Ename,
Pname, and Plocations are only partially dependent on the key
because SSN → Ename
and Pnumber → {Pname, Plocations}

Emp_Proj(SSN, Pnumber, Hours, Ename, Pname, Plocations)
EP1(SSN, Pnumber, Hours)
EP2(SSN, Ename)
EP3(Pnumber, Pname, Plocations)
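
The 2NF test against a primary key can be sketched as a search for nonprime attributes that are determined by a proper subset of the key. The function names below are hypothetical, and the sketch handles a single key only, matching the primary-key-based definition.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def partial_dependencies(key, fds, nonprime):
    """(subset of key, attribute) pairs that violate 2NF."""
    found = []
    key = sorted(key)
    for size in range(1, len(key)):  # proper, non-empty subsets only
        for subset in combinations(key, size):
            determined = (closure(subset, fds) - set(subset)) & set(nonprime)
            for a in sorted(determined):
                found.append((set(subset), a))
    return found

fds = [
    ({"SSN", "Pnumber"}, {"Hours"}),
    ({"SSN"}, {"Ename"}),
    ({"Pnumber"}, {"Pname", "Plocations"}),
]
nonprime = {"Hours", "Ename", "Pname", "Plocations"}

violations = partial_dependencies({"SSN", "Pnumber"}, fds, nonprime)
print(violations)
```

The reported pairs correspond to EP2 (SSN determines Ename) and EP3 (Pnumber determines Pname and Plocations). Hours is not reported because it depends on the full key, which is why it stays in EP1.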


Third Normal Form (3NF)
¦ 3NF is based on the concept of transitive
dependency
¦ A relation R is in 3NF if it satisfies 2NF and no
nonprime attribute of R is transitively dependent
on the primary key
¦ An FD X → Y in relation R is a transitive dependency if there is a
set of attributes Z that is neither a candidate key nor a
subset of any key of R, and both X → Z and Z → Y hold
¦ In relation emp_dept, SSN → Dmgrssn is a
transitive dependency because SSN → Dnum and
Dnum → Dmgrssn hold, and Dnum is neither a key nor a subset of the
key of emp_dept
¦ Because of this transitive dependency
of Dmgrssn on SSN via Dnum, the
relation is not in 3NF
¦ So we decompose the relation into two
tables that are free of the transitive dependency,
such that a natural join of the two recovers the
original emp_dept relation
¦ Partial dependencies need not be removed before transitive
dependencies, but
by convention 2NF removes partial
dependencies and 3NF removes transitive
dependencies

Emp_Dept(Ename, SSN, Bdate, Address, Dnum, Dname, Dmgrssn)
ED1(Ename, SSN, Bdate, Address, Dnum)
ED2(Dnum, Dname, Dmgrssn)

¦ Def 2 of 2NF: A relation schema R is in 2NF if it is
in 1NF and every nonprime attribute A in R is
not partially dependent on any key of R
¦ Def 2 of 3NF: A relation R is in 3NF if it is in 2NF
and, whenever a nontrivial functional dependency
X → A holds in R, either X is a superkey or A is a
prime attribute of R
¦ Def 3 of 3NF: A relation R is in 3NF if every
nonprime attribute meets both conditions:
1) It is fully functionally dependent on every key of
R
2) It is nontransitively dependent on every key of
R
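
The superkey-or-prime condition in the second definition of 3NF lends itself to a direct check. The sketch below assumes the candidate keys are supplied by the caller and tests only the FDs that are listed for the relation. The FD set for emp_dept is an assumption consistent with the example in these slides (SSN determines the employee attributes and Dnum; Dnum determines Dname and Dmgrssn).

```python
def closure(attrs, fds):
    """Attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def violations_3nf(attributes, fds, candidate_keys):
    """FDs X -> A where X is not a superkey and A is not prime."""
    attributes = set(attributes)
    prime = set().union(*candidate_keys)
    bad = []
    for lhs, rhs in fds:
        is_superkey = closure(lhs, fds) == attributes
        for a in sorted(set(rhs) - set(lhs)):  # ignore trivial parts
            if not is_superkey and a not in prime:
                bad.append((set(lhs), a))
    return bad

emp_dept = {"Ename", "SSN", "Bdate", "Address", "Dnum", "Dname", "Dmgrssn"}
fd_emp = ({"SSN"}, {"Ename", "Bdate", "Address", "Dnum"})
fd_dept = ({"Dnum"}, {"Dname", "Dmgrssn"})

bad = violations_3nf(emp_dept, [fd_emp, fd_dept], [{"SSN"}])
ed1 = violations_3nf({"Ename", "SSN", "Bdate", "Address", "Dnum"},
                     [fd_emp], [{"SSN"}])
ed2 = violations_3nf({"Dnum", "Dname", "Dmgrssn"}, [fd_dept], [{"Dnum"}])
print(bad, ed1, ed2)
```

emp_dept fails because Dnum is not a superkey and Dname and Dmgrssn are nonprime, while both ED1 and ED2 pass with no violations.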
Example: 1NF
lots(Prop_id, Country_name, Lotno, Area, Price, Taxrate)
FDs on lots: FD1, FD2, FD3, FD4

Example: 2NF
lots1(Prop_id, Country_name, Lotno, Area, Price)
lots2(Country_name, Taxrate)


¦ Prop_id is the primary key and
{Country_name, Lotno} is a candidate key
¦ FD1: Prop_id
→ {Country_name, Lotno, Area, Price, Taxrate}
¦ FD2: {Country_name, Lotno}
→ {Prop_id, Area, Price, Taxrate}
¦ FD3: Country_name → Taxrate, so Taxrate is partially
dependent on the candidate key {Country_name, Lotno}
¦ FD4: Area → Price
¦ (FD5: Area → Country_name)
¦ (Consider FD5 only for BCNF)
¦ Because of FD3, lots is not in 2NF, so we
decompose lots into lots1 and lots2

Example: 3NF
lots1(Prop_id, Country_name, Lotno, Area, Price)
lots1A(Prop_id, Country_name, Lotno, Area)
lots1B(Area, Price)

¦ FD4 violates 3NF because Area is not a superkey and
Price is not a prime attribute
¦ To reach 3NF, we decompose relation lots1 into
lots1A and lots1B, removing the transitive
dependency
¦ Price is transitively dependent on each of the
candidate keys via Area
¦ lots1A is in 3NF but not in BCNF because of
FD5: Area is not a superkey, which violates BCNF, although
Country_name is a prime attribute, which satisfies 3NF. So we
decompose relation lots1A into lots1Ax and
lots1Ay

Boyce-Codd Normal Form (BCNF)
¦ BCNF was proposed as a simpler form of 3NF, but it turned out
to be stricter than 3NF
¦ A relation R is in BCNF if, whenever a nontrivial functional
dependency X → A holds in R, X is a superkey
¦ In relation lots1A, Area → Country_name holds; the relation is in 3NF
because Country_name is a prime attribute (part of a
candidate key), but it is not in BCNF because Area is not a superkey
¦ E.g., in relation Teach {Student, Course, Instructor},
where {Student, Course} is the primary key:
¦ {Student, Course} → Instructor
¦ Instructor → Course
¦ Teach is in 3NF but not in BCNF, so we decompose it into
{Instructor, Student} and {Instructor, Course}
Teach(Student, Course, Instructor)
Teach1(Student, Instructor)
Teach2(Instructor, Course)
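
The BCNF condition and the choice of decomposition for Teach can both be checked with attribute closures. The sketch below uses hypothetical helper names. `lossless_binary` applies the standard textbook test for a two-way decomposition (the common attributes must determine one side's remaining attributes), which is not stated in these slides.

```python
def closure(attrs, fds):
    """Attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def bcnf_violations(attributes, fds):
    """Nontrivial listed FDs whose left-hand side is not a superkey."""
    attributes = set(attributes)
    return [(set(lhs), set(rhs) - set(lhs)) for lhs, rhs in fds
            if set(rhs) - set(lhs) and closure(lhs, fds) != attributes]

def lossless_binary(r1, r2, fds):
    """True if decomposing into r1 and r2 has the nonadditive join property."""
    c = closure(set(r1) & set(r2), fds)
    return (set(r1) - set(r2)) <= c or (set(r2) - set(r1)) <= c

teach = {"Student", "Course", "Instructor"}
fds = [
    ({"Student", "Course"}, {"Instructor"}),
    ({"Instructor"}, {"Course"}),
]

print(bcnf_violations(teach, fds))
print(lossless_binary({"Student", "Instructor"},
                      {"Instructor", "Course"}, fds))  # True
print(lossless_binary({"Student", "Course"},
                      {"Course", "Instructor"}, fds))  # False
```

Instructor → Course is the only violation, and splitting on Instructor is lossless, whereas splitting on Course would generate spurious tuples when the parts are joined.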

Example: BCNF
lots1A(Prop_id, Country_name, Lotno, Area)
lots1Ax(Prop_id, Area, Lotno)
lots1Ay(Area, Country_name)
Benefits of normalization:
¦ Greater overall database organization
¦ Reduction of redundant values
¦ Data consistency within the database
¦ A much more flexible design
¦ Better handling of database security
