Database Design-Functional Dependencies and Normalization

UNIT-VI
Chapter 10
Database Design: Functional Dependencies and Normalization for Relational Databases
10.1 Informal Design Guidelines for Relational Schemas
Informal measures of quality of relational schemas:
¦ Semantics of the attributes
¦ Reducing the redundant information in tuples
¦ Reducing the NULL values in tuples
¦ Disallowing the possibility of generating spurious tuples

1. Imparting Clear Semantics to Attributes in Relations
¦ Semantics refers to the interpretation of the attribute values in a tuple
¦ Meaning of the Employee table: each tuple represents one employee, with ename, SSN, birthdate, address and Dno
¦ The Department and Project tables are also straightforward
¦ The semantics of Dept_locations and Works_on are more complex
¦ Dept_locations represents a multivalued attribute of Department, and Works_on represents an N:M relationship between Employee and Project
Employee(Ename, SSN, Bdate, Address, Dno)
Department(Dname, Dnumber, Dmgr_SSN)
Dept_Loc(Dnumber, Dlocation)
Project(Pname, Pnumber, Plocation, Dnum)
Works_on(SSN, Pnumber, Hours)
Guideline 1
¦ Design a relational schema so that it is easy to explain its meaning
¦ Do not combine attributes from multiple entity types and relationship types into a single relation
¦ Emp_dept mixes attributes of employees and departments
¦ Emp_proj mixes attributes of employees and projects
¦ Although there is nothing logically wrong with these relations, they are considered poor designs
Emp_dept(Ename, SSN, Bdate, Add, Dnum, Dname, Dmgr_SSN)
Emp_proj(SSN, Pnum, Hours, Ename, Pname, Plocation)
2. Redundant Information in Tuples and Update Anomalies
¦ Minimize redundancy so that storage space is not wasted
¦ In emp_dept, the attribute values pertaining to a particular department are repeated for every employee who works for that department
¦ In contrast, each department's information appears only once in the department relation
¦ Redundancy may lead to insertion, deletion and modification anomalies
Insert anomaly:
¦ To enter a new tuple for an employee who works in department 5, we must enter the attribute values of department 5 so that they agree with the other tuples for that department; otherwise a consistency problem arises
¦ To enter a new department that has no employees yet, we have to place NULL in the employee attributes, but SSN is the primary key and cannot be NULL
Delete anomaly:
¦ If we delete the employee tuple that represents the last employee of a department, the information about that department is lost from the database

Modification anomaly:
¦ If we change the value of an attribute of a particular department, we have to update the tuples of all employees who work in that department
¦ If we fail to update some of them, the database becomes inconsistent
Guideline 2
¦ Design the base relation schemas so that no insertion, deletion or modification anomalies are present
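The modification and deletion anomalies can be reproduced on a tiny in-memory copy of Emp_dept. This is only a sketch: the rows, the reduced column list and the helper name manager_values are made up for illustration.

```python
# Hypothetical Emp_dept rows: (ename, ssn, dnum, dname, dmgr_ssn).
emp_dept = [
    ("Smith", "111", 5, "Research", "333"),
    ("Wong", "333", 5, "Research", "333"),
    ("Zelaya", "999", 4, "Administration", "987"),
]


def manager_values(rows, dnum):
    """Return the set of manager SSNs recorded for one department."""
    return {r[4] for r in rows if r[2] == dnum}


# Modification anomaly: the manager of department 5 is changed in only
# one of the two tuples that repeat the department data.
updated = [("Smith", "111", 5, "Research", "888")] + emp_dept[1:]
inconsistent = manager_values(updated, 5)

# Deletion anomaly: deleting the last employee of department 4 also
# deletes every fact stored about department 4.
after_delete = [r for r in emp_dept if r[1] != "999"]
departments_left = {r[2] for r in after_delete}
```

After the partial update, department 5 has two different managers on record, and after the delete, department 4 no longer appears anywhere.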
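3. NULL Values in Tuples
¦ NULLs waste storage space
¦ It is unclear how to account for them in aggregate functions
¦ A NULL has several possible meanings: the attribute does not apply, the value is unknown, or the value is known but absent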
Guideline 3
¦ Avoid placing attributes in a base relation whose values may frequently be NULL
¦ If NULLs are unavoidable, make sure that they apply in exceptional cases only and do not apply to a majority of tuples
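The aggregate-function problem can be seen with Python's built-in sqlite3 module. The table emp and its rows are invented for this sketch; it shows SQLite's behaviour, which follows the usual SQL rule that COUNT(column) and AVG skip NULLs while COUNT(*) does not.

```python
import sqlite3

# In-memory database with one NULL salary (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (ssn TEXT PRIMARY KEY, salary REAL)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?)",
    [("1", 30000.0), ("2", None), ("3", 50000.0)],
)

# COUNT(*) counts rows; COUNT(salary) and AVG(salary) ignore the NULL.
count_all, count_salary, avg_salary = conn.execute(
    "SELECT COUNT(*), COUNT(salary), AVG(salary) FROM emp"
).fetchone()
conn.close()
```

The average is taken over two rows, not three, so the reader of a report has to know how many values were missing to interpret it.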
4. Generation of Spurious Tuples
¦ Suppose emp_proj1 and emp_locs are used as the base relations instead of emp_proj
¦ We cannot recover the information that was originally in emp_proj from emp_proj1 and emp_locs: their natural join produces additional spurious tuples
¦ This happens because the join attribute Ploc is neither a primary key nor a foreign key in either relation
Emp_proj(SSN, Pnum, Hours, Ename, Pname, Ploc)
Emp_locs(Ename, Ploc)
Emp_proj1(SSN, Pnum, Hours, Pname, Ploc)
Guideline 4
¦ Design relation schemas so that they can be joined with equality conditions on attributes that are primary keys or foreign keys, in a way that guarantees that no spurious tuples are generated
¦ Avoid relations that contain matching attributes that are not (foreign key, primary key) combinations, because joining on such attributes may produce spurious tuples
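A small sketch of how spurious tuples arise when Emp_proj is split into Emp_locs and Emp_proj1 and joined back on Ploc. The two sample rows are invented; the join is written as a plain set comprehension rather than with any database library.

```python
# Hypothetical Emp_proj state: (ssn, pnum, hours, ename, pname, ploc).
emp_proj = {
    ("111", 1, 10.0, "Smith", "ProductX", "Bellaire"),
    ("222", 2, 20.0, "Wong", "ProductY", "Bellaire"),
}

# Projections used as base relations instead of Emp_proj.
emp_locs = {(ename, ploc) for (_, _, _, ename, _, ploc) in emp_proj}
emp_proj1 = {
    (ssn, pnum, hours, pname, ploc)
    for (ssn, pnum, hours, _, pname, ploc) in emp_proj
}

# Natural join on Ploc, which is neither a primary key nor a foreign key.
joined = {
    (ssn, pnum, hours, ename, pname, ploc)
    for (ssn, pnum, hours, pname, ploc) in emp_proj1
    for (ename, ploc2) in emp_locs
    if ploc == ploc2
}

# Tuples that the join produced but that were never in Emp_proj.
spurious = joined - emp_proj
```

Both employees work in Bellaire, so the join pairs each project row with both names and returns four tuples, two of which are spurious.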
Functional Dependencies
¦ A functional dependency, denoted by X → Y, between two sets of attributes X and Y that are subsets of R specifies a constraint on the possible tuples that can form a relation state r of R
¦ The constraint is that any two tuples t1 and t2 in r that have t1[X] = t2[X] must also have t1[Y] = t2[Y]
¦ The values of the X component determine the values of the Y component; we say that Y is functionally dependent on X
¦ Consider the relation emp_proj1(ssn, ename, pnum, pname, ploc, hours)
¦ From the semantics of the attributes, the following FDs hold:
¦ SSN → Ename
¦ Pnum → {Pname, Ploc}
¦ {SSN, Pnum} → Hours
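The definition can be turned into a direct check on one relation state. The function name fd_holds and the sample rows are assumptions made for this sketch. A passing check only says that this particular state does not violate the FD; it does not prove that the FD holds for the schema.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # Remember the first rhs value seen for this lhs value and
        # compare every later row against it.
        if seen.setdefault(x, y) != y:
            return False
    return True


# Hypothetical state of emp_proj1.
rows = [
    {"ssn": "111", "ename": "Smith", "pnum": 1,
     "pname": "ProductX", "ploc": "Bellaire", "hours": 10.0},
    {"ssn": "111", "ename": "Smith", "pnum": 2,
     "pname": "ProductY", "ploc": "Sugarland", "hours": 5.0},
    {"ssn": "222", "ename": "Wong", "pnum": 2,
     "pname": "ProductY", "ploc": "Sugarland", "hours": 20.0},
]
```

With these rows, fd_holds(rows, ["ssn"], ["ename"]) is True, while fd_holds(rows, ["ssn"], ["hours"]) is False because employee 111 has two different Hours values.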
¦ An FD cannot be inferred with certainty from a given relation state
¦ An FD must be defined explicitly by someone who knows the semantics of the attributes of the relation
¦ E.g.: Teacher → Course may appear to hold in one relation state, but this cannot be true for all legal states
¦ If a teacher teaches two courses, then we cannot conclude that Course is functionally dependent on Teacher
Inference Rules for FDs
¦ The set of all dependencies that include F as well as all the dependencies that can be inferred from F is called the closure of F, denoted by F+
¦ E.g.: if Deptno → Mgrssn and Mgrssn → Mgrphone, then Deptno → Mgrphone
¦ Inference rules give a systematic way to infer dependencies from a given set of dependencies
¦ IR1 (reflexive rule): if X ⊇ Y, then X → Y
¦ IR2 (augmentation rule): if X → Y, then XZ → YZ
¦ IR2 holds because if t1[X] = t2[X] implies t1[Y] = t2[Y], then t1[XZ] = t2[XZ] gives t1[Z] = t2[Z] and t1[Y] = t2[Y], and hence t1[YZ] = t2[YZ]
¦ IR3 (transitive rule): if X → Y and Y → Z, then X → Z
¦ IR4 (decomposition or projection rule): if X → YZ, then X → Y
¦ IR5 (union or additive rule): if X → Y and X → Z, then X → YZ
¦ IR4 and IR5 apply to the right-hand side only: XY → Z does not imply X → Z
¦ IR6 (pseudotransitive rule): if X → Y and WY → Z, then WX → Z
¦ IR1 generates dependencies that are always true; such dependencies are known as trivial, all others as nontrivial
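Proofs of IR4 to IR6 using IR1 to IR3:
¦ IR4: given X → YZ; YZ → Y (IR1, since YZ ⊇ Y); hence X → Y (IR3)
¦ IR5: given X → Y and X → Z; X → XY (IR2, augmenting X → Y with X); XY → YZ (IR2, augmenting X → Z with Y); hence X → YZ (IR3)
¦ IR6: given X → Y and WY → Z; WX → WY (IR2, augmenting X → Y with W); hence WX → Z (IR3)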
¦ A set of functional dependencies F is said to cover another set of functional dependencies E if every dependency in E can be inferred from F
¦ Two sets E and F are equivalent if E+ = F+
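Inference, cover and equivalence can all be tested without enumerating F+, by computing the closure of an attribute set instead. Attribute closure is the standard textbook technique for this and is an addition here, not something stated on the slides; the function names and the FD representation (pairs of Python sets) are choices made for this sketch.

```python
def closure(attrs, fds):
    """Attribute closure X+ of attrs under fds, a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply lhs -> rhs whenever the whole lhs is already derived.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def implies(fds, lhs, rhs):
    """True if lhs -> rhs can be inferred from fds."""
    return set(rhs) <= closure(lhs, fds)


def covers(f, e):
    """True if every dependency in e can be inferred from f."""
    return all(implies(f, lhs, rhs) for lhs, rhs in e)


def equivalent(e, f):
    """True if e and f cover each other, i.e. E+ = F+."""
    return covers(e, f) and covers(f, e)


# The example from the slide: deptno -> mgrssn, mgrssn -> mgrphone.
F = [({"deptno"}, {"mgrssn"}), ({"mgrssn"}, {"mgrphone"})]
# E adds the inferred dependency explicitly, so E and F are equivalent.
E = F + [({"deptno"}, {"mgrphone"})]
```

Here closure({"deptno"}, F) contains mgrphone, which is the transitive inference made with IR3.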

Normal Forms Based on Primary Keys
¦ The normalization process, first proposed by Codd (1972), takes a relation schema through a series of tests to certify whether it satisfies a certain normal form
¦ The process proceeds in a top-down fashion
¦ Codd proposed three normal forms, all based on a single analytical tool: functional dependencies
¦ Later, fourth normal form (4NF) and fifth normal form (5NF) were proposed, based on multivalued dependencies and join dependencies respectively
¦ This top-down approach is called relational design by analysis
¦ Normalization is a process of analyzing the given relation schemas based on their FDs and primary keys to achieve:
1) minimizing redundancy
2) minimizing insertion, deletion and update anomalies
¦ Denormalization is the process of storing the join of higher normal form relations as a base relation, which is in a lower normal form

¦ The process of normalization through decomposition should also confer two additional properties:
1) Lossless join or nonadditive join property: guarantees that no spurious tuples are generated when the decomposed relations are joined
2) Dependency preservation property: ensures that each FD is represented in some individual relation after decomposition
Superkey: a superkey of a relation schema R = {A1, A2, …, An} is a set of attributes S ⊆ R with the property that no two tuples t1 and t2 in any legal relation state of R will have t1[S] = t2[S]

¦ A key K is a superkey with the additional property that removal of any attribute from K will cause K not to be a superkey any more
¦ If a relation has more than one key, each is called a candidate key
¦ One of the candidate keys is designated as the primary key; the others are called secondary keys
¦ An attribute of a relation is called prime if it is a member of some candidate key; otherwise it is called nonprime
¦ A unique key is an attribute (or combination of attributes) that uniquely identifies each row in a table and may allow NULL values
¦ A primary key is an attribute or combination of attributes that uniquely identifies a row in a table and must not be NULL
¦ A foreign key is an attribute or combination of attributes whose values match a primary key in another table
¦ A composite or compound key consists of 2 or more attributes
¦ A candidate key is any minimal key of a table that is eligible to become the primary key
¦ An alternate or secondary key is a candidate key that was not chosen as the primary key
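Given a set of FDs, candidate keys and prime attributes can be found by brute force: try attribute sets in order of size and keep those whose closure is the whole schema. This is a sketch for small schemas only (it is exponential in the number of attributes), and the function names are hypothetical. The sample schema is the lots relation with FD1 to FD4 used in the examples of this chapter.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(schema, fds):
    """All minimal attribute sets whose closure is the whole schema."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            s = set(combo)
            if any(k <= s for k in keys):
                continue  # contains a smaller key, so only a superkey
            if closure(s, fds) == set(schema):
                keys.append(s)
    return keys


schema = {"prop_id", "country_name", "lotno", "area", "price", "taxrate"}
fds = [
    ({"prop_id"}, {"country_name", "lotno", "area", "price", "taxrate"}),
    ({"country_name", "lotno"}, {"prop_id", "area", "price", "taxrate"}),
    ({"country_name"}, {"taxrate"}),
    ({"area"}, {"price"}),
]

keys = candidate_keys(schema, fds)
prime = set().union(*keys)   # attributes that belong to some candidate key
nonprime = schema - prime
```

For this input the candidate keys are {prop_id} and {country_name, lotno}, so area, price and taxrate are the nonprime attributes.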
First Normal Form (1NF)
¦ 1NF states that the domain of an attribute must include only atomic values and that the value of any attribute in a tuple must be a single value from the domain of that attribute
¦ 1NF disallows having a set of values, a tuple of values, or a combination of both as an attribute value
¦ Consider the department relation: it is not in 1NF because Dloc is not atomic
¦ Dnum → Dloc holds because Dnum is the primary key; there are three main techniques to bring the relation into 1NF:
1) Remove Dloc and place it in a separate relation Dept_locations whose primary key is the combination {Dnum, Dloc}; this gives two 1NF relations
2) Expand the key so that there is a separate tuple in dept for each location; the primary key becomes the combination {Dnum, Dloc}, at the cost of redundancy
3) If the maximum number of values of Dloc is known (say three), replace Dloc with three atomic attributes and place NULL in the unused ones; the drawback is that this introduces NULL values
Dname           Dnum  Dmgrssn  Dloc
Research        5     3334445  {Bellaire, Sugarland, Houston}
Administration  4     9876543  {Stafford}
Headquarters    1     8886677  {Houston}
Dept
Dname           Dnum  Dmgrssn
Research        5     3334445
Administration  4     9876543
Headquarters    1     8886677

Dept_locations
Dnum  Dloc
5     Bellaire
5     Sugarland
5     Houston
4     Stafford
1     Houston
Dname           Dnum  Dmgrssn  Dloc
Research        5     3334445  Bellaire
Research        5     3334445  Sugarland
Research        5     3334445  Houston
Administration  4     9876543  Stafford
Headquarters    1     8886677  Houston
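Techniques 1 and 2 can be sketched on the department data above. The manager SSN column is left out to keep the example short, and the variable names are made up for illustration.

```python
# Department relation that is not in 1NF: dloc holds a set of values.
dept = [
    {"dname": "Research", "dnum": 5,
     "dloc": ["Bellaire", "Sugarland", "Houston"]},
    {"dname": "Administration", "dnum": 4, "dloc": ["Stafford"]},
    {"dname": "Headquarters", "dnum": 1, "dloc": ["Houston"]},
]

# Technique 1: move the multivalued attribute into its own relation
# with key {dnum, dloc}; dept keeps only atomic attributes.
dept_1nf = [(d["dname"], d["dnum"]) for d in dept]
dept_locations = [(d["dnum"], loc) for d in dept for loc in d["dloc"]]

# Technique 2: keep one relation but emit one tuple per location,
# so the key grows to {dnum, dloc} and dname is repeated.
dept_expanded = [
    (d["dname"], d["dnum"], loc) for d in dept for loc in d["dloc"]
]
```

Technique 1 stores each department name once, while technique 2 repeats "Research" three times, which is the redundancy the first technique avoids.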

SSN   Ename           Pnumber  Hours
1234  Smith, John     1, 2     3, 43
5678  Narayan, Joyce  3        40
9123  Ramesh, Rakesh  4, 5     20, 10
4567  Wong, Franklin  6, 7     35, 10
Second Normal Form (2NF)
¦ 2NF is based on the concept of full functional dependency: a relation schema R is in 2NF if it is in 1NF and every nonprime attribute A in R is fully functionally dependent on the primary key of R
¦ X → Y is a full functional dependency if removal of any attribute A from X means that the dependency does not hold any more
¦ X → Y is a partial dependency if some attribute A belonging to X can be removed from X and the dependency still holds
¦ SSN → Ename is an example of a full functional dependency in the emp_dept relation
¦ Whereas {SSN, Pnum} → Ename is a partial dependency in the emp_proj relation
¦ emp_proj, with the composite primary key {SSN, Pnumber}, is in 1NF but not in 2NF

¦ These are the given FDs:
¦ {SSN, Pnumber} → Hours
¦ {SSN, Pnumber} → Ename
¦ {SSN, Pnumber} → {Pname, Plocations}
¦ These all hold, but Ename, Pname and Plocations are only partially dependent on the key, because SSN → Ename and Pnumber → {Pname, Plocations}
Emp_Proj(SSN, Pnumber, Hours, Ename, Pname, Plocations) is decomposed into:
EP1(SSN, Pnumber, Hours)
EP2(SSN, Ename)
EP3(Pnumber, Pname, Plocations)
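The partial dependencies that force this decomposition can be found mechanically: for every proper subset of the key, compute its closure and see which nonprime attributes it already determines. A minimal sketch, assuming the FDs listed for Emp_Proj; the function name partial_dependencies is hypothetical.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def partial_dependencies(key, nonprime, fds):
    """Map each nonprime attribute to the proper key subsets that determine it."""
    found = {}
    for size in range(1, len(key)):
        for subset in combinations(sorted(key), size):
            for attr in sorted(closure(subset, fds) & nonprime):
                found.setdefault(attr, []).append(set(subset))
    return found


key = {"ssn", "pnumber"}
nonprime = {"hours", "ename", "pname", "plocations"}
fds = [
    ({"ssn", "pnumber"}, {"hours"}),
    ({"ssn"}, {"ename"}),
    ({"pnumber"}, {"pname", "plocations"}),
]

partial = partial_dependencies(key, nonprime, fds)
```

Hours does not appear in the result because it needs the whole key, which is why it stays with {SSN, Pnumber} in EP1 while Ename moves to EP2 and the project attributes move to EP3.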

Third Normal Form (3NF)
¦ 3NF is based on the concept of transitive dependency
¦ A relation R is in 3NF if it satisfies 2NF and no nonprime attribute of R is transitively dependent on the primary key
¦ An FD X → Y in relation R is a transitive dependency if there is a set of attributes Z that is neither a candidate key nor a subset of any key, and both X → Z and Z → Y hold
¦ In relation emp_dept, SSN → Dmgrssn is a transitive dependency because SSN → Dnum and Dnum → Dmgrssn hold, and Dnum is neither a key nor a subset of the key
¦ Because of this transitive dependency between SSN, Dnum and Dmgrssn, the relation is not in 3NF
¦ So we decompose the relation into two tables that are free of the transitive dependency, such that a natural join of the two reproduces the original emp_dept relation
¦ It is not necessary to remove partial dependencies before transitive dependencies, but the normal forms are defined so that partial dependencies are removed in 2NF and transitive dependencies in 3NF
Emp_Dept(Ename, SSN, Bdate, Address, Dnum, Dname, Dmgrssn) is decomposed into:
ED1(Ename, SSN, Bdate, Address, Dnum)
ED2(Dnum, Dname, Dmgrssn)
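A quick check on sample data that joining ED1 and ED2 on Dnum gives back exactly the original relation. The rows are invented and Bdate and Address are omitted; this illustrates the behaviour on one state and is not a proof of the lossless join property.

```python
# Hypothetical Emp_Dept state: (ename, ssn, dnum, dname, dmgrssn).
emp_dept = {
    ("Smith", "111", 5, "Research", "333"),
    ("Wong", "333", 5, "Research", "333"),
    ("Zelaya", "999", 4, "Administration", "987"),
}

# Projections onto the two decomposed schemas.
ed1 = {(ename, ssn, dnum) for (ename, ssn, dnum, _, _) in emp_dept}
ed2 = {(dnum, dname, mgr) for (_, _, dnum, dname, mgr) in emp_dept}

# Natural join on Dnum, which is the primary key of ED2
# and a foreign key in ED1.
rejoined = {
    (ename, ssn, dnum, dname, mgr)
    for (ename, ssn, dnum) in ed1
    for (dnum2, dname, mgr) in ed2
    if dnum == dnum2
}
```

ED2 holds each department once (two rows instead of three), and the join produces no spurious tuples because it is made on a key.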
¦ Def of 2NF: a relation schema R is in 2NF if it is in 1NF and no nonprime attribute A in R is partially dependent on any key of R
¦ Def of 3NF: a relation R is in 3NF if it is in 2NF and, whenever a nontrivial functional dependency X → A holds in R, either X is a superkey of R or A is a prime attribute of R
¦ Alternative def of 3NF: a relation R is in 3NF if every nonprime attribute of R meets both conditions:
1) It is fully functionally dependent on every key of R
2) It is nontransitively dependent on every key of R
Example: 1NF
lots(Prop_id, Country_name, Lotno, Area, Price, Taxrate), with dependencies FD1, FD2, FD3 and FD4
Example: 2NF
lots1(Prop_id, Country_name, Lotno, Area, Price)
lots2(Country_name, Taxrate)

¦ Prop_id is the primary key and {Country_name, Lotno} is a candidate key
¦ FD1: Prop_id → {Country_name, Lotno, Area, Price, Taxrate}
¦ FD2: {Country_name, Lotno} → {Prop_id, Area, Price, Taxrate}
¦ FD3: Country_name → Taxrate; Taxrate is therefore partially dependent on the candidate key {Country_name, Lotno}
¦ FD4: Area → Price
¦ (FD5: Area → Country_name)
¦ (Consider FD5 only for BCNF)
¦ Because of FD3, lots is not in 2NF, so we decompose lots into lots1 and lots2
Example: 3NF
lots1(Prop_id, Country_name, Lotno, Area, Price)
lots1A(Prop_id, Country_name, Lotno, Area)
lots1B(Area, Price)
¦ FD4 violates 3NF because Area is not a superkey and Price is not a prime attribute
¦ To reach 3NF we decompose relation lots1 into lots1A and lots1B, removing the transitive dependency
¦ Price is transitively dependent on each of the candidate keys via Area
¦ lots1A is in 3NF but not in BCNF because of FD5: Area is not a superkey, although Country_name is a prime attribute; so we decompose lots1A into lots1Ax and lots1Ay
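The superkey and prime-attribute tests used in this argument can be written as a small checker. It is a sketch under two assumptions: the FD list passed in has already been restricted to the attributes of the relation being tested, and candidate keys are found by brute force. All function names are hypothetical.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(schema, fds):
    """All minimal attribute sets whose closure is the whole schema."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            s = set(combo)
            if not any(k <= s for k in keys) and closure(s, fds) == set(schema):
                keys.append(s)
    return keys


def violations(schema, fds, form):
    """Listed FDs X -> A that break the given form ("3NF" or "BCNF")."""
    prime = set().union(*candidate_keys(schema, fds))
    bad = []
    for lhs, rhs in fds:
        if closure(lhs, fds) >= set(schema):
            continue  # X is a superkey: allowed in both forms
        for attr in sorted(rhs - lhs):  # rhs - lhs skips trivial parts
            if form == "3NF" and attr in prime:
                continue  # 3NF also allows a prime attribute on the right
            bad.append((set(lhs), attr))
    return bad


lots1 = {"prop_id", "country_name", "lotno", "area", "price"}
fds_lots1 = [
    ({"prop_id"}, {"country_name", "lotno", "area", "price"}),
    ({"country_name", "lotno"}, {"prop_id", "area", "price"}),
    ({"area"}, {"price"}),                     # FD4
]

lots1a = {"prop_id", "country_name", "lotno", "area"}
fds_lots1a = [
    ({"prop_id"}, {"country_name", "lotno", "area"}),
    ({"country_name", "lotno"}, {"prop_id", "area"}),
    ({"area"}, {"country_name"}),              # FD5
]
```

The checker reports FD4 as the only 3NF violation in lots1, no 3NF violation in lots1A, and FD5 as a BCNF violation in lots1A, which matches the bullets above.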
Boyce-Codd Normal Form (BCNF)
¦ BCNF was proposed as a simpler form of 3NF, but it turns out to be stricter than 3NF
¦ A relation R is in BCNF if, whenever a nontrivial functional dependency X → A holds in R, X is a superkey of R
¦ In relation lots1A, Area → Country_name holds; the relation is in 3NF because Country_name is part of a candidate key, but it is not in BCNF because Area is not a superkey
¦ E.g., in relation Teach{student, course, instructor}, {student, course} is the primary key
¦ {student, course} → instructor
¦ instructor → course
¦ Teach is in 3NF but not in BCNF, so we decompose it into {instructor, student} and {instructor, course}
Teach(Student, Course, Instructor) is decomposed into:
Teach1(Student, Instructor)
Teach2(Instructor, Course)
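Whether a two-way decomposition has the nonadditive join property can be tested with the standard binary test: the common attributes must functionally determine all of R1 − R2 or all of R2 − R1. The test is a textbook result added here for illustration, and the function name lossless_binary is hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def lossless_binary(r1, r2, fds):
    """Nonadditive join test for a decomposition into exactly two relations."""
    determined = closure(r1 & r2, fds)
    return (r1 - r2) <= determined or (r2 - r1) <= determined


fds = [
    ({"student", "course"}, {"instructor"}),
    ({"instructor"}, {"course"}),
]

# The decomposition shown above: the common attribute is instructor.
good = lossless_binary({"student", "instructor"},
                       {"instructor", "course"}, fds)

# An alternative split on course, for comparison.
bad = lossless_binary({"student", "course"},
                      {"course", "instructor"}, fds)
```

The split on instructor passes because instructor → course, while the split on course fails. Note that the passing decomposition still does not keep {student, course} → instructor inside a single relation, so dependency preservation is given up here.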
Example: BCNF
lots1A(Prop_id, Country_name, Lotno, Area)
lots1Ax(Prop_id, Area, Lotno)
lots1Ay(Area, Country_name)
Benefits or Advantages of Normalization:
¦ Greater overall database organization
¦ Reduction of redundant values
¦ Data consistency within the database
¦ A much more flexible design
¦ Better handling of database security
