
RELATIONAL DATABASE DESIGN

The goal of relational database design is to generate a set of relation schemas that allow us to store
information without unnecessary redundancy, yet allow us to retrieve information easily. The approach is
to decompose relations into normal forms using functional dependencies.

Functional Dependency
• A functional dependency (FD) occurs when one attribute in a relation uniquely determines another
attribute.
• If X and Y are two attributes of a relation and, for each value of X, there is only one corresponding
value of Y, then Y is functionally dependent on X, or X determines Y. It is denoted as X → Y.
• A functional dependency may also be based on a composite attribute, i.e. XY → Z, where Z is
functionally dependent on the composite attribute XY.
• X → Y does not imply Y → X.
• A functional dependency is a constraint on the schema. It cannot be established by inspecting a single
instance of the relation, because it must hold for all possible legal states of the relation. An instance
can, however, show that an FD does not hold.
• In the diagrammatic notation of an FD, the left hand side (LHS) attributes are connected by vertical lines
to the horizontal FD line, while the right hand side (RHS) attributes are connected by arrows pointing
to the attributes.
Example:
In Student(Rollno, Name, Address)
Rollno → Name
Rollno → Address
[Dependency diagram: Student(Rollno, Name, Address), with arrows from Rollno to Name and from Rollno to Address]
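
The definition above can be tried out on a small relation instance. The following is a minimal sketch, assuming rows are stored as Python dictionaries; the function name fd_holds and the sample student rows are invented for illustration. A result of True only means that this instance does not violate the FD; it does not prove that the FD holds for the schema.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first RHS value seen for this LHS value
        if seen.setdefault(key, value) != value:
            return False
    return True

# Hypothetical sample data for Student(Rollno, Name, Address)
students = [
    {"Rollno": 1, "Name": "Asha", "Address": "Pune"},
    {"Rollno": 2, "Name": "Ravi", "Address": "Pune"},
    {"Rollno": 3, "Name": "Asha", "Address": "Delhi"},
]

print(fd_holds(students, ["Rollno"], ["Name"]))     # True
print(fd_holds(students, ["Rollno"], ["Address"]))  # True
print(fd_holds(students, ["Name"], ["Address"]))    # False: Asha has two addresses
```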

Inference Rules for Functional Dependencies (ARMSTRONG'S AXIOMS)

If F is the set of functional dependencies specified on the relation schema R, then other dependencies can be
inferred or deduced from the given functional dependencies in F (i.e. FDs logically implied by F). The set of
all functional dependencies logically implied by F is called the closure of F, denoted by F+.
The basic rules for functional dependencies are
i. Reflexive Rule
If Y is a subset of X then X → Y.
Ex: FirstName,LastName → LastName
Proof:
Let B be a subset of A, and let t1 and t2 be two tuples in some relation instance
of R such that t1.A = t2.A. Then t1.B = t2.B, because B is a subset of A. Hence A → B must
hold.

ii. Augmentation Rule

If A → B and C is a set of attributes then AC → BC
Ex: If Regno → FirstName,LastName then Regno,Address → FirstName,LastName,Address
Proof: (By contradiction)
Assume A → B holds but AC → BC doesn't hold. Then there exist two tuples t1 and t2
such that
t1.AC = t2.AC ---- (a)
t1.BC ≠ t2.BC ---- (b)
From (a) we deduce
t1.A = t2.A ---- (c)
t1.C = t2.C ---- (d)
From (c) and A → B we deduce
t1.B = t2.B ---- (e)
Now from (d) and (e) we deduce
t1.BC = t2.BC ---- (f)
This contradicts (b), thus the assumption is false and AC → BC holds.

iii. Transitive Rule

If A → B and B → C then A → C
Ex: Rollno → Address, Address → Pincode
Then Rollno → Pincode
Proof:
Assume A → B and B → C hold in a relation. For any two tuples t1 and t2
of the relation such that t1.A = t2.A, A → B gives
t1.B = t2.B ----(a)
and from (a) and B → C we get
t1.C = t2.C ----(b)
Hence A → C must hold.
Some additional rules derived from the basic rules of functional dependencies are:

iv. Decomposition Rule (Projective Rule)
If A → BC then A → B and A → C
Ex: Rollno → FirstName,LastName
Then Rollno → FirstName and Rollno → LastName

Proof:
A → BC holds
Then BC → B (By Reflexive Rule), BC → C (By Reflexive Rule)
Thus A → B (By Transitive Rule) and A → C (By Transitive Rule)

v. Union Rule (Additive Rule)

If A → B and A → C then A → BC
Ex: Rollno → Name and Rollno → Address
Then Rollno → Name,Address
Proof:
A → B and A → C hold
Then AA → AB (augmenting A → B with A), i.e. A → AB (since AA = A for sets)
AB → BC (augmenting A → C with B)
Thus A → BC (By Transitive Rule)

vi. Pseudotransitive Rule

If A → B and BC → D then AC → D
Ex: Rollno → Name and Name,GuardianName → GuardianAddress
Then Rollno,GuardianName → GuardianAddress
Proof:
A → B and BC → D hold
Then AC → BC (augmenting A → B with C)
Thus AC → D (By Transitive Rule)

The Union and Decomposition rules together give us some choice as to how to
write a set of functional dependencies: A → BC can be replaced by A → B and A → C, and vice versa.
Two sets of FDs X and Y are equivalent only if X logically implies Y and Y logically implies X.
If X1X2…Xn → B1B2…Bm then the dependency is
- Trivial if the Bs are a subset of the Xs.
- Non Trivial if at least one of the Bs is not among the Xs.
- Completely Non Trivial if none of the Bs are among the Xs.
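
The three cases above can be expressed as simple set tests. This is a small sketch, assuming single-letter attribute names so that a string such as "AB" stands for the attribute set {A, B}; the function name classify_fd is made up for this example.

```python
def classify_fd(lhs, rhs):
    """Classify the FD lhs -> rhs as trivial, non-trivial or completely non-trivial."""
    lhs, rhs = set(lhs), set(rhs)
    if rhs <= lhs:           # every RHS attribute is among the LHS attributes
        return "trivial"
    if rhs & lhs:            # some, but not all, RHS attributes are among the LHS
        return "non-trivial"
    return "completely non-trivial"

print(classify_fd("AB", "B"))   # trivial
print(classify_fd("AB", "BC"))  # non-trivial
print(classify_fd("AB", "CD"))  # completely non-trivial
```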

Closure of Functional Dependencies

The closure of F (where F is a set of functional dependencies), denoted by F+, is the
set consisting of all the functional dependencies of F, together with all the functional
dependencies that can be inferred from them (i.e. all implied FDs).
In practice F+ is not listed in full. Instead, the closure of a set of attributes A under F, denoted A+,
is computed by an algorithm: A → B is in F+ exactly when B is a subset of A+.

Given
Let F be the set of FDs and A be a set of attributes of R. A+ (the closure of A) is the set of all
attributes determined by A, either directly or indirectly through other FDs in F.
Let Result be a variable which stores the attributes found so far.

Algorithm
Step 1: Result = A
Step 2: Repeat Steps 3 and 4 until Result is unchanged
Step 3: For each FD B → C in F
Step 4: If B is a subset of Result then Result = Result U C
Step 5: End
Example: Given set F
A → BC, D → E, DF → GAH, G → DF and D → I
Computing A+
Initially A determines A, then on further iterations, it adds B and C
Thus A+ = ABC
Computing D+
Initially D determines D, then on further iterations, it adds E and I
Thus D+ = DEI
Computing DF+
Initially DF determines D and F, then on further iterations, it adds E, G, A, H and I. On still
further iterations it adds B and C
Thus DF+ = DFEGAHIBC
Computing G+
Initially G determines G, then on further iterations, it adds D and F. On further iterations it
adds E, A, H and I. On still further iterations it adds B and C
Thus G+ = GDFEAHIBC
So the attribute closures under F are
A+ = ABC
D+ = DEI
DF+ = DFEGAHIBC
G+ = GDFEAHIBC
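
The attribute closure algorithm can be sketched in a few lines of Python. This is an illustrative sketch, not a reference implementation: it assumes single-letter attribute names, and represents each FD as a pair of strings (left side, right side). The name attribute_closure is chosen here for the example.

```python
def attribute_closure(attrs, fds):
    """Return the set of attributes determined by attrs under the FDs."""
    result = set(attrs)
    changed = True
    while changed:                       # repeat until Result is unchanged
        changed = False
        for lhs, rhs in fds:             # for each FD B -> C in F
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)       # Result = Result U C
                changed = True
    return result

# F = {A -> BC, D -> E, DF -> GAH, G -> DF, D -> I}
F = [("A", "BC"), ("D", "E"), ("DF", "GAH"), ("G", "DF"), ("D", "I")]

for start in ["A", "D", "DF", "G"]:
    print(start + "+ =", "".join(sorted(attribute_closure(start, F))))
# A+ = ABC
# D+ = DEI
# DF+ = ABCDEFGHI
# G+ = ABCDEFGHI
```

The attributes are printed in alphabetical order, so DF+ appears as ABCDEFGHI rather than in the order in which the attributes were added.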

Examples of finding closures from the given functional dependencies


1. Given R(ABCDE) and F is
A → B, BC → E, ED → A
The attribute closures are
A+ = AB, BC+ = BCE, ED+ = ABDE
But none of these determines all the attributes, i.e. ABCDE, so
EDC+ = ABCDE
Thus EDC is a key. (ACD and BCD are keys as well, since ACD+ = BCD+ = ABCDE.)

2. Given R(ABCDEFGHI) and F is

A → B, C → D, AC → E, B → F, D → G, G → H, A → I, I → A
The attribute closures are
G+ = GH, B+ = BF, D+ = DGH, C+ = CDGH, A+ = ABFI, I+ = ABFI, AC+ = ABCDEFGHI
Thus AC is a key. (CI is a key as well, since I → A.)

3. Given R(ABCD) and F is

C → D, C → A, B → C, D → A, ABC → D, A → B, BC → D, A → C, AB → C, AB → D, D → B
The attribute closures are
C+ = ABCD, B+ = ABCD, A+ = ABCD, D+ = ABCD, AB+ = ABCD, BC+ = ABCD, ABC+ = ABCD
Thus any of A, B, C or D can be the key.

4. Given R(ABCDEGH) and F is

AB → C, AC → B, AD → E, B → D, BC → A, E → G
The attribute closures are
B+ = BD, E+ = EG, AD+ = ADEG, AC+ = ABCDEG, BC+ = ABCDEG, AB+ = ABCDEG
But none of these determines all the attributes, i.e. ABCDEGH, so
ACH+ = ABCDEGH
BCH+ = ABCDEGH
ABH+ = ABCDEGH
Thus any of ACH, BCH or ABH can be the key.

5. Given R(ABCDE) and F is

A → BC, B → E, E → DA
The attribute closures are:
A+ = ABCDE, B+ = ABCDE, E+ = ABCDE
Thus any of A, B or E can be the key.
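
The keys in these examples can also be found mechanically: try attribute sets in order of increasing size and keep those whose closure is the whole schema, skipping supersets of keys already found. The sketch below is a brute-force illustration under the assumption of single-letter attribute names; the function names are invented here, and the approach is exponential, so it only suits small schemas like the ones above.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under fds (pairs of attribute strings)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def candidate_keys(schema, fds):
    """Return all minimal attribute sets whose closure is the whole schema."""
    schema = set(schema)
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue                      # contains a smaller key: not minimal
            if closure(candidate, fds) == schema:
                keys.append(candidate)
    return ["".join(sorted(key)) for key in keys]

# Example 4: R(ABCDEGH) with AB -> C, AC -> B, AD -> E, B -> D, BC -> A, E -> G
F4 = [("AB", "C"), ("AC", "B"), ("AD", "E"), ("B", "D"), ("BC", "A"), ("E", "G")]
print(candidate_keys("ABCDEGH", F4))  # ['ABH', 'ACH', 'BCH']
```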

Canonical Cover of Functional Dependencies

A canonical cover of the functional dependencies is a minimal set of dependencies, with no redundancies.
It is denoted by Fc. A canonical cover Fc for the FDs F is a set of dependencies such that F logically
implies all dependencies in Fc and Fc logically implies all dependencies in F.
The canonical cover Fc must have the following properties:
i. No dependency in Fc contains any extraneous attributes.
ii. No two dependencies have the same left hand side (i.e. the right hand sides of the dependencies with
the same left hand side are combined together).
To determine the canonical cover, eliminate the redundant dependencies (i.e. those implied by other
dependencies) and combine the dependencies with the same left hand side.
The canonical cover can be determined through an algorithm.

Given: Let A, B, C be sets of attributes of the FDs F

Algorithm
Step 1: Repeat Steps 2 to 4 until F is unchanged
Step 2: Use the Union rule to replace any FDs of the form A → B and A → C with A → BC
Step 3: Find any FD A → B with an extraneous attribute in either A or B
Step 4: If an extraneous attribute is found then delete it from A → B
Step 5: End
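
The steps above can be sketched in Python. Note that this sketch reorders the work compared with the algorithm as written: it first splits every right hand side into single attributes, removes extraneous left hand side attributes, removes redundant FDs, and applies the Union rule only at the end. This is a common textbook variant that yields an equivalent cover; it is an assumption of this example, as are the function names and the single-letter attribute encoding.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds (pairs of frozensets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def canonical_cover(fds):
    """Return a canonical cover as a dict {LHS string: RHS string}."""
    # Decomposition rule: one FD per right hand side attribute, duplicates dropped
    work = list(dict.fromkeys(
        (frozenset(lhs), frozenset(a)) for lhs, rhs in fds for a in rhs))
    # Remove extraneous attributes from left hand sides
    changed = True
    while changed:
        changed = False
        for i, (lhs, rhs) in enumerate(work):
            for a in sorted(lhs):
                reduced = lhs - {a}
                if reduced and rhs <= closure(reduced, work):
                    work[i] = (reduced, rhs)
                    changed = True
                    break
            if changed:
                break
    work = list(dict.fromkeys(work))
    # Remove FDs that are implied by the remaining ones
    for fd in list(work):
        rest = [f for f in work if f != fd]
        if fd[1] <= closure(fd[0], rest):
            work = rest
    # Union rule: combine FDs with the same left hand side
    merged = {}
    for lhs, rhs in work:
        merged[lhs] = merged.get(lhs, frozenset()) | rhs
    return {"".join(sorted(l)): "".join(sorted(r)) for l, r in merged.items()}

# F = {A -> BC, B -> C, A -> B, AB -> C}
print(canonical_cover([("A", "BC"), ("B", "C"), ("A", "B"), ("AB", "C")]))
# {'A': 'B', 'B': 'C'}
```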

Example:
Given set F
A → BC, B → C, A → B, AB → C
Computing Fc
A → BC and A → B have the same LHS attribute, so they combine into A → BC.
A is extraneous in AB → C since B → C already holds.
Thus delete A and it becomes B → C.
Now from A → BC and B → C, C is extraneous in the first FD since A → B and B → C already give A → C.
Thus delete C and it becomes A → B.

So the canonical cover of F, i.e. Fc, is

A → B
B → C

Example: If R(ABC) and F is as follows

A → AB, B → BC, A → AC, find Fc.
Solution:
Given A → AB So: A → A, A → B
Given B → BC So: B → B, B → C
Given A → AC So: A → A, A → C
Eliminating the duplicates and the FDs that follow from the reflexive axiom, we get:
A → B, A → C, B → C
But A → C can be derived from A → B and B → C by the transitive axiom.
So Fc is A → B, B → C

ANOMALIES IN DESIGNING DB
Data redundancy means repetition of information in the relation (or table). The aim of the
database system is to reduce redundancy, because redundancy wastes storage space and
increases the size of the stored data. Redundancy also gives rise to inconsistency problems which are known as
Data Anomalies.

There are three kinds of database anomalies:

i. Update Anomalies
It refers to the inconsistencies that arise when the same fact is stored in multiple copies and
an update changes only some of the copies, not all.
ii. Insertion Anomalies
It is the inability to represent certain information, i.e. a tuple cannot be inserted into the
relation until a value for some other attribute is also available.
iii. Deletion Anomalies
It refers to the unintended loss of information, i.e. when a tuple is deleted, some useful
information that we don't want to lose is deleted with it.

These problems are remedied by decomposition of the relations (or tables). Decomposition is the
replacement of a relation schema R = (A1, A2, …, An) by a set of relation schemas {R1, R2, …, Rm} such
that Ri is a subset of R (for 1 <= i <= m) and R1 U R2 U … U Rm = R.

Some keywords
1. Functional dependency : In a given table, an attribute Y is said to have a functional dependency
on a set of attributes X (written X → Y) if and only if each X value is associated with precisely one Y
value.
For example, among the attributes "Employee ID" and "Employee Date of Birth", the functional
dependency {Employee ID} → {Employee Date of Birth} would hold.

2. Trivial functional dependency : A trivial functional dependency is a functional dependency of
an attribute on a superset of itself. {Employee ID, Employee Address} → {Employee Address} is
trivial, as is {Employee Address} → {Employee Address}.

3. Full functional dependency : An attribute is fully functionally dependent on a set of attributes X if
it is functionally dependent on X, and not functionally dependent on any proper subset of X.
{Employee Address} has a functional dependency on {Employee ID, Skill}, but not a full functional
dependency, because it is also dependent on {Employee ID}.

4. Transitive dependency : A transitive dependency is an indirect functional dependency, one in
which X→Z only by virtue of X→Y and Y→Z.

5. Multivalued dependency : A multivalued dependency is a constraint according to which the
presence of certain rows in a table implies the presence of certain other rows.

6. Join dependency : A table T is subject to a join dependency if T can always be recreated by
joining multiple tables each having a subset of the attributes of T.

7. Superkey : A superkey is a combination of attributes that can be used to uniquely identify a
database record. A table might have many superkeys.

8. Candidate key : A candidate key is a superkey that does not have any
extraneous attributes in it: it is a minimal superkey.

NORMALIZATION
It is the decomposition of a relation (or table) into smaller relations based on the concept of functional
dependencies, to overcome undesirable anomalies. It distributes the data over a number of tables so that
each fact is stored only once. It reduces redundancy and promotes integrity.
There are different normal forms, and each normal form is usually built upon the previous normal
form.

1) First Normal Form (1NF)

A relation R is said to be in 1NF if for each tuple in R, each attribute of it is a single, non composite
value (i.e. atomic). Such a relation is also known as a flat table.
Thus each field can store at most one value (i.e. multiple values are not allowed). If every field
satisfies this condition then the table is in 1NF.

Ex: table not in 1NF
FriendInfo
Id   Name    FavouriteArtist
10   Smith   Lata, Kishore
20   Hary    Kishore, Rafi

Tables in 1NF
FriendList
Id   Name
10   Smith
20   Hary

ArtistList
Id   FavouriteArtist
10   Lata
10   Kishore
20   Kishore
20   Rafi
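
The conversion to 1NF shown above amounts to moving the multi-valued field into its own table, one value per row. A minimal sketch follows, assuming the multi-valued field is stored as a comma-separated string; the function name to_1nf is made up for this illustration.

```python
friend_info = [
    {"Id": 10, "Name": "Smith", "FavouriteArtist": "Lata, Kishore"},
    {"Id": 20, "Name": "Hary", "FavouriteArtist": "Kishore, Rafi"},
]

def to_1nf(rows):
    """Split FriendInfo into FriendList and ArtistList with atomic values."""
    friend_list = [{"Id": r["Id"], "Name": r["Name"]} for r in rows]
    artist_list = [
        {"Id": r["Id"], "FavouriteArtist": artist.strip()}
        for r in rows
        for artist in r["FavouriteArtist"].split(",")
    ]
    return friend_list, artist_list

friend_list, artist_list = to_1nf(friend_info)
for row in artist_list:
    print(row["Id"], row["FavouriteArtist"])
# 10 Lata
# 10 Kishore
# 20 Kishore
# 20 Rafi
```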

2) Second Normal Form (2NF)

A relation is said to be in 2NF if it is in 1NF and every non key attribute is fully functionally
dependent on the key. Fully functionally dependent means depending on the whole key, and not
on any part of it (i.e. not on a proper subset of the key).
Thus every field in a row must be dependent on the whole of the primary key. If a 1NF relation has a
simple primary key (i.e. one field as key) then it will be in 2NF, but if it has a composite primary key (i.e.
more than one field combined as key) then it may or may not be in 2NF.

Ex: table not in 2NF

Empno Projno Totalhrs Ename Pname Ploc

Empno and Projno together form the composite primary key, but Ename depends only on Empno, and Pname and
Ploc depend only on Projno. They do not depend on the whole of the primary key, so the table is not in 2NF
and must be decomposed.
Tables in 2NF
Empproj1

Empno Projno Totalhrs

Empproj2

Empno Ename

Empproj3

Projno Pname Ploc
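
A partial dependency can be detected by computing the closure of each proper subset of the key and checking whether it reaches any non key attribute. The sketch below assumes the FDs suggested by the decomposition above (Empno → Ename, Projno → Pname, Ploc, and Empno, Projno → Totalhrs), and it treats the given primary key as the only candidate key, which is a simplification. The function names are invented for this example.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under fds (pairs of attribute tuples)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def partial_dependencies(schema, key, fds):
    """Map each proper subset of the key to the non key attributes it determines."""
    key = set(key)
    non_key = set(schema) - key
    found = {}
    for size in range(1, len(key)):
        for part in combinations(sorted(key), size):
            dependent = closure(part, fds) & non_key
            if dependent:
                found[part] = sorted(dependent)
    return found

schema = ("Empno", "Projno", "Totalhrs", "Ename", "Pname", "Ploc")
fds = [  # assumed FDs, read off the decomposition in the example
    (("Empno", "Projno"), ("Totalhrs",)),
    (("Empno",), ("Ename",)),
    (("Projno",), ("Pname", "Ploc")),
]
print(partial_dependencies(schema, ("Empno", "Projno"), fds))
# {('Empno',): ['Ename'], ('Projno',): ['Ploc', 'Pname']}
```

An empty result would mean that, under these assumptions, the relation has no partial dependencies and is in 2NF.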

3) Third Normal Form (3NF)

A relation is said to be in 3NF if it is in 2NF and no non key attribute is transitively dependent on the key
through some other non key attribute, i.e. all transitive dependencies are removed.
A transitive dependency arises when a non key attribute is functionally dependent on another non key
attribute. Key attributes, however, may depend on non key attributes, i.e. interdependencies may exist.
Example:
Table not in 3NF:
Empinfo

Empid Empname Address Deptno Dname

Empid is the primary key, but Dname depends on Deptno (a non key attribute), so Empid → Deptno → Dname is a
transitive dependency. It must be removed by decomposing the table.
Tables in 3NF
Empinfo1

Empid Empname Address Deptno

Empinfo2

Deptno Dname

4) Boyce Codd Normal Form (BCNF)

• A relation schema R is in BCNF if it is in 3NF and satisfies the additional constraint that for every
non-trivial FD X → A, X must be a superkey (i.e. X must contain a candidate key).
• It is a stricter form of 3NF, i.e. every BCNF relation is in 3NF, but a 3NF relation may or may not be in BCNF.
• In decomposing to BCNF we may have to sacrifice some functional dependencies.
• A violation is usually encountered when there are interdependencies, i.e. one attribute depends on another
and the other attribute depends on the first.

A 3NF relation may fail to be in BCNF when the following conditions hold:
1. The candidate keys are composite.
2. There is more than one candidate key in the relation.
3. The candidate keys have some attributes in common.

Example : Table not in BCNF


ProfessorCode Department HeadOfDepartment PercentTime
P1 Physics Ghosh 50
P1 Mathematics Krishnan 50
P2 Chemistry Rao 25
P2 Physics Ghosh 75
P3 Mathematics Krishnan 100

Consider, as an example, the above relation. It is assumed that:


1. A professor can work in more than one department
2. The percentage of the time he spends in each department is given.
3. Each department has only one Head of Department.
The relation diagram for the above relation is given as the following:

The given relation is in 3NF. Observe, however, that the names of Dept. and Head of Dept. are duplicated.
Further, if Professor P2 resigns, rows 3 and 4 are deleted, and we lose the information that Rao is the Head of
Department of Chemistry.
The normalization of the relation is done by creating a new relation for Dept. and Head of Dept. and deleting
Head of Dept. from the given relation. The normalized relations are shown in the following.
ProfessorCode Department PercentTime
P1 Physics 50
P1 Mathematics 50
P2 Chemistry 25
P2 Physics 75
P3 Mathematics 100

Department HeadOfDepartment
Physics Ghosh
Mathematics Krishnan
Chemistry Rao

See the dependency diagrams for these new relations.
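
The BCNF condition can be checked with attribute closures: a non-trivial FD violates BCNF when the closure of its left hand side does not cover the whole relation. The following sketch assumes the FDs implied by the stated assumptions (ProfessorCode, Department → PercentTime and Department → HeadOfDepartment); the function names are invented, and each decomposed relation is checked only against the FDs that fall entirely inside it, which is a simplification.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds (pairs of attribute tuples)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def bcnf_violations(schema, fds):
    """Return the non-trivial FDs whose left hand side is not a superkey."""
    schema = set(schema)
    violations = []
    for lhs, rhs in fds:
        trivial = set(rhs) <= set(lhs)
        if not trivial and not schema <= closure(lhs, fds):
            violations.append((lhs, rhs))
    return violations

fds = [  # assumed from the three stated assumptions
    (("ProfessorCode", "Department"), ("PercentTime",)),
    (("Department",), ("HeadOfDepartment",)),
]
original = ("ProfessorCode", "Department", "HeadOfDepartment", "PercentTime")
print(bcnf_violations(original, fds))
# [(('Department',), ('HeadOfDepartment',))]

# After decomposition, neither relation has a violating FD
print(bcnf_violations(("ProfessorCode", "Department", "PercentTime"), fds[:1]))  # []
print(bcnf_violations(("Department", "HeadOfDepartment"), fds[1:]))              # []
```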

Fourth Normal Form (4NF)

A relation schema R is said to be in 4NF if for every multivalued dependency X ↠ Y that holds over R,
one of the following is true
i. Y is a subset of X, or XY = R (i.e. the dependency is trivial).
ii. X is a super key.
A record type should not contain two or more independent multi-valued facts about an entity.

If a relation contains multivalued dependencies, i.e. the values of one column are linked with multiple values
of other columns independently, then the relation is not in 4NF. If all possible combinations of the values of
the columns can exist then it is not in 4NF, and must be decomposed. Thus 4NF is based on multivalued
dependency.

Example: In the example there are two many-to-many relationships, i.e. one between employees and
skills, and one between employees and languages.
Example: Table not in 4NF
Employee
Employee   Skill             Language
Smith      Type              English
Smith      Public Speaking   Hindi
Jack       Type              English
Jack       ShortHand         Hindi

The above relation should be divided into the following relations in order to satisfy 4NF.
Tables in 4NF

EmpSkill
Employee   Skill
Smith      Type
Smith      Public Speaking
Jack       Type
Jack       ShortHand

EmpLang
Employee   Language
Smith      English
Smith      Hindi
Jack       English
Jack       Hindi

Multivalued dependency is denoted as ↠

Fifth Normal Form (5NF)

A relation schema R is said to be in 5NF if for every join dependency {R1, R2, …, Rn} that holds over R, one of
the following is true
i. Ri = R for some i.
ii. The join dependency is implied by the set of those FDs over R in which the left side is a key of R.

If not all combinations of the values of the columns can exist (i.e. the columns are linked by some
restriction), then the relation is not in 5NF and must be decomposed, along with an extra join restriction
table. Thus 5NF is based on join dependency. It is also known as Project Join Normal Form (PJNF).

Example:
If an agent sells a certain product, and he represents a company making that product, then he sells that
product for that company.
Table not in 5NF
Tab1
Agent   Company   Product
Smith   Ford      Car
Smith   Ford      Truck
Smith   GM        Car
Smith   GM        Truck
Jones   Ford      Car

Tables in 5NF

Tab2
Agent   Company
Smith   Ford
Smith   GM
Jones   Ford

Tab3
Company   Product
Ford      Car
Ford      Truck
GM        Car
GM        Truck

Tab4
Agent   Product
Smith   Car
Smith   Truck
Jones   Car
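
The role of the third table can be seen by joining the projections. In the sketch below (plain Python sets of tuples, with helper names invented for the example), joining only Tab2 and Tab3 produces one spurious row, while adding Tab4 as the restriction recovers exactly Tab1.

```python
tab1 = {
    ("Smith", "Ford", "Car"),
    ("Smith", "Ford", "Truck"),
    ("Smith", "GM", "Car"),
    ("Smith", "GM", "Truck"),
    ("Jones", "Ford", "Car"),
}

def project(rows, positions):
    """Project a set of tuples onto the given column positions."""
    return {tuple(row[i] for i in positions) for row in rows}

tab2 = project(tab1, (0, 1))  # Agent, Company
tab3 = project(tab1, (1, 2))  # Company, Product
tab4 = project(tab1, (0, 2))  # Agent, Product

# Join Tab2 and Tab3 on Company
two_way = {(agent, company, product)
           for agent, company in tab2
           for company2, product in tab3
           if company == company2}
print(sorted(two_way - tab1))  # [('Jones', 'Ford', 'Truck')]

# Restrict by Tab4 (Agent, Product) to complete the three-way join
three_way = {row for row in two_way if (row[0], row[2]) in tab4}
print(three_way == tab1)       # True
```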

Note:
• If a relation has a join dependency then it can be divided into smaller relations such that if we
combine the smaller relations then we get the original table back.
• If the join dependency doesn't exist then either data is lost or new (spurious) entries are created.
• Join dependency is a further generalization of multivalued dependency.
• If the join of R1 and R2 over C is equal to relation R then we can say that a join dependency (JD)
exists, where R1(A, B, C) and R2(C, D) are the decomposition of a given relation R(A, B, C, D).
• Alternatively, R1 and R2 are a lossless decomposition of R.

Conclusion regarding 5NF:

• If decomposing a table into smaller tables leads to some loss of information, or creates some
additional information, then we should not decompose it.
• But if breaking down the table doesn't lead to information loss, and by using the decomposed tables
we can still verify all the facts about the data, then we should decompose the relation.
• For a relation R(A,B,C), if there are multivalued dependencies between A & B and between A & C, where B
and C are independent of each other, then 4th Normal Form is applied.
• For a relation R(A,B,C), if there are multivalued dependencies between A & B and between A & C, where B
and C are interlinked by some restriction, then one extra restriction table is created.

The join of the tables depends on the values of the restriction table.


Properties of Decomposition
Decomposition is a process of splitting a relation or table into subtables, such that they are not
disjoint (i.e. they have common attributes). The basic desirable properties of decomposition are
i. Attribute Preservation
It involves preserving all attributes or columns that were present in the relation that is being
decomposed, i.e. all columns must be present in the union of all the decomposed tables. All
normal forms satisfy this property.

ii. Lossless join decomposition (Non additive join decomposition)

• It guarantees that the join of all the decomposed tables will result in exactly the same relation
that was originally decomposed (i.e. exactly the same rows must be present in the join of all the
decomposed tables).
• It guarantees that spurious tuples are not generated with respect to the relation
schemas created after decomposition.
The join of the decomposed tables is done through the common attribute, which acts as a link
or glue that gives the ability to find the relationship between different tables. If the common attribute
is the primary key of at least one of the two decomposed tables then the problem of losing
information doesn't exist. Whereas if it is a non key attribute in both the tables then some loss of
information may occur. A decomposition of a relation into subrelations which do not
give back the exact original records on joining the decomposed tables is known as a lossy
decomposition (i.e. the join gives either more or fewer rows). All normal forms satisfy
this property.

Algorithm to check for lossless decomposition

Step 1: Construct a matrix, where the columns represent the fields of the original table and the rows
represent the decomposed tables (one row per table).
Step 2: Fill every cell of the matrix with 'x', treating each 'x' as a distinct unknown value.
Step 3: For each decomposed table repeat
Change the value of the cell to 'y' for all the fields belonging to that table, in its
corresponding row.
Step 4: For each functional dependency X → Y repeat, until the matrix stops changing
If two rows have the same values in all the columns of X, make them agree in the columns of Y
as well: if either row has a 'y' there, set it to 'y' in both rows, otherwise treat their two 'x'
cells as the same unknown value.
Step 5: If any of the rows has all 'y' values then the decomposition is lossless, otherwise it is lossy.
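
The matrix test can be sketched as follows. This is an illustrative version of the standard chase procedure, which equates values in rows that agree on the left hand side of an FD; it assumes single-letter attribute names, and the function name is_lossless is chosen for this example. Each 'x' cell carries its row number so that different unknowns stay distinct until an FD forces them to be equal.

```python
def is_lossless(schema, parts, fds):
    """Matrix (chase) test: True if the decomposition into parts is lossless."""
    cols = list(schema)
    # One row per decomposed table: ('y', col) if the table has the column,
    # otherwise a distinct unknown ('x', row number, col)
    table = [[("y", c) if c in part else ("x", i, c) for c in cols]
             for i, part in enumerate(parts)]
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            left = [cols.index(c) for c in lhs]
            right = [cols.index(c) for c in rhs]
            for r1 in table:
                for r2 in table:
                    if r1 is r2 or any(r1[i] != r2[i] for i in left):
                        continue             # rows do not agree on the LHS
                    for j in right:
                        if r1[j] != r2[j]:
                            keep = max(r1[j], r2[j])   # prefers 'y' over 'x'
                            drop = min(r1[j], r2[j])
                            for row in table:          # equate the two values
                                if row[j] == drop:
                                    row[j] = keep
                            changed = True
    return any(all(cell[0] == "y" for cell in row) for row in table)

# R(A, B, C, D) split into R1(A, B, C) and R2(C, D)
print(is_lossless("ABCD", ["ABC", "CD"], [("C", "D")]))   # True
print(is_lossless("ABCD", ["ABC", "CD"], [("A", "BC")]))  # False
```

In the first call C → D makes the common attribute C a key of R2, so the decomposition is lossless; in the second call C determines nothing, so the join may create spurious rows.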

iii. Dependency Preservation

• A dependency is a constraint on the database. If all the attributes appearing on the LHS and RHS
of a dependency appear together in one of the decomposed relations, then the dependency is
preserved.
• All the dependencies of the original relation should be obtainable by combining the dependencies of each
individual decomposed table.
• Decomposition into the normal forms up to 3NF can always satisfy this property, but a decomposition into
BCNF may or may not satisfy it.
• It is an important constraint of the database.
• For dependency preservation, every dependency must be satisfied by at least one decomposed
table.
• If a relation R is decomposed into relations R1 and R2, then the dependencies of R
must either be a part of R1 or R2, or be derivable from the combination of the
functional dependencies of R1 and R2.
Example:
Relation R(A, B, C, D) with FD (A → BC)
Now if R is decomposed into R1(ABC) and R2(AD) then the decomposition is dependency preserving because
FD A → BC is a part of relation R1(ABC).
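
Dependency preservation can be tested without computing the projected FDs explicitly. The sketch below uses a standard textbook procedure, which is not spelled out in these notes: for each FD X → Y, grow X by repeatedly taking closures restricted to each decomposed relation, and check that Y is reached. Single-letter attribute names and the function names are assumptions of this example.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds (pairs of attribute strings)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def preserves_dependencies(fds, parts):
    """True if every FD can be enforced using the decomposed relations alone."""
    for lhs, rhs in fds:
        result = set(lhs)
        changed = True
        while changed:
            changed = False
            for part in parts:
                part = set(part)
                # what this relation alone lets us derive from the known attributes
                gained = closure(result & part, fds) & part
                if not gained <= result:
                    result |= gained
                    changed = True
        if not set(rhs) <= result:
            return False
    return True

# R(A, B, C, D) with A -> BC, decomposed into R1(ABC) and R2(AD)
print(preserves_dependencies([("A", "BC")], ["ABC", "AD"]))            # True
# R(A, B, C) with A -> B and B -> C, decomposed into R1(AB) and R2(AC)
print(preserves_dependencies([("A", "B"), ("B", "C")], ["AB", "AC"]))  # False
```

In the second case B → C is lost, because B and C never appear together in one decomposed relation and the FD cannot be derived from the FDs that remain.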

iv. Lack of Redundancy


Redundancy may lead to inconsistency, thus such repetitions of data should be avoided as
much as possible.
