
Unit 5

Relational Database Design

Note: Please refer to the course book to explore more.
Relational Database Design
The goal of relational database design is to generate a set of relation
schemas that allows us to store information without unnecessary
redundancy, yet also allows us to retrieve information easily.
This is accomplished by designing schemas that are in an appropriate
normal form.
To determine whether a relation schema is in one of the desirable
normal forms, we need information about the real-world enterprise
that we are modeling with the database.

Prepared By Kul P. Paudel,Relational Database Design


Pitfalls in Relational DB Design
A bad design may have several properties, including:
• Repetition of information.
• Inability to represent certain information.
• Loss of information.



Combine Schemas?
• Suppose we combine instructor and department into inst_dept
• (No connection to relationship set inst_dept)
• The result is possible repetition of information: the building and budget of a department are repeated for every instructor in that department



A Combined Schema Without Repetition
• Consider combining the relations
sec_class(sec_id, building, room_number) and
section(course_id, sec_id, semester, year)
into one relation
section(course_id, sec_id, semester, year, building, room_number)
• No repetition in this case



Decompose Schema into Multiple Tables
• Suppose we had started with inst_dept. How would we know to split it up
(decompose it) into instructor and department?
• Write a rule: "if there were a schema (dept_name, building, budget), then
dept_name would be a candidate key"
• Denote this as a functional dependency:
dept_name → building, budget
• In inst_dept, because dept_name is not a candidate key, the building and
budget of a department may have to be repeated.
• This indicates the need to decompose inst_dept.
• Not all decompositions are good. Suppose we decompose
employee(ID, name, street, city, salary) into
employee1(ID, name)
employee2(name, street, city, salary)
• If two employees have the same name, we lose information: we cannot
reconstruct the original employee relation, so this is a lossy
decomposition.
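To make the lossy case concrete, here is a minimal Python sketch. It assumes a relation is a set of tuples, each tuple stored as a frozenset of (attribute, value) pairs; the two employee rows are hypothetical sample data, not from the course material. Joining employee1 and employee2 on name pairs each "Kim" with both addresses, producing tuples that were never in employee.

```python
# Hypothetical sample data (not from the slides): two different
# employees who happen to share the name "Kim".
employee = [
    {"ID": 101, "name": "Kim", "street": "Main", "city": "Perryridge", "salary": 75000},
    {"ID": 102, "name": "Kim", "street": "North", "city": "Hampton", "salary": 67000},
]

def as_relation(rows):
    """Represent each tuple as a frozenset of (attribute, value) pairs."""
    return {frozenset(row.items()) for row in rows}

def project(relation, attrs):
    """Projection: keep only attrs; the set removes duplicate tuples."""
    return {frozenset((a, v) for a, v in t if a in attrs) for t in relation}

def natural_join(r1, r2):
    """Natural join: combine tuples that agree on all common attributes."""
    result = set()
    for t1 in r1:
        for t2 in r2:
            d1, d2 = dict(t1), dict(t2)
            if all(d1[a] == d2[a] for a in d1.keys() & d2.keys()):
                result.add(t1 | t2)
    return result

r = as_relation(employee)
employee1 = project(r, {"ID", "name"})
employee2 = project(r, {"name", "street", "city", "salary"})
rejoined = natural_join(employee1, employee2)

print(len(r), len(rejoined))  # 2 4: the join creates two spurious tuples
print(rejoined == r)          # False: the original relation is not recovered
```

The rejoined relation still contains both original tuples, but it also contains two spurious ones, so we can no longer tell which street and salary belong to which ID.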
Decomposition
Decomposition in a DBMS removes redundancy, anomalies and
inconsistencies from a database by dividing a table into multiple
tables.
Lossy Decomposition: A decomposition is lossy if the original relation
cannot be reconstructed by joining the decomposed relations. The join
produces extra (spurious) tuples, so information is lost.

Lossless Decomposition: A decomposition is lossless if it is feasible to
reconstruct relation R from the decomposed tables using joins. This is the
preferred choice, because no information is lost when the relation is
decomposed.



Lossy Decomposition Example



Example of Lossless-Join Decomposition
• Decomposition of R = (A, B, C) into
R1 = (A, B) and R2 = (B, C)

r:
A B C
α 1 A
β 2 B

Π_{A,B}(r):
A B
α 1
β 2

Π_{B,C}(r):
B C
1 A
2 B

Π_{A,B}(r) ⋈ Π_{B,C}(r):
A B C
α 1 A
β 2 B

The join returns exactly the tuples of r, so no information is lost.

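The same check can be run in code. This sketch uses the relation r from the lossless-join example and hypothetical helpers project and natural_join (relations as sets of frozensets of (attribute, value) pairs, an assumption for illustration). It only shows that this particular instance is reconstructed; proving that a decomposition is lossless for every legal instance requires reasoning with functional dependencies.

```python
# Relation r(A, B, C) from the example; each tuple is a frozenset of
# (attribute, value) pairs so that tuples can be stored in a set.
r = {
    frozenset({("A", "α"), ("B", 1), ("C", "A")}),
    frozenset({("A", "β"), ("B", 2), ("C", "B")}),
}

def project(relation, attrs):
    """Projection: keep only attrs; the set removes duplicate tuples."""
    return {frozenset((a, v) for a, v in t if a in attrs) for t in relation}

def natural_join(r1, r2):
    """Natural join: combine tuples that agree on all common attributes."""
    result = set()
    for t1 in r1:
        for t2 in r2:
            d1, d2 = dict(t1), dict(t2)
            if all(d1[a] == d2[a] for a in d1.keys() & d2.keys()):
                result.add(t1 | t2)
    return result

r1 = project(r, {"A", "B"})   # Π_{A,B}(r)
r2 = project(r, {"B", "C"})   # Π_{B,C}(r)
print(natural_join(r1, r2) == r)  # True: this instance is reconstructed exactly
```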


Goal — Devise a Theory for the Following
• Decide whether a particular relation R is in “good” form.
• In the case that a relation R is not in “good” form, decompose it into a
set of relations {R1, R2, ..., Rn} such that
• each relation is in good form
• the decomposition is a lossless-join decomposition
• Our theory is based on:
• functional dependencies
• multivalued dependencies

Note: Functional dependencies are covered in the next topic.


Anomalies
Anomalies are problems that can occur in poorly planned, unnormalized
databases where all the data is stored in one table (a flat-file
database).
• Insertion anomaly: occurs when certain attributes cannot be inserted
into the database without the presence of other attributes.
• Deletion anomaly: occurs when deleting some values also causes the loss
of other information.
• Update anomaly: occurs when the same data is stored redundantly, so
updating it in only one place causes a partial update and inconsistent
data.



Table Design
Teach_id | Teacher_name | Department | Course_name
T1010 | Mr. R K Agrawal | Science & Technology | Physics
T1011 | Mr. Shyam Krishna Mahat | Management | Economics
T1012 | Mr. R K Agrawal | Science & Technology | Biology
T1013 | Ms. Anjila Bhatta | Information Technology | Java Programming

• Insertion anomaly: If a new teacher is added but is not yet assigned to
any department or course, and null entries are not allowed in the
database, the teacher cannot be inserted.
• Deletion anomaly: If the Management department is deleted, the record of
its teacher (Mr. Shyam Krishna Mahat) and the course name (Economics) are
also deleted from the database.
• Update anomaly: If the department assigned to R K Agrawal is an error, it
needs to be updated in two places to keep the data consistent.
Functional Dependencies
A functional dependency is a relationship that exists when one
attribute uniquely determines another attribute.

A functional dependency (FD) is a constraint between two sets of
attributes in a relation. It says that if two tuples have the same
values for attributes A1, A2, ..., An, then those two tuples must also
have the same values for attributes B1, B2, ..., Bn.

If A is the determinant and B is the dependent, we say that A
functionally determines B and write this as A → B. A → B can also be
read as "B is functionally determined by A".



Procedure to Compute F+
F+ = F
repeat
    for each functional dependency f in F+
        apply reflexivity and augmentation rules on f
        add the resulting functional dependencies to F+
    for each pair of functional dependencies f1 and f2 in F+
        if f1 and f2 can be combined using transitivity
            add the resulting functional dependency to F+
until F+ does not change any further
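The procedure above can be sketched in Python for very small schemas. This is a brute-force illustration, not an efficient algorithm: F+ can contain up to 4^n dependencies for n attributes. Assumptions: attributes are single letters, FDs are given as (lhs, rhs) strings, and reflexivity is applied once at the start because it does not depend on F. The names compute_f_plus and subsets are hypothetical.

```python
from itertools import combinations

def subsets(attrs):
    """All subsets of attrs (including the empty set) as frozensets."""
    items = sorted(attrs)
    return [frozenset(c) for k in range(len(items) + 1)
            for c in combinations(items, k)]

def compute_f_plus(R, F):
    """Compute F+ by applying Armstrong's axioms until nothing new is added.

    R is a string of single-letter attributes; F is a list of (lhs, rhs)
    strings. Each FD is stored as a (frozenset, frozenset) pair.
    """
    all_sets = subsets(R)
    f_plus = {(frozenset(lhs), frozenset(rhs)) for lhs, rhs in F}
    # Reflexivity: alpha -> beta for every beta that is a subset of alpha.
    for alpha in all_sets:
        for beta in subsets(alpha):
            f_plus.add((alpha, beta))
    while True:
        new = set(f_plus)
        # Augmentation: alpha -> beta gives gamma alpha -> gamma beta.
        for lhs, rhs in f_plus:
            for gamma in all_sets:
                new.add((lhs | gamma, rhs | gamma))
        # Transitivity: alpha -> beta and beta -> gamma give alpha -> gamma.
        for lhs1, rhs1 in f_plus:
            for lhs2, rhs2 in f_plus:
                if rhs1 == lhs2:
                    new.add((lhs1, rhs2))
        if new == f_plus:  # F+ did not change any further
            return f_plus
        f_plus = new

fp = compute_f_plus("ABC", [("A", "B"), ("B", "C")])
print((frozenset("A"), frozenset("C")) in fp)    # True: transitivity
print((frozenset("C"), frozenset("A")) in fp)    # False: nothing determines A from C
fp2 = compute_f_plus("ABC", [("A", "B"), ("A", "C")])
print((frozenset("A"), frozenset("BC")) in fp2)  # True: the union rule
```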



Functional Dependencies
Armstrong’s axioms are an important set of inference rules for
functional dependencies.
For a relation R with attribute sets X, Y and Z, Armstrong’s axioms
state:

Axiom of Transitivity: If X -> Y and Y -> Z, then X -> Z
Axiom of Reflexivity (Subset Property): If Y is a subset of X, then X -> Y
Axiom of Augmentation: If X -> Y, then XZ -> YZ



FD Example
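instructor_id | name | phone
1010 | Ram Krishna Karki | 9851112130
1020 | Subodh Satyal | 9801112134
1030 | Smiriti Shah | 9841424344
1040 | Ram Krishna Karki | 9803214315

instructor_id → name (instructor_id functionally determines name)

instructor_id → phone (instructor_id functionally determines phone)

The reverse does not hold: name does not functionally determine
instructor_id, because Ram Krishna Karki appears with two different IDs.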
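The definition of an FD can be tested against a particular table instance. The hypothetical helper fd_holds below checks that any two rows agreeing on the left-hand side also agree on the right-hand side, using the instructor rows above. A sample instance can only show that an FD is violated; whether an FD holds in general is a statement about the real-world data, not about one instance.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if lhs -> rhs holds in this instance: any two rows that
    agree on every lhs attribute also agree on every rhs attribute."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        if key in seen and seen[key] != value:
            return False  # same determinant, different dependent values
        seen.setdefault(key, value)
    return True

instructor = [
    {"instructor_id": 1010, "name": "Ram Krishna Karki", "phone": "9851112130"},
    {"instructor_id": 1020, "name": "Subodh Satyal", "phone": "9801112134"},
    {"instructor_id": 1030, "name": "Smiriti Shah", "phone": "9841424344"},
    {"instructor_id": 1040, "name": "Ram Krishna Karki", "phone": "9803214315"},
]

print(fd_holds(instructor, ["instructor_id"], ["name"]))   # True
print(fd_holds(instructor, ["instructor_id"], ["phone"]))  # True
print(fd_holds(instructor, ["name"], ["instructor_id"]))   # False
```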



Closure of a Set of Functional Dependencies
• Given a set F of functional dependencies, there are certain other functional
dependencies that are logically implied by F.
E.g. If A → B and B → C, then we can infer that A → C
The set of all functional dependencies logically implied by F is the closure of
F.
We denote the closure of F by F+.
We can find all of F+ by applying Armstrong’s Axioms:
if β ⊆ α, then α → β (reflexivity)
if α → β, then γ α → γ β (augmentation)
if α → β, and β → γ, then α → γ (transitivity)



Example of FD
R = (A, B, C, G, H, I)
Given F = { A → B ,A → C, CG → H, CG → I, B → H}
Some members of F+:
A → H: by transitivity from A → B and B → H
AG → I: by augmenting A → C with G to get AG → CG, and then applying
transitivity with CG → I
CG → HI: from CG → H and CG → I. This "union rule" can be inferred from
the definition of functional dependencies, or by (1) augmenting CG → I
with CG to infer CG → CGI, (2) augmenting CG → H with I to infer
CGI → HI, and then (3) applying transitivity.



Closure of Attribute Sets
The set of attributes that are functionally determined by an attribute
A is called the attribute closure of A, and it is written A+.
R(A, B, C, D)
Given: A → B, B → D, C → B
Closure of attribute A = A+: the set of attributes that can be
determined by A.
A+ = ABD (A → B, then B → D)
Hence the closure of A is ABD.



An algorithm for computing α+:
result := α
repeat
    temp := result
    for each functional dependency β → γ in F do
        if β ⊆ result then
            result := result ∪ γ
until temp = result
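The closure algorithm above translates almost line by line into Python. In this sketch, attributes are assumed to be single letters, so a set of attributes can be written as a string such as "AB", and F is a list of (β, γ) pairs; attribute_closure is a hypothetical name.

```python
def attribute_closure(alpha, F):
    """Compute alpha+ under F.

    alpha: a string (or set) of single-letter attributes, e.g. "AB".
    F: a list of (beta, gamma) pairs of such strings, e.g. [("A", "B")].
    """
    result = set(alpha)                 # result := alpha
    while True:                         # repeat
        temp = set(result)              #   temp := result
        for beta, gamma in F:           #   for each beta -> gamma in F
            if set(beta) <= result:     #     if beta is a subset of result
                result |= set(gamma)    #       result := result ∪ gamma
        if temp == result:              # until temp = result
            return result

F1 = [("A", "B"), ("B", "D"), ("C", "B")]
print(sorted(attribute_closure("A", F1)))  # ['A', 'B', 'D']

F2 = [("AB", "C"), ("B", "D"), ("C", "E"), ("D", "A")]
print(sorted(attribute_closure("B", F2)))  # ['A', 'B', 'C', 'D', 'E']
```

The first call reproduces the A+ = ABD example; the second reproduces the B+ example that follows.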



Find the closure of the following:
Relation R: (A, B, C, D, E)
Given FDs {AB → C, B → D, C → E, D → A}, find B+:
{B}
{B, D}: B → D
{B, D, A}: D → A
{B, D, A, C}: AB → C
{B, D, A, C, E}: C → E
Hence B+ = {A, B, C, D, E}.

Find (C, D)+
Find (B, C)+
Problem
• Consider the given functional dependencies-
AB → CD
AF → D
DE → F
C→G
F→E
G→A



Which of the following options are false?

i. { CF }+ = { A , C , D , E , F , G }
ii. { BG }+ = { A , B , C , D , G }
iii. { AF }+ = { A , C , D , E , F , G }
iv. { AB }+ = { A , C , D , F ,G }



Solution for: { CF }+ = { A , C , D , E , F , G }

{ CF }+ = { C , F }
= { C , F , G } ( Using C → G )
= { C , E , F , G } ( Using F → E )
= { A , C , E , F , G } ( Using G → A )
= { A , C , D , E , F , G } ( Using AF → D )

The result matches the given closure, so option (i) is true.



Solution for: { BG }+ = { A , B , C , D , G }
{ BG }+ = { B , G }
= { A , B , G } ( Using G → A )
= { A , B , C , D , G } ( Using AB → CD )

The result is the same as the given closure, so option (ii) is true.



Solution for: { AF }+ = { A , C , D , E , F , G }
• { AF }+ = { A , F }
• = { A , D , F } ( Using AF → D )
• = { A , D , E , F } ( Using F → E )

Since the result does not match the given closure, option (iii) is false.



Solution for: { AB }+ = { A , C , D , F ,G }
{ AB }+ = { A , B }
= { A , B , C , D } ( Using AB → CD )
= { A , B , C , D , G } ( Using C → G )

Since the result does not match the given closure, option (iv) is false.
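The four options can also be checked mechanically. This sketch repeats the attribute closure algorithm (single-letter attributes and FDs as (lhs, rhs) strings, both assumptions for illustration) and compares each computed closure with the claimed one.

```python
def attribute_closure(alpha, F):
    """alpha+ under F; attributes are single letters."""
    result = set(alpha)
    while True:
        temp = set(result)
        for beta, gamma in F:
            if set(beta) <= result:
                result |= set(gamma)
        if temp == result:
            return result

F = [("AB", "CD"), ("AF", "D"), ("DE", "F"), ("C", "G"), ("F", "E"), ("G", "A")]
options = {
    "i":   ("CF", "ACDEFG"),
    "ii":  ("BG", "ABCDG"),
    "iii": ("AF", "ACDEFG"),
    "iv":  ("AB", "ACDFG"),
}

results = {}
for label, (alpha, claimed) in options.items():
    actual = attribute_closure(alpha, F)
    results[label] = actual == set(claimed)
    print(label, "".join(sorted(actual)), results[label])
# i ACDEFG True
# ii ABCDG True
# iii ADEF False
# iv ABCDG False
```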



Candidate Key: A candidate key is a minimal set of attributes of a relation
that can be used to identify a tuple uniquely.
Note: A candidate key may or may not be chosen as the primary key.

Super Key: A super key is a set of attributes of a relation that can be
used to identify a tuple uniquely.

Note: Every candidate key is a super key, but not vice versa.



Closure of Attribute Sets (contd.)

Finding Candidate Keys

R(A, B, C, D, E, F)
The set of FDs is given below:
A → C
C → D
D → B
E → F
Find all possible candidate keys.
List all the attributes that are not present on the RHS of any FD:
A, E
Now find the closure of AE: (AE)+ = AECDBF
All attributes are determined by AE, so AE is our candidate key. Since no
FD determines A or E, every candidate key must contain both, so AE is the
only candidate key.
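The method above (start from the attributes that never appear on a right-hand side, then take closures) can be automated with a brute-force search. This is a sketch, not the course's prescribed procedure: it starts from those required attributes, adds other attributes in increasing numbers, and keeps only minimal superkeys. It assumes single-letter attributes; candidate_keys is a hypothetical name. It is exponential in the number of attributes, so it only suits small exercises.

```python
from itertools import combinations

def attribute_closure(alpha, F):
    """alpha+ under F; F is a list of (lhs, rhs) strings of single letters."""
    result = set(alpha)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in F:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def candidate_keys(R, F):
    """All candidate keys (minimal superkeys) of R, found by brute force."""
    rhs_attrs = set().union(*(set(rhs) for _, rhs in F))
    core = set(R) - rhs_attrs          # never on a right-hand side: in every key
    others = sorted(set(R) - core)
    keys = []
    for k in range(len(others) + 1):   # try smaller sets first
        for extra in combinations(others, k):
            cand = core | set(extra)
            if any(key <= cand for key in keys):
                continue               # contains a smaller key: not minimal
            if attribute_closure(cand, F) == set(R):
                keys.append(cand)
    return keys

F1 = [("A", "C"), ("C", "D"), ("D", "B"), ("E", "F")]
print(["".join(sorted(k)) for k in candidate_keys("ABCDEF", F1)])  # ['AE']
```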
Finding Candidate Keys and Super Keys of a Relation using an FD set

Staff(s_id, name, city, state)
FD: s_id → name, s_id → city, s_id → state, city → state
Closure of attribute sets:
(s_id)+ = (s_id, name, city, state)
(s_id, name)+ = (s_id, name, city, state)
(s_id, city)+ = (s_id, name, city, state)
(s_id, state)+ = (s_id, name, city, state)
(name)+ = (name)
(city)+ = (city, state)
(s_id)+, (s_id, name)+, (s_id, city)+ and (s_id, state)+ contain all the
attributes of the relation, so all of these sets are super keys.
The minimal super key here is s_id, so it is the candidate key.
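The super key and candidate key reasoning for Staff can be sketched in Python by enumerating every attribute subset and computing its closure. Attribute names are strings and each FD is a pair of sets; these representation choices and the closure helper are assumptions for illustration.

```python
from itertools import combinations

def closure(attrs, F):
    """Attribute closure; F is a list of (lhs_set, rhs_set) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in F:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

R = {"s_id", "name", "city", "state"}
F = [({"s_id"}, {"name"}), ({"s_id"}, {"city"}),
     ({"s_id"}, {"state"}), ({"city"}, {"state"})]

# Every attribute set whose closure is all of R is a super key.
superkeys = [set(c) for k in range(1, len(R) + 1)
             for c in combinations(sorted(R), k)
             if closure(c, F) == R]
# A candidate key is a super key with no smaller super key inside it.
candidates = [s for s in superkeys if not any(t < s for t in superkeys)]

print(len(superkeys))  # 8: every set that contains s_id
print(candidates)      # [{'s_id'}]
```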
• In a schema with attributes A, B, C, D and E, the following set of
functional dependencies is given:
{A -> B, A -> C, CD -> E, B -> D, E -> A}
Which of the following functional dependencies is NOT implied by
the above set?
A. CD -> AC
B. BD -> CD
C. BC -> CD
D. AC -> BC
• Answer: Using the FD set given in the question,
(CD)+ = {CDEAB}, which means CD -> AC also holds true.
(BD)+ = {BD}, which means BD -> CD cannot hold true. So this FD is not
implied by the FD set, and (B) is the required option.

Note: Check the remaining options in the same way.
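Whether an FD is implied can be tested with attribute closure: α → β is in F+ exactly when β ⊆ α+. The sketch below applies that test to all four options; it assumes single-letter attributes and uses a hypothetical helper implies.

```python
def attribute_closure(alpha, F):
    """alpha+ under F; attributes are single letters."""
    result = set(alpha)
    while True:
        temp = set(result)
        for beta, gamma in F:
            if set(beta) <= result:
                result |= set(gamma)
        if temp == result:
            return result

def implies(F, lhs, rhs):
    """lhs -> rhs is implied by F exactly when rhs is a subset of lhs+."""
    return set(rhs) <= attribute_closure(lhs, F)

F = [("A", "B"), ("A", "C"), ("CD", "E"), ("B", "D"), ("E", "A")]
options = {"A": ("CD", "AC"), "B": ("BD", "CD"),
           "C": ("BC", "CD"), "D": ("AC", "BC")}
implied = {label: implies(F, lhs, rhs) for label, (lhs, rhs) in options.items()}
print(implied)  # {'A': True, 'B': False, 'C': True, 'D': True}
```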


• Consider a relation scheme R = (A, B, C, D, E, H) on which the following
functional dependencies hold: {A -> B, BC -> D, E -> C, D -> A}. What are the
candidate keys of R?
(a) AE, BE
(b) AE, BE, DE
(c) AEH, BEH, BCH
(d) AEH, BEH, DEH
• Answer: (AE)+ = {ABECD}, which is not the set of all attributes:
AEB: A -> B
AEBC: E -> C
AEBCD: BC -> D
Hence AE cannot determine all the attributes of the given schema, so it is
not a candidate key. H does not appear on the right-hand side of any FD, so
every candidate key must contain H.
Now check the others in the same way.
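A listed set is a candidate key if its closure is all of R and removing any single attribute makes the closure smaller than R (checking single removals is enough, because any superset of a super key is also a super key). The sketch below checks every set in each option under those rules; is_candidate_key is a hypothetical name and attributes are assumed to be single letters. It verifies that each listed set is a key, not that the list is complete.

```python
def attribute_closure(alpha, F):
    """alpha+ under F; attributes are single letters."""
    result = set(alpha)
    while True:
        temp = set(result)
        for beta, gamma in F:
            if set(beta) <= result:
                result |= set(gamma)
        if temp == result:
            return result

def is_candidate_key(X, R, F):
    X, R = set(X), set(R)
    if attribute_closure(X, F) != R:
        return False  # not even a super key
    # Minimal: removing any one attribute must lose the super key property.
    return all(attribute_closure(X - {a}, F) != R for a in X)

R = "ABCDEH"
F = [("A", "B"), ("BC", "D"), ("E", "C"), ("D", "A")]
options = {
    "a": ["AE", "BE"],
    "b": ["AE", "BE", "DE"],
    "c": ["AEH", "BEH", "BCH"],
    "d": ["AEH", "BEH", "DEH"],
}
verdict = {label: all(is_candidate_key(k, R, F) for k in keys)
           for label, keys in options.items()}
print(verdict)  # {'a': False, 'b': False, 'c': False, 'd': True}
```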
Normalization
Normalization is a technique of organizing the data in the database.
It is a systematic approach of decomposing tables to eliminate data
redundancy and undesirable characteristics like insertion, update and
deletion anomalies.

Normalization is used mainly for two purposes:

• Eliminating redundant (useless) data.
• Ensuring data dependencies make sense, i.e., data is logically stored.



Normalization Process

First Normal Form (1NF)

Second Normal Form (2NF)

Third Normal Form (3NF)

Boyce-Codd Normal Form (BCNF)


First Normal Form
A table is said to be in 1NF when each cell of the table contains
precisely one value.
No two rows of data may contain a repeating group of information, i.e.,
each set of columns must have a unique value, such that multiple
columns cannot be used to fetch the same row.
eid | name | job | dep_id | knowledge
1 | Abhinav | Designer | 10 | HTML, CSS, jQuery
2 | Anjel | Network Engineer | 20 | CCNA, CISA

The given table is not in 1NF because the knowledge column holds more
than one value in a single cell.
Solution: split the table.
Cont..
eid | name | job | dep_id
1 | Abhinav | Designer | 10
2 | Anjel | Network Engineer | 20

eid | knowledge
1 | HTML
1 | CSS
1 | jQuery
2 | CCNA
2 | CISA
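The 1NF split can be sketched in Python. This assumes the multi-valued knowledge cell is stored as a comma-separated string; split_multivalued is a hypothetical helper that moves such a column into a separate (eid, knowledge) table, keeping eid as the link back to the employee row.

```python
rows = [
    {"eid": 1, "name": "Abhinav", "job": "Designer", "dep_id": 10,
     "knowledge": "HTML, CSS, jQuery"},
    {"eid": 2, "name": "Anjel", "job": "Network Engineer", "dep_id": 20,
     "knowledge": "CCNA, CISA"},
]

def split_multivalued(rows, column, key):
    """Move a comma-separated column into its own (key, column) table."""
    main, detail = [], []
    for row in rows:
        # Main table: every column except the multi-valued one.
        main.append({c: v for c, v in row.items() if c != column})
        # Detail table: one row per individual value.
        for value in row[column].split(","):
            detail.append({key: row[key], column: value.strip()})
    return main, detail

employees, knowledge = split_multivalued(rows, "knowledge", "eid")
for row in knowledge:
    print(row["eid"], row["knowledge"])
# 1 HTML
# 1 CSS
# 1 jQuery
# 2 CCNA
# 2 CISA
```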



Second Normal Form
A table is said to be in 2NF when it is in 1NF and every non-key attribute
in the row is functionally dependent on the whole key, not just part of
the key.
Consider the given table



This situation could lead to the following problems:
Insertion: The department of a particular employee cannot
be recorded until the employee is assigned a project.
Updating: For a given employee, the employee code and
department are repeated several times. Hence if an
employee is transferred to another department, this change
will have to be recorded in every row.
Deletion: If an employee completes work on a project, the
employee’s record will be deleted. The information regarding
the department to which the employee belongs will also be lost.



Second Normal Form
The given table satisfies the definition of 1NF.
Let us check whether it satisfies 2NF.
For each value of Ecode, there is more than one value of Hours, i.e.,
Hours is not functionally dependent on Ecode. Similarly for ProjCode.
However, for a combination of Ecode and ProjCode values, there is one
value of Hours. Hence Hours is functionally dependent on the whole key,
Ecode+ProjCode. Dept is functionally dependent on Ecode but not on
ProjCode. Thus Dept is functionally dependent on part of the key, not
on the whole key. Therefore the table Project is not in 2NF.



Second Normal Form
Guidelines for converting table to 2NF:
Find and remove attributes that are functionally dependent on only a
part of the key and not on the whole key. Place them, together with that
part of the key, in a different table.
Group the remaining attributes.
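The 2NF check and guidelines can be sketched in Python for the Project table. Since the table itself is not reproduced here, the attributes (Ecode, ProjCode, Dept, Hours) and the FDs Ecode → Dept and Ecode, ProjCode → Hours are taken from the explanation above; the sketch also assumes a single candidate key. partial_dependencies and decompose_2nf are hypothetical names.

```python
from itertools import combinations

def closure(attrs, F):
    """Attribute closure; F is a list of (lhs_set, rhs_set) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in F:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def partial_dependencies(key, non_prime, F):
    """Map each proper subset of the key to the non-prime attributes it determines."""
    found = {}
    for k in range(1, len(key)):
        for part in combinations(sorted(key), k):
            dependents = closure(part, F) & non_prime
            if dependents:
                found[part] = dependents
    return found

def decompose_2nf(attrs, key, F):
    """Follow the guidelines: move partially dependent attributes out."""
    non_prime = set(attrs) - set(key)
    partial = partial_dependencies(key, non_prime, F)
    tables, moved = [], set()
    for part, dependents in partial.items():
        tables.append(set(part) | dependents)  # part of key + what it determines
        moved |= dependents
    tables.append(set(attrs) - moved)          # the remaining attributes
    return tables

attrs = {"Ecode", "ProjCode", "Dept", "Hours"}
key = {"Ecode", "ProjCode"}
F = [({"Ecode"}, {"Dept"}), ({"Ecode", "ProjCode"}, {"Hours"})]
print(partial_dependencies(key, attrs - key, F))  # {('Ecode',): {'Dept'}}
print(decompose_2nf(attrs, key, F))
```

The result corresponds to two tables, (Ecode, Dept) and (Ecode, ProjCode, Hours); set printing order may vary.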



Third Normal Form
A relation is said to be in 3NF when it is in 2NF and every non-prime
attribute of the table is dependent on the primary key; in other words,
no non-prime attribute may be determined by another non-prime
attribute. Such transitive functional dependencies should be removed
from the table.
Consider the table Employee.



This situation could lead to the following problems:
Insertion: The department head of a new department that does not
have any employee at present cannot be inserted in the DeptHead
column.
Updating: For a given department, the code for a particular
department head is repeated several times. Hence, if a department
head moves to another department, the change will have to be made
consistently across the table.
Deletion: If the record of an employee is deleted, the information
regarding the head of the department will also be deleted.



Third Normal Form
This table satisfies the definition of 1NF and 2NF.
Let us check whether it satisfies 3NF:
The attribute DeptHead is also dependent on the attribute Dept.
According to the 3NF rule, all non-key attributes have to be
functionally dependent only on the primary key.
This table is not in 3NF since DeptHead is functionally
dependent on Dept, which is not the primary key.
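A 3NF test can be sketched in Python using the standard formal condition: for every nontrivial FD X → A, either X is a superkey or A is a prime attribute (part of some candidate key). The attributes Ecode, Dept and DeptHead and the FDs Ecode → Dept and Dept → DeptHead are assumed from the explanation above, since the Employee table is not reproduced. For testing a single relation it is enough to check the given FDs rather than all of F+. The helper names are hypothetical.

```python
from itertools import combinations

def closure(attrs, F):
    """Attribute closure; F is a list of (lhs_set, rhs_set) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in F:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_keys(R, F):
    """Minimal superkeys of R, by brute force over all subsets."""
    supers = [set(c) for k in range(1, len(R) + 1)
              for c in combinations(sorted(R), k) if closure(c, F) == R]
    return [s for s in supers if not any(t < s for t in supers)]

def third_nf_violations(R, F):
    prime = set().union(*candidate_keys(R, F))
    violations = []
    for lhs, rhs in F:
        is_superkey = closure(lhs, F) == R
        for a in sorted(rhs - lhs):          # nontrivial part of the FD
            if not is_superkey and a not in prime:
                violations.append((set(lhs), a))
    return violations

R = {"Ecode", "Dept", "DeptHead"}
F = [({"Ecode"}, {"Dept"}), ({"Dept"}, {"DeptHead"})]
print(third_nf_violations(R, F))  # [({'Dept'}, 'DeptHead')]
```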



Guidelines for converting a table to 3NF:
Find and remove non-key attributes that are functionally dependent
on attributes that are not the primary key. Place them in a different
table.
Group the remaining attributes.



Boyce-Codd Normal Form
The original definition of 3NF was inadequate in some situations. It was
not satisfactory for tables:
• that have multiple candidate keys,
• where the multiple candidate keys are composite, and
• where the multiple candidate keys overlap.

Note: Hence, a new normal form, Boyce-Codd Normal Form, was
introduced. In tables where the above three conditions do not apply, you
can stop at 3NF; in such cases, 3NF is the same as BCNF.



Cont…
A relation is in BCNF if and only if every determinant is a candidate key.
Consider the following table:

This table has redundancies. If the name of an employee is changed,
the change will have to be made in every row of the table; otherwise
there will be inconsistencies.



Cont…
Ecode + ProjCode is the primary key. Name + ProjCode could also be chosen
as the primary key, and hence is a candidate key.
Hours is functionally dependent on Ecode+ProjCode.
Hours is also functionally dependent on name+ProjCode.
Name is functionally dependent on Ecode.
Ecode is functionally dependent on Name.

You will notice this table has:


• Multiple candidate keys, i.e. Ecode+ProjCode and Name+ProjCode.
• The candidate keys are composite.
• The candidate keys overlap since the attribute ProjCode is common.
Cont..
Ecode and Name are determinants since they are functionally dependent on each
other. However, they are not candidate keys by themselves. As per BCNF, the
determinants have to be candidate keys.
Guidelines for converting a table to BCNF:
Find and remove the overlapping candidate keys. Place the part of the candidate
key and the attribute it is functionally dependent on, in a different table.
Group the remaining attributes into a table
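The BCNF condition can be checked in Python by testing, for each nontrivial FD, whether its determinant is a superkey (the usual formal version of "every determinant is a candidate key"). The attributes Ecode, Name, ProjCode and Hours and the four FDs are assumed from the explanation above; bcnf_violations is a hypothetical name, and only the listed FDs are checked.

```python
def closure(attrs, F):
    """Attribute closure; F is a list of (lhs_set, rhs_set) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in F:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def bcnf_violations(R, F):
    """Return the nontrivial FDs whose determinant is not a superkey."""
    return [(lhs, rhs) for lhs, rhs in F
            if not rhs <= lhs and closure(lhs, F) != R]

R = {"Ecode", "Name", "ProjCode", "Hours"}
F = [
    ({"Ecode"}, {"Name"}),
    ({"Name"}, {"Ecode"}),
    ({"Ecode", "ProjCode"}, {"Hours"}),
    ({"Name", "ProjCode"}, {"Hours"}),
]
for lhs, rhs in bcnf_violations(R, F):
    print(sorted(lhs), "->", sorted(rhs))
# ['Ecode'] -> ['Name']
# ['Name'] -> ['Ecode']
```

Both violations involve the overlapping part of the candidate keys, which is why the guidelines move Ecode and Name into a separate table.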



Denormalization in Databases
As the name suggests, denormalization is the opposite of normalization.

Normalization organizes data to ensure integrity and eliminate
redundancies. Database denormalization means you deliberately put the
same data in several places, thus increasing redundancy.

The intentional introduction of redundancy in a table in order to improve
performance is called denormalization. The decision to denormalize
results in a trade-off between performance and data integrity.
Denormalization also increases disk space utilization.
End of Chapter 5
