
UNIT 2

Relational Model
Relational Algebra

• The relational model has two formal languages: Relational Algebra and Relational Calculus.

• Besides defining the structure of the database and its constraints, a data model must provide a
set of operations to manipulate the database. In the relational model, this basic set of operations is called Relational Algebra.

• Relational algebra takes relations as input and gives relations as output.


• Projection operation removes the duplicate tuples from the resulting relation
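
As a rough illustration of the two points above (relations in, relations out, and projection dropping duplicates), here is a minimal Python sketch. The function name `project` and the sample `employees` rows are hypothetical and used only for illustration; a relation is modelled as a list of dictionaries and the result as a set of tuples.

```python
def project(relation, attributes):
    """Project a relation (a list of dict rows) onto the given attributes.

    The result is a set of tuples, so duplicate rows collapse automatically.
    """
    return {tuple(row[a] for a in attributes) for row in relation}


# Hypothetical sample relation, used only for illustration.
employees = [
    {"name": "Asha", "dept": "Sales"},
    {"name": "Ravi", "dept": "Sales"},
    {"name": "Meera", "dept": "HR"},
]

departments = project(employees, ["dept"])
print(sorted(departments))  # [('HR',), ('Sales',)]
```

Three input rows yield only two output tuples because the two "Sales" rows become identical once the name column is dropped.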
Informal Design Guidelines for Relational Databases

What is relational database design?

• The grouping of attributes to form "good" relation schemas

Two levels of relation schemas

• The logical "user view" level

• The storage "base relation" level

• Design is concerned mainly with base relations

What are the criteria for "good" base relations?


We first discuss informal guidelines for good relational design

Then we discuss formal concepts of functional dependencies and normal forms

- 1NF (First Normal Form)

- 2NF (Second Normal Form)

- 3NF (Third Normal Form)

- BCNF (Boyce-Codd Normal Form)


Semantics of the Relation Attributes

GUIDELINE 1: Informally, each tuple in a relation should represent one entity or relationship instance. (Applies to individual
relations and their attributes).

• Attributes of different entities (EMPLOYEEs, DEPARTMENTs, PROJECTs) should not be mixed in the same relation

• Only foreign keys should be used to refer to other entities

• Entity and relationship attributes should be kept apart as much as possible.

Bottom Line: Design a schema that can be explained easily relation by relation. The semantics of attributes should be easy
to interpret.
Redundant Information in Tuples and Update Anomalies

Information is stored redundantly

• Wastes storage

• Causes problems with update anomalies

- Insertion anomalies

- Deletion anomalies

- Modification anomalies
What are the Anomalies in DBMS?

An anomaly is a deviation from the expected, consistent state of the data.

In a Database Management System (DBMS), an anomaly is an inconsistency that arises in a relational table as a result of
operations (insert, update, delete) performed on that table.

There can be various reasons for anomalies to occur in the database.

• For example, if there is a lot of redundant data present in our database then DBMS anomalies can occur.

• If a table is constructed in a very poor manner, then there is a chance of database anomaly. Due to database
anomalies, the integrity of the database suffers.

Another common cause of database anomalies is storing all the data in a single table.

To remove anomalies, the table is normalized: it is split into smaller tables, which are joined back together
(using the appropriate type of join) when the combined data is needed.
Worker_id   Worker_name   Worker_dept   Worker_address
65          Ramesh        ECT001        Jaipur
65          Ramesh        ECT002        Jaipur
73          Amit          ECT002        Delhi
76          Vikas         ECT501        Pune
76          Vikas         ECT502        Pune
79          Rajesh        ECT669        Mumbai

There can be three types of an anomaly in the database:

Updation / Update Anomaly

An update anomaly occurs when updating some rows of a table leaves the table inconsistent.

In the above table, to update the address of Ramesh we have to update every row in which Ramesh appears.
If the update misses even one row, the table will hold two different addresses for Ramesh, leaving the
database inconsistent.
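
The effect can be reproduced with a small Python sketch over the rows of the Worker table above. The helper `addresses_of` and the new address "Kota" are hypothetical, introduced only to show what happens when an update touches one of Ramesh's rows but not the other.

```python
# Rows of the Worker table above: (id, name, dept, address).
workers = [
    (65, "Ramesh", "ECT001", "Jaipur"),
    (65, "Ramesh", "ECT002", "Jaipur"),
    (73, "Amit", "ECT002", "Delhi"),
    (76, "Vikas", "ECT501", "Pune"),
    (76, "Vikas", "ECT502", "Pune"),
    (79, "Rajesh", "ECT669", "Mumbai"),
]


def addresses_of(rows, worker_id):
    """Return the set of distinct addresses stored for one worker."""
    return {address for (wid, _, _, address) in rows if wid == worker_id}


# A careless update that changes only the first matching row.
# "Kota" is a hypothetical new address.
updated = list(workers)
updated[0] = (65, "Ramesh", "ECT001", "Kota")

print(len(addresses_of(workers, 65)))  # 1: consistent before the update
print(len(addresses_of(updated, 65)))  # 2: the table is now inconsistent
```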
Insertion Anomaly

An insertion anomaly occurs when a new row cannot be inserted without also supplying unrelated data, or when inserting it makes the table inconsistent.

For example, in the above table, a new worker who has not yet been allocated to any department cannot be inserted
(unless Worker_dept is allowed to be NULL), so this creates an insertion anomaly.

Deletion Anomaly

If we delete some rows from the table and if any other information or data which is required is also deleted from the
database, this is called the deletion anomaly in the database.

For example, in the above table, if we want to delete the department number ECT669 then the details of Rajesh will also
be deleted since Rajesh's details are dependent on the row of ECT669. So, there will be deletion anomalies in the table.
Stu_id     Stu_name   Stu_branch         Stu_club
2018nk01   Shivani    Computer science   literature
2018nk01   Shivani    Computer science   dancing
2018nk02   Ayush      Electronics        Videography
2018nk03   Mansi      Electrical         dancing
2018nk03   Mansi      Electrical         singing
2018nk04   Gopal      Mechanical         Photography
Mention the anomalies that may occur in the following:

• If Shivani changes her branch from Computer Science to Electronics, then we will have to update all the rows.
• If we add a new row for student Ankit who is not a part of any club
• If we remove the photography club from the college
GUIDELINE 2:

Design a schema that does not suffer from insertion, deletion and update anomalies.
If there are any anomalies present, then note them so that applications can be made to take them into account.

GUIDELINE 3:
Relations should be designed such that their tuples will have as few NULL values as possible
Attributes that are NULL frequently could be placed in separate relations (with the primary key)
Reasons for nulls:
Attribute not applicable or invalid
Attribute value unknown (may exist)
A value known to exist, but unavailable

Bad designs for a relational database may result in erroneous results for certain JOIN operations
The "lossless join" property is used to guarantee meaningful results for join operations

GUIDELINE 4:

• The relations should be designed to satisfy the lossless join condition.


• No spurious tuples should be generated by doing a natural join of any relations.
There are two important properties of decompositions:

• Non-additive or losslessness of the corresponding join


• Preservation of the functional dependencies.

Note that:
• The first property (lossless join) is extremely important and cannot be sacrificed.
• The second property (dependency preservation) is less stringent and may be sacrificed.
Functional Dependency

In a relation, a functional dependency α → β holds if any two tuples that have the same value for attribute α also have the same value
for attribute β.
α is called the determinant.
β is called the dependent.
Mathematically,

If α and β are the two sets of attributes in a relational table R where-

α⊆R
β⊆R
Then, for a functional dependency to exist from α to β,

If t1[α] = t2[α], then t1[β] = t2[β]
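
The condition above can be checked mechanically on a given set of tuples. The following is a minimal Python sketch; the function name `fd_holds` is hypothetical, and the sample rows are a few rows of the Worker table from earlier in this unit with shortened column names. Note that such a check only shows whether a dependency holds in one particular instance, not that it is a rule of the schema.

```python
def fd_holds(rows, lhs, rhs):
    """Check whether lhs -> rhs holds in a relation instance.

    rows is a list of dicts; lhs and rhs are lists of attribute names.
    """
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for this key and returns it.
        if seen.setdefault(key, value) != value:
            return False  # two tuples agree on lhs but differ on rhs
    return True


rows = [
    {"id": 65, "name": "Ramesh", "dept": "ECT001"},
    {"id": 65, "name": "Ramesh", "dept": "ECT002"},
    {"id": 73, "name": "Amit", "dept": "ECT002"},
]

print(fd_holds(rows, ["id"], ["name"]))  # True
print(fd_holds(rows, ["id"], ["dept"]))  # False
```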


Types Of Functional Dependencies-

There are two types of functional dependencies-

1. Trivial Functional Dependencies-

•A functional dependency X → Y is said to be trivial if and only if Y ⊆ X.


•Thus, if RHS of a functional dependency is a subset of LHS, then it is called as a trivial functional dependency.

Examples-

The examples of trivial functional dependencies are-


•AB → A
•AB → B
•AB → AB
2. Non-Trivial Functional Dependencies-

•A functional dependency X → Y is said to be non-trivial if and only if Y ⊈ X (Y is not a subset of X).

•Thus, if there exists at least one attribute in the RHS of a functional dependency that is not a part of LHS, then it is
called as a non-trivial functional dependency.

Examples-

The examples of non-trivial functional dependencies are-


•AB → BC
•AB → CD

Question: Which of the following options are correct for the table below?

a   b   c   d   e
A   2   3   4   5
2   A   3   4   5
A   2   3   6   5
A   2   3   6   6

a) A → BC
b) DE → C
c) C → DE
d) BC → A
Inference Rules-

Reflexivity-
If B is a subset of A, then A → B always holds.

Transitivity-
If A → B and B → C, then A → C always holds.

Augmentation-
If A → B, then AC → BC always holds.

Decomposition-
If A → BC, then A → B and A → C always holds.

Composition-
If A → B and C → D, then AC → BD always holds.

Additive-
If A → B and A → C, then A → BC always holds.
Rules for Functional Dependency-
Rule-01:

A functional dependency X → Y will always hold if all the values of X are unique (different) irrespective of the values of Y.

Example-

Consider the following table-

The following functional dependencies will always hold since all the values of attribute ‘A’ are unique-
• A→B
• A → BC
• A → CD
• A → BCD
• A → DE
• A → BCDE
In general, we can say following functional dependency will always hold- A → Any combination of attributes A, B, C, D,
E
Rule-02:

A functional dependency X → Y will always hold if all the values of Y are same irrespective of the values of X.

Example-
Consider the following table-

The following functional dependencies will always hold since all the values of attribute ‘C’ are same-
• A→C
• AB → C
• ABDE → C
• DE → C
• AE → C
In general, we can say following functional dependency will always hold true- Any combination of attributes A, B, C, D, E → C
Rule-03:

For a functional dependency X → Y to hold, if two tuples in the table agree on the value of attribute X, then they must
also agree on the value of attribute Y.

Rule-04:

For a functional dependency X → Y, violation will occur only when for two or more same values of X, the corresponding
Y values are different.

• These properties are used to identify new FDs.

• Extensively used to find candidate keys in a relation.


Closure Properties

Closure properties can be discussed in two aspects :

1. Closure Set of attributes

2. Closure set of FDs

Closure of an Attribute Set-


•The set of all those attributes which can be functionally determined from an attribute set is called as a closure of that
attribute set.

•Closure of attribute set {X} is denoted as {X}+.


Following steps are followed to find the closure of an attribute set-

Step-01:

Add the attributes contained in the attribute set for which closure is being calculated to the result set.

Step-02:

Recursively add the attributes to the result set which can be functionally determined from the attributes already contained in
the result set.

Example 1: In a relation R (A, B, C) the following FDs hold: F = {A → B, B → C}

The set of all attributes that can be determined using a given set of attributes is called its attribute closure.

Thus, A+ = {A, B, C}

In this example A → B and B → C, so by the transitivity axiom we can include B and C in the closure set, and A → A is a trivial
functional dependency. Similarly,

B+ = {B, C}
EXAMPLE 2
Consider a relation R ( A , B , C , D , E , F , G ) with the functional dependencies-
A → BC
BC → DE
D→F
CF → G

Now, let us find the closure of some attributes and attribute sets-

Closure of attribute A-

A+ = { A }
= {A, B , C } ( Using A → BC )
= {A, B , C , D , E } ( Using BC → DE )
= {A, B , C , D , E , F } ( Using D → F )
= {A, B , C , D , E , F , G } ( Using CF → G )
Thus,
A+ = { A , B , C , D , E , F , G }
Closure of attribute D-

D+ = { D }
= { D , F } ( Using D → F )
We can not determine any other attribute using attributes D and F contained in the result set.
Thus,
D+ = { D , F }

Closure of attribute set {B, C}-

{ B , C }+ = { B , C }
={B,C,D,E} ( Using BC → DE )
={B,C,D,E,F} ( Using D → F )
={B,C,D,E,F,G} ( Using CF → G )
Thus,
{ B , C }+ = { B , C , D , E , F , G }
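
The two steps for computing a closure can be sketched in Python as follows. This is a minimal sketch, assuming single-letter attribute names and FDs written as (lhs, rhs) string pairs; the function name `closure` is an illustrative choice. It reproduces the three closures worked out in Example 2.

```python
def closure(attrs, fds):
    """Return the closure of attrs under fds.

    fds is a list of (lhs, rhs) pairs, each side a string of
    single-letter attribute names, e.g. ("BC", "DE") for BC -> DE.
    """
    result = set(attrs)  # Step-01: start with the given attributes
    changed = True
    while changed:       # Step-02: keep adding until nothing new appears
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


# Functional dependencies of Example 2.
fds = [("A", "BC"), ("BC", "DE"), ("D", "F"), ("CF", "G")]

print(sorted(closure("A", fds)))   # ['A', 'B', 'C', 'D', 'E', 'F', 'G']
print(sorted(closure("D", fds)))   # ['D', 'F']
print(sorted(closure("BC", fds)))  # ['B', 'C', 'D', 'E', 'F', 'G']
```

The loop stops as soon as a full pass over the FDs adds nothing new, which is the point where "no other attribute can be determined".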
Finding the Keys Using Closure-

Super Key-

•If the closure result of an attribute set contains all the attributes of the relation, then that attribute set is called as a super key
of that relation.
•Thus, we can say-
“The closure of a super key is the entire relation schema.”

Example-

In the above example,


•The closure of attribute A is the entire relation schema.
•Thus, attribute A is a super key for that relation.

Candidate Key-

•If an attribute set is a super key and no proper subset of it is a super key (that is, no proper subset has a closure containing all the attributes of the relation), then that attribute set is
called a candidate key of that relation.
Example-

In the above example,


•No proper subset of {A} has a closure that contains all the attributes of the relation.
•Thus, attribute A is also a candidate key for that relation.
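
The super key and candidate key tests can be expressed directly in terms of closure. Below is a minimal Python sketch under the same assumptions as before (single-letter attributes, FDs as string pairs); the function names are hypothetical. It uses the relation R(A, B, C, D, E, F, G) and the FDs of Example 2.

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) string pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_super_key(attrs, relation, fds):
    """A super key's closure contains every attribute of the relation."""
    return closure(attrs, fds) >= set(relation)


def is_candidate_key(attrs, relation, fds):
    """A candidate key is a super key with no proper subset that is a super key."""
    if not is_super_key(attrs, relation, fds):
        return False
    attrs = sorted(set(attrs))
    return not any(
        is_super_key(subset, relation, fds)
        for size in range(len(attrs))  # sizes 0 .. len-1: proper subsets only
        for subset in combinations(attrs, size)
    )


relation = "ABCDEFG"
fds = [("A", "BC"), ("BC", "DE"), ("D", "F"), ("CF", "G")]

print(is_super_key("A", relation, fds))       # True
print(is_candidate_key("A", relation, fds))   # True
print(is_super_key("AB", relation, fds))      # True
print(is_candidate_key("AB", relation, fds))  # False, because A alone suffices
```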
PRACTICE PROBLEM BASED ON FINDING CLOSURE OF AN ATTRIBUTE SET-

Problem-

Consider the given functional dependencies-


AB → CD
AF → D
DE → F
C→G
F→E
G→A

Which of the following options are false?


(A) { CF }+ = { A , C , D , E , F , G }
(B) { BG }+ = { A , B , C , D , G }
(C) { AF }+ = { A , C , D , E , F , G }
(D) { AB }+ = { A , C , D , F ,G }
Solution-

Let us check each option one by one-

Option-(A):

{ CF }+ = { C , F }
={C,F,G} ( Using C → G )
={C,E,F,G} ( Using F → E )
= {A, C , E , F , G } ( Using G → A )
= { A , C , D , E , F , G } ( Using AF → D )

Since the obtained result set is the same as the given result set, option (A) is correctly given.

Option-(B):

{ BG }+ = { B , G }
= {A, B , G } ( Using G → A )
= {A, B , C , D , G } ( Using AB → CD )

Since the obtained result set is the same as the given result set, option (B) is correctly given.
Option-(C):

{ AF }+ = { A , F }
= {A, D , F } ( Using AF → D )
= {A, D , E , F } ( Using F → E )

Since the obtained result set is different from the given result set, option (C) is not correctly given.

Option-(D):

{ AB }+ = { A , B }
= {A, B , C , D } ( Using AB → CD )
= {A, B , C , D , G } ( Using C → G )

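Since the obtained result set is different from the given result set, option (D) is also not correctly given.

Thus,
Options (C) and (D) are the false ones.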
Candidate Key-

A candidate key may be defined as-


A minimal set of attributes that can identify each tuple uniquely in the given relation is called a candidate key.

OR

A minimal super key is called as a candidate key.

Finding Candidate Keys-

Step-01:

•Determine all essential attributes of the given relation.


•Essential attributes are those attributes which are not present on RHS of any functional dependency.
•Essential attributes are always a part of every candidate key.
•This is because they can not be determined by other attributes.
Example
Let R(A, B, C, D, E, F) be a relation scheme with the following functional dependencies-

A→B

C→D

D→E
Here, the attributes which are not present on RHS of any functional dependency are A, C and F.

So, essential attributes are- A, C and F.

Step-02:

•The remaining attributes of the relation are non-essential attributes.


•This is because each of them appears on the RHS of some functional dependency and so may be determined by other attributes.
Case-01:

If all essential attributes together can determine all remaining non-essential attributes, then-
•The combination of essential attributes is the candidate key.
•It is the only possible candidate key.

Case-02:

If all essential attributes together can not determine all remaining non-essential attributes, then-
•The set of essential attributes and some non-essential attributes will be the candidate key(s).
•In this case, multiple candidate keys are possible.
•To find the candidate keys, we check different combinations of essential and non-essential attributes.
PRACTICE PROBLEM BASED ON FINDING CANDIDATE KEYS-

Problem-01:

Let R = (A, B, C, D, E, F) be a relation scheme with the following dependencies-


C→F
E →A
EC → D
A→B
Which of the following is a key for R?
1.CD
2.EC
3.AE
4.AC

Also, determine the total number of candidate keys and super keys.
Solution-
We will find candidate keys of the given relation in the following steps-

Step-01:

•Determine all essential attributes of the given relation.


•Essential attributes of the relation are- C and E.
•So, attributes C and E will be a part of every candidate key.

Step-02:
Now,
•We will check if the essential attributes together can determine all remaining non-essential attributes.
•To check, we find the closure of CE.
So, we have-
{ CE }+
={C,E}
={C,E,F} ( Using C → F )
= {A, C , E , F} ( Using E → A )
= {A, C , D , E , F } ( Using EC → D )
= {A, B , C , D , E , F } ( Using A → B )

We conclude that CE can determine all the attributes of the given relation.
So, CE is the only possible candidate key of the relation.
Thus, option 2 (EC) is correct.
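
Step-01 and the Case-01 check can be sketched in Python for this problem. This is only an illustrative sketch: the function names are hypothetical, attributes are assumed to be single letters, and the harder Case-02 search over combinations is not covered.

```python
def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) string pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def essential_attributes(relation, fds):
    """Attributes that never appear on the right-hand side of any FD."""
    on_rhs = set()
    for _, rhs in fds:
        on_rhs |= set(rhs)
    return set(relation) - on_rhs


# Relation and FDs of Problem-01.
relation = "ABCDEF"
fds = [("C", "F"), ("E", "A"), ("EC", "D"), ("A", "B")]

essential = essential_attributes(relation, fds)
print(sorted(essential))  # ['C', 'E']

# Case-01: the essential attributes alone determine every attribute,
# so together they form the only candidate key.
print(closure(essential, fds) == set(relation))  # True
```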
Total Number of Candidate Keys-

Only one candidate key CE is possible.

Total Number of Super Keys-

There are a total of 6 attributes in the given relation of which-


•There are 2 essential attributes- C and E.
•Remaining 4 attributes are non-essential attributes.
•Essential attributes will be present in every key.
•Non-essential attributes may or may not be taken in every super key.

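In general, if a relation with n attributes has exactly one candidate key made up of k attributes, then every super key is that
candidate key plus any subset of the remaining n-k attributes, so the total number of super keys = 2^(n-k).

So, the number of super keys possible = 2^(6-2) = 2^4 = 16.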


Problem-02:

Let R = (A, B, C, D, E) be a relation scheme with the following dependencies-


AB → C
C→D
B→E
Determine the total number of candidate keys and super keys.
Problem-03:

Consider the relation scheme R(E, F, G, H, I, J, K, L, M, N) and the set of functional dependencies-
{ E, F } → { G }
{F}→{I,J}
{ E, H } → { K, L }
{K}→{M}
{L}→{N}

What is the key for R?


1.{ E, F }
2.{ E, F, H }
3.{ E, F, H, K, L }
4.{ E }

Also, determine the total number of candidate keys and super keys.
Closure Of Functional Dependency :

The closure of a set of functional dependencies F is the complete set of all functional dependencies that can be derived
from F using the inference rules known as Armstrong's Rules.
If F is a set of functional dependencies, then its closure is denoted by F+.

For a given functional dependency, the attributes that its left-hand side determines are found in three steps:

Step-1 : Add the attributes which are present on Left Hand Side in the original functional dependency.

Step-2 : Now, add the attributes present on the Right-Hand Side of the functional dependency.

Step-3 : With the help of attributes present on Right Hand Side, check the other attributes that can be derived from the other
given functional dependencies.

Repeat this process until all the possible attributes which can be derived are added in the closure.
In a relation R(A,B,C) the following FDs exist:
F:{A-B, B-C}

F+ is all the set of FDs that exist.

Q- Find the number of FDs that are possible for a relation with two attributes R(A, B).

Take all possible subsets of {A, B} and form X → Y, where X is a subset of {A, B} and Y is a subset of {A, B}.

Thus, the possible subsets for X are: Φ, A, B, AB

The possible subsets for Y are: Φ, A, B, AB

Total possible FDs = 4 × 4 = 16

Thus, if R has degree n, i.e., n attributes, then there are 2^n possible subsets on each side and 2^n × 2^n = 2^(2n) possible FDs.
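
The counting argument above can be checked by enumeration. The following Python sketch is illustrative only; the function names are hypothetical and attributes are assumed to be single letters. `all_possible_fds` lists every pair X → Y, and `implied_fds` keeps those that actually follow from a given set of FDs, using the fact that X → Y is valid exactly when Y lies inside the closure of X.

```python
from itertools import combinations


def subsets(attrs):
    """All subsets of attrs, including the empty set, as frozensets."""
    attrs = sorted(attrs)
    return [
        frozenset(combo)
        for size in range(len(attrs) + 1)
        for combo in combinations(attrs, size)
    ]


def all_possible_fds(attrs):
    """Every syntactically possible FD X -> Y with X and Y subsets of attrs."""
    parts = subsets(attrs)
    return [(x, y) for x in parts for y in parts]


def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) string pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def implied_fds(attrs, fds):
    """The possible FDs X -> Y that follow from fds (Y is within X+)."""
    return [(x, y) for x, y in all_possible_fds(attrs) if y <= closure(x, fds)]


print(len(all_possible_fds("AB")))  # 16
```

Calling `implied_fds` with a relation's attributes and its FDs gives the members of F+ under this counting convention, which includes the empty set on either side.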

Practice Problem:

R(A,B,C) and F:{AB, B C}


How many FDs exist and how many are valid?
Decomposition of a Relation-

The process of breaking up or dividing a single relation into two or more sub relations is called as decomposition of a
relation.

Properties of Decomposition-

The following two properties must be followed when decomposing a given relation-

1. Lossless decomposition-

Lossless decomposition ensures-


•No information is lost from the original relation during decomposition.
•When the sub relations are joined back, the same relation is obtained that was decomposed.
Every decomposition must always be lossless.
2. Dependency Preservation-

Dependency preservation ensures-

None of the functional dependencies that holds on the original relation are lost.
The sub relations still hold or satisfy the functional dependencies of the original relation.

Types of Decomposition-

Decomposition of a relation can be completed in the following two ways-


1. Lossless Join Decomposition-

Consider there is a relation R which is decomposed into sub relations R1 , R2 , …. , Rn.

This decomposition is called lossless join decomposition when the join of the sub relations results in the same relation
R that was decomposed.

For lossless join decomposition, we always have-

R1 ⋈ R2 ⋈ R3 ……. ⋈ Rn = R

where ⋈ is a natural join operator


2. Lossy Join Decomposition-

Consider there is a relation R which is decomposed into sub relations R1 , R2 , …. , Rn.


This decomposition is called lossy join decomposition when the join of the sub relations does not result in the same
relation R that was decomposed.
The natural join of the sub relations is always found to have some extraneous tuples.
For lossy join decomposition, we always have-
R1 ⋈ R2 ⋈ R3 ……. ⋈ Rn ⊃ R
where ⋈ is a natural join operator
Determining Whether Decomposition Is Lossless Or Lossy-

Consider a relation R that is decomposed into two sub relations R1 and R2.


Then,
•If all the following conditions are satisfied, then the decomposition is lossless.
•If any of these conditions fails, then the decomposition is lossy.

Condition-01:

Union of both the sub relations must contain all the attributes that are present in the original relation R.
Thus, R1 ∪ R2 = R
Condition-02:

Intersection of both the sub relations must not be null. In other words, there must be some common attribute which is
present in both the sub relations.
Thus,

R1 ∩ R2 ≠ ∅
Condition-03:

Intersection of both the sub relations must be a super key of either R1 or R2 or both.
PRACTICE PROBLEM BASED ON DETERMINING WHETHER DECOMPOSITION IS LOSSLESS OR LOSSY-

Problem-01:

Consider a relation schema R ( A , B , C , D ) with the functional dependencies A → B and C → D. Determine whether the
decomposition of R into R1 ( A , B ) and R2 ( C , D ) is lossless or lossy.

Solution-

To determine whether the decomposition is lossless or lossy,


•We will check all the conditions one by one.
•If any of the conditions fail, then the decomposition is lossy otherwise lossless.

Condition-01:

According to condition-01, union of both the sub relations must contain all the attributes of relation R.
So, we have-
R1 ( A , B ) ∪ R2 ( C , D )
= R ( A , B , C , D )
Clearly, the union of the sub relations contains all the attributes of relation R.
Thus, condition-01 is satisfied.
Condition-02:

According to condition-02, intersection of both the sub relations must not be null.
So, we have-
R1 ( A , B ) ∩ R2 ( C , D )
= ∅

Clearly, the intersection of the sub relations is empty.
So, condition-02 fails.

Thus, we conclude that the decomposition is lossy.
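
The three conditions for a two-way decomposition can be sketched as a single Python function. This is a minimal sketch with hypothetical names, assuming single-letter attributes; "super key of R1 or R2" is tested by checking whether the closure of the common attributes covers all of R1 or all of R2. The first call is Problem-01 above; the second is a hypothetical relation R(A, B, C) with A → B, added only to show a lossless case.

```python
def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) string pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_lossless(relation, r1, r2, fds):
    """Apply the three conditions for a decomposition into r1 and r2."""
    r1, r2 = set(r1), set(r2)
    if r1 | r2 != set(relation):  # Condition-01: union covers R
        return False
    common = r1 & r2
    if not common:                # Condition-02: intersection is not empty
        return False
    reach = closure(common, fds)  # Condition-03: intersection is a super key
    return r1 <= reach or r2 <= reach


# Problem-01: R(A, B, C, D) with A -> B and C -> D.
print(is_lossless("ABCD", "AB", "CD", [("A", "B"), ("C", "D")]))  # False

# Hypothetical example: R(A, B, C) with A -> B, split on the shared attribute A.
print(is_lossless("ABC", "AB", "AC", [("A", "B")]))  # True
```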

Problem-02:

Consider a relation schema R ( A , B , C , D ) with the following functional dependencies-


A→B
B→C
C→D
D→B
Determine whether the decomposition of R into R1 ( A , B ) , R2 ( B , C ) and R3 ( B , D ) is lossless or lossy.
Normalization in DBMS-

In DBMS, database normalization is a process of making the database consistent by-

•Reducing the redundancies


•Ensuring the integrity of data through lossless decomposition
Normalization is done through normal forms.
First Normal Form-

A given relation is said to be in First Normal Form (1NF) if each cell of the table contains only an atomic value.
OR
A given relation is said to be in First Normal Form (1NF) if every attribute of every tuple is either single valued or a null value.

Example-

The following relation is not in 1NF-

Student_id   Name     Subjects
100          Akshay   Computer Networks, Designing
101          Aman     Database Management System
102          Anjali   Automata, Compiler Design


However,
•This relation can be brought into 1NF.
•This can be done by rewriting the relation such that each cell of the table contains only one value.

Relation is in 1NF

This relation is in First Normal Form (1NF).

Student_id   Name     Subjects
100          Akshay   Computer Networks
100          Akshay   Designing
101          Aman     Database Management System
102          Anjali   Automata
102          Anjali   Compiler Design
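
The rewriting step can be sketched in Python. This is a minimal illustration, assuming the multi-valued Subjects cell is stored as a comma-separated string; the function name `to_first_normal_form` is hypothetical.

```python
def to_first_normal_form(rows):
    """Split a comma-separated Subjects cell into one row per subject."""
    flat = []
    for student_id, name, subjects in rows:
        for subject in subjects.split(","):
            flat.append((student_id, name, subject.strip()))
    return flat


# The relation that is not in 1NF.
students = [
    (100, "Akshay", "Computer Networks, Designing"),
    (101, "Aman", "Database Management System"),
    (102, "Anjali", "Automata, Compiler Design"),
]

for row in to_first_normal_form(students):
    print(row)
```

The three original rows become five rows, one per (student, subject) pair, matching the 1NF table above.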

NOTE-

•By default, every relation is in 1NF.


•This is because formal definition of a relation states that value of all the attributes must be atomic.
Second Normal Form-

A given relation is said to be in Second Normal Form (2NF) if and only if-
1.The relation is already in 1NF.
2.No partial dependency exists in the relation.

Partial Dependency

A partial dependency is a dependency where some, but not all, attributes of a candidate key determine non-prime attribute(s).
OR
A partial dependency is a dependency where a portion of a candidate key (an incomplete candidate key) determines non-prime
attribute(s).

In other words,
A → B is called a partial dependency if and only if-
1.A is a proper subset of some candidate key
2.B is a non-prime attribute.
If any one condition fails, then it will not be a partial dependency.

NOTE-

•To avoid partial dependency, incomplete candidate key must not determine any non-prime attribute.
•However, incomplete candidate key can determine prime attributes.
Example-

Consider a relation- R ( V , W , X , Y , Z ) with functional dependencies-


VW → XY
Y→V
WX → YZ

The possible candidate keys for this relation are-


VW , WX , WY

From here,
•Prime attributes = { V , W , X , Y }
•Non-prime attributes = { Z }

Now, if we observe the given dependencies-


•There is no partial dependency.
•This is because there exists no dependency where incomplete candidate key determines any non-prime attribute.

Thus, we conclude that the given relation is in 2NF.
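
The check for partial dependencies can be sketched in Python. This is an illustrative sketch with hypothetical function names, assuming single-letter attributes; the candidate keys and the non-prime attribute are taken as given from the example above rather than computed.

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) string pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def proper_subsets(attrs):
    """Non-empty proper subsets of an attribute set."""
    attrs = sorted(attrs)
    return [set(c) for n in range(1, len(attrs)) for c in combinations(attrs, n)]


def partial_dependencies(candidate_keys, non_prime, fds):
    """Pairs (part of a key, non-prime attribute it determines)."""
    found = []
    for key in candidate_keys:
        for part in proper_subsets(key):
            for attr in sorted(closure(part, fds) & set(non_prime)):
                found.append(("".join(sorted(part)), attr))
    return found


# R(V, W, X, Y, Z) with candidate keys VW, WX, WY and non-prime attribute Z.
fds = [("VW", "XY"), ("Y", "V"), ("WX", "YZ")]
print(partial_dependencies(["VW", "WX", "WY"], "Z", fds))  # []
```

An empty list means no incomplete candidate key determines a non-prime attribute, which agrees with the conclusion that the relation is in 2NF.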


Third Normal Form-

A given relation is said to be in Third Normal Form (3NF) if and only if-

1.The relation is already in 2NF.
2.No transitive dependency exists for non-prime attributes.

Transitive Dependency

A → B is called a transitive dependency if and only if-

1.A is not a super key.
2.B is a non-prime attribute.
If any one condition fails, then it is not a transitive dependency.

NOTE-

•Transitive dependency must not exist for non-prime attributes.
•However, transitive dependency can exist for prime attributes.
OR

A relation is said to be in Third Normal Form (3NF) if and only if-

At least one of the following conditions holds for each non-trivial functional dependency A → B:

1.A is a super key
2.B is a prime attribute
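
The second form of the 3NF definition translates into a short check over the given functional dependencies. The Python sketch below is illustrative only: the function name `violates_3nf` is hypothetical, attributes are assumed to be single letters, and the prime attributes are supplied by the caller. It is applied to the relation R(V, W, X, Y, Z) from the 2NF example, whose prime attributes are V, W, X and Y.

```python
def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) string pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def violates_3nf(relation, fds, prime):
    """List the pairs (lhs, attribute) that break the 3NF condition."""
    violations = []
    for lhs, rhs in fds:
        lhs_is_super_key = closure(lhs, fds) >= set(relation)
        for attr in rhs:
            if attr in lhs:
                continue  # trivial part of the dependency
            if not lhs_is_super_key and attr not in prime:
                violations.append((lhs, attr))
    return violations


fds = [("VW", "XY"), ("Y", "V"), ("WX", "YZ")]
print(violates_3nf("VWXYZ", fds, prime="VWXY"))  # []
```

Here VW and WX are super keys, and Y → V is allowed because V is a prime attribute, so no dependency violates the condition.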
