
Functional Dependency

• A functional dependency in an RDBMS, as the name suggests, is a relationship between attributes of a table in which one attribute depends on another. Introduced by E. F. Codd, it helps in preventing data redundancy.
• A functional dependency is a relationship that exists between two attributes (or sets of attributes). It typically exists between the primary key and a non-key attribute within a table.
• A functional dependency is represented by an arrow sign (->).
• A functional dependency from attribute A to attribute B is written as A -> B.
• The left side of an FD is known as the determinant; the right side is known as the dependent.
• If α and β are two sets of attributes of a relation R, where
α ⊆ R
β ⊆ R
then the functional dependency α → β holds if, for every pair of tuples t1 and t2 in R, t1[α] = t2[α] implies t1[β] = t2[β].
Employee number   Employee Name   Salary   City
1                 Dana            50000    San Francisco
2                 Francis         38000    London
3                 Andrew          25000    Tokyo

Example:
For the given table, if we know the value of Employee number, we can obtain Employee Name, City, Salary, etc.
Thus, we can say that City, Employee Name, and Salary are functionally dependent on Employee number.
Example
• We have a <Department> table with two attributes − DeptId and DeptName.
• DeptId is the primary key.

DeptId   DeptName
001      Finance
002      Marketing
003      HR

• Here, DeptId uniquely identifies the DeptName attribute: once you know the DeptId, the department name is determined.
• The functional dependency between DeptId and DeptName can be stated as DeptName is functionally dependent on DeptId −
DeptId -> DeptName
Types of Functional Dependency

• Trivial Functional Dependency
• Non-Trivial Functional Dependency
• Completely Non-Trivial Functional Dependency
Trivial functional dependency
A → B has trivial functional dependency if B is a subset of A.
OR
A functional dependency X → Y is said to be trivial if and only if Y ⊆ X.
• Thus, if the RHS of a functional dependency is a subset of its LHS, it is called a trivial functional dependency.

• Examples of trivial functional dependencies are-
• AB → A
• AB → B
• AB → AB

• The following dependencies are also trivial: A → A, B → B

Consider a table with two columns Employee_Id and Employee_Name.
• {Employee_Id, Employee_Name} → Employee_Id
is a trivial functional dependency as Employee_Id is a subset of {Employee_Id, Employee_Name}.

• Also, Employee_Id → Employee_Id and Employee_Name → Employee_Name are trivial dependencies.
Example
• We have a <Department> table with two attributes − DeptId and DeptName.
• DeptId is the primary key.

The following is a trivial functional dependency since DeptId is a subset of { DeptId, DeptName }

{ DeptId, DeptName } -> DeptId


Non Trivial functional dependency

• A → B is a non-trivial functional dependency if B is not a subset of A.
OR
A functional dependency X → Y is said to be non-trivial if and only if Y ⊈ X.
• Thus, if at least one attribute in the RHS of a functional dependency is not part of the LHS, it is called a non-trivial functional dependency.

Examples of non-trivial functional dependencies are-
• AB → BC (partially non-trivial, since B appears on both sides)
• AB → CD (completely non-trivial)

When A ∩ B = ∅, A → B is called completely non-trivial.

ID → Name
Name → DOB
• A non-trivial dependency occurs when A -> B holds and B is not a subset of A.
• Example:

Company     CEO             Age
Microsoft   Satya Nadella   51
Google      Sundar Pichai   46
Apple       Tim Cook        57

• {Company} -> {CEO} (if we know the Company, we know the CEO name)
• CEO is not a subset of Company, and hence this is a non-trivial functional dependency.
VALID/INVALID FD
X Y Z
1 4 2
1 5 3
1 6 3
1 2 2
VALID/INVALID FD
A B C
1 2 4
3 5 4
3 7 2
1 4 2
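
Whether an FD is valid or invalid for a given table instance can be checked mechanically: X -> Y is violated as soon as two rows agree on X but differ on Y. The following is a minimal sketch, assuming rows are stored as dictionaries; the helper name fd_holds and the choice of FDs tested are illustrative and not part of the slides. It uses the X, Y, Z table shown above.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if the FD lhs -> rhs holds in this table instance."""
    seen = {}  # maps each LHS value combination to the RHS values first seen with it
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        # setdefault stores val the first time key appears, then returns the stored value
        if seen.setdefault(key, val) != val:
            return False  # same LHS values, different RHS values
    return True


# The X, Y, Z table from the slide above.
table_xyz = [
    {"X": 1, "Y": 4, "Z": 2},
    {"X": 1, "Y": 5, "Z": 3},
    {"X": 1, "Y": 6, "Z": 3},
    {"X": 1, "Y": 2, "Z": 2},
]

print(fd_holds(table_xyz, ["X"], ["Y"]))  # False: X = 1 maps to several Y values
print(fd_holds(table_xyz, ["Y"], ["Z"]))  # True: every Y value is unique
print(fd_holds(table_xyz, ["Z"], ["X"]))  # True: X has the same value in every row
```

Note that a check like this can only show that an FD is consistent with the current rows; whether it holds for the relation in general is a design decision.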
Key terms
Key Term        Description
Axiom           Axioms are a set of inference rules used to infer all the functional dependencies on a relational database.
Decomposition   A rule that suggests that if a table appears to contain two entities determined by the same primary key, you should consider breaking it up into two different tables.
Dependent       It is displayed on the right side of the functional dependency diagram.
Determinant     It is displayed on the left side of the functional dependency diagram.
Union           It suggests that if two tables are separate and the PK is the same, you should consider putting them together.
Armstrong’s Axioms (Inference Rules) in Functional Dependency
The term Armstrong's axioms refers to the sound and complete set of inference rules, introduced by William W. Armstrong, that is used to test the logical implication of functional dependencies. If F is a set of functional dependencies, then the closure of F, denoted F^+, is the set of all functional dependencies logically implied by F.
Armstrong’s Axioms are a set of rules that, when applied repeatedly, generate the closure of a set of functional dependencies.

PRIMARY RULES
Reflexivity-
• If B is a subset of A, then A → B always holds.
Augmentation-
• If A → B, then AC → BC always holds.
Transitivity-
• If A → B and B → C, then A → C always holds.
SECONDARY RULES
Additive (Union)-
• If A → B and A → C, then A → BC always holds.
Decomposition-
• If A → BC, then A → B and A → C always hold.
Composition-
• If A → B and C → D, then AC → BD always holds.
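
The secondary rules are not independent axioms; each can be derived from the three primary rules. As an illustration, the following derivation sketches how the additive (union) rule follows from augmentation and transitivity. The step numbering is only for this sketch.

```latex
\begin{align*}
&(1)\quad A \to B   && \text{given} \\
&(2)\quad A \to AB  && \text{augmentation of (1) with } A \text{, since } AA = A \\
&(3)\quad A \to C   && \text{given} \\
&(4)\quad AB \to BC && \text{augmentation of (3) with } B \\
&(5)\quad A \to BC  && \text{transitivity of (2) and (4)}
\end{align*}
```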
Closure of an Attribute Set/Attribute Closure
• The set of all attributes that can be functionally determined from an attribute set is called the closure of that attribute set.
• The closure of attribute set {X} is denoted as {X}+

Steps to Find Closure of an Attribute Set-

Step-01:
Add the attributes contained in the attribute set for which the closure is being calculated to the result set.

Step-02:
Repeatedly add to the result set the attributes that can be functionally determined from the attributes already contained in the result set, until no more attributes can be added.
Example-
• Consider a relation R ( A , B , C , D , E , F , G ) with the functional
dependencies-
A → BC, BC → DE, D → F, CF → G
Now, let us find the closure of some attributes and attribute sets
• {B,C}+ = {B,C}
= {B,C,D,E} using BC → DE
= {B,C,D,E,F} using D → F
= {B,C,D,E,F,G} using CF → G
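
The two closure steps translate directly into a small fixed-point loop. The following is a minimal sketch, assuming each attribute is a single letter and each functional dependency is a (left side, right side) pair of strings; the function name closure is illustrative.

```python
def closure(attrs, fds):
    """Closure of the attribute set attrs under the FDs in fds."""
    result = set(attrs)  # Step-01: start with the attributes themselves
    changed = True
    while changed:  # Step-02: keep adding until nothing new is determined
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


# FDs of the example relation R(A, B, C, D, E, F, G).
FDS = [("A", "BC"), ("BC", "DE"), ("D", "F"), ("CF", "G")]

print(sorted(closure("BC", FDS)))  # ['B', 'C', 'D', 'E', 'F', 'G']
print(sorted(closure("A", FDS)))   # ['A', 'B', 'C', 'D', 'E', 'F', 'G']
print(sorted(closure("D", FDS)))   # ['D', 'F']
```

Since {A}+ contains every attribute of R, A is a super key of this relation.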
Given the following FDs:
AB -> CD
AF -> D
DE -> F
C -> G
F -> E
G -> A
Which statement is false?
• [CF]+=[A,C,D,E,F,G]
• [BE]+=[A,B,C,D,E]
• [AF]+=[A,C,D,E,F,G]
• [AB]+=[A,C,D,F,G]
Rules for Functional Dependency-
Rule-01:
• A functional dependency X → Y will always hold if all the values of X are unique (different) irrespective of
the values of Y.
• Example-Consider the following table-

The following functional dependencies will always hold since all the values of attribute ‘A’ are unique-
• A → B
• A → BC
• A → CD
• A → BCD
• A → DE
• A → BCDE
In general, we can say that the following functional dependency will always hold-
A → Any combination of attributes A, B, C, D, E
Rule-02:
• A functional dependency X → Y will always hold if all the values of Y are same irrespective of the values
of X.
• Example-
• Consider the following table-

The following functional dependencies will always hold since all the values of attribute ‘C’ are same-
• A → C
• AB → C
• ABDE → C
• DE → C
• AE → C
 
• In general, we can say that the following functional dependency will always hold-
Any combination of attributes A, B, C, D, E → C
Different Types Of Keys in DBMS-

• The terms ‘relation’ and ‘table’ are used interchangeably.
• The terms ‘tuple’ and ‘record’ are used interchangeably.

Types of keys

• Super key
• Candidate key
• Primary key
• Alternate key
• Foreign key
Super Key
• A super key is a set of attributes that can identify each tuple uniquely in the given relation.
• A super key is not restricted to have any specific number of attributes.
• Thus, a super key may consist of any number of attributes.

Example- Consider the following Student schema-
• Student ( roll , name , age , address , class , section )

Given below are examples of super keys, since each set can uniquely identify each student in the Student table-
• ( roll , name , age , address , class , section )
• ( class , section , roll )
• ( section , roll )
• ( name , address )

NOTE-
• All the attributes in a super key are definitely sufficient to identify each tuple uniquely in the given relation but
all of them may not be necessary.
Candidate Key
A minimal super key is called a candidate key.
OR
A minimal set of attributes that can identify each tuple uniquely in the given relation is called a candidate key.

Example- Consider the following Student schema-
Student ( roll , name , sex , age , address , class , section )

• Given below are examples of candidate keys, since each set consists of the minimal attributes required to identify each student uniquely in the Student table-
• ( class , section , roll )
• ( name , address )
NOTES-
• All the attributes in a candidate key are sufficient as well as necessary to identify each tuple uniquely.
• Removing any attribute from a candidate key makes it fail to identify each tuple uniquely.
• The value of a candidate key must always be unique.
• The value of a candidate key can never be NULL.
• It is possible to have multiple candidate keys in a relation.
• Attributes that appear in some candidate key are called prime attributes.
Primary Key
A primary key is a candidate key that the database designer selects while designing the database.
OR
The candidate key that the database designer implements is called the primary key.

NOTES-
• The value of the primary key can never be NULL.
• The value of the primary key must always be unique.
• The values of the primary key can never be changed, i.e. no update is possible.
• The value of the primary key must be assigned when inserting a record.
• A relation is allowed to have only one primary key.
Alternate Key

Candidate keys that are left unimplemented or unused after implementing the primary key are called alternate keys.
OR
Unimplemented candidate keys are called alternate keys.
Foreign Key

• An attribute ‘X’ is called a foreign key to some other attribute ‘Y’ when its values are dependent on the values of attribute ‘Y’.
• The attribute ‘X’ can assume only those values which are assumed by the attribute ‘Y’.
• Here, the relation in which attribute ‘Y’ is present is called the referenced relation.
• The relation in which attribute ‘X’ is present is called the referencing relation.
• The attribute ‘Y’ might be present in the same table or in some other table.
Way of finding candidate key
Step-01:
• Determine all essential attributes of the given relation.
• Essential attributes are those attributes which do not appear on the RHS of any functional dependency.
• Essential attributes are always a part of every candidate key.
• This is because they cannot be determined by other attributes.

Example
 
• Let R(A, B, C, D, E, F) be a relation scheme with the following functional dependencies-
• A → B, C → D, D → E
 
• Here, the attributes which are not present on RHS of any functional dependency are A, C and F.
• So, essential attributes are- A, C and F.
Step-02:
• The remaining attributes of the relation are non-essential attributes, because each of them appears on the RHS of some functional dependency and so may be determined by other attributes.
• Now, the following two cases are possible-

Case-01:
• If all essential attributes together can determine all remaining non-essential attributes, then the combination of essential attributes is the candidate key.
• It is the only possible candidate key.

Case-02:
• If all essential attributes together cannot determine all remaining non-essential attributes, then-
• The set of essential attributes together with some non-essential attributes will be the candidate key(s).
• In this case, multiple candidate keys are possible.
• To find the candidate keys, we check different combinations of essential and non-essential attributes.
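
The two steps can be sketched in code as follows. This is a minimal illustration, assuming single-letter attributes and FDs given as (left side, right side) string pairs; the names closure and candidate_keys are hypothetical helpers, and the search over combinations is exhaustive, so it is only suitable for small schemas.

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure of the attribute set attrs under the FDs in fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(attributes, fds):
    """Find all candidate keys using the essential-attribute method."""
    attributes = set(attributes)
    # Step-01: essential attributes never appear on the RHS of any FD.
    rhs_attrs = set().union(*(set(rhs) for _, rhs in fds))
    essential = attributes - rhs_attrs
    # Case-01: the essential attributes alone determine everything.
    if closure(essential, fds) == attributes:
        return [essential]
    # Case-02: add non-essential attributes, smallest combinations first.
    others = sorted(attributes - essential)
    keys = []
    for size in range(1, len(others) + 1):
        for extra in combinations(others, size):
            candidate = essential | set(extra)
            if any(key <= candidate for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(candidate, fds) == attributes:
                keys.append(candidate)
    return keys


# Example from the slide: R(A, B, C, D, E, F) with A -> B, C -> D, D -> E.
keys = candidate_keys("ABCDEF", [("A", "B"), ("C", "D"), ("D", "E")])
print(["".join(sorted(k)) for k in keys])  # ['ACF']
```

For the slide's example the essential attributes A, C and F already determine every other attribute, so Case-01 applies and ACF is the only candidate key.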
Equivalence of Two Sets of Functional Dependencies-
• Two different sets of functional dependencies for a given relation may or may not be equivalent.
• If F and G are two sets of functional dependencies, then the following 4 cases are possible-
Case-01: F covers G (F ⊇ G), i.e. every functional dependency in G can be inferred from F
Case-02: G covers F (G ⊇ F)
Case-03: Both F and G cover each other (F = G)
Case-04: F and G do not cover each other (F ≠ G)
Case-01: Determining Whether F Covers G-
• The following steps are followed to determine whether F covers G or not-
Step-01:
• Take the functional dependencies of set G into consideration.
• For each functional dependency X → Y, find the closure of X using the functional dependencies of set G.
Step-02:
• Take the functional dependencies of set G into consideration.
• For each functional dependency X → Y, find the closure of X using the functional dependencies of set F.
Step-03:
• Compare the results of Step-01 and Step-02.
• If the functional dependencies of set F have determined all those attributes that were determined by the functional dependencies of set G, then F covers G.
• Thus, we conclude F covers G (F ⊇ G); otherwise it does not.
Case-02: Determining Whether G Covers F-
Step-01:
• Take the functional dependencies of set F into consideration.
• For each functional dependency X → Y, find the closure of X using the functional dependencies of set F.
Step-02:
• Take the functional dependencies of set F into consideration.
• For each functional dependency X → Y, find the closure of X using the functional dependencies of set G.
Step-03:
• Compare the results of Step-01 and Step-02.
• If the functional dependencies of set G have determined all those attributes that were determined by the functional dependencies of set F, then G covers F.
• Thus, we conclude G covers F (G ⊇ F); otherwise it does not.
Case-03: Determining Whether Both F and G Cover Each Other-
• If F covers G and G covers F, then both F and G cover each other.
• Thus, if both the above cases hold true, we conclude both F and G cover each other (F = G).
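
The three-step comparison can be sketched as below. This is a minimal illustration, assuming single-letter attributes and FDs as (left side, right side) string pairs; the function names and the sample sets F, G and H are hypothetical and chosen only to exercise the cases.

```python
def closure(attrs, fds):
    """Closure of the attribute set attrs under the FDs in fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def covers(f, g):
    """True if F covers G: for every X -> Y in G, the closure of X under F
    contains everything the closure of X under G contains."""
    return all(closure(lhs, f) >= closure(lhs, g) for lhs, _ in g)


def equivalent(f, g):
    """True if F and G cover each other (Case-03)."""
    return covers(f, g) and covers(g, f)


# Hypothetical sets of FDs over attributes A, B, C.
F = [("A", "B"), ("B", "C")]
G = [("A", "B"), ("B", "C"), ("A", "C")]
H = [("A", "B")]

print(equivalent(F, G))          # True: A -> C in G follows from F by transitivity
print(covers(F, H), covers(H, F))  # True False: H cannot derive B -> C
```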
Canonical Cover in DBMS-

• A canonical cover is a simplified and reduced version of the given set of functional dependencies.
• Since it is a reduced version, it is also called an irreducible set.

Characteristics-
• A canonical cover is free from all extraneous functional dependencies.
• The closure of a canonical cover is the same as that of the given set of functional dependencies.
• A canonical cover is not unique; there may be more than one for a given set of functional dependencies.
Need-
• Working with a set containing extraneous functional dependencies increases the computation time.
• Therefore, the given set is reduced by eliminating the useless functional dependencies.
• This reduces the computation time, and working with the irreducible set becomes easier.
Way of finding Canonical Cover
Step-01: Write the given set of functional dependencies in such a way that each functional dependency contains exactly one attribute on its right side.
i.e. the functional dependency W → XZ will be written as-
W → X , W → Z

Step-02: Consider each functional dependency one by one from the set obtained in Step-01.
• Determine whether it is essential or non-essential.
• To determine whether a functional dependency is essential or not, compute the closure of its left side-
• Once considering that the particular functional dependency is present in the set
• Once considering that the particular functional dependency is not present in the set

• Then the following two cases are possible-
Case-01: Results Come Out to be Same-
• If the results come out to be the same, the presence or absence of that functional dependency makes no difference.
• Thus, it is non-essential. Eliminate that functional dependency from the set.

Case-02: Results Come Out to be Different-
• If the results come out to be different, the presence or absence of that functional dependency makes a difference.
• Thus, it is essential. Do not eliminate that functional dependency from the set.
Step-03:
• Consider the newly obtained set of functional dependencies after performing Step-02.
• Check if there is any functional dependency that contains more than one attribute on its left side. Then the following two cases are possible-

• Case-01: No- There exists no functional dependency containing more than one attribute on its left side. In this case, the set obtained in Step-02 is the canonical cover.
• Case-02: Yes- There exists at least one functional dependency containing more than one attribute on its left side. In this case, consider all such functional dependencies one by one.
• Check if their left side can be reduced.

Use the following steps to perform the check-
• Consider a functional dependency. Compute the closure of all the possible proper subsets of the left side of that functional dependency.
• If any of the subsets produces the same closure as the entire left side, then replace the left side with that subset. After this step is complete, the set obtained is the canonical cover.
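
The three steps can be sketched as follows. This is a minimal illustration that follows the order of the steps as given in these slides (other presentations reduce left sides before removing redundant dependencies); single-letter attributes are assumed, and the function names and the sample set of FDs are hypothetical.

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure of the attribute set attrs under the FDs in fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def canonical_cover(fds):
    # Step-01: split every FD so that its right side is a single attribute.
    work = []
    for lhs, rhs in fds:
        for attr in rhs:
            fd = (frozenset(lhs), attr)
            if fd not in work:
                work.append(fd)
    # Step-02: drop an FD if the closure of its left side is the same
    # with and without that FD in the set.
    for fd in list(work):
        without = [other for other in work if other != fd]
        if closure(fd[0], without) == closure(fd[0], work):
            work = without
    # Step-03: try to replace a multi-attribute left side by a proper
    # subset that has the same closure.
    reduced = []
    for lhs, attr in work:
        if len(lhs) > 1:
            target = closure(lhs, work)
            for size in range(1, len(lhs)):
                smaller = [c for c in combinations(sorted(lhs), size)
                           if closure(c, work) == target]
                if smaller:
                    lhs = frozenset(smaller[0])
                    break
        if (lhs, attr) not in reduced:
            reduced.append((lhs, attr))
    return reduced


def show(fds):
    """Format FDs as readable strings such as 'A -> B'."""
    return ["".join(sorted(lhs)) + " -> " + rhs for lhs, rhs in fds]


# Hypothetical input set: A -> BC, B -> C, A -> B, AB -> C.
print(show(canonical_cover([("A", "BC"), ("B", "C"), ("A", "B"), ("AB", "C")])))
# ['A -> B', 'B -> C']
```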
Normalization in DBMS
In DBMS, database normalization is a process of making the database consistent by-
• Reducing the redundancies
• Ensuring the integrity of data through lossless decomposition
• Normalization is done through normal forms.

Normal Forms-
• The standard normal forms used are-
• First Normal Form (1NF)
• Second Normal Form (2NF)
• Third Normal Form (3NF)
• Boyce-Codd Normal Form (BCNF)

There exist several other normal forms beyond BCNF.


First Normal Form (1NF)
A given relation is said to be in First Normal Form (1NF) if each cell of the table contains only an atomic value.
OR
A given relation is said to be in First Normal Form (1NF) if every attribute of every tuple is either single-valued or null.

Example-
The following relation is not in 1NF-

Student_id   Name     Subjects
100          Akshay   Computer Networks, Designing
101          Aman     Database Management System
102          Anjali   Automata, Compiler Design
However, this relation can be brought into 1NF by rewriting it such that each cell of the table contains only one value.

Student_id   Name     Subjects
100          Akshay   Computer Networks
100          Akshay   Designing
101          Aman     Database Management System
102          Anjali   Automata
102          Anjali   Compiler Design

This relation is in First Normal Form (1NF).

NOTE-
 
By default, every relation is in 1NF.
This is because the formal definition of a relation states that the values of all attributes must be atomic.
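
Bringing a table into 1NF amounts to emitting one row per value of the multi-valued cell. The following is a minimal sketch, assuming the multi-valued column is stored as a comma-separated string; the function name to_1nf and the dictionary representation are illustrative only.

```python
def to_1nf(rows, column, sep=","):
    """Split a multi-valued column so that each cell holds one atomic value."""
    flat = []
    for row in rows:
        for value in row[column].split(sep):
            new_row = dict(row)          # copy the other attributes unchanged
            new_row[column] = value.strip()
            flat.append(new_row)
    return flat


students = [
    {"Student_id": 100, "Name": "Akshay", "Subjects": "Computer Networks, Designing"},
    {"Student_id": 101, "Name": "Aman", "Subjects": "Database Management System"},
    {"Student_id": 102, "Name": "Anjali", "Subjects": "Automata, Compiler Design"},
]

for row in to_1nf(students, "Subjects"):
    print(row["Student_id"], row["Name"], row["Subjects"])
```

Running this prints the five rows of the 1NF table shown in the example.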
Second Normal Form (2NF)
A given relation is said to be in Second Normal Form (2NF) if and only if-
• The relation is already in 1NF.
• No partial dependency exists in the relation.

Partial Dependency
A partial dependency is a dependency where a proper subset of a candidate key (an incomplete candidate key) determines non-prime attribute(s).

In other words,
• A → B is called a partial dependency if and only if-
• A is a proper subset of some candidate key
• B is a non-prime attribute.
• If either condition fails, then it is not a partial dependency.

NOTE-
• To avoid partial dependency, an incomplete candidate key must not determine any non-prime attribute.
• However, an incomplete candidate key can determine prime attributes.
Example-

Consider a relation- R ( V , W , X , Y , Z ) with functional dependencies-
VW → XY
Y → V
WX → YZ

The possible candidate keys for this relation are-
VW , WX , WY

From here,
Prime attributes = { V , W , X , Y }
Non-prime attributes = { Z }

Now, if we observe the given dependencies-
There is no partial dependency.
This is because there exists no dependency where an incomplete candidate key determines any non-prime attribute.

Thus, we conclude that the given relation is in 2NF.
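
The 2NF test can be sketched in code: find the candidate keys, derive the non-prime attributes, and check whether any proper subset of a candidate key determines one of them. This is a minimal illustration, assuming single-letter attributes; the helper names are hypothetical and the candidate-key search is brute force, so it is only meant for small schemas.

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure of the attribute set attrs under the FDs in fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(attributes, fds):
    """Brute-force search for all minimal attribute sets that determine R."""
    attributes = set(attributes)
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue  # a smaller key is contained in it, so not minimal
            if closure(candidate, fds) == attributes:
                keys.append(candidate)
    return keys


def is_2nf(attributes, fds):
    """True if no proper subset of a candidate key determines a non-prime attribute."""
    keys = candidate_keys(attributes, fds)
    non_prime = set(attributes) - set().union(*keys)
    for key in keys:
        for size in range(1, len(key)):
            for part in combinations(sorted(key), size):
                if (closure(part, fds) - set(part)) & non_prime:
                    return False  # partial dependency found
    return True


# The example relation R(V, W, X, Y, Z).
fds = [("VW", "XY"), ("Y", "V"), ("WX", "YZ")]
print(sorted("".join(sorted(k)) for k in candidate_keys("VWXYZ", fds)))  # ['VW', 'WX', 'WY']
print(is_2nf("VWXYZ", fds))  # True
```

As a contrasting, hypothetical case, R(A, B, C) with the single dependency A → C has candidate key AB, and A alone determines the non-prime attribute C, so is_2nf("ABC", [("A", "C")]) returns False.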
Third Normal Form (3NF)
A given relation is said to be in Third Normal Form (3NF) if and only if-
• The relation is already in 2NF.
• No transitive dependency exists for non-prime attributes.

Transitive Dependency
A → B is called a transitive dependency if and only if-
• A is not a super key.
• B is a non-prime attribute.
If either condition fails, then it is not a transitive dependency.

NOTE-
• Transitive dependency must not exist for non-prime attributes.
• However, transitive dependency can exist for prime attributes.
Equivalently, a relation is said to be in Third Normal Form (3NF) if and only if at least one of the following conditions holds for each non-trivial functional dependency A → B-
• A is a super key
• B is a prime attribute

Example-
Consider a relation- R ( A , B , C , D , E ) with functional dependencies-
A → BC
CD → E
B → D
E → A

The possible candidate keys for this relation are-
A , E , CD , BC

From here,
Prime attributes = { A , B , C , D , E }
There are no non-prime attributes.

Now,
It is clear that there are no non-prime attributes in the relation.
In other words, all the attributes of the relation are prime attributes. Thus, all the attributes on the RHS of each functional dependency are prime attributes.
Thus, we conclude that the given relation is in 3NF.
BCNF
A given relation is said to be in BCNF if and only if-
• The relation is already in 3NF.
• For each non-trivial functional dependency A → B, A is a super key of the relation.

Example-
Consider a relation- R ( A , B , C ) with the functional dependencies-
A → B
B → C
C → A

The possible candidate keys for this relation are-
A , B , C

Now, we can observe that the LHS of each given functional dependency is a candidate key.
Thus, we conclude that the given relation is in BCNF.
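
The 3NF and BCNF conditions can both be tested per functional dependency. The following is a minimal sketch, assuming single-letter attributes and checking only the given dependencies; the helper names are hypothetical and the candidate-key search is brute force.

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure of the attribute set attrs under the FDs in fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(attributes, fds):
    """Brute-force search for all minimal attribute sets that determine R."""
    attributes = set(attributes)
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue
            if closure(candidate, fds) == attributes:
                keys.append(candidate)
    return keys


def is_bcnf(attributes, fds):
    """Every non-trivial FD must have a super key on its left side."""
    attributes = set(attributes)
    for lhs, rhs in fds:
        if set(rhs) <= set(lhs):
            continue  # trivial dependency
        if closure(lhs, fds) != attributes:
            return False
    return True


def is_3nf(attributes, fds):
    """For every non-trivial FD, the left side is a super key
    or the new right-side attributes are all prime."""
    attributes = set(attributes)
    prime = set().union(*candidate_keys(attributes, fds))
    for lhs, rhs in fds:
        new_attrs = set(rhs) - set(lhs)
        if not new_attrs:
            continue  # trivial dependency
        if closure(lhs, fds) == attributes:
            continue  # left side is a super key
        if not new_attrs <= prime:
            return False
    return True


r1 = [("A", "BC"), ("CD", "E"), ("B", "D"), ("E", "A")]  # R(A, B, C, D, E)
r2 = [("A", "B"), ("B", "C"), ("C", "A")]                # R(A, B, C)
print(is_3nf("ABCDE", r1), is_bcnf("ABCDE", r1))  # True False
print(is_3nf("ABC", r2), is_bcnf("ABC", r2))      # True True
```

The first relation is in 3NF but not in BCNF because B → D has a left side that is not a super key, even though D is a prime attribute.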
Fourth Normal Form (4NF)
A relation is said to be in the fourth normal form when it satisfies the following conditions:
• It must be in Boyce-Codd Normal Form (BCNF).
• It should have no multivalued dependency.

A multivalued dependency is said to occur when there are two attributes in a table which depend on a third attribute but are independent of each other.

A multivalued dependency is denoted by the sign “->->”.
Multivalued Dependency

• A multivalued dependency occurs when two attributes in a table are independent of each other but both depend on a third attribute.
• A multivalued dependency involves at least two attributes that are dependent on a third attribute, which is why it always requires at least three attributes.
• Example: Suppose there is a bike manufacturing company which produces two colors (white and black) of each model every year.
• Here the columns COLOR and MANUF_YEAR are dependent on BIKE_MODEL and independent of each other.
• In this case, these two columns are said to be multivalued dependent on BIKE_MODEL. This can be shown as
• BIKE_MODEL →→ MANUF_YEAR
• BIKE_MODEL →→ COLOR
Fifth Normal Form (5NF)
A relation must satisfy the following conditions to be in the fifth normal form:
• It must be in Fourth Normal Form (4NF).
• It should have no join dependency (all dependencies must be preserved), and the joining must be lossless.

In the fifth normal form, the relation must be decomposed into as many sub-relations as possible so as to avoid any kind of redundancy, and no extra tuples may be generated when the sub-relations are combined using natural join.
Decomposition of a Relation-

• The process of breaking up or dividing a single relation into two or more sub relations is called decomposition of a relation.
• Properties of Decomposition-
1. Lossless decomposition- Lossless decomposition ensures-
• No information is lost from the original relation during decomposition.
• When the sub relations are joined back, the same relation is obtained that was decomposed.
• Every decomposition must always be lossless.
2. Dependency Preservation- Dependency preservation ensures-
• None of the functional dependencies that hold on the original relation are lost.
• The sub relations still hold or satisfy the functional dependencies of the original relation.
Types of Decomposition-
1. Lossless Join Decomposition-
• Consider a relation R which is decomposed into sub relations R1 , R2 , …. , Rn.
• This decomposition is called a lossless join decomposition when the join of the sub relations results in the same relation R that was decomposed.
• For lossless join decomposition, we always have-

R1 ⋈ R2 ⋈ R3 ……. ⋈ Rn = R
where ⋈ is the natural join operator
Example-

Consider the following relation R( A , B , C )-

A B C
1 2 1
2 5 3
3 3 3

Consider this relation is decomposed into two sub relations R1( A , B ) and R2( B , C ).
The two sub relations are-

R1( A , B )
A B
1 2
2 5
3 3

R2( B , C )
B C
2 1
5 3
3 3

Now, let us check whether this decomposition is lossless or not.
For lossless decomposition, we must have-
R1 ⋈ R2 = R

Now, if we perform the natural join ( ⋈ ) of the sub relations R1 and R2 , we get-

A B C
1 2 1
2 5 3
3 3 3

This relation is the same as the original relation R.
Thus, we conclude that the above decomposition is a lossless join decomposition.
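
The check above can be reproduced by projecting R onto the sub relations and joining them back. The following is a minimal sketch, assuming a relation is stored as a list of dictionaries; the helper names project, natural_join and same_relation are illustrative. The second decomposition, into R1(A, C) and R2(B, C), is a hypothetical contrast case that is not in the slides.

```python
def project(rows, attrs):
    """Projection of a relation onto attrs, with duplicate rows removed."""
    result = []
    for row in rows:
        projected = {a: row[a] for a in attrs}
        if projected not in result:
            result.append(projected)
    return result


def natural_join(r1, r2):
    """Combine every pair of rows that agree on all common attributes."""
    result = []
    for a in r1:
        for b in r2:
            common = set(a) & set(b)
            if all(a[c] == b[c] for c in common):
                merged = {**a, **b}
                if merged not in result:
                    result.append(merged)
    return result


def same_relation(x, y):
    """Compare two relations as sets of tuples, ignoring row order."""
    as_set = lambda rows: {tuple(sorted(r.items())) for r in rows}
    return as_set(x) == as_set(y)


R = [
    {"A": 1, "B": 2, "C": 1},
    {"A": 2, "B": 5, "C": 3},
    {"A": 3, "B": 3, "C": 3},
]

joined = natural_join(project(R, "AB"), project(R, "BC"))
print(same_relation(joined, R))  # True: decomposition into R1(A,B), R2(B,C) is lossless

lossy = natural_join(project(R, "AC"), project(R, "BC"))
print(len(lossy), same_relation(lossy, R))  # 5 False: extra tuples appear
```

Joining on C produces extra tuples because C = 3 occurs in two rows of R, which is exactly the behaviour of a lossy join decomposition.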
2. Lossy Join Decomposition-
 
• Consider a relation R which is decomposed into sub relations R1 , R2 , …. , Rn.
• This decomposition is called a lossy join decomposition when the join of the sub relations does not result in the same relation R that was decomposed.
• The natural join of the sub relations is always found to have some extraneous tuples.
• For lossy join decomposition, we always have-

R1 ⋈ R2 ⋈ R3 ……. ⋈ Rn ⊃ R
where ⋈ is the natural join operator
 
Determining Whether Decomposition Is Lossless Or Lossy
• Consider a relation R decomposed into two sub relations R1 and R2.
• If all the following conditions are satisfied, then the decomposition is lossless. If any of these conditions fails, then the decomposition is lossy.
Condition-01: The union of both sub relations must contain all the attributes that are present in the original relation R.
R1 ∪ R2 = R
Condition-02: The intersection of both sub relations must not be empty.
• In other words, there must be some common attribute which is present in both sub relations.
R1 ∩ R2 ≠ ∅
Condition-03: The intersection of both sub relations must be a super key of either R1 or R2 or both.
R1 ∩ R2 = Super key of R1 or R2
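
The three conditions can be tested on the schema alone, using attribute closure to decide whether the common attributes form a super key of a sub relation. This is a minimal sketch, assuming single-letter attributes; the function names and the two sample decompositions of a relation R(A, B, C, D) are hypothetical illustrations.

```python
def closure(attrs, fds):
    """Closure of the attribute set attrs under the FDs in fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_lossless(r, r1, r2, fds):
    """Apply the three conditions to a decomposition of R into R1 and R2."""
    r, r1, r2 = set(r), set(r1), set(r2)
    if r1 | r2 != r:        # Condition-01: union must give back all attributes
        return False
    common = r1 & r2
    if not common:          # Condition-02: intersection must not be empty
        return False
    reachable = closure(common, fds)
    # Condition-03: the common attributes determine all of R1 or all of R2
    return r1 <= reachable or r2 <= reachable


print(is_lossless("ABCD", "ABC", "AD", [("A", "BC")]))             # True
print(is_lossless("ABCD", "AB", "CD", [("A", "B"), ("C", "D")]))   # False
```

In the first case the common attribute A determines all of R1(A, B, C); in the second case the sub relations share no attribute at all, so Condition-02 already fails.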
Dependency Preserving Decomposition
• If we decompose a relation R into relations R1 and R2, all dependencies of R must either be a part of R1 or R2 or be derivable from the combination of the FDs of R1 and R2.
• For example, a relation R (A, B, C, D) with FD set {A->BC} is decomposed into R1(ABC) and R2(AD). This is dependency preserving because the FD A->BC is a part of R1(ABC).
Consider a schema R(A,B,C,D) with functional dependencies A->B and C->D, which is decomposed into R1(AB) and R2(CD).
This decomposition is a dependency preserving decomposition because
• A->B can be ensured in R1(AB)
• C->D can be ensured in R2(CD)
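
Dependency preservation can also be checked without listing the projected dependencies explicitly. The sketch below uses a standard textbook technique that is not described in these slides: for each X → Y, grow X using only what each sub relation can enforce locally, then test whether Y was reached. Single-letter attributes are assumed, the function names are illustrative, and the third test case is a hypothetical counterexample.

```python
def closure(attrs, fds):
    """Closure of the attribute set attrs under the FDs in fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def preserves_dependencies(fds, parts):
    """True if every FD can be enforced using the sub relations in parts."""
    for lhs, rhs in fds:
        result = set(lhs)
        changed = True
        while changed:
            changed = False
            for part in parts:
                part = set(part)
                # attributes of this sub relation determined by what we
                # already have inside the same sub relation
                gained = closure(result & part, fds) & part
                if not gained <= result:
                    result |= gained
                    changed = True
        if not set(rhs) <= result:
            return False
    return True


print(preserves_dependencies([("A", "BC")], ["ABC", "AD"]))             # True
print(preserves_dependencies([("A", "B"), ("C", "D")], ["AB", "CD"]))   # True
print(preserves_dependencies([("A", "B"), ("B", "C")], ["AB", "AC"]))   # False
```

In the last case B → C cannot be checked in either R1(A, B) or R2(A, C), so that decomposition does not preserve dependencies.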

Is it lossy or lossless decomposition?


DBMS - Transaction

• A transaction can be defined as a group of tasks. A single task is the minimum processing unit, which cannot be divided further.
• A transaction is a single logical unit of work which accesses and possibly modifies the contents of a database. Transactions access data using read and write operations.
• “A transaction is a set of operations which are all logically related.”
Operations in Transaction-

1. Read Operation-
• A read operation reads data from the database and stores it in the buffer in main memory.
• For example, Read(A) will read the value of A from the database and store it in the buffer in main memory.

2. Write Operation-
• A write operation writes the updated data value back to the database from the buffer.
• For example, Write(A) will write the updated value of A from the buffer to the database.
Transaction Concept

• A transaction is a unit of program execution that accesses and possibly updates various data items.
• E.g., transaction to transfer $50 from account A to account B:
1. read(A)
2. A := A – 50
3. write(A)
4. read(B)
5. B := B + 50
6. write(B)
• Two main issues to deal with:
• Failures of various kinds, such as hardware failures and system crashes
• Concurrent execution of multiple transactions
Transaction State
• Active – the initial state; the transaction stays in this state while it is executing.
• Partially committed – after the final statement has been executed.
• Failed – after the discovery that normal execution can no longer proceed.
• Aborted – after the transaction has been rolled back and the database restored to its state prior to the start of the transaction. Two options after it has been aborted:
• Restart the transaction
• can be done only if there is no internal logical error
• Kill the transaction
• Committed – after successful completion.
ACID Properties

• Atomicity. Either all operations of the transaction are properly reflected in the database or none are.
• Consistency. Execution of a transaction in isolation preserves the consistency of the database.
• Isolation. Although multiple transactions may execute concurrently, each transaction must be unaware of other concurrently executing transactions. Intermediate transaction results must be hidden from other concurrently executed transactions.
• That is, for every pair of transactions Ti and Tj, it appears to Ti that either Tj finished execution before Ti started, or Tj started execution after Ti finished.
• Durability. After a transaction completes successfully, the changes it has made to the database persist, even if there are system failures.
Required Properties of a Transaction
• Consider a transaction to transfer $50 from account A to account B:
1. read(A)
2. A := A – 50
3. write(A)
4. read(B)
5. B := B + 50
6. write(B)
• Atomicity requirement
• If the transaction fails after step 3 and before step 6, money will be “lost” leading to an inconsistent
database state
• Failure could be due to software or hardware
• The system should ensure that updates of a partially executed transaction are not reflected in the
database
• Durability requirement — once the user has been notified that the transaction has completed (i.e., the
transfer of the $50 has taken place), the updates to the database by the transaction must persist even if
there are software or hardware failures.
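The atomicity requirement above can be illustrated with a minimal sketch using Python's standard `sqlite3` module. The table name `account`, the starting balances, and the `fail_midway` flag are assumptions made for illustration only; the point is that a failure after the debit leaves no partial update behind.

```python
import sqlite3

def transfer(conn, src, dst, amount, fail_midway=False):
    """Move `amount` from src to dst as one atomic unit."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
            if fail_midway:
                # hypothetical failure between write(A) and write(B)
                raise RuntimeError("simulated failure after step 3")
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
        return True
    except RuntimeError:
        return False

def balances(conn):
    """Return the current balances as a {name: balance} dict."""
    return dict(conn.execute("SELECT name, balance FROM account"))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 200)])
conn.commit()

transfer(conn, "A", "B", 50, fail_midway=True)
after_failure = balances(conn)   # the debit of A was rolled back
transfer(conn, "A", "B", 50)
after_success = balances(conn)   # both updates are now visible
```

In both outcomes the sum A + B stays at 300, which is the consistency requirement of the same example.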
Required Properties of a Transaction (Cont.)

• Consistency requirement in the above example:
• The sum of A and B is unchanged by the execution of the transaction
• In general, consistency requirements include
• Explicitly specified integrity constraints such as primary keys and foreign keys
• Implicit integrity constraints
• e.g., sum of balances of all accounts, minus sum of loan amounts must equal value
of cash-in-hand
• A transaction, when starting to execute, must see a consistent database.
• During transaction execution the database may be temporarily inconsistent.
• When the transaction completes successfully the database must be consistent
• Erroneous transaction logic can lead to inconsistency
Required Properties of a Transaction (Cont.)

• Isolation requirement: if, between steps 3 and 6 of the fund transfer transaction, another
transaction T2 is allowed to access the partially updated database, it will see an inconsistent
database (the sum A + B will be less than it should be).

T1                  T2
1. read(A)
2. A := A – 50
3. write(A)
                    read(A), read(B), print(A+B)
4. read(B)
5. B := B + 50
6. write(B)
• Isolation can be ensured trivially by running transactions serially
• That is, one after the other.
• However, executing multiple transactions concurrently has significant benefits, as we will see later.
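The effect of the interleaving above can be reproduced with a small simulation. This is only a sketch: the starting balances (A = 100, B = 200) and the helper names `t1_steps` and `observed_sum` are assumptions, and the "database" is a plain dictionary.

```python
def t1_steps(db, local):
    """The six steps of the transfer transaction T1, one callable per step."""
    return [
        lambda: local.update(A=db["A"]),          # 1. read(A)
        lambda: local.update(A=local["A"] - 50),  # 2. A := A - 50
        lambda: db.update(A=local["A"]),          # 3. write(A)
        lambda: local.update(B=db["B"]),          # 4. read(B)
        lambda: local.update(B=local["B"] + 50),  # 5. B := B + 50
        lambda: db.update(B=local["B"]),          # 6. write(B)
    ]

def observed_sum(t2_runs_after_step):
    """Run T1 and let T2 read A and B right after the given T1 step (0 = before T1)."""
    db, local = {"A": 100, "B": 200}, {}
    seen = None
    if t2_runs_after_step == 0:
        seen = db["A"] + db["B"]
    for number, step in enumerate(t1_steps(db, local), start=1):
        step()
        if number == t2_runs_after_step:
            seen = db["A"] + db["B"]  # T2: read(A), read(B), print(A+B)
    return seen

serial_before = observed_sum(0)  # T2 runs entirely before T1
interleaved = observed_sum(3)    # T2 runs between write(A) and read(B)
serial_after = observed_sum(6)   # T2 runs entirely after T1
```

Both serial orders show T2 a sum of 300, while the interleaved run shows 250, the inconsistent state described in the bullet on the isolation requirement.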
Concurrency Problems in DBMS-

When multiple transactions execute concurrently in an uncontrolled or
unrestricted manner, several problems can arise. Such
problems are called concurrency problems.
The concurrency problems are-
• Dirty Read Problem
• Unrepeatable Read Problem
• Lost Update Problem
• Phantom Read Problem
Dirty Read Problem
A dirty read occurs when a transaction reads a value written by another transaction that
has not yet committed. This read is called a dirty read because-
• There is always a chance that the uncommitted transaction might roll back later. Thus, the
uncommitted transaction might make other transactions read a value that never existed
in the committed database. This leads to inconsistency of the database.
• A dirty read does not always lead to inconsistency. It becomes problematic only when the
uncommitted transaction fails and rolls back later for some reason.

Here,
1.T1 reads the value of A.
2.T1 updates the value of A in the buffer.
3.T2 reads the value of A from the buffer.
4.T2 writes the updated value of A.
5.T2 commits.
6.T1 fails in later stages and rolls back.
Unrepeatable Read Problem
This problem occurs when a transaction reads different values of the
same variable in separate read operations, even though it has not
updated the value itself.
Here,
1.T1 reads the value of X (= 10 say).
2.T2 reads the value of X (= 10).
3.T1 updates the value of X (from 10 to 15 say) in the buffer.
4.T2 again reads the value of X (but = 15).
 
In this example,
•T2 gets to read a different value of X in its second reading.
•T2 wonders how the value of X got changed because according
to it, it is running in isolation.
 Lost Update Problem-
This problem occurs when multiple transactions execute concurrently
and updates from one or more transactions get lost.
Here,
1.T1 reads the value of A (= 10 say).
2.T1 updates the value of A (= 15 say) in the buffer.
3.T2 does blind write A = 25 (write without read) in the buffer.
4.T2 commits.
5.When T1 commits, it writes A = 25 in the database.
 
In this example,
•T1 writes the overwritten value of A in the database.
•Thus, the update from T1 gets lost.
 
NOTE-
 
•This problem occurs whenever there is a write-write conflict.
•In write-write conflict, there are two writes one by each transaction
on the same data item without any read in the middle.
Phantom Read Problem
This problem occurs when a transaction reads some variable from the
buffer and when it reads the same variable later, it finds that the
variable does not exist.
Here,
1.T1 reads X.
2.T2 reads X.
3.T1 deletes X.
4.T2 tries reading X but does not find it.
 
In this example,
•T2 finds that there does not exist any variable X when it tries
reading X again.
•T2 wonders who deleted the variable X because according to
it, it is running in isolation.
Avoiding Concurrency Problems-
•To ensure consistency of the database, it is very
important to prevent the occurrence of the above problems.
•Concurrency control protocols and scheduling help
to prevent the occurrence of the above problems and maintain
the consistency of the database.
DBMS Schedule
A schedule is the order in which the operations of a set of concurrent
transactions are executed. It preserves the order of the operations
within each individual transaction.
Serial Schedule
• The serial schedule is a type of schedule where one transaction is executed
completely before starting another transaction. In the serial schedule, when the
first transaction completes its cycle, then the next transaction is executed. Serial
schedules are always-
• Consistent
• Recoverable

• For example: Suppose there are two transactions T1 and T2, each with some
operations. If no interleaving of operations is allowed, then only the following
two orders are possible:
• Execute all the operations of T2, followed by all the operations of T1.
• Execute all the operations of T1, followed by all the operations of T2.
Non-serial Schedule

• If interleaving of operations is allowed, then the schedule is a non-serial
schedule. There are many possible orders in which the system can
execute the individual operations of the transactions. Non-serial
schedules are NOT always-
• Consistent
• Recoverable
Serializability in DBMS-
•  Some non-serial schedules may lead to inconsistency of the
database. Serializability is a concept that helps to identify which non-
serial schedules are correct and will maintain the consistency of the
database.
• Serial Schedules Vs Serializable Schedules-
Serializable schedule

• The serializability of schedules is used to find non-serial schedules
that allow the transactions to execute concurrently without interfering
with one another.
• It identifies which schedules are correct when the executions of the
transactions have their operations interleaved.
• A non-serial schedule is serializable if its result is equivalent to the
result of some serial execution of its transactions.
Testing of Serializability
• Serialization Graph is used to test the Serializability of a schedule.
• Assume a schedule S. For S, we construct a graph known as the precedence graph. This graph is a
pair G = (V, E), where V is a set of vertices and E is a set of edges. The set of vertices
contains all the transactions participating in the schedule. The set of edges
contains all edges Ti → Tj for which one of the three conditions holds:
• Create an edge Ti → Tj if Ti executes write (Q) before Tj executes read (Q).
• Create an edge Ti → Tj if Ti executes read (Q) before Tj executes write (Q).
• Create an edge Ti → Tj if Ti executes write (Q) before Tj executes write (Q).

• If the precedence graph contains an edge Ti → Tj, then in any serial schedule equivalent to S,
Ti must appear before Tj.
• If the precedence graph for schedule S contains a cycle, then S is not conflict serializable. If the
precedence graph has no cycle, then S is conflict serializable.
Serializability in DBMS-

Conflict Serializability-
 
If a given non-serial schedule can be converted into a serial schedule by swapping its non-conflicting operations,
then it is called as a conflict serializable schedule.
 
Conflicting Operations-
 
Two operations are called as conflicting operations if all the following conditions hold true for them-
•Both the operations belong to different transactions
•Both the operations are on the same data item
•At least one of the two operations is a write operation
Problem on Serializability
Check whether the given schedule S is conflict serializable or not-
S : R1(A) , R2(A) , R1(B) , R2(B) , R3(B) , W1(A) , W2(B)
Step-01:
• List all the conflicting operations and determine the dependency between the transactions-
• R2(A) , W1(A)              (T2 → T1)
• R1(B) , W2(B)              (T1 → T2)
• R3(B) , W2(B)              (T3 → T2)

• Clearly, there exists a cycle in the precedence graph (T1 → T2 → T1).
• Therefore, the given schedule S is not conflict serializable.
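The precedence-graph test can also be carried out mechanically. The following is a minimal sketch, not a reference implementation: the function names `parse`, `precedence_edges`, and `has_cycle` are illustrative assumptions, and the schedule is written in the same R1(A) / W2(B) notation used in the problem above.

```python
import re
from collections import defaultdict

def parse(schedule):
    """Turn 'R1(A) , W2(B)' into [('R', 'T1', 'A'), ('W', 'T2', 'B')]."""
    return [(op, "T" + txn, item)
            for op, txn, item in re.findall(r"([RW])(\d+)\((\w+)\)", schedule)]

def precedence_edges(ops):
    """Edge Ti -> Tj for each conflicting pair in which Ti's operation comes first."""
    edges = set()
    for i, (op1, t1, item1) in enumerate(ops):
        for op2, t2, item2 in ops[i + 1:]:
            # conflict: different transactions, same item, at least one write
            if t1 != t2 and item1 == item2 and "W" in (op1, op2):
                edges.add((t1, t2))
    return edges

def has_cycle(edges):
    """Depth-first search; reaching a node on the current path means a cycle."""
    graph = defaultdict(set)
    for src, dst in edges:
        graph[src].add(dst)
    state = {}  # node -> "visiting" or "done"

    def visit(node):
        state[node] = "visiting"
        for nxt in graph[node]:
            if state.get(nxt) == "visiting":
                return True
            if nxt not in state and visit(nxt):
                return True
        state[node] = "done"
        return False

    return any(node not in state and visit(node) for node in list(graph))

S = "R1(A) , R2(A) , R1(B) , R2(B) , R3(B) , W1(A) , W2(B)"
edges = precedence_edges(parse(S))
conflict_serializable = not has_cycle(edges)
```

For this schedule the sketch finds the same three edges as Step-01 (T2 → T1, T1 → T2, T3 → T2) and reports that S is not conflict serializable.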
Problem on Serializability

List all the conflicting operations and determine the dependency between
the transactions-
•R2(X) , W3(X)              (T2 → T3)
•R2(X) , W1(X)              (T2 → T1)
•W3(X) , W1(X)             (T3 → T1)
•W3(X) , R4(X)              (T3 → T4)
•W1(X) , R4(X)              (T1 → T4)
•W2(Y) , R4(Y)              (T2 → T4)

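• The dependencies listed above contain no cycle (T2, T3, T1, T4 is a consistent serial order),
so S is conflict serializable.
• Checking Whether S is Recoverable Or Not-
• Conflict serializability by itself does not guarantee recoverability. A schedule is recoverable
when every transaction commits only after the transactions whose writes it has read have committed.
• Among the operations listed, T4 reads X after W1(X) and Y after W2(Y), so S is recoverable
as long as T4 commits after T1 and T2.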
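The dependencies listed for this problem can be checked with Python's standard `graphlib` module (Python 3.9 or later). This is a sketch under the assumption that the six edges listed above are the complete precedence graph; the helper name `serial_order` is illustrative.

```python
from graphlib import CycleError, TopologicalSorter

# Dependencies copied from the list of conflicting operations above.
edges = [
    ("T2", "T3"), ("T2", "T1"), ("T3", "T1"),
    ("T3", "T4"), ("T1", "T4"), ("T2", "T4"),
]

def serial_order(edges):
    """Return an equivalent serial order, or None if the graph has a cycle."""
    sorter = TopologicalSorter()
    for before, after in edges:
        sorter.add(after, before)  # `after` must come later than `before`
    try:
        return list(sorter.static_order())
    except CycleError:
        return None

order = serial_order(edges)
```

A topological order exists exactly when the precedence graph is acyclic, so a non-`None` result is a serial schedule that is conflict equivalent to the given one.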
DBMS Concurrency Control
• Concurrency Control in Database Management System is a procedure
of managing simultaneous operations without conflicting with each
other. It ensures that Database transactions are performed
concurrently and accurately to produce correct results without
violating data integrity of the respective Database.

• Concurrent access is quite easy if all users are just reading data, since there
is no way they can interfere with one another. Any practical database,
however, has a mix of READ and WRITE operations, and hence
concurrency is a challenge.
Concurrency Control Techniques
• Concurrency control protocols ensure atomicity, isolation, and
serializability of concurrent transactions. The techniques covered
here fall into the following groups:
• Lock based protocol
• Time-stamp based protocol
• Deadlock handling
• Recovery
• Log based recovery
• Shadow paging
Lock-Based Protocol

• A lock-based protocol in DBMS is a mechanism in which a transaction
cannot read or write a data item until it acquires an appropriate lock on it.
Lock-based protocols help to eliminate concurrency problems between
simultaneous transactions by restricting access to a locked data item
to the transaction that holds the lock.
• A lock is a variable associated with a data item. The lock
signifies which operations can be performed on the data item.
Locks in DBMS help synchronize access to the database items by
concurrent transactions.
Lock-Based Protocol
There are two types of lock:
• 1. Shared Lock (S):
• A shared lock is also called a read-only lock. With a shared lock, the data item can be shared
between transactions, because a transaction holding it may only read the data item and never
update it.
• For example, consider a case where two transactions are reading the account balance of a
person. The database will let both read by placing a shared lock. However, if another transaction
wants to update that account's balance, the shared lock prevents it until the reading is over.
• 2. Exclusive Lock (X):
• With an exclusive lock, a data item can be read as well as written. The lock is exclusive: it cannot be
held by more than one transaction at a time on the same data item. An X-lock is requested using the
lock-x instruction. A transaction may unlock the data item after finishing the 'write' operation.
• For example, suppose a transaction needs to update the account balance of a person. The database
allows this by placing an X lock on the balance. When a second transaction then wants to
read or write it, the exclusive lock prevents the operation.
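The compatibility rules for the two lock types can be sketched as a tiny lock table. The class `LockTable` and its methods are hypothetical names used only for illustration; a real lock manager would also queue waiting requests, which this sketch leaves out.

```python
class LockTable:
    """Grants shared (S) and exclusive (X) locks on data items."""

    def __init__(self):
        self.held = {}  # item -> {transaction: mode}

    def request(self, txn, item, mode):
        """Grant the lock and return True, or return False if txn must wait."""
        holders = self.held.setdefault(item, {})
        others = [m for t, m in holders.items() if t != txn]
        # S is compatible only with other S locks
        if mode == "S" and all(m == "S" for m in others):
            holders.setdefault(txn, "S")  # keep an X lock txn already holds
            return True
        # X is compatible with no lock held by another transaction
        if mode == "X" and not others:
            holders[txn] = "X"
            return True
        return False

    def unlock(self, txn, item):
        """Release whatever lock txn holds on item."""
        self.held.get(item, {}).pop(txn, None)

locks = LockTable()
r1 = locks.request("T1", "balance", "S")   # first reader
r2 = locks.request("T2", "balance", "S")   # second reader shares the lock
w3 = locks.request("T3", "balance", "X")   # writer must wait for the readers
locks.unlock("T1", "balance")
locks.unlock("T2", "balance")
w3_retry = locks.request("T3", "balance", "X")  # granted once readers are done
r1_again = locks.request("T1", "balance", "S")  # blocked by the X lock
```

This mirrors the account-balance examples: two readers proceed together, while the updater is held back until they finish and then excludes everyone else.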
Timestamp-based Protocols

• A timestamp-based protocol in DBMS is an algorithm which uses
the system time or a logical counter as a timestamp to serialize the
execution of concurrent transactions. The timestamp-based protocol
ensures that conflicting read and write operations are executed
in timestamp order.
• The older transaction is always given priority in this method. It uses
the system time (or a logical counter) to determine the timestamp of
the transaction. Along with locking, it is a commonly used concurrency protocol.
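The slides do not spell out the read and write rules, so the following sketch uses the standard basic timestamp-ordering rules as an assumption: each data item remembers the largest timestamp that has read it and the timestamp that last wrote it, and an operation arriving "too late" is rejected so that its transaction can be rolled back. The class and method names are illustrative.

```python
class TimestampOrdering:
    """Basic timestamp ordering: smaller timestamp = older transaction."""

    def __init__(self):
        self.read_ts = {}   # item -> largest timestamp that read it
        self.write_ts = {}  # item -> timestamp of the last write

    def read(self, ts, item):
        """Return True if the read may proceed, False if the reader must roll back."""
        if ts < self.write_ts.get(item, 0):
            return False  # a younger transaction already overwrote the value
        self.read_ts[item] = max(self.read_ts.get(item, 0), ts)
        return True

    def write(self, ts, item):
        """Return True if the write may proceed, False if the writer must roll back."""
        if ts < self.read_ts.get(item, 0) or ts < self.write_ts.get(item, 0):
            return False  # a younger transaction already read or wrote the item
        self.write_ts[item] = ts
        return True

to = TimestampOrdering()
t2_reads = to.read(2, "X")         # T2 (timestamp 2) reads X first
t1_writes = to.write(1, "X")       # T1 (timestamp 1) writes too late: rejected
t2_writes = to.write(2, "X")       # T2 may write; it is not behind any reader
t1_reads = to.read(1, "X")         # T1 would miss T2's write order: rejected
```

Every accepted operation is consistent with the serial order T1, T2 given by the timestamps, which is how the protocol enforces serializability without locks.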
Deadlock handling in DBMS

• A deadlock is a condition where two or more transactions are waiting
indefinitely for one another to give up locks. Deadlock is
one of the most feared complications in DBMS, as none of the transactions
involved ever finishes; each remains in a waiting state forever.
Deadlock in DBMS

• For example: transaction T1 holds a lock on some rows in the Student table
and needs to update some rows in the Grade table.
Simultaneously, transaction T2 holds locks on those rows in the Grade
table and needs to update the rows in the Student table held by
transaction T1. Neither transaction can proceed.
Deadlock Avoidance

• When a database is stuck in a deadlock state, transactions have to be aborted
and restarted, which is a waste of time and resources. It is therefore better
to avoid the deadlock in the first place.
• A deadlock avoidance mechanism is used to detect any deadlock
situation in advance. A method like the "wait-for graph" is used for
detecting the deadlock situation, but this method is suitable only for
smaller databases. For larger databases, a deadlock prevention
method can be used.
Wait for graph

• The wait-for graph shows the relationship between the resources and
transactions. If a transaction requests a resource or already holds
one, this is shown as an edge in the wait-for graph. If the wait-for
graph contains a cycle, then there may be a deadlock in the system;
otherwise there is none.
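A cycle check on a wait-for graph can be sketched as follows. The sketch assumes a simplified graph whose nodes are transactions only, with an edge from each transaction to the transactions it is waiting on; the function name `deadlocked` is illustrative.

```python
def deadlocked(waits_for):
    """Return the transactions that can never finish.

    waits_for maps each transaction to the set of transactions it waits on.
    Transactions that wait on nobody still blocked are removed repeatedly;
    whatever remains is on a cycle or waiting on one.
    """
    remaining = {t: set(targets) for t, targets in waits_for.items()}
    for targets in waits_for.values():
        for t in targets:
            remaining.setdefault(t, set())
    changed = True
    while changed:
        changed = False
        for txn in list(remaining):
            if remaining[txn].isdisjoint(remaining):
                del remaining[txn]  # nothing it waits on is blocked: it can finish
                changed = True
    return set(remaining)

# T1 waits for T2 (Grade rows) and T2 waits for T1 (Student rows).
student_grade = deadlocked({"T1": {"T2"}, "T2": {"T1"}})
# T1 waits for T2, but T2 waits for nobody, so both eventually finish.
no_deadlock = deadlocked({"T1": {"T2"}, "T2": set()})
```

The result is non-empty exactly when the graph contains a cycle, so an empty set means the system is free of deadlock at that moment.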
Deadlock Prevention

• It is imperative to prevent a deadlock before it can occur. So, the
system checks each transaction before it is executed to
make sure it does not lead to deadlock. If there is even a chance that
a transaction may lead to deadlock, it is never allowed to execute.
• There are some deadlock prevention schemes that use timestamps in
order to make sure that deadlock does not occur.
• Deadlock prevention method is suitable for a large database. If the
resources are allocated in such a way that deadlock never occurs,
then the deadlock can be prevented.
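The slides do not name the timestamp-based prevention schemes. Two well-known ones are wait-die and wound-wait, and the sketch below shows their decision rules under the assumption that a smaller timestamp means an older transaction. The function names and the returned strings are illustrative.

```python
def wait_die(requester_ts, holder_ts):
    """Wait-die: an older requester waits; a younger requester is rolled back."""
    return "wait" if requester_ts < holder_ts else "abort requester"

def wound_wait(requester_ts, holder_ts):
    """Wound-wait: an older requester preempts the holder; a younger one waits."""
    return "abort holder" if requester_ts < holder_ts else "wait"

# T1 has timestamp 1 (older) and T2 has timestamp 2 (younger).
old_asks_young_wd = wait_die(1, 2)    # T1 requests a lock held by T2
young_asks_old_wd = wait_die(2, 1)    # T2 requests a lock held by T1
old_asks_young_ww = wound_wait(1, 2)
young_asks_old_ww = wound_wait(2, 1)
```

In both schemes waiting is only ever allowed in one direction of age, so a cycle of waiting transactions cannot form.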
Introduction to Log-Based Recovery
• Log-based recovery in DBMS provides the ability to maintain or recover data
in case of system failure. The DBMS keeps a record of every transaction on a
stable storage device so that the data can be recovered when the system fails.
A log record is written for every operation performed on the database.
• Log and log records – The log is a sequence of log records, recording all the
update activities in the database.
• If the user performs an operation on the database, it will be recorded in the
log.
• However, the log record must be saved to stable storage before the actual
change is applied to the database.
Log-Based Recovery
The following fields are found in an update log record (represented as <Ti, Xj, V1, V2>):
Transaction identifier: The transaction identifier is the unique identification of the
transaction that performed the write operation.
Data item: The data item's unique identifier.
Old value: The value of a data item before the write operation.
New value: The value of a data item after the write operation.

Other kinds of log records include:
• <Ti start>: Records that transaction Ti has begun.
• <Ti commit>: Records that transaction Ti has committed.
• <Ti abort>: Records that transaction Ti has aborted.
Log-Based Recovery
Undo and Redo Operations – Because all database modifications must be preceded by creation of log record, the system has
available both the old value prior to the modification of the data item and new value that is to be written for data item. This
allows system to perform redo and undo operations as appropriate:
Undo: using a log record sets the data item specified in log record to old value.
Redo: using a log record sets the data item specified in log record to new value.
The database can be modified using two approaches –
•Deferred Modification Technique: If the transaction does not modify the database until it has partially committed, it is said
to use the deferred modification technique.
•Immediate Modification Technique: If database modifications occur while the transaction is still active, it is said to use
the immediate modification technique.

Recovery using Log records – After a system crash has occurred, the system consults the log to determine which transactions
need to be redone and which need to be undone.
•Transaction Ti needs to be undone if the log contains the record <Ti start> but does not contain either the record <Ti
commit> or the record <Ti abort>.
•Transaction Ti needs to be redone if log contains record <Ti start> and either the record <Ti commit> or the record <Ti
abort>.
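The redo and undo rules above can be sketched in a few lines. The tuple encoding of log records, the function name `recover`, and the sample values are assumptions made for illustration. The sketch also assumes that any aborted transaction already logged the writes that restored its old values, so that redoing its records in order ends at the old values.

```python
def recover(log, db):
    """Apply redo and undo to `db` after a crash; return (redo_set, undo_set).

    Update records are (txn, item, old_value, new_value);
    control records are (txn, "start" | "commit" | "abort").
    """
    started, finished = set(), set()
    for rec in log:
        if len(rec) == 2 and rec[1] == "start":
            started.add(rec[0])
        elif len(rec) == 2 and rec[1] in ("commit", "abort"):
            finished.add(rec[0])
    redo_set = started & finished   # has <Ti start> and <Ti commit>/<Ti abort>
    undo_set = started - finished   # has <Ti start> only

    for rec in log:                 # redo: forward pass, set new values
        if len(rec) == 4 and rec[0] in redo_set:
            _, item, _, new = rec
            db[item] = new
    for rec in reversed(log):       # undo: backward pass, restore old values
        if len(rec) == 4 and rec[0] in undo_set:
            _, item, old, _ = rec
            db[item] = old
    return redo_set, undo_set

log = [
    ("T1", "start"), ("T1", "A", 100, 50), ("T1", "B", 200, 250), ("T1", "commit"),
    ("T2", "start"), ("T2", "C", 700, 600),
]
# State found on disk after the crash: T1's write to B was not yet flushed,
# while T2's uncommitted write to C was.
db = {"A": 50, "B": 200, "C": 600}
redo_set, undo_set = recover(log, db)
```

After recovery the committed transfer by T1 is fully present and the incomplete change by T2 has been reversed.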

Recovery can also be done using Checkpoint based technique.


Introduction of Shadow Paging
Shadow paging is one of the techniques used to recover from failure. Recovery
means getting back information that was lost. Shadow paging helps to maintain database
consistency in case of failure.
Concept of shadow paging, step by step −

• Step 1 − Page is a segment of memory. Page table is an index of pages. Each table entry points
to a page on the disk.
• Step 2 − Two page tables are used during the life of a transaction: the current page table and
the shadow page table. Shadow page table is a copy of the current page table.
• Step 3 − When a transaction starts, both the tables look identical, the current table is updated
for each write operation.
• Step 4 − The shadow page table is never changed during the life of the transaction.
• Step 5 − When the current transaction is committed, the current page table becomes the new
shadow page table. After the transaction, both tables are identical again.
• Step 6 − The shadow page table is stored in non-volatile memory. If the system crash occurs,
then the shadow page table is copied to the current page table.
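The six steps can be sketched with two dictionaries that play the role of the page tables. The class `ShadowPagedDB` and its method names are hypothetical, and the "disk" is an in-memory dictionary; old page versions are never reclaimed here.

```python
class ShadowPagedDB:
    """Copy-on-write pages with a current and a shadow page table."""

    def __init__(self, pages):
        self.disk = dict(enumerate(pages))        # physical page id -> contents
        self.shadow = {i: i for i in self.disk}   # logical page -> physical page
        self.current = dict(self.shadow)          # identical at transaction start
        self.next_free = len(self.disk)

    def write(self, logical, data):
        """Write to a fresh physical page and update only the current table."""
        self.disk[self.next_free] = data
        self.current[logical] = self.next_free
        self.next_free += 1

    def read(self, logical):
        return self.disk[self.current[logical]]

    def commit(self):
        """The current page table becomes the new shadow page table."""
        self.shadow = dict(self.current)

    def crash_recover(self):
        """Discard uncommitted changes by copying the shadow table back."""
        self.current = dict(self.shadow)

db = ShadowPagedDB(["a", "b"])
db.write(0, "A1")
seen_in_txn = db.read(0)        # the transaction sees its own write
db.crash_recover()
after_crash = db.read(0)        # uncommitted write is gone
db.write(1, "B1")
db.commit()
db.crash_recover()
after_commit = db.read(1)       # committed write survives
```

The physical pages that are no longer referenced stay on the simulated disk, which is the garbage collection problem listed among the disadvantages.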
Advantages & Disadvantages of Shadow Paging
Advantages
• The advantages of shadow paging are as follows −
• No need for log records.
• No undo/ Redo algorithm.
• Recovery is faster.

Disadvantages
• The disadvantages of shadow paging are as follows −
• Data is fragmented or scattered.
• Garbage collection problem. Database pages containing old versions of modified
data need to be garbage collected after every transaction.
• Concurrent transactions are difficult to execute.
