Example:
For the given table, if we know the value of Employee number, we can obtain Employee Name, city, salary, etc.
From this, we can say that city, Employee Name, and salary are functionally dependent on Employee number.
Functional Dependency
Example
• We have a <Department> table with two attributes − DeptId and DeptName.
• The DeptId is primary key.
DeptId DeptName
001 Finance
002 Marketing
003 HR
• Here, DeptId uniquely identifies the DeptName attribute: once you know the DeptId, you can look up exactly one department name.
• The functional dependency between DeptId and DeptName can be stated as DeptName is functionally dependent on DeptId −
DeptId -> DeptName
Types of Functional Dependency
Trivial functional dependency
• Also, Employee_Id → Employee_Id and Employee_Name → Employee_Name are trivial dependencies too.
Example
• We have a <Department> table with two attributes − DeptId and
DeptName.
• The DeptId is primary key.
The following is a trivial functional dependency since DeptId is a subset of {DeptId, DeptName} −
{DeptId, DeptName} -> DeptId
• {Company} -> {CEO} (if we know the Company, we know the CEO name)
• But CEO is not a subset of Company, and hence it is a non-trivial functional dependency.
VALID/INVALID FD
X Y Z
1 4 2
1 5 3
1 6 3
1 2 2
VALID/INVALID FD
A B C
1 2 4
3 5 4
3 7 2
1 4 2
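Whether a dependency is valid for tables like the two above can be checked mechanically. The following is a minimal sketch, not part of the original slides: the helper names fd_holds and table are made up for illustration, and the check only says whether an FD holds in the rows shown, not in every possible instance of the relation.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if lhs -> rhs holds in rows (a list of dicts).

    The FD holds when no two rows agree on all lhs attributes
    but differ on some rhs attribute.
    """
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for this key and
        # returns the stored value on every later occurrence.
        if seen.setdefault(key, value) != value:
            return False
    return True


def table(header, *tuples):
    """Build a list of row dicts from a header string and value tuples."""
    return [dict(zip(header, t)) for t in tuples]


t1 = table("XYZ", (1, 4, 2), (1, 5, 3), (1, 6, 3), (1, 2, 2))
t2 = table("ABC", (1, 2, 4), (3, 5, 4), (3, 7, 2), (1, 4, 2))

print(fd_holds(t1, ["Y"], ["Z"]))  # True: every Y value is unique
print(fd_holds(t1, ["X"], ["Y"]))  # False: X = 1 maps to several Y values
print(fd_holds(t2, ["B"], ["C"]))  # True: every B value is unique
print(fd_holds(t2, ["A"], ["B"]))  # False: A = 1 maps to B = 2 and B = 4
```

Composite left sides work the same way, for example fd_holds(t2, ["A", "C"], ["B"]).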
Key terms
Key Terms − Description
• Axiom − Axioms are a set of inference rules used to infer all the functional dependencies on a relational database.
• Decomposition − A rule that suggests if you have a table that appears to contain two entities which are determined by the same primary key then you should consider breaking them up into two different tables.
• Dependent − It is displayed on the right side of the functional dependency diagram.
• Determinant − It is displayed on the left side of the functional dependency diagram.
• Union − It suggests that if two tables are separate, and the PK is the same, you should consider putting them together.
Armstrong’s Axioms(Inference Rules) in
Functional Dependency
The term Armstrong's axioms refers to the sound and complete set of
inference rules or axioms, introduced by William W. Armstrong, that is
used to test the logical implication of functional dependencies. If F is a
set of functional dependencies then the closure of F, denoted as F^+, is
the set of all functional dependencies logically implied by F.
Armstrong’s Axioms are a set of rules that, when applied repeatedly,
generate the closure of a set of functional dependencies.
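The slides do not spell the rules out. For reference, the three inference rules usually listed under this name are given below, where X, Y and Z stand for arbitrary sets of attributes (notation assumed here, not taken from the slides).

```latex
\begin{align*}
\text{Reflexivity:}  &\quad Y \subseteq X \;\Rightarrow\; X \to Y \\
\text{Augmentation:} &\quad X \to Y \;\Rightarrow\; XZ \to YZ \\
\text{Transitivity:} &\quad X \to Y,\; Y \to Z \;\Rightarrow\; X \to Z
\end{align*}
```

Reflexivity is what produces the trivial dependencies discussed earlier, such as {DeptId, DeptName} -> DeptId.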
Step-01:
Add the attributes contained in the attribute set for which closure is being calculated to the result set.
Step-02:
Recursively add the attributes to the result set which can be functionally determined from the attributes already
contained in the result set.
Closure of an Attribute Set/Attribute
Example-
Closure
• Consider a relation R ( A , B , C , D , E , F , G ) with the functional
dependencies-
A → BC, BC → DE, D → F, CF → G
Now, let us find the closure of some attributes and attribute sets
• {B,C}+={B,C}
• ={B,C,D,E} using BC → DE
• ={B,C,D,E,F} using D → F
={B,C,D,E,F,G} using CF → G
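Step-01 and Step-02 can be sketched in a few lines of Python. This is an illustrative sketch rather than code from the slides: the function name closure and the encoding of each FD as a pair of attribute strings are assumptions made for the example.

```python
def closure(attrs, fds):
    """Closure of the attribute set attrs under the FDs in fds.

    fds is a list of (lhs, rhs) pairs; each side is an iterable of
    single-character attribute names, e.g. ("BC", "DE") for BC -> DE.
    """
    result = set(attrs)            # Step-01: start with the given attributes
    changed = True
    while changed:                 # Step-02: repeat until nothing new is added
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


FDS = [("A", "BC"), ("BC", "DE"), ("D", "F"), ("CF", "G")]

print(sorted(closure("BC", FDS)))  # ['B', 'C', 'D', 'E', 'F', 'G']
print(sorted(closure("A", FDS)))   # ['A', 'B', 'C', 'D', 'E', 'F', 'G']
print(sorted(closure("D", FDS)))   # ['D', 'F']
```

The first result matches the hand computation of {B,C}+ above. Since the closure of A contains every attribute of R, A is a super key of R.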
GIVEN FOLLOWING FD’S
AB->CD
AF->D
DE->F
C->G
F->E
G->A
Which statement is false?
• [CF]+=[A,C,D,E,F,G]
• [BE]+=[A,B,C,D,E]
• [AF]+=[A,C,D,E,F,G]
• [AB]+=[A,C,D,F,G]
Rules for Functional Dependency-
Rule-01:
• A functional dependency X → Y will always hold if all the values of X are unique (different) irrespective of
the values of Y.
• Example-Consider the following table-
The following functional dependencies will always hold since all the values of attribute ‘A’ are unique-
• A → B
• A → BC
• A → CD
• A → BCD
• A → DE
• A → BCDE
In general, we can say following functional dependency will always hold-
A → Any combination of attributes A, B, C, D, E
Rule-02:
• A functional dependency X → Y will always hold if all the values of Y are same irrespective of the values
of X.
• Example-
• Consider the following table-
The following functional dependencies will always hold since all the values of attribute ‘C’ are same-
• A → C
• AB → C
• ABDE → C
• DE → C
• AE → C
• In general, we can say following functional dependency will always hold true-
Any combination of attributes A, B, C, D, E → C
Different Types Of Keys in DBMS-
Types of keys
• Super key
• Candidate key
• Primary key
• Alternate key
• Foreign key
Super Key
• A super key is a set of attributes that can identify each tuple uniquely in the given relation.
• A super key is not restricted to have any specific number of attributes.
• Thus, a super key may consist of any number of attributes.
NOTE-
• All the attributes in a super key are definitely sufficient to identify each tuple uniquely in the given relation but
all of them may not be necessary.
Candidate Key
A minimal super key is called a candidate key.
OR
A minimal set of attribute(s) that can identify each tuple uniquely in the given relation
is called a candidate key.
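Way of finding candidate key
Step-01:
• Determine all essential attributes of the given relation. Essential attributes are those that are not present on the RHS of any functional dependency, so they must be part of every candidate key.
Example
• Let R(A, B, C, D, E, F) be a relation scheme with the following functional dependencies-
• A → B, C → D, D → E
• Here, the attributes which are not present on RHS of any functional dependency are A, C and F.
• So, essential attributes are- A, C and F.
Step-02: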
• The remaining attributes of the relation are non-essential attributes. This is because they can be determined by using essential
attributes.
• Now, following two cases are possible-
Case-01:
• If all essential attributes together can determine all remaining non-essential attributes, then the combination of essential
attributes is the candidate key.
• It is the only possible candidate key.
Case-02:
• If all essential attributes together can not determine all remaining non-essential attributes, then-
• The set of essential attributes and some non-essential attributes will be the candidate key(s).
• In this case, multiple candidate keys are possible.
• To find the candidate keys, we check different combinations of essential and non-essential attributes.
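The procedure can be sketched as follows. This is a hedged illustration, not the slides' own code: candidate_keys and closure are hypothetical helper names, FDs are encoded as pairs of attribute strings, and the search in Case-02 simply tries combinations in order of increasing size.

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) attribute strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(attributes, fds):
    """Return the candidate keys of a relation as a list of sets."""
    all_attrs = set(attributes)
    # Essential attributes: never on the right side of any FD.
    rhs_attrs = set().union(*(set(rhs) for _, rhs in fds))
    essential = all_attrs - rhs_attrs
    # Case-01: the essential attributes alone determine everything.
    if closure(essential, fds) == all_attrs:
        return [essential]
    # Case-02: add non-essential attributes, smallest combinations first.
    others = sorted(all_attrs - essential)
    keys = []
    for size in range(1, len(others) + 1):
        for extra in combinations(others, size):
            candidate = essential | set(extra)
            if any(key <= candidate for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(candidate, fds) == all_attrs:
                keys.append(candidate)
    return keys


print(candidate_keys("ABCDEF", [("A", "B"), ("C", "D"), ("D", "E")]))
```

For the example relation this prints a single key made of A, C and F, which is Case-01. A relation in which every attribute appears on some right side falls into Case-02 and can have several candidate keys.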
Equivalence of Two Sets of Functional
Dependencies-
• Two different sets of functional dependencies for a given relation may
or may not be equivalent.
• If F and G are the two sets of functional dependencies, then following
4 cases are possible-
Case-01: F covers G (F ⊇ G) OR G is a subset of F
Case-02: G covers F (G ⊇ F)
Case-03: Both F and G cover each other (F = G)
Case-04: F and G do not cover each other (F ≠ G)
Case-01: Determining Whether F Covers
G-
• Following steps are followed to determine whether F covers G or not-
Step-01:
• Take the functional dependencies of set G into consideration.
• For each functional dependency X → Y, find the closure of X using the functional dependencies of set G.
Step-02:
• Take the functional dependencies of set G into consideration.
• For each functional dependency X → Y, find the closure of X using the functional dependencies of set F.
Step-03:
• Compare the results of Step-01 and Step-02.
• If the functional dependencies of set F have determined all those attributes that were determined by the
functional dependencies of set G, then it means F covers G.
• In that case, we conclude F covers G (F ⊇ G); otherwise it does not.
Case-02: Determining Whether G Covers F-
Step-01:
• Take the functional dependencies of set F into consideration.
• For each functional dependency X → Y, find the closure of X using the functional dependencies of set F.
Step-02:
• Take the functional dependencies of set F into consideration.
• For each functional dependency X → Y, find the closure of X using the functional dependencies of set G.
Step-03:
• Compare the results of Step-01 and Step-02.
• If the functional dependencies of set G have determined all those attributes that were determined by the
functional dependencies of set F, then it means G covers F.
• In that case, we conclude G covers F (G ⊇ F); otherwise it does not.
Case-03: Determining Whether Both F and G Cover Each Other-
• If F covers G and G covers F, then both sets cover each other and are equivalent (F = G).
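The comparison used in Case-01 and Case-02 can be sketched as below. The function names covers and equivalent, and the sets F, G and H, are hypothetical examples introduced here; they are not taken from the slides.

```python
def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) attribute strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def covers(f, g):
    """True if f covers g.

    For each X -> Y in g, compare the closure of X under g (Step-01)
    with the closure of X under f (Step-02); f must determine at least
    everything that g determines (Step-03).
    """
    return all(closure(lhs, g) <= closure(lhs, f) for lhs, _ in g)


def equivalent(f, g):
    """True if f and g cover each other."""
    return covers(f, g) and covers(g, f)


# Hypothetical FD sets over attributes A, B, C.
F = [("A", "B"), ("B", "C")]
G = [("A", "BC"), ("B", "C")]
H = [("A", "B")]

print(equivalent(F, G))          # True
print(covers(F, H), covers(H, F))  # True False
```

Here F and G are equivalent, while F covers H but H does not cover F because H cannot derive B → C.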
Canonical Cover in DBMS-
Characteristics-
• Canonical cover is free from all the extraneous functional dependencies.
• The closure of canonical cover is same as that of the given set of functional
dependencies.
• Canonical cover is not unique; more than one may exist for a given set of
functional dependencies.
Step-02: Consider each functional dependency one by one from the set obtained in Step-01.
• Determine whether it is essential or non-essential.
• To determine whether a functional dependency is essential or not, compute the closure of
its left side-
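• Once by considering that the particular functional dependency is present in the set
• Once by considering that the particular functional dependency is not present in the set
• If both closures are the same, the functional dependency is non-essential and can be removed; otherwise it is essential and must be kept.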
Step-03: Consider the set of functional dependencies obtained in Step-02 and check whether any functional dependency contains more than one attribute on its left side.
• Case-01: No- There exists no functional dependency containing more than one attribute on its left side. In this case, the
set obtained in Step-02 is the canonical cover.
• Case-02: Yes- There exists at least one functional dependency containing more than one attribute on its left side. In this case, consider
all such functional dependencies one by one.
• Check if their left side can be reduced.
Use the following steps to perform a check-
• Consider a functional dependency. Compute the closure of all the possible subsets of the left side of that functional
dependency.
• If any of the subsets produce the same closure result as produced by the entire left side, then replace the left side with
that subset. After this step is complete, the set obtained is the canonical cover.
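The steps above can be sketched in Python as follows. This is only an illustration under stated assumptions: Step-01 is not shown in the slides, so the sketch assumes it splits every right side into single attributes; the names canonical_cover and show are hypothetical; and the result depends on the order in which dependencies are examined, which is consistent with the canonical cover not being unique.

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) attribute collections."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def canonical_cover(fds):
    # Assumed Step-01: one attribute on each right side, duplicates dropped.
    work = [(frozenset(lhs), frozenset(a)) for lhs, rhs in fds for a in rhs]
    work = list(dict.fromkeys(work))
    # Step-02: drop each FD whose left-side closure is unchanged without it.
    for fd in list(work):
        rest = [other for other in work if other != fd]
        if closure(fd[0], rest) == closure(fd[0], work):
            work = rest
    # Step-03: replace a left side by a smaller subset with the same closure.
    reduced = []
    for lhs, rhs in work:
        full = closure(lhs, work)
        for size in range(1, len(lhs)):
            smaller = next((frozenset(s) for s in combinations(sorted(lhs), size)
                            if closure(s, work) == full), None)
            if smaller is not None:
                lhs = smaller
                break
        reduced.append((lhs, rhs))
    return reduced


def show(fds):
    """Render FDs as strings such as 'AB→C' for easy reading."""
    return ["".join(sorted(l)) + "→" + "".join(sorted(r)) for l, r in fds]


# Hypothetical input sets, not taken from the slides.
print(show(canonical_cover([("A", "BC"), ("B", "C"), ("AB", "C")])))  # ['A→B', 'B→C']
print(show(canonical_cover([("A", "B"), ("AB", "C")])))               # ['A→B', 'A→C']
```

In the first call, A → C and AB → C are found non-essential in Step-02. In the second call nothing is removed in Step-02, but Step-03 reduces the left side AB to A.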
Normalization in DBMS
In DBMS, database normalization is a process of making the database consistent by-
•Reducing the redundancies
•Ensuring the integrity of data through lossless decomposition
• Normalization is done through normal forms.
Normal Forms-
A given relation is called in First Normal Form (1NF) if each cell of the table contains only an atomic value.
OR
A given relation is called in First Normal Form (1NF) if the attribute of every tuple is either single valued or a
null value.
Example-
Relation is in 1NF
NOTE-
By default, every relation is in 1NF.
This is because formal definition of a relation states that value of all the attributes must be atomic.
Second Normal Form (2NF)
A given relation is called in Second Normal Form (2NF) if and only if-
• Relation already exists in 1NF.
• No partial dependency exists in the relation.
Partial Dependency
A partial dependency is a dependency where a proper subset of the attributes of a candidate key determines non-prime
attribute(s). OR
A partial dependency is a dependency where a portion of the candidate key or incomplete candidate key
determines non-prime attribute(s).
In other words,
Transitive Dependency
A → B is called a transitive dependency if and only if-
A is not a super key.
B is a non-prime attribute.
If any one condition fails, then it is not a transitive dependency.
NOTE-
Transitive dependency must not exist for non-prime attributes.
However, transitive dependency can exist for prime attributes.
Third Normal Form (3NF)
A relation is said to be in Third Normal Form (3NF) if and only if-
At least one of the following conditions holds for each non-trivial functional dependency A → B
A is a super key
B is a prime attribute
Example-
Consider a relation- R ( A , B , C , D , E ) with functional dependencies-
A → BC
CD → E
B→D
E→A
The possible candidate keys for this relation are-
A , E , CD , BC
From here,
Prime attributes = { A , B , C , D , E }
There are no non-prime attributes
Now,
It is clear that there are no non-prime attributes in the relation.
In other words, all the attributes of relation are prime attributes. Thus, all the attributes on RHS of each
functional dependency are prime attributes.
Thus, we conclude that the given relation is in 3NF.
BCNF
A given relation is said to be in BCNF if and only if-
Relation already exists in 3NF.
For each non-trivial functional dependency A → B, A is a super key of the relation.
A,B,C
Now, we can observe that LHS of each given functional dependency is a candidate key.
Thus, we conclude that the given relation is in BCNF.
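The 3NF and BCNF conditions can be checked mechanically for the 3NF example R ( A , B , C , D , E ) given earlier. The sketch below is an illustration only: the helper names are hypothetical, candidate keys are found by brute force, and only the listed functional dependencies are tested.

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) attribute strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(attributes, fds):
    """Brute-force search for minimal attribute sets that determine everything."""
    all_attrs = set(attributes)
    keys = []
    for size in range(1, len(all_attrs) + 1):
        for combo in combinations(sorted(all_attrs), size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue  # a smaller key is contained in it
            if closure(candidate, fds) == all_attrs:
                keys.append(candidate)
    return keys


def is_super_key(attrs, attributes, fds):
    return closure(attrs, fds) == set(attributes)


def in_bcnf(attributes, fds):
    """Every non-trivial FD must have a super key on its left side."""
    return all(set(rhs) <= set(lhs) or is_super_key(lhs, attributes, fds)
               for lhs, rhs in fds)


def in_3nf(attributes, fds):
    """Left side is a super key, or every new right-side attribute is prime."""
    prime = set().union(*candidate_keys(attributes, fds))
    return all(set(rhs) <= set(lhs)
               or is_super_key(lhs, attributes, fds)
               or (set(rhs) - set(lhs)) <= prime
               for lhs, rhs in fds)


FDS = [("A", "BC"), ("CD", "E"), ("B", "D"), ("E", "A")]
print(candidate_keys("ABCDE", FDS))  # keys A, E, BC and CD
print(in_3nf("ABCDE", FDS))          # True
print(in_bcnf("ABCDE", FDS))         # False
```

The sketch agrees with the candidate keys and the 3NF conclusion stated for that example. It also shows that this particular relation is not in BCNF, because B → D has a left side that is not a super key.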
Fourth Normal Form (4NF)
Any relation is said to be in the fourth normal form when it satisfies the
following conditions:
• It must be in Boyce Codd Normal Form (BCNF).
• It should have no multi-valued dependency.
• Here columns COLOR and MANUF_YEAR are dependent on BIKE_MODEL and independent of
each other.
• In this case, these two columns can be called as multivalued dependent on BIKE_MODEL.
These can be shown as
• BIKE_MODEL → → MANUF_YEAR
• BIKE_MODEL → → COLOR
Fifth Normal Form (5NF)
Any relation in order to be in the fifth normal form must satisfy the
following conditions:
• It must be in Fourth Normal Form (4NF).
• It should have no join dependency (all dependencies must be
preserved), and the joining must be lossless.
• The process of breaking up or dividing a single relation into two or more sub relations is
called as decomposition of a relation.
• Properties of Decomposition-
1. Lossless decomposition-Lossless decomposition ensures-
• No information is lost from the original relation during decomposition.
• When the sub relations are joined back, the same relation is obtained that was decomposed.
• Every decomposition must always be lossless.
R1 ⋈ R2 ⋈ R3 ……. ⋈ Rn = R
where ⋈ is a natural join operator
Types of Decomposition-
Example-
Consider the following relation R( A , B , C )-
R( A , B , C )
Consider this relation is decomposed into two sub relations R1( A , B ) and R2( B , C )-
A B C
1 2 1
2 5 3
3 3 3
The two sub relations are-
R1( A , B )
R2( B , C )
Now, let us check whether this decomposition is lossless or not.
For lossless decomposition, we must have-
R1 ⋈ R2 = R
Now, if we perform the natural join ( ⋈ ) of the sub relations R1 and R2 , we get-
R1( A , B )     R2( B , C )     R1 ⋈ R2 ( A , B , C )
A B             B C             A B C
1 2             2 1             1 2 1
2 5             5 3             2 5 3
3 3             3 3             3 3 3
This relation is same as the original relation R.
Thus, we conclude that the above decomposition is lossless join decomposition.
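The join in this example can be reproduced with a small sketch. The helpers project, natural_join and as_set are hypothetical names written for this illustration; the second decomposition into (A, C) and (B, C) is an added example, not one from the slides, and is included to show what extraneous tuples look like.

```python
def project(relation, attrs):
    """Projection of a relation (list of dicts) onto attrs, without duplicates."""
    out = []
    for row in relation:
        piece = {a: row[a] for a in attrs}
        if piece not in out:
            out.append(piece)
    return out


def natural_join(r1, r2):
    """Natural join: combine rows that agree on all shared attributes."""
    joined = []
    for t1 in r1:
        for t2 in r2:
            common = t1.keys() & t2.keys()
            if all(t1[a] == t2[a] for a in common):
                row = {**t1, **t2}
                if row not in joined:
                    joined.append(row)
    return joined


def as_set(relation):
    """Order-independent form of a relation, for comparing two relations."""
    return {tuple(sorted(row.items())) for row in relation}


R = [dict(A=1, B=2, C=1), dict(A=2, B=5, C=3), dict(A=3, B=3, C=3)]

# Decomposition from the example: R1(A, B) and R2(B, C).
joined = natural_join(project(R, "AB"), project(R, "BC"))
print(as_set(joined) == as_set(R))   # True: lossless for this instance

# Assumed alternative decomposition: (A, C) and (B, C).
lossy = natural_join(project(R, "AC"), project(R, "BC"))
print(len(lossy))                    # 5: two extraneous tuples appear
```

The alternative decomposition joins on C, and since C = 3 occurs in two rows, the join pairs them in every combination and produces tuples that were never in R.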
2. Lossy Join Decomposition-
•Consider there is a relation R which is decomposed into sub relations R1 , R2 , …. , Rn.
•This decomposition is called lossy join decomposition when the join of the sub relations does
not result in the same relation R that was decomposed.
•The natural join of the sub relations is always found to have some extraneous tuples.
•For lossy join decomposition, we always have-
R1 ⋈ R2 ⋈ R3 ……. ⋈ Rn ⊃ R
where ⋈ is a natural join operator
Determining Whether Decomposition Is
Lossless Or Lossy
• Consider a relation R is decomposed into two sub relations R1 and R2.
• If all of the following conditions are satisfied, then the decomposition is lossless. If any of these
conditions fails, then the decomposition is lossy.
Condition-01: Union of both the sub relations must contain all the attributes that are present in
the original relation R.
R1 ∪ R2 = R
Condition-02: Intersection of both the sub relations must not be null.
• In other words, there must be some common attribute which is present in both the sub
relations.
R1 ∩ R2 ≠ ∅
Condition-03: Intersection of both the sub relations must be a super key of either R1 or R2 or
both.
R1 ∩ R2 = Super key of R1 or R2
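The three conditions translate directly into a check based on attribute closure. This is a sketch under assumptions: the function name is_lossless is hypothetical, the relation schemas are given as attribute strings, and the functional dependency B → C used in the example is assumed for illustration because the slides do not list one.

```python
def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) attribute strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_lossless(r, r1, r2, fds):
    """Test the three conditions for a decomposition of r into r1 and r2."""
    r, r1, r2 = set(r), set(r1), set(r2)
    if r1 | r2 != r:           # Condition-01: no attribute is dropped
        return False
    common = r1 & r2
    if not common:             # Condition-02: a common attribute exists
        return False
    determined = closure(common, fds)
    # Condition-03: the common attributes are a super key of R1 or of R2.
    return r1 <= determined or r2 <= determined


print(is_lossless("ABC", "AB", "BC", [("B", "C")]))  # True
print(is_lossless("ABC", "AB", "BC", []))            # False
print(is_lossless("ABC", "AB", "C", []))             # False
```

With B → C, the common attribute B determines all of R2( B , C ), so the decomposition is lossless. Without any dependency, B is not a super key of either sub relation and Condition-03 fails.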
Dependency Preserving Decomposition
• If we decompose a relation R into relations R1 and R2, all
dependencies of R must either be a part of R1 or R2 or be
derivable from the combination of FDs of R1 and R2.
• 1. Read Operation-
• Read operation reads the data from the database and then stores it in the buffer in
main memory.
• For example- Read(A) instruction will read the value of A from the database and will
store it in the buffer in main memory.
• 2. Write Operation-
• Write operation writes the updated data value back to the database from the buffer.
• For example- Write(A) will write the updated value of A from the buffer to the
database.
Transaction Concept
• Isolation requirement — if between steps 3 and 6 (of the fund transfer transaction) , another
transaction T2 is allowed to access the partially updated database, it will see an inconsistent
database (the sum A + B will be less than it should be).
T1                      T2
1. read(A)
2. A := A – 50
3. write(A)
                        read(A), read(B), print(A+B)
4. read(B)
5. B := B + 50
6. write(B)
• Isolation can be ensured trivially by running transactions serially
• That is, one after the other.
• However, executing multiple transactions concurrently has significant benefits, as we will see later.
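The inconsistency seen by T2 can be reproduced with a tiny simulation. This is a sketch only: the starting balances of 100 and 200 and the function name run_transfer are assumptions made for the illustration, and the two transactions are simulated sequentially in one thread rather than with real concurrency.

```python
def run_transfer(t2_between_steps_3_and_4):
    """Run the fund transfer T1 and return the sum A + B that T2 prints."""
    db = {"A": 100, "B": 200}      # hypothetical starting balances

    def t2():
        return db["A"] + db["B"]   # read(A), read(B), print(A+B)

    seen_by_t2 = None
    a = db["A"]                    # 1. read(A)
    a = a - 50                     # 2. A := A - 50
    db["A"] = a                    # 3. write(A)
    if t2_between_steps_3_and_4:
        seen_by_t2 = t2()          # T2 sees the partially updated database
    b = db["B"]                    # 4. read(B)
    b = b + 50                     # 5. B := B + 50
    db["B"] = b                    # 6. write(B)
    if not t2_between_steps_3_and_4:
        seen_by_t2 = t2()          # serial execution: T2 runs after T1
    return seen_by_t2


print(run_transfer(True))   # 250: the 50 units in transit are missing
print(run_transfer(False))  # 300: the consistent total
```

Running T2 after T1 completes, as in a serial schedule, gives the correct total.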
Concurrency Problems in DBMS-
Here,
1.T1 reads the value of A.
2.T1 updates the value of A in the buffer.
3.T2 reads the value of A from the buffer.
4.T2 writes the updated value of A.
5.T2 commits.
6.T1 fails in later stages and rolls back.
Unrepeatable Read Problem
This problem occurs when a transaction reads different values of the
same variable in its different read operations even though it has not
updated the value itself.
Here,
1.T1 reads the value of X (= 10 say).
2.T2 reads the value of X (= 10).
3.T1 updates the value of X (from 10 to 15 say) in the buffer.
4.T2 again reads the value of X (but = 15).
In this example,
•T2 gets to read a different value of X in its second reading.
•T2 wonders how the value of X got changed because according
to it, it is running in isolation.
Lost Update Problem-
This problem occurs when multiple transactions execute concurrently
and updates from one or more transactions get lost.
Here,
1.T1 reads the value of A (= 10 say).
2.T1 updates the value to A (= 15 say) in the buffer.
3.T2 does blind write A = 25 (write without read) in the buffer.
4.T2 commits.
5.When T1 commits, it writes A = 25 in the database.
In this example,
•T1 writes the overwritten value of A in the database.
•Thus, update from T1 gets lost.
NOTE-
•This problem occurs whenever there is a write-write conflict.
•In write-write conflict, there are two writes one by each transaction
on the same data item without any read in the middle.
Phantom Read Problem
This problem occurs when a transaction reads some variable from the
buffer and when it reads the same variable later, it finds that the
variable does not exist.
Here,
1.T1 reads X.
2.T2 reads X.
3.T1 deletes X.
4.T2 tries reading X but does not find it.
In this example,
•T2 finds that there does not exist any variable X when it tries
reading X again.
•T2 wonders who deleted the variable X because according to
it, it is running in isolation.
Avoiding Concurrency Problems-
•To ensure consistency of the database, it is very
important to prevent the occurrence of above problems.
•Concurrency Control Protocols and Scheduling help
to prevent the occurrence of above problems and maintain
the consistency of the database.
DBMS Schedule
A series of operations from one transaction to another transaction is
known as a schedule. It is used to preserve the order of the operations in
each of the individual transactions.
Serial Schedule
• The serial schedule is a type of schedule where one transaction is executed
completely before starting another transaction. In the serial schedule, when the
first transaction completes its cycle, then the next transaction is executed. Serial
schedules are always-
• Consistent
• Recoverable
• For example: Suppose there are two transactions T1 and T2 which have some
operations. If it has no interleaving of operations, then there are the following
two possible outcomes:
• Execute all the operations of T2 which was followed by all the operations of T1.
• Execute all the operations of T1 which was followed by all the operations of T2.
Non-serial Schedule
• If a precedence graph contains a single edge Ti → Tj, then all the instructions of Ti are executed
before the first instruction of Tj is executed.
• If a precedence graph for schedule S contains a cycle, then S is non-serializable. If the precedence
graph has no cycle, then S is known as serializable.
Serializability in DBMS-
Conflict Serializability-
If a given non-serial schedule can be converted into a serial schedule by swapping its non-conflicting operations,
then it is called as a conflict serializable schedule.
Conflicting Operations-
Two operations are called as conflicting operations if all the following conditions hold true for them-
•Both the operations belong to different transactions
•Both the operations are on the same data item
•At least one of the two operations is a write operation
Problem on Serializability
Check whether the given schedule S is conflict serializable or not-
S : R1(A) , R2(A) , R1(B) , R2(B) , R3(B) , W1(A) , W2(B)
Step-01:
• List all the conflicting operations and determine the dependency between the transactions-
• R2(A) , W1(A) (T2 → T1)
• R1(B) , W2(B) (T1 → T2)
• R3(B) , W2(B) (T3 → T2)
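Step-01 can be automated, and the cycle test on the precedence graph described earlier then decides the question. The sketch below is illustrative: parse, precedence_edges and has_cycle are hypothetical helper names, and the schedule string is assumed to use exactly the R1(A) / W2(B) notation shown above.

```python
def parse(text):
    """Turn 'R1(A) , W2(B)' into [('R', 1, 'A'), ('W', 2, 'B')]."""
    ops = []
    for token in text.replace(" ", "").split(","):
        paren = token.index("(")
        ops.append((token[0], int(token[1:paren]), token[paren + 1:-1]))
    return ops


def precedence_edges(schedule):
    """Edges Ti -> Tj for every pair of conflicting operations, Ti's first."""
    edges = set()
    for i, (op1, t1, item1) in enumerate(schedule):
        for op2, t2, item2 in schedule[i + 1:]:
            # Conflict: different transactions, same item, at least one write.
            if t1 != t2 and item1 == item2 and "W" in (op1, op2):
                edges.add((t1, t2))
    return edges


def has_cycle(edges):
    """Depth-first search for a cycle in the directed precedence graph."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set())
    state = {}                       # node -> 'visiting' or 'done'

    def visit(node):
        state[node] = "visiting"
        for nxt in graph[node]:
            if state.get(nxt) == "visiting":
                return True          # back edge: a cycle exists
            if nxt not in state and visit(nxt):
                return True
        state[node] = "done"
        return False

    return any(node not in state and visit(node) for node in graph)


S = parse("R1(A) , R2(A) , R1(B) , R2(B) , R3(B) , W1(A) , W2(B)")
edges = precedence_edges(S)
print(sorted(edges))         # [(1, 2), (2, 1), (3, 2)]
print(not has_cycle(edges))  # False: S is not conflict serializable
```

The edges match the three dependencies listed in Step-01. Since T1 → T2 and T2 → T1 form a cycle, the schedule S is not conflict serializable.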
• Concurrent access is quite easy if all users are just reading data, because
there is no way they can interfere with one another. Any practical
database, however, has a mix of READ and WRITE operations, and
hence concurrency is a challenge.
Concurrency Control Techniques
• Concurrency control protocols ensure atomicity, isolation, and
serializability of concurrent transactions. The concurrency control
protocol can be divided into three categories:
• Lock based protocol
• Time-stamp based protocol
• Deadlock handling
• Recovery
• Log based recovery
• Shadow paging
Lock-Based Protocol
• The wait-for graph shows the relationship between the resources and
transactions. If a transaction requests a resource or if it already holds
a resource, it is visible as an edge on the wait-for graph. If the wait-for
graph contains a cycle, then there may be a deadlock in the system;
otherwise there is none.
Deadlock Prevention
Recovery using Log records – After a system crash has occurred, the system consults the log to determine which transactions
need to be redone and which need to be undone.
•Transaction Ti needs to be undone if the log contains the record <Ti start> but does not contain either the record <Ti
commit> or the record <Ti abort>.
•Transaction Ti needs to be redone if log contains record <Ti start> and either the record <Ti commit> or the record <Ti
abort>.
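The two rules can be applied to a log with a short sketch. The log encoding as (record type, transaction) pairs, the function name classify and the sample log are all assumptions made for this illustration; a real log also carries the old and new data values needed to perform the undo and redo.

```python
def classify(log):
    """Split transactions found in the log into (redo, undo) lists.

    log is a list of (kind, txn) pairs, where kind is 'start',
    'commit' or 'abort'. Other record kinds are ignored here.
    """
    started = []
    finished = set()
    for kind, txn in log:
        if kind == "start":
            started.append(txn)
        elif kind in ("commit", "abort"):
            finished.add(txn)
    # Redo: <Ti start> together with <Ti commit> or <Ti abort>.
    redo = [t for t in started if t in finished]
    # Undo: <Ti start> with neither <Ti commit> nor <Ti abort>.
    undo = [t for t in started if t not in finished]
    return redo, undo


# Hypothetical log contents at the time of the crash.
log = [("start", "T1"), ("commit", "T1"),
       ("start", "T2"),
       ("start", "T3"), ("abort", "T3")]

print(classify(log))  # (['T1', 'T3'], ['T2'])
```

T2 was still active at the crash, so it is the only transaction that must be undone.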
• Step 1 − Page is a segment of memory. Page table is an index of pages. Each table entry points
to a page on the disk.
• Step 2 − Two page tables are used during the life of a transaction: the current page table and
the shadow page table. Shadow page table is a copy of the current page table.
• Step 3 − When a transaction starts, both the tables look identical, the current table is updated
for each write operation.
• Step 4 − The shadow page table is never changed during the life of the transaction.
• Step 5 − When the current transaction is committed, the shadow page entry becomes a copy of
the current page table entry. After transaction, both tables become identical.
• Step 6 − The shadow page table is stored in non-volatile memory. If the system crash occurs,
then the shadow page table is copied to the current page table.
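The six steps can be sketched with two dictionaries standing in for the page tables. This is a simplified model and not a real storage engine: the class name ShadowPagedStore and its methods are hypothetical, the disk is a Python dict, and the sketch assumes that the first write to a page goes to a freshly allocated disk page so that the page referenced by the shadow table stays intact.

```python
class ShadowPagedStore:
    """Toy model of shadow paging for a single transaction at a time."""

    def __init__(self, pages):
        self.disk = dict(enumerate(pages))       # disk page id -> contents
        # Steps 2-3: both tables start identical (logical page -> disk page).
        self.shadow = {i: i for i in self.disk}
        self.current = dict(self.shadow)
        self._next_free = len(self.disk)

    def read(self, page):
        return self.disk[self.current[page]]

    def write(self, page, data):
        # Assumed copy-on-write: leave the shadow page's contents untouched.
        if self.current[page] == self.shadow[page]:
            self.current[page] = self._next_free
            self._next_free += 1
        self.disk[self.current[page]] = data     # only the current table changes

    def commit(self):
        # Step 5: the shadow table becomes a copy of the current table.
        self.shadow = dict(self.current)

    def recover(self):
        # Step 6: after a crash, restore the current table from the shadow.
        self.current = dict(self.shadow)


store = ShadowPagedStore(["a", "b"])
store.write(0, "x")
store.recover()            # crash before commit
print(store.read(0))       # a: the uncommitted write is gone
store.write(1, "y")
store.commit()
store.recover()            # crash after commit
print(store.read(1))       # y: the committed write survives
```

The old disk pages left behind after a commit are never reclaimed in this sketch, which corresponds to the garbage collection problem listed among the disadvantages.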
Advantages & Disadvantages
of Shadow Paging
Advantages
• The advantages of shadow paging are as follows −
• No need for log records.
• No undo/ Redo algorithm.
• Recovery is faster.
Disadvantages
• The disadvantages of shadow paging are as follows −
• Data is fragmented or scattered.
• Garbage collection problem. Database pages containing old versions of modified
data need to be garbage collected after every transaction.
• Concurrent transactions are difficult to execute.