
TRANSACTION MANAGEMENT, RECOVERY AND QUERY PROCESSING
Learning Objectives
ž  A transaction represents a real‑world event such as the sale
of a product.
ž  A transaction must be a logical unit of work. That is, no
portion of a transaction stands by itself. For example, the
product sale has an effect on inventory and, if it is a credit
sale, it has an effect on customer balances.
ž  A transaction must take a database from one consistent
state to another. Therefore, all parts of a transaction must be
executed or the transaction must be aborted. (A consistent
state of the database is one in which all data integrity
constraints are satisfied.)
Course Content
ž  Introduction to Transaction Management
ž  ACID Properties
ž  Introduction to Concurrency Control
ž  Reasons of Transaction Failure, System
Recovery and Media Recovery
ž  Introduction to Query Processing
ž  Steps in Query Processing
Introduction
ž  A transaction is a logical unit of processing corresponding to a
series of elementary physical operations (reads/writes)
on the DB
ž  Examples:
—  Transfer of a sum between bank accounts
—  UPDATE CC SET balance=balance-50 WHERE account=123
—  UPDATE CC SET balance=balance+50 WHERE account=235
—  Updating wages of employees in a branch
—  UPDATE Emp
—  SET wage=1.1*wage
—  WHERE branch=‘S01’
Transaction Concept
ž  A transaction is a unit of program execution that accesses
and possibly updates various data items.
ž  A transaction must see a consistent database. During
transaction execution the database may be inconsistent.
ž  When the transaction is committed, the database must be
consistent
ž  Two main issues to deal with:
—  Failures of various kinds, such as hardware failures and
system crashes
—  Concurrent execution of multiple transactions
Transactions in DBMS
ž  Transactions are a set of operations used to perform a
logical set of work. A transaction usually means that the
data in the database has changed. One of the major uses
of DBMS is to protect the user’s data from system failures.
It is done by ensuring that all the data is restored to a
consistent state when the computer is restarted after a
crash. The transaction is any one execution of the user
program in a DBMS. Executing the same program multiple
times will generate multiple transactions.
ž  Example –
Transaction to be performed to withdraw cash from an
ATM vestibule.
ž  Set of Operations :
Consider the following example for transaction operations
as follows.
ž  Example -ATM transaction steps.
—  Transaction Start.
—  Insert your ATM card.
—  Select language for your transaction.
—  Select Savings Account option.
—  Enter the amount you want to withdraw.
—  Enter your secret pin.
—  Wait for some time for processing.
—  Collect your Cash.
—  Transaction Completed.
ž  Three operations can be performed in a transaction as follows.
—  Read/Access data (R).
—  Write/Change data (W).
—  Commit.
ž  Example –
Transfer of Rs.50 from Account A to Account B. Initially A = Rs.500,
B = Rs.800. This data is brought to RAM from the Hard Disk.
ž  The updated value of Account A = Rs.450 and Account B = Rs.850.
ž  All instructions before the commit are in a partially
committed state and are held in RAM. When the commit is
executed, the data is fully accepted and stored on the Hard Disk.
ž  If the transaction fails anywhere before the commit, we have to go
back and start from the beginning; we can't continue from
the same state. This is known as Rollback.
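The commit/rollback behaviour described above can be sketched in Python. This is a toy illustration, not a real DBMS API: the `transfer` helper, the account names, and the in-memory "RAM snapshot" are all invented for this example.

```python
# Illustrative sketch of atomic commit/rollback (hypothetical helper, not a DBMS API):
# either both account updates take effect, or the pre-transaction state is restored.

def transfer(accounts, src, dst, amount):
    """Move `amount` from src to dst; commit both writes or roll back to the snapshot."""
    snapshot = dict(accounts)          # state before the transaction ("RAM copy")
    try:
        accounts[src] -= amount        # write on the source account
        if accounts[src] < 0:
            raise ValueError("insufficient funds")
        accounts[dst] += amount        # write on the destination account
        return True                    # commit: changes become permanent
    except Exception:
        accounts.clear()
        accounts.update(snapshot)      # rollback: restore the consistent state
        return False

accounts = {"A": 500, "B": 800}
transfer(accounts, "A", "B", 50)       # succeeds: A = 450, B = 850
transfer(accounts, "A", "B", 1000)     # fails mid-way: rolled back, balances unchanged
```

The second call demonstrates the "go back and start from the beginning" rule: after the failure the balances are exactly as they were before the transaction started.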
Transaction failure in between the operations
ž  A transaction can fail before finishing all the
operations in the set. This can happen due to power failure,
system crash etc. This is a serious problem that can leave the
database in an inconsistent state. Assume the transaction
fails after the third operation; then the amount would be deducted
from your account but your friend will not receive it.
ž  To solve this problem, we have the following two operations
—  Commit: If all the operations in a transaction are
completed successfully then commit those changes to
the database permanently.
—  Rollback: If any of the operations fails, then roll back all the
changes done by previous operations.
Uses of Transaction Management
ž  The DBMS is used to schedule access to data
concurrently. It means that users can access multiple
data items from the database without interfering with each
other. Transactions are used to manage concurrency.
ž  It is also used to satisfy ACID properties.
ž  It is used to solve Read/Write Conflict.
ž  It is used to implement Recoverability and Serializability, and to
avoid Cascading rollbacks.
ž  Transaction Management is also used for Concurrency
Control Protocols and Locking of data.
Disadvantage of using a Transaction
ž  It may be difficult for end-users to change the information
within the transaction database.
ž  We need to always roll back and start from the
beginning rather than continue from the previous state.
ACID Properties in DBMS
Atomicity
ž  It states that all operations of the transaction take place at
once; if not, the transaction is aborted.
ž  There is no midway, i.e., the transaction cannot occur
partially. Each transaction is treated as one unit and either
runs to completion or is not executed at all.
ž  Atomicity involves the following two operations:
—  Abort: If a transaction aborts then all the changes made
are not visible.
—  Commit: If a transaction commits then all the changes
made are visible.
ž  Consider the following transaction T consisting of T1 and T2:
Transfer of Rs.100 from account X to account Y.
ž  If the transaction fails after completion of T1 but before
completion of T2 (say, after write(X) but before write(Y)),
then the amount has been deducted from X but not added to Y.
This results in an inconsistent database state. Therefore, the
transaction must be executed in its entirety in order to ensure
correctness of the database state.
Consistency
ž  The integrity constraints are maintained so that the database is
consistent before and after the transaction.
ž  The execution of a transaction will leave a database in either its prior
stable state or a new stable state.
ž  The consistent property of database states that every transaction
sees a consistent database instance.
ž  The transaction is used to transform the database from one
consistent state to another consistent state.
ž  For example: The total amount must be maintained before and after
the transaction.
ž  Therefore, the database is consistent. In the case when T1 is
completed but T2 fails, inconsistency will occur.
Isolation
ž  It shows that the data which is used at the time of execution
of a transaction cannot be used by the second transaction
until the first one is completed.
ž  In isolation, if the transaction T1 is being executed and using
the data item X, then that data item can't be accessed by any
other transaction T2 until the transaction T1 ends.
ž  The concurrency control subsystem of the DBMS enforces
the isolation property.
ž  Consider two transactions T and T''.
ž  Suppose T has been executed till Read (Y) and then T'' starts.
As a result, interleaving of operations takes place, due to which
T'' reads the correct value of X but an incorrect value of Y, and the
sum computed by
—  T'': (X+Y = 50,000 + 500 = 50,500)
—  is thus not consistent with the sum at the end of transaction
—  T: (X+Y = 50,000 + 450 = 50,450).
ž  This results in database inconsistency, due to a loss of 50 units.
Hence, transactions must take place in isolation and changes
should be visible only after they have been made to the main
memory.
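The interleaving above can be reproduced deterministically in Python. This is a hand-scheduled sketch (the variable names are invented; a real DBMS would interleave concurrent operations, not sequential statements): T deducts Rs.50 from Y while T'' sums X and Y in between T's read and write.

```python
# Sketch of the isolation violation: T'' reads Y before T writes its update back,
# so T'' computes a sum that never existed in any consistent state.
db = {"X": 50_000, "Y": 500}

# T starts: reads Y and computes Y - 50, but has not yet written it back
t_new_y = db["Y"] - 50

# T'' interleaves here and sums the current values: correct X, stale Y
sum_seen_by_t2 = db["X"] + db["Y"]     # 50,000 + 500 = 50,500

# T resumes and writes its update
db["Y"] = t_new_y
sum_after_t = db["X"] + db["Y"]        # 50,000 + 450 = 50,450 (consistent)

print(sum_seen_by_t2, sum_after_t)     # 50500 50450 -- T'' is off by 50 units
```

With isolation enforced, T'' would be blocked from reading Y until T finishes, and both sums would agree.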
Durability
ž  The durability property indicates the permanence of the
database's consistent state. It states that the changes made by a
committed transaction are permanent.
ž  They cannot be lost by the erroneous operation of a faulty
transaction or by the system failure. When a transaction is
completed, then the database reaches a state known as the
consistent state. That consistent state cannot be lost, even
in the event of a system's failure.
ž  The recovery subsystem of the DBMS is responsible for the
Durability property.
Transaction States
ž  Transactions can be implemented using SQL queries and the
server. In the below-given diagram, you can see how the
transaction states work.
Transaction Support
ž  Two possible outcomes:
—  Success – transaction commits and database
reaches a new consistent state
—  Failure – transaction aborts, and database is restored to the
consistent state before the transaction started
○  Referred to as a rolled back or undone transaction
ž  Committed transaction cannot be aborted
ž  Aborted transaction that is rolled back can be restarted
later
ž  Active state
—  The active state is the first state of every transaction. In this
state, the transaction is being executed.
—  For example: Insertion or deletion or updating a record is
done here. But all the records are still not saved to the
database.
ž  Partially committed
—  In the partially committed state, a transaction executes its
final operation, but the data is still not saved to the
database.
—  In the total mark calculation example, a final display of the
total marks step is executed in this state.
ž  Committed
—  A transaction is said to be in a committed state if it
executes all its operations successfully. In this state, all the
effects are now permanently saved on the database
system.
ž  Failed state
—  If any of the checks made by the database recovery system fails,
then the transaction is said to be in the failed state.
—  In the example of total mark calculation, if the database is not able
to fire a query to fetch the marks, then the transaction will fail to
execute.
ž  Aborted
—  If any of the checks fail and the transaction has reached a failed
state then the database recovery system will make sure that the
database is in its previous consistent state. If not then it will abort or
roll back the transaction to bring the database into a consistent
state.
—  If the transaction fails in the middle, then all the operations
executed so far are rolled back to restore the previous
consistent state.
—  After aborting the transaction, the database recovery module will
select one of the two operations:
○  Re-start the transaction
○  Kill the transaction
Schedule
ž  A series of operations from one transaction to another is known
as a schedule. It is used to preserve the order of the
operations in each individual transaction.
Serial Schedule
ž  The serial schedule is a type of schedule where one
transaction is executed completely before starting another
transaction. In the serial schedule, when the first
transaction completes its cycle, then the next transaction
is executed.
ž  For example: Suppose there are two transactions T1 and
T2 which have some operations. If it has no interleaving
of operations, then there are the following two possible
outcomes:
—  Execute all the operations of T1 followed by
all the operations of T2, or vice versa.
—  In the given figure (a), Schedule A shows the serial
schedule where T1 is followed by T2.
—  In the given figure (b), Schedule B shows the serial
schedule where T2 is followed by T1.
Non-serial Schedule
ž  If interleaving of operations is allowed, then there will be a
non-serial schedule.
ž  It contains many possible orders in which the system can
execute the individual operations of the transactions.
ž  In the given figures (c) and (d), Schedule C and Schedule D
are non-serial schedules. They have interleaving of
operations.
Serializable Schedule
ž  The serializability of schedules is used to find non-serial
schedules that allow the transactions to execute
concurrently without interfering with one another.
ž  It identifies which schedules are correct when executions
of the transactions have interleaving of their operations.
ž  A non-serial schedule will be serializable if its result is
equal to the result of its transactions executed serially.
ž  Here,
ž  Schedule A and Schedule B are serial
schedules.
ž  Schedule C and Schedule D are non-serial
schedules.
Serializable
ž  These are of two types:
—  Conflict Serializable:
—  A schedule is called conflict serializable if it can be
transformed into a serial schedule by swapping
non-conflicting operations.
—  View Serializable:
—  A schedule is called view serializable if it is view
equal to a serial schedule (no overlapping
transactions). Every conflict serializable schedule is
view serializable, but a view serializable schedule
that contains blind writes may not be conflict
serializable.
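Conflict serializability is usually tested by building a precedence graph and checking it for cycles. The following is a minimal sketch under an assumed schedule format of `(transaction, operation, item)` tuples; the function name and representation are invented for this illustration.

```python
# Sketch of a conflict-serializability test: add an edge Ti -> Tj whenever an
# operation of Ti conflicts with a later operation of Tj (same item, different
# transactions, at least one write), then look for a cycle in the graph.

def conflict_serializable(schedule):
    edges = set()
    for i, (ti, op_i, x) in enumerate(schedule):
        for tj, op_j, y in schedule[i + 1:]:
            if x == y and ti != tj and "W" in (op_i, op_j):
                edges.add((ti, tj))          # precedence edge Ti -> Tj

    nodes = {t for t, _, _ in schedule}

    def cyclic(n, seen):                     # depth-first search for a cycle
        if n in seen:
            return True
        return any(cyclic(b, seen | {n}) for a, b in edges if a == n)

    return not any(cyclic(n, frozenset()) for n in nodes)

# T1 fully precedes T2 on item A: equivalent to the serial order T1, T2
assert conflict_serializable([("T1", "R", "A"), ("T1", "W", "A"),
                              ("T2", "R", "A"), ("T2", "W", "A")])
# Conflicts in both directions between T1 and T2: cycle, not serializable
assert not conflict_serializable([("T1", "R", "A"), ("T2", "W", "A"),
                                  ("T1", "W", "A"), ("T2", "R", "A")])
```

An acyclic precedence graph means the schedule is equivalent to the serial order given by a topological sort of the graph.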
Non-Serializable
ž  The non-serializable schedule is divided into two types:
Recoverable and Non-recoverable Schedule.
ž  Recoverable Schedule: Schedules in which transactions commit
only after all transactions whose changes they read commit are
called recoverable schedules. In other words, if some transaction
Tj reads a value updated or written by some other transaction Ti,
then the commit of Tj must occur after the commit of Ti.
ž  Example – Consider the following schedule involving two
transactions T1 and T2.
This is a recoverable schedule, since T1 commits before T2, which
makes the value read by T2 correct.
Non-Recoverable Schedule
ž  Example: Consider the following schedule involving two
transactions T1 and T2.
ž  T2 read the value of A written by T1, and committed. T1 later
aborted; therefore the value read by T2 is wrong, but since T2
has already committed, this schedule is non-recoverable.
Concurrency Control
ž  When more than one transaction is running simultaneously,
there are chances of a conflict, which can leave the
database in an inconsistent state. To handle these conflicts we
need concurrency control in DBMS, which allows transactions to
run simultaneously but handles them in such a way that the
integrity of data remains intact.
○  Conflict Example
ž  You and your brother have a joint bank account, from which you
both can withdraw money. Now let's say you both go to different
branches of the same bank at the same time and try to withdraw
Rs.5000, while your joint account has only a Rs.6000 balance.
Without concurrency control in place, you both could get Rs.5000
at the same time, but once both transactions finish, the account
balance would be Rs.-4000, which is not possible and leaves the
database in an inconsistent state.
ž  We need something that controls the transactions in such a way
that they can run concurrently while maintaining the consistency
of data, to avoid such issues.
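The joint-account conflict above can be replayed deterministically. This sketch hand-schedules the bad interleaving (variable names are invented): both withdrawals read the balance before either writes, so both balance checks pass against the stale value.

```python
# Sketch of the joint-account race: two withdrawals of 5000 from a 6000 balance.
# Each transaction checks the balance it read *before* the other's update landed.

balance = 6000

read_by_you = balance          # your branch reads 6000
read_by_brother = balance      # brother's branch also reads 6000

if read_by_you >= 5000:        # check passes against the stale read
    balance -= 5000
if read_by_brother >= 5000:    # also passes: the other withdrawal isn't visible yet
    balance -= 5000

print(balance)   # -4000 -- the inconsistent state described above
```

With a lock on the account row, the second withdrawal would wait, re-read the balance as Rs.1000, and be refused.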
Concurrency Control
ž  Process of managing simultaneous
operations on the database without having
them interfere with one another
ž  Prevents interference when two or more
users access database simultaneously and at
least one updates data
ž  Interleaving of operations may produce
incorrect results
Need for Concurrency Control
ž  Potential concurrency problems:
—  Lost update problem
—  Uncommitted dependency problem
—  Inconsistent analysis problem
Lost Update Problem
ž  A successfully completed update is
overridden by another user.
ž  Example:
—  T1 withdrawing Rs.10 from an account with
balance balx, initially Rs.100
—  T2 depositing Rs.100 into the same account
—  Serially, the final balance would be Rs.190
Lost Update Problem
ž  Loss of T2's update is avoided by preventing T1 from reading balx
until after T2's update is complete.
Uncommitted Dependency Problem
ž  Occurs when one transaction can see intermediate
results of another transaction before it has
committed.
ž  Example:
—  T4 updates balx to Rs.200 but it aborts, so balx
should be back at original value of Rs.100
—  T3 has read new value of balx (Rs.200) and uses
value as basis of Rs.10 reduction, giving a new
balance of Rs.190, instead of Rs.90
ž  Problem avoided by preventing T3 from reading balx
until after T4 commits or aborts
Inconsistent Analysis Problem
ž  Occurs when a transaction reads several values but a second
transaction updates some of them during the execution of the
first.
ž  Aka dirty read or unrepeatable read
ž  Example:
ž  T6 is totaling balances of account x (Rs.100), account y
(Rs.50), and account z (Rs.25).
ž  Meanwhile, T5 has transferred Rs.10 from balx to balz,
so T6 now has the wrong result (Rs.10 too high).
ž  The problem is avoided by preventing T6 from reading balx and
balz until after T5 has completed its updates.
Concurrency Control
ž  Concurrency Control is a method used to ensure that database
transactions are executed in a safe manner (i.e. without data loss).
It is especially applicable to relational databases and DBMSs, which
must ensure that transactions are executed safely and that they
follow the ACID rules. The DBMS must be able to ensure that only
serializable, recoverable schedules are allowed.
ž  There are several categories of concurrency control protocols:
—  Lock Based Concurrency Control Protocol
—  Time Stamp Concurrency Control Protocol
—  Validation Based Concurrency Control Protocol
Lock Based Concurrency Control Protocol
ž  In this type of protocol, a transaction cannot read or write data
until it acquires an appropriate lock on it. There are two types of
lock:
—  1. Shared lock:
○  It is also known as a Read-only lock. Under a shared lock, the
data item can only be read by the transaction.
○  It can be shared between transactions, because while a
transaction holds a shared lock, it can't update the data item.
—  2. Exclusive lock:
○  Under an exclusive lock, the data item can be both read and
written by the transaction.
○  This lock is exclusive: multiple transactions cannot modify
the same data simultaneously.
ž  There are four types of lock protocols available:
ž  1. Simplistic lock protocol
—  It is the simplest way of locking data during a transaction. Simplistic
lock-based protocols require all transactions to get a lock on the data
before an insert, delete or update on it. The data item is unlocked
after the transaction completes.
ž  2. Pre-claiming Lock Protocol
—  Pre-claiming lock protocols evaluate the transaction to list all the data
items on which it needs locks.
—  Before initiating execution of the transaction, it requests the DBMS for
locks on all those data items.
—  If all the locks are granted, then this protocol allows the transaction to
begin. When the transaction is completed, it releases all the locks.
—  If all the locks are not granted, then the transaction rolls back
and waits until all the locks are granted.
ž  3. Two-phase locking (2PL)
—  The two-phase locking protocol divides the execution of
the transaction into three parts.
—  In the first part, when the execution of the transaction starts, it
seeks permission for the locks it requires.
—  In the second part, the transaction acquires all the locks. The
third phase starts as soon as the transaction releases its first
lock.
—  In the third phase, the transaction cannot demand any new locks.
It only releases the acquired locks.
ž  There are two phases of 2PL:
ž  Growing phase:
—  In the growing phase, a new lock on the data item may be
acquired by the transaction, but none can be released.
ž  Shrinking phase:
—  In the shrinking phase, existing locks held by the transaction may
be released, but no new locks can be acquired.
ž  In the below example, if lock conversion is allowed, then the
following conversions can happen:
—  Upgrading of a lock (from S(a) to X(a)) is allowed in the growing
phase.
—  Downgrading of a lock (from X(a) to S(a)) must be done in the
shrinking phase.
ž  Example:
The following way shows how unlocking and locking work with 2-PL.
Transaction T1:
Growing phase: from step 1-3
Shrinking phase: from step 5-7
Lock point: at 3
Transaction T2:
Growing phase: from step 2-6
Shrinking phase: from step 8-9
Lock point: at 6
ž  4. Strict Two-phase locking (Strict-2PL)
The first phase of Strict-2PL is the same as in 2PL: after acquiring
all the locks, the transaction continues to execute normally.
The only difference between 2PL and Strict-2PL is that Strict-2PL
does not release a lock immediately after using it.
Strict-2PL waits until the whole transaction commits, and then it
releases all the locks at once.
The Strict-2PL protocol does not have a gradual shrinking phase of
lock release.
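The Strict-2PL bookkeeping can be sketched as follows. This is a hypothetical, single-threaded illustration (the class name, lock-table representation, and error handling are invented); a real lock manager would block rather than raise, and would distinguish shared from exclusive locks.

```python
# Sketch of Strict-2PL: locks are only acquired during execution (growing phase)
# and are all released together at commit time -- there is no gradual shrinking.

class Strict2PLTransaction:
    def __init__(self, lock_table):
        self.lock_table = lock_table   # shared map: item -> owning transaction
        self.held = set()

    def lock(self, item):
        owner = self.lock_table.get(item)
        if owner not in (None, self):
            raise RuntimeError(f"{item} is locked by another transaction")
        self.lock_table[item] = self   # growing phase: acquire, never release early
        self.held.add(item)

    def commit(self):
        for item in self.held:         # Strict-2PL: release everything at once,
            del self.lock_table[item]  # only when the transaction commits
        self.held.clear()

locks = {}
t1 = Strict2PLTransaction(locks)
t1.lock("X")
t1.lock("Y")      # growing phase: both locks held until commit
t1.commit()       # all locks released together
```

Holding every lock until commit is what prevents other transactions from reading uncommitted values, which is why Strict-2PL avoids cascading rollbacks.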
Timestamp Ordering Protocol
ž  The Timestamp Ordering Protocol is used to order the transactions
based on their timestamps. The order of the transactions is simply
the ascending order of their creation times.
ž  The priority of the older transaction is higher, which is why it
executes first. To determine the timestamp of a transaction, this
protocol uses system time or a logical counter.
ž  The lock-based protocol is used to manage the order between
conflicting pairs among transactions at the execution time. But
Timestamp based protocols start working as soon as a transaction
is created.
ž  Let's assume there are two transactions T1 and T2. Suppose
transaction T1 entered the system at time 007 and transaction
T2 entered at time 009. T1 has the higher priority, so it executes
first, as it entered the system first.
ž  The timestamp ordering protocol also maintains the timestamps of
the last 'read' and 'write' operations on each data item.
ž  Basic Timestamp ordering protocol works as follows:
ž  1. Check the following conditions whenever a transaction Ti issues
a Read(X) operation:
—  If W_TS(X) > TS(Ti), then the operation is rejected.
—  If W_TS(X) <= TS(Ti), then the operation is executed.
—  The timestamps of the data item are updated.
ž  2. Check the following conditions whenever a transaction Ti issues
a Write(X) operation:
—  If TS(Ti) < R_TS(X), then the operation is rejected.
—  If TS(Ti) < W_TS(X), then the operation is rejected and Ti is
rolled back; otherwise the operation is executed.
○  Where,
○  TS(Ti) denotes the timestamp of the transaction Ti.
○  R_TS(X) denotes the Read timestamp of data item X.
○  W_TS(X) denotes the Write timestamp of data item X.
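The two checks above can be written out directly. This is a sketch of the decision logic only (function names and the dictionary representation of R_TS/W_TS are invented); actually rolling the transaction back is left out.

```python
# Sketch of basic timestamp-ordering checks. R_TS/W_TS hold the latest read and
# write timestamps per data item; a "reject" means the transaction is rolled back.

def read(ts, item, R_TS, W_TS):
    if W_TS.get(item, 0) > ts:                 # a younger transaction already wrote X
        return "reject"
    R_TS[item] = max(R_TS.get(item, 0), ts)    # record the read
    return "execute"

def write(ts, item, R_TS, W_TS):
    if ts < R_TS.get(item, 0) or ts < W_TS.get(item, 0):
        return "reject"                        # would invalidate a younger read/write
    W_TS[item] = ts                            # record the write
    return "execute"

R_TS, W_TS = {}, {}
assert read(7, "X", R_TS, W_TS) == "execute"    # T1 (TS = 7) reads X
assert write(9, "X", R_TS, W_TS) == "execute"   # T2 (TS = 9) writes X
assert read(7, "X", R_TS, W_TS) == "reject"     # T1 reads again: W_TS(X) = 9 > 7
```

The last call shows why the protocol never deadlocks: T1 is not made to wait for T2, it is simply rejected and restarted with a fresh timestamp.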
ž  Advantages and Disadvantages of the TO protocol:
ž  The TO protocol ensures serializability, since the precedence graph
has edges only from older to younger transactions and is therefore
acyclic.
ž  The TO protocol ensures freedom from deadlock, which means no
transaction ever waits.
ž  But the schedule may not be recoverable and may not even be
cascade-free.
Thomas Write Rule
ž  Thomas Write Rule provides the guarantee of serializability order
for the protocol. It improves the Basic Timestamp Ordering
Algorithm.
ž  The basic Thomas write rules are as follows:
ž  1. If TS(T) < R_TS(X), then transaction T is aborted and rolled
back, and the operation is rejected.
ž  2. If TS(T) < W_TS(X), then don't execute the W_item(X) operation
of the transaction and continue processing (the outdated write is
simply ignored).
ž  3. If neither condition 1 nor condition 2 occurs, then the WRITE
operation is executed by transaction T and W_TS(X) is set to TS(T).
ž  If we use the Thomas write rule, then some serializable schedules
can be permitted that are not conflict serializable, as illustrated by
the schedule in the given figure:
ž  Figure: A Serializable Schedule that is not Conflict Serializable
ž  In the above figure, T2's write occurs between T1's read and T1's
write of the same data item. This schedule is not conflict serializable.
ž  The Thomas write rule ensures that T2's write is never seen by any
transaction. If we delete the write operation in transaction T2, then
a conflict serializable schedule can be obtained, which is shown in
the figure below.
ž  Figure: A Conflict Serializable Schedule
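The Thomas write rule changes only the write check of basic TO: an outdated write is ignored instead of forcing a rollback. A minimal sketch of that decision (the function name and return labels are invented):

```python
# Sketch of the Thomas write rule. Compared with basic TO, the case
# TS(T) < W_TS(X) returns "ignore" (skip the obsolete write) instead of aborting.

def thomas_write(ts, item, R_TS, W_TS):
    if ts < R_TS.get(item, 0):
        return "abort"     # rule 1: a younger transaction already read X
    if ts < W_TS.get(item, 0):
        return "ignore"    # rule 2: obsolete write -- skip it, keep running
    W_TS[item] = ts        # rule 3: perform the write, record the timestamp
    return "execute"

R_TS, W_TS = {}, {"X": 9}                              # T2 (TS = 9) already wrote X
assert thomas_write(7, "X", R_TS, W_TS) == "ignore"    # T1's late write is skipped
assert thomas_write(10, "X", R_TS, W_TS) == "execute"  # T3's write proceeds
```

Skipping T1's write is safe because, in the equivalent serial order, T2's later write would have overwritten it anyway; this is exactly why the rule admits serializable schedules that are not conflict serializable.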
Validation Based Protocol
ž  The validation based protocol is also known as the optimistic
concurrency control technique. In the validation based protocol,
the transaction is executed in the following three phases:
—  Read phase: In this phase, the transaction T reads the values of
the various data items and stores them in temporary local
variables. It performs all its write operations on the temporary
variables, without updating the actual database.
—  Validation phase: In this phase, the temporary variable values
are validated against the actual data to check whether
serializability would be violated.
—  Write phase: If the transaction passes validation, the temporary
results are written to the database; otherwise the transaction is
rolled back.
ž  Here each phase has the following different timestamps:
ž  Start(Ti):
—  It contains the time when Ti started its execution.
ž  Validation (Ti):
—  It contains the time when Ti finishes its read phase and starts its
validation phase.
ž  Finish(Ti):
—  It contains the time when Ti finishes its write phase.
○  This protocol determines the timestamp for the transaction for
serialization using the timestamp of the validation phase, as it is
the phase which actually determines whether the transaction will
commit or roll back.
○  Hence TS(T) = Validation(T).
○  Serializability is determined during the validation process; it
can't be decided in advance.
○  While executing transactions, this ensures a greater degree of
concurrency and fewer conflicts.
○  Thus, fewer transactions need to be rolled back.
Multiple Granularity
ž  Granularity: It is the size of the data item allowed to be locked.
ž  Multiple Granularity:
—  It can be defined as hierarchically breaking up the database into
blocks which can be locked.
—  The multiple granularity protocol enhances concurrency and
reduces lock overhead.
—  It keeps track of what to lock and how to lock.
—  It makes it easy to decide whether to lock or unlock a data
item. This type of hierarchy can be represented graphically as a tree.
ž  For example: Consider a tree which has four levels of nodes.
—  The first level or higher level shows the entire database.
—  The second level represents a node of type area. The higher level
database consists of exactly these areas.
—  The area consists of children nodes which are known as files. No file
can be present in more than one area.
—  Finally, each file contains child nodes known as records. The file has
exactly those records that are its child nodes. No record is present
in more than one file.
ž  Hence, the levels of the tree starting from the top level are as follows:
—  Database
—  Area
—  File
—  Record
ž  In this example, the highest level shows the entire database; the
levels below are area, file, and record.
ž  There are three additional lock modes with multiple granularity:
ž  Intention Mode Lock
—  Intention-shared (IS): It contains explicit locking at a lower
level of the tree but only with shared locks.
—  Intention-Exclusive (IX): It contains explicit locking at a lower
level with exclusive or shared locks.
—  Shared & Intention-Exclusive (SIX): In this lock, the node is
locked in shared mode, and some node is locked in exclusive
mode by the same transaction.
—  Compatibility Matrix with Intention Lock Modes: The below
table describes the compatibility matrix for these lock modes:
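The compatibility matrix referred to above appears as a figure in the original slides; the matrix below is the standard one from the multiple-granularity locking literature, encoded as a lookup table (the table and function names are invented for this sketch).

```python
# The standard lock-mode compatibility matrix for multiple granularity.
# True means a new request in `requested` mode can coexist with an
# existing lock held in `held` mode on the same node.

MODES = ["IS", "IX", "S", "SIX", "X"]
COMPAT = {
    #        IS     IX     S      SIX    X
    "IS":  (True,  True,  True,  True,  False),
    "IX":  (True,  True,  False, False, False),
    "S":   (True,  False, True,  False, False),
    "SIX": (True,  False, False, False, False),
    "X":   (False, False, False, False, False),
}

def compatible(held, requested):
    return COMPAT[held][MODES.index(requested)]

assert compatible("IS", "IX")      # intention locks coexist freely
assert not compatible("S", "IX")   # a shared lock blocks intention-exclusive
assert not compatible("X", "IS")   # exclusive blocks everything
```

Note that only X is incompatible with IS, which is what lets a reader of one record coexist with writers elsewhere in the same file.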
ž  It uses the intention lock modes to ensure serializability. It requires that if
a transaction attempts to lock a node, then that node must follow these
protocols:
•  Transaction T1 should follow the lock-compatibility matrix.
•  Transaction T1 must first lock the root of the tree. It can lock it in any mode.
•  If T1 currently has the parent of the node locked in IX or IS mode,
then T1 may lock the node in S or IS mode only.
•  If T1 currently has the parent of the node locked in IX or SIX
mode, then T1 may lock the node in X, SIX, or IX mode only.
•  T1 can lock a node only if it has not previously unlocked any node
(two-phase locking is observed).
•  T1 can unlock a node only if none of that node's children are
currently locked by T1.
ž  Observe that in multiple-granularity, the locks are acquired in
top-down order, and locks must be released in bottom-up
order.
—  If transaction T1 reads record Ra9 in file Fa, then
transaction T1 needs to lock the database, area A1 and file
Fa in IS mode. Finally, it needs to lock Ra9 in S mode.
—  If transaction T2 modifies record Ra9 in file Fa, then it can
do so after locking the database, area A1 and file Fa in IX
mode. Finally, it needs to lock the Ra9 in X mode.
—  If transaction T3 reads all the records in file Fa, then
transaction T3 needs to lock the database and area A1 in
IS mode. At last, it needs to lock Fa in S mode.
—  If transaction T4 reads the entire database, then T4 needs
to lock the database in S mode.
Recovery with Concurrent Transactions
ž  Whenever more than one transaction is being executed,
their log records become interleaved. During recovery, it
would become difficult for the recovery system to
backtrack through all the logs and then start recovering.
ž  To ease this situation, the 'checkpoint' concept is used by
most DBMSs.
ž  As we have discussed checkpoints in the Transaction
Processing Concepts part of this tutorial, you can go
through those concepts again to make things clearer.
Reasons for Transaction Failure
ž  Data is manipulated by processes. Records can be altered, new
records added and old records deleted. A transaction is a
complete function undertaken by a set of processes.
ž  When a transaction is submitted for execution (for example, when
the save button is pressed) the system checks whether:
ž  all operations involved in the transaction are successfully
completed, or
ž  the transaction has had no effect on the database or any other
transaction.
ž  If neither of these conditions holds, the system will generate an
error message depending on the nature of the failure.
Types of Failure
ž  Transaction: Caused by errors within the transaction processes.
ž  System: Caused by failure of the network or operating system, or
physical threats to the system as a whole.
ž  Media: Failure of the hard disk, out-of-memory errors, or
out-of-disk-space errors.
ž  1. Transaction failure
ž  A transaction failure occurs when a transaction fails to execute or
reaches a point from which it can't go any further. If a transaction
or process is damaged partway through, this is called a transaction
failure.
ž  Reasons for a transaction failure could be -
—  Logical errors: If a transaction cannot complete due to some
code error or an internal error condition, then the logical error
occurs.
—  Syntax error: It occurs where the DBMS itself terminates an
active transaction because the database system is not able to
execute it. For example, The system aborts an active transaction,
in case of deadlock or resource unavailability.
ž  2. System Crash
—  System failure can occur due to power failure or other
hardware or software failure. Example: Operating system
error.
—  Fail-stop assumption: In the system crash, non-volatile
storage is assumed not to be corrupted.
ž  3. Disk Failure
—  It occurs when hard-disk drives or storage drives fail. Frequent
disk failures were a common problem in the early days of
technology evolution.
—  Disk failure occurs due to the formation of bad sectors, a disk
head crash, unreachability of the disk, or any other failure which
destroys all or part of disk storage.
Reasons for Failure
ž  Failure may be caused by a number of things. Transaction errors,
system errors, system crashes, concurrency problems and local
errors or exceptions are the more common causes of system
failure. The system must be able to recover from such failures
without loss of data.
System Recovery
What is Transaction Failure?
ž  When a transaction is submitted to a database system, it is the
responsibility of the database management system to execute all
the operations in the transaction.
ž  According to the atomicity property of a transaction, either all the
operations in a transaction are executed or none is. There won't
be a case where only half of the operations are executed; such a
case leads to a transaction failure.

Recovery System in DBMS from Transaction Failure
ž  In a database recovery management system, there are mainly two
recovery techniques that can help a DBMS recover while
maintaining the atomicity of a transaction. Those are as follows:
—  1. Log Based Recovery
—  2. Shadow Paging
Log Based Recovery in DBMS
ž  A log is a sequence of records that contains the history of all
updates made to the database. The log is the most commonly used
structure for recording database modifications. Sometimes the log
is also known as the system log.
ž  Update log has the following fields-
—  Transaction Identifier: To get the Transaction that is executing.
—  Data item Identifier: To get the data item of the Transaction
that is running.
—  The old value of the data item (Before the write operation).
—  The new value of the data item (After the write operation).
ž  the basic structure of the format of a log record.
—  <T, Start >. The Transaction has started.
—  <T, X, V1,V2>. The Transaction has performed write on data. V
is a value that X will have value before writing, and V2 is a
Value that X will have after the writing operation.
—  <T, Commit>. The Transaction has been committed.
—  <T, Abort>. The Transaction has aborted.
ž  Consider data items A and B, each with an initial value of 1000
(A = B = 1000). A transaction T that transfers 50 from A to B
would write the log records <T, Start>, <T, A, 1000, 950>,
<T, B, 1000, 1050>, and <T, Commit>, one record alongside each
operation of the transaction.
Shadow Paging Recovery Method
ž  Shadow paging is a commonly used method for database recovery in
DBMS. It requires fewer disk accesses than log-based methods.
ž  Here the database is partitioned into a number of fixed-length
blocks known as pages, and two page tables are maintained during
the life cycle of a transaction: 1) the current page table and
2) the shadow page table.
ž  Each page-table entry contains a pointer to a block on disk.
ž  When the transaction starts, both page tables are identical. During
the transaction, all changes are made through the current page
table, while the shadow page table remains as it was before the
transaction began: it preserves the old state of the database and
is never modified while the transaction runs.
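A minimal sketch of the two-table idea, assuming a toy in-memory page array and integer page numbers: updates go to fresh pages through the current table, the shadow table is untouched, and commit amounts to a single pointer swap.

```python
# Toy shadow paging. Pages and table layout are assumptions for illustration.
pages = ["p0-data", "p1-data", "p2-data"]   # fixed-length "disk" pages
shadow_table = list(range(len(pages)))      # entry i -> page number on disk
current_table = list(shadow_table)          # identical at transaction start

def txn_write(entry, new_data):
    new_page = len(pages)
    pages.append(new_data)                  # write the updated copy to a fresh page
    current_table[entry] = new_page         # only the current table is changed

txn_write(1, "p1-updated")

# Abort: discard current_table; shadow_table still points at the old pages.
# Commit: atomically install the current table as the new shadow table.
shadow_table = current_table

print(pages[shadow_table[1]])   # the committed, updated page
```

The original page (`p1-data`) is never overwritten in place, which is why no undo log is needed: an abort simply keeps the shadow table.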
CheckPoints Recovery Method in DBMS
ž  A checkpoint is another recovery technique used in database
recovery management in DBMS. In this technique, a checkpoint
operation is performed periodically that copies log information
from volatile storage onto stable storage. The information and
operations performed at each checkpoint consist of the following:
—  The start of the checkpoint, with the time and date of the
checkpoint, is written to the log on a stable storage device.
—  All log data in the buffers in main memory is copied to the
log on stable storage.
—  The database buffers in volatile storage are flushed to the
physical database on disk.
—  An end-of-checkpoint record is written, and the address of
the checkpoint record is saved in a file accessible to the
recovery routine on start-up after a system crash.
—  The frequency of checkpointing is a design consideration
of the recovery system. The options are:
ž  A fixed interval of time.
ž  Transaction-consistent checkpoint.
ž  Action-consistent checkpoint.
ž  Transaction-oriented checkpoint.
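The payoff of checkpointing can be shown with a toy log (record shapes assumed): since everything before the checkpoint has already been flushed to stable storage, recovery only has to scan the log suffix after the most recent checkpoint record.

```python
# Toy log with a checkpoint record; shapes are illustrative only.
log = [
    ("T1", "start"), ("T1", "A", 100, 90), ("T1", "commit"),
    ("checkpoint",),                 # buffers flushed to stable storage here
    ("T2", "start"), ("T2", "B", 50, 60),
]

# Find the most recent checkpoint; only records after it need processing.
last_cp = max(i for i, rec in enumerate(log) if rec[0] == "checkpoint")
to_process = log[last_cp + 1:]

print(to_process)   # only T2's records remain; T1's work needs no redo
```

Without the checkpoint, the recovery routine would have to scan and redo from the very beginning of the log.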
ž  Solution – Option B is the right answer.
ž  Explanation – The system failed before record 7 in
transaction T2, which means T2 never performed its commit
operation. When the recovery manager checks the log file to
recover the database, it will find both <T1, Start> and
<T1, Commit> in the log file.
ž  This means T1 committed successfully, so records 2 and 3 of
transaction T1 will be REDOne, and the new (updated) values
of B and M will be set in the database.
ž  For transaction T2, <T2, Start> is present but <T2, Commit>
is not, so to bring the database to a consistent state,
record 6 will be UNDOne: the value of B will not be changed
to 10500. The value of B set by transaction T1, i.e., 10000,
will be written to the database.
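The REDO/UNDO decision described above can be sketched as follows. The log format and the concrete values are illustrative; the rule is the one from the explanation: transactions with both `<Start>` and `<Commit>` in the log are redone, those with `<Start>` but no `<Commit>` are undone using the old values.

```python
# Toy recovery routine over an assumed log format.
def recover(log, db):
    committed = {rec[0] for rec in log if rec[-1] == "commit"}
    started = {rec[0] for rec in log if rec[-1] == "start"}
    # REDO committed transactions, scanning forward through the log.
    for rec in log:
        if len(rec) == 4 and rec[0] in committed:
            _, item, _, new = rec
            db[item] = new
    # UNDO uncommitted transactions, scanning backward through the log.
    for rec in reversed(log):
        if len(rec) == 4 and rec[0] in (started - committed):
            _, item, old, _ = rec
            db[item] = old
    return db

log = [
    ("T1", "start"), ("T1", "B", 9000, 10000), ("T1", "commit"),
    ("T2", "start"), ("T2", "B", 10000, 10500),   # crash before T2 commits
]
db = recover(log, {"B": 10500})
print(db)   # {'B': 10000}: T1's write is redone, T2's is undone
```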
Media Recovery
ž  Unlike crash and instance recovery, media recovery is executed
on your command. In media recovery, you use online and archived
redo logs and incremental backups to make a restored backup
current or to update it to a specific time. It is called media recovery
because you usually perform it in response to media failure.
ž  Media recovery uses redo records or incremental backups to
recover restored data files either to the present or to a specified
non-current time. When performing media recovery, you can
recover the whole database, a table space, or a data file. In any
case, you always use a restored backup to perform the recovery.
The principal division in media recovery is between
complete recovery and incomplete recovery.
Complete Recovery
ž  Complete recovery involves using redo data or incremental
backups combined with a backup of a database, table space, or
data file to update it to the most current point in time. It is called
complete because Oracle applies all of the redo changes to the
backup. Typically, you perform media recovery after a media failure
damages data files or the control file.

ž  Requirements for Complete Recovery
ž  You can perform complete recovery on a database, tablespace, or
datafile. If you are performing complete recovery on the whole
database, then you must:
—  Mount the database.
—  Ensure that all data files you want to recover are online.
—  Restore a backup of the whole database or the files you want to
recover.
—  Apply online or archived redo logs, or a combination of the two.
ž  If you are performing complete recovery on a tablespace or
datafile, then you must:
—  Take the tablespace or datafile to be recovered offline if the
database is open.
—  Restore a backup of the datafiles you want to recover.
—  Apply online or archived redo logs, or a combination of the
two.
Incomplete Recovery
ž  Incomplete recovery uses a backup to produce a non-current version
of the database. In other words, you do not apply all of the redo data
generated since the most recent backup. You usually perform
incomplete recovery when:
—  Media failure destroys some or all of the online redo logs.
—  A user error causes data loss, e.g., a user inadvertently drops a
table.
—  You cannot perform complete recovery because an archived redo
log is missing.
—  You lose your current control file and must use a backup control file
to open the database.
ž  To perform incomplete media recovery, you must restore all datafiles
from backups created prior to the time to which you want to recover
and then open the database with the RESETLOGS option when
recovery completes. The RESETLOGS operation creates a new
incarnation of the database. All archived redo logs generated after the
point of the RESETLOGS on the old incarnation are invalid on the
new incarnation.
Query Processing in DBMS
ž  Query processing is the translation of high-level queries into
low-level expressions.
ž  It is a stepwise process that spans the physical level of the
file system, query optimization, and actual execution of the
query to get the result.
ž  It requires the basic concepts of relational algebra and file
structure.
ž  It refers to the range of activities that are involved in
extracting data from the database.
ž  It includes translation of queries in high-level database
languages into expressions that can be implemented at the
physical level of the file system.
ž  In query processing, we will actually understand how these
queries are processed and how they are optimized.
In the above diagram,
ž  The first step is to transform the query into a standard form.
The SQL query is translated into a relational algebra
expression. During this process, the parser checks the syntax
and verifies the relations and the attributes used in the
query.
ž  The second step is the query optimizer. It transforms the
query into equivalent expressions that are more efficient to
execute.
ž  The third step is query evaluation. It executes the query
execution plan produced above and returns the result.
ž  Translating SQL Queries into Relational Algebra
ž  Example
—  SELECT Ename FROM Employee
WHERE Salary > 5000;
ž  Translated into a Relational Algebra Expression
π Ename (σ Salary > 5000 (Employee))
(Note that the reverse order, σ Salary > 5000 (π Ename (Employee)),
would be invalid, because Salary is projected away before the
selection could be applied.)
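The selection (σ) and projection (π) operators can be illustrated over a small in-memory relation; the table contents below are made up for the example.

```python
# Tiny relational algebra over a list-of-dicts relation (illustrative data).
employee = [
    {"Ename": "Asha", "Salary": 7000},
    {"Ename": "Ravi", "Salary": 4000},
]

def select(pred, rel):     # σ_pred(rel): keep rows satisfying the predicate
    return [row for row in rel if pred(row)]

def project(attrs, rel):   # π_attrs(rel): keep only the listed attributes
    return [{a: row[a] for a in attrs} for row in rel]

result = project(["Ename"], select(lambda r: r["Salary"] > 5000, employee))
print(result)   # [{'Ename': 'Asha'}]
```

Applying `project` first would drop the Salary column, which is exactly why the σ-after-π ordering noted above is invalid.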

•  A sequence of primitive operations that can be used to evaluate
a query is a Query Execution Plan or Query Evaluation Plan.

•  The above diagram indicates that the query execution engine
takes a query execution plan and returns the answers to the
query.

•  A good query execution plan minimizes the cost of query
evaluation.
ž  Block Diagram of Query Processing is as:
ž  Detailed Diagram is drawn as:
Basic Steps in Query Processing
ž  Parsing and translation
ž  Optimization
ž  Evaluation
1. Parsing and translation
ž  Translate the query into its internal form. This is then translated
into relational algebra.
ž  The parser checks syntax and verifies relations.
ž  Step-1:
Parser: During the parse call, the database performs the following
checks – syntax check, semantic check, and shared pool check – after
converting the query into relational algebra. The parser performs the
following checks (refer to the detailed diagram):
—  Syntax check – checks SQL syntactic validity. Example:
SELECT * FORM employee
Here the misspelling of FROM is caught by this check.
—  Semantic check – determines whether the statement is meaningful.
For example, a query referring to a table that does not exist is
caught by this check.
—  Shared pool check – every query is given a hash code for its
execution. This check determines whether that hash code already
exists in the shared pool; if it does, the database does not take
the additional optimization and execution steps.
ž  Hard Parse and Soft Parse –
If a query is fresh and its hash code does not exist in the shared
pool, it has to pass through additional steps known as hard parsing;
if the hash code exists, the query skips these additional steps and
passes directly to the execution engine (refer to the detailed
diagram). This is known as soft parsing.
Hard parsing includes the following steps – optimization and row
source generation.
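A toy model of the shared-pool check, assuming a dictionary keyed by the hash of the SQL text (real DBMS internals are considerably more involved): the first call pays for a hard parse, the repeat call gets a soft parse.

```python
# Toy shared pool: hash of SQL text -> previously built execution plan.
import hashlib

shared_pool = {}

def parse(sql):
    key = hashlib.sha256(sql.encode()).hexdigest()
    if key in shared_pool:                 # hash found: skip extra steps
        return shared_pool[key], "soft parse"
    plan = f"plan for: {sql}"              # stand-in for optimizer + row source gen
    shared_pool[key] = plan
    return plan, "hard parse"

_, kind1 = parse("SELECT * FROM employee")
_, kind2 = parse("SELECT * FROM employee")
print(kind1, kind2)   # hard parse soft parse
```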
2. Optimization
ž  SQL is a very high-level language:
—  Users specify what to search for, not how the search is
actually done.
—  The algorithms are chosen automatically by the DBMS.
ž  For a given SQL query there may be many possible execution
plans.
ž  Among all equivalent plans, the one with the lowest cost is chosen.
ž  Cost is estimated using statistical information from the database
catalog.
ž  Step-2:
Optimizer:
ž  During the optimization stage, the database must perform a hard
parse at least once for each unique DML statement and perform
optimization during this parse. The database never optimizes DDL
unless it includes a DML component, such as a subquery, that
requires optimization. Optimization is a process in which multiple
query execution plans for satisfying a query are examined and the
most efficient query plan is selected for execution.
The database catalog supplies the statistics used to cost the
candidate plans, and the optimizer passes the lowest-cost plan on
for execution.
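Choosing among equivalent plans then reduces to taking the minimum over the estimated costs. A sketch with made-up plan names and cost numbers:

```python
# Illustrative cost-based plan choice; plans and costs are invented.
plans = {
    "full table scan": 1200.0,
    "index range scan": 85.0,
    "index fast full scan": 300.0,
}

best = min(plans, key=plans.get)   # lowest estimated cost wins
print(best)   # index range scan
```

In a real optimizer the cost figures come from catalog statistics (row counts, selectivities, index depths), not fixed constants.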

ž  Row Source Generation –
The row source generator is software that receives the optimal
execution plan from the optimizer and produces an iterative
execution plan usable by the rest of the database. The iterative
plan is a binary program that, when executed by the SQL engine,
produces the result set.
3. Evaluation
ž  The query evaluation engine takes a query evaluation plan,
executes that plan, and returns the answer to the query.
ž  Step-3:
Execution Engine: Finally, it runs the query and displays the
required result.
