
UNIT-5

Introduction to Concurrency Control in DBMS

Concurrency control in DBMS is an important concept related to transactions and data consistency in database management systems. Concurrency control refers to the process of managing independent, simultaneous database operations, each treated as a transaction in DBMS. It works on the principle of maintaining the state of each transaction, which can be either complete or incomplete. In a complete transaction, all the associated database operations finish according to the specified rules and sequence, whereas a transaction is incomplete when not all of its database operations finish, due to a technical failure, power outage, or network connectivity issue.

How does Concurrency Control work in DBMS?


Concurrency control is the process of maintaining data consistency when multiple users or processes access the same data elements and perform database operations on them. Many enterprise systems, such as banking, ticket booking, and traffic control systems, use a shared database as the data store for concurrent transactions. These transactions can conflict with one another, resulting in data inconsistency.

We will discuss the protocols and the problems related to concurrency

control in DBMS.

Concurrency Control Protocols


Concurrency control protocols are the techniques used to maintain data consistency, atomicity, and serializability. The following are some of the concurrency control protocols:

1. Lock based
The lock-based protocol is the technique of applying a lock to a data item, which prevents other transactions from reading or writing that item while the lock is held. There are two main types of lock: the shared (read-only) lock and the exclusive lock.

2. Validation based
The validation-based protocol is also known as the optimistic concurrency control technique. It uses a read phase, a validation phase, and a write phase for concurrency control.

3. Timestamp based
The timestamp-based protocol uses the system time or a logical counter as a timestamp to serialize the execution of concurrent transactions, which maintains the order of the transactions.

4. Two-phase protocol
The two-phase locking protocol (2PL) is a locking mechanism that ensures serializability by using two distinct locking phases: an expanding (growing) phase in which locks are acquired and a shrinking phase in which locks are released.

Concurrency Control Problems


There are multiple problems that can arise in concurrent transaction

scenarios. Some of the common problems are:

1. Dirty Read
The dirty read (or temporary update) problem happens when there is an incomplete transaction. In this scenario, a data item is updated by one transaction that then fails before completing, and another transaction reads the data item before it is rolled back to its last committed value.

Transaction T1              Transaction T2
Read(X)
X = X - n
Write(X)
                            Read(X)      (reads the dirty value X - n)
                            X = X + n1
                            Write(X)
operation failed
(T1 rolls back)

Explanation: As shown in the table, transaction T1 reads data item X with a Read(X) operation, computes X - n using a numeric value n, and writes the result back with a Write(X) operation. Before T1 completes, it fails, and its update has not yet been rolled back in the database. Transaction T2 then reads X with a Read(X) operation and sees the uncommitted value X - n. This results in a dirty read problem.

2. Unrepeatable Read
An unrepeatable read is the scenario in which two or more read operations of the same transaction return different values for the same variable, because the value was modified in between by a write operation of a different transaction.

Transaction T1              Transaction T2
Read(X)
                            Read(X)      (captures the initial value of X)
X = X - n
Write(X)
                            Read(X)      (sees a different value of X)

Explanation: The table shows two transactions, T1 and T2. T1 reads the variable X and performs the arithmetic operation X - n with a numeric value n. At the same time, T2 reads X and captures its initial value. Next, T1 performs a Write(X) operation and modifies the value of X in the database. T2 then reads X again and this time finds a different value, due to T1's update. This results in an unrepeatable read problem.

3. Phantom Read
The phantom read problem refers to the scenario in which a transaction reads a variable once, and when it tries to read the variable again it gets an error showing that the variable does not exist, because the variable was deleted by another transaction.

Transaction T1              Transaction T2
Read(X)
                            Read(X)
Delete(X)
                            Read(X)      (X no longer exists)

Explanation: The table shows T1 reading variable X while T2 simultaneously reads X. T1 then deletes X with a Delete(X) operation, without T2's knowledge. When T2 tries to read X again, it is unable to find the variable. This results in the phantom read problem.

4. Lost updates
A lost update is the concurrency problem scenario in which a modification made to a variable by one transaction is lost because of a write operation performed by another transaction.

Transaction T1              Transaction T2
Read(X)
                            Read(X)
X = X + n
Write(X)
                            X = X + n1
                            Write(X)     (T1's update is overwritten)

Explanation: The table shows T1 reading the variable X and modifying its value by adding the numeric value n in the statement X = X + n. However, T2 then executes X = X + n1 and writes the result, overwriting T1's arithmetic operation. This results in a lost update problem for transaction T1.
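To make the race concrete, here is a minimal sketch in Python (not DBMS code; a module-level variable stands in for the database item X, and the sleep call only widens the race window for demonstration):

import threading
import time

X = 100  # shared "database" item

def transaction(delta):
    global X
    local = X              # Read(X)
    time.sleep(0.01)       # widen the race window for demonstration
    X = local + delta      # Write(X): may overwrite the other update

t1 = threading.Thread(target=transaction, args=(5,))   # T1: X = X + n
t2 = threading.Thread(target=transaction, args=(7,))   # T2: X = X + n1
t1.start(); t2.start()
t1.join(); t2.join()

print(X)  # 112 if run serially; usually 105 or 107 because one update is lost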

5. Incorrect Summary
The incorrect summary problem appears in a concurrency control scenario when one transaction applies an aggregate function over a set of variables while another transaction is updating one of those variables.


Transaction T1              Transaction T2
Sum = 0
Read(X)
                            Read(X)
                            X = X + n
                            Write(X)
Sum = Sum + X

Explanation: The table shows transaction T1 reading the variable X and using its value to compute the aggregate Sum = Sum + X, while T2 reads X, modifies it with the statement X = X + n, and writes it to the database with a Write(X) operation. Because the aggregate is computed while X is being changed, the result is an incorrect summary problem in T1.

Conclusion
Concurrency control in DBMS is a very useful technique for keeping transactions on database operations mutually exclusive. It manages requests and streamlines operations when multiple systems or processes try to access the same database resource. It helps maintain data integrity across systems and avoids transaction conflicts.
Lock-Based Protocol

In this type of protocol, a transaction cannot read or write data until it acquires an appropriate lock on it. There are two types of lock:

1. Shared lock:

o It is also known as a read-only lock. Under a shared lock, the data item can only be read by the transaction.
o It can be shared between transactions because a transaction holding a shared lock cannot update the data item.

2. Exclusive lock:

o Under an exclusive lock, the data item can be both read and written by the transaction.
o Because this lock is exclusive, multiple transactions cannot modify the same data item simultaneously.
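As a rough sketch of how these two lock modes interact, the following Python fragment (illustrative names, not a real DBMS API) grants a request only when the compatibility rules above allow it:

class LockManager:
    def __init__(self):
        self.locks = {}  # item -> (mode, set of transactions holding it)

    def acquire(self, txn, item, mode):
        """mode is 'S' (shared) or 'X' (exclusive); returns True if granted."""
        held = self.locks.get(item)
        if held is None:
            self.locks[item] = (mode, {txn})
            return True
        held_mode, holders = held
        if mode == "S" and held_mode == "S":
            holders.add(txn)   # shared locks can coexist
            return True
        return False           # any combination involving X must wait

    def release(self, txn, item):
        mode, holders = self.locks.get(item, (None, set()))
        holders.discard(txn)
        if not holders and item in self.locks:
            del self.locks[item]

lm = LockManager()
print(lm.acquire("T1", "A", "S"))   # True: shared lock granted
print(lm.acquire("T2", "A", "S"))   # True: shared locks coexist
print(lm.acquire("T3", "A", "X"))   # False: exclusive conflicts with shared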

There are four types of lock protocols available:

1. Simplistic lock protocol

It is the simplest way of locking data during a transaction. Simplistic lock-based protocols require every transaction to obtain a lock on the data before performing an insert, delete, or update on it, and to unlock the data item after the transaction completes.
2. Pre-claiming Lock Protocol

o Pre-claiming lock protocols evaluate the transaction to list all the data items on which it needs locks.
o Before initiating execution of the transaction, it requests the DBMS for locks on all of those data items.
o If all the locks are granted, this protocol allows the transaction to begin. When the transaction completes, it releases all the locks.
o If any lock is not granted, the transaction rolls back and waits until all the locks are granted. A sketch of this idea follows below.
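A minimal sketch of pre-claiming in Python (a plain dict stands in for the lock table; all names are illustrative): every declared lock is requested before execution, and the transaction begins only if all are granted:

def preclaim(lock_table, txn, needed_items):
    """Grant txn all its declared locks up front, or none at all."""
    if any(lock_table.get(item, txn) != txn for item in needed_items):
        return False                 # some item locked by another txn: wait
    for item in needed_items:
        lock_table[item] = txn       # acquire everything before starting
    return True

lock_table = {}
if preclaim(lock_table, "T1", ["A", "B"]):
    # ... execute T1's operations ...
    for item in ["A", "B"]:          # release all locks on completion
        del lock_table[item]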

3. Two-phase locking (2PL)

o The two-phase locking protocol divides the execution of the transaction into three parts.
o In the first part, when the execution of the transaction starts, it seeks permission for the locks it requires.
o In the second part, the transaction acquires all the locks. The third part starts as soon as the transaction releases its first lock.
o In the third part, the transaction cannot demand any new locks. It only releases the acquired locks.

There are two phases of 2PL:

Growing phase: In the growing phase, new locks on data items may be acquired by the transaction, but none can be released.

Shrinking phase: In the shrinking phase, existing locks held by the transaction may be released, but no new locks can be acquired.

In the example below, if lock conversion is allowed, the following rules apply:
1. Upgrading a lock (from S(a) to X(a)) is allowed only in the growing phase.
2. Downgrading a lock (from X(a) to S(a)) must be done in the shrinking phase.

Example:
The following shows how unlocking and locking work with 2PL.

[Figure: interleaved schedule of lock and unlock operations for transactions T1 and T2, steps 1-9]

Transaction T1:

o Growing phase: from step 1-3


o Shrinking phase: from step 5-7
o Lock point: at 3

Transaction T2:

o Growing phase: from step 2-6


o Shrinking phase: from step 8-9
o Lock point: at 6
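A small sketch of how the two-phase rule could be enforced per transaction (illustrative Python; a real lock manager would also block conflicting requests rather than raise an error):

class TwoPhaseTxn:
    def __init__(self, txn_id):
        self.txn_id = txn_id
        self.held = set()
        self.shrinking = False       # flips to True once shrinking begins

    def lock(self, item):
        if self.shrinking:
            raise RuntimeError("2PL violation: lock after first unlock")
        self.held.add(item)          # growing phase: acquire only

    def unlock(self, item):
        self.shrinking = True        # growing phase over; shrinking begins
        self.held.discard(item)      # shrinking phase: release only

t1 = TwoPhaseTxn("T1")
t1.lock("A"); t1.lock("B")           # growing phase
t1.unlock("A")                       # shrinking phase begins
try:
    t1.lock("C")                     # not allowed under 2PL
except RuntimeError as e:
    print(e)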

4. Strict Two-phase locking (Strict-2PL)

o The first phase of Strict-2PL is the same as in 2PL: after acquiring its locks, the transaction continues to execute normally.
o The difference between 2PL and Strict-2PL is that Strict-2PL does not release a lock immediately after using it.
o Strict-2PL waits until the whole transaction commits and then releases all the locks at once.
o Strict-2PL therefore has no gradual shrinking phase of lock release.
o Unlike basic 2PL, it does not suffer from cascading aborts.

Timestamp based Concurrency Control

Concurrency control can be implemented in different ways. One way is to use locks. Now, let us discuss the Timestamp Ordering Protocol.
As introduced earlier, a timestamp is a unique identifier created by the DBMS to identify a transaction. Timestamps are usually assigned in the order in which transactions are submitted to the system. We refer to the timestamp of a transaction T as TS(T).
Timestamp Ordering Protocol –
The main idea of this protocol is to order the transactions based on their timestamps. A schedule in which the transactions participate is then serializable, and the only equivalent serial schedule permitted has the transactions in the order of their timestamp values. Simply stated, the schedule is equivalent to the particular serial order corresponding to the order of the transaction timestamps. The algorithm must ensure that, for each item accessed by conflicting operations in the schedule, the order in which the item is accessed does not violate this ordering. To ensure this, two timestamp values are kept for each database item X:
 W_TS(X) is the largest timestamp of any transaction that executed write(X) successfully.
 R_TS(X) is the largest timestamp of any transaction that executed read(X) successfully.
Basic Timestamp Ordering –
Every transaction is issued a timestamp based on when it enters the system. Suppose an old transaction Ti has timestamp TS(Ti); a new transaction Tj is assigned timestamp TS(Tj) such that TS(Ti) < TS(Tj). The protocol manages concurrent execution such that the timestamps determine the serializability order. The timestamp ordering protocol ensures that any conflicting read and write operations are executed in timestamp order. Whenever a transaction T tries to issue an R_item(X) or a W_item(X), the Basic TO algorithm compares the timestamp of T with R_TS(X) and W_TS(X) to ensure that the timestamp order is not violated. The Basic TO protocol works in the following two cases.
1. Whenever a transaction T issues a W_item(X) operation, check the following conditions:
 If R_TS(X) > TS(T) or W_TS(X) > TS(T), then abort and roll back T and reject the operation;
 else, execute the W_item(X) operation of T and set W_TS(X) to TS(T).
2. Whenever a transaction T issues an R_item(X) operation, check the following conditions:
 If W_TS(X) > TS(T), then abort and roll back T and reject the operation;
 else (W_TS(X) <= TS(T)), execute the R_item(X) operation of T and set R_TS(X) to the larger of TS(T) and the current R_TS(X).
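The two cases above can be sketched directly in Python (illustrative structure; in a real system an aborted transaction would be restarted with a new timestamp):

class BasicTO:
    def __init__(self):
        self.r_ts = {}   # item -> R_TS(X), largest timestamp that read X
        self.w_ts = {}   # item -> W_TS(X), largest timestamp that wrote X

    def write_item(self, ts, item):
        # Case 1: reject if a younger transaction already read or wrote X.
        if self.r_ts.get(item, 0) > ts or self.w_ts.get(item, 0) > ts:
            return "abort"                 # roll back T, reject W_item(X)
        self.w_ts[item] = ts
        return "ok"

    def read_item(self, ts, item):
        # Case 2: reject if a younger transaction already wrote X.
        if self.w_ts.get(item, 0) > ts:
            return "abort"                 # roll back T, reject R_item(X)
        self.r_ts[item] = max(self.r_ts.get(item, 0), ts)
        return "ok"

to = BasicTO()
print(to.read_item(10, "X"))   # 'ok'; R_TS(X) becomes 10
print(to.write_item(5, "X"))   # 'abort': a younger transaction (TS 10) read X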

Whenever the Basic TO algorithm detects two conflicting operations occurring in the wrong order, it rejects the later of the two by aborting the transaction that issued it. Schedules produced by Basic TO are guaranteed to be conflict serializable. As noted above, using timestamps also ensures that the schedule will be deadlock-free.
One drawback of the Basic TO protocol is that cascading rollback is still possible. Suppose transaction T2 has used a value written by transaction T1. If T1 is aborted and resubmitted to the system, then T2 must also be aborted and rolled back. So the problem of cascading aborts remains.
The advantages and disadvantages of the Basic TO protocol are:

 Timestamp Ordering protocol ensures serializability, since the precedence graph contains edges only from older transactions to newer ones.

[Figure: precedence graph for TS ordering]


 Timestamp protocol ensures freedom from deadlock as
no transaction ever waits.
 But the schedule may not be cascade free, and may not
even be recoverable.
Strict Timestamp Ordering –
A variation of Basic TO, called Strict TO, ensures that schedules are both strict and conflict serializable. In this variation, a transaction T that issues an R_item(X) or W_item(X) such that TS(T) > W_TS(X) has its read or write operation delayed until the transaction T' that wrote the value of X has committed or aborted.

Validation Based Protocol in DBMS

The validation-based protocol is also called the optimistic concurrency control technique. This protocol is used in DBMS (Database Management System) for avoiding concurrency conflicts between transactions. It is called optimistic because of the assumption it makes: very little interference occurs, so there is no need for checking while the transaction executes.

In this technique, no checking is done while the transaction is being executed. Until the end of the transaction is reached, updates are not applied directly to the database; all updates are applied to local copies of data items kept for the transaction. At the end of transaction execution, a validation phase checks whether any of the transaction's updates violate serializability. If there is no violation, the transaction is committed and the database is updated; otherwise, the transaction is aborted and then restarted.
Optimistic Concurrency Control is a three-phase protocol. The
three phases for validation based protocol:

1. Read Phase:
Values of committed data items from the database can
be read by a transaction. Updates are only applied to
local data versions.

2. Validation Phase:
Checking is performed to make sure that there is no
violation of serializability when the transaction updates
are applied to the database.

3. Write Phase:
On success of the validation phase, the transaction updates are applied to the database; otherwise, the updates are discarded and the transaction is restarted.

The idea behind optimistic concurrency control is to do all the checks at once; transaction execution therefore proceeds with minimal overhead until the validation phase is reached. If there is not much interference among transactions, most of them will validate successfully; otherwise, their results are discarded and the transactions are restarted later. Workloads with heavy interference are unfavourable for this optimistic technique, since its assumption of little interference is not satisfied.
The validation-based protocol is useful when conflicts are rare. Since only local copies of data are involved in rollbacks, cascading rollbacks are avoided. The method is less suitable for long transactions, because they are more likely to conflict and may be repeatedly rolled back due to conflicts with short transactions.
In order to perform the validation test, each transaction goes through the phases described above. We also need the following three timestamps assigned to a transaction Ti to check its validity:
1. Start(Ti): the time when Ti started its execution.
2. Validation(Ti): the time when Ti finished its read phase and began its validation phase.
3. Finish(Ti): the time when Ti finished all its write operations on the database in the write phase.
Two more terms that we need to know are:
1. Write_set: the set of data items that Ti writes.
2. Read_set: the set of data items that Ti reads.
In the validation phase for transaction Ti, the protocol checks that Ti does not conflict with any other transaction currently in its validation phase or already committed. For Ti to pass validation, one of the following conditions must hold for every such transaction Tj:
1. Finish(Tj) < Start(Ti): Tj finishes its execution, i.e. completes its write phase, before Ti starts its execution (read phase), so serializability is maintained.
2. Ti begins its write phase after Tj completes its write phase, and the read_set of Ti is disjoint from the write_set of Tj.
3. Tj completes its read phase before Ti completes its read phase, and both the read_set and write_set of Ti are disjoint from the write_set of Tj.
A code sketch of this test follows below.
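A sketch of this validation test in Python (the dict fields, and the use of Validation(T) to approximate the end of the read phase and start of the write phase, are illustrative simplifications):

def validate(ti, others):
    """ti, tj: dicts with 'start', 'validation', 'finish' timestamps and
    'read_set'/'write_set' as sets of item names."""
    for tj in others:
        # Condition 1: Tj finished before Ti started.
        if tj["finish"] < ti["start"]:
            continue
        # Condition 2: Tj's write phase ended before Ti's write phase
        # begins, and Ti reads nothing that Tj wrote.
        if (tj["finish"] < ti["validation"]
                and not (ti["read_set"] & tj["write_set"])):
            continue
        # Condition 3: Tj's read phase ended before Ti's read phase ended,
        # and Ti's read and write sets are disjoint from Tj's write set.
        if (tj["validation"] < ti["validation"]
                and not (ti["read_set"] & tj["write_set"])
                and not (ti["write_set"] & tj["write_set"])):
            continue
        return False     # no condition holds: Ti fails validation
    return True          # Ti may enter its write phase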
Ex: Here two transactions Ti and Tj are given. Since TS(Tj) < TS(Ti), the validation phase succeeds in Schedule-A. Note that the final write operations to the database are performed only after both Ti and Tj have been validated; Ti therefore reads the old values of x (12) and y (15) in its print(x+y) operation, because the final writes have not yet taken place.
Schedule-A

Tj                          Ti
r(x)   // x = 12
                            r(x)
x = x - 10
r(y)   // y = 15
y = y + 10
                            r(y)
                            <validate>
                            print(x+y)
<validate>
w(x)
w(y)

Schedule-A is a validated schedule.
Advantages:
1. Avoids cascading rollbacks: This validation-based scheme avoids cascading rollbacks, since the final write operations to the database are performed only after the transaction passes the validation phase. If the transaction fails validation, no update operation is performed on the database, so no dirty read can happen and cascading rollback cannot occur.
2. Avoids deadlock: Since a strict timestamp-based technique maintains a specific order of transactions, deadlock is not possible in this scheme.
Disadvantages:
1. Starvation: Long transactions may starve if a sequence of conflicting short transactions causes them to be restarted repeatedly. To avoid starvation, conflicting transactions must be temporarily blocked for some time, to let the long transactions finish.

Multiple Granularity

Let's start by understanding the meaning of granularity.

Granularity: It is the size of the data item that is allowed to be locked.

Multiple Granularity:

o It can be defined as hierarchically breaking up the database into blocks that can be locked.
o The multiple granularity protocol enhances concurrency and reduces lock overhead.
o It keeps track of what to lock and how to lock.
o It makes it easy to decide whether to lock or unlock a data item. The hierarchy can be represented graphically as a tree.

For example: Consider a tree with four levels of nodes.

o The first (highest) level represents the entire database.
o The second level consists of nodes of type area. The database consists of exactly these areas.
o Each area has child nodes called files. No file is present in more than one area.
o Finally, each file has child nodes called records. A file contains exactly the records that are its child nodes, and no record is present in more than one file.
o Hence, the levels of the tree, starting from the top, are:
1. Database
2. Area
3. File
4. Record

There are three additional lock modes with multiple granularity:

Intention Mode Locks

Intention-shared (IS): indicates explicit locking at a lower level of the tree, but only with shared locks.

Intention-exclusive (IX): indicates explicit locking at a lower level with exclusive or shared locks.

Shared & intention-exclusive (SIX): the node itself is locked in shared mode, and some lower-level node is locked in exclusive mode by the same transaction.

Compatibility Matrix with Intention Lock Modes: The below


table describes the compatibility matrix for these lock modes:
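Requested \ Held     IS     IX     S      SIX    X
IS                   yes    yes    yes    yes    no
IX                   yes    yes    no     no     no
S                    yes    no     yes    no     no
SIX                  yes    no     no     no     no
X                    no     no     no     no     no

("yes" means a transaction may acquire the requested mode on a node while another transaction holds the lock on that node in the given mode.)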

The protocol uses the intention lock modes to ensure serializability. It requires that a transaction attempting to lock a node follow these rules:

o Transaction T1 must observe the lock-compatibility matrix.
o Transaction T1 must lock the root of the tree first, and it can lock it in any mode.
o T1 can lock a node in S or IS mode only if it currently has the parent of that node locked in either IX or IS mode.
o T1 can lock a node in X, SIX, or IX mode only if it currently has the parent of that node locked in either IX or SIX mode.
o T1 can lock a node only if it has not previously unlocked any node (the two-phase rule).
o T1 can unlock a node only if it currently holds no locks on any child of that node.

Observe that in multiple-granularity, the locks are acquired in top-


down order, and locks must be released in bottom-up order.

o If transaction T1 reads record Ra2 in file Fa, then T1 needs to lock the database, area A1, and file Fa in IS mode. Finally, it needs to lock Ra2 in S mode.
o If transaction T2 modifies record Ra9 in file Fa, then it can do so after locking the database, area A1, and file Fa in IX mode. Finally, it needs to lock Ra9 in X mode.
o If transaction T3 reads all the records in file Fa, then T3 needs to lock the database and area A1 in IS mode. At last, it needs to lock Fa in S mode.
o If transaction T4 reads the entire database, then T4 needs to lock the database in S mode.
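The top-down order in the read example can be sketched as follows (Python; the compatibility relation is the standard one from the matrix above, and all names and data structures are illustrative):

# Pairs of modes that are compatible on the same node; anything not
# listed (in particular anything involving X) is incompatible.
COMPATIBLE = {("IS", "IS"), ("IS", "IX"), ("IS", "S"), ("IS", "SIX"),
              ("IX", "IS"), ("IX", "IX"),
              ("S", "IS"),  ("S", "S"),
              ("SIX", "IS")}

def lock_for_read(held, path):
    """held: dict node -> set of modes other transactions hold on it.
    path: nodes from root to record, e.g. ["DB", "A1", "Fa", "Ra2"].
    Lock ancestors in IS and the record itself in S, in top-down order."""
    plan = [(node, "IS") for node in path[:-1]] + [(path[-1], "S")]
    for node, mode in plan:
        if any((mode, m) not in COMPATIBLE for m in held.get(node, set())):
            return False       # conflicting lock on this node: must wait
    return True                # all locks granted in top-down order

held = {"Fa": {"IX"}}          # someone intends to write inside file Fa
print(lock_for_read(held, ["DB", "A1", "Fa", "Ra2"]))  # True: IS fits IX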

Multi-Version Schemes of Concurrency Control with example

The multi-version protocol minimizes the delay for read operations by maintaining multiple versions of data items. Each write operation creates a new version of the data item, so that whenever a transaction performs a read operation, the concurrency-control manager selects the appropriate version, making the read conflict-free and successful.

When a write operation creates a new version of a data item, that version contains the following information:

1. Content: This field contains the data value of the version.

2. Write_timestamp: This field contains the timestamp of the transaction that created the version.

3. Read_timestamp: This field contains the largest timestamp of any transaction that has read the version.
Now let us understand this concept using an example.

Let T1 and T2 be two transactions with timestamp values 15 and 10, respectively.

Transaction T2 performs a write operation on a data item (say X) in the database. As T2 writes, a new version of X is created, containing the value of X, the timestamp of T2, and the timestamp of the transaction that reads the new version; in this case, nothing has read the newly created value yet, so that field remains empty:

Content: X    Write_timestamp: 10    Read_timestamp: (empty)

Now let transaction T1 (with timestamp 15) perform a read operation on the newly created version of X. The version then becomes:

Content: X    Write_timestamp: 10    Read_timestamp: 15
Now let's discuss some important cases. If the timestamp of T2 is less than or equal to the timestamp of T1, then:

1. Read(X) operation performed by T1: in this case, the content of that version of X is returned to T1.

2. Write(X) operation performed by T1: in this case, T1 is rolled back if the timestamp of T1 is smaller than the read timestamp of that version of X. If the timestamp of T1 equals the write timestamp of the version, the version's contents are overwritten; otherwise, a new version is created. A sketch of this bookkeeping follows below.
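A sketch of this version bookkeeping in Python (each version stores content, write timestamp, and read timestamp as described; class and method names are illustrative):

class MultiVersionItem:
    def __init__(self, initial_value):
        self.versions = [[initial_value, 0, 0]]   # [content, w_ts, r_ts]

    def _version_for(self, ts):
        # The version with the largest write timestamp <= ts.
        return max((v for v in self.versions if v[1] <= ts),
                   key=lambda v: v[1])

    def read(self, ts):
        v = self._version_for(ts)
        v[2] = max(v[2], ts)        # record the largest reader timestamp
        return v[0]                 # reads never wait and never abort

    def write(self, ts, value):
        v = self._version_for(ts)
        if ts < v[2]:
            return "rollback"       # a younger transaction already read v
        if ts == v[1]:
            v[0] = value            # same writer: overwrite the contents
        else:
            self.versions.append([value, ts, 0])   # create a new version
        return "ok"

x = MultiVersionItem(42)
x.write(10, 99)          # T2 (TS 10) creates version [99, 10, 0]
print(x.read(15))        # T1 (TS 15) reads 99; version becomes [99, 10, 15]
print(x.write(12, 7))    # 'rollback': TS 12 < read timestamp 15 on that version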

Recovery With Concurrent Transactions

With concurrency control, multiple transactions execute at the same time, so their log records are interleaved. Because transaction results can be affected by this interleaving, the order of execution of those transactions must be maintained. Otherwise, during recovery, it would be very difficult for the recovery system to backtrack through all the logs and then start recovering.
Recovery with concurrent transactions can be done in the following four ways:
1. Interaction with concurrency control
2. Transaction rollback
3. Checkpoints
4. Restart recovery
Interaction with concurrency control :
In this scheme, the recovery scheme depends greatly on the concurrency control scheme that is used. To roll back a failed transaction, we must undo the updates performed by that transaction.
Transaction rollback :
 In this scheme, we roll back a failed transaction by using the log.
 The system scans the log backward for the failed transaction; for every log record of that transaction found, it restores the data item to its old value.
Checkpoints :
 Checkpointing is the process of saving a snapshot of the application's state so that it can restart from that point in case of failure.
 A checkpoint is a point in time at which a record is written onto the database from the buffers.
 Checkpoints shorten the recovery process.
 When a checkpoint is reached, the committed updates up to that point are written to the database, and the log records before that point can be removed from the log file. The log file is then updated with new transaction steps until the next checkpoint, and so on.
 The checkpoint declares a point before which the DBMS was in a consistent state and all the transactions were committed.
 Most DBMSs use checkpoints to reduce the number of log records that the system must scan when it recovers from a crash.
 In a concurrent transaction processing system, we require that the checkpoint log record be of the form <checkpoint L>, where 'L' is a list of transactions active at the time of the checkpoint.
 A fuzzy checkpoint is a checkpoint where transactions are allowed to perform updates even while buffer blocks are being written out.
Restart recovery :
 When the system recovers from a crash, it constructs two lists: the undo-list of transactions to be undone and the redo-list of transactions to be redone.
 The system constructs the two lists as follows. Initially, both are empty. The system scans the log backward, examining each record, until it finds the first <checkpoint L> record. For each <Ti commit> record found during the scan, Ti is added to the redo-list; every transaction in L, or that started after the checkpoint, without a <Ti commit> record is added to the undo-list.
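A sketch of this list construction in Python (log records are illustrative tuples; a real log record also carries old and new data values):

def build_recovery_lists(log):
    """log: records oldest-first, e.g. ("start", "T1"), ("commit", "T1"),
    ("checkpoint", ["T2"]). Scan backward to the first checkpoint."""
    undo, redo = [], []
    for rec in reversed(log):
        if rec[0] == "commit":
            redo.append(rec[1])
        elif rec[0] == "start" and rec[1] not in redo:
            undo.append(rec[1])       # started after checkpoint, no commit
        elif rec[0] == "checkpoint":
            for t in rec[1]:          # active at the checkpoint
                if t not in redo and t not in undo:
                    undo.append(t)
            break                     # stop at the first checkpoint record
    return undo, redo

log = [("start", "T2"), ("checkpoint", ["T2"]),
       ("start", "T3"), ("commit", "T3")]
print(build_recovery_lists(log))      # (['T2'], ['T3'])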
