
Advanced Database Systems

Chapter 1
Transaction Management and
Concurrency Control
1
Agenda
 Introduction
 Transaction and System Concepts
 Properties of Transaction
 Schedules and Recoverability
 Serializability of Schedules

2
Why Do We Study
Transactions?

 To understand the basic properties of a transaction and learn the
concepts underlying transaction processing as well as the concurrent
execution of transactions.

3
Introduction
 One criterion for classifying a database system is according to
the number of users who can use the system concurrently.
 Single-user vs. multi-user systems
 A DBMS is single-user if at most one user can use the system at a time.
 A DBMS is multi-user if many users can use the system and access the
database concurrently.


 Problem
How to make the simultaneous interactions of multiple users
with the database safe, consistent, correct, and efficient?

4
Introduction (cont.)
 It depends on the computing system (CPU + programming language)
 Single-processor computer system (one CPU)
   Multiprogramming: execute some commands from one process, then
  suspend that process and execute some commands from the next process.
  A process is resumed at the point where it was suspended whenever
  it gets its turn to use the CPU again.
   Concurrent execution of processes is therefore actually interleaved
  (interleaved execution).
 Multi-processor computer system (multiple CPUs)
   Parallel processing

5
Concurrent Transactions

[Figure: interleaved processing (single processor) versus parallel processing (two or more processors) of processes A and B, plotted against time (t1, t2) on CPU1 and CPU2.]

6
Introduction (cont..)
 What is a transaction?
 Dictionary definition: a business (money) exchange.
 A unit of program execution that accesses and possibly modifies
various data objects (tuples, relations).
 An action, or series of actions, carried out by a user or application,
which accesses or changes the contents of the database.
 Basic operations ("actions") a transaction can include:
   Reads, writes
   Special actions: commit, abort

7
Transaction: Database Read and Write
Operations
 A database is represented as a collection of named data
items
 Read-item (X)
1. Find the address of the disk block that contains item X
2. Copy the disk block into a buffer in main memory
3. Copy the item X from the buffer to the program variable named X
 Write-item (X)
1. Find the address of the disk block that contains item X.
2. Copy that disk block into a buffer in main memory
3. Copy item X from the program variable named X into its correct
location in the buffer.
4. Store the updated block from the buffer back to disk (either
immediately or at some later point in time).
8
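The read and write steps above can be sketched in Python. This is an illustrative model only: the dictionaries `disk` and `buffer_pool`, the `dirty` set, and all helper names are assumptions made for the example, not part of any real DBMS.

```python
# Hypothetical in-memory model of read_item / write_item (names are assumptions).
disk = {0: {"X": 100, "Y": 50}, 1: {"Z": 7}}   # block address -> block contents
buffer_pool = {}                                # block address -> copy in main memory
dirty = set()                                   # buffered blocks not yet written back


def find_block(item):
    """Step 1: find the address of the disk block that contains the item."""
    for address, block in disk.items():
        if item in block:
            return address
    raise KeyError(item)


def load_block(address):
    """Step 2: copy the disk block into a main-memory buffer (once)."""
    if address not in buffer_pool:
        buffer_pool[address] = dict(disk[address])
    return buffer_pool[address]


def read_item(item):
    """Step 3 (read): copy the item from the buffer to a program variable."""
    return load_block(find_block(item))[item]


def write_item(item, value):
    """Step 3 (write): copy the program variable into the buffered block."""
    address = find_block(item)
    load_block(address)[item] = value
    dirty.add(address)


def flush():
    """Step 4 (write): store updated blocks back to disk, possibly much later."""
    for address in sorted(dirty):
        disk[address] = dict(buffer_pool[address])
    dirty.clear()
```

In this sketch a write only changes the buffered copy; the disk copy changes when `flush()` runs, which mirrors the "immediately or at some later point in time" wording of step 4.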
Transaction(example)
 Example: fund Transfer
 transaction to transfer $50 from account A to account B:
 For a user it is one activity
 To database
1. read(A)
2. A := A – 50
3. write(A)
4. read(B)
5. B := B + 50
6. write(B)
 Two main issues to deal with:
 Failures of various kinds, such as hardware failures and system crashes
 Concurrent execution of multiple transactions
9

Why is concurrency control (during multiple
transaction execution) needed?

 The process of managing the simultaneous execution of transactions in a
shared database, so as to ensure the serializability of transactions, is
known as concurrency control.
 Four problems commonly arise:
1. The lost update problem
2. The temporary update (dirty read) problem
3. The incorrect summary problem
4. The unrepeatable read problem
10


A Transaction: A Formal
Example
T1 (executing from time t0 to time tk):
read_item(X);
read_item(Y);
X := X - 400000;
Y := Y + 400000;
write_item(X);
write_item(Y);
11
Problems in concurrent execution
of Transaction(Lost Update)
Occurs when two transactions that access the same
database items have their operations interleaved in a way
that makes the value of some database items incorrect.
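A minimal sketch of a lost update, assuming two hypothetical transactions: T1 subtracts 10 from X and T2 adds 20 to X. The `run` helper and the step encoding are assumptions made for illustration.

```python
def run(schedule, database):
    """Execute (transaction, action, argument) steps against a dict database.

    Each transaction keeps a private copy of X, like a program variable.
    """
    local = {}
    for txn, action, arg in schedule:
        if action == "read":
            local[txn] = database["X"]
        elif action == "add":
            local[txn] += arg
        elif action == "write":
            database["X"] = local[txn]
    return database["X"]


serial = [
    ("T1", "read", None), ("T1", "add", -10), ("T1", "write", None),
    ("T2", "read", None), ("T2", "add", 20), ("T2", "write", None),
]
interleaved = [
    ("T1", "read", None), ("T2", "read", None),   # both read the old value of X
    ("T1", "add", -10), ("T2", "add", 20),
    ("T1", "write", None), ("T2", "write", None), # T2 overwrites T1's update
]

serial_result = run(serial, {"X": 100})            # 110
interleaved_result = run(interleaved, {"X": 100})  # 120: T1's update is lost
```

Starting from X = 100, the serial order gives 110, while the interleaved order gives 120 because T2 writes a value computed from the X it read before T1's write.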

12
Problems in concurrent execution
of Transaction(Dirty Read)
 Occurs when one transaction updates a database item and then fails for
some reason, while another transaction has already read the updated
item before it is changed back to its original value.

T1: WRITE(X)
T2: READ(X)
T1: ABORT
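The three steps can be traced with a small sketch. The starting value 100 and the update of -30 are assumed for illustration.

```python
database = {"X": 100}

before_image = database["X"]         # saved so that T1 can be undone
database["X"] = before_image - 30    # T1: WRITE(X), not yet committed
t2_view = database["X"]              # T2: READ(X) sees the uncommitted value
database["X"] = before_image         # T1: ABORT restores the old value
```

After these steps T2 is holding 70, a value that never became part of any committed database state.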

13
Problems in concurrent execution of
Transaction
( Incorrect Summary)
 Occurs if one transaction is calculating an aggregate summary function
on a number of database items while other transactions are updating some
of these items. The aggregate function may then use some values before
they are updated and others after they are updated.
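A short sketch of the effect, assuming three hypothetical accounts and a transfer of 50 from A to C that runs while a summary transaction is adding up the balances.

```python
accounts = {"A": 100, "B": 200, "C": 300}   # the true total is always 600

total = 0
total += accounts["A"]    # summary reads A before the transfer touches it
accounts["A"] -= 50       # transfer debits A ...
accounts["C"] += 50       # ... and credits C
total += accounts["B"]    # summary reads B (unchanged)
total += accounts["C"]    # summary reads C after the transfer
```

The summary ends up with 650 even though the accounts add up to 600 both before and after the transfer.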

14
Problems in concurrent execution of
Transaction (Unrepeatable Read)
 Occurs when a transaction T reads the same item twice and the item is
changed by another transaction T’ between the two reads. Hence, T
receives different values for its two reads of the same item.

15
How are those problems
solved?
 The DBMS has a Concurrency Control subsystem to ensure that the database
remains in a consistent state despite concurrent execution of
transactions.
 Other problems
 System failures may occur
 Types of failures:
   System crash
   Transaction or system error
   Local errors
   Concurrency control enforcement
   Disk failure
   Physical failures
 The DBMS has a Recovery Subsystem to protect the database against system
failures.
16
Why is recovery needed?
1. A computer failure (system crash), e.g. main memory failure.
2. A transaction or system error, e.g. integer overflow or division by
zero.
3. Local errors or exception conditions detected by the transaction,
e.g. data for the transaction may not be found.
4. Concurrency control enforcement, e.g. violation of serializability.
5. Disk failure, e.g. disk read/write head crash.
6. Physical problems and catastrophes, e.g. power or air-conditioning
failure.
17
Transaction and System
Concepts
 Transaction State
 A transaction is an atomic unit of work that should
either be completed in its entirety (Committed) or not
done at all (aborted).
 For recovery purposes, the system needs to keep track
of when each transaction starts, terminates, and
commits or aborts.
 the recovery manager of the DBMS needs to keep
track of the following operations:
18
Transaction and System
Concepts
 BEGIN_TRANSACTION: marks the start of the transaction
 READ or WRITE: the two possible operations on the data
 END_TRANSACTION: marks the end of the read or write operations; the
system starts checking whether everything went according to plan (the
transaction is now partially committed)
 COMMIT_TRANSACTION: signals the successful end of the transaction;
changes can be “committed” to the DB
 ROLLBACK (or ABORT): signals the unsuccessful end of the transaction;
changes applied to the DB must be undone
19
Transaction States
 A state transition diagram illustrates how a transaction moves through
its execution states. A transaction must be in one of these states.
20


Properties of Transaction
 ACID properties
 Atomicity

 Consistency

 Isolation

 Durability

21
Atomicity and Consistency
 Atomicity
   Transactions are atomic: they don’t have parts (conceptually)
   They can’t be executed partially; it should not be detectable that
  they interleave with another transaction
 Consistency
   Transactions take the database from one consistent state into another
   In the middle of a transaction the database might not be consistent
22
Isolation and Durability

 Isolation
   The effects of a transaction are not visible to other transactions
  until it has completed
   From outside, the transaction has either happened or not
 Durability
   Once a transaction has completed, its changes are made permanent
   Even if the system crashes, the effects of a transaction must remain
  in place

23
Properties of Transaction(cont..)
 Transfer £50 from account A to account B (one transaction):
Read(A)
A = A - 50
Write(A)
Read(B)
B = B + 50
Write(B)
 Atomicity - shouldn’t take money from A without giving it to B
 Consistency - money isn’t lost or gained
 Isolation - other queries shouldn’t see A or B change until completion
 Durability - the money does not go back to A

24
Who will enforce the ACID
properties?
 The transaction manager
 It schedules the operations of transactions
 COMMIT and ROLLBACK are used to ensure atomicity
 Locks or timestamps are used to ensure consistency and
isolation for concurrent transactions (next lectures)
 A log is kept to ensure durability in the event of system
failure.

25
COMMIT and ROLLBACK

 COMMIT signals the successful end of a transaction
   Any changes made by the transaction should be saved
   These changes are now visible to other transactions
 ROLLBACK signals the unsuccessful end of a transaction
   Any changes made by the transaction should be undone
   It is now as if the transaction never existed
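A hedged sketch of COMMIT and ROLLBACK using Python's standard `sqlite3` module. The table name `account`, the `CHECK` constraint, and the `transfer` and `balances` helpers are assumptions made for the example. The `with conn:` block commits if its body succeeds and rolls back if it raises.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 0)])
conn.commit()


def transfer(amount):
    """Move `amount` from A to B as one transaction; return True if committed."""
    try:
        with conn:  # COMMIT on success, ROLLBACK if an exception is raised
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = 'B'",
                (amount,),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = 'A'",
                (amount,),
            )
        return True
    except sqlite3.IntegrityError:
        return False  # the credit to B was undone together with the failed debit


def balances():
    """Return the current balances as a dict, e.g. {'A': ..., 'B': ...}."""
    return dict(conn.execute("SELECT name, balance FROM account"))
```

The credit to B is deliberately issued first, so that a failed debit (a balance below zero violates the assumed constraint) shows the rollback removing a change that had already been applied.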
26
The Transaction Log
 The transaction log records the details of all transactions
   Any changes the transaction makes to the database
   How to undo these changes
   When transactions complete and how
 The log is stored on disk, not in memory
   If the system crashes it is preserved
 Write ahead log rule
   The entry in the log must be made before COMMIT processing can
  complete
27
Schedules and Recoverability
 What is a schedule?
 A sequence that indicates the chronological order in which
instructions of concurrent transactions are executed.
 The ordering of execution of operations from various
transactions T1, T2, … , Tn is called a schedule S.
 Given multiple transactions, a schedule is a sequence of interleaved
actions from all transactions.

28
Schedules and
Recoverability(cont..)
 A schedule for a set of transactions must consist of all
instructions of those transactions.
 It must preserve the order in which the instructions
appear in each individual transaction.
 Example
 Transaction T1: r1(X); w1(X); r1(Y); w1(Y); c1

Transaction T2: r2(X); w2(X); c2

 A schedule, S:
r1(X); r2(X); w1(X); r1(Y); w2(X); w1(Y); c1; c2
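The two requirements (all instructions present, per-transaction order preserved) can be checked mechanically. The sketch below is an assumption-laden illustration: operations are plain strings, and each operation string is assumed to occur only once per transaction.

```python
def is_valid_schedule(schedule, transactions):
    """Check that `schedule` is an interleaving of the given transactions.

    transactions: dict mapping a transaction name to its ordered operations.
    """
    total = sum(len(ops) for ops in transactions.values())
    if len(schedule) != total:
        return False  # an operation is missing or extra
    for ops in transactions.values():
        own = set(ops)
        projection = [op for op in schedule if op in own]
        if projection != ops:
            return False  # this transaction's internal order was changed
    return True


T1 = ["r1(X)", "w1(X)", "r1(Y)", "w1(Y)", "c1"]
T2 = ["r2(X)", "w2(X)", "c2"]
S = ["r1(X)", "r2(X)", "w1(X)", "r1(Y)", "w2(X)", "w1(Y)", "c1", "c2"]
```

Here `is_valid_schedule(S, {"T1": T1, "T2": T2})` accepts the schedule S from the slide, while a sequence that places `w1(X)` before `r1(X)` is rejected.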
29
Schedule(cont..)
Operations
 read(Q,q)
read the value of the database item Q and store in
the local variable q.
 write(Q,q)
write the value of the local variable q to the database
item Q.
 other operations such as arithmetic

 commit

 rollback
30
Example: A “Good” Schedule

• One possible schedule,


initially X = 10, Y=10

Resulting Database: X=21,Y=21, X=Y


31
Example: A “Bad” Schedule

• Another possible
schedule

Resulting Database X=21,Y=21, X=Y


32
When do conflicts occur
between two operations?
 Two operations conflict if they satisfy ALL
three conditions:
1. they belong to different transactions AND
2. they access the same item AND
3. at least one is a write_item() operation
 Example:
T1          T2          Conflict?
Read(X)     Read(X)     no
Read(X)     Write(X)    yes
Write(X)    Read(X)     yes
Write(X)    Write(X)    yes
33
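The three conditions translate directly into a predicate. The `Op` record and the function name `conflicts` are illustrative assumptions.

```python
from typing import NamedTuple


class Op(NamedTuple):
    txn: str    # transaction name, e.g. "T1"
    kind: str   # "read" or "write"
    item: str   # data item name, e.g. "X"


def conflicts(a, b):
    """True if a and b satisfy all three conflict conditions."""
    different_transactions = a.txn != b.txn
    same_item = a.item == b.item
    at_least_one_write = "write" in (a.kind, b.kind)
    return different_transactions and same_item and at_least_one_write
```

Applied to the four combinations on the slide, only the Read(X)/Read(X) pair is conflict free.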
Serializability of Schedules
 What are serializable schedules?
 Types of schedules that are always considered to be correct
when concurrent transactions are executing.
 Suppose that two users, for example two airline reservations agents,
submit to the DBMS transactions T1 and T2 at approximately the same time.
 If no interleaving of operations is permitted, there are only two
possible outcomes:

34
Serializability of Schedules(cont..)
1. Execute all the operations of transaction
T1 (in sequence) followed by all the
operations of transaction T2 (in sequence).
2.Execute all the operations of transaction T2
(in sequence) followed by all the
operations of transaction T1 (in sequence).

35
Serializability of
Schedules(classification)
 Serial Schedule
 Non-serial schedule
 Serializable schedule
 Conflict equivalent: all pairs of conflicting ops are ordered the same
way
 View equivalent: all users get the same view
36
Serializability of
Schedules(classification)
 Serial Schedule
Schedule where operations of each transaction are
executed consecutively without any interleaved
operations from other transactions. A schedule that is not serial is
called a non-serial schedule.
 No guarantee that results of all serial executions of a
given set of transactions will be identical.

37
38
Serializability of Schedules
 The objective of serializability is to find non-serial schedules that
allow transactions to execute concurrently without interfering
with one another.
 In other words, we want to find non-serial schedules that are
equivalent to some serial schedule. Such a schedule is called
serializable.
 When are two schedules equivalent?
• Option 1: They lead to same result (result equivalent)
• Option 2: The order of any two conflicting operations is the
same (conflict equivalent)
39
Result Equivalent Schedules
 Two schedules are result equivalent if they produce the
same final state of the database
 Problem: May produce same result by accident!

S1:
read_item(X);
X:=X+10;
write_item(X);

S2:
read_item(X);
X:=X*1.1;
write_item(X);

Schedules S1 and S2 are result equivalent for X=100 but not in general.
40
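The coincidence can be checked numerically. The sketch below uses `fractions.Fraction` for the factor 1.1 so that binary floating-point rounding does not blur the comparison; the function names `s1` and `s2` are assumptions.

```python
from fractions import Fraction


def s1(x):
    """Effect of S1 on X: X := X + 10."""
    return x + 10


def s2(x):
    """Effect of S2 on X: X := X * 1.1 (kept exact with a fraction)."""
    return x * Fraction(11, 10)


same_at_100 = s1(100) == s2(100)   # both give 110
same_at_200 = s1(200) == s2(200)   # 210 versus 220
```

The two schedules agree only when X + 10 = 1.1 X, that is, only for X = 100.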
Conflict Equivalent Schedules
 Let I and J be consecutive instructions by two different transactions within a schedule S.
 If I and J do not conflict, we can swap their order to produce a new schedule S'.
 The instructions appear in the same order in S and S', except for I and J, whose order
does not matter.
 S and S' are termed conflict equivalent schedules.
 Two schedules are conflict equivalent if the order of any two conflicting operations is
the same in both schedules.
 A schedule is conflict serializable if it can be transformed into a serial schedule by a
series of swaps of adjacent non-conflicting actions.

41
Conflict Equivalent
schedule(example)
Serial Schedule S1
T1: read_item(A); write_item(A); read_item(B); write_item(B);
T2: read_item(A); write_item(A); read_item(B); write_item(B);
[Figure annotations: "order matters" and "order doesn’t matter" label pairs of operations from T1 and T2.]
42
Conflict Equivalence(Example)
Schedule S1’ T2
T1
read_item(A);
read_item(B);
same order as in S1
write_item(A);
read_item(A);
write_item(A);

write_item(B); same order as in S1


read_item(B);
write_item(B);
S1 and S1’ are conflict equivalent
(S1’ produces the same result as S1)

43
Example 2
 Consider the schedule S1 shown in Figure (a) on the next slide, containing operations
from two concurrently executing transactions T7 and T8. Since the write
operation on balx in T8 does not conflict with the subsequent read operation on
baly in T7, we can change the order of these operations to produce the
equivalent schedule S2 shown in Figure (b). If we now also change the order of
the following non-conflicting operations, we produce the equivalent serial
schedule S3 shown in Figure (c).

44
Cont’d……
 Change the order of the write(balx) of
T8 with the write(baly) of T7.

 Change the order of the read(balx) of


T8 with the read(baly) of T7.

 Change the order of the read(balx) of


T8 with the write(baly) of T7

 Non serial S1

45
Cont’d

 Non serial S2 Serial Schedule(S3)


equivalent to S1 and S2

46
Testing for conflict serializability
 Under the constrained write rule (that is, a transaction updates a data item based on
its old value, which is first read by the transaction), a precedence (or serialization)
graph can be produced to test for conflict serializability.
 For a schedule S, a precedence graph is a directed graph G = (N, E) that consists of a
set of nodes N and a set of directed edges E, which is constructed as follows:
 Create a node for each transaction.
 Create a directed edge Ti → Tj, if Tj reads the value of an item written by Ti.
 Create a directed edge Ti → Tj, if Tj writes a value into an item after it has been
read by Ti.
 Create a directed edge Ti → Tj, if Tj writes a value into an item after it has been
written by Ti.

 If an edge Ti → Tj exists in the precedence graph for S, then in any serial schedule
S’ equivalent to S, Ti must appear before Tj. If the precedence graph contains a
cycle the schedule is not conflict serializable.
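A minimal sketch of the test, assuming a schedule is given as a list of (transaction, operation, item) triples with operation "r" or "w". The function names are illustrative. The non-serial example uses the read/write order of the T9/T10 schedule S that appears in the locking slides of this chapter.

```python
def precedence_graph(schedule):
    """Return the set of edges (Ti, Tj) required by conflicting operation pairs."""
    edges = set()
    for i, (ti, kind_i, item_i) in enumerate(schedule):
        for tj, kind_j, item_j in schedule[i + 1:]:
            # Read-write, write-read and write-write pairs on the same item.
            if ti != tj and item_i == item_j and "w" in (kind_i, kind_j):
                edges.add((ti, tj))
    return edges


def has_cycle(edges):
    """Depth-first search for a cycle in the directed graph given by `edges`."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set())
    state = {}  # node -> "visiting" or "done"

    def visit(node):
        state[node] = "visiting"
        for nxt in graph[node]:
            if state.get(nxt) == "visiting":
                return True
            if nxt not in state and visit(nxt):
                return True
        state[node] = "done"
        return False

    return any(node not in state and visit(node) for node in graph)


non_serial = [
    ("T9", "r", "balx"), ("T9", "w", "balx"),
    ("T10", "r", "balx"), ("T10", "w", "balx"),
    ("T10", "r", "baly"), ("T10", "w", "baly"),
    ("T9", "r", "baly"), ("T9", "w", "baly"),
]
serial = [
    ("T9", "r", "balx"), ("T9", "w", "balx"),
    ("T9", "r", "baly"), ("T9", "w", "baly"),
    ("T10", "r", "balx"), ("T10", "w", "balx"),
    ("T10", "r", "baly"), ("T10", "w", "baly"),
]
```

For `non_serial` the graph has both T9 → T10 (from balx) and T10 → T9 (from baly), so it contains a cycle; for `serial` only T9 → T10 exists.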
47
Example….
 Consider the two transactions shown in the figure below. Transaction T9 is
transferring £100 from one account with balance balx to another account with
balance baly, while T10 is increasing the balance of these two accounts by 10%.

 Is this schedule conflict serializable? Why?

48
Cont’d…
 Let us draw the precedence graph.

 The precedence graph for the figure above contains a cycle, so the
schedule is not conflict serializable.

49
Exercise
 Which of the following schedules are conflict serializable and which are
not? For each conflict serializable schedule, give an equivalent serial schedule.

50
View Equivalence and View
Serializability
 Another less restrictive definition of equivalence of schedules is called view
equivalence.
 This leads to another definition of serializability called view serializability.
 Two schedules S and S’ are said to be view equivalent if the following three
conditions hold:
1. The same set of transactions participates in S and S’, and S and S’ include the same
operations of those transactions.
2. For any operation ri(X) of Ti in S, if the value of X read by the operation has been
written by an operation wj(X) of Tj (or if it is the original value of X before the
schedule started), the same condition must hold for the value of X read by operation
ri(X) of Ti in S’.
3. If the operation wk(Y) of Tk is the last operation to write item Y in S, then wk(Y) of
Tk must also be the last operation to write item Y in S’.
 A schedule is view serializable if it is view equivalent to a serial schedule
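The three conditions can be checked with a short sketch. The schedule encoding, the helper names, and the example schedule (three hypothetical transactions T11, T12, T13 operating on balx, where T12 and T13 write without reading) are assumptions made for illustration.

```python
def view_profile(schedule):
    """Return (reads_from, final_writes) for a list of (txn, kind, item) steps."""
    counts = {}
    last_write = {}   # item -> (txn, n): the n-th write of that item by txn
    reads_from = {}   # (txn, item, n) -> source write, or None for the initial value
    for txn, kind, item in schedule:
        n = counts.get((txn, kind, item), 0)
        counts[(txn, kind, item)] = n + 1
        if kind == "r":
            reads_from[(txn, item, n)] = last_write.get(item)   # condition 2
        else:
            last_write[item] = (txn, n)                         # condition 3
    return reads_from, last_write


def view_equivalent(s1, s2):
    """Condition 1 (same operations) plus matching reads-from and final writes."""
    return sorted(s1) == sorted(s2) and view_profile(s1) == view_profile(s2)


schedule = [
    ("T11", "r", "balx"), ("T12", "w", "balx"),
    ("T11", "w", "balx"), ("T13", "w", "balx"),
]
serial_11_12_13 = [
    ("T11", "r", "balx"), ("T11", "w", "balx"),
    ("T12", "w", "balx"), ("T13", "w", "balx"),
]
serial_12_11_13 = [
    ("T12", "w", "balx"), ("T11", "r", "balx"),
    ("T11", "w", "balx"), ("T13", "w", "balx"),
]
```

In this sketch `schedule` is view equivalent to the serial order T11, T12, T13 (T11 reads the initial value and T13 writes last in both), but not to T12, T11, T13, where T11 would read the value written by T12.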

51
Cont’d…
 Every conflict serializable schedule is view serializable, although the converse is
not true.
 e.g. The schedule below is view serializable, although it is not conflict serializable.

 In this example, transactions T12 and T13 do not conform to the constrained
write rule; in other words, they perform blind writes.
 Any view serializable schedule that is not conflict serializable contains one or more
blind writes.

52
Recoverability
 Serializability identifies schedules that maintain the consistency of the database,
assuming that none of the transactions in the schedule fails.

 An alternative perspective examines the recoverability of transactions within a schedule.

 If a transaction fails, the atomicity property requires that we undo the effects of the
transaction.

 In addition, the durability property states that once a transaction commits, its changes
cannot be undone. This leads to the notion of a recoverable schedule.

 Recoverable schedule: a schedule where, for each pair of transactions Ti and Tj, if Tj
reads a data item previously written by Ti, then the commit operation of Ti precedes the
commit operation of Tj.
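A minimal sketch of a recoverability check, assuming a schedule is a list of (transaction, action, item) triples with action "r", "w" or "c" (commit, item `None`). Aborts are not modelled, and the function name is an assumption.

```python
def is_recoverable(schedule):
    """True if every reader commits only after the writers it read from."""
    last_writer = {}   # item -> transaction that wrote it most recently
    depends_on = {}    # reader -> set of transactions it has read from
    committed = set()
    for txn, action, item in schedule:
        if action == "w":
            last_writer[item] = txn
        elif action == "r":
            writer = last_writer.get(item)
            if writer is not None and writer != txn:
                depends_on.setdefault(txn, set()).add(writer)
        elif action == "c":
            # Every transaction this one read from must already have committed.
            if not depends_on.get(txn, set()) <= committed:
                return False
            committed.add(txn)
    return True


bad = [("T1", "w", "X"), ("T2", "r", "X"), ("T2", "c", None), ("T1", "c", None)]
good = [("T1", "w", "X"), ("T2", "r", "X"), ("T1", "c", None), ("T2", "c", None)]
```

In `bad`, T2 commits after reading X from the still uncommitted T1, so a later failure of T1 could not be repaired without undoing a committed transaction; `good` delays T2's commit until T1 has committed.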

53
Concurrency control techniques
 Serializability can be achieved in several ways.
 There are two main concurrency control techniques that allow transactions to execute
safely in parallel subject to certain constraints: locking and timestamp methods.
Locking Methods
 What is Locking? A procedure used to control concurrent access to data. When one
transaction is accessing the database, a lock may deny access to other transactions to
prevent incorrect results.
 There are several locking variations, but all share the same fundamental characteristic,
namely that a transaction must claim a shared (read) or exclusive (write) lock on a data
item before the corresponding database read or write operation.
 Shared lock: If a transaction has a shared lock on a data item, it can read the item but
not update it.
 Exclusive lock: If a transaction has an exclusive lock on a data item, it can both read
and update the item.

54
Cont’d…
 If a transaction holds the exclusive lock on the item, no other transactions can read
or update that data item.
How are the locks used?
 Any transaction that needs to access a data item must first lock the item (i.e.
request a shared or an exclusive lock).
 If the item is not already locked by another transaction, the lock will be granted.
 If the item is currently locked, the DBMS determines whether the request is
compatible with the existing lock. If a shared lock is requested on an item that
already has a shared lock on it, the request will be granted; otherwise, the
transaction must wait until the existing lock is released.
 A transaction continues to hold a lock until it explicitly releases it either during
execution or when it terminates (aborts or commits). It is only when the exclusive
lock has been released that the effects of the write operation will be made visible to
other transactions.
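The granting rule described above can be sketched as a small lock table. The class and method names are assumptions; waiting queues and lock upgrades are deliberately left out, so a refused request simply returns `False`.

```python
class LockTable:
    """Illustrative lock table: 'S' is a shared lock, 'X' an exclusive lock."""

    def __init__(self):
        self.locks = {}  # item -> (mode, set of holding transactions)

    def request(self, txn, item, mode):
        """Return True if the lock is granted, False if txn would have to wait."""
        held = self.locks.get(item)
        if held is None:
            self.locks[item] = (mode, {txn})   # item not locked: grant
            return True
        held_mode, holders = held
        if mode == "S" and held_mode == "S":
            holders.add(txn)                   # shared is compatible with shared
            return True
        return False                           # every other combination waits

    def release(self, txn, item):
        """Release txn's lock on item; drop the entry when nobody holds it."""
        mode, holders = self.locks[item]
        holders.discard(txn)
        if not holders:
            del self.locks[item]
```

Two readers can hold shared locks on the same item at once, while an exclusive request is only granted once the item is completely unlocked.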
55
Incorrect locking schedule
 Assume the schedule that we have seen earlier.
 If we schedule the transactions by applying locks, it becomes:
S = {write_lock(T9, balx), read(T9, balx), write(T9, balx), unlock(T9, balx),
write_lock(T10, balx), read(T10, balx), write(T10, balx), unlock(T10, balx),
write_lock(T10, baly), read(T10, baly), write(T10, baly), unlock(T10, baly),
commit(T10),
write_lock(T9, baly), read(T9, baly), write(T9, baly), unlock(T9, baly),
commit(T9)}
56
cont’d…(diagrammatically)
 If, prior to execution, balx = 100 and baly = 400, the result should be
balx = 220, baly = 330 if T9 executes before T10, or balx = 210,
baly = 340 if T10 executes before T9. However, executing schedule S
gives balx = 220 and baly = 340, which matches neither serial order.
S is not a serializable schedule, so locking alone still does not
guarantee serializability.
 To guarantee serializability, we must follow an additional protocol,
i.e. two-phase locking (2PL).
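The three results can be reproduced with a short sketch. One assumption is needed and is inferred from the stated figures: T9 adds 100 to balx and subtracts 100 from baly, and T10 multiplies each balance by 1.1 (done here with integer arithmetic). The function names are illustrative.

```python
def t9_balx(db):
    db["balx"] = db["balx"] + 100        # assumed from the stated results


def t9_baly(db):
    db["baly"] = db["baly"] - 100        # assumed from the stated results


def t10_balx(db):
    db["balx"] = db["balx"] * 11 // 10   # increase by 10%


def t10_baly(db):
    db["baly"] = db["baly"] * 11 // 10   # increase by 10%


def run(steps):
    """Apply the steps in order, starting from balx = 100 and baly = 400."""
    db = {"balx": 100, "baly": 400}
    for step in steps:
        step(db)
    return db["balx"], db["baly"]


t9_then_t10 = run([t9_balx, t9_baly, t10_balx, t10_baly])   # (220, 330)
t10_then_t9 = run([t10_balx, t10_baly, t9_balx, t9_baly])   # (210, 340)
schedule_s = run([t9_balx, t10_balx, t10_baly, t9_baly])    # (220, 340)
```

Schedule S takes balx from the "T9 first" outcome and baly from the "T10 first" outcome, which is why it corresponds to no serial execution.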

57
Two-phase locking (2PL)
 A transaction follows the two-phase locking protocol if all locking
operations precede the first unlock operation in the transaction.
 According to the rules of this protocol, every transaction can be divided
into two phases: a growing phase and shrinking phase.
 Growing phase- in which it acquires all the locks needed but cannot
release any locks.
 Shrinking phase - in which it releases its locks but cannot acquire any
new locks.
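Whether a single transaction obeys the protocol can be checked by scanning its operations once. The operation encoding and the function name below are assumptions made for illustration.

```python
def is_two_phase(operations):
    """True if no lock is acquired after the first unlock.

    operations: list of (action, item) pairs for ONE transaction, where action
    is "read_lock", "write_lock", "unlock", "read" or "write".
    """
    shrinking = False
    for action, _item in operations:
        if action == "unlock":
            shrinking = True                 # the shrinking phase has begun
        elif action in ("read_lock", "write_lock") and shrinking:
            return False                     # lock acquired after an unlock
    return True


# T9 as it appears in the incorrect locking schedule S.
t9_in_s = [
    ("write_lock", "balx"), ("read", "balx"), ("write", "balx"), ("unlock", "balx"),
    ("write_lock", "baly"), ("read", "baly"), ("write", "baly"), ("unlock", "baly"),
]
# A reordering of the same work in which all locks precede the first unlock.
t9_two_phase = [
    ("write_lock", "balx"), ("read", "balx"), ("write", "balx"),
    ("write_lock", "baly"), ("unlock", "balx"),
    ("read", "baly"), ("write", "baly"), ("unlock", "baly"),
]
```

`t9_in_s` is rejected because it locks baly after unlocking balx, whereas `t9_two_phase` has a clean growing phase followed by a shrinking phase.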

58
Preventing the lost update
problem using 2PL
 T2 blocks T1 from accessing balx because T2 holds an exclusive lock on it; T1 must wait until T2 releases the lock.
 e.g.

59
Preventing the uncommitted
dependency (Dirty read) problem
using 2PL
 To prevent this problem occurring, T4 first requests an exclusive lock on
balx. It can then proceed to read the value of balx from the database,
increment it by £100, and write the new value back to the database. When
the rollback is executed, the updates of transaction T4 are undone and the
value of balx in the database is returned to its original value of £100.
 The exclusive lock is then released by T4 and granted to T3.

60
Preventing the inconsistent
analysis (incorrect summary)
problem using 2PL
 To prevent this problem occurring, T5 must precede its reads by exclusive
locks, and T6 must precede its reads with shared locks. Therefore, when
T5 starts it requests and obtains an exclusive lock on balx. Now, when T6
tries to share lock balx the request is not immediately granted and T6 has
to wait until the lock is released, which is when T5 commits.

61
Cascading rollback
 A situation in which the failure of a single transaction leads to a series of rollbacks.
 E.g. consider a schedule consisting of the three transactions shown in the
figure below, which conforms to the two-phase locking protocol. All transactions
are executing their database operations. Meanwhile, T14 has failed and has
been rolled back. However, since T15 is dependent on T14 (it has read an
item that has been updated by T14), T15 must also be rolled back.
Similarly, T16 is dependent on T15, so it too must be rolled back.

62
Cascading rollback (cont’d…)
 Cascading rollbacks are undesirable since they potentially lead to the
undoing of a significant amount of work.
 Clearly, it would be useful if we could design protocols that prevent
cascading rollbacks.
 One way to achieve this with two-phase locking is to leave the release of
all locks until the end of the transaction.
 In this way, the problem illustrated on the earlier slide would not occur, as T15
would not obtain its exclusive lock until after T14 had completed the
rollback. This is called rigorous 2PL.
 Another variant of 2PL, called strict 2PL, only holds exclusive locks
until the end of the transaction.
 Most database systems implement one of these two variants of 2PL.

63
Concurrency control with index
structures
 Concurrency control for an index structure can be managed by treating each page
of the index as a data item and applying the two-phase locking protocol described earlier.

 However, since indexes are likely to be frequently accessed, particularly the


higher levels of trees (as searching occurs from the root downwards), this
simple concurrency control strategy may lead to high lock contention.
 Therefore, a more efficient locking protocol is required for indexes.

 For searches, obtain shared locks on nodes starting at the root and proceeding
downwards along the required path. Release the lock on a (parent) node once a
lock has been obtained on the child node.
 For insertions, a conservative approach would be to obtain exclusive locks on
all nodes as we descend the tree to the leaf node to be modified. This ensures
that a split in the leaf node can propagate all the way up the tree to the root.
However, if a child node is not full, the lock on the parent node can be released.
64
Latches
 DBMSs also support another type of lock called a latch.
 It is held for a much shorter duration than a normal lock.
 A latch can be used before a page is read from, or written to, disk to
ensure that the operation is atomic.
 For example, a latch would be obtained to write a page from the database
buffers to disk, the page would then be written to disk, and the latch
immediately unset.

65
Deadlock
 An impasse that may result when two (or more) transactions are each waiting
for locks to be released that are held by the other.
 Once deadlock occurs, the applications involved cannot resolve the problem.
Instead, the DBMS has to recognize that deadlock exists and break the
deadlock in some way.
 Unfortunately, there is only one way to break deadlock: abort one or more of
the transactions.
 The figure on the next slide shows two transactions, T17 and T18, that are deadlocked
because each is waiting for the other to release a lock on an item it holds. At
time t2, transaction T17 requests and obtains an exclusive lock on item balx,
and at time t3 transaction T18 obtains an exclusive lock on item baly. Then at
t6, T17 requests an exclusive lock on item baly. Since T18 holds a lock on
baly, transaction T17 waits. Meanwhile, at time t7, T18 requests a lock on item
balx, which is held by transaction T17. Neither transaction can continue
because each is waiting for a lock it cannot obtain until the other completes.
66
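A DBMS can recognise this situation by looking for a cycle in a wait-for graph. The sketch below is an illustration under stated assumptions: the graph is a dictionary mapping each waiting transaction to the set of transactions it waits for, and `find_deadlock` is a hypothetical helper name.

```python
def find_deadlock(waits_for):
    """Return a list of transactions forming a cycle, or None if there is none."""

    def visit(node, path):
        if node in path:
            return path[path.index(node):]       # came back to a visited node
        for nxt in sorted(waits_for.get(node, ())):
            cycle = visit(nxt, path + [node])
            if cycle:
                return cycle
        return None

    for start in sorted(waits_for):
        cycle = visit(start, [])
        if cycle:
            return cycle
    return None


# T17 waits for T18 (lock on baly); T18 waits for T17 (lock on balx).
deadlocked = {"T17": {"T18"}, "T18": {"T17"}}
# After T18 is aborted, its locks are released and T17 waits for nobody.
after_abort = {"T17": set()}
```

For `deadlocked` the search reports the cycle T17, T18; once one of the two is aborted the graph has no cycle and the remaining transaction can proceed.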
Example
 In Figure below we may decide to abort transaction T18. Once this is
complete, the locks held by transaction T18 are released and T17 is able to
continue again. Deadlock should be transparent to the user, so the DBMS
should automatically restart the aborted transaction(s).

67
cont’d…
 There are three general techniques for handling deadlock: timeouts,
deadlock prevention, and deadlock detection and recovery.
 With timeouts, the transaction that has requested a lock waits for at most a
specified period of time.
 Using deadlock prevention, the DBMS looks ahead to determine if a
transaction would cause deadlock, and never allows deadlock to occur.
 Using deadlock detection and recovery, the DBMS allows deadlock to
occur but recognizes occurrences of deadlock and breaks them.
 Since it is more difficult to prevent deadlock than to use timeouts, or
to test for deadlock and break it when it occurs, systems generally
avoid the deadlock prevention method.

68
