
Transaction Processing concepts,

Concurrency Control and Recovery

PART 1

1
Lecture Objectives:
• Discuss transaction processing concepts
• Describe desirable properties of transactions
• Achieving the desired properties:
• Concurrency Control
• Recovery

2
What is a Transaction
• A transaction is the basic logical unit of execution in an
information system.
• A transaction is a sequence of operations that must be
executed as a whole, taking a consistent (and correct)
database state into another consistent (and correct) database
state.
• A collection of actions that make consistent transformations
of system states while preserving system consistency.
• An indivisible unit of processing against the database

3
Transaction concepts (2)
Database in a consistent state  --[ Transfer £500 ]-->  Database in a consistent state

                          Before      After
Account A  Fred Bloggs    £1000       £500
Account B  Sue Smith      £0          £500

begin Transaction -> execution of Transaction -> end Transaction

The database may be temporarily in an inconsistent state during execution.

4
Transaction concepts (3)
• Single-user VS multi-user systems
• A DBMS is single-user if at most one user can use the
system at a time
• A DBMS is multi-user if many users can use the system
concurrently - the most common occurrence
• Problem
How to make the simultaneous interactions of multiple
users with the database safe, consistent, correct, and
efficient?

5
Transaction concepts (4)
• Concurrency in computing systems
  • Single-processor computer system
    • Multiprogramming
    • Interleaved execution
    • Pseudo-parallel processing
  • Multi-processor computer system
    • Parallel processing

6
Transaction concepts (5)

[Figure: transactions A and B plotted against time (t1 to t2).
Interleaved processing (single processor): A and B take turns on CPU1.
Parallel processing (two or more processors): A runs on CPU1 while B runs on CPU2.]

7
Transaction concepts (6)
• A transaction T is a logical unit of
database processing that includes one or
more database access operations and can
be either:
• Embedded within an application
program
• Specified interactively (e.g., via SQL)

8
Transaction concepts (7)
• Transactions have boundaries:

• Begin/end transaction

• Types of transactions
• Read transaction
• Write transaction

• Read-set of T: all data items that transaction T reads

• Write-set of T: all data items that transaction T writes


9
A Transaction: An Informal Example

• Transfer 400,000 from checking account to savings account
• For a user it is one activity
• To the database:
• Read balance of checking account: read(X)
• Read balance of savings account: read(Y)
• Subtract 400,000 from X
• Add 400,000 to Y
• Write new value of X back to disk
• Write new value of Y back to disk

10
General Idea on Database Read and Write
Operations
• A database is represented as a collection of named data items
• Read-item (X)
1. Find the address of the disk block that contains item X
2. Copy the disk block into a buffer in main memory
3. Copy the item X from the buffer to the program variable named X
• Write-item (X)
1. Find the address of the disk block that contains item X.
2. Copy that disk block into a buffer in main memory
3. Copy item X from the program variable named X into its correct
location in the buffer.
4. Store the updated block from the buffer back to disk (either
immediately or at some later point in time).
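
The two procedures above can be sketched in Python as follows. This is only a toy model, not a real buffer manager: `disk`, `buffer`, `BLOCK_OF`, `fetch_block` and the explicit `flush` step are hypothetical names introduced for illustration, and the starting balances are arbitrary.

```python
# Toy model of read_item / write_item. All names are illustrative assumptions.
disk = {0: {"X": 1000, "Y": 0}}      # block address -> block contents
BLOCK_OF = {"X": 0, "Y": 0}          # item name -> address of its disk block
buffer = {}                          # main-memory copies of disk blocks

def fetch_block(item):
    """Steps 1-2: find the block address and copy the block into the buffer."""
    addr = BLOCK_OF[item]
    if addr not in buffer:
        buffer[addr] = dict(disk[addr])
    return addr

def read_item(item):
    """Step 3 of read: copy the item from the buffer to a program variable."""
    addr = fetch_block(item)
    return buffer[addr][item]

def write_item(item, value):
    """Step 3 of write: copy the program variable into the buffered block."""
    addr = fetch_block(item)
    buffer[addr][item] = value

def flush(addr):
    """Step 4 of write: store the updated block back to disk, possibly later."""
    disk[addr] = dict(buffer[addr])

x = read_item("X")
write_item("X", x - 400)
value_on_disk_before_flush = disk[0]["X"]   # disk still holds the old value
flush(BLOCK_OF["X"])
value_on_disk_after_flush = disk[0]["X"]
```

The gap between `write_item` and `flush` is the window in which a crash can lose an update that the program believes it has already written.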

11
A Transaction: A Formal Example
T1

t0  BEGIN
      read_item(X);
      read_item(Y);
      X := X - 400000;
      Y := Y + 400000;
tk    write_item(X);
      write_item(Y);
    END
12
Transaction States: A state transition diagram

13
Transaction States
• BEGIN_TRANSACTION: marks start of transaction
• READ or WRITE: two possible operations on the data
• END_TRANSACTION: marks the end of the read or
write operations; start checking whether everything
went according to plan
• COMMIT_TRANSACTION: signals successful end of
transaction; changes can be “committed” to DB
• Partially committed: the state after the last operation has
executed but before the commit has been confirmed
• ROLLBACK (or ABORT): signals unsuccessful end of
transaction, changes applied to DB must be undone
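
The states and operations listed above can be read as a small state machine. The sketch below is one possible encoding; the state names ("active", "failed", "terminated") and the event names follow the usual textbook diagram and are assumptions here, since the slide's figure is not reproduced in the text.

```python
# Allowed transitions between transaction states.
# State and event names are assumptions based on the usual textbook diagram.
TRANSITIONS = {
    ("active", "read_write"): "active",
    ("active", "end_transaction"): "partially committed",
    ("active", "abort"): "failed",
    ("partially committed", "commit"): "committed",
    ("partially committed", "abort"): "failed",
    ("committed", "terminate"): "terminated",
    ("failed", "terminate"): "terminated",
}

def run(events, state="active"):
    """Follow a list of events; BEGIN_TRANSACTION is assumed to give 'active'."""
    for event in events:
        key = (state, event)
        if key not in TRANSITIONS:
            raise ValueError(f"event {event!r} is not allowed in state {state!r}")
        state = TRANSITIONS[key]
    return state

committed_path = run(["read_write", "read_write", "end_transaction", "commit"])
aborted_path = run(["read_write", "abort"])
```

A transaction cannot jump straight from "active" to "committed": `run(["commit"])` raises a `ValueError` in this sketch.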

14
A Sample SQL Transaction
EXEC SQL WHENEVER SQLERROR GOTO UNDO;
EXEC SQL SET TRANSACTION
    READ WRITE,
    DIAGNOSTICS SIZE 5,
    ISOLATION LEVEL SERIALIZABLE;
EXEC SQL INSERT INTO
    EMPLOYEE (FNAME, LNAME, SSN, DNO, SALARY)
    VALUES ('Ali', 'Al-Fares', '991004321', 2, 35000);
EXEC SQL UPDATE EMPLOYEE
    SET SALARY = SALARY * 1.1 WHERE DNO = 2;
EXEC SQL COMMIT;
GOTO END_T;
UNDO: EXEC SQL ROLLBACK;
END_T: ……;
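
The same commit-or-rollback pattern can be tried out with Python's built-in `sqlite3` module. This is a rough analogue, not a translation of the embedded SQL: the table layout, the existing employee row, the helper `hire_and_raise` and the salary cap enforced by a `CHECK` constraint are all assumptions added so that the second call has something to fail on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema; the CHECK constraint exists only to provoke an error.
conn.execute(
    "CREATE TABLE employee (fname TEXT, lname TEXT, ssn TEXT PRIMARY KEY, "
    "dno INTEGER, salary REAL CHECK (salary <= 40000))"
)
conn.execute("INSERT INTO employee VALUES ('Sue', 'Smith', '123456789', 2, 30000)")
conn.commit()

def hire_and_raise(conn, ssn):
    """Insert an employee and raise department 2 salaries as one transaction."""
    try:
        conn.execute(
            "INSERT INTO employee VALUES ('Ali', 'Al-Fares', ?, 2, 35000)", (ssn,)
        )
        conn.execute("UPDATE employee SET salary = salary * 1.1 WHERE dno = 2")
        conn.commit()        # corresponds to EXEC SQL COMMIT
        return True
    except sqlite3.Error:
        conn.rollback()      # corresponds to UNDO: EXEC SQL ROLLBACK
        return False

first = hire_and_raise(conn, "991004321")   # both statements succeed
# Second call: the INSERT works, but the raise pushes a salary above the cap,
# so the UPDATE fails and the rollback also removes the row just inserted.
second = hire_and_raise(conn, "991004322")
rows = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]
```

After the failed second call the table still holds two rows, which is the all-or-nothing behaviour the `UNDO` label is there to guarantee.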

15
Desirable Properties of Transactions
• ACID properties
1. Atomicity
A transaction is an atomic unit of processing; it is either
performed in its entirety or not performed at all.
2. Consistency preservation
A transaction is consistency preserving if its complete execution
takes the database from one consistent state to another
3. Isolation
The execution of a transaction should not be interfered with by any
other transactions executing concurrently
4. Durability
The changes applied to the database by a committed transaction must
persist in the database. These changes must not be lost because of any
failure

16
Achieving the ACID Properties
• Assertion: If all transactions achieve the ACID
properties, then the database is assured to always be in a
consistent and correct state.
• The Concurrency Control component of the DBMS is
used to ensure that the properties of Consistency and
Isolation are achieved.
• Atomicity and Durability are ensured by the
Recovery component of the DBMS.
17
What Can Go Wrong?
• System may crash before data is written back to disk
= Problem of atomicity
• Some transaction is modifying shared data while
another transaction is ongoing (or vice versa)
= Problem of isolation and consistency
• System may not be able to obtain one or more of the
data items
= Problems of durability
• System may not be able to write one or more of the
data items
= Problems of atomicity
18
Other Problems
• System failures may occur
• Types of failures:
• Transaction or system error
• Local errors
• Concurrency control enforcement
• Disk failure
• Physical failures

19
Why Do We Interleave Transactions?

T1                    T2
read_item(X);
X := X - N;
write_item(X);
   (could be a long wait)
read_item(Y);
Y := Y + N;
write_item(Y);
                      read_item(X);
                      X := X + M;
                      write_item(X);

20
Concurrent Executions
• Serial execution is by far the simplest method to execute
transactions
• No extra work ensuring consistency
• Inefficient!
• Reasons for concurrency:
• Increased throughput
• Reduced average response time
• However, we need correct concurrent execution

21
Concurrency Control
• Why is concurrency control needed?

• Consider the following problems:


1. The lost update problem
2. The temporary update (dirty read) / uncommitted data
dependency problem
3. Incorrect summary problem/ incorrect analysis
4. Phantom record problem
5. Unrepeatable read problem

22
Lost Update

T1               T2

Read(X)
X = X - 5
                 Read(X)
                 X = X + 5
Write(X)                          This update is lost
                 Write(X)         Only this update succeeds
COMMIT
                 COMMIT
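
To see the lost update in numbers, the schedule can be replayed step by step. The sketch below assumes a starting value of X = 100 and that T1's Write(X) happens before T2's; `db`, `local`, `read` and `write` are illustrative names only.

```python
# Replay the lost-update schedule on a shared item X (starting value assumed).
db = {"X": 100}
local = {"T1": {}, "T2": {}}     # each transaction's private program variables

def read(txn, item):
    local[txn][item] = db[item]

def write(txn, item):
    db[item] = local[txn][item]

read("T1", "X")
local["T1"]["X"] -= 5
read("T2", "X")                  # T2 reads X before T1 has written it
local["T2"]["X"] += 5
write("T1", "X")                 # X becomes 95
write("T2", "X")                 # X becomes 105, overwriting T1's update

final_x = db["X"]
serial_x = 100 - 5 + 5           # what either serial order would produce
```

Either serial order leaves X at 100, while the interleaving leaves 105: T1's subtraction has vanished.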
23
Uncommitted dependency /Temporary
Update / (“dirty read”)

T1               T2

Read(X)
X = X - 5
Write(X)
                 Read(X)          This reads the value of X
                                  which it should not have seen
                 X = X + 5
                 Write(X)
ROLLBACK
                 COMMIT
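
A numeric replay of this schedule, assuming X starts at 100, shows what goes wrong. The sketch does not model how the DBMS undoes T1; it only records the value T2 read and the value T2 went on to write. All variable names are illustrative.

```python
committed = {"X": 100}      # last committed state (starting value assumed)
db = dict(committed)        # current contents, including uncommitted writes

# T1: Read(X), X = X - 5, Write(X), not yet committed
t1_x = db["X"]
t1_x -= 5
db["X"] = t1_x

# T2: Read(X) sees T1's uncommitted value, which is the dirty read
t2_x = db["X"]
dirty_value = t2_x
t2_x += 5
db["X"] = t2_x

# T1 now rolls back, so its subtraction officially never happened,
# yet the value T2 commits was computed from it.
t2_committed_value = db["X"]
value_without_dirty_read = committed["X"] + 5
```

T2 commits 100, although a T2 that had seen only committed data would have produced 105.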

24
Inconsistent analysis/Summary

T1               T2

Read(X)
X = X - 5
Write(X)
                 Read(X)
                 Read(Y)          Summing up data while it is
                 Sum = X+Y        being updated
Read(Y)
Y = Y + 5
Write(Y)
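
With assumed starting values X = 100 and Y = 50, the schedule can be replayed to show that T2's sum matches neither the state before T1 nor the state after it. Variable names are illustrative.

```python
db = {"X": 100, "Y": 50}            # starting values are assumptions
total_before = db["X"] + db["Y"]    # 150

# T1 moves 5 from X to Y; first half
x = db["X"]
x -= 5
db["X"] = x

# T2 sums X and Y in the middle of T1's transfer
t2_sum = db["X"] + db["Y"]

# T1, second half
y = db["Y"]
y += 5
db["Y"] = y

total_after = db["X"] + db["Y"]     # 150 again
```

The true total is 150 both before and after T1, but T2 reports 145.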
25
Phantom record problem

T1 T2

Read(X)
Read(X)

Delete(X)
Read(X)

26
Unrepeatable read problem

T1 T2

Read(X)
Read(X)

Write(X)
Read(X)

27
Schedules
• A schedule is a time-ordered sequence of actions taken by
one or more transactions.
Formal Definition:
• Schedule S of n transactions T1, T2, … , Tn is an ordering of
the operations of various transactions subject to the
constraint that, for each transaction Ti that participates in S,
the operations of Ti in S must appear in the same order in
which they occur in Ti.
• i.e., the READ and WRITE actions and their order are important
when considering concurrency.

28
Example of a Schedule

• Transaction T1: r1(X); w1(X); r1(Y); w1(Y); c1


• Transaction T2: r2(X); w2(X); c2

• A schedule, S:
r1(X); r2(X); w1(X); r1(Y); w2(X); w1(Y); c1; c2
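
The ordering constraint from the formal definition can be checked mechanically. The following sketch parses the shorthand used above and verifies that each transaction's operations appear in S in their original order; `parse` and `is_valid_schedule` are hypothetical helper names, and the `bad` schedule is an invented counterexample.

```python
import re

def parse(text):
    """Turn 'r1(X); w1(X); c1' into tuples (action, transaction, item)."""
    ops = []
    for token in text.split(";"):
        token = token.strip()
        if not token:
            continue
        m = re.fullmatch(r"([rwc])(\d+)(?:\((\w+)\))?", token)
        if m is None:
            raise ValueError(f"cannot parse {token!r}")
        ops.append((m.group(1), int(m.group(2)), m.group(3)))
    return ops

def is_valid_schedule(schedule, transactions):
    """True if the operations of every Ti appear in S in the same order as in Ti."""
    for tid, ops in transactions.items():
        projected = [op for op in schedule if op[1] == tid]
        if projected != ops:
            return False
    return True

T1 = parse("r1(X); w1(X); r1(Y); w1(Y); c1")
T2 = parse("r2(X); w2(X); c2")
S = parse("r1(X); r2(X); w1(X); r1(Y); w2(X); w1(Y); c1; c2")
# Invented counterexample: T1's write of X is moved in front of its read.
bad = parse("w1(X); r1(X); r2(X); r1(Y); w2(X); w1(Y); c1; c2")

s_is_valid = is_valid_schedule(S, {1: T1, 2: T2})
bad_is_valid = is_valid_schedule(bad, {1: T1, 2: T2})
```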

29
Serial Schedule
• A serial schedule is a schedule where operations of
each transaction are executed consecutively without
any interleaved operations from other transactions
(each transaction commits before the next one is
allowed to begin).
• Non-serial schedule – operations from a set of
concurrent transactions are interleaved
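
One way to test whether a schedule is serial is to check that no transaction resumes after another one has started. A minimal sketch, assuming operations are stored as `(action, transaction, item)` tuples (a representation chosen here, not taken from the slides):

```python
def is_serial(schedule):
    """True if no transaction's operations are interleaved with another's."""
    finished = set()
    current = None
    for _action, tid, _item in schedule:
        if tid != current:
            if tid in finished:
                return False          # tid resumes after another transaction ran
            if current is not None:
                finished.add(current)
            current = tid
    return True

serial = [("r", 1, "X"), ("w", 1, "X"), ("c", 1, None),
          ("r", 2, "X"), ("w", 2, "X"), ("c", 2, None)]
interleaved = [("r", 1, "X"), ("r", 2, "X"), ("w", 1, "X"),
               ("w", 2, "X"), ("c", 1, None), ("c", 2, None)]

serial_result = is_serial(serial)
interleaved_result = is_serial(interleaved)
```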

30
Example
• If T2 is scheduled to run after T1 is completely
finished, then the schedule is serial.
• If T2 is scheduled to run when T1 is only halfway
finished, then the schedule is not serial.
• NB: A serial schedule will preserve the consistency
of the database state.

31
Serial Schedule

• We consider transactions to be
Independent (Isolated), so a serial
schedule is always correct
• Based on C and I properties in ACID
• Furthermore, it does not matter which
transaction is executed first, as long as
every transaction is executed in its
entirety, from beginning to end
32
Serializable Schedule

Definition

A schedule S of n transactions is serializable if it is
equivalent to some serial schedule of the same n
transactions

33
Serializability
• Assumption: Every serial schedule is correct
• Goal: Find non-serial schedules which are also correct
• A schedule S of n transactions is serializable if it is
equivalent to some serial schedule of the same n
transactions
• Serializability of a schedule means equivalence to a serial
schedule (i.e., sequential with no transaction overlap in
time) with the same transactions.
• Q: When are two schedules equivalent?
• Option 1: They lead to same result (result equivalent)
• Option 2: The order of any two conflicting operations is the
same (conflict equivalent)
34
Types of Equivalence

• Conflict equivalence
• Result equivalence
• View equivalence

35
Example: Serial and Serializable

Interleaved Schedule        Serial Schedule

T1 Read(X)                  T2 Read(X)
T2 Read(X)                  T2 Read(Y)
T2 Read(Y)                  T2 Read(Z)
T1 Read(Z)                  T1 Read(X)
T1 Read(Y)                  T1 Read(Z)
T2 Read(Z)                  T1 Read(Y)

This schedule is serializable: both transactions only read, so the
interleaving cannot change what either of them sees.

36
Conflict Equivalence
• Conflicting operations are used to define how
schedules are equivalent
• Two operations conflict if they satisfy ALL three
conditions:
1. they belong to different transactions AND
2. they access the same item AND
3. at least one is a write_item() operation
• Example:
• S: r1(X); r2(X); w1(X); r1(Y); w2(X); w1(Y);
• Conflicts in S: r1(X) and w2(X); r2(X) and w1(X); w1(X) and w2(X)
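
The three conditions translate directly into a pairwise test. The sketch below lists every conflicting pair in the example schedule S; the tuple layout `(action, transaction, item)` and the function name `conflicts` are choices made for this illustration.

```python
from itertools import combinations

def conflicts(schedule):
    """Return all conflicting pairs (earlier_op, later_op) in a schedule."""
    pairs = []
    for a, b in combinations(schedule, 2):   # a always precedes b
        different_txn = a[1] != b[1]         # condition 1
        same_item = a[2] == b[2]             # condition 2
        one_is_write = "w" in (a[0], b[0])   # condition 3
        if different_txn and same_item and one_is_write:
            pairs.append((a, b))
    return pairs

# S: r1(X); r2(X); w1(X); r1(Y); w2(X); w1(Y)
S = [("r", 1, "X"), ("r", 2, "X"), ("w", 1, "X"),
     ("r", 1, "Y"), ("w", 2, "X"), ("w", 1, "Y")]
found = conflicts(S)
```

For S this yields three pairs: r1(X) with w2(X), r2(X) with w1(X), and w1(X) with w2(X). The two reads of X do not conflict, and nothing on Y conflicts because only T1 touches Y.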
37
Conflict Equivalent Schedules

• Two schedules are conflict equivalent if the order of
conflicting operations is the same in both schedules

38
Example 1: Conflict Serialisable Schedule

Interleaved Schedule        Serial Schedule

T1 Read(X)                  T1 Read(X)
T1 Write(X)                 T1 Write(X)
T2 Read(X)                  T1 Read(Y)
T2 Write(X)                 T1 Write(Y)
T1 Read(Y)                  T2 Read(X)
T1 Write(Y)                 T2 Write(X)
T2 Read(Y)                  T2 Read(Y)
T2 Write(Y)                 T2 Write(Y)

This schedule is serialisable, even though T1 and T2 read
and write the same resources X and Y (they have a conflict):
the conflicting operations occur in the same order as in the serial schedule.
39
Example 2: Conflict Equivalence

40
Conflict Equivalence
Serial Schedule S1

T1                    T2
read_item(A);
write_item(A);
read_item(B);
write_item(B);
                      read_item(A);
                      write_item(A);
                      read_item(B);
                      write_item(B);

Order matters between operations of T1 and T2 on the same item
(A with A, B with B). Order doesn't matter between operations of
T1 and T2 on different items (A with B).
41
Conflict Equivalence
Schedule S1’
T1 T2
read_item(A);
read_item(B);
same order as in S1
write_item(A);
read_item(A);
write_item(A);
same order as in S1
write_item(B);
read_item(B);
write_item(B);
S1 and S1’ are conflict equivalent
(S1’ produces the same result as S1)

42
Conflict Equivalence
Schedule S1’’
T1 T2
read_item(A);
write_item(A);
read_item(A); different order than in S1
write_item(A);
read_item(B);
write_item(B);
read_item(B); different order than in S1
write_item(B);
Schedule S1’’ is not conflict equivalent to S1
(produces a different result than S1)
43
Conflict Serialisability
• Conflict serialisable schedules are the main focus
of concurrency control
• They allow for interleaving and at the same time they
are guaranteed to behave as a serial schedule
• Important questions:
• How to determine whether a schedule is conflict serialisable
• How to construct conflict serialisable schedules

A conflict serializable schedule is conflict equivalent to a serial schedule

44
Test for Conflict Serializability
• Construct a directed graph, the precedence graph, G = (V, E)
• V: set of all transactions participating in the schedule
• E: set of edges Ti → Tj for which one of the following holds:
• Ti executes a write_item(X) before Tj executes read_item(X)
• Ti executes a read_item(X) before Tj executes write_item(X)
• Ti executes a write_item(X) before Tj executes write_item(X)
• An edge Ti → Tj means that in any serial schedule equivalent
to S, Ti must come before Tj
• If G has a cycle, then S is not conflict serializable
• If not, use a topological sort to obtain an equivalent serial schedule
(a linear order consistent with the precedence order of the graph)
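
The test above can be sketched in a few lines of Python. The code builds the edge set from conflicting operation pairs and then applies Kahn's algorithm as the topological sort; this is one common choice, not something the slides prescribe. The function names and the `(action, transaction, item)` tuple layout are assumptions of this sketch.

```python
def precedence_graph(schedule):
    """Nodes are transactions; edge Ti -> Tj for each conflict where Ti's op is first."""
    nodes = {tid for _a, tid, _x in schedule}
    edges = set()
    for i, (a1, t1, x1) in enumerate(schedule):
        for a2, t2, x2 in schedule[i + 1:]:
            if t1 != t2 and x1 == x2 and "w" in (a1, a2):
                edges.add((t1, t2))
    return nodes, edges

def equivalent_serial_order(schedule):
    """Topologically sort the graph; return None if it contains a cycle."""
    nodes, edges = precedence_graph(schedule)
    indegree = {n: 0 for n in nodes}
    for _src, dst in edges:
        indegree[dst] += 1
    ready = sorted(n for n in nodes if indegree[n] == 0)
    order = []
    while ready:
        n = ready.pop(0)
        order.append(n)
        for src, dst in sorted(edges):
            if src == n:
                indegree[dst] -= 1
                if indegree[dst] == 0:
                    ready.append(dst)
    # Nodes on a cycle never reach indegree 0, so they are missing from order.
    return order if len(order) == len(nodes) else None

# r1(X); r2(X); w1(X); r1(Y); w2(X); w1(Y): edges 1 -> 2 and 2 -> 1 form a cycle
cyclic = [("r", 1, "X"), ("r", 2, "X"), ("w", 1, "X"),
          ("r", 1, "Y"), ("w", 2, "X"), ("w", 1, "Y")]
# r1(X); w1(X); r2(X); w2(X); r1(Y); w1(Y); r2(Y); w2(Y): only edge 1 -> 2
acyclic = [("r", 1, "X"), ("w", 1, "X"), ("r", 2, "X"), ("w", 2, "X"),
           ("r", 1, "Y"), ("w", 1, "Y"), ("r", 2, "Y"), ("w", 2, "Y")]

cyclic_order = equivalent_serial_order(cyclic)
acyclic_order = equivalent_serial_order(acyclic)
```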

45
Sample Schedule S
T1                 T2                 T3
                                      read_item(Y);
                                      read_item(Z);
read_item(X);
write_item(X);
                                      write_item(Y);
                                      write_item(Z);
                   read_item(Z);
read_item(Y);
write_item(Y);
                   read_item(Y);
                   write_item(Y);
                   read_item(X);
                   write_item(X);
46
Precedence Graph for S

Edges (labelled with the items that cause them):
T1 → T2 (X, Y)
T3 → T1 (Y)
T3 → T2 (Y, Z)
No cycles ⇒ S is serializable
Equivalent serial schedule (precedence order): T3 → T1 → T2
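
The graph for S can be recomputed with Python's standard `graphlib` module. The sketch assumes the operations of S belong to the transactions as listed in the code below, which is the assignment consistent with the edge labels of the graph; the variable names are illustrative.

```python
from graphlib import CycleError, TopologicalSorter

# Schedule S in execution order as (transaction, action, item).
# The ownership of each operation is an assumption (see lead-in).
S = [
    ("T3", "r", "Y"), ("T3", "r", "Z"),
    ("T1", "r", "X"), ("T1", "w", "X"),
    ("T3", "w", "Y"), ("T3", "w", "Z"),
    ("T2", "r", "Z"),
    ("T1", "r", "Y"), ("T1", "w", "Y"),
    ("T2", "r", "Y"), ("T2", "w", "Y"), ("T2", "r", "X"), ("T2", "w", "X"),
]

# Edge (Ti, Tj) -> set of items whose conflicts create that edge.
labels = {}
for i, (t1, a1, x1) in enumerate(S):
    for t2, a2, x2 in S[i + 1:]:
        if t1 != t2 and x1 == x2 and "w" in (a1, a2):
            labels.setdefault((t1, t2), set()).add(x1)

# TopologicalSorter expects a mapping from each node to its predecessors.
predecessors = {t: set() for t, _a, _x in S}
for src, dst in labels:
    predecessors[dst].add(src)

try:
    serial_order = list(TopologicalSorter(predecessors).static_order())
except CycleError:
    serial_order = None       # a cycle would mean S is not conflict serializable
```

Under that assumption the computed edge labels are X,Y for T1 → T2, Y for T3 → T1 and Y,Z for T3 → T2, and the only topological order is T3, T1, T2.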
47
Precedence Graph Example
• The lost update schedule:

T1               T2
Read(X)
X = X - 5
                 Read(X)
                 X = X + 5
Write(X)
                 Write(X)
COMMIT
                 COMMIT

• Its precedence graph has two edges:
• T1 → T2: T1 Write(X) followed by T2 Write(X)
• T2 → T1: T2 Read(X) followed by T1 Write(X)
• The edges form a cycle, so the schedule is not conflict serialisable

48
Precedence Graph Example
• No cycles: conflict
serialisable schedule

T1               T2
Read(X)
Write(X)
                 Read(X)
                 Write(X)

Precedence graph: a single edge T1 → T2, because
• T1 reads X before T2 writes X and
• T1 writes X before T2 reads X and
• T1 writes X before T2 writes X

49
Result Equivalent Schedules
• Two schedules are result equivalent if they produce
the same final state of the database
• Problem: May produce same result by accident!

S1 S2
read_item(X); read_item(X);
X:=X+10; X:=X*1.1;
write_item(X); write_item(X);
Schedules S1 and S2 are result equivalent for X=100 but not in general
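
The coincidence is easy to verify numerically. The sketch uses exact fractions rather than floats so that 1.1 is represented without rounding error; the function names `s1` and `s2` and the second test value of 200 are choices made for this illustration.

```python
from fractions import Fraction

def s1(x):
    """Schedule S1: X := X + 10."""
    return x + 10

def s2(x):
    """Schedule S2: X := X * 1.1, using an exact 11/10."""
    return x * Fraction(11, 10)

same_at_100 = s1(Fraction(100)) == s2(Fraction(100))   # 110 vs 110
same_at_200 = s1(Fraction(200)) == s2(Fraction(200))   # 210 vs 220
```

The two schedules agree only when X + 10 = 1.1 X, that is, only for X = 100.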

50
View Serializability

• View equivalence: A less restrictive definition of equivalence
of schedules

• View serializability
• Definition of serializability based on view equivalence.
A schedule is view serializable if it is view equivalent to a
serial schedule.

51
View Equivalence

Two schedules are said to be view equivalent if the following three conditions
hold:
1. The same set of transactions participates in S and S’, and S and S’ include
the same operations of those transactions.
2. For any operation Ri(X) of Ti in S, if the value of X read by the operation
has been written by an operation Wj(X) of Tj (or if it is the original value
of X before the schedule started), the same condition must hold for the
value of X read by operation Ri(X) of Ti in S’.
3. If the operation Wk(Y) of Tk is the last operation to write item Y in S, then
Wk(Y) of Tk must also be the last operation to write item Y in S’.
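
The three conditions can be checked by recording, for each schedule, which write every read sees and which transaction writes each item last. The sketch below is a simplified check that assumes no transaction reads the same item twice; `view` and `view_equivalent` are hypothetical helper names, and the example schedules are invented.

```python
def view(schedule):
    """Return (reads_from, final_writes) for a list of (action, txn, item)."""
    last_writer = {}       # item -> transaction that wrote it most recently
    reads_from = []        # (reader, item, writer); writer None = original value
    for action, txn, item in schedule:
        if action == "r":
            reads_from.append((txn, item, last_writer.get(item)))
        elif action == "w":
            last_writer[item] = txn
    return sorted(reads_from, key=repr), last_writer

def view_equivalent(s, s_prime):
    same_ops = sorted(s) == sorted(s_prime)          # condition 1
    return same_ops and view(s) == view(s_prime)     # conditions 2 and 3

# r1(X); w1(X); r2(X); w2(X) versus the serial order T2, T1
a = [("r", 1, "X"), ("w", 1, "X"), ("r", 2, "X"), ("w", 2, "X")]
b = [("r", 2, "X"), ("w", 2, "X"), ("r", 1, "X"), ("w", 1, "X")]
# r1(X); r2(Y); w1(X); w2(Y) versus the serial order T1, T2
c = [("r", 1, "X"), ("r", 2, "Y"), ("w", 1, "X"), ("w", 2, "Y")]
d = [("r", 1, "X"), ("w", 1, "X"), ("r", 2, "Y"), ("w", 2, "Y")]

a_equiv_b = view_equivalent(a, b)
c_equiv_d = view_equivalent(c, d)
```

In `a`, T2 reads the value written by T1, while in `b` it reads the original value, so condition 2 fails. Schedules `c` and `d` touch different items and are view equivalent.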

52
View Equivalence

The premise behind view equivalence:


• As long as each read operation of a transaction reads the
result of the same write operation in both schedules, the write
operations of each transaction must produce the same results.
• “The view”: the read operations are said to see the same
view in both schedules.

53
View and Conflict equivalence

Relationship between view and conflict equivalence:


• The two are the same under the constrained write assumption, which
assumes that if T writes X, it is constrained by the value of X it
read; i.e., new X = f(old X)
• Conflict serializability is stricter than view serializability.
With unconstrained write (or blind write), a schedule that is
view serializable is not necessarily conflict serializable.
• Any conflict serializable schedule is also view serializable, but
not vice versa.

54
View and Conflict equivalence

Relationship between view and conflict equivalence (cont):


Consider the following schedule of three transactions
T1: r1(X), w1(X); T2: w2(X); and T3: w3(X):
Schedule Sa: r1(X); w2(X); w1(X); w3(X); c1; c2; c3;

In Sa, the operations w2(X) and w3(X) are blind writes, since T2 and T3 do
not read the value of X.

Sa is view serializable, since it is view equivalent to the serial schedule T1,
T2, T3. However, Sa is not conflict serializable, since it is not conflict
equivalent to any serial schedule.
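
Both claims about Sa can be confirmed by brute force over the six possible serial orders. This is a sketch for small examples only, since trying every permutation does not scale; commits are omitted because they do not affect either test, and the helper names are illustrative.

```python
from itertools import permutations

# Sa: r1(X); w2(X); w1(X); w3(X), stored as (action, transaction, item)
Sa = [("r", 1, "X"), ("w", 2, "X"), ("w", 1, "X"), ("w", 3, "X")]
txns = sorted({t for _a, t, _x in Sa})

def view(schedule):
    """Which write each read sees, and which transaction writes each item last."""
    last_writer, reads_from = {}, []
    for action, txn, item in schedule:
        if action == "r":
            reads_from.append((txn, item, last_writer.get(item)))
        else:
            last_writer[item] = txn
    return sorted(reads_from, key=repr), last_writer

def serial(order):
    """The serial schedule that runs the transactions of Sa in the given order."""
    return [op for t in order for op in Sa if op[1] == t]

# Serial orders that are view equivalent to Sa
view_orders = [o for o in permutations(txns) if view(serial(o)) == view(Sa)]

# Precedence edges from conflicting pairs, then serial orders respecting all of them
conflict_edges = {
    (t1, t2)
    for i, (a1, t1, x1) in enumerate(Sa)
    for a2, t2, x2 in Sa[i + 1:]
    if t1 != t2 and x1 == x2 and "w" in (a1, a2)
}
conflict_orders = [
    o for o in permutations(txns)
    if all(o.index(a) < o.index(b) for a, b in conflict_edges)
]
```

Only the order T1, T2, T3 is view equivalent to Sa, and no order satisfies the conflict edges, because r1(X) before w2(X) requires T1 first while w2(X) before w1(X) requires T2 first.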

55
Characterizing Schedules based on
Serializability
• Being serializable is not the same as being serial
• Being serializable implies that the schedule is a
correct schedule.
• It will leave the database in a consistent state.
• The interleaving is appropriate and will result in a
state as if the transactions were serially executed, yet
will achieve efficiency due to concurrent execution.

56
Characterizing Schedules based on
Serializability

• Serializability is hard to check.


• Interleaving of operations occurs in an operating system
through some scheduler
• Difficult to determine beforehand how the operations in a
schedule will be interleaved.

57
Characterizing Schedules based on
Serializability
Practical approach:
• Come up with methods (protocols) to ensure serializability.
• It’s not possible to determine when a schedule begins and
when it ends. Hence, we reduce the problem of checking the
whole schedule to checking only a committed projection of the
schedule (i.e., operations from only the committed
transactions).
• Current approach used in most DBMSs (covered next Lecture):
• Concurrency control techniques
• Examples
• Two-phase locking technique
• Timestamp ordering technique
58
Summary

• Introduction to transaction processing


• Transaction and system concepts
• Desirable properties of transactions
• Serializability of schedules

59
