
INFO 404

ADVANCED
DATABASE
SYSTEMS
TRANSACTION MANAGEMENT
By the end of this presentation you will be able
to do the following:
-Define what a transaction is
-Describe the states that a transaction goes
through from the time that it is initiated up to
the time that it is completed
-Describe the properties of a transaction
TRANSACTION DEFINED AS:
• A collection of operations that must be
processed as one unit of work
• A logical unit of work on a database
– An entire program
– A portion of a program
– A single command
• The entire series of steps necessary to
accomplish a logical unit of work
• A transaction is seen by the DBMS as a series,
or list, of actions
– Includes read and write of objects
– We’ll write this as R(o) and W(o) (sometimes RT(o)
and WT(o) )
• For example
T1: [R(a), W(a), R(c), W(c)]
T2: [R(b), W(b)]
• In addition, a transaction should specify as its
final action either commit, or abort
example
START TRANSACTION
Display greeting
Get account number, pin, type, and amount
SELECT account number, type and balance
If balance is sufficient then
UPDATE account by posting debit
UPDATE account by posting credit
INSERT history record
Display final message and issue cash
Else
Write error message
End If
On Error: ROLLBACK
COMMIT
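The ATM pseudocode above can be sketched as a real transaction, for example with Python's sqlite3 module. The table layout and account names here are illustrative assumptions, not part of the original example:

```python
import sqlite3

# Toy schema: one table of accounts (names are illustrative assumptions)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (number TEXT PRIMARY KEY, balance REAL)")
conn.execute("INSERT INTO account VALUES ('checking', 1000), ('cash_drawer', 0)")
conn.commit()

def withdraw(conn, acct, amount):
    """Debit acct and credit the cash drawer as one unit of work."""
    try:
        cur = conn.execute("SELECT balance FROM account WHERE number = ?", (acct,))
        balance = cur.fetchone()[0]
        if balance < amount:
            raise ValueError("insufficient funds")
        conn.execute("UPDATE account SET balance = balance - ? WHERE number = ?",
                     (amount, acct))
        conn.execute("UPDATE account SET balance = balance + ? "
                     "WHERE number = 'cash_drawer'", (amount,))
        conn.commit()      # COMMIT: both UPDATEs take effect as one unit
    except Exception:
        conn.rollback()    # On Error: ROLLBACK - neither UPDATE takes effect
        raise
```

Either both UPDATEs are applied (COMMIT) or neither is (ROLLBACK), which is the point of grouping the steps into one transaction.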

Successful transactions change the database from one CONSISTENT STATE to another
• A transaction is the basic logical unit of execution in an
information system.
• A transaction is a sequence of operations that must be
executed as a whole, taking a consistent (& correct)
database state into another consistent (& correct) database
state;
• A collection of actions that make consistent
transformations of system states while preserving system
consistency
begin Transaction                       end Transaction
database in a consistent state          database in a consistent state
Account A Tendai Mudavanhu $1000        Account A Tendai Mudavanhu $500
Account B Sarah Munaye $0               Account B Sarah Munaye $500

                  (execution of Transaction: Transfer $500)

The database may be temporarily in an inconsistent state during execution.
STATE TRANSITION DIAGRAM FOR A TRANSACTION

Active state: the state a transaction is in as long as it is in operation,
issuing read and write operations against the database.

A transaction reaches its commit point when all operations accessing the
database are completed and the result has been recorded in the log. It then
writes a [commit, transaction-id] record.

BEGIN TRANSACTION --> active
active --READ, WRITE--> active
active --END TRANSACTION--> partially committed
partially committed --COMMIT--> committed
active --ROLLBACK--> failed
partially committed --ROLLBACK--> failed
committed / failed --> terminated
If a system failure occurs, the system searches the log and rolls back the transactions that
have written into the log a
[start_transaction, transaction-id]
[write_item, transaction-id, X, old_value, new_value]
but have not recorded into the log a [commit, transaction-id]
CONDITIONS FOR A TRANSACTION TO BE IN A
PARTIALLY COMMITTED STATE OR FAILED STATE
PARTIALLY COMMITTED STATE: A transaction enters
the partially committed state after its final operation
has been executed. If it has violated serializability
or integrity constraints, or a secondary storage failure
prevents its updates from being recorded, the transaction
has to be aborted.
If the transaction has been successful, its updates
can be safely recorded and the transaction can go
to the committed state.
FAILED STATE: It occurs if the transaction cannot
be committed, or the transaction is aborted
while in the active state, perhaps because the user
aborted the transaction or because the
concurrency control protocol aborted the
transaction to ensure serializability
PROPERTIES OF A TRANSACTION
ATOMICITY PROPERTY
-It means that a transaction cannot be subdivided; either all the work in the
transaction is completed or nothing is done
• E.g. transaction to transfer $50 from account A to account B:
1. read_from_account(A)
2. A := A – 50
3. write_to_account(A)
4. read_from_account(B)
5. B := B + 50
6. write_to_account(B)
• Atomicity requirement
– if the transaction fails after step 3 and before step 6, money will be “lost” leading to an
inconsistent database state
• Failure could be due to software or hardware
– the system should ensure that updates of a partially executed transaction are not
reflected in the database
• A DBMS ensures atomicity by undoing the
actions of partial transactions (referred to as
backward recovery or rollback)
• Backward recovery is used to reverse the
changes made by transactions that have
aborted or terminated abnormally
• The component of the DBMS responsible for
this is called the recovery manager
CONSISTENCY PROPERTY
- It means that if applicable constraints are true
before the transaction starts, the constraints will
be true after the transaction terminates
- Each user is responsible for ensuring that their
transaction (if executed by itself) would leave the
database in a consistent state, e.g. if a user’s
account is balanced before a transaction then the
account is balanced after the transaction.
Otherwise the transaction is rejected and no
changes take effect
ISOLATION PROPERTY
Transactions are isolated, or protected, from the effects of
other scheduled transactions
It also means that changes to the database are not revealed to
users until the transaction is committed
-Transactions should not interfere with each other (these can
be effected through locking which will be discussed later)
- If transactions are isolated from one another it means that
concurrent transactions (i.e. several transactions being
executed at a given point in time) all affect the database as
if they were presented to the DBMS in a serial fashion (i.e.
they look like they are executing one after the other)
• Isolation requirement — if between steps 3 and
6, another transaction T2 is allowed to access the
partially updated database, it will see an
inconsistent database (the sum A + B will be less
than it should be).
T1                      T2
1. read(A)
2. A := A – 50
3. write(A)
                        read(A), read(B), print(A+B)
4. read(B)
5. B := B + 50
6. write(B)
• Isolation can be ensured trivially by running
transactions serially
– that is, one after the other.
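As a toy illustration (plain Python, not a real DBMS), the interleaving above can be replayed to show the inconsistent sum T2 observes; the starting balances of 100 are illustrative assumptions:

```python
# Simulated database: A + B should always equal 200 (the invariant)
db = {"A": 100, "B": 100}

# T1 steps 1-3: debit A by 50
a = db["A"]
a = a - 50
db["A"] = a

# T2 runs here, between T1's two writes, and sees a partially updated state
observed_sum = db["A"] + db["B"]   # 150, not the invariant 200

# T1 steps 4-6: credit B by 50
b = db["B"]
b = b + 50
db["B"] = b

print(observed_sum)          # 150: T2 saw an inconsistent state
print(db["A"] + db["B"])     # 200: the invariant holds once T1 finishes
```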
DURABILITY PROPERTY
-Any changes resulting from a transaction are
permanent
-If a transaction changes the database and is
committed, the changes must never be lost
because of subsequent failure
• DBMS uses the log to ensure durability
• If the system crashed before the changes
made by a completed transaction are written
to disk, the log is used to remember and
restore these changes when the system is
restarted
• Again, this is handled by the recovery manager
DBMSs provide two services to ensure that
transactions obey the ACID properties:
Recovery transparency- means that the DBMS
automatically restores a database to a
consistent state after a failure e.g. If a
communication failure occurs during an ATM
transaction, the effects of the transaction are
automatically removed from the database; on
the other hand if the DBMS crashes 3 seconds
after an ATM transaction completes, the
details of the transaction remain permanent
Concurrency Transparency- means that users
perceive the database as a single user system
even though there may be many simultaneous
users e.g. Even though many users may try to
reserve for a flight using a reservation
transaction, the DBMS ensures that users do
not overwrite each other’s work
NON-ACID TRANSACTIONS
• There are application domains where ACID properties are not
necessarily desired or, most likely, not always possible.
• This is the case of so-called long-duration transactions
– Suppose that a transaction takes a lot of time
– In this case it is unlikely that isolation can/should be guaranteed
• E.g. Consider a transaction of booking a hotel and a flight
• Without Isolation, Atomicity may be compromised
• Consistency and Durability should be preserved

• Usual solution for long-duration transaction is to define


compensation action – what to do if later the transaction fails
• In (centralized) databases, long-duration transactions are usually not
considered.
• But these are more and more important, especially in the context of
the Web.
CONCURRENCY CONTROL
By the end of this presentation you will be
able to:
-Define what concurrency control is
-Identify and explain the problems associated
with lack of concurrency control
-Describe the concurrency control algorithms
CONCURRENCY CONTROL (CC)
– Most DBMS are multi-user systems and run with
the expectation that users will be able to share
data contained in the database
– If users are only reading data no data integrity
problems will be encountered because no changes
will be made in the database. However if one or
more users are updating data then potential
problems with maintaining data integrity arise
– The concurrent execution of many different
transactions submitted by various users must be
organised so that the transactions do not
interfere with one another in a way that
produces incorrect results.
– The concurrent execution of transactions must be
such that each transaction appears to execute in
isolation thus enforcing serializability (where it
seems as if the transactions are being performed
one after the other) in a multi-user database
environment
– The objective of CC is to maximise transaction
throughput while preventing interference among
multiple users
CONCURRENCY CONTROL DEFINED
Concurrency control is defined as:
i) The coordination of simultaneous transaction execution
in a multiprocessing database system
ii) The process of managing simultaneous operations
against a database so that data integrity is maintained
and the operations do not interfere with each other in a
multi-user environment
Lack of Concurrency Control can create data integrity and
consistency problems which include:
– Lost Update Problem
– Uncommitted Data Problem
– Inconsistent Retrievals Problem
Lost Update Problem :
A lost update problem occurs when two transactions
that access the same database items have their
operations interleaved in a way that makes the value
of some database item incorrect.
In other words, if transactions T1 and T2 both read a
record and then update it, the effects of the first
update will be overwritten by the second update
Example:
Consider the situation given in the figure, which shows operations
performed by two transactions, Transaction-A and Transaction-B,
with respect to time.
Transaction A      TIME    Transaction B
-----------------   t0     -----------------
Read X              t1     -----------------
-----------------   t2     Read X
Update X            t3     -----------------
-----------------   t4     Update X
-----------------   t5     -----------------

At time t1, Transaction-A reads the value of X.
At time t2, Transaction-B reads the value of X.
At time t3, Transaction-A writes the value of X on the basis of the value seen at time t1.
At time t4, Transaction-B writes the value of X on the basis of the value seen at time t2.
So the update made by Transaction-A is lost at time t4, because Transaction-B overwrites it without looking
at its current value.
This type of problem is referred to as the Lost Update Problem, as the update made by one transaction
is lost here.
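The timeline above can be replayed as a toy Python sketch; the starting value 100 and the increments +10 and +20 are illustrative assumptions:

```python
db = {"X": 100}

x_seen_by_a = db["X"]          # t1: Transaction-A reads X (sees 100)
x_seen_by_b = db["X"]          # t2: Transaction-B reads X (also sees 100)
db["X"] = x_seen_by_a + 10     # t3: A writes based on its stale read -> 110
db["X"] = x_seen_by_b + 20     # t4: B overwrites A's update -> 120

print(db["X"])   # 120, not the 130 both updates together should produce
```

Transaction-A's +10 has vanished: B never saw it, so its write silently discarded it.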
Dirty Read/Uncommitted Data Problem
A dirty read problem occurs when one
transaction updates a database item and then
the transaction fails for some reason. The
updated database item is accessed by another
transaction before it is changed back to the
original value. In other words, a transaction T1
updates a record, which is read by the
transaction T2.
Then T1 aborts, and T2 now has values which
have never formed part of the stable database.
Dirty Read/Uncommitted Data Problem
Transaction A      TIME    Transaction B
-----------------   t0     -----------------
-----------------   t1     Update X
Read X              t2     -----------------
-----------------   t3     Rollback
-----------------   t4     -----------------

At time t1, Transaction-B writes the value of X.
At time t2, Transaction-A reads the value of X.
At time t3, Transaction-B rolls back, changing the value of X back to its value prior to t1.
So Transaction-A now has a value which has never become part of the stable database.
This type of problem is referred to as the Dirty Read Problem, as one transaction reads a
dirty value which has not been committed.
Inconsistent Retrievals Problem
Unrepeatable read (or inconsistent retrievals)
occurs when a transaction calculates some
summary (aggregate) function over a set of data
while other transactions are updating the data.
The problem is that the transaction might read
some data before they are changed and other data
after they are changed, thereby yielding
inconsistent results.
In an unrepeatable read, the transaction T1 reads
a record and then does some other processing
during which the transaction T2 updates the
record. Now, if T1 rereads the record, the new
value will be inconsistent with the previous value.
Inconsistent Retrievals Problem
Example:
Consider the situation given in the figure, which shows two transactions operating on three accounts:

Account-1        Account-2        Account-3
Balance = 200    Balance = 250    Balance = 150

Transaction-A                   Time    Transaction-B
-------------                    t0     -------------
Read Balance of Account-1        t1     -------------
Sum <-- 200
Read Balance of Account-2        t2     -------------
Sum <-- Sum + 250 = 450
-------------                    t3     Read Balance of Account-3
-------------                    t4     Update Balance of Account-3
                                        150 --> 150 - 50 --> 100
-------------                    t5     Read Balance of Account-1
-------------                    t6     Update Balance of Account-1
                                        200 --> 200 + 50 --> 250
Read Balance of Account-3        t7     COMMIT
Sum <-- Sum + 100 = 550          t8     -------------

Transaction-A is summing all balances, while
Transaction-B is transferring an amount of 50 from
Account-3 to Account-1.
Here the result produced by Transaction-A is 550, which
is incorrect. If this result is written to the database,
the database will be in an inconsistent state, as the actual sum is
600.
Transaction-A has seen an inconsistent state of the
database, and has performed an inconsistent analysis.
CONCURRENCY CONTROL ALGORITHMS
There are 2 basic approaches to concurrency control:
i) A pessimistic approach (which always assumes the
worst); it assumes that every time transactions
execute, the chances of them accessing and changing the
same data items are very high, so it automatically
implements controls to prevent this from
happening.
ii) An optimistic approach, which assumes that the chances
of transactions colliding (accessing and changing
the same data items) are slim, so it deals with
conflicts only if and when they occur.
PESSIMISTIC APPROACH
LOCKING- it involves the use of locks on a
database item to prevent other transactions
from performing conflicting actions on the
same item
- With locking, any data that is retrieved by a
user for updating must be locked (denied to
other users) until the update is completed or
aborted, so other users must wait if they are
trying to obtain a conflicting lock on the same
part of the database
EXAMPLE OF LOCKING MECHANISM
TIME   JACK’S TRANSACTION   JILL’S TRANSACTION   BALANCE
T1     BEGIN TRANSACTION
T2     READ BAL             BEGIN TRANSACTION    1000
T3     LOCK BAL             :                    1000
T4     BAL = BAL - 50       :                    950
T5     WRITE BAL            :                    950
T6     COMMIT               :                    950
T7     UNLOCK BAL           :                    950
T8                          LOCK BAL             950
T9                          BAL = BAL + 100      1050
T10                         WRITE BAL            1050
T11                         COMMIT               1050
T12                         UNLOCK BAL           1050
Concurrency is affected by two things i.e. the type
of the lock and the granularity of the lock
There are two types of locks:
i) Shared (S) or read lock- which must be obtained before
reading a database item; the technique allows other
transactions to read but not update a record or other
resource (it prevents another user from placing an
exclusive lock on that record)
ii) Exclusive (X) or write lock- which must be obtained
before writing a database item; the technique prevents
other transactions from reading and therefore updating a
record until it is unlocked
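A minimal sketch of the two lock types in Python, assuming the compatibility rules above (many concurrent readers, or exactly one writer); the class and method names are hypothetical:

```python
import threading

class SXLock:
    """Toy shared/exclusive lock: many S holders, or one X holder."""
    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0       # number of shared (S) locks held
        self._writer = False    # True while an exclusive (X) lock is held

    def lock_shared(self):
        with self._cond:
            while self._writer:             # S is incompatible with X
                self._cond.wait()
            self._readers += 1

    def unlock_shared(self):
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()     # a waiting writer may proceed

    def lock_exclusive(self):
        with self._cond:
            while self._writer or self._readers:  # X conflicts with S and X
                self._cond.wait()
            self._writer = True

    def unlock_exclusive(self):
        with self._cond:
            self._writer = False
            self._cond.notify_all()
```

Two readers can hold the lock together, but a writer must wait until every shared holder releases, which is exactly why Jack's write-lock request in the next example is made to wait.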
EXAMPLE OF HOW SHARED AND
EXCLUSIVE LOCKS ARE USED
TIME   JACK’S TRANSACTION    JILL’S TRANSACTION
T1     BEGIN TRANSACTION
T2     PLACE READ LOCK       BEGIN TRANSACTION
T3     READ BAL              PLACE READ LOCK
T4     BAL = BAL - 50        READ BAL
T5     REQUEST WRITE LOCK    BAL = BAL + 100
T6     :                     REQUEST WRITE LOCK
T7     :                     :
T8     (WAIT)                :
T9     :                     (WAIT)

Jack initiates his transaction; the program places a read lock on his record
since he is reading the record to check the account balance. When he requests
a withdrawal, the program attempts to place a write lock. However, Jill has
already placed a shared lock on the record, so his request is denied: if a
record has a read lock held by another user, a write lock cannot be obtained.
LOCKING LEVEL (GRANULARITY)
This is the extent of the database resource that
is included with each lock; locks are
implemented at the following levels:
Database- entire database is locked and becomes unavailable to other users
Table- the entire table containing a requested record is locked; this level is appropriate
mainly for bulk updates such as giving all employees a 5% raise
Block or page- the physical storage block (or page) containing a requested record is
locked
Record- only the requested record or row is locked; all other records are available to
other users
Field- only the particular field or column in a requested record is locked; this level is
appropriate when the updates affect only one or two fields in a record, e.g. you want
to update the price column
PROBLEMS ASSOCIATED WITH
LOCKING

As much as locking ensures that transactions do
not collide, it can result in deadlocks: an
impasse that results when two or more
transactions have each locked a common resource
and each waits for the other to unlock that
resource
Deadlock may be handled using either deadlock
prevention or deadlock resolution
DEADLOCK PREVENTION
Deadlock prevention requires user programs to lock
all the records they will require at the beginning of a
transaction
Two Phase locking protocol is used in this case.
It has 2 phases:
i)Growing phase- where all the necessary locks
are acquired
ii)Shrinking phase- where locks are released
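The two phases can be sketched with a toy lock manager in Python (class and method names are hypothetical, and only exclusive locks are modelled):

```python
class TwoPhaseTxn:
    """Toy 2PL transaction: acquire all locks before releasing any."""
    def __init__(self, name, lock_table):
        self.name = name
        self.lock_table = lock_table   # shared dict: item -> holding txn
        self.held = []
        self.shrinking = False         # once True, no new locks allowed

    def acquire(self, item):           # growing phase
        if self.shrinking:
            raise RuntimeError("2PL violation: lock requested after a release")
        holder = self.lock_table.get(item)
        if holder is not None and holder != self.name:
            raise RuntimeError(f"{item} is already locked by {holder}")
        self.lock_table[item] = self.name
        self.held.append(item)

    def release_all(self):             # shrinking phase
        self.shrinking = True
        for item in self.held:
            del self.lock_table[item]
        self.held.clear()
```

The `shrinking` flag enforces the protocol's one rule: after the first release, no further lock may ever be acquired by the same transaction.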
DEADLOCK RESOLUTION
It allows deadlocks to occur but builds mechanisms into
the DBMS for detecting and breaking the deadlocks
The DBMS maintains a matrix of resource usage which, at
a given instant, indicates what users (subjects) are using
what resources (objects); by scanning the matrix the
computer can detect deadlocks as they occur; it then
resolves a deadlock by “backing up” one of the
deadlocked transactions; any changes made by that
transaction up to the time of the deadlock are removed and
the transaction is restarted when the required resources
become available
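Deadlock detection over such a usage matrix is commonly phrased as finding a cycle in a wait-for graph; a sketch, assuming each transaction maps to the set of transactions it is waiting for:

```python
def has_deadlock(waits_for):
    """waits_for: dict mapping a txn to the set of txns it waits for.
    Returns True if the wait-for graph contains a cycle (a deadlock)."""
    nodes = set(waits_for)
    for targets in waits_for.values():
        nodes |= set(targets)
    WHITE, GREY, BLACK = 0, 1, 2       # unvisited / in progress / done
    color = {t: WHITE for t in nodes}

    def visit(t):
        color[t] = GREY
        for u in waits_for.get(t, ()):
            if color[u] == GREY:       # back edge: cycle found
                return True
            if color[u] == WHITE and visit(u):
                return True
        color[t] = BLACK
        return False

    return any(color[t] == WHITE and visit(t) for t in nodes)
```

If a cycle is found, the DBMS picks one transaction on the cycle as the victim, rolls it back, and restarts it later.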
OPTIMISTIC APPROACH:
VERSIONING
• There is no form of locking but each transaction is
restricted to a view of the database as of the time
that the transaction started and when a transaction
modifies a record the DBMS creates a new record
version instead of overwriting the old record
• In the case of a conflict, since record versions are
time stamped, the earlier transaction is given priority
and its changes are kept; the other user’s transaction
is aborted or rolled back
EXAMPLE OF VERSIONING
TIME   JACK’S TRANSACTION   JILL’S TRANSACTION
T1     BEGIN TRANSACTION
T2     READ BAL             BEGIN TRANSACTION
T3                          READ BAL
T4     BAL = BAL - 50       BAL = BAL + 100
T5     WRITE BAL            WRITE BAL
T6     COMMIT               ROLL BACK
T7                          RESTART TRANSACTION
DATABASE RECOVERY
By the end of this presentation you will be able
to do the following:
-Explain what data recovery is and the reasons
for enforcing database recovery on a database
-Describe the database recovery techniques
that can be used to enforce database recovery
-Describe the ARIES algorithm for database
recovery
DATABASE RECOVERY
A technique used to restore a database to its last
known original state before the system failure
occurred.
To see where the problem has occurred, failure is
generalized into various categories, as follows –
Transaction failure:
• Logical errors: transaction cannot complete due
to some internal error condition
• System errors: the database system must
terminate an active transaction due to an error
condition (e.g., deadlock)
System crash: a power failure or other hardware or
software failure that causes the system to crash.
• Fail-stop assumption: non-volatile storage
contents are assumed to not be corrupted by the
system crash; database systems have numerous
integrity checks to prevent corruption of disk data
which include:
• Mirroring- aka data replication; it maintains two
or more copies of the same data in the storage
device, integrity checks can be made by
comparing the copies. An integrity violation in
one of the copies can be easily detected using
this method.
• RAID (Redundant Array of Independent Disks)
parity- RAID is a disk subsystem that increases
performance or provides fault tolerance or both.
In computers, parity (from the Latin paritas,
meaning equal or equivalent) is a technique that
checks whether data has been lost or written
over when it is moved from one place in storage
to another or when it is transmitted between
computers.
• Checksumming- a checksum is a single value
that is computed for a block of data (a pretty
simple idea); to determine whether the data is
corrupted (i.e. it has changed), a checksum is
computed on the data and compared with the
original stored checksum (assuming the original
stored checksum is correct); there are a number of
checksum algorithms used in this case (you
can research these)
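A minimal sketch of the idea, using Python's zlib.crc32 as one example checksum algorithm (the stored data is an illustrative assumption):

```python
import zlib

def store(block: bytes):
    """Store a block together with its CRC-32 checksum."""
    return block, zlib.crc32(block)

def is_corrupted(block: bytes, stored_checksum: int) -> bool:
    """Recompute the checksum and compare with the stored one."""
    return zlib.crc32(block) != stored_checksum

block, cksum = store(b"balance=1000")
print(is_corrupted(block, cksum))            # False: data matches its checksum
print(is_corrupted(b"balance=9000", cksum))  # True: data no longer matches
```

CRC-32 is only one choice; stronger algorithms (e.g. cryptographic hashes) trade speed for better detection guarantees.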
When a DBMS recovers from a crash, it should
maintain the following −
• It should check the states of all the transactions,
which were being executed.
• A transaction may be in the middle of some
operation; the DBMS must ensure the atomicity
of the transaction in this case.
• It should check whether the transaction can be
completed now or it needs to be rolled back.
• No transactions would be allowed to leave the
DBMS in an inconsistent state.
LOG BASED RECOVERY
• Log-based recovery works as follows:
• The log file is kept on stable storage media.
• When a transaction enters the system and starts
execution, it writes a record to the log:
<Tn, Start>
• When the transaction modifies an item X, it writes
a log record:
<Tn, X, V1, V2>
• When the transaction finishes, it logs:
<Tn, commit>
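The three record types can be sketched as a toy in-memory log; a real DBMS forces the log to stable storage, and the function names here are hypothetical:

```python
log = []   # toy stand-in for the stable-storage log file

def start(txn):
    log.append(("start", txn))                  # <Tn, Start>

def write_item(txn, item, old, new, db):
    log.append(("write", txn, item, old, new))  # <Tn, X, V1, V2>
    db[item] = new                              # then apply the update

def commit(txn):
    log.append(("commit", txn))                 # <Tn, commit>

db = {"X": 100}
start("T1")
write_item("T1", "X", db["X"], 150, db)
commit("T1")
print(log)   # [('start','T1'), ('write','T1','X',100,150), ('commit','T1')]
```

Recording both the old value (V1) and the new value (V2) is what makes both undo and redo possible later.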
DATABASE MODIFICATION
APPROACHES
Deferred database modification
• The deferred database modification scheme records
all modifications to the log, but defers all the writes
to after partial commit; redo only for recovery

HOW IS DATABASE RECOVERY ENFORCED WHEN THE
DEFERRED MODIFICATION TECHNIQUE HAS BEEN USED?
• During recovery after a crash, a transaction
needs to be redone if and only if both <Ti
start> and <Ti commit> are there in the log
• Redoing a transaction Ti (redo Ti) sets the
value of all data items updated by the
transaction to the new values.
Immediate database modification
• The immediate database modification scheme
allows database updates of an uncommitted
transaction to be made as the writes are issued;
since undoing may be needed, update logs must
have both old value and new value.
Recovery procedure has two operations instead
of one:
a) undo(Ti) restores the value of all data items
updated by Ti to their old values, going
backwards from the last log record for Ti
b) redo(Ti ) sets the value of all data items
updated by Ti to the new values, going
forward from the first log record for Ti
When recovering after failure:
• Transaction Ti needs to be undone if the log
contains the record <Ti start>, but does not
contain the record <Ti commit>.
• Transaction Ti needs to be redone if the log
contains both the record <Ti start> and the
record <Ti commit>.
N.B. Undo operations are performed first, then
redo operations
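Assuming log records shaped like ("start", T), ("write", T, item, old, new) and ("commit", T), the undo-then-redo procedure above can be sketched as:

```python
def recover(log, db):
    """Immediate-modification recovery: undo uncommitted txns, redo committed."""
    started = {r[1] for r in log if r[0] == "start"}
    committed = {r[1] for r in log if r[0] == "commit"}

    # Undo pass first: newest-to-oldest, restore old values of
    # transactions that started but never committed
    for r in reversed(log):
        if r[0] == "write" and r[1] in started - committed:
            _, txn, item, old, new = r
            db[item] = old

    # Redo pass second: oldest-to-newest, reapply new values of
    # committed transactions
    for r in log:
        if r[0] == "write" and r[1] in committed:
            _, txn, item, old, new = r
            db[item] = new
    return db

log = [("start", "T1"), ("write", "T1", "X", 100, 150), ("commit", "T1"),
       ("start", "T2"), ("write", "T2", "Y", 200, 250)]
db = {"X": 999, "Y": 250}   # crash left X stale and Y with T2's dirty write
recover(log, db)
print(db)   # {'X': 150, 'Y': 200}: T1 redone, T2 undone
```

T1 has both start and commit records, so it is redone; T2 has only a start record, so its write to Y is undone back to the old value.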
SHADOW PAGING
This is a method where all transactions are
executed against a shadow copy of the database
in primary memory. Once the transactions have
executed completely, the changes are applied to
the database. Hence, if there is any failure in
the middle of a transaction, it will not be
reflected in the database; the database is
updated only after the transaction is complete.
SHADOW PAGING TECHNIQUE
RECOVERY WITH CONCURRENT
TRANSACTIONS
ARIES RECOVERY ALGORITHM
• Algorithms for Recovery and Isolation Exploiting Semantics
• ARIES retraces all actions of the DB prior to the crash to
reconstruct the database state when the crash occurred.
• Unlike the recovery algorithm described earlier, ARIES
uses a log sequence number (LSN) to identify log records
• Stores LSNs in pages to identify what updates have already
been applied to a database page
• Each page contains a PageLSN which is the LSN of the last log
record whose effects are reflected on the page
• Special redo-only log record called compensation log record
(CLR) used to log actions taken during recovery that never
need to be undone
ARIES recovery involves three passes
• Analysis pass: Determines
– Which transactions to undo
– Which pages were dirty (disk version not up to date) at time of crash
– RedoLSN: LSN from which redo should start
• Redo pass:
• The REDO phase reapplies updates from the log to the database.
– Repeats history, redoing all actions from RedoLSN
• RecLSN and PageLSNs are used to avoid redoing actions already
reflected on page
• Undo pass:
– Rolls back all incomplete transactions
• Transactions whose abort was complete earlier are not undone
– Key idea: no need to undo these transactions: earlier undo actions
were logged, and are redone as required
Analysis, redo and undo passes
- Analysis determines where the redo pass should start (the RedoLSN).
- The redo pass then moves forward through the log from that point to the
end of the log.
- The undo pass has to go back to the start of the earliest incomplete
transaction.
(Figure: the three passes over the log, between the last checkpoint and the
end of the log.)
Remote Backup Systems
• Remote backup systems provide high availability by allowing transaction
processing to continue even if the primary site is destroyed.
• Detection of failure: Backup site must detect when primary site has failed
– to distinguish primary site failure from link failure maintain several
communication links between the primary and the remote backup.
– Heart-beat messages
• Transfer of control:
– To take over control, the backup site first performs recovery using its copy of the
database and all the log records it has received from the primary.
• Thus, completed transactions are redone and incomplete transactions
are rolled back.
– When the backup site takes over processing it becomes the new primary.
– To transfer control back to the old primary when it recovers, the old primary must
receive redo logs from the old backup and apply all updates locally.

Alternative to remote backup: distributed database


with replicated data
