
Module V: Data Recovery and Protection
Contents
• Recovery
• Concurrency control techniques
• Locking
• Deadlock
• Serializability
• Security
Database Recovery
1 Purpose of Database Recovery
– To bring the database into the last consistent state,
which existed prior to the failure.
– To preserve transaction properties (Atomicity,
Consistency, Isolation and Durability).
• Example:
– If the system crashes before a fund transfer transaction completes its execution, then either one or both accounts may hold an incorrect value. Thus, the database must be restored to the state before the transaction modified any of the accounts.
2 Types of Failure
– The database may become unavailable for
use due to
• Transaction failure: Transactions may fail
because of incorrect input, deadlock, incorrect
synchronization.
• System failure: System may fail because of
addressing error, application error, operating
system fault, RAM failure, etc.
• Media failure: Disk head crash, power disruption,
etc.
3 Transaction Log
– For recovery from any type of failure, the data value prior to modification (BFIM, BeFore IMage) and the new value after modification (AFIM, AFter IMage) are required.
– These values and other information are stored in a sequential file called the transaction log. A sample log is given below. Back P and Next P point to the previous and next log records of the same transaction.
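A minimal sketch of such a log, assuming a simple in-memory list of records. The names `LogRecord`, `TransactionLog`, `back_p` and `next_p` are hypothetical and only illustrate how the Back P and Next P pointers chain together the records of one transaction.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class LogRecord:
    txn: str                       # transaction identifier
    op: str                        # "begin", "write" or "commit"
    item: Optional[str] = None     # data item touched by a write
    bfim: Optional[int] = None     # value before modification
    afim: Optional[int] = None     # value after modification
    back_p: Optional[int] = None   # index of previous record of the same txn
    next_p: Optional[int] = None   # index of next record of the same txn


class TransactionLog:
    def __init__(self) -> None:
        self.records: List[LogRecord] = []
        self._last: Dict[str, int] = {}  # txn -> index of its latest record

    def append(self, txn, op, item=None, bfim=None, afim=None) -> int:
        idx = len(self.records)
        prev = self._last.get(txn)
        self.records.append(LogRecord(txn, op, item, bfim, afim, back_p=prev))
        if prev is not None:
            self.records[prev].next_p = idx
        self._last[txn] = idx
        return idx

    def records_of(self, txn) -> List[LogRecord]:
        """Follow Back P pointers from the latest record to the first."""
        out = []
        idx = self._last.get(txn)
        while idx is not None:
            out.append(self.records[idx])
            idx = self.records[idx].back_p
        return list(reversed(out))


log = TransactionLog()
log.append("T1", "begin")
log.append("T2", "begin")
log.append("T1", "write", "A", bfim=100, afim=50)
log.append("T2", "write", "B", bfim=20, afim=30)
log.append("T1", "commit")
```

Following the pointers lets a recovery routine visit only the records of one transaction instead of scanning the whole sequential file.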
4 Data Update
– Immediate Update: As soon as a data item is modified in
cache, the disk copy is updated.
– Deferred Update: All modified data items in the cache are written to disk either after a transaction ends its execution or after a fixed number of transactions have completed their execution.
– Shadow update: The modified version of a data item
does not overwrite its disk copy but is written at a separate
disk location.
– In-place update: The disk version of the data item is
overwritten by the cache version.
5 Data Caching
– Data items to be modified are first stored into
database cache by the Cache Manager (CM)
and after modification they are flushed
(written) to the disk.
– The flushing is controlled by the Modified and Pin-Unpin bits.
• Pin-Unpin: When set (pinned), instructs the operating system not to flush the data item.
• Modified: Indicates that the cached data item has been changed, that is, that it holds an AFIM.
6 Transaction Roll-back (Undo) and Roll-Forward (Redo)
– To maintain atomicity, a transaction’s
operations are redone or undone.
• Undo: Restore all BFIMs on to disk (Remove all
AFIMs).
• Redo: Restore all AFIMs on to disk.
– Database recovery is achieved either by
performing only Undos or only Redos or by a
combination of the two. These operations are
recorded in the log as they happen.
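The two operations can be sketched as follows, assuming the log is a list of dictionaries with hypothetical keys `txn`, `op`, `item`, `bfim` and `afim`, and the database disk is a plain dictionary. Undo walks the log backwards restoring BFIMs; redo walks it forwards reapplying AFIMs.

```python
def undo(disk, log, txn):
    """Restore the BFIMs of `txn`, newest write first."""
    for rec in reversed(log):
        if rec["txn"] == txn and rec["op"] == "write":
            disk[rec["item"]] = rec["bfim"]


def redo(disk, log, txn):
    """Reapply the AFIMs of `txn`, oldest write first."""
    for rec in log:
        if rec["txn"] == txn and rec["op"] == "write":
            disk[rec["item"]] = rec["afim"]


# T1 wrote A twice; only the first write reached the disk before the crash.
log = [
    {"txn": "T1", "op": "write", "item": "A", "bfim": 100, "afim": 50},
    {"txn": "T1", "op": "write", "item": "A", "bfim": 50, "afim": 40},
]
disk_after_crash = {"A": 50}

rolled_back = dict(disk_after_crash)
undo(rolled_back, log, "T1")      # A returns to its value before T1

rolled_forward = dict(disk_after_crash)
redo(rolled_forward, log, "T1")   # A takes T1's final value
```

Scanning backwards for undo matters when a transaction writes the same item more than once: the oldest BFIM must be the last one restored.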
Write-Ahead Logging
• When in-place update (immediate or deferred) is used, a log is necessary for recovery and it must be available to the recovery manager. This is achieved by the Write-Ahead Logging (WAL) protocol. WAL states that:
– For Undo: Before a data item’s AFIM is flushed to the
database disk (overwriting the BFIM) its BFIM must be
written to the log and the log must be saved on a stable
store (log disk).
– For Redo: Before a transaction executes its commit
operation, all its AFIMs must be written to the log and the
log must be saved on a stable store.
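A minimal sketch of the two WAL rules, assuming an in-memory cache, a log tail that has not yet been written, and a list standing in for the stable log disk. The class `WalBuffer` and its method names are hypothetical, not part of any real DBMS API.

```python
class WalBuffer:
    def __init__(self, disk):
        self.disk = disk        # item -> value on the database disk
        self.cache = {}         # item -> AFIM not yet flushed
        self.log_tail = []      # log records still in memory
        self.stable_log = []    # log records saved on the log disk

    def write(self, txn, item, value):
        bfim = self.cache.get(item, self.disk.get(item))
        self.log_tail.append(("write", txn, item, bfim, value))
        self.cache[item] = value

    def force_log(self):
        self.stable_log.extend(self.log_tail)
        self.log_tail.clear()

    def flush(self, item):
        # Undo rule: the BFIM must be on the stable log before it is overwritten.
        if any(r[0] == "write" and r[2] == item for r in self.log_tail):
            self.force_log()
        self.disk[item] = self.cache.pop(item)

    def commit(self, txn):
        # Redo rule: all AFIMs must be on the stable log before commit completes.
        self.log_tail.append(("commit", txn))
        self.force_log()


buf = WalBuffer({"A": 100})
buf.write("T1", "A", 50)
buf.flush("A")      # forces the log record for A to stable storage first
buf.commit("T1")
```

In this sketch `flush` never overwrites a disk value whose BFIM exists only in memory, and `commit` returns only after every AFIM of the transaction is in `stable_log`.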
7 Checkpointing
– From time to time (randomly or under some criteria) the database flushes its buffer to the database disk to minimize the task of recovery. The following steps define a checkpoint operation:
1. Suspend execution of transactions temporarily.
2. Force write modified buffer data to disk.
3. Write a [checkpoint] record to the log, save the log to disk.
4. Resume normal transaction execution.
– During recovery, redo or undo is required only for transactions appearing after the [checkpoint] record.
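The effect of a checkpoint on recovery can be sketched as follows. This is an illustration under two assumptions that go beyond the text above: the log is a list of tuples, and the checkpoint record carries the list of transactions that were active when it was taken. The function name `recovery_candidates` is hypothetical.

```python
def recovery_candidates(log):
    """Return (committed, active) transaction sets seen since the last checkpoint.

    Committed transactions are redo candidates and active ones are undo
    candidates; whether either step is needed depends on the recovery scheme.
    """
    last_cp = max((i for i, r in enumerate(log) if r[0] == "checkpoint"),
                  default=None)
    if last_cp is None:
        active, tail = set(), log
    else:
        active, tail = set(log[last_cp][1]), log[last_cp + 1:]

    committed = set()
    for rec in tail:
        if rec[0] == "begin":
            active.add(rec[1])
        elif rec[0] == "commit":
            active.discard(rec[1])
            committed.add(rec[1])
    return committed, active


log = [
    ("begin", "T1"), ("commit", "T1"),
    ("begin", "T2"),
    ("checkpoint", ["T2"]),
    ("begin", "T3"), ("commit", "T2"), ("begin", "T4"),
]
committed, active = recovery_candidates(log)
```

T1 committed before the checkpoint, so its updates were already forced to disk and it does not appear in either set.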
Steal/No-Steal and Force/No-Force
– Possible ways of flushing the database cache to the database disk:
1. Steal: Cache can be flushed before the transaction commits.
2. No-Steal: Cache cannot be flushed before the transaction commits.
3. Force: All cache pages updated by a transaction are immediately flushed (forced) to disk when the transaction commits.
4. No-Force: Flushing of updated cache pages may be deferred until after the transaction commits.
– These give rise to four different ways for handling
recovery:
• Steal/No-Force (Undo/Redo)
• Steal/Force (Undo/No-redo)
• No-Steal/No-Force (Redo/No-undo)
• No-Steal/Force (No-undo/No-redo)
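The four combinations follow from two independent observations: steal means uncommitted AFIMs may be on disk, so undo is needed; no-force means committed AFIMs may be missing from disk, so redo is needed. A small sketch, with the hypothetical function name `recovery_scheme`:

```python
def recovery_scheme(steal: bool, force: bool) -> str:
    """Map a buffer policy to the recovery actions it requires."""
    undo = "Undo" if steal else "No-undo"      # steal: dirty pages may hit disk early
    redo = "No-redo" if force else "Redo"      # force: committed pages are on disk
    return f"{undo}/{redo}"


for steal in (True, False):
    for force in (False, True):
        policy = f"{'Steal' if steal else 'No-Steal'}/{'Force' if force else 'No-Force'}"
        print(policy, "->", recovery_scheme(steal, force))
```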
8 Recovery Scheme
• Deferred Update (No Undo/Redo)
– The data update goes as follows:
– A set of transactions record their updates in the log.
– At the commit point, under the WAL scheme, these updates are saved on the database disk.
– After reboot from a failure, the log is used to redo all the transactions affected by this failure. No undo is required because no AFIM is flushed to the disk before a transaction commits.
• Deferred Update in a single-user system
There is no concurrent data sharing in a single-user system. The data update goes as follows:
– A set of transactions record their updates in the log.
– At the commit point, under the WAL scheme, these updates are saved on the database disk.
• After reboot from a failure, the log is used to redo all the transactions affected by this failure. No undo is required because no AFIM is flushed to the disk before a transaction commits.
Deferred Update with concurrent users
• Two tables are required for implementing this protocol:
– Active table: All active transactions are entered in this
table.
– Commit table: Transactions to be committed are entered
in this table.

• During recovery, all transactions in the commit table are redone and all transactions in the active table are ignored, since none of their AFIMs reached the database. A commit table transaction may be redone twice, but this does not create any inconsistency because redo is “idempotent”: one redo of an AFIM is equivalent to multiple redos of the same AFIM.
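A minimal sketch of this recovery step, assuming the log is a list of `(transaction, item, AFIM)` write records and the commit table is a set of transaction names. The function name `recover_deferred` is hypothetical.

```python
def recover_deferred(disk, log, commit_table):
    """Redo the writes of committed transactions; ignore all others."""
    for txn, item, afim in log:
        if txn in commit_table:
            disk[item] = afim
    return disk


log = [("T1", "A", 50), ("T2", "B", 7), ("T1", "B", 30)]
commit_table = {"T1"}     # T2 was still in the active table at the crash
disk = {"A": 100, "B": 20}

once = recover_deferred(dict(disk), log, commit_table)
twice = recover_deferred(dict(once), log, commit_table)
```

Because each redo simply overwrites an item with its logged AFIM, `twice` equals `once`, which is the idempotence property described above.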
Recovery Techniques Based on Immediate Update
• Undo/No-redo Algorithm
– In this algorithm AFIMs of a transaction are
flushed to the database disk under WAL
before it commits.
– For this reason the recovery manager undoes
all transactions during recovery.
– No transaction is redone.
– It is possible that a transaction has completed execution and is ready to commit, but this transaction is also undone.
Recovery Techniques Based on Immediate Update (Cont.)
– Undo/Redo Algorithm (Single-user
environment)
• Recovery schemes of this category apply undo
and also redo for recovery.
• In a single-user environment no concurrency
control is required but a log is maintained under
WAL.
• Note that at any time there will be one transaction
in the system and it will be either in the commit
table or in the active table.
• The recovery manager performs:
– Undo of a transaction if it is in the active table.
– Redo of a transaction if it is in the commit table.
Concurrency Control
• Concurrency control is the process of managing simultaneous operations on the database. It is required because actions from different users or applications that take place on the database must not interfere with one another.
Interleaving operations can leave the database in an inconsistent state. Three potential problems that successful concurrency control should address are:
- Lost update problem
- Uncommitted dependency problem
- Inconsistent analysis problem
The two main concurrency control techniques are locking and timestamping.
• A transaction is a very small unit of a program and it may contain several low-level tasks. A transaction in a database system must maintain Atomicity, Consistency, Isolation, and Durability − commonly known as ACID properties − in order to ensure accuracy, completeness, and data integrity.
• Atomicity − This property states that a transaction must be treated as an atomic unit, that is,
either all of its operations are executed or none.
• Consistency − The database must remain in a consistent state after any transaction. No
transaction should have any adverse effect on the data residing in the database.
• Durability − The database should be durable enough to hold all its latest updates even if the
system fails or restarts.
• Isolation − In a database system where more than one transaction is executed simultaneously and in parallel, the property of isolation states that each transaction is carried out as if it were the only transaction in the system. No transaction will affect the existence of any other transaction.
• Serializability
• When multiple transactions are executed by the operating system in a multiprogramming environment, the instructions of one transaction may be interleaved with those of another transaction.
• Schedule − A chronological execution sequence of the operations of one or more transactions is called a schedule. A schedule can have many transactions in it, each comprising a number of instructions/tasks.
• Serial Schedule − It is a schedule in which transactions are aligned
in such a way that one transaction is executed first. When the first
transaction completes its cycle, then the next transaction is
executed. Transactions are ordered one after the other. This type of
schedule is called a serial schedule, as transactions are executed in
a serial manner.
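The definition of a serial schedule can be checked mechanically. The sketch below assumes a schedule is a list of `(transaction, operation)` pairs in chronological order; the function name `is_serial` is hypothetical.

```python
def is_serial(schedule):
    """True if no transaction resumes after another transaction has started."""
    finished = set()
    current = None
    for txn, _operation in schedule:
        if txn != current:
            if txn in finished:
                return False          # txn was interrupted and is now resuming
            if current is not None:
                finished.add(current)
            current = txn
    return True


serial = [("T1", "read(A)"), ("T1", "write(A)"), ("T2", "read(A)")]
interleaved = [("T1", "read(A)"), ("T2", "read(A)"), ("T1", "write(A)")]
```

Here `is_serial(serial)` holds, while `interleaved` fails the check because T1 continues after T2 has begun.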
Lock-Based Protocols
• A lock is a mechanism to control concurrent access to a
data item
• Data items can be locked in two modes :
1. exclusive (X) mode. Data item can be both read as well as written. X-lock is requested using lock-X instruction.
2. shared (S) mode. Data item can only be read. S-lock is requested using lock-S instruction.
• Lock requests are made to the concurrency-control manager. A transaction can proceed only after its request is granted.
Lock-Based Protocols (Cont.)
• Lock-compatibility matrix
• A transaction may be granted a lock on an item if the requested lock is compatible with locks already held on the item by other transactions.
• Any number of transactions can hold shared locks on an item,
– but if any transaction holds an exclusive lock on the item, no other transaction may hold any lock on the item.
• If a lock cannot be granted, the requesting transaction is made to wait until all incompatible locks held by other transactions have been released. The lock is then granted.
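The compatibility rules above can be written down as a small table and used by a lock manager. This is a sketch only: the class `LockTable` is hypothetical, and a refused request simply returns `False` rather than queuing the transaction.

```python
# (mode already held, mode requested) -> compatible?
COMPATIBLE = {
    ("S", "S"): True,
    ("S", "X"): False,
    ("X", "S"): False,
    ("X", "X"): False,
}


class LockTable:
    def __init__(self):
        self.held = {}  # item -> {transaction: mode}

    def can_grant(self, txn, item, mode):
        return all(
            COMPATIBLE[(held_mode, mode)]
            for other, held_mode in self.held.get(item, {}).items()
            if other != txn
        )

    def lock(self, txn, item, mode):
        if not self.can_grant(txn, item, mode):
            return False                 # the caller must wait
        holders = self.held.setdefault(item, {})
        if holders.get(txn) != "X":      # never weaken an X-lock already held
            holders[txn] = mode
        return True

    def unlock(self, txn, item):
        self.held.get(item, {}).pop(txn, None)


table = LockTable()
table.lock("T1", "A", "S")   # granted
table.lock("T2", "A", "S")   # granted: shared locks are compatible
table.lock("T3", "A", "X")   # refused: T3 must wait for T1 and T2
```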
Lock-Based Protocols (Cont.)
• Example of a transaction performing locking:
T2: lock-S(A);
read (A);
unlock(A);
lock-S(B);
read (B);
unlock(B);
display(A+B)
• Locking as above is not sufficient to guarantee serializability — if A
and B get updated in-between the read of A and B, the displayed
sum would be wrong.
• A locking protocol is a set of rules followed by all transactions
while requesting and releasing locks. Locking protocols restrict the
set of possible schedules.
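The wrong sum can be reproduced with a short simulation. The sketch assumes a hypothetical second transaction, `transfer`, that moves 50 from B to A and runs entirely between T2's two reads; the starting balances are illustrative.

```python
db = {"A": 100, "B": 200}


def transfer(db, amount):
    """Hypothetical concurrent transaction: move `amount` from B to A."""
    db["B"] -= amount
    db["A"] += amount


# T2 releases its lock on A before locking B, so the transfer can slip in.
a = db["A"]            # T2: lock-S(A); read(A); unlock(A)
transfer(db, 50)       # the other transaction runs completely here
b = db["B"]            # T2: lock-S(B); read(B); unlock(B)

displayed = a + b                 # what T2 displays
actual = db["A"] + db["B"]        # the true total, unchanged by the transfer
```

T2 displays 250 although the total is 300 both before and after the transfer, so the interleaving is not equivalent to any serial order of the two transactions.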
Pitfalls of Lock-Based Protocols
• Consider a partial schedule in which T3 holds a lock on B, T4 holds a lock on A, T4 then requests lock-S(B), and T3 requests lock-X(A).
• Neither T3 nor T4 can make progress: executing lock-S(B) causes T4 to wait for T3 to release its lock on B, while executing lock-X(A) causes T3 to wait for T4 to release its lock on A.
• Such a situation is called a deadlock.
– To handle a deadlock one of T3 or T4 must be rolled back and its locks released.
Pitfalls of Lock-Based Protocols (Cont.)
• The potential for deadlock exists in most locking protocols. Deadlocks are a necessary evil.
• Starvation is also possible if the concurrency control manager is badly designed. For example:
– A transaction may be waiting for an X-lock on an item, while a sequence of other transactions request and are granted an S-lock on the same item.
– The same transaction is repeatedly rolled back due to deadlocks.
• The concurrency control manager can be designed to prevent starvation.
The Two-Phase Locking Protocol
• This is a protocol which ensures conflict-serializable
schedules.
• Phase 1: Growing Phase
– transaction may obtain locks
– transaction may not release locks
• Phase 2: Shrinking Phase
– transaction may release locks
– transaction may not obtain locks
• The protocol assures serializability. It can be proved that
the transactions can be serialized in the order of their
lock points (i.e. the point where a transaction acquired
its final lock).
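Whether a single transaction obeys the two phases can be checked from its sequence of lock and unlock operations. In this sketch the function name `lock_point` and the `("lock" | "unlock", item)` encoding are assumptions made for illustration.

```python
def lock_point(ops):
    """Return the index of the final lock if `ops` is two-phase, else None.

    None is also returned for a sequence that acquires no locks at all.
    """
    shrinking = False
    point = None
    for i, (op, _item) in enumerate(ops):
        if op == "unlock":
            shrinking = True          # the shrinking phase has begun
        elif op == "lock":
            if shrinking:
                return None           # a lock after an unlock violates 2PL
            point = i
    return point


two_phase = [("lock", "A"), ("lock", "B"), ("unlock", "A"), ("unlock", "B")]
not_two_phase = [("lock", "A"), ("unlock", "A"), ("lock", "B"), ("unlock", "B")]
```

`not_two_phase` has the same locking pattern as the earlier T2 example, which unlocks A before locking B.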
The Two-Phase Locking Protocol (Cont.)
• Two-phase locking does not ensure freedom from
deadlocks
• Cascading roll-back is possible under two-phase
locking. To avoid this, follow a modified protocol
called strict two-phase locking. Here a transaction
must hold all its exclusive locks till it commits/aborts.
• Rigorous two-phase locking is even stricter: here
all locks are held till commit/abort. In this protocol
transactions can be serialized in the order in which
they commit.
The Two-Phase Locking Protocol (Cont.)
• There can be conflict serializable schedules that cannot
be obtained if two-phase locking is used.
• However, in the absence of extra information (e.g.,
ordering of access to data), two-phase locking is
needed for conflict serializability in the following sense:
Given a transaction Ti that does not follow two-phase
locking, we can find a transaction Tj that uses two-phase
locking, and a schedule for Ti and Tj that is not conflict
serializable.
Database Concurrency Control
Two-Phase Locking Techniques: The algorithm
Dealing with Deadlock and Starvation

Deadlock
T’1                     T’2
read_lock (Y);
read_item (Y);
                        read_lock (X);
                        read_item (X);
write_lock (X);
(waits for X)
                        write_lock (Y);
                        (waits for Y)
Deadlock (T’1 and T’2): both transactions follow the two-phase policy, but they are deadlocked.

Deadlock prevention
A transaction locks all data items it refers to before it begins
execution. This way of locking prevents deadlock since a
transaction never waits for a data item. The conservative two-
phase locking uses this approach.

Deadlock detection and resolution
In this approach, deadlocks are allowed to happen. The scheduler maintains a wait-for graph to detect cycles. If a cycle exists, then one transaction involved in the cycle is selected (the victim) and rolled back.
A wait-for graph is created using the lock table. As soon as a transaction is blocked, it is added to the graph. A chain such as Ti waits for Tj, Tj waits for Tk, and Tk waits for Ti or Tj creates a cycle. One of the transactions in the cycle is selected and rolled back.
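Cycle detection on a wait-for graph can be sketched with a depth-first search. The graph encoding (a dictionary mapping each blocked transaction to the set of transactions it waits for) and the function name `find_cycle` are assumptions; victim selection is left out.

```python
def find_cycle(wait_for):
    """Return a list of transactions forming a cycle, or None if there is none."""
    WHITE, GREY, BLACK = 0, 1, 2      # unvisited, on the current path, done
    colour = {}
    path = []

    def visit(t):
        colour[t] = GREY
        path.append(t)
        for u in wait_for.get(t, ()):
            state = colour.get(u, WHITE)
            if state == GREY:         # u is already on the path: cycle found
                return path[path.index(u):]
            if state == WHITE:
                found = visit(u)
                if found:
                    return found
        path.pop()
        colour[t] = BLACK
        return None

    for t in list(wait_for):
        if colour.get(t, WHITE) == WHITE:
            cycle = visit(t)
            if cycle:
                return cycle
    return None


deadlocked = {"Ti": {"Tj"}, "Tj": {"Tk"}, "Tk": {"Ti"}}
no_deadlock = {"T1": {"T2"}, "T2": {"T3"}}
```

Any transaction in the returned cycle is a candidate victim; rolling it back removes its edges and breaks the cycle.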

Deadlock avoidance
There are many variations of the two-phase locking algorithm. Some avoid deadlock by not letting the cycle complete. That is, as soon as the algorithm discovers that blocking a transaction is likely to create a cycle, it rolls back the transaction. The Wound-Wait and Wait-Die algorithms use timestamps to avoid deadlocks by rolling back the victim.
More Deadlock Prevention Strategies
• Following schemes use transaction timestamps for the sake of
deadlock prevention alone.
• wait-die scheme — non-preemptive
– older transaction may wait for younger one to release data item.
Younger transactions never wait for older ones; they are rolled
back instead.
– a transaction may die several times before acquiring needed data
item
• wound-wait scheme — preemptive
– older transaction wounds (forces rollback) of younger transaction
instead of waiting for it. Younger transactions may wait for older
ones.
– may be fewer rollbacks than wait-die scheme.
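Both schemes reduce to a comparison of timestamps when a requester finds the item locked by a holder. In this sketch a smaller timestamp means an older transaction; the function names and return strings are illustrative.

```python
def wait_die(requester_ts, holder_ts):
    """Non-preemptive: an older requester waits, a younger one dies."""
    return "wait" if requester_ts < holder_ts else "die"


def wound_wait(requester_ts, holder_ts):
    """Preemptive: an older requester wounds the holder, a younger one waits."""
    return "wound holder" if requester_ts < holder_ts else "wait"


# A transaction with timestamp 5 (older) meets one with timestamp 9 (younger).
print(wait_die(5, 9), "|", wait_die(9, 5))
print(wound_wait(5, 9), "|", wound_wait(9, 5))
```

In both schemes waiting only ever happens in one direction of age, so a cycle of waits cannot form.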

Starvation
Starvation occurs when a particular transaction is consistently made to wait or is restarted and never gets a chance to proceed further. In deadlock resolution it is possible that the same transaction is consistently selected as the victim and rolled back. This limitation is inherent in all priority-based scheduling mechanisms. In the Wound-Wait scheme a younger transaction may always be wounded (aborted) by a long-running older transaction, which may create starvation.
Deadlock Detection (Cont.)

[Figure: wait-for graph without a cycle (left); wait-for graph with a cycle (right)]


Database Security
• Database Security - protection from
malicious attempts to steal (view) or
modify data.
Introduction to DB Security
• Secrecy: Users should not be able to see
things they are not supposed to.
– E.g., A student can’t see other students’ grades.
• Integrity: Users should not be able to
modify things they are not supposed to.
– E.g., Only instructors can assign grades.
• Availability: Users should be able to see
and modify things they are allowed to.
Database Security Issues

• Types of Security
– Legal and ethical issues
– Policy issues
– System-related issues
– The need to identify multiple security levels 
Introduction to Database Security Issues (2)
Threats to databases
- Loss of integrity
- Loss of availability
- Loss of confidentiality

To protect databases against these types of threats, four kinds of countermeasures can be implemented: access control, inference control, flow control, and encryption.
Introduction to Database Security Issues (3)
A DBMS typically includes a database security and authorization subsystem that is responsible for ensuring the security of portions of a database against unauthorized access.
• A security policy specifies who is authorized to do what.
• A security mechanism allows us to enforce a chosen security policy.

Two types of database security mechanisms:
• Discretionary security mechanisms
• Mandatory security mechanisms
Discretionary Access Control Based on Granting and Revoking Privileges

The typical method of enforcing discretionary access control in a database system is based on the granting and revoking of privileges.
Types of Discretionary Privileges

• The account level: At this level, the DBA specifies the particular privileges that each account holds independently of the relations in the database.
• The relation (or table) level: At this level, the DBA can control the privilege to access each individual relation or view in the database.
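The two levels can be pictured as two separate privilege stores that the DBA grants into and revokes from. The class `PrivilegeCatalog`, the account name and the privilege strings in this sketch are illustrative assumptions, not the catalog layout of any particular DBMS.

```python
class PrivilegeCatalog:
    def __init__(self):
        self.account_level = {}    # account -> set of privileges
        self.relation_level = {}   # (account, relation) -> set of privileges

    def grant(self, account, privilege, relation=None):
        if relation is None:
            self.account_level.setdefault(account, set()).add(privilege)
        else:
            self.relation_level.setdefault((account, relation), set()).add(privilege)

    def revoke(self, account, privilege, relation=None):
        if relation is None:
            self.account_level.get(account, set()).discard(privilege)
        else:
            self.relation_level.get((account, relation), set()).discard(privilege)

    def has(self, account, privilege, relation=None):
        if relation is None:
            return privilege in self.account_level.get(account, set())
        return privilege in self.relation_level.get((account, relation), set())


catalog = PrivilegeCatalog()
catalog.grant("alice", "CREATE TABLE")             # account level
catalog.grant("alice", "SELECT", relation="GRADES")  # relation level
catalog.revoke("alice", "SELECT", relation="GRADES")
```

Keeping the stores separate mirrors the text: an account-level privilege says nothing about any specific relation, and a relation-level privilege applies to one relation or view only.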
Mandatory Access Control and Role-Based Access Control for Multilevel Security

The discretionary access control techniques of granting and revoking privileges on relations have traditionally been the main security mechanism for relational database systems.
This is an all-or-nothing method: A user either has or does not have a certain privilege.
In many applications, an additional security policy is needed that classifies data and users based on security classes. This approach, known as mandatory access control, would typically be combined with the discretionary access control mechanisms.
Mandatory Access Control and Role-Based Access Control for Multilevel Security (2)

Typical security classes are top secret (TS), secret (S), confidential (C), and unclassified (U), where TS is the highest level and U the lowest: TS ≥ S ≥ C ≥ U

The commonly used model for multilevel security, known as the Bell-LaPadula model, classifies each subject (user, account, program) and object (relation, tuple, column, view, operation) into one of the security classifications TS, S, C, or U. We refer to the clearance (classification) of a subject S as class(S) and to the classification of an object O as class(O).
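A short sketch of how class(S) and class(O) are compared. The two rules encoded here are the standard Bell-LaPadula restrictions (no read up, no write down); they are not stated in the text above and are included as an assumption about how the classifications are used. The function names are hypothetical.

```python
# Numeric rank for each security class: TS >= S >= C >= U
LEVEL = {"U": 0, "C": 1, "S": 2, "TS": 3}


def can_read(subject_class, object_class):
    """Simple security property: read only if class(S) >= class(O)."""
    return LEVEL[subject_class] >= LEVEL[object_class]


def can_write(subject_class, object_class):
    """Star property: write only if class(S) <= class(O)."""
    return LEVEL[subject_class] <= LEVEL[object_class]


print(can_read("S", "C"), can_read("C", "S"))    # reading down vs. reading up
print(can_write("S", "C"), can_write("C", "S"))  # writing down vs. writing up
```

The write rule stops a subject with a high clearance from copying classified data into an object that lower-cleared subjects can read.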
