You are on page 1of 48

Database Recovery

Database recovery is the process of restoring the database to a


correct state following a failure. The failure may be the result of a
system crash due to hardware or software errors, a media failure, such as
a head crash, or a software error in the application, such as a logical error
in the program that is accessing the database. It may also be the result of
unintentional or intentional corruption or destruction of data. Whatever
the underlying cause of the failure, the DBMS must be able to recover
from the failure and restore the database to a consistent state. It is the
responsibility of DBMS to ensure that the database is reliable and
remains in a consistent state in the presence of failures. In general,
backup and recovery refers to the various strategies and procedures
involved in protecting your database against data loss and
reconstructing the data such that no data is lost after failure.
Thus, recovery scheme is an integral part of the database system that is
responsible for the restoration of the database to a consistent state that
existed prior to the occurrence of the failure.
Data Storage

The storage of data generally includes four different types of


media with an increasing degree of reliability. These are:

♦ Main memory
♦ Magnetic disk
♦ Magnetic tape
♦ Optical disk.
Stable Storage

There is a Stable Storage in which information is never lost. Stable


storage devices are the theoretically impossible to obtain. But, we
must use some technique to design a storage system in which the
chances of data loss are extremely low.

The most important information needed for whole recovery process


must be stored in stable storage.
Implementation of Stable Storage
∀ ♦ RAID (Redundant Array of Independent Disk) is commonly
used for implementation of stable storage. In case of RAID,
information is replicated on several disks in form of array and each
disk has independent failure modes. So, in this case failure of single
disk does not result in loss of data. RAID system however cannot
prevent data loss in case disasters like fires or flooding.

∀ ♦ In order to prevent from disasters like fire or flooding, the
system storage on archival backup of tapes sent to off site however,
since tapes cannot be sent off site continuously and hence some data
may be lost in disasters like fire or flooding.

∀ ♦ In order to remove above problem, more secure system keep a
copy of each block of a stable storage at a remote site writing it out
over a computer network in addition to storing the block on a local
disk.
Causes of failures

∀♦ System Crashes
∀♦ User Error
∀♦ Carelessness
∀♦ Sabotage (intentional corruption of data)
∀♦ Statement Failure
∀♦ Application software errors
∀♦ Network Failure
∀♦ Media Failure
∀♦ Natural Physical Disasters
Recovery from failures

∀♦ Actions taken during normal


transaction processing to ensure enough
information exists to allow recovery from
failure.
∀♦ Actions taken following a failure to
recover the database to a state that is known
to be correct.

Recovery and Atomicity of a Transaction
In order to understand the concept of atomicity of a transactions, consider again a
simplified banking transactions that transfer Rs 50 from account A to account B with
initial values of A and B being Rs 1000 and Rs 2000 respectively.
T1
Read(A,a)
a=a-50
Write(A,a)
Output(AX)

Read(B,b)
b=b+50
Write(B,b)
Output(BX)
In this example we suppose that after each write operation output
operation is performed or in other-words we can say that this is an
example of Force-Output operation.

Suppose that a system crash occurs during the execution of transaction


Ti after Output (AX) has been taken place but before output (BX) was
executed, here AX and BX are the buffer block which contains the values
of data item A and B. Since output (AX) operation performed
successfully it means that data item A has value 950 in disk and output
(BX) does not performed successfully so its value remains 2000. It means
that now system enters into inconsistent state because there is a loss of
Rs 50 after execution of transaction T1.
In order to recover for above problem there are two simple means:
(i) Re-execute T1
If transaction T1 is re-executed successfully then value of A is Rs
900 and B is Rs 2050. It results again incorrect information because
value A is Rs 900 rather than Rs 950.
(ii) Do not Re-execute T1
The current state of system has A =950 and B=2000 which again an
inconsistent state.

Thus, in either case database is left in an inconsistent state, so this simple


recovery scheme does not work. The reason for this difficulty is that we
have modified the database without having assurance that the transaction
will be indeed commit or completed.
We have to perform either all or no database modification made by Ti.
This idea gives the concept of Atomicity of Transaction. It means that
either 100% or 0% of database modification should be performed by the
transaction. If the transaction is not completed successfully then some
modification performed by transaction stored in database and others not,
then database becomes inconsistent. Thus in order to achieve the
consistency of database transaction should have the property of
Atomicity.
To achieve our goal of atomicity, we must first output information
regarding the modification to a stable storage without modifying the
database. There are two ways to perform such outputs. These are:

♦ Log Based Recovery


♦ Shadow Paging
Log Based Recovery

It is most used structure for recording database modification. In log


based recovery a log file is maintained for recovery purpose.

Log file is a sequence of log records. Log Record maintain a record of all
the operations (update) of the database. There are several types of log
record.
<Start> Log Record:
Contain information about the start of each transaction. It has
transaction identification. Transaction identifier is the unique
identification of the transaction that starts.Representation
<Ti , start>
<Update> Log Record:
It describes a single database write and has the following fields:
< Ti, Xj, V1,V2 >
Here, Ti is transaction identifier, Xj is the data item, V1 is the old value
of data item and V2 is the modified or new value of the data item Xj.
For example < T0, A, 1000, 1100 >
Here, a transaction T0 perform a successful write operation on data Item
A whose old value is 1000 and after write operation A has value 1100.
<Commit> Log Record

When a transaction Ti is successfully committed or completed a <Ti,


commit> log record is stored in the log file.

<Abort> Log Record

When a transaction Ti is aborted due to any reason, a <Ti, abort>


log record is stored in the log file.

Whenever a transaction performs a write A, it is essential that the log


record for that write be created before the database is modified. Once a
log record exists, we have the ability to undo a transaction.
Example Suppose that a system crash occurs during the execution of
transaction Ti after Output (AX) has been taken place but before output
(BX) was executed, here AX and BX are the buffer block which contains
the values of data item A and B.
Since output (AX) operation performed successfully it means that data
item A has value 950 in disk and output (BX) does not performed
successfully so its value remains 2000. It means that now system enters
into inconsistent state because there is a loss of Rs 50 after execution
of transaction Ti.
Then in order to recover the data log record play very important role.
We just copy the old value of A i.e. 1000 from the log record to the
database stored in disk, the operation of taking the old value of the data
item from the log record is called Undo operation.
Since log record plays a vital role for the recovery of the database so the
log must reside in stable storage. Here, we assume that every log record
is written to stable storage as soon as it is created.
There are two techniques for log-based recovery:

♦ Deferred Database Modification

♦ Immediate Database Modification


Deferred Database Modification
It ensures transaction atomicity by recording all database modifications
in the log, but deferring the execution of all write operations of a
transaction until the transaction partially commits.
A transaction is said to be partially committed once the final action of
the transaction has been executed. When a transaction has performed all
the actions, then the information in the log associated with the
transaction is used in executing the deferred writes. In other words, at
partial commits time logged updates are “replayed” into database item.
It means that in this case during write operation the modified values of
local variable are not copied into database items, but the corresponding
new and old values is stored in the log record. But when the transaction
successfully performed all the operations, then the information stored at
log record is used to set the value of data items, but if the transaction fail
to complete, then the value of data item retain their old values and hence
in that case information on the log record is simply ignored.
The execution of transaction Ti proceeds as follows:
< Ti, Start > Before Ti starts its execution, a log record is
written to the log file.
< Ti, A, V2> The write operation by Ti results in the writing of new
records to the log. This record indicates the new value of A i.e. V2 after
the write operation performed by Ti.
< Ti, Commit > When Ti partially commits this record is written to
the log.
It is important to note that only the new value of the data item is required
by the deferred modification technique because if the transaction fails
before the commit then log record is simply ignored because data item
already contain old values due to deferred write operation. But if the
transaction commit the new value present in the log record are copied to
the data base item. Thus log record contain only the new values of data
item, not the old values of data item. This approach save the stable
storage space and reduce the complexity of log record.
Example
We reconsider our simplified banking system. Let T1 be a transaction
that transfers Rs 100 from account A to account B.This transaction is
defined as follows:
T1
Read (A,a)
a=a-100
write(A,a)
Read(B,b)
b=b+100
Write(B,b)
Let T2 be a transaction that withdraws Rs 200 from account C. This
transaction can be defined as follows:

T2
Read (C,c)
c=c-200
Write(C,c)
Suppose that these transaction are executed serially, in the order T1
followed by T2 and the value of account A,B and C before the
execution takes place were Rs 1000, Rs 2000 and Rs 3000
respectively.
Recovery procedure
The recovery procedure of deferred database modification is based on
Redo operation as explain below:
Redo(Ti)
It sets the value of all data items updated by transaction Ti to the new
values from the log of records.
After a failure has occurred the recovery subsystem consults the log to
determine which transaction need to be redone. Transaction Ti needs to
be redone if an only if the log contain both the record <Ti, start> and the
record <Ti, commit>. Thus, if the system crashes after the transaction
completes its execution, then the information in the log is used in
restoring the system to a previous consistence state.
Conclusion
Redo: If the log contain both records <Ti, start> and
<Ti, commit>
This transaction may be or may not be stored to disk physically. So use
Redo operation to get the modified values from the log record.

Re-execute: In all other cases ignore the log record and re-execute
the transaction.
Case a:
Crash occurs just after write (B,b), then the log record is as shown in case (a) of above
figure 11.7. In this case, since no commit record appear in the log , then it mean that no
write operation performs on the database item. So, database items just retain their old
values. The values of accounts A and B are Rs1000 and Rs2000 respectively. Thus,
when the system comes back no redo actions need to be taken, the log records of the
incomplete transaction T1 can be deleted from the log.
Case b:
Crash occurs just after write(C,c), then the log record state is shown in case (b) of
above figure 11.7. Since, transaction T1 is completed successfully before the crash,
when the system comes back then the new value of items must be copied from the log
record to the data items. It means that redo (T1) must be performed to recover the
system after the crash. The transaction T2 is not committed, so there is no need of
redo(T2) operation. Transaction T2 is re-executed. As before the log record of the
incomplete transaction T2 can be deleted for the log.
Recovery: redo (T1) is performed since the record <T1,commit> appear in the
log which results A=1100, B=2100.
T2 transaction must be re-executed; the value of C is 3000.
Case c:
Crash occurs just after <T2, commit>, then log record state is shown in
case(c) of above figure 11.7. When the system comes back, two commit
records are in the log, one from T1 and another from T2 transaction, the
operation redo (T1) and redo (T2) must be performed. After these
operations are executed the values of accounts A,B and C becomes
Rs1100, Rs2100 and Rs2800 respectively.
Recovery: redo (T1) and redo (T2)
Result A=1100,B=2100 and C=2800
Case d:
Crash occurs during recovery action. If the crash occurs during the
recovery action, then some transaction may recover and other
transactions may not recover. Thus, in order to remove above problems
recovery actions are restarted from the beginning. Because, the result of
a successful second attempt at redo is same as though redo had succeed
the first time.
Immediate Database Modification
The immediate database modification technique allows database
modification to be output to the database while the transaction is still in
the active state. The data modification written by active transactions are
called “uncommitted modification”.
If the system crash or transaction aborts, then the old value field of the
log records is used to restore the modified data items to the value they
had prior to the start of the transaction. This restoration is accomplished
through the undo operation. In order to understand undo operations, let
us consider the format of log record.
<Ti, Xj, Vold, Vnew>

Here, Ti is transaction identifier, Xj is the data item, Vold is the old value
of data item and Vnew is the modified or new value of the data item Xj.
Undo (Ti):

It restores the value of all data items updated by transaction T1 to the old
values.
Before a transaction T1 starts its execution the record <T1, start> is
written to the log. During its execution, any write (x) operation by T1 is
performed by writing of the apropriate new update record to the log.
When T1 partially commits the record <T1, commit>is written to
the log.
Example:
Let us reconsider our simplified banking system. Let T1 be a transaction
that transfers Rs100 from account A to account B and T2 be a
transaction that withdraws Rs200 from account C. Suppose these
transactions are executed serially as in previous example with initial
values of A, B and C as Rs1000, Rs2000 and Rs3000 respectively.
After a failure has occurred, the recovery scheme consults the log record
to determine which transaction needs to be undone. Transaction Ti needs
to be undone if the log contains the record <Ti, start> but does not
contain the record <Ti, commit>. This transaction is crashed during
execution. Thus, transaction Ti needs to be undone.
Redo (Ti):
Sets the values of all data items updated by transaction Ti to the new
values. These new values can be found in log record. After a failure has
occurred log record is consulted to determine which transaction need to
be redone.

Transaction Ti needs to be redone if the log contains both the record <Ti,
start> and the record <Ti, commit>. This transaction is crashed just after
partially committed. Thus transaction Ti needs to be redone.
Case b:
Crash occurs just after write (C,c), when the system comes back two recovery actions
needs to be taken, the operation undo (T2) must be performed , since the record <T2
,start> appears in the log, but there is no record <T2,commit>.The operation redo(T1)
must be performed, since the log contains both the records <T1,start> and record <T1,
commit>
Recovery: redo (T1) and undo (T2)
A=1100,B=2100 and C=3000. T2 transaction must be re-executed.
Case c:
Crash occurs just after <T2, commit>
When the system comes back both transactions T1 and T2 need to be redone, since the
records <T1,start> and <T1,commit> appears in the log as do the records <T2,start>
and <T2,commit>
Recovery: redo (T1) and redo (T2)
A=1100, B=2100, C=2800.
Checkpoints
To recover the database after some failure we must consult the
log record to determine which transaction needs to be undone and
redone. For this we need to search the entire log to determine this
information. There are two major problems with this approach. These
are:
♦ The search process is time-consuming.

♦ Most of the transactions need to redone have already written their


updates into the database. Although redoing them will cause no harm,
but it will make recovery process more time consuming.
To reduce these types of overhead, we introduce Checkpoints.
The system periodically performs Checkpoints.
Actions Performed During Checkpoints
♦ Output onto stable storage all log records currently residing in
main memory.
♦ Output on the disk all modified buffer blocks.
♦ Output onto stable storage a log record <checkpoint>.
The presence of a <checkpoint> record makes recovery process more
streamline. Consider a transaction Ti that committed prior to the
Checkpoint, means that <Ti, Commit> must appear in the log before the
<checkpoint> record. Any database modifications made by Ti must have
been written to the database either prior to the checkpoint or as part of
checkpoint itself. Thus, at recovery time, there is no need to perform a
redo operation on Ti.
The checkpoint record gives a list of all transactions that were in
progress at the time the checkpoint was taken. Thus, the checkpoints
help the system to provide information at restart time which transaction
to undo and which to redo.
• A system failure has occurred at time tf.

• The most recent checkpoint prior to time tf was taken at time tc.

• Transactions of type T1 completed (successfully) prior to time tc.

• Transactions of type T2 started prior to time tc and completed


(successfully) after time tc and before time tf

• Transactions of type T3 also started prior to time tc but did not


complete by time tf.

• Transactions of type T4 started after time tc and completed


(successfully) before time tf.

• Finally, transactions of type T5 also started after time tc, but did
not complete by time tf.
It should be clear that, in case of immediate modification technique those
transactions that have <Ti, start> and <Ti, commit> must be redo and
those transactions that have only <Ti, start> and no <Ti, commit> must
be undo.

Thus, when the system is restarted in case of immediate database


modification, transactions of types T3 and T5 must be undone, and
transactions of types T2 and T4 must be redone. Note, however that
transactions of type T1 do not enter in the restart process at all, because
their updates were forced to the database at time tc as part of the
checkpoint process.
Log-Record Buffering
As, we assumed earlier that every log record is output to stable storage
at the time it is created. This assumption imposes a high overhead on
system execution for the following reasons:
• Output to stable storage is performed in units of blocks. In most
cases, a log record is much smaller than a block. Thus, the output of
each log record translates to a much larger output at the physical
level.
• The cost of performing the output of a block to storage is
sufficiently high that it is desirable to output multiple log records at
once.
To do so, we write log records to a log buffer in main memory, where
they stay temporarily until they are output to stable storage. Multiple
log records can be gathered in the log buffer and output to stable
storage in a single output operation. The order of log records in the
stable storage must be exactly the same as the order in which they
Due to the use of log buffering a log record may reside in only main
memory (volatile storage) for a considerable time before it is output
to stable storage. Since such log records are lost if the system
crashes, we must impose additional requirements on the recovery
techniques to ensure transaction atomicity.
• Transaction Ti enters the commit state after the <Ti commit> log
record has been output to stable storage.
• Before the <Ti commit> log record can be output to stable storage,
all log records pertaining to transaction Ti must have been output to
stable storage.
• Before a block of data in main memory can be output to the
database (in nonvolatile storage), all log records pertaining to data in
that block must have been output to stable storage.
The latter rule is called the write-ahead logging (WAL) rule.
The Write-Ahead Log Protocol
Before writing a page to disk, every update log record that describes a
change to this page must be forced to stable storage. This is
accomplished by forcing all log records to stable storage before writing
the page to disk.
WAL is the fundamental rule that ensures that a record of every change
to the database is available while attempting to recover from a crash. If a
transaction makes a change and committed, the no-force approach means
that some of these changes may not have been written to disk at the time
of a subsequent crash. Without a record of these changes, there would be
no way to ensure that the changes of a committed transaction survive
crashes. Note that the definition of a committed transaction is effectively
“a transaction whose log records, including a commit record, have all
been written to stable storage”.
When a transaction is committed, the log detail is forced to stable
storage, even if a no-force approach is being used.
Example

The database is stored in nonvolatile storage (disk) and blocks of data


are brought into main memory as needed. Since main memory is
typically much smaller than the entire database, it may be necessary to
overwrite a block B1 in main memory when another block B2 needs to
be brought into memory. If B1 has been modified B1 must be output
prior to the input of B2.

The rules for the output of log records limit the freedom of the system to
output blocks of data. If the input of block B2 causes block B1 to be
chosen for output, all log records pertaining to data in B1 must be output
to stable storage before B1 is output.
Thus, the sequence of actions by the system would be as follows:
• Output log records to stable storage until all log records pertaining
to block B1 have been output.
• Output block B1 to disk.
• Input block B2 from disk to main memory.

It is important that no writes to the block B1 be in progress while the


preceding sequence of actions is carried out.
Failure with Loss of Nonvolatile Storage
Until now, we have considered only the case where a failure results in
the loss of information residing in volatile storage while the content of
the nonvolatile storage remains intact. Although failures in which the
content of nonvolatile storage is lost are rare we nevertheless need to be
prepared to deal with this type of failure. In this section, we discuss only
disk storage.
The basic scheme is to dump the entire content of the database to stable
storage periodically-say, once per day. For example we may dump the
database to one or more magnetic tapes. If a failure occurs that results in
the loss of physical database blocks, the most recent dump is used in
restoring the database to a previous consistent state. Once this restoration
has been accomplished, the system uses the log to bring the database
system to the most recent consistent state.
More precisely, no transaction may be active during the dump procedure
and a procedure similar to check point must take place:

3. Output all log records currently residing in main memory onto stable
storage.

5. Output all buffer blocks onto the disk.

7. Copy the contents of the database to stable storage.

9. Output a log record <dump> onto the stable storage.


Steps 1,2 and 4 correspond to the three steps used for checkpoints.

To recover from the loss of nonvolatile storage, we restore the database


to disk using the most recent dump. Then the log is consulted and all the
transactions that have committed since the most recent dump occurred
are redone. Notice that no undo operations need to be executed, because
dump already contain old values of data items.

A dump of the database contents is also referred to as an archival dump,


since we can archive the dumps and use them later to examine old states
of the database. Dumps of a database and check pointing of buffers are
similar.
Problems with simple dump operation:

• The entire database must be copied to stable storage, resulting in


considerable data transfer.

• Since transaction processing is halted during the dump procedure,


CPU cycles are wasted.

Fuzzy dump schemes have been developed, which allow transactions to


be active while the dump is in progress.

You might also like