

Recovery Techniques For Database Systems*
JOOST S. M. VERHOFSTAD
BNSR, 522 University Avenue, Toronto, Ontario M5G 1W7, Canada

* The research work described here was done while the author was at the Computing Laboratory, Claremont Tower, Claremont Road, University of Newcastle upon Tyne, UK.

A survey of techniques and tools used in filing systems, database systems, and operating systems for recovery, backing out, restart, the maintenance of consistency, and for the provision of crash resistance is given.
A particular view on the use of recovery techniques in a database system and a categorization of different kinds of recovery and recovery techniques and basic principles are presented. The purposes for which these recovery techniques can be used are described. Each recovery technique is illustrated by examples of its application in existing systems described in the literature.
A main conclusion from this survey is that the recovery techniques described are all useful; they are applied for different purposes and in different environments. However, a certain trend in the increasing use of specific techniques during the past few years can be noted. Another main conclusion is that there are still enormous integrity and recovery problems to be solved for parallel processes and distributed processing.

Keywords and Phrases: audit trail, backup, checkpoint, database, differential files, filing system, incremental dumping, recovery.

CR Categories: 1.3, 3.73, 4.33, 4.9

CONTENTS

INTRODUCTION
OVERVIEW OF RECOVERY TECHNIQUES
SALVATION PROGRAMS
INCREMENTAL DUMPING
AUDIT TRAIL
DIFFERENTIAL FILES
BACKUP AND CURRENT VERSIONS
MULTIPLE COPIES
CAREFUL REPLACEMENT
DIRECTIONS FOR FUTURE RESEARCH AND RELATED WORK
SUMMARY AND CONCLUSIONS
ACKNOWLEDGMENTS
REFERENCES

INTRODUCTION

Recovery techniques can be used to restore data in a system to a usable state. Such techniques are widely used in filing systems and database systems in order to cope with failures. A failure is an event at which the system does not perform according to specifications. Some failures are caused by hardware faults (e.g., a power failure or disk failure), software faults (e.g., bugs in programs or invalid data), or human errors (e.g., the operator mounts a wrong tape on a drive, or a user does something unintentional). A failure occurs when an erroneous state of the system is processed by algorithms of the system. The term error is, in this context, used for that part of the state which is "incorrect." An error is thus a piece of information which can cause a failure [MELL77].

In order to cope with failures, additional components and abnormal algorithms can be added to a system. These components and algorithms attempt to ensure that occurrences of erroneous states do not result in later system failures; ideally, they remove these errors and restore them to "correct" states from which normal processing can continue. These additional components and abnormal algorithms, called recovery techniques, are the subject of this survey.

There are many kinds of failures and therefore many kinds of recovery.



There is always a limit to the kind of recovery that can be provided. If a failure not only corrupts the ordinary data, but also the recovery data--redundant data maintained to make recovery possible--complete recovery may be impossible. As described by Randell [RAND78], a recovery mechanism will only cope with certain failures. It may not cope with failures, for example, that are rare, that have not been thought of, that have no effects, or that would be too expensive to recover from. For example, a head crash on disk may destroy not only the data but also the recovery data. It would therefore be preferable to maintain the recovery data on a separate device. However, there are other failures which may affect the separate device as well--for example, failures in the machinery that writes the recovery data to that storage device.

Recovery data can itself be protected from failures by yet further recovery data which allow restoration of the primary recovery data in the event of its corruption. This progression could go on indefinitely. In practice, of course, there must be reliance on some ultimate recovery data (or rather, acceptance that such recovery data cannot be totally reliable).

Techniques and utilities that can be used for recovery, crash resistance, and maintaining consistency after a crash are described in this survey. Included are descriptions of how data structures should be constructed and updated, and how redundancy should be retained to provide recovery facilities. This survey deals with recovery for data structures and databases, not with other issues that are also important when processes operate on data, such as locking, security, and protection [LIND76].

One possible approach to recovery is to distinguish different kinds of failures based on two criteria: 1) the extent of the failure, and 2) the cause of the failure. The three kinds of failures typically distinguished in many database systems are: a failure of a program or transaction; a failure of the total system; or a hardware failure. Different "recovery procedures" are then used to cope with the different failures. A good description of such an approach to recovery has been given by Gray [GRAY77].

However, recovery is approached from a different angle in this paper. To be able to recover, two kinds (i.e., functionally different kinds) of data are distinguished: 1) data structures to keep the current values and 2) recovery data to make the restoration of previous values possible. This paper examines:
• how these data structures and recovery data can be structured, organized and manipulated to make recovery possible (all this is referred to as the recovery technique);
• how the data structure can interact with the structure of the recovery data;
• what kinds of failures can be coped with by the different organizations;
• what kind of recovery (e.g., restoration of the state at time of failure, the previous state, and so on) can be provided using these organizations;
• and how different techniques can be combined in one system to cope with different failures or to provide different kinds of recovery (e.g., one technique may be used as a fall back for another one).


The first approach used in most commercially available data base systems is based on the requirement that the system be able to undo, redo, or complete transactions. A transaction is the unit of locking and recovery; it therefore appears to the user as an atomic action. The recovery techniques considered in this paper can be used to implement this atomic property for user actions on the database, but the techniques are described from the data structuring point of view.

For the purpose of this survey the notions of filing system and database system are treated as synonymous. The definition of the notion of database given by Martin [MART76] is used here: A database is a collection of related storage objects together with controlled redundancy to serve one or more applications; the data in the database is stored so as to be independent of programs using them; a single approach is used to add, modify, or retrieve in the database.

A database may consist of a number of files. A file is a logical unit in the database, used to group data. The data that can be retrieved by users from the database forms the information in the database. Thus, if some of the data stored in the database becomes irretrievable, some information is lost.

The concepts of database, file, and information are logical. The physical database is held in secondary storage. Secondary storage is nonvolatile storage space, which retains the database whether or not it is online (mounted on a storage device unit and readable by computer). Secondary storage consists of physical records, which are the smallest accessible units. Records are read or written by a storage device unit at the request of the computer.

A database is an abstraction of secondary storage provided to the user by the database system. The database system implements the user operations on the database and implements the data structures on secondary storage. Objects are the substructures from which these data structures are built. Examples of objects are: a logical record, a header of a file, a linked list of pages or logical records, and an entry in a directory. Objects are mapped onto records which comprise secondary storage.

Users may add, delete, and update data. The database is in a correct state if the information in it consists of the most recent copies of data put in the database by users and contains no data deleted by users. A database is in a valid state if its information is part of the information in a correct state. This implies that there are no spurious data, although some information may have been lost. A database is in a consistent state if it is in a valid state, and the information it holds satisfies the users' consistency constraints.

It is assumed here that a correct state is also a consistent state. For example, if a machine is suddenly halted, and a database state is defined at that time, then this state is called the correct state. If no state is defined for the database when the machine is halted, a salvager can be run to delete parts of the database in order to restore the system to a defined state: a valid state. If this salvager also makes sure that users' consistency constraints are maintained, then the restored state is a consistent state. Consistency will have to be a well-defined notion for every database. Different sorts of consistencies (possibly at different levels of abstraction) or degrees of consistencies [GRAY76] may be defined. No more precise definition of consistency will be given here.

To illustrate these definitions with another example, consider a user who maintains a source file, and an object file which has been produced by compiling the source file. The database will then be in a correct state if the most recent source and object files are available. The database will be in a valid state if a source file and an object file, but not necessarily the most recent ones, are available. The database will be in a consistent state only if corresponding source and object files are available.

A failure of the system occurs when that system does not meet its specifications. Recovery is the restoration of the database after a failure to a state that is acceptable to the users. The notion of "acceptable" is different for different environments; in general, "acceptable" will mean correct, valid or consistent. A recovery technique


provides recovery from certain kinds of failures. Within a single system, there may be several different recovery techniques corresponding to different kinds of failures. However, it is common to structure the techniques into a hierarchy; the most general ones deal with the largest set of failures, but are the most expensive. The recovery technique for the smallest set of failures is usually the most efficient technique and involves minimal loss of information. An example of this is given below and illustrated in Figure 1.

A recovery technique maintains recovery data to make recovery possible. It provides recovery from any failure which does not affect the recovery data or the mechanisms used to maintain these data and to restore the states of the data in the database. Failures are classified into two groups with respect to a recovery technique.

FIGURE 1. The state space for a database system with several recovery techniques, coping with subsequently larger sets of failures. (The diagram shows nested regions: the states assumed during processing without failures; the states that can be assumed in B; the states that can be assumed in C; the states that can be assumed in D; and the states after failures from which no recovery is possible.)


A failure with which a recovery technique can cope is a crash of the system with respect to that recovery technique. A failure with which a recovery technique cannot cope is called a catastrophe with respect to that technique.

A system using three recovery techniques could, for example, consist of the following subsystems (see also Figure 1):
A) The database system without any recovery techniques.
B) "A" plus a recovery technique that uses built-in redundant pointers in data structures to be able to recover from certain failures causing particular errors in the data structures.
C) "B" plus a recovery technique that does not use built-in redundancy in data structures, but maintains backup copies of (parts of) the data structures.
D) "C" plus a recovery technique that keeps a complete backup copy of the database on a separate device.
These systems could be built using an approach similar to the "safe-programming" approach described by Anderson [ANDE75]. The bigger the damage, the cruder the recovery technique used. Restoration of the correct state is most desirable and can be done, say, in B. However, if the damage is such that recovery in B is not possible, then restoration to a consistent, but not necessarily the correct, state may be the only alternative in C.

No single recovery technique or series of recovery techniques can cope with every possible failure. Many different kinds of recovery procedures have been developed, each technique with its own particular advantages and disadvantages, but each enabling the system to cope with different kinds of failures in different environments. In the following sections the categories of recovery techniques known and used at present are briefly described; the kinds of recoveries they provide and the relationships among the techniques are given. Next, the different techniques are defined and described in detail, along with a consideration of the purposes for which they can be employed, and of the systems which use them. Finally, some conclusions are drawn and some recent trends are described.

OVERVIEW OF RECOVERY TECHNIQUES

Different kinds of recovery are possible for a database. The "kinds of recovery" that we consider are in fact "qualities of recovery," which can be useful in comparing and evaluating different recovery techniques. The kinds of recovery considered are:
1) Recovery to the correct state.
2) Recovery to a correct state which existed at some moment in the past (i.e., a checkpoint).
3) Recovery to a possible previous state; this would allow, for example, restoration of a set of previously existing states of files that may not have existed simultaneously before.
4) Recovery to a valid state.
5) Recovery to a consistent state.
6) Crash resistance (explained below).

Crash resistance is provided if the normal algorithms of the system operate on the data in such a manner that after certain failures the system will always be in a correct state, i.e., the state the system was in before the last operation on the data was started (or possibly the last series of operations). Thus, crash resistance obviates the need for recovery techniques to cope with a certain class of failures.

Crash resistance differs from other kinds of recovery. Whereas other kinds of recovery explicitly restore states, crash resistance maintains correct states by the way data are manipulated and maintained during normal processing. Thus, in a sense, crash resistance restores states implicitly. These differences are fully explained later in this survey. The notion of crash resistance cannot be made more precise than the notion of consistency, for the two are related. However, it is not necessary to be more precise to achieve the objectives of this survey.

A checkpoint is a (presumably correct) past state; it may have been made by recording the past state explicitly. Checkpoints are used by recovery techniques of kinds 1, 2 and 3 (but not necessarily 4 and 5, see definitions). Checkpoints can be established either for files or for the whole database. The creation of a checkpoint is called checkpointing.


Establishing a checkpoint explicitly creates a backup version, which is a complete copy of the checkpointed file (or database). The term backing up means restoring the state of the previous checkpoint.

"Backing up" should not be confused with a different term, "backing out." The term backing out is related to processes or transactions. A process is backed out if all the effects of the operations performed by that process are undone. This means that only the files affected by the process need to be restored. Backing out of some processes may be required, for example, to resolve a deadlock or to undo the operations of a failing process. Backing out is a special sort of recovery of kind 3: only the data affected by the programs that are backed out are restored. So the total database is restored to a state which has been termed a "possible previously existing state."

For the purpose of this paper, techniques used for recovery, restart, and maintenance of consistency are divided into seven categories. (This categorization is, of course, not the only possible one.) The remainder of the survey deals with these seven recovery techniques. Systems described in the computer literature are used to illustrate how the recovery techniques have been implemented. Some of the systems described may have been changed over the past few years, so the descriptions of all systems may not be up to date anymore. However, the purpose of the examples is to illustrate how the techniques have been used, rather than to give accurate up-to-date descriptions of actual systems.
1) Salvation program. A salvation program is run after a crash to restore the system to a valid state. It uses no recovery data. (It is the only technique considered here which does not use recovery data.) It is used after a crash if other recovery techniques (using recovery data) fail or are not used, or if no crash resistance is provided. This program scans the database after a crash to assess the damage and to restore the database to some valid state. It rescues the information that is still recognizable.
2) Incremental dumping. Incremental dumping involves the copying of updated files onto archival storage (usually tape) after a job has finished or at regular intervals. It creates checkpoints for updated files. Backup copies of files can be restored after a crash.
3) Audit trail. An audit trail records sequences of actions on files. It can be used to restore files to their states prior to a crash or to back out particular processes. It can also be used for certifying that rules and laws are obeyed in the system. An audit trail provides the means to back out a process whereas incremental dumping merely provides the means to restore files to previous consistent states.
4) Differential files. A file can consist of two parts: the main file which is unchanged, and the differential file which records all the alterations requested for the main file. The main files are regularly merged with the differential files, thereby emptying the differential files. Records in the differential files can be stored with the process identifier, a time stamp, and other identification information to provide such special facilities as auditing, recovery, or crash resistance. A differential file is a type of audit trail, but the actual updates have not yet occurred. The differential file can also be used to implement crash resistance.
5) Backup/current version. The files containing the present values of existing files form the current version of the database. Files containing previous values form a consistent, backup version of the database. Backup versions can be used to restore files to previous values.
6) Multiple copies. More than one copy of each file is held. The different copies are identical except during update. A "lock bit" can be used to protect a file during updating, while its state is inconsistent. If there is an odd number of files, comparison can be done to select a consistent version.


This technique provides crash resistance; it may be used to detect faults if the different copies are kept on different devices or handled by different processors.
The difference between multiple copies and backup/current version is like the difference between TMR and standby: with multiple copies all copies are active, while with backup/current version there is only one active copy (other copies could even be off-line).
7) Careful replacement. The principle of the careful replacement scheme avoids updating any part of a data structure "in place." Altered parts are put in a copy of the original; the original is deleted only after the alteration is complete and has been certified. The difference between this and the other methods is that two copies exist only during update. The technique is used to provide crash resistance, for the original will always be available in case a crash occurs during update (see the sketch below).
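To make the careful replacement principle concrete, the following minimal sketch (in Python, which is not used by any of the systems surveyed) applies it to a single file: the altered copy is written beside the original and "certified" by forcing it to disk, and only an atomic rename discards the original. The function name and the reliance on a POSIX-style atomic rename are assumptions made for illustration, not a description of any particular system.

    import os
    import tempfile

    def careful_replace(path, new_bytes):
        # Put the altered part in a copy; the original stays untouched on disk.
        directory = os.path.dirname(os.path.abspath(path))
        fd, tmp_path = tempfile.mkstemp(dir=directory)
        try:
            with os.fdopen(fd, "wb") as tmp:
                tmp.write(new_bytes)
                tmp.flush()
                os.fsync(tmp.fileno())      # "certify" the new copy before switching
            os.replace(tmp_path, path)      # atomic switch; only now is the original dropped
        except Exception:
            os.remove(tmp_path)             # an error leaves the original available
            raise

A crash at any point before the final rename leaves the old version intact, which is exactly the crash resistance claimed for careful replacement.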
A cross-reference table between the categories of recovery and the recovery techniques is given in Figure 2. Strictly speaking, the table is incomplete because, for example, an audit trail or differential files can be used to restore a valid state. However, missing cross-references indicate techniques that would never be used for those purposes since one can always do better (e.g., restore the correct state rather than a valid state).
FIGURE 2. A cross-reference table indicating for what purposes the various recovery techniques can be used. (The rows are the seven techniques: salvation program, incremental dumping, audit trail, differential files, backup/current, multiple copies, and careful replacement. The columns are the kinds of recovery: 1) correct state, 2) previous state, 3) possible previous state, 4) valid state, 5) consistent state, and 6) crash resistance. An asterisk marks each purpose for which a technique is used.)

From the description of the techniques in Figure 2 the following relationships between the techniques are apparent:
• The differential file technique makes incremental dumping very easy to implement. Incremental dumping, in general, copes with failures that the differential file technique cannot handle (this is not apparent from Figure 2). However, the kinds of recovery provided by the differential file techniques are preferable to those provided by incremental dumping (as shown in Figure 2). Thus, the two techniques may complement each other very well.
• The audit trail technique is an alternative to differential files, careful replacement, or multiple copies; it can be used to restore the correct state after a crash. It is therefore seldom used in practice in conjunction with these other methods, although it may be used in combination with one of them in order to provide (the same kind of) recovery from different failures. The audit trail can thus be used to provide recovery from the failures for which the other techniques are also used, even though audit trail may be less efficient.
• Multiple copies and careful replacement may be used either as alternatives or as complements which provide crash resistance against similar types of failures. (We will return to this shortly.)
• Also the incremental dumping, the audit trail, the differential files, and the backup/current version techniques can be used as alternative techniques to provide recovery from particular failures or to complement each other to provide (the same kind of) recovery from different failures.


• The salvation program as a recovery technique is a last resort, used if all other techniques fail. It cannot bring the database back to a previous state. It merely rescues what is left. However, a salvation program can be used as a recovery technique for recovery data rather than for the database. For example, a salvation program can be used to restore the audit trail immediately after a failure of the system. The restored audit trail can then be used to restore the database. In this case the salvation program is used rather early.

The seven techniques under discussion provide recovery, crash resistance, and maintenance of consistency in one of three ways:
• The way in which the data is structured. The multiple copies, differential files and backup techniques are part of the structure of the database.
• The way in which the data is updated and manipulated. The careful replacement technique is a crash-resistant way of updating complex data structures. It has been shown [VERH77b] that this also sets special constraints and requirements for the data structures.
• The provision of utilities. The salvation program, incremental dumper and audit trail facility are utilities which have nothing to do with the way in which the data is structured or updated. They could be regarded as external utilities which can usually be added to any database system without great difficulty.

Unfortunately, this division of the techniques into three groups is too coarse: it can be misleading in cases where different techniques in one group complement each other or different techniques from different groups are alternatives. The seven techniques are therefore discussed separately, and examples are given of systems on which they are implemented.

SALVATION PROGRAMS

A salvation program in a database system is used after a crash to restore the database to some consistent state. The salvation program tries to restore the state of the database as it was before or at the time of the failure. However, some files or data may be lost. A salvation program scans through the data structures and tries to reconstruct the database or restore consistency, possibly at the cost of deleting some files or data.

A salvation program is needed after a crash if the data kept on secondary storage is not kept in a consistent state all the time, or if no other recovery technique is available to cope with the failure. Otherwise there is no need for such a program.

One reason the data on secondary storage might be inconsistent after a crash would be the loss of buffers kept in main storage. Some inconsistent files may have to be deleted because of [SMIT72]: violation of standard error checks on reading a file; conflict resulting from the same storage having been assigned to more than one file; or conflict (e.g., on the file length) determined from redundant information (e.g., from a file header).
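As a rough illustration of such a salvager, the sketch below (Python; the names, and the idea of a per-file descriptor holding an expected length and a block list, are assumptions made for the example, not part of any system cited here) scans a set of files, applies the three checks just listed, and keeps only the files that pass, yielding a valid but possibly smaller database.

    def salvage(file_table):
        # file_table: name -> {"path": ..., "length": ..., "blocks": [...]}
        # (a hypothetical catalogue with redundant information, e.g. from file headers)
        salvaged = {}
        seen_blocks = set()
        for name, descriptor in file_table.items():
            try:
                with open(descriptor["path"], "rb") as f:
                    data = f.read()                              # standard error check on reading
            except OSError:
                continue                                         # unreadable: delete from the database
            if len(data) != descriptor["length"]:
                continue                                         # conflicts with the redundant header info
            if any(b in seen_blocks for b in descriptor["blocks"]):
                continue                                         # same storage assigned to more than one file
            seen_blocks.update(descriptor["blocks"])
            salvaged[name] = descriptor                          # still recognizable: rescue it
        return salvaged                                          # a valid state; some information may be lost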


A system may use data buffers (for the database) and audit buffers (for audit tapes). After a crash there may be no way to tell which updates recorded in the audit trail have been written to the database and which were still in data buffers; there may be no way to tell which successful updates were recorded on audit trail tape or were still in audit buffers at the time of the crash. Thus, the audit trail may not succeed in restoring the database to its state at the point of failure.

Several systems such as IMS [IBM] or the CMIC system [GIOR76], when running on machines using core storage, first use a salvation program which tries to rescue the contents of the buffers in main storage in order to close the audit trail tapes properly. However, if the contents of main storage are lost, restoration of the correct state is not possible.

At present LSI memories are widely used. However, the contents of LSI memories generally do not survive power failures. IMS therefore has implemented, in the last few years, the "log tape write ahead protocol." This procedure forces the audit trail to be written before the object in the database is written. Thus, buffers can still be used as long as the system conforms to the "protocol."
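The rule can be stated in a few lines of code. The sketch below (Python, purely illustrative; the record format, page size and file layout are assumptions, not IMS's actual implementation) forces the audit record to stable storage before the corresponding database page is written, so that after a crash the trail always covers every change that may have reached the database.

    import os

    PAGE_SIZE = 4096  # assumed page size for the example

    class LogTapeWriteAhead:
        def __init__(self, log_path, db_path):
            self.log = open(log_path, "ab")
            self.db = open(db_path, "r+b")   # database file assumed to exist

        def update_page(self, page_no, page_bytes):
            record = b"%d:" % page_no + page_bytes + b"\n"
            self.log.write(record)
            self.log.flush()
            os.fsync(self.log.fileno())      # 1) the audit trail record is forced out first
            self.db.seek(page_no * PAGE_SIZE)
            self.db.write(page_bytes)        # 2) only then is the object in the database written
            self.db.flush()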
A system in which a salvation program is of great importance is the HIVE system [TAYL76] (here the program is called the recovery procedure). The system consists of a fixed number of virtual processors (VPs) which are assigned permanently to execute cyclically particular functional application programs. A processor cycle, performing such a particular function, is triggered by a message received from another VP or from outside the network of VPs. Capabilities (descriptions for resources available) for the necessary code and permanent data areas are given to the VPs at system build time, and the message routes between VPs are also set up permanently.

Only the files for which permanent capabilities have been created at system initialization can be restored after a crash. Thus, a possible previous state of the system is restored: the files that existed at system initialization time are restored to the states they were in before the current transactions started. Also:
• transaction checkpoints can be made by writing the data of each transaction into a common, permanent, safeguarded checkpoint file, which can be accessed and recovered after a crash; and
• files may be created dynamically and capabilities for them may be put in special files called cap-files.

The recovery procedure run after a crash restores in main memory the read-only image, which also contains recovery code. The main task of the recovery procedure is a garbage collection by scanning all files in the database. Files whose capabilities are kept in cap-files are processed first. These files were created after system initialization and can be restored (using the cap-files) to the states they were in before the current transactions were started. The system is thus restored to a possible previous state which is an enhancement of the possible previous state that could be restored without the use of the cap-files.

This state can be yet further enhanced by the use of checkpoint files. The checkpoint files can be used to restore files to a checkpoint state; thus, the processing of transactions can be restarted from checkpoints. For each version of each file (several versions of each file are maintained) the check sums are evaluated to detect partially updated and corrupted pages; where possible, the appropriate updating and backtracking from other versions is carried out. (Only one version can be corrupted during a crash because different versions were kept on different disks, and only one version is updated during a transaction. The other versions are updated only after transactions are completed and the updated version is in a new correct state. A corrupted state can be detected using the check sums. Only a catastrophe, such as a fire in the computer center, could corrupt more than one version.)

Other systems in which a salvation program is used to recover the disk contents after system failure have been described, for example, by Lockemann and Knutsen [LOCK68], Daley and Neumann [DALE65] (salvage procedure), Fraser [FRAS69] (start-up procedure), and EMAS [EMAS74]. (See also the surveys in [TONI75] and [MASC73].)

INCREMENTAL DUMPING

Incremental dumping is used to copy updated files onto archive storage (usually tape); it checkpoints files that have been altered. Incremental dumping is normally done after a job is finished, but can also be done at regular intervals, while continued use is made of the files, thereby providing more frequent checkpoints. After a crash has occurred the incremental dump tapes can be used to bring all the files to their previous consistent state, so that jobs completed before the crash will not be lost. All updates performed by jobs running at the time of the crash may not be restored completely by the processing of the incremental dump tape after a crash, because some active files may not have been dumped in time.
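A minimal sketch of an incremental dumper and the corresponding restore step is given below (Python; the directory layout, the use of file modification times, and all names are assumptions for illustration rather than a description of any of the systems discussed next).

    import os
    import shutil
    import time

    def incremental_dump(database_dir, archive_dir, last_dump_time):
        # Checkpoint every file altered since the previous dump; return the new dump time.
        dump_time = time.time()
        for name in os.listdir(database_dir):
            src = os.path.join(database_dir, name)
            if os.path.isfile(src) and os.path.getmtime(src) > last_dump_time:
                shutil.copy2(src, os.path.join(archive_dir, name))
        return dump_time

    def restore_from_dump(archive_dir, database_dir):
        # After a crash, bring the files back to their last dumped (checkpoint) state.
        for name in os.listdir(archive_dir):
            shutil.copy2(os.path.join(archive_dir, name),
                         os.path.join(database_dir, name))

Jobs completed before the last dump survive; updates made after it are lost, which is exactly the limitation noted above.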


Fraser [FRAS69] gives a very good description of the technique used at Cambridge, which makes complete copies of updated disk files every 20 minutes. (See also [WILK75].)

In the original MULTICS system [DALE65] all disk files updated or created by the user are copied when the user signs off. All newly created or modified files, which have not previously been dumped, are also copied to tapes once per hour. The original MULTICS scheme provides high reliability. However, overheads in resources and processing time are far too high; recovery time after failure is too high; and the system must be shut down periodically for backup purposes. It is most discouraging that the situation steadily worsens with system growth.

The design of a greatly improved backup mechanism has been described by Stern [STER74]. It is based upon the original backup mechanisms contained in the MULTICS system. Compared with the original system, this new design lessens overhead, drastically reduces recovery time from system failures, eliminates the need to interrupt system operation for backup purposes, and scales up significantly better with online storage growth. Some of the major features of this new scheme are:
• Incremental dumping is used to keep the backup system up to date.
• A complete secondary dump supersedes the complete incremental dumping history of the system: rather than dumping the entire secondary storage, the procedure updates a checkpoint of the total system. A partial secondary dump supersedes part of the incremental dumping history. Secondary dumps are similar to change accumulation sets in IMS [GRAY77].
• A shadow copy of a file can be created to make sure that the incremental dumper dumps a consistent version.
• Each storage device holds a complete subtree or several subtrees of the file hierarchy (the MULTICS file system is organized as a tree of files [DALE65]). This minimizes the effects of the loss of one device.
• After a failure a "salvager" is used to correct, detect, and report wherever possible any inconsistencies in the file hierarchy and storage tables. The system can be made available immediately after this; some files may still need to be reloaded, but they are marked as such. Directories and system files are reloaded first in order to make the system available to users again as quickly as possible. There are parallel processing capabilities for the reloading operation; and no unnecessary searches on dump tapes are made by the reloader. Only missing files are reloaded.
• The dumper can avoid unnecessary searching in the tree because of the use of dates and times in the directories in the tree. Also the reloader avoids unnecessary searches and file reloads.

The currently used recovery techniques in MULTICS are similar to those described by Stern. However, since the publication of Stern's thesis a major change has been made to the MULTICS filing system: a new and very different storage system has been incorporated. The backup system has also been changed to deal with the new storage system. The concept of logical volumes has been introduced. A logical volume encompasses a set of real devices. The directory hierarchy is on a separate volume, and any directory in the hierarchy can be placed on a new volume, with all its descendants on the same volume. For dumping onto tape a volume dumper is used which dumps one volume at a time; no tree walk is used. Each physical device has a contents table which is a list of pages of the segments on the device. Incremental dumping is done for pages on volumes rather than files. Thus, the scheme used is basically the same as the one proposed by Stern, but used at a lower level (for volumes and pages rather than for the hierarchy of files).

The EMAS system [EMAS74] provides an automatic checkpointing facility for files. Files are part of the user's virtual memory and cannot be accessed through the paging mechanisms until they have been connected (i.e., the virtual memory disk address mapping has been set up).


When a user process is created, its virtual memory space is created, initialized, and copied to disk. When the process is run, the working set is in main memory, and pages are transferred back and forth between core and drum. A page may be forced to disk because the drum gets full, or the process becomes dormant again. If a page is forced to disk, all of the updated virtual memory pages are forced to disk at the same time. This mechanism is required by the "consistency" rule in the EMAS system. Therefore, a suitable restart copy of the virtual memory of a process (which includes the files) is provided on the disk. The problem of inconsistencies between the state of the process and the states of its associated files is avoided because the filing system uses the resources provided by the paging system. The paging system assumes complete responsibility for maintaining a consistent backup copy of all the state variables of the process (including files). Consequently, if the consistency rule is always obeyed, automatic checkpointing is provided.

Incremental dumping can be done as part of an audit trail scheme [MASC71, RAND70, MASC73]. An audit trail gives only the changes made to files from given states onwards. These states are redefined regularly for reasons of efficiency (so that audit trail journals do not become too long). For example, in a system described by Wimbrow [WIMB71], files are dumped when they have to be reorganized because they have become disorderly as a result of the operations performed. In the CMIC system [GIOR76] all files are checkpointed regularly at moments when no user has the database open.

Another scheme designed for System R [LORI77], but not implemented, works as follows (see Figure 3): Each segment (which is similar to the concept of file) consists of a page table with pointers to the data pages. Associated with each pointer in the page table are three bits: a shadow bit, a cumulative shadow bit, and a long term shadow bit. When a segment is updated a backup and a current copy are maintained (in a way to be described later). For every page which is updated, the shadow bit and cumulative bit are set in the page table entry of the segment containing the page. Thus, for example, in Figure 3, pages 13, 6, and 21 are updated and therefore replaced (for reasons described later), in this case by pages 5, 3, and 19. The shadow bits and cumulative shadow bits for those pages are, therefore, set in the current page table. The cumulative shadow bits for pages 8, 21 (now replaced by 19), 10, and 4 were set already. When the current state of the segment is saved (i.e., replaces the old copy), the shadow bits are switched off, and the old pages of the backup version, having been replaced by the new versions from the current copy, are released. Checkpoints of all the segments are taken regularly. This involves the copying of all the page tables for which at least one cumulative shadow bit is switched on; the cumulative bits are copied into the long term bits and then switched off. A process P is started periodically. Process P is a system process which copies onto tape all of the pages of all segments in the systems for which the cumulative shadow bit is on at checkpoint time. The long term checkpoint bits are used to make sure that subsequent saves will not release the pages before P has copied them.

If in Figure 3 the transaction were closed in the situation shown, the contents of the current page table would be copied into the backup page table. If, subsequently, a checkpoint of the segment page table were taken, the cumulative shadow bits would be copied into the long term shadow bits, involving the setting of the long term shadow bits for pages 4, 10, 19, 3, 8, and 5. All of the cumulative shadow bits would then be switched off.
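The following much-simplified sketch (Python) mimics the bookkeeping just described for a single page table. The data structures are assumptions chosen for clarity; dumping is done synchronously here instead of by a separate process P, and the protective role of the long term bits is only indicated in comments, so this is an illustration of the idea rather than of the actual System R design.

    class PageTableEntry:
        def __init__(self, page):
            self.page = page
            self.shadow = 0            # s: page replaced since the segment was opened/saved
            self.cumulative = 0        # c: page replaced since the last checkpoint
            self.long_term = 0         # l: page still to be copied to tape by process P

    class SegmentPageTable:
        def __init__(self, pages):
            self.entries = [PageTableEntry(p) for p in pages]

        def update(self, slot, new_page):
            # The updated page is replaced by a newly allocated one; mark its entry.
            entry = self.entries[slot]
            entry.page = new_page
            entry.shadow = 1
            entry.cumulative = 1

        def save_segment(self):
            # Current copy replaces the backup; the shadow bits are switched off.
            for entry in self.entries:
                entry.shadow = 0

        def checkpoint(self, dump_to_tape):
            # Copy the cumulative bits into the long term bits, switch them off,
            # and dump the pages whose cumulative bit was on at checkpoint time.
            for entry in self.entries:
                if entry.cumulative:
                    entry.long_term = 1        # keeps the page from being released too early
                    entry.cumulative = 0
                    dump_to_tape(entry.page)
                    entry.long_term = 0        # in the real scheme P clears this after copying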
A special checkpoint file is used in HIVE [TAYL76] to record transactions. Information put in the checkpoint files can be recovered after a crash to restart those transactions. Individual transactions can also be reprocessed using this file.

AUDIT TRAIL

An audit trail records the sequence of actions performed on a file [BJOR75]. The audit trail contains information about the effects of the operations, the times and dates at which the operations occurred, and the identification codes of the user (or user programs) issuing the operation.
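A sketch of what such a trail can look like, and of the two uses described in the rest of this section (redoing operations against a reinstalled backup, and backing a process out by scanning the trail backwards), is given below in Python. The record fields and the dictionary standing in for the database are illustrative assumptions only.

    import time

    def append_record(trail, user, file_name, operation, old_value, new_value):
        # One audit trail entry: the effect of the operation, when it occurred,
        # and the identification code of the user (or user program) issuing it.
        trail.append({"time": time.time(), "user": user, "file": file_name,
                      "op": operation, "old": old_value, "new": new_value})

    def redo(trail, database):
        # Crash recovery: reapply the recorded effects to a reinstalled backup version.
        for record in trail:
            database[record["file"]] = record["new"]

    def back_out(trail, database, user):
        # Backing out: process the trail backwards and restore the old values
        # of the files affected by the given user's process.
        for record in reversed(trail):
            if record["user"] == user:
                database[record["file"]] = record["old"]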


FIGURE 3. The implementation of segments in System R. (The diagram shows a segment Sp with a current page table and a backup page table, each entry carrying a shadow bit (s.b.), a cumulative shadow bit (c.s.b.) and a long term shadow bit (l.t.s.b.), a MASTER table, the data pages, and a bit map.)

Audit trails can be used for several different purposes, such as:
• Crash recovery. If backup versions of files are reinstalled, an audit trail can be used to perform operations on them, thereby restoring their states at the time of the crash [CURT77].
• Backing out. If a system crashes (without damaging secondary storage) the files affected by the processes running during the failure can be restored to their states before those processes started. The audit trail can be processed backwards for backing out. Also, a single transaction (or job) can be backed out in case a deadlock occurs or the transaction fails. The data affected by the transaction can be restored to their state before the transaction (or job) was started.
• Certifying system integrity. The audit trail can verify that rules and policies dictated by laws, business agreements, and the like, are being followed [BJOR75]. Bjork has concluded that audit trails will be the major integrity tools for shared data usage beginning in the late 1970s. However, there are reasons to believe that other techniques will also be used. For example, the techniques of differential files and incremental dumping could provide the same facilities, although in a more complicated way.


A very extensive description of the use of audit trails for crash recovery and backing out has been given by Gray [GRAY77].

Traditional recovery techniques for filing systems [FRAS69, WILK75] may be insufficient to prevent the loss of the changes caused by the most recent operations performed in the filing system. The usual method, incremental dumping, checkpoints the files at regular intervals, but operations performed on files after the last checkpoint will be lost if a crash occurs. This does not matter in many operating systems because jobs can be resubmitted or operations can be redone. However, in systems where updates are made online from different sources, such as in banking or airline reservations [MASC71, RAND70, WIMB71, TONI75], this method may be unacceptable: one cannot afford to lose any update should such systems fail.

Another reason that traditional recovery techniques for filing systems may be insufficient is that data management systems are physically organized very differently than filing systems (some implementations of relational database systems might be said to be exceptions to this general rule). The difference in design parameters would make a scheme such as Fraser's unsuitable for most data structures used in existing database management systems. In systems like these an audit trail can provide a solution. Before a transaction is performed in the database, it is recorded on the auditing tape; this procedure is carried out without buffers [WIMB71] or by implementing a "log tape write ahead protocol" [GRAY77] to protect against crashes that destroy the mainstore contents.

Buffers (see Figure 4) may lead to inconsistencies between the database and the audit trail [GIOR76]. If the buffers are lost after a crash the database will, in general, be in an inconsistent state, and the audit trail will be incomplete; so it cannot be used to restore the correct state (as illustrated in Figure 4). (However, another possibility would be to salvage the buffers from main storage after a system crash, thus making possible the proper closing of the audit trail tape. This has been tried, not always with success, in IMS [IBM] systems using core memory and in the CMIC system [GIOR76]. The buffers cannot be salvaged if the contents of main storage are lost after the crash.) The log tape write ahead protocol avoids this problem.

FIGURE 4. A general database system using an audit trail. (The diagram shows user processes operating on the current database, which consists of the database on secondary storage plus the database buffers, and on the current audit trail. The current database is always consistent with the current audit trail; however, the database on secondary storage is, in general, not consistent with the audit trail on secondary storage.)

The audit trail can be used to back out a process which may have interacted with a second process in such a way that the second process will have to be backed out. The audit trail can then be used to back out that second process which may have interacted with a third process, and so on. Thus, using an audit trail to back out unfinished transactions performed by interacting processes can lead to a "domino" effect [RAND75, GIOR76, CURT77].

A locking scheme, such as the one used in System R [ASTR76] and many commercially available systems [CURT77], can be used to avoid these problems by making these interactions impossible. This solution can only be applied if these interactions are not necessary (i.e., they are accidental rather than deliberate). The locking scheme will make it possible to use an audit trail (or a "log" as it is called in many systems) to undo partially finished transactions.

Special checkpoints, made at moments when no user is active [GIOR76], have been proposed, since it is not possible to back up past such checkpoints, and thus the domino effect would be halted at these points. However, such occasions occur too infrequently in busy systems for this scheme to be helpful; frequent forcing of all users to become inactive would be impractical.

Another form of audit trails appears in a system described by Lampson and Sturgis [LAMP76], under the name of intention lists. An intention list specifies the operations to be performed by a processor.


A processor, which is a node in a network, may receive an intention list, containing the specifications of the operations to be performed on its local database. Intention lists, like the audit trail used in the CMIC system [GIOR76], can be reprocessed without backing out the interrupted process. Intention lists, once received and accepted, cannot be lost unless a catastrophe occurs, such as a head crash. They are safeguarded because they are stored on disk at a fixed place and are not altered when processed. Unless the processor malfunctions, the operations specified in the intention list will always be done.

Whereas audit trails record completed database updates, intention lists contain operations not yet performed (although physically the audit trail may be written first, as is the case when log tape write ahead is used). In other words, issuing an intention list is an event that could be registered in the audit trail. Processing the intention list is similar to processing an audit trail during crash recovery or backing out.
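A small sketch of the idea, with an assumed JSON encoding and operations that simply state final values (so that reprocessing after an interruption is harmless), is given below in Python; it is an illustration of the principle, not Lampson and Sturgis's actual protocol.

    import json
    import os

    def submit_intention_list(path, operations):
        # The whole list is put on disk at a fixed place before anything is done;
        # it is not altered while it is being processed.
        with open(path, "w") as f:
            json.dump(operations, f)
            f.flush()
            os.fsync(f.fileno())

    def process_intention_list(path, database):
        # Carry out every listed operation. Because each operation only states a
        # final value, the list can simply be reprocessed after an interruption,
        # without backing out the interrupted run.
        with open(path) as f:
            for op in json.load(f):
                database[op["file"]] = op["value"]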
DIFFERENTIAL FILES

Under the differential file (also called "change set") scheme the main files are kept unchanged until reorganization.


All changes that would be made to a main file as a result of transactions performed are registered in a differential file. The differential file will always be searched first when data is to be retrieved. Data not found in the differential file is retrieved from the main database. The most recent entry for a given record in the differential file must always be retrieved.

Severance and Lohman [SEVE76] fully describe the technique, and also an efficient hashing method to implement it (see also Figure 5). They use a small associative memory, in the form of a bit map accessed by the hashing scheme, to reduce the probability of making an unnecessary search in the differential file. The database system checks the bit map (see Figure 5) to see if the bits for a record are set or not before accessing that record. If the bits are set the record is probably in the differential file. It is shown analytically how to keep the probability of a filtering error low; this error occurs only when the bit map suggests wrongly that a record is in the differential file. The hashing function maps the record address onto a number of bits in the bit map (see Figure 5). The bits for a particular record may be set because each of them occurs in at least one set of bits associated with another record in the differential file.

Severance and Lohman [SEVE76] also describe the advantages of differential files for recovery, integrity, the implementation of incremental dumping schemes, and the simplicity of software.

FIGURE 5. A differential file technique using a hashing scheme. (The diagram shows the database system operating on a database that consists of a read-only main database and a read/write differential file, together with a bit map; the hashing function maps a record r onto a bit pattern, and the bit map suggests that record r is in the differential file because the bits in the pattern produced by the hashing function are all set in the bit map.)
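The sketch below (Python) shows the shape of such a scheme: updates go only to the differential file, a hashed bit map filters out most needless searches of it, and reorganization merges the two. The hash construction, map size and dictionary representation are assumptions made for the example, not the design analyzed by Severance and Lohman.

    import hashlib

    class DifferentialFile:
        def __init__(self, main_db, map_bits=1024, num_hashes=3):
            self.main_db = main_db              # read-only main database (a dict here)
            self.diff = {}                      # read/write differential file
            self.bitmap = [0] * map_bits
            self.map_bits = map_bits
            self.num_hashes = num_hashes

        def _bits(self, record_key):
            # The hashing function maps a record address onto a number of bits.
            digest = hashlib.sha256(str(record_key).encode()).digest()
            return [int.from_bytes(digest[4 * i:4 * i + 4], "big") % self.map_bits
                    for i in range(self.num_hashes)]

        def update(self, record_key, value):
            self.diff[record_key] = value
            for b in self._bits(record_key):
                self.bitmap[b] = 1

        def read(self, record_key):
            # Search the differential file only if all its bits are set; a filtering
            # error merely causes a harmless extra search of the differential file.
            if all(self.bitmap[b] for b in self._bits(record_key)):
                if record_key in self.diff:
                    return self.diff[record_key]
            return self.main_db.get(record_key)

        def reorganize(self):
            # Merge the differential file into the main files and empty it.
            self.main_db.update(self.diff)
            self.diff.clear()
            self.bitmap = [0] * self.map_bits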


Another advantage claimed is the possibility of performing queries which do not need the exact values of all files; such queries access a suitable (current) view of the data without locking out the update transactions.

The disadvantages of the approach using differential files are [LORI77]:
• An access to a data element appears slow: if an initial search of the differential file reveals that the data element is not among the modifications, the required element must be fetched from the database. However, Severance and Lohman show that this problem can be almost completely overcome with hashing for many systems and applications; they also show how to construct a good hashing scheme for particular systems.
• Eventually the differential file must be merged with the main database--an operation which can be slow. This will certainly be a big problem if the system needs to be available without interruption.
• Since an update can affect an element which has already been modified, the differential files must be suitably organized. A hashing scheme [SEVE76, RAPP75] ameliorates this problem.

Differential files are, for example, used in the VADIS system, where they are called MODFILEs [RAPP75]. For every file in the system there is a MODFILE. The system has been developed to facilitate recovery after power failure. Completed transactions will never have to be undone after a power failure. Transactions not completed before the failure are not undone explicitly; but their effects are ignored using the MODFILEs and a TRNSDONE file as follows (a sketch of this rule is given after the list):
1) Each entry in a MODFILE has a header with: record type, transaction code, pointer to previous modification of the same record, time, transaction number and some other identification codes.
2) There is a TRNSDONE file which contains the numbers of the completed transactions.
3) For every record fetched from a MODFILE the transaction number is compared with the TRNSDONE numbers and the current transaction number.
4) If the number is neither in TRNSDONE nor is the number of the current transaction, then the previous version of the record is taken (the one pointed to by the retrieved entry from the MODFILE, or in the main file), because this means that the record was put in the MODFILE by an uncompleted transaction before a failure.
5) MODFILE entries and system variables are forced out to disk at the end of each transaction. Thus the entire MODFILE mechanism is checkpointed after each completed transaction.
after power failure. Completed transactions form the implementation of a differential
will never have to be undone after a power file. The main files are kept on a separate
failure. Transactions not completed before device which is never written on except
the failure are not undone explicitly; but during reorganization. These files can be
their effects are ignored using the MOD- duplicated on tapes for recovery. Check-
FILEs and a TRNSDONE file as follows: pointing is carried out by saving the add,
1) Each entry in a MODFILE has a delete, and change sets.
header with: record type, transaction
code, pointer to previous modification
BACKUP AND CURRENT VERSIONS
of the same record, time, transaction
number and some other identification Backup versions of files or databases can
codes. be kept in order to make possible the res-
2) There is a TRNSDONE file which toration of the files to a previous state.
contains the numbers of the com- For example, many file-editors produce
pleted transactions. a complete new version of a file while a user
3) For every record fetched from a is editing a file. The original file remains
MODFILE the transaction number is unchanged during the edit-session. The
compared with the TRNSDONE new version is a complete new copy; it is
numbers and the current transaction not achieved, for example, by using a dif-
number. ferential file. If a user notices a blunder

Computing Surveys, Vol 10, No 2, June 1978


Incremental dumping (of current versions) can be used as a utility to maintain a backup version of a filing system or database. Copies of altered files can be used to update a backup version of the whole system or database. This is done in the Cambridge filing system [FRAS69], where two processes are used: one makes incremental dumps and the other creates versions of the system.

Similarly, complete copies of the database can be made regularly in order to make possible the restoration of the database to an earlier state. For example, the original version of MULTICS [DALE65] prepared a weekly dump; it included all files which had been used within the last several weeks plus all the system files.

An optimized version of the backup/current versions technique has been designed for System R [LORI77] and is used in other systems for segments (synonymous to the notion of files). Figure 3 illustrates this mechanism. (Figure 3 is not complete; for example, the MASTER is duplicated. However, sufficient details are shown to illustrate the mechanism discussed here.) For each segment a page table is used to locate the data pages. There are two copies of each page table, which are identical after a SAVE.SEGMENT. If a page of a segment is altered for the first time after a SAVE.SEGMENT or OPEN.SEGMENT operation, its new value is put in a newly allocated page, and the current version of the page table is updated to point to the new page.

For example, in Figure 3, page 13 in segment Sp is updated. The new value of the page is therefore, in this case, placed in page 5, and the current page table is made to point to page 5 instead of page 13. The backup version remains unaltered. When a SAVE.SEGMENT is issued the buffers are forced to disk, modified pages are released (i.e., the old versions), and the current version is copied into the backup version. So, for example in Figure 3, pages 13, 6, and 21 will be released.

This releasing of pages causes the bit map used to indicate the free pages to be updated so that bits 13, 6, and 21 are reset, and 5, 3 and 19 are set. Two copies of the bit map are maintained. A MASTER table points to the current map. The current bit map always reflects a checkpoint state of the system (i.e., all of the pages pointed to by the backup versions of the page tables). The SAVE.SEGMENT operations can be done at random moments in time.
SAVE.SEGMENT or OPEN.SEGMENT failure will be lost; these transactions will
operation, its new value is put in a newly have to be restarted.
allocated page, and the current version of Physically separate backup and current
the page table is updated to point to the versions of the page table provide backup
new page. and current versions of the segment under
For example, in Figure 3, page 13 in seg- this scheme. The logically different versions
ment Sp is updated. The new value of the of a segment overlap (physically) in their
page is therefore, in this case, placed in implementation where they are the same.
page 5, and the current page table is made
to point to page 5 instead of page 13. The MULTIPLE COPIES
backup version remains unaltered. When a
SAVE.SEGMENT is issued the buffers are The technique of multiple copies includes
forced to disk, modified pages are released two techniques:
(i.e. the old versions), and the current ver- • Keeping an odd number of copies of
sion is copied into the backup version. So, the data. If a majority of the copies
for example in Figure 3, pages 13, 6, and 21 have the same value, then that value is
will be released. taken as the correct one. This tech-
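The following is a minimal sketch of this shadow-page idea, assuming a simplified in-memory model; the operation names echo SAVE.SEGMENT, but the code is illustrative and is not System R's actual mechanism.

```python
# Illustrative sketch of a segment with current and backup page tables.
# Writes after a save go to newly allocated pages; SAVE.SEGMENT promotes the
# current table to become the new backup and frees the superseded pages.

class Segment:
    def __init__(self, pages):
        self.pages = dict(pages)                    # physical page number -> contents
        self.current = {n: n for n in pages}        # logical page -> physical page
        self.backup = dict(self.current)
        self.next_free = max(pages) + 1

    def write(self, logical_page, value):
        if self.current[logical_page] == self.backup[logical_page]:
            new_page = self.next_free               # first change since the last save:
            self.next_free += 1                     # allocate a fresh physical page
            self.current[logical_page] = new_page
        self.pages[self.current[logical_page]] = value

    def save_segment(self):
        released = [old for lp, old in self.backup.items()
                    if old != self.current[lp]]     # superseded backup pages
        self.backup = dict(self.current)            # current version becomes the checkpoint
        for page in released:
            del self.pages[page]

    def restore_backup(self):
        self.current = dict(self.backup)            # crash recovery: forget unsaved writes

seg = Segment({13: "old"})
seg.write(13, "new")                # goes to a fresh page; the backup still sees "old"
seg.restore_backup()
print(seg.pages[seg.current[13]])   # -> "old"
```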
This releasing of pages causes the bit map used to indicate the free pages to be updated. In Figure 3, the bit map will have to be updated so that bits 13, 6, and 21 are reset, and 5, 3, and 19 are set. Two copies of the bit map are maintained. A MASTER table points to the current map. The current bit map always reflects a checkpoint state of the system (i.e., all of the pages pointed to by the backup versions of the page tables). The SAVE.SEGMENT operations can be done at random moments in time.

This current and backup versions recovery technique is used in System R for recovery from system failures to restore a checkpoint state. The log (audit trail) is then used to redo transactions completed before the failure. In addition the log is used to back out transactions which were incomplete at the checkpoint time. The log is also used in System R to back out transactions when a transaction fails or a deadlock occurs.
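A sketch of the restart logic that such a log makes possible is given below, under the simplifying assumption that every log record carries both the old and the new value of the item it touched; this illustrates redo/undo in general and is not System R's algorithm.

```python
# Illustrative restart sketch: redo the updates of transactions that completed,
# and undo (back out) the updates of transactions left incomplete.
# Each log record is assumed to carry both the old and the new value.

def restart(database, log, committed):
    for rec in log:                                   # forward pass: redo winners
        if rec["txn"] in committed:
            database[rec["key"]] = rec["new"]
    for rec in reversed(log):                         # backward pass: undo losers
        if rec["txn"] not in committed:
            database[rec["key"]] = rec["old"]
    return database

db = {"x": 1, "y": 1}                                 # state restored from the checkpoint
log = [{"txn": "T1", "key": "x", "old": 1, "new": 2},
       {"txn": "T2", "key": "y", "old": 1, "new": 9}]
print(restart(db, log, committed={"T1"}))             # -> {'x': 2, 'y': 1}
```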
The backup/current version technique described above is not designed in System R for recovery after a failure where secondary storage is destroyed; incremental dumping is used for this purpose instead. After the SAVE.SEGMENT operation the MASTER table is made to point to the up-to-date bit map. This scheme provides the possibility of restoring a segment to its last consistent state (held in the backup version) and of restoring consistency after a system failure. The operations of unfinished transactions, performed before the failure, will be lost; these transactions will have to be restarted.

Physically separate backup and current versions of the page table provide backup and current versions of the segment under this scheme. The logically different versions of a segment overlap (physically) in their implementation where they are the same.

MULTIPLE COPIES

The technique of multiple copies includes two techniques:

• Keeping an odd number of copies of the data. If a majority of the copies have the same value, then that value is taken as the correct one. This technique is then called majority voting.

• Holding two copies with flags to indicate "update-in-progress." An inconsistent copy (or suspicious copy) is always recognizably inconsistent, because of the flags used; if the system crashes during update the flag will still be set after the crash.
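A minimal sketch of the two techniques just listed follows; the in-memory flags stand in for flags that would really be recorded on stable storage, and all names are illustrative.

```python
# Illustrative sketch of two copies with "update-in-progress" flags, plus a
# majority-vote helper for the odd-number-of-copies variant described above.

class TwoCopies:
    def __init__(self, value):
        self.copies = [value, value]
        self.flags = [False, False]          # True = update in progress (suspicious)

    def update(self, value):
        for i in (0, 1):                     # the copies are updated one after the other
            self.flags[i] = True             # a crash here leaves copy i marked suspicious
            self.copies[i] = value
            self.flags[i] = False

    def read_after_restart(self):
        for i in (0, 1):
            if not self.flags[i]:
                return self.copies[i]        # any unflagged copy is consistent
        raise RuntimeError("both copies suspicious -- cannot happen with this update order")

def majority_vote(copies):
    # With an odd number of copies, the value held by the majority is taken as correct.
    return max(set(copies), key=copies.count)

pair = TwoCopies("v1")
pair.update("v2")
print(pair.read_after_restart())             # -> "v2"
print(majority_vote(["a", "a", "b"]))        # -> "a"
```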


Except during update, the multiple copies must always have the same value. If the different copies are updated by the same processor then an "update-in-progress" flag (or "damage flag" [CURT77]), used if there are only two copies, provides crash resistance. A consistent copy can always be retrieved after a system restart; this copy will have either the value it had before the update in progress during the crash, or the new value. The inconsistent copy can always be recognized as such and discarded. Majority voting may also be used to detect incorrectly performed operations; this is especially useful if different processors update the different copies. Faulty processors can then be detected and ignored or disconnected.
The important difference between the multiple-copies technique and other techniques, such as backup/current version or careful replacement, is that with the multiple-copies technique the different copies always have the same value except during update of the actual physical data structure. Thus, if the two copies hold consistent values they must be equal. Backup or incremental dumping schemes could also be used to keep different copies that hold the same value except during a job or transaction. In these cases there is always a primary copy and a backup copy, which may hold different, but consistent, values from the database point of view. The technique of copying onto tape at the completion of each update is different from the multiple copies technique. The multiple copies must exist all the time; physically, they do not overlap, and they have equal status. Schemes based on different backup or archival versions are definitely not examples of the multiple copies technique.

Majority voting of data has been used extensively for space flight applications, such as in the space shuttle system where four computers are configured to receive the same input data and calculate the same outputs [SKLA76].

The technique of two copies with flags is used in recovery for segments in System R [LORI77]. A MASTER table is used to indicate which segments are open or closed and which bit map (two copies are held) is up-to-date (see Figure 3). Two copies of the MASTER table, both containing the same information, are kept to ensure that, if the system crashes while the MASTER table is being updated, only one copy will be left behind in an inconsistent state; the other copy has either the new state or the state the table was in before the update started. (Only one copy of the MASTER table is drawn in Figure 3, but two copies are used as described above.) The copy that is in an inconsistent state can always be identified.

Similarly, two copies of the MASTER directory in the filing system of GEORGE 3 are maintained [NEWE72], and two bits to make possible the distinction between a valid and invalid copy after a crash during update.

System HIVE [TAYL76] maintains two read-write versions for every file. This provides one of the characteristics of a cycle (see the section on salvation programs): the local effects can be undone as long as the cycle has not yet finished. During a cycle one of the two versions is updated. At the end of the cycle this version is copied onto the other version. The system knows which of the two versions is the one updated during a cycle. Crash resistance is therefore provided for individual files. Apart from this, a checksum is maintained for each version. This, generally, enables partially updated or corrupted files to be detected. In general two copies and two flags (bits) are sufficient to provide crash resistance. The flag indicates whether a copy is suspicious or not. If the two copies are updated immediately after each other, as in System R, then a copy is likely to be inconsistent if its flag is set. In system HIVE, however, the two copies are kept on separate storage devices, so the checksum allows detection of incorrectly performed operations. Although selective redundancy is not uncommon, system HIVE is one of the few existing systems in which more than one complete copy of every safeguarded file is maintained to provide crash resistance.

One copy is updated during a cycle, and the second copy is updated in pages which correspond to the changed pages of the first copy.

CAREFUL REPLACEMENT

The objective of careful replacement is to avoid updating data structures "in place." (Some bit or pointer must always be updated "in place" [NEWE72, VERH77a]; however, this feature will not be elaborated further in describing the careful replacement technique.) The update is performed on a copy of a component (record, page, disk-block), which replaces the original only if the update is successful; and the copy is kept until after the replacement is made successfully. The same is done for objects in the data structure which point to those objects. There are two instances of the data structure only during update; otherwise there is just one copy, which contains the current value.

This technique differs from differential files, which accumulate update requests for unaltered originals. However, careful replacement could be used to merge the main file with the differential file. With the careful replacement technique the two "virtual" copies are held only during an update (or perhaps within a recovery scope specified by the programmer [VERH77a]); this makes the update or sequences of updates as safe as possible by reducing the chance of being left with an inconsistent copy or mutually inconsistent files. The two instances of the data structure are referred to as two "virtual copies" because they overlap in identical objects. There are two different descriptors to access the two different instances.
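The following is a minimal sketch of careful replacement for a single component, using an atomic file rename as the "replace" step; the use of a temporary file and os.replace is a convenience of this sketch, not a description of any of the systems discussed here.

```python
# Illustrative sketch of careful replacement: the update is performed on a copy,
# and the original is replaced only after the updated copy is safely on disk.

import os, tempfile

def careful_update(path, update_fn):
    with open(path, "r") as f:
        contents = f.read()
    new_contents = update_fn(contents)            # the update is applied to a copy only
    directory = os.path.dirname(os.path.abspath(path))
    fd, temp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as f:
            f.write(new_contents)
            f.flush()
            os.fsync(f.fileno())                  # make sure the copy is durable first
        os.replace(temp_path, path)               # the "replacement": old or new, never half
    except BaseException:
        os.unlink(temp_path)                      # failed update: the original is untouched
        raise

# Usage (assuming "catalogue.dat" exists):
#   careful_update("catalogue.dat", lambda text: text.upper())
```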
This technique is fully explained by Gamble [GAMB73] for a filing system. Files consist of data pages pointed to by a tree of directory pages. A master directory points to each top directory page of the files. If a file is updated using careful replacement, it can always be restored to its state prior to update.

This approach avoids the three disadvantages of differential files [LORI77] mentioned above. However, careful replacement has its own disadvantages:

• The file or data structure must be tractable. For example, implementing a file as a list of linked pages could be prohibitively expensive. This expense arises because replacing a page requires updating the link in the page pointing to it. Careful replacement requires updating that link in a copy of the page. Thus, the change in one page can propagate replacements through the whole list. The constraints and requirements that careful replacement sets for the data structures are fully described elsewhere [VERH77a, VERH77b].

• Overhead costs are incurred in disk accesses. In GEORGE 3, where this technique is used for files, but not for the much more heavily used MASTER-directory, the measured overhead reported by Newell is surprisingly low. The method is also used in the MU5 system [GAMB73]. So the overhead can be a disadvantage which does not outweigh the advantages.

Files in the system described by Lampson and Sturgis [LAMP76] are updated using careful replacement during the processing of the intention lists. System R uses the basic idea to update segments.

The CMIC system also uses this technique [GIOR76]. The storage structure used is similar to B-trees [KNUT73]. If an insertion is made in a full track, two new tracks are obtained and the contents of the full track plus the new entry are put in these two new tracks (see Figure 6). The same is done for the index table which contains the pointers to the data tracks.

This method also uses the leaf-first rule, which states that copying of information to slower memory (e.g., from main storage to drum, or from drum to disk) is done in such a way that no descriptor or pointer can ever reference a block at a faster level of the device hierarchy. This ensures that the (sub-)structure on mass storage is always valid, for it will always be a valid and consistent tree. Every pointer or descriptor on mass storage always points to valid structures on that mass storage: no parts of that data structure will still be on faster memory.


FIGURE 6. The insertion of an entry A5 in a full track in a storage structure as used in CMIC, in four steps. (The free space stack contains the addresses of the free records.)

It also ensures that no data on mass storage is ever removed from the structure during update. Instead, replacement is used by using new tracks when necessary. Data structures, pointers, and descriptors in the tree structure on mass storage are always present; they are not removed and put back during updating.

The careful replacement technique is often used in filing systems using a hierarchy of devices by employing the leaf-first rule and the root-segment rule (see Figure 7).

FIGURE 7. The careful replacement technique in a hierarchy of devices. (The panels show the pages of a file as held on DISK, on DRUM, and in CORE, in three possible subsequent situations; Vi denotes a value of a data page.)


The root-segment rule states that if a data page is on a particular level of storage in the hierarchy, every directory page between it and the root of the file is on that level or a faster level (the root is the top directory in the tree of directory pages). These two rules mean that careful replacement is used at every level in the hierarchy. Were the contents of the main memory to be lost after a crash, the drum and disk would still have two copies of the file: the disk copy contains an old value, the drum copy contains a newer value of the file (some pages of the file are on drum and others on disk).

Figure 7 shows how updates on a main-memory version of a file are subsequently made to the copies of that file held on drum and disk using these two rules. The example in Figure 7 shows the pages of a file as held on disk, drum, and in core during processing, at three different subsequent moments in time: time 1, 2, and 3. At time 1, the copy on disk shows that the change made to page 8 and the addition of page 26 into directory page 3 have not yet been consolidated in the disk copy of the file. The copy in core shows that page 8 has been updated again, and that the update has not yet been consolidated on the drum. Similarly, at time 2 the copies on disk, drum, and core show that the update of page 8 and the addition of page 26, consolidated on drum at time 1, are now consolidated on disk. The update of page 8 in core at time 1 is now consolidated on drum. In the meantime other updates and additions have been made to the file in core. Finally, the copies at time 3 show yet more updates, and show how they are consolidated.

Although pages are updated "in place," this technique can be classified as being a careful replacement technique because updates are made in fast memory, and the replacement takes place on slow memory containing the files. It is a careful replacement technique because every valid tree structure is consistent. If the contents of both main memory and drum are lost, there will remain a consistent copy on disk. If the contents of main memory are lost, the updates which were started on the main memory copy but not yet consolidated in the drum copy will be lost. If the contents of the drum are also lost, the updates which were consolidated in the drum copy, but not yet in the disk copy, will be lost as well.
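A small sketch of the leaf-first rule follows, assuming a toy tree of pages held at a faster level; children are consolidated to the slower level before the directory page that points to them, so the slower level never contains a pointer to a page it does not yet hold. The representation is invented for illustration.

```python
# Illustrative sketch of leaf-first consolidation between two storage levels.
# Children are written to the slower level before the directory page that
# references them, so every pointer on the slower level stays valid.

def consolidate(page, faster, slower):
    for child in faster[page].get("children", []):
        consolidate(child, faster, slower)        # leaf-first: descend before writing
    slower[page] = faster[page]                   # a parent is written only after its children

faster_level = {                                  # e.g., the drum copy of a small file tree
    "root": {"children": ["dir3"], "kind": "directory"},
    "dir3": {"children": ["page8", "page26"], "kind": "directory"},
    "page8": {"kind": "data"},
    "page26": {"kind": "data"},
}
slower_level = {}                                 # e.g., the disk copy being brought up to date
consolidate("root", faster_level, slower_level)
print(list(slower_level))   # data pages appear before the directories that point to them
```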
A system using the leaf-first and root-segment rules has been described by Schwartz [SCHW73]. Files are trees of pages which are either index pages (i.e., directory pages) or data pages (as in Figure 7). A file descriptor is the root of the index table; a directory file contains the descriptors. In that system, and in CMIC too, the directory file is updated "in place." If absolute crash resistance were to be provided in this system, the multiple copies technique could be used for the directory, as is done in GEORGE 3 [NEWE72].

If the recovery is to be provided for transactions consisting of more than one update, replaced pages can be updated "in place." This technique has been used at Newcastle [VERH77a] to provide recovery for files within user-defined scopes. Scopes can be nested, as with "spheres of control" [BJOR72]. These scopes are termed "recovery blocks" [RAND75]. A recovery block consists of an acceptance test and a set of alternative algorithms. The first algorithm is executed first, followed by the evaluation of the acceptance test. If an error occurs during the execution of an alternative, or the acceptance test is not done successfully (either it fails or it is never done), all the operations performed so far are undone, and the next alternative is invoked, if possible. (A power failure still causes all the operations to be undone [VERH77a], however.)

The same is now done for the second alternative. If all the alternatives of a recovery block are exhausted, an error is generated in the outer recovery block, or the program fails if there is no outer recovery block. A copy of a file is maintained for every level of nesting in which the file has been operated upon; these copies contain identical nonupdated pages. The current value of the file is in the latest copy. Recovery means restoring the copy as it was before the current recovery block was entered. On exiting a recovery block successfully, the updated part is substituted in the original that existed just before the recovery block was entered.
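The following is a minimal sketch of the recovery-block structure just described: alternatives are tried in turn, the saved state is restored before each attempt, and exhaustion signals an error to the enclosing block. The exception-based signalling and the example alternatives are this sketch's own choices, not features of [RAND75] or of the Newcastle implementation.

```python
# Illustrative sketch of a recovery block: run an alternative, check the
# acceptance test, and on failure restore the saved state and try the next
# alternative; exhaustion raises an error for the outer (enclosing) block.

import copy

def recovery_block(state, acceptance_test, alternatives):
    checkpoint = copy.deepcopy(state)              # copy kept for the duration of the block
    for alternative in alternatives:
        try:
            candidate = alternative(copy.deepcopy(checkpoint))
            if acceptance_test(candidate):
                return candidate                   # successful exit: the updates are kept
        except Exception:
            pass                                   # an error counts as a failed alternative
        # state is implicitly restored: the next attempt starts from the checkpoint
    raise RuntimeError("all alternatives exhausted")  # error for the outer recovery block

def primary(state):
    state["balance"] -= 150                        # may drive the balance negative
    return state

def alternate(state):
    state["balance"] -= min(150, state["balance"])
    return state

result = recovery_block({"balance": 100},
                        acceptance_test=lambda s: s["balance"] >= 0,
                        alternatives=[primary, alternate])
print(result)   # -> {'balance': 0}, produced by the alternate after the primary failed the test
```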


DIRECTIONS FOR FUTURE RESEARCH AND RELATED WORK

Recovery among interacting processes has not been dealt with explicitly in this survey. These problems can be significant. For example, the designers of System R encountered major difficulties when trying to use the shadow mechanism (as in the backup/current version technique described previously) for recovery in a multiuser environment. The scheme is now used only for system recovery after a total system failure. Some of the problems of providing recovery for interacting processes have been described elsewhere [RAND75, GIOR76, ASTR76, RAND78, CURT77]. This topic is a subject of ongoing research. A major difficulty is that processes being backed out past interactions performed may require other processes to be backed out, creating a "domino" effect [RAND75].

The problems with the domino effect could be explained by generalizing Russell's diagrams [Russ77] for producer-consumer systems. Vertical lines in such diagrams are then used to denote the progress made by processes in time. An arrow from one process A to another process B occurs in such diagrams if A last updated data that is read by B. (See Figure 8.) This arrow now denotes that effectively a message was sent by A when it updated the data and was received by B when it read that data. Each time the data is read a message has been sent and received.

As shown by Randell [RAND75], it is not known which processes to checkpoint at which moments in time unless this diagram is known in advance. Russell [Russ77] makes use of that knowledge for his producer and consumer systems. The diagram is not known in advance in a database system, since it is not known which operations (programs), sharing the same database, will be performed in parallel. Even if it were possible to predict which programs will run concurrently, it might not be possible to predict which interactions will occur.

FIGURE 8. A generalization of Russell's diagrams [Russ77]. (Vertical lines show the progress in time of Processes A, B, and C; the events shown are a read of x1, an update of x2, and two later reads of x2.)

It seems that in our diagram (Figure 8) a process could be checkpointed each time it has updated some global data, and before some other process can read that data. This could be done using techniques similar to cacheing as discussed by Randell [RAND75]. However, the system still has to keep track of the interactions that have taken place (i.e., see Figure 8) to determine which other processes to back out to which specific points in time.

The cacheing technique could be used to reset a process; a recovery technique can be used to back out the database; and the same or some "companion" technique could be used for each process "to put messages back." This latter technique would have to provide the facility to reread the same value without restoring the shared data, since that may have undesired effects on other processes.

The main problems are to keep track of the diagram and, given the diagram, to decide upon which processes to back out to which points in the event of a process failure. These problems are at present being addressed at the University of Newcastle upon Tyne.
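A sketch of the bookkeeping this implies is given below: given the recorded interactions (an arrow from one process to another whenever the second read data last updated by the first), the set of processes to back out is the transitive closure from the failed process. The timing of checkpoints, which the text above identifies as the harder part of the problem, is deliberately ignored in this illustration.

```python
# Illustrative sketch: which processes must be backed out when one fails?
# interactions[p] is the set of processes that later read data last updated by p
# (the arrows of Figure 8).  Backing out p invalidates what those readers saw,
# so they must be backed out too -- the "domino" effect is the transitive closure.

def processes_to_back_out(failed, interactions):
    to_undo, frontier = {failed}, [failed]
    while frontier:
        p = frontier.pop()
        for reader in interactions.get(p, set()):
            if reader not in to_undo:
                to_undo.add(reader)
                frontier.append(reader)
    return to_undo

interactions = {"A": {"B", "C"}, "B": set(), "C": set()}   # A updated x2; B and C read it
print(sorted(processes_to_back_out("A", interactions)))    # -> ['A', 'B', 'C']
print(sorted(processes_to_back_out("C", interactions)))    # -> ['C']
```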
There are two approaches to the problem at present:

• Prevent the interactions. Preventing interactions is feasible only when the interactions are not required.


In this case processes can be executed one at a time, in some sequence which can be implemented by explicit locking [GRAY76] or implicit locking [BANA77]. The explicit locking schemes are widely used in database systems, most of which abandoned implicit locking a few years ago.

A user or a program can request various modes of access to an object; these modes include exclusive or shared access. Some access modes are incompatible: if one user has access to an object in a certain mode, then another user is refused access to the same object in an incompatible mode. For example, multiple reads of a file are allowed, but writing precludes other file operations. Such locking schemes prevent unwanted interactions, while allowing shared access when it will produce no interference among programs (a compatibility check of this kind is sketched after this list).

Implicit locking occurs in monitors [HOAR74], which are used to implement resource allocation algorithms. Programs seeking to acquire or release resources invoke monitor procedure calls. The monitor locking scheme ensures that only one process at a time will update an object; hence uniprocess recovery schemes are possible. Other processes are prevented from updating that object until the process which is updating that object concludes a unit of recovery, such as a transaction [GRAY77] or a recovery block [BANA77]. However, locking by using monitors (or "serially reusable programs" as they are called in OS/360) has been shown to be disastrous in database systems for high concurrency.

A similar way in which the restrictions can be enforced is by using a capability architecture [GRAY70, DENN76] to implement a high degree of error confinement [LIND76]. Capabilities permit sharing, but will, when used wisely, limit the number of processes to be backed out. A capability is a protected key or password for using storage objects and procedures. Capabilities can, therefore, in fact be a means of implementing locking. The use of capabilities can prevent unwanted interactions like locking schemes, and therefore limit the risk that errors will lead to much damage before being detected.

• Synchronize the processes with respect to recovery. To avoid arbitrary backing out among processes that must interact, severe constraints are needed. Randell suggests a "conversation" [RAND75] which is a recovery block with locks that make all processes enter and exit together. Processes in a conversation may not interact with those outside. In case of failure, processes need to be backed out only to the beginning of their conversation. "Recovery lines" are needed to back out all the processes [RAND78]. A recovery line is a set of consistent checkpoints.
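The compatibility check referred to in the first approach above can be sketched as follows; the two-mode matrix is the simplest possible one, whereas real schemes such as [GRAY76] use a richer set of modes, and all names here are illustrative.

```python
# Illustrative sketch of lock-mode compatibility: shared readers may coexist,
# but an exclusive (write) lock excludes every other request on the same object.

COMPATIBLE = {("shared", "shared"): True,
              ("shared", "exclusive"): False,
              ("exclusive", "shared"): False,
              ("exclusive", "exclusive"): False}

class LockTable:
    def __init__(self):
        self.held = {}                              # object -> list of (owner, mode)

    def request(self, owner, obj, mode):
        for _, held_mode in self.held.get(obj, []):
            if not COMPATIBLE[(held_mode, mode)]:
                return False                        # incompatible mode: access refused
        self.held.setdefault(obj, []).append((owner, mode))
        return True

locks = LockTable()
print(locks.request("P1", "file-f", "shared"))      # -> True
print(locks.request("P2", "file-f", "shared"))      # -> True  (multiple readers allowed)
print(locks.request("P3", "file-f", "exclusive"))   # -> False (writing precludes other use)
```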
These approaches are designed to overcome the domino effect of backing out interacting processes. At worst, all the processes that have interacted with a failing process will have to be backed out to their initial states. The two approaches above force the creation of recovery points (i.e., checkpoints) in such a way that recovery lines are formed to minimize backing out in case a failure occurs.

The first approach is well understood, and many systems use successful schemes based on it. One efficient way to implement the first approach is used in IMS. The system queues all modified records between checkpoints; this is similar to constructing an intention list [LAMP76]. Thus, if a program updates records, other programs are prevented from accessing these updated records until that program terminates or issues a checkpoint. The program can be backed out to its last checkpoint, without undoing effects to other programs, before it terminates.
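A sketch of the intention-list idea is given below; it mirrors the description above only loosely (updates are queued and installed at a checkpoint, and backing out simply discards the queue), and it is not IMS's or [LAMP76]'s actual mechanism.

```python
# Illustrative sketch of an intention list: updates made between checkpoints are
# queued (and the touched records held against other programs); at a checkpoint
# the queue is applied, while backing out merely discards it.

class Program:
    def __init__(self, database, locked):
        self.database = database
        self.locked = locked                  # records other programs may not access
        self.intentions = []                  # queued (key, value) pairs

    def update(self, key, value):
        self.locked.add(key)
        self.intentions.append((key, value))  # nothing is written to the database yet

    def checkpoint(self):
        for key, value in self.intentions:    # install the queued updates
            self.database[key] = value
        self.intentions.clear()
        self.locked.clear()                   # other programs may access the records again

    def back_out(self):
        self.intentions.clear()               # no database writes ever happened
        self.locked.clear()

db, locked = {"r1": 10}, set()
prog = Program(db, locked)
prog.update("r1", 99)
prog.back_out()
print(db)          # -> {'r1': 10}: backing out required no undoing in the database itself
```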
The second approach, however, is not yet well understood. If the first approach has not been used, it is generally, in existing systems [CURT77], up to the operator or data administrator to analyze the situation and take appropriate action.

Whenever the first solution is not possible, many systems accept the possibility of the domino effect. These systems must fall back on checkpointing the total system at regular intervals [GIOR76], or synchronizing the checkpoints of the various programs so that rollback can be performed to a common point in time [CURT77]. This solution may be too cumbersome to be acceptable in many distributed systems.

This is one of the major problems of integrity in distributed database systems. Also, the implementation of the first approach (i.e., prevent the interactions) in distributed systems, such as systems using intelligent terminals for example, is not trivial. Gray [GRAY77] describes a "recovery protocol" which makes recovery possible by implementing two important requirements: the decision to commit the operations performed during the transaction on data at different nodes in the distributed data base system (i.e., the successful completion of the transaction) has been centralized in a single place (i.e., the decision is taken by one processor).

This survey has discussed recovery by state restoration. Whenever a failure occurs, a state which is believed to be error-free is restored before attempting to continue further operation. Other recovery techniques are still under investigation, for example:

• Error diagnosis and repair. Instead of restoring a state when an error is detected, an attempt could be made to identify and repair the cause of the error [RAND78]. This may be very difficult, not only because different errors may be caused by one fault, but different faults may cause the same error.

• Compensation. Rather than undoing operations by state restoration, it could be attempted to nullify the impacts of these operations by compensating for their effects. This can be achieved by providing supplementary corrective information [DAVI72, RAND77]. For example, if a database had been updated to indicate that an employee has been given a $1000 wage increase, instead of an intended $100 increase, then a wage decrease of $900 could be given as compensation. This could be cheaper than backing out the transaction on the database.
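The compensation example above can be sketched as follows; the record of what was applied is an assumption of this illustration, since compensation presupposes that enough information about the erroneous operation is available.

```python
# Illustrative sketch of compensation: the erroneous $1000 increase is not
# backed out; instead a compensating decrease of $900 yields the intended $100.

def apply_increase(database, employee, amount):
    database[employee] += amount
    return {"employee": employee, "applied": amount}

def compensate(database, record, intended_amount):
    correction = intended_amount - record["applied"]      # e.g., 100 - 1000 = -900
    database[record["employee"]] += correction
    return correction

payroll = {"smith": 20000}
record = apply_increase(payroll, "smith", 1000)           # the erroneous update
print(compensate(payroll, record, intended_amount=100))   # -> -900
print(payroll["smith"])                                   # -> 20100, as originally intended
```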
Little has been said here about the costs of recovery itself. A recovery procedure may put the data structures back into consistent states; or it may restore the data to a previously existing state. There are few papers which have examined the most common failures or the degrees of recovery suitable for different environments. This survey indicates guidelines rather than detailed analyses of some degrees of recovery.

The cost of the error detection scheme must also be taken into account. There have been few papers reporting systematic approaches to software error detection. Yet, error detection is absolutely essential to make recovery techniques useful. Many system structures, system concepts, and language concepts have been developed to make the inclusion of tests for error detection easier. They provide a framework which allows the user or programmer to embed such tests in a structured manner. Examples of this are capability systems [DENN76], recovery blocks [RAND75], and security and access control as provided in recently designed languages. However, these concepts only provide a framework in which to embed tests, rather than the actual error detection mechanisms themselves. Error detection schemes that employ tests could be designed using two approaches: 1) Test if algorithms perform completely according to their specifications; and 2) Distinguish certain types of errors, and test for their presence (or absence).

Testing the validity of all of the input data and parameters of procedures is an example of a systematic error detection scheme based on the second approach. However, few systematic ways in which tests can be constructed, using either approach, are reported in the literature [RAND78]; little is known about the costs of error detection schemes. Error detection in system software is generally done in an ad hoc fashion using the second approach. Some initial work on the construction of (run time) tests, using the first approach, has been described elsewhere [VERH77b].


The techniques described in this survey provide recovery for files based on secondary storage. Verhofstad has some preliminary extensions to data types more complex than files, and recovery procedures for different levels of a system [VERH77b]. Randell has discussed "nested recovery" using recovery blocks [RAND75]. The relation to careful replacement has been discussed by Verhofstad [VERH77b]. Beyond such preliminary works, little is published about the systematic recovery of program or data objects.

SUMMARY AND CONCLUSIONS

This paper has described many of the techniques used to implement backing out, crash recovery, crash resistance, and consistency. These techniques can be used in different environments, for different purposes; they can complement each other. Figure 9 is a cross-reference table; it shows which techniques discussed in this survey are used in which particular systems. This table may be incomplete for the systems it covers (e.g., System R may have some sort of salvation program); however, it summarizes the most important features of the systems as reported in the literature.

It appears that for filing systems, where short-term losses are not considered serious, the combination of incremental dumping, a complete backup version of the system, and a salvation program suffices for a high degree of reliability. This approach has been successful in MULTICS, the Cambridge system, and EMAS. A salvation program may be needed to be sure data is consistent after a crash; its use may cause some data to be lost.

This combination can be improved by an audit trail, a safeguard against losing updates; this is done in IMS. The recovery facilities in IMS are extensive but not systematic; there is neither a general approach nor a dominant technique, as in the Cambridge system or VADIS. IMS provides an enormous range of facilities; 50% of the code was said to be for recovery purposes [INFO75], although a more recent source stated that this figure was around 17% in 1978 (J. N. Gray of IBM Research Laboratory, San Jose, Calif., supplied this information in a private communication in February 1978). However, the application programmer, it seems, needs to build his own mechanisms and utilities, certainly if high integrity is required. The programmer also has to make explicit checkpoints if they are required.

The loss of any completed update can also be avoided by using careful replacement or multiple copies as in GEORGE 3, HIVE, CMIC, and System R. It may also avoid the need for a salvation program (as in GEORGE 3). Also the differential files technique is very powerful and can be used to provide recovery facilities and crash recovery.

Audit trail with backup, or incremental dumping with backup, or audit trail with incremental dumping and backup, or multiple copies, are all techniques for recovery from more serious failures, which other recovery techniques cannot handle.
FIGURE 9. A cross-reference table of systems and recovery techniques. (Systems covered: System R [ASTR76, LORI77], IMS [IBM], GEORGE 3 [NEWE72], HIVE [TAYL76], MULTICS [DALE65], Cambridge [FRAS69], EMAS [EMAS74], CMIC [GIOR76], VADIS [RAPP75], and Newcastle [VERH77a]. Techniques tabulated: salvation program, incremental dumping, audit trail, differential files, backup/current versions, multiple copies, and careful replacement.)

It is difficult to make a comparison between costs and overheads for the various techniques. However, some general statements can be made:

• If failures do not occur often, the differential file and careful replacement techniques give extra overhead. The reason is that other recovery techniques, such as incremental dumping or backup/current version or a salvation program, are usually needed anyway for failures with which these two techniques cannot contend. However, the two techniques do cope with the particular failures in a much better way: the database is crash resistant, maintaining the correct state; it is more efficient because no separate tapes need to be mounted and processed.

• The multiple copies technique (e.g., in HIVE [TAYL76]) has very high overhead. Nonetheless, it meets HIVE's objective of very high integrity.

• The overhead with audit trails may be high, because every operation on the database may also necessitate an audit trail entry. This technique may be justified for recovery only if the audit trail is already required for certifying integrity or if it is absolutely required that almost all crashes are not catastrophic.

• The incremental dumping and backup/current version techniques are the best for recovery from highly damaging failures such as a head crash on disk. These techniques may not restore the correct state, but only a consistent state. The overhead of these techniques is tolerable, because their checkpoints are not too frequent.

• The cost of a salvation program completely depends on the number of crashes; overhead is accumulated only when the program is used.

The careful replacement technique is used increasingly in multiuser or multimachine environments [NEWE72, GAMB73, LAMP76, GIOR76, LORI77]. It is implied by the root-segment rule and leaf-first rule in systems using a hierarchy of devices [SCHW73]. The combination of careful replacement and multiple copies is also important [GIOR76, ASTR76, VERH77a]. The differential file technique [RAPP75, SEVE76] has many very nice features which have recently received much attention. The attention that has been paid to these techniques during the last few years makes it reasonable to assume that they will be used more widely in the future. The techniques ensure that data is unlikely to be lost through failures. The cost of data integrity is lower, and its value higher, than a number of years ago. Data integrity is becoming a more important issue than efficiency.

ACKNOWLEDGMENTS

I would like to thank Brian Randell for insisting that I document the results of my search for information about current techniques for recovery in existing systems, and for his many very useful suggestions and comments.

I would also like to thank Stephen Todd, John Owlett (both of IBM, Peterlee, England), Ellis Cohen, Tom Anderson, Pete Lee and Phil Treleaven for reading earlier versions of this paper and giving me useful comments and suggestions. I appreciate Jim Gray's careful review of the manuscript and current information on System R.

My thanks are also due to the referees and to Peter J. Denning, the Editor-in-Chief of COMPUTING SURVEYS, for their useful comments and suggestions.

REFERENCES

[ANDE75] ANDERSON, T. Provably safe programs, Tech. Rep. 70, Computing Laboratory, Univ. Newcastle upon Tyne, UK, Feb. 1975.
[ASTR76] ASTRAHAN, M. M., et al. "System R: Relational approach to data base management," ACM Trans. Database Syst. 1, 2 (June 1976), 97-137.
[BANA77] BANATRE, J. P.; AND SHRIVASTAVA, S. K. Reliable resource allocation between unreliable processes, Tech. Rep. 99, Computing Laboratory, Univ. Newcastle upon Tyne, UK, April 1977.
[BJOR72] BJORK, L. A.; AND DAVIES, C. T. The semantics of the presentation and recovery of integrity in a data base system, IBM Tech. Rep. TR 02.540, Dec. 1972.
[BJOR75] BJORK, L. A. "Generalised audit trail requirements and concepts for data base applications," IBM Syst. J. 14, 3 (1975), 229-245.
[CURT77] CURTICE, R. M. "Integrity in data base systems," Datamation 23, 5 (May 1977), 64-68.
[DALE65] DALEY, R. C.; AND NEUMANN, P. G. "A general purpose file system for secondary storage," in 1965 AFIPS Fall Jt. Computer Conf., Vol. 27, Part 1, Spartan Books, Washington, D.C., pp. 213-229.


[DAVI72] DAVIES, C. T. A recovery/integrity architecture for a data system, IBM Tech. Rep. TR 02.528, May 1972.
[DENN76] DENNING, P. J. "Fault-tolerant operating systems," Comput. Surv. 8, 4 (Dec. 1976), 359-389.
[EMAS74] REES, D. J. The EMAS directory (EMAS report 2); MILLARD, G. E.; REES, D. J.; AND WHITFIELD, H. The standard EMAS subsystem (EMAS report 3); SHELNESS, N. A.; STEPHENS, P. D.; AND WHITFIELD, H. The Edinburgh multi-access system scheduling and allocation procedures in the resident supervisor (EMAS report 4); Dept. Computer Science, Univ. Edinburgh, Edinburgh, UK, April 1974.
[FRAS69] FRASER, A. G. "Integrity of a mass storage filing system," Comput. J. 12, 1 (Feb. 1969), 1-5.
[GAMB73] GAMBLE, J. S. "A file storage system for a multi-machine environment," PhD Thesis, Victoria Univ., Manchester, UK, Oct. 1973.
[GIOR76] GIORDANO, N. J.; AND SCHWARTZ, M. S. "Data base recovery at CMIC," in Proc. 1976 SIGMOD Int. Conf. on Management of Data, ACM, New York, pp. 33-42.
[GRAY70] GRAY, J. N. "Locking," in Record of the Project MAC Conf. on Concurrent Systems and Parallel Computation, 1970, Jack Dennis (Ed.), ACM, New York, pp. 97-112.
[GRAY76] GRAY, J. N.; LORIE, R. A.; PUTZOLU, G. R.; AND TRAIGER, I. L. "Granularity of locks and degrees of consistency in a shared data base," in Modelling in data base management systems, G. M. Nijssen (Ed.), Elsevier North-Holland, Inc., New York, 1976, pp. 365-394.
[GRAY77] GRAY, J. N. "Notes on data base operating systems," in Advanced course on operating systems, Technical Univ. Munich, 1977, Elsevier North-Holland, Inc., New York.
[HOAR74] HOARE, C. A. R. "Monitors: an operating system structuring concept," Commun. ACM 17, 10 (Oct. 1974), 549-557.
[IBM] IBM Information Management System reference manuals: IMS/VS, Utilities reference manual, SH20-9029; IMS/VS, Operators reference manual, SH20-9028; IMS/VS, System programmer reference manual, SH20-9027; IBM, White Plains, N.Y.
[INFO75] Infotech state of the art report: data base systems, Infotech Information Ltd., Maidenhead, UK, 1975.
[KNUT73] KNUTH, D. E. The art of computer programming, Vol. 3: sorting and searching, Addison-Wesley Publ. Co., Reading, Mass., 1973.
[LAMP76] LAMPSON, B.; AND STURGIS, H. Crash recovery in a distributed data storage system, Computer Science Laboratory, Xerox Palo Alto Research Center, Palo Alto, Calif., 1976.
[LIND76] LINDEN, T. A. "Operating system structures to support security and reliable software," Comput. Surv. 8, 4 (Dec. 1976), 409-445.
[LOCK68] LOCKEMANN, P. C.; AND KNUTSEN, W. D. "Recovery of disk contents after system failure," Commun. ACM 11, 8 (Aug. 1968), 542.
[LORI77] LORIE, R. A. "Physical integrity in a large segmented database," ACM Trans. Database Syst. 2, 1 (March 1977), 91-104.
[MART76] MARTIN, J. Principles of data-base management, Prentice-Hall, Inc., Englewood Cliffs, N.J., 1976, p. 4.
[MASC71] MASCALL, A. J. Studies of the reliability and performance of computing systems at Barclays Bank, Internal memo SRM/10, Computing Laboratory, Univ. Newcastle upon Tyne, UK, 1971.
[MASC73] MASCALL, A. J. Checkpoint, backup and restart in a "reliable" system, Internal memo SRM/37, Computing Laboratory, Univ. Newcastle upon Tyne, UK, April 1973.
[MELL77] MELLIAR-SMITH, P. M.; AND RANDELL, B. "Software reliability: the role of programmed exception handling," in Proc. ACM Conf. on Language Design for Reliable Software, 1977, ACM, New York, pp. 95-100.
[NEWE72] NEWELL, G. B. Security and resilience in large scale operating systems, 1900 Series Operating Systems Division, International Computers Ltd., London, 1972.
[RAND70] RANDELL, B. Visit to BOAC, Internal memo SRM/5, Computing Laboratory, Univ. Newcastle upon Tyne, UK, 1970.
[RAND75] RANDELL, B. "System structure for software fault tolerance," IEEE Trans. Softw. Eng. SE-1, 2 (June 1975), 220-232.
[RAPP75] RAPPAPORT, R. L. "File structure design to facilitate on-line instantaneous updating," in Proc. 1975 ACM SIGMOD Conf., ACM, New York, pp. 1-14.
[RAND78] RANDELL, B.; LEE, P. A.; AND TRELEAVEN, P. C. "Reliability issues in computing system design," see pp. 123-165, this issue, Comput. Surv.
[RUSS77] RUSSELL, D. L. "Process backup in producer-consumer systems," in Proc. Symp. on Operating Systems Principles, 1977; Operating Syst. Rev. 11, 5 (1977), 151-157.
[SCHW73] SCHWARTZ, M. S. "A storage hierarchical addressing space for a computer file system," PhD Thesis, Case Western Univ., Cleveland, Ohio, 1973.
[SEVE76] SEVERANCE, D. G.; AND LOHMAN, G. M. "Differential files: their application to the maintenance of large databases," ACM Trans. Database Syst. 1, 3 (Sept. 1976), 256-267.
[SKLA76] SKLAROFF, J. R. "Redundancy management technique for space shuttle computers," IBM J. Res. Dev. 20, 1 (Jan. 1976), 20-28.
[SMIT72] SMITH, J. L.; AND HOLDEN, T. S. "Restart of an operating system having a permanent file structure," Comput. J. 15, 1 (1972), 25-32.
[STER74] STERN, J. A. Backup and recovery of on-line information in a computer utility, Report MAC-TR-116, Project MAC, MIT, Cambridge, Mass., Jan. 1974.
[TAYL76] TAYLOR, J. M. Redundancy and recovery in the HIVE virtual machine, Rep. no. 76010, Royal Signal and Radar Establishment, Christchurch, UK, May 1976.


[TITM74] TITMAN, P. J. "An experimental data base system using binary relations," in Data base management, J. W. Klimbie and K. L. Koffeman (Eds.), Elsevier North-Holland, Inc., New York, 1974, pp. 351-360.
[TONI75] TONIK, A. S. "Checkpoint, restart and recovery: Selected annotated bibliography," FDT, Bull. ACM SIGMOD 7, 3-4 (1975), 72-76.
[VERH77a] VERHOFSTAD, J. S. M. "Recovery and crash resistance in a filing system," in Proc. 1977 ACM SIGMOD Int. Conf. on Management of Data, ACM, New York, pp. 158-167.
[VERH77b] VERHOFSTAD, J. S. M. "The construction of recoverable multi-level systems," PhD Thesis, Univ. Newcastle upon Tyne, UK, August 1977.
[WIMB71] WIMBROW, J. H. "A large-scale interactive administrative system," IBM Syst. J. 10, 4 (1971), 260-282.
[WILK75] WILKES, M. V. Time-sharing computer systems, Elsevier North-Holland, Inc., New York, 1975.

RECEIVED SEPTEMBER 8, 1977; FINAL REVISION ACCEPTED MARCH 7, 1978
