Professional Documents
Culture Documents
net/publication/220420693
CITATIONS READS
18 2,444
3 authors, including:
Some of the authors of this publication are also working on these related projects:
All content following this page was uploaded by Manhoi Choy on 17 July 2015.
A survey of techniques and tools used in filing systems, database systems, and operating
systems for recovery, backing out, restart, the mamtenance of consistency, and for the
provismn of crash resistance is given.
A particular view on the use of recovery techmques in a database system and a
categorization of different kinds of recovery and recovery techmques and basic principles
are presented. The purposes for which these recovery techniques can be used are
described Each recovery techmque is illustrated by examples of its application in exmtlng
systems described in the literature.
A main conclusion from this survey is that the recovery techmques described are all
useful; they are applied for different purposes and m different envLronments. However, a
certain trend in the increasing use of specific techniques during the past few years can be
noted. Another main conclusion is that there are still enormous integrity and recovery
problems to be solved for parallel processes and distributed processing.
Keywords and Phrases: audit trail, backup, checkpoint, database, differentml files, filing
system, incremental dumping, recovery.
CR Categories: 1.3, 3.73, 4.33, 4.9
General pernussion to make fatr use m teaching or research of all or part of this material is granted to individual
readers and to non-profit libraries acting for them provided that ACM's copyright notice m given and that
reference is made to the publication, to its date of issue, and to the fact t h a t reprinting privileges were granted
by permission of the Association for Computmg Machinery. To otherwise reprint a figure, table, other substantial
excerpt, or the entire work requires specific permission as does republication, or systematic or multiple
reproduction.
© 1978 ACM 0010-4892/78/0300-0167 $00.75
vides recovery from certain kinds of fail- marion. An example of this is given below
ures. Within a single system, there may be and illustrated in Figure 1.
several different recovery techniques cor- A recovery technique maintains recovery
responding to different kinds of failures. d a t a to make recovery possible. It provides
However, it is common to structure the recovery from any failure which does not
techniques into a hierarchy; the most gen- affect the recovery data or the mechanisms
eral ones deal with the largest set of fail- used to maintain these data and to restore
ures, but are the most expensive. The re- the states of the data in the database. Fail-
covery technique for the smallest set of ures are classified into two groups with
failures is usually the most efficient tech- respect to a recovery technique. A failure
nique and involves minimal loss of infor- with which a recovery technique can cope
Salvation Program * *
Incremental Dumping * *
Audit Trail * * *
Diff. F i l e s * * *
Backup Current * *
Multiple Copies *
Careful Replacement * *
FIGURE 2. A cross-reference table indicating for what purposes the various recovery techniques can be used
Fraser [FRAS69] gives a very good de- correct, detect, and report wherever
scription of the technique used at Cam- possible any inconsistencies in the file
bridge, which makes complete copies of hierarchy and storage tables. The sys-
updated disk files every 20 minutes. (See tem can be made available immedi-
a l s o [WILK75].) ately after this; some files may still
In the original MULTICS system [DALE need to be reloaded, but they are
65] all disk files updated or created by the marked as such. Directories and sys-
user are copied when the user signs off. All tem files are reloaded first in order to
newly created or modified files, which have make the system available to users
not previously been dumped, are also cop- again as quickly as possible. There are
ied to tapes once per hour. The original parallel processing capabilities for the
MULTICS scheme provides high reliabil- reloading operation; and no unneces-
ity. However, overheads in resources and sary searches on dump tapes are made
processing time are far too high; recovery by the reloader. Only missing files are
time after failure is too high; and the system reloaded.
must be shut down periodically for backup • The dumper can avoid unnecessary
purposes. It is most discouraging that the searching in the tree because of the use
situation steadily worsens with system of dates and times in the directories in
growth. the tree. Also the reloader avoids un-
The design of a greatly improved backup necessary searches and file reloads.
mechanism has been described by Stern The currently used recovery techniques
[STER74]. It is based upon the original in MULTICS are similar to those described
backup mechanisms contained in the MUL- by Stern. However, since the publication of
TICS system. Compared with the original Stem's thesis a major change has been
system, this new design lessens overhead, made to the MULTICS filing system: a new
drastically reduces recovery time from sys- and very different storage system has been
tem failures, eliminates the need to inter- incorporated. The backup system has also
rupt system operation for backup purposes, been changed to deal with the new storage
and scales up significantly better with on- system. The concept of logical volumes has
line storage growth. Some of the major been introduced. A logical volume encom-
features of this new scheme are: passes a set of real devices. The directory
• Incremental dumping is used to keep hierarchy is on a separate volume, and any
the backup system up to date. directory in the hierarchy can be placed on
• A complete secondary dump super- a new volume, with all its decendants on
sedes the complete incremental dump- the same volume. For dumping onto tape a
ing history of the system: Rather than volume dumper is used which dumps one
dumping the entire secondary storage, volume at a time; no tree walk is used. Each
the procedure updates a checkpoint of physical device has a contents table which
the total system. A partial secondary is a list of pages of the segments on the
dump supersedes part of the incremen- device. Incremental dumping is done for
tal dumping history. Secondary dumps pages on volumes rather than files. Thus,
are similar to change accumulation sets the scheme used is basically the same as
in IMS [GRAY77]. the one proposed by Stern, but used at a
• A shadow copy of a file can be created lower level (for volumes and pages rather
to make sure that the incremental than for the hierarchy of files).
dumper dumps a consistent version. The EMAS system [EMAS74] provides
• Each storage device holds a complete an automatic checkpointing facility for
subtree or several subtrees of the fries. Files are part of the user's virtual
file hierarchy (the MULTICS file sys- memory and cannot be accessed through
tem is organized as a tree of files [DALE the paging mechanisms until they have
65]). This minimizes the effects of the been connected {i.e., the virtual memory
loss of one device. disk address mapping has been set up).
• After a failure a "salvager" is used to When a user process is created, its virtual
SEGMENT Sp
current page table
MASTER
!
4 .. 7 [0 I19 3 8 Fs 27
0 0 0 I ]- 1 0 1 0 s.b.
1 0 1 1 1 i !l 0 c.s.b.
0 0 0 0 0 0 0 0 l.t.s.b.
4 7 [0 21 6 8 13 27
0 0 0 0 0 0 0 0
1 0 1 1 0 1 0 0
0 0 0 0 0 0 0 0
I __
1 2 3 4 5 6 7 8 9 i0 ii 12 13
I0 0 0 1 0 1 1 1 0 1 0 0 1 o • m
bl~
map
bit
map
FIGURE 3. The implementation of segments in System R.
Audit trails can be used for several dif- started. The audit trail can be pro-
ferent purposes, such as: cessed backwards for backing out. Also,
• Crash recovery. If backup versions of a single transaction (or job) can be
files are reinstalled, an audit trail can backed out in case a deadlock occurs or
be used to perform operations on them, the transaction fails. The data affected
thereby restoring their states at the by the transaction can be restored to
time of the crash [CURT77]. their state before the transaction (or
• Backing out. If a system crashes (with- job) was started.
out damaging secondary storage) the Certifying system integrity. The audit
files affected by the processes running trail can verify that rules and policies
during the failure can be restored to dictated by laws, business agreements,
their states before those processes and the like, are being followed [BJOR
D a t a B a s e on.
Data Base
Secondary
Buffers
Storage
Users processes
. f
The c u r r e n t d a t a b a s e is a l w a y s c o n s i s t e n t w l t h the
c u r r e n t a u d i t trail, h o w e v e r the d a t a b a s e on s e c o n d a r y
s t o r a g e is, in g e n e r a l , not c o n s l s t e n t w l t h the a u d l t
t r a i l on s e c o n d a r y storage.
may receive an intention list, containing database updates, intention lists contain
the specifications of the operations to be operations not yet performed (although
performed on its local database. Intention physically the audit trail may be written
lists, like the audit trail used in the CMIC first, as is the case when log tape write
system [GIoR76], can be reprocessed with- ahead is used). In other words, issuing an
out backing out the interrupted process. intention list is an event that could be reg-
Intention lists, once received and accepted, istered in the audit trail. Processing the
cannot be lost unless a catastrophe occurs, intention list is similar to processing an
such as a head crash. They are safeguarded audit trail during crash recovery or backing
because they are stored on disk at a fixed out.
place and are not altered when processed.
Unless the processor malfunctions, the op- DIFFERENTIAL FILES
erations specified in the intention list will
always be done. Under the differential file (also called
Whereas audit trails record completed "change set") scheme the main files are
blt map
[01
i O lIO l O ....... 1 .... 1 .... 0
queries which do not need the exact values 4) If the number is neither in
of all files; such queries access a suitable TRNSDONE nor is the number of the
(current) view of the data without locking current transaction, then the previous
out the update transactions. version of the record is taken (the one
The disadvantages of the approach using pointed to by the retrieved entry from
differential files are [Lore77]: the MODFILE, or in the main file),
• An access to a data element appears because this means that the record
slow: if an initial search of the differ- was put in the MODFILE by an un-
ential file reveals that the data element completed transaction before a fail-
is not among the modifications, the ure.
required element must be fetched from 5) MODFILE entries and system varia-
the database. However, Severance and bles are forced out to disk at the end
Lohman show that this problem can be of each transaction. Thus the entire
almost completely overcome with MODFILE mechanism is check-
hashing for many systems and appli- pointed after each completed trans-
cations; they also show how to con- action.
struct a good hashing scheme for par- Differential files are used in a system,
ticular systems. described by Titman [TITM74], for both
• Eventually the differential file must be efficiency and reliability. The way in which
merged with the main database--an ordinary files are kept makes insertions or
operation which can be slow. This will deletions very expensive. In Titman's sys-
certainly be a big problem if the system tem the files are binary relations which are
needs to be available without interrup- stored in highly compressed fixed-length
tion. blocks. Elements are identified by a block
• Since an update can affect an element number and the sequence number of the
which has already been modified, the element in the block. An insertion or dele-
differential files must be suitably or- tion requires the complete reorganization
ganized. A hashing scheme [SEvE76, of the file giving the elements new identi-
RAPP75] ameliorates this problem. fiers. For reasons of efficiency, an "add set"
Differential files are, for example, used in and a "delete set" of elements are kept for
the VADIS system, where they are called each file. For reasons of reliability, a
MODFILEs [RAPP75]. For every file in the "change set" is also kept for each file; it is
system there is a MODFILE. The system used to register the changed records. The
has been developed to facilitate recovery add set, delete set, and change set together
after power failure. Completed transactions form the implementation of a differential
will never have to be undone after a power file. The main files are kept on a separate
failure. Transactions not completed before device which is never written on except
the failure are not undone explicitly; but during reorganization. These files can be
their effects are ignored using the MOD- duplicated on tapes for recovery. Check-
FILEs and a TRNSDONE file as follows: pointing is carried out by saving the add,
1) Each entry in a MODFILE has a delete, and change sets.
header with: record type, transaction
code, pointer to previous modification
BACKUP AND CURRENT VERSIONS
of the same record, time, transaction
number and some other identification Backup versions of files or databases can
codes. be kept in order to make possible the res-
2) There is a TRNSDONE file which toration of the files to a previous state.
contains the numbers of the com- For example, many file-editors produce
pleted transactions. a complete new version of a file while a user
3) For every record fetched from a is editing a file. The original file remains
MODFILE the transaction number is unchanged during the edit-session. The
compared with the TRNSDONE new version is a complete new copy; it is
numbers and the current transaction not achieved, for example, by using a dif-
number. ferential file. If a user notices a blunder
sistent copy (or suspicious copy) is al- The technique of two copies with flags is
ways recognizably inconsistent, be- used in recovery for segments in System R
cause of the flags used; if the system [LORI77]. A M A S T E R table is used to in-
crashes during update the flag will still dicate which segments are open or closed
be set after the crash. and which bit map (two copies are held) is
Except during update, the multiple cop- up-to-date (see Figure 3). Two copies of the
ies must always have the same value. If the M A S T E R table, both containing the same
different copies are updated by the same information, are kept to ensure that, if the
processor then an "update-in-progress" flag system crashes while the M A S T E R table is
(or "damage flag" [CURT77]), used if there being updated, only one copy will be left
are only two copies, provides crash resist- behind in an inconsistent state; the other
ance. A consistent copy can always be re- copy has either the new state or the state
trieved after a system restart; this copy will the table was in before the update started.
have either the value it had before the (Only one copy of the M A S T E R table is
update in progress during the crash, or the drawn in Figure 3, but two copies are used
new value. The inconsistent copy can al- as described above.) The copy that is in an
ways be recognized as such and discarded. inconsistent state can always be identified.
Majority voting may also be used to detect Similarly, two copies of the M A S T E R
incorrectly performed operations; this is es- directory in the filing system of GEORGE
pecially useful if different processors update 3 are maintained [NEWE72], and two bits
the different copies. Faulty processors can to make possible the distinction between a
then be detected and ignored or discon- valid and invalid copy after a crash during
nected. update.
The important difference between the System HIVE [TAYL76] maintains two
multiple-copies technique and other tech- read-write versions for every file. This pro-
niques, such as backup/current version or vides one of the characteristics oZ a cycle
careful replacement, is that with the mul- (see the section on salvation programs):
tiple-copies technique the different copies The local effects can be undone as long as
always have the same value except during the cycle has not yet finished. During a
update of the actual physical data struc- cycle one of the two versions is updated. At
ture. Thus, if the two copies hold consistent the end of the cycle this version is copied
values they must be equal. Backup or in- onto the other version. The system knows
cremental dumping schemes could also be which of the two versions is the one up-
used to keep different copies that hold the dated during a cycle. Crash resistance is
same value except during a job or transac- therefore provided for individual files.
tion. In these cases there is always a pri- Apart from this, a checksum is maintained
mary copy and a backup copy, which may for each version. This, generally, enables
hold different, but consistent, values from partially updated or corrupted files to be
the database point of view. The technique detected. In general two copies and two
of copying onto tape at the completion of flags (bits) are sufficient to provide crash
each updating, is different from the multi- resistence. The flag indicates whether a
ple copies technique. The multiple copies copy is suspicious or not. If the two copies
must exist all the time; physically, they do are updated immediately after each other,
not overlap, and they have equal status. as in System R, then a copy is likely to be
Schemes based on different backup or ar- inconsistent if its flag is set. In system
chival versions, are definitely not examples HIVE, however, the two copies are kept on
of the multiple copies technique. separate storage devices, so the checksum
Majority voting data has been used ex- allows detection of incorrectly performed
tensively for space flight applications, such operations. Although selective redundancy
as in the space shuttle system where four is not uncommon, system HIVE is one of
computers are configured to receive the the few existing systems in which more
same input data and calculate the same than one complete copy for every safeguard
outputs [SKLA76]. file is maintained to provide crash resist-
index index
free free s p a c e
T1 space T1 stack
stack T7
T2 T2
T13
T5
T9
T7
TI3
T1 T2 T1 T5 T9 T2
A0 B0 A0
A0 A3 B0
A1 B1 A1 A1 A4 B1
A2 B2 A2 A2 ! A5 B2
i
A3 B3 A3 i A6 B3
A4 B4 A4
I B4
A5 A5 B5
. . . . . . . . . . . . . . . . . M . . . . . . . . . . . . . . . . . . . .
index index
free free s p a c e
T5 space T5 stack
stack
T9 T1
T7 T7
T2 TI3 T2 TI3
A1 A1 A4 B1 A4 B1
A2 A2 A5 B2 A5 B2
A6 B3 A6 B3
A4 B4 B4
i
i A5
FIGURE 6. T h e m s e ~ l o n of a n e n t r y A5 m a ~ H t r a c k m a s t o r a g e s t r u c t u r e a s u s e d i n CMIC, in ~ u r s t e p s
1 . H
2 3
1 1 1
DISK
3 ¸
26
15 21
1 1 1
DRUM
%
3 9 3 ~9
1 1 1
CORE
U
Three possible subsequent sltuatlons.
Where: Vl = a value of a data page
FIGURE 7. The c a r e ~ l replacement technique in a hierarchy of devices.
T h e root-segment rule states that if a data ory copy but not consolidated in the drum
page is on a particular level of storage in copy yet, will be lost. If the contents of the
the hierarchy, every directory page be- drum are also lost, the updates which were
tween it and the root of the file is on that consolidated in the drum copy, but not yet
level or a faster level (the root is the top in the disk copy, will be lost as well.
directory in the tree of directory pages). A system using the leaf-first and root-
These two rules mean that careful replace- segment rules has been described by
ment is used at every level in the hierarchy. Schwartz [SCHw73]. Files are trees of pages
Were the contents of the main memory to which are either index pages (i.e., directory
be lost after a crash, the drum and disk pages) or data pages (as in Figure 7). A file
would still have two copies of the file: the descriptor is the root of the index table; a
disk copy contains an old value, the drum directory file contains the descriptors. In
copy contains a newer value of the file that system, and in CMIC too, the directory
{some pages of the file are on drum and file is updated "in place." If absolute crash
others on disk). resistance were to be provided in this sys-
Figure 7 shows how updates on a main- tem, the multiple copies technique could be
memory version of a file are subsequently used for the directory, as is done in
made to the copies of that file held on drum GEORGE 3 [NEWE72].
and disk using these two rules. T he example If the recovery is to be provided for trans-
in Figure 7 shows the pages of a file as held actions consisting of more than one update,
on disk, drum, and in core during process- replaced pages can be updated "in place."
ing, at three different subsequent moments This technique has been used at Newcastle
in time: time 1, 2, and 3. At time 1, the copy [VERH77a] to provide recovery for files
on disk shows that the change made to page within user defined scopes. Scopes can be
8 and the addition of page 26 into directory nested, as with "spheres of control"
page 3 have not yet been consolidated in [BJOR72]. These scopes are termed "recov-
the disk copy of the file. T he copy in core ery blocks" [RAND75]. A recovery block
shows that page 8 has been updated again, consists of an acceptance test and a set of
and that the update has not yet been con- alternative algorithms. T he first algorithm
solidated on the drum. Similarly, at time 2 is executed first, followed by the evaluation
the copies on disk, drum, and core show of the acceptance test. If an error occurs
that the update of page 8 and the addition during the execution of an alternative, or
of page 26, consolidated on drum at time 1, the acceptance test is not done successfully
are now consolidated on disk. T he update {either it fails or it is never done), all the
of page 8 in core at time 1 is now consoli- operations performed so far are undone,
dated on drum. In the meantime other up- and the next alternative is invoked, if pos-
dates and additions have been made to the sible. (A power failure still causes all the
file in core. Finally, the copies at time 3 operations to be undone [VERH77a] how-
show yet more updates, and show how they ever.)
are consolidated. T he same is now done for the second
Although pages are updated "in place," alternative. If all the alternatives of a re-
this technique can be classified as being a covery block are exhausted, an error is gen-
careful replacement technique because up- erated in the outer recovery block, or the
dates are made in fast memory, and the program fails ff there is no outer recovery
replacement takes place on slow memory block. A copy of a file is maintained for
containing the files. It is a careful replace- every level of nesting in which the file has
ment technique because every valid tree been operated upon; these copies contain
structure is consistent. If the contents of identical nonupdated pages. T he current
both main memory and drum are lost, there value of the file is in the latest copy. Recov-
will remain a consistent copy on disk. If the ery means restoring the copy as it was
contents of main memory are lost, the up- before the current recovery block was en-
dates which were started on the main mem- tered. On exiting a recovery block success-
J
tered major difficulties when trying to use read
the shadow mechanism (as in the x2
has been described elsewhere [VERH77b]. may be needed to be sure data is consistent
The techniques described in this survey after a crash; its use may cause some data
provide recovery for files based on second- to be lost.
ary storage. Verhofstad has some prelimi- This combination can be improved by an
nary extensions to data types more complex audit trail, a safeguard against losing up-
than files, and recovery procedures for dif- dates; this is done in IMS. The recovery
ferent levels of a system [VERH77b]. Ran- facilities in IMS are extensive but not sys-
dell has discussed "nested recovery" using tematic; there is neither a general approach
recovery blocks [RAND75]. The relation to nor a dominant technique, as in the Cam-
careful replacement has been discussed by bridge system or VADIS. IMS provides an
Verhofstad [VERH77b]. Beyond such pre- enormous range of facilities; 50% of the
liminary works, little is published about the code was said to be for recovery purposes
systematic recovery of program or data ob- [INFO75], although a more recent source
jects. stated that this figure was around 17% in
1978 (J.N. Gray of IBM Research Labora-
SUMMARY AND CONCLUSIONS
tory, San Jose, Calif., supplied this infor-
mation in a private communication in Feb-
This paper has described many of the tech- ruary 1978). However, the application pro-
niques used to implement backing out, grammer, it seems, needs to build his own
crash recovery, crash resistance, and con- mechanisms and utilities, certainly if high
sistency. These techniques can be used in integrity is required. The programmer also
different environments, for different pur- has to make explicit checkpoints if they are
poses; they can complement each other. required.
Figure 9 is a cross-reference table; it shows The loss of any completed update can
which techniques discussed in this survey also be avoided by using careful replace-
are used in which particular systems. This ment or multiple copies as in GEORGE 3,
table may be incomplete for the systems it HIVE, CMIC and System R. It may also
covers (e.g., System R may have some sort avoid the need for a salvation program (as
of salvation program); however it summa- in GEORGE 3). Also the differential files
rizes the most important features of the technique is very powerful and can be used
systems as reported in the literature. to provide recovery facilities and crash re-
It appears that for filing systems, where covery.
short term losses are not considered serious, Audit trail with backup, or incremental
the combination of incremental dumping, a dumping with backup, or audit trail with
complete backup version of the system, and incremental dumping and backup, or mul-
a salvation program suffices for a high de- tiple copies, are all techniques for recovery
gree of reliability. This approach has been from more serious failures, which other re-
successful in MULTICS, the Cambridge covery techniques cannot handle.
system, and EMAS. A salvation program It is difficult to make a comparison be-
System R GEORGE
[AsTn76] IMS 3 HIVE MULTICS Cambrufge EMAS CMIC VADIS Newcastle
[Loaff7] [IBM] [NEwE72] [TAr1 76] [DALE65] [FR^~69] [EM^~74] [GIoR76] [RAPP75] [VERH77a]
~alvatlofl
Pro-
~am
|ncremental * * *
l~umpmg
Audit Trail
Differential
Fdes
Backup * * *
Current
Multiple * *
Copies
Careful Re-
placement
chitecture for a data system, IBM Tech. system failure," Commun. A C M 11, 8
Rep. TR 02.528, May 1972. (Aug. 1968), 542.
[DENN76] DENNING, P. J. "Fault-tolerant oper- [LORI77] LORIE, R. A. "Physmal integmty in a
ating systems," Comput. Surv. 8, 4 (Dec. large segmented database," A C M Trans
1976), 359-389. Database Syst. 2, 1 (March 1977),
[EMAS74] REES, D. J. The E M A S directory, 91-104
(EMAS report 2), MILLARD,G. E ; REES, [MART76] MARTIN, J Prsnc~ples of data-base
D. J.; AND WHITFIELD, H. The stan- management, Prentice-Hall, Inc., Engle-
dard E M A S subsystem, (EMAS report wood Cliffs, N J., 1976, p. 4.
3); SHELNESS, N. A.; STEPHENS P. D.; [MASC71] MASCALL,A. J Studies of the rehabd-
AND WHITFIELD, H. The Edinburgh sty and performance of computsng sys-
multi-access system scheduhng and al- tems at Barclays Bank, Internal memo
locatmn procedures ~n the ressdent su- SRM/10, Computing Laboratory, Univ.
pervssor, (EMAS report 4); Dept. Com- Newcastle upon Tyne, UK, 1971
puter Science, Univ. Edinburgh, Edin- [MASC73] MASCALL, A. J. Checkpoint, backup
burgh, UK, April 1974. and restart ~n a "rehable'" system, In-
[FRAs69] FRASER, A. G. "Integrity of a mass ternal memo SRM/37, Computing Lab-
storage filing system," Comput. J. 12, 1 oratory, Umv. Newcastle upon Tyne,
(Feb. 1969), 1-5. UK, April 1973.
[GAMB73] GAMBLE, J S. "A file storage system [MELL77] MELLIAR-SMITH,P. M.; AND RANDELL,
for a mult2-machme enwronment," PhD B. "Software rehability: the role of pro-
Thesis, Victorm Univ., Manchester, UK, grammed exception handhng," m Proc.
Oct. 1973. A C M Conf. on Language Design for Re-
[GmR76] GIORDANO, N. J.; AND SCHWARTZ, M. hable Software, 1977, ACM, New York,
S. "Data base recovery at CMIC," m pp. 95-100.
Proc. 1976 SIGMOD Int. Conf. on Man- [NEWE72] NEWELL, G.B. Security and resd~ence
agement o f Data, ACM, New York, pp. sn large scale operating systems, 1900
33-42. Series Operating Systems Division, In-
[GRAY70] GRAY, J. N. "Locking," in Record of ternational Computers Ltd, London,
the Project M A C Conf. on Concurrent 1972.
Systems and Parallel Computatmn, [RANDT0] RANDELL, B. Vssst to BOAC, Internal
1970, Jack Dennis (Ed.), ACM, New memo SRM/5, Computing Laboratory,
York, pp. 97-112. Univ. Newcastle upon Tyne, UK, 1970
[GRAY76] GRAY, J. N.; LORIE, R. A.; PUTZOLU,G. [RAND75] RANDELL, B. "System structure for
R.; ANDTRAIGER,J.L. "Granularity of software fault tolerance," I E E E Trans.
locks and degrees of consistency m a Softw. Eng. SEol, 2 (June 1975),
shared data base," m Modelling sn data 220-232.
base management systems, G. M. [RAPP75] RAPPAPORT, R.L. "File structure de-
Nijssen (Ed.), Elsevier North-Holland, sign to facilitate on-line instantaneous
Inc., New York, 1976, pp. 365-394 updating," m Proc. 1975 A C M SIGMOD
[GRAY77] GRAY, J . N . "Notes on data base op- Conf., ACM, New York, pp. 1-14.
erating systems," in Advanced course on [RAND78] RANDELL, B., LEE, P. A;AND TRE-
operating systems, Technical Univ. Mu- LEAVEN, P. C. "Reliability msues m
nich, 1977, Elsevier North-Holland, Inc., computing system demgn," see pp
New York. 123-165 this issue Comput Surv
[HOAR74] HOARE, C. A.R. "Monitors: an oper- [Russ77] RUSSELL, D. L. "Process backup m
ating system strncturmg concept," Com- producer-consumer systems," m Proc.
mun. A C M 17, 10 (Oct 1974), 549-557. Syrup. on Operating Systems Prtnc~ples
[IBM] IBM Information Management System 1977; Operating Syst. Rev. 11, 5 (1977),
reference manuals. I M S / V S , Utd~t~es 151-157.
reference manual, SH20-9029; I M S / VS, [ScHw73] SCHWARTZ, M S. "A storage hierar-
Operators reference manual, chical addressing space for a computer
SH20-9028; I M S / V S , System program- file system," PhD Thesis, Case Western
mer reference manual, SH20-9027, IBM, Umv., Cleveland, Ohio, 1973
White Plains, N.Y. [SEVE76] SEVERANCE, e . G.; AND LOHMAN, G.
[INFo75] Infotech state o f the art report: data M. "Differential files: their application
base systems, Infotech Informatmn Ltd., to the maintenance of large databases,"
Maidenhead, UK, 1975. A C M Trans. Database Syst. 1, 3 (Sept.
[KNUT73] KNUTH, D.E. The art o f computer pro- 1976), 256-267.
grammmg, Vol. 3" sortsng and search- [SKLA76] SKLAROFF, J. R. "Redundancy man-
rag, Addison-Wesley Publ. Co., Reading, agement technique for space shuttle
Mass., 1973. computers," I B M J . Res. Dev 20, 1 (Jan
[LAMP76] LAMPSON, B.; AND STURGIS, H. Crash 1976), 20-28.
recovery sn a dsstr$buted data storage [SMIT72] SMITH, J. L.; AND HOLDEN, T.
system, Computer Science Laboratory, S. "Restart of an operating system hav-
Xerox Palo Alto Research Center, Palo ing a permanent file structure," Comput.
Alto, Cahf, 1976 J 15, 1 (1972), 25-32.
[LIND76] LINDEN, T. A. "Operating system [STER74] STERN, J . A . Backup and recovery of
structures to support security and relia- on-hne mformatmn tn a computer utd-
ble software," Comput. Surv 8, 4 (Dec. sty, Report MAC-TR-116, ProJect MAC,
1976), 4O9-445 MIT, Cambridge, Mass., Jan. 1974
fLock68] LOCKEMANN, P. C.; AND KNUTSEN, W [TAYL76] TAYLOR, J. M. Redundancy and re-
D. "Recovery of disk contents after covery tn the H I V E wrtual machzne,
Rep no. 76010, Royal Signal and Radar crash resistance In a filing system," in
Establishment, Christchurch, UK, May Proc 1977ACM SIGMOD Int Conf on
1976. Management of Data, ACM, New York,
[TITM74] TITMAN, P. J "An experimental data pp. 158-167.
base system using binary relations," in [VERH77b] VERHOFSTAD, J. S. M "The construc-
Data base management, J. W. Klimble, tion of recoverable multi-level systems,"
and K. L. Koffeman (Eds.), Elsevier P h D Thesis, Univ. Newcastle upon
North-Holland, Inc., New York, 1974, Tyne, UK, August 1977.
pp. 351-360. [WIMB71] WIMBROW, J. H "A large-scale inter-
[TONI75] TONIK, A . S . "Checkpoint, restart and active administrative system," IBM Syst
recovery: Selected annotated bibliog- J. 10, 4 (1971), 260-282.
raphy," FDT, Bull. ACM SIGMOD 7, [WILK75] WILKES, M.V. T~meshartng computer
3-4 (1975), 72-76. systems, Elsevier North-Holland, Inc.,
[VERH77a] VERHOFSTAD, J. S.M. "Recovery and New York, 1975