
COMPUTING PRACTICES

Operating System Support for Database Management

Michael Stonebraker
University of California, Berkeley

SUMMARY: Several operating system services are examined with a view toward their applicability to support of database management functions. These services include buffer pool management; the file system; scheduling, process management, and interprocess communication; and consistency control.

Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission.

This research was sponsored by U.S. Air Force Office of Scientific Research Grant 78-3596, U.S. Army Research Office Grant DAAG29-76-G-0245, Naval Electronics Systems Command Contract N00039-78-G-0013, and National Science Foundation Grant MCS75-03839-A01.

Key words and phrases: database management, operating systems, buffer management, file systems, scheduling, interprocess communication

CR Categories: 3.50, 3.70, 4.22, 4.33, 4.34, 4.35

Author's address: M. Stonebraker, Dept. of Electrical Engineering and Computer Sciences, University of California, Berkeley, CA 94720.

© 1981 ACM 0001-0782/81/0700-0412 $00.75.

1. Introduction

Database management systems (DBMS) provide higher level user support than conventional operating systems. The DBMS designer must work in the context of the OS he/she is faced with. Different operating systems are designed for different use. In this paper we examine several popular operating system services and indicate whether they are appropriate for support of database management functions. Often we will see that the wrong service is provided or that severe performance problems exist. When possible, we offer some suggestions concerning improvements. In the next several sections we look at the services provided by buffer pool management; the file system; scheduling, process management, and interprocess communication; and consistency control. We then conclude with a discussion of the merits of including all files in a paged virtual memory.

The examples in this paper are drawn primarily from the UNIX operating system [17] and the INGRES relational database system [19, 20], which was designed for use with UNIX. Most of the points made for this environment have general applicability to other operating systems and data managers.

2. Buffer Pool Management

Many modern operating systems provide a main memory cache for the file system. Figure 1 illustrates this service. In brief, UNIX provides a buffer pool whose size is set when the operating system is compiled. Then, all file I/O is handled through this cache. A file read (e.g., read X in Figure 1) returns data directly from a block in the cache, if possible; otherwise, it causes a block to be "pushed" to disk and replaced by the desired block. In Figure 1 we show block Y being pushed to make room for block X. A file write simply moves data into the cache; at some later time the buffer manager writes the block to the disk. The UNIX buffer manager uses the popular LRU [15] replacement strategy. Finally, when UNIX detects sequential access to a file, it prefetches blocks before they are requested.

Conceptually, this service is desirable because blocks for which there is so-called locality of reference [15, 18] will remain in the cache over repeated reads and writes. However, the problems enumerated in the following subsections arise in using this service for database management.
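The cache mechanics of Section 2 (read through the pool, LRU eviction, delayed write-back) can be sketched as follows. This is purely illustrative; the class and its names are ours, not UNIX's, and the "disk" is a dictionary standing in for stable storage:

```python
from collections import OrderedDict

class BufferPool:
    """Toy write-back file cache with LRU replacement (illustrative only)."""

    def __init__(self, disk, size):
        self.disk = disk           # stand-in for the disk: block number -> data
        self.size = size           # pool size fixed when the OS is "compiled"
        self.pool = OrderedDict()  # block number -> (data, dirty?), LRU first

    def _push(self):
        # Push the least recently used block (block Y in Figure 1),
        # writing it to disk only if it has been modified.
        victim, (data, dirty) = self.pool.popitem(last=False)
        if dirty:
            self.disk[victim] = data

    def read(self, blkno):
        # A hit returns data directly from a block in the cache ...
        if blkno in self.pool:
            self.pool.move_to_end(blkno)
            return self.pool[blkno][0]
        # ... otherwise a block is pushed and replaced by the desired one.
        if len(self.pool) >= self.size:
            self._push()
        self.pool[blkno] = (self.disk[blkno], False)
        return self.pool[blkno][0]

    def write(self, blkno, data):
        # A write simply moves data into the cache; the disk copy is
        # updated later, when the block is pushed (or explicitly flushed).
        if blkno not in self.pool and len(self.pool) >= self.size:
            self._push()
        self.pool[blkno] = (data, True)
        self.pool.move_to_end(blkno)
```

A DBMS-managed pool in user space (as INGRES and System R maintain) has the same shape, but avoids crossing a system call boundary on every access.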

412 Communications of the ACM, July 1981, Volume 24, Number 7
Fig. 1. Structure of a Cache. [Figure: a main memory cache sits between user reads and writes and the disk; the read of block X is satisfied from the cache while block Y is pushed to disk to make room.]

2.1 Performance

The overhead to fetch a block from the buffer pool manager usually includes that of a system call and a core-to-core move. For UNIX on a PDP-11/70 the cost to fetch 512 bytes exceeds 5,000 instructions. To fetch 1 byte from the buffer pool requires about 1,800 instructions. It appears that these numbers are somewhat higher for UNIX than for other contemporary operating systems. Moreover, they can be cut somewhat for VAX 11/780 hardware [10]. It is hoped that this trend toward lower overhead access will continue.

However, many DBMSs, including INGRES [20] and System R [4], choose to put a DBMS managed buffer pool in user space to reduce overhead. Hence, each of these systems has gone to the trouble of constructing its own buffer pool manager to enhance performance.

In order for an operating system (OS) provided buffer pool manager to be attractive, the access overhead must be cut to a few hundred instructions. The trend toward providing the file system as a part of shared virtual memory (e.g., Pilot [16]) may provide a solution to this problem. This topic is examined in detail in Section 6.

2.2 LRU Replacement

Although the folklore indicates that LRU is a generally good tactic for buffer management, it appears to perform only marginally in a database environment. Database access in INGRES is a combination of:

(1) sequential access to blocks which will not be rereferenced;
(2) sequential access to blocks which will be cyclically rereferenced;
(3) random access to blocks which will not be referenced again;
(4) random access to blocks for which there is a nonzero probability of rereference.

Although LRU works well for case 4, it is a bad strategy for the other situations. Since a DBMS knows which blocks are in each category, it can use a composite strategy. For case 4 it should use LRU, while for cases 1 and 3 it should toss blocks immediately. For blocks in class 2 the reference pattern is 1, 2, 3, ..., n, 1, 2, 3, ..., n, ... Clearly, LRU is the worst possible replacement algorithm for this situation. Unless all n pages can be kept in the cache, the strategy should again be to toss immediately. Initial studies [9] suggest that the miss ratio can be cut 10-15% by a DBMS specific algorithm.

In order for an OS to provide buffer management, some means must be found to allow it to accept "advice" from an application program (e.g., a DBMS) concerning the replacement strategy. Designing a clean buffer management interface with this feature would be an interesting problem.

2.3 Prefetch

Although UNIX correctly prefetches pages when sequential access is detected, there are important instances in which it fails. Except in rare cases, INGRES knows, at (or very shortly after) the beginning of its examination of a block, exactly which block it will access next. Unfortunately, this block is not necessarily the next one in logical file order. Hence, there is no way for an OS to implement the correct prefetch strategy.

2.4 Crash Recovery

An important DBMS service is to provide recovery from hard and soft crashes. The desired effect is for a unit of work (a transaction), which may be quite large and span multiple files, either to be completely done or to look like it had never started.

The way many DBMSs provide this service is to maintain an intentions list. When the intentions list is complete, a commit flag is set. The last step of a transaction is to process the intentions list, making the actual updates. The DBMS makes this last operation idempotent (i.e., it generates the same final outcome no matter how many times the intentions list is processed) by careful programming. The general procedure is described in [6, 13]. An alternate process is to do updates as they are found and maintain a log of before images so that backout is possible.

During recovery from a crash the commit flag is examined. If it is set, the DBMS recovery utility processes the intentions list to correctly install the changes made by updates in progress at the time of the crash. If the flag is not set, the utility removes the intentions list, thereby backing out the transaction.

The impact of crash recovery on the buffer pool manager is the following. The page on which the commit flag exists must be forced to disk after all pages in the intentions list. Moreover, the transaction is not reliably committed until the commit flag is forced out to the disk, and no response can be given to the person submitting the transaction until this time.

The service required from an OS buffer manager is a selected force out which would push the intentions list and the commit flag to disk in the proper order. Such a service is not present in any buffer manager known to us.
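The ordering requirement of Section 2.4 can be made concrete with a small sketch. The structures and names below are ours, not those of any particular DBMS; the point is only that the intentions list must reach the disk before the commit flag, and that replay of the list is idempotent:

```python
class StableStore:
    """Stand-in for the disk: only pages forced here survive a crash."""

    def __init__(self):
        self.pages = {}

    def force(self, key, value):
        self.pages[key] = value   # a synchronous, selected force out


def commit(store, updates):
    # The intentions list must reach the disk BEFORE the commit flag.
    store.force("intentions", sorted(updates.items()))
    # The transaction is not reliably committed (and no response may go
    # to the user) until the commit flag itself has been forced out.
    store.force("commit", True)
    recover(store)                # processing the list is the last step


def recover(store):
    if store.pages.get("commit"):
        # Replay is idempotent: installing the same values again gives
        # the same final outcome no matter how many times it runs.
        for key, value in store.pages.get("intentions", []):
            store.force(key, value)
        store.force("commit", False)
    else:
        # Crash before the flag was forced: remove the intentions list,
        # thereby backing out the transaction.
        store.pages.pop("intentions", None)
```

Crashing before the commit flag is forced leaves the flag unset, so recovery backs the transaction out; crashing while the list is being processed merely causes a harmless replay.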

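Returning to the composite strategy of Section 2.2: a buffer manager that accepted per-block "advice" might look roughly as follows. The interface is hypothetical (no such OS facility exists, as noted above); blocks hinted as cases 1-3 are used once and tossed, while case 4 blocks are kept under LRU:

```python
from collections import OrderedDict

TOSS = "toss"   # cases 1-3: the block will not be profitably cached
KEEP = "lru"    # case 4: nonzero probability of rereference

class AdvisedPool:
    """Buffer manager accepting per-block replacement advice (a sketch)."""

    def __init__(self, size):
        self.size = size
        self.kept = OrderedDict()  # case-4 blocks, least recently used first

    def access(self, blkno, advice):
        if advice == TOSS:
            # Sequential scans and one-shot random reads: use the block
            # once and discard it rather than displacing case-4 blocks.
            return
        if blkno in self.kept:
            self.kept.move_to_end(blkno)       # LRU bookkeeping for case 4
        else:
            if len(self.kept) >= self.size:
                self.kept.popitem(last=False)  # evict the true LRU victim
            self.kept[blkno] = True
```

Under pure LRU, a long sequential scan would flush every case-4 block from the pool; with the advice, the scan leaves the retained set untouched.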
2.5 Summary

Although it is possible to provide an OS buffer manager with the required features, none currently exists, at least to our knowledge. Designing such a facility with prefetch advice, block management advice, and selected force out would be an interesting exercise. It would be of interest in the context of both a paged virtual memory and an ordinary file system.

The strategy used by most DBMSs (for example, System R [4] and IMS [8]) is to maintain a separate cache in user space. This buffer pool is managed by a DBMS specific algorithm to circumvent the problems mentioned in this section. The result is a "not quite right" service provided by the OS going unused and a comparable application specific service being provided by the DBMS. Throughout this paper we will see variations on this theme in several service delivery areas.

3. The File System

The file system provided by UNIX supports objects (files) which are character arrays of dynamically varying size. On top of this abstraction, a DBMS can provide whatever higher level objects it wishes.

This is one of two popular approaches to file systems; the second is to provide a record management system inside the OS (e.g., RMS-11 for DEC machines or Enscribe for Tandem machines). In this approach structured files are provided (with or without variable length records). Moreover, efficient access is often supported for fetching records corresponding to a user supplied value (or key) for a designated field or fields. Multilevel directories, hashing, and secondary indexes are often used to provide this service.

The point to be made in this section is that the second service, which is what a DBMS wants, is not always efficient when constructed on top of a character array object. The following subsections explain why.

3.1 Physical Contiguity

The character array object can usually be expanded one block at a time. Often the result is blocks of a given file scattered over a disk volume. Hence, the next logical block in a file is not necessarily physically close to the previous one. Since a DBMS does considerable sequential access, the result is considerable disk arm movement.

The desired service is for blocks to be stored physically contiguous and for a whole collection to be read when sequential access is desired. This naturally leads a DBMS to prefer a so-called extent based file system (e.g., VSAM [11]) to one which scatters blocks. Of course, such files must grow an extent at a time rather than a block at a time.

3.2 Tree Structured File Systems

UNIX implements two services by means of data structures which are trees. The blocks in a given file are kept track of in a tree (of indirect blocks) pointed to by a file control block (i-node). Second, the files in a given mounted file system have a user visible hierarchical structure composed of directories, subdirectories, etc. This is implemented by a second tree. A DBMS such as INGRES then adds a third tree structure to support keyed access via a multilevel directory structure (e.g., ISAM [7], B-trees [1, 12], VSAM [11], etc.).

Clearly, one tree with all three kinds of information is more efficient than three separately managed trees. The extra overhead for three separate trees is probably substantial.

3.3 Summary

It is clear that a character array is not a useful object to a DBMS. Rather, it is the abstraction presumably desired by language processors, editors, etc. Instead of providing records management on top of character arrays, it is possible to do the converse; the only issue is one of efficiency. Moreover, editors can possibly use records management structures as efficiently as those they create themselves [2]. It is our feeling that OS designers should contemplate providing DBMS facilities as lower level objects and character arrays as higher level ones. This philosophy has already been presented [5].

4. Scheduling, Process Management, and Interprocess Communication

Often, the simplest way to organize a multiuser database system is to have one OS process per user; i.e., each concurrent database user runs in a separate process. It is hoped that all users will share the same copy of the code segment of the database system and perhaps one or more data segments. In particular, a DBMS buffer pool and lock table should be handled as a shared segment. The above structure is followed by System R and, in part, by INGRES. Since UNIX has no shared data segments, INGRES must put the lock table inside the operating system and provide buffering private to each user.

The alternative organization is to allocate one run-time database process which acts as a server. All concurrent users send messages to this server with work requests. The one run-time server schedules requests through its own mechanisms and may support its own multitasking system. This organization is followed by Enscribe [21]. Figure 2 shows both possibilities.

Fig. 2. Two Approaches to Organizing a Multiuser Database System. [Figure: on the left, users 1 through k each run their own DBMS process (process-per-user structure); on the right, users 1 through k send requests to a single DBMS server process (server structure); both organizations access the disk.]

Although Lauer [14] points out that the two methods are equally viable in a conceptual sense, the design of most operating systems strongly favors the first approach. For example, UNIX contains a message system (pipes) which is incompatible with the notion of a server process. Hence, it forces the use of the first alternative. There are at least two problems with the process-per-user approach.

4.1 Performance

Every time a run-time database process issues an I/O request that cannot be satisfied by data in the buffer pool, a task switch is inevitable. The DBMS suspends while waiting for required data and another process is run. It is possible to perform task switches very efficiently, and some operating systems can perform a task switch in a few hundred instructions. However, many operating systems have "large" processes, i.e., ones with a great deal of state information (e.g., accounting) and a sophisticated scheduler. This tends to cause task switches costing a thousand instructions or more. This is a high price to pay for a buffer pool miss.

4.2 Critical Sections

Blasgen [3] has pointed out that some DBMS processes have critical sections. If the buffer pool is a shared data segment, then portions of the buffer pool manager are necessarily critical sections. System R handles critical sections by setting and releasing short-term locks which basically simulate semaphores. A problem arises if the operating system scheduler deschedules a database process while it is holding such a lock. All other database processes cannot execute very long without accessing the buffer pool. Hence, they quickly queue up behind the locked resource. Although the probability of this occurring is low, the resulting convoy [3] has a devastating effect on performance.

As a result of these two problems with the process-per-user model, one might expect the server model to be especially attractive. The following subsection explores this point of view.

4.3 The Server Model

A server model becomes viable if the operating system provides a message facility which allows n processes to originate messages to a single destination process. However, such a server must do its own scheduling and multitasking. This involves a painful duplication of operating system facilities. In order to avoid such duplication, one must resort to the following tactics.

One can avoid multitasking and a scheduler by a first-come-first-served server with no internal parallelism. A work request would be read from the message system and executed to completion before the next one was started. This approach makes little sense if there is more than one physical disk. Each work request will tend to have one disk read outstanding at any instant. Hence, at most one disk will be active with a non-multitasking server. Even with a single disk, a long work request will be processed to completion while shorter requests must wait. The penalty on average response time may be considerable [18].

To achieve internal parallelism yet avoid multitasking, one could have user processes send work requests to one of perhaps several common servers, as noted in Figure 3. However, such servers would have to share a lock table and are only slightly different from the shared code process-per-user model. Alternately, one could have a collection of servers, each of which would send low-level requests to a group of disk processes which actually perform the I/O and handle locking, as suggested in Figure 4. A disk process would process requests in first-in-first-out order. Although this organization appears potentially desirable, it still may have the response time penalty mentioned above. Moreover, it results in one message per I/O request. In reality one has traded a task switch per I/O for a message per I/O; the latter may turn out to be more expensive than the former. In the next subsection, we discuss message costs in more detail.

Fig. 3. Server Pool Structure. [Figure: users 1 through k send requests to a pool of DBMS server processes, which share access to the disk.]

Fig. 4. Disk Server Structure. [Figure: users 1 through k send requests to a DBMS process, which sends low-level requests to a collection of disk processes that perform the actual I/O.]

4.4 Performance of Message Systems

Although we have never been offered a good explanation of why messages are so expensive, the fact remains that in most operating systems the cost for a round-trip message is several thousand instructions. For example, in PDP-11/70 UNIX the number is about 5,000. As a result, care must be exercised in a DBMS to avoid overuse of a facility that is not cheap. Consequently, viable DBMS organizations will sometimes be rejected because of excessive message overhead.

4.5 Summary

There appears to be no way out of the scheduling dilemma; both the server model and the individual process model seem unattractive. The basic problem is, at least in part, the overhead in some operating systems of task switches and messages. Either operating system designers must make these facilities cheaper or provide special fast path functions for DBMS consumers. If this does not happen, DBMS designers will presumably continue the present practice: implementing their own multitasking, scheduling, and message systems entirely in user space. The result is a "mini" operating system running in user space in addition to a DBMS.

One ultimate solution to task-switch overhead might be for an operating system to create a special scheduling class for the DBMS and other "favored" users. Processes in this class would never be forcibly descheduled but might voluntarily relinquish the CPU at appropriate intervals. This would solve the convoy problem mentioned in Section 4.2. Moreover, such special processes might also be provided with a fast path through the task switch/scheduler loop to pass control to one of their sibling processes. Hence, a DBMS process could pass control to another DBMS process at low overhead.

5. Consistency Control

The services provided by an operating system in this area include the ability to lock objects for shared or exclusive access and support for crash recovery. Although most operating systems provide locking for files, there are fewer which support finer granularity locks, such as those on pages or records. Such smaller locks are deemed essential in some database environments.

Moreover, many operating systems provide some cleanup after crashes. If they do not offer support for database transactions as discussed in Section 2.4, then a DBMS must provide transaction crash recovery on top of whatever is supplied.

It has sometimes been suggested that both concurrency control and crash recovery for transactions be provided entirely inside the operating system (e.g., [13]). Conceptually, they should be at least as efficient as if provided in user space. The only problem with this approach is buffer management. If a DBMS provides buffer management in addition to whatever is supplied by the operating system, then transaction management by the operating system is impacted, as discussed in the following subsections.

5.1 Commit Point

When a database transaction commits, a user space buffer manager must ensure that all appropriate blocks are flushed and a commit delivered to the operating system. Hence, the buffer manager cannot be immune from knowledge of transactions, and operating system functions are duplicated.

5.2 Ordering Dependencies

Consider the following employee data:

    Empname   Salary   Manager
    Smith     10,000   Brown
    Jones      9,000   None
    Brown     11,000   Jones

and the update which gives a 20% pay cut to all employees who earn more than their managers. Presumably, Brown will be the only employee to receive a decrease, although there are alternative semantic definitions.

Suppose the DBMS updates the data set as it finds "overpaid" employees, depending on the operating system to provide backout or recover-forward on crashes. If so,
Brown might be updated before Smith was examined, and as a result, Smith would also receive the pay cut. It is clearly undesirable to have the outcome of an update depend on the order of execution.

If the operating system maintains the buffer pool and an intentions list for crash recovery, it can avoid this problem [19]. However, if there is a buffer pool manager in user space, it must maintain its own intentions list in order to properly process this update. Again, operating system facilities are being duplicated.

5.3 Summary

It is certainly possible to have buffering, concurrency control, and crash recovery all provided by the operating system. In order for the system to be successful, however, the performance problems mentioned in Section 2 must be overcome. It is also reasonable to consider having all three services provided by the DBMS in user space. However, if buffering remains in user space and consistency control does not, then much code duplication appears inevitable. Presumably, this will cause performance problems in addition to increased human effort.

6. Paged Virtual Memory

It is often claimed that the appropriate operating system tactic for database management support is to bind files into a user's paged virtual address space. In Figure 5 we show the address space of a process containing code to be executed, data that the code uses, and the files F1 and F2. Such files can be referenced by a program as if they are program variables. Consequently, a user never needs to do explicit reads or writes; he can depend on the paging facilities of the OS to move his file blocks into and out of main memory. Here, we briefly discuss the problems inherent in this approach.

Fig. 5. Binding Files into an Address Space. [Figure: a user process address space containing DBMS run-time code, run-time data, and the files F1 and F2.]

6.1 Large Files

Any virtual memory scheme must handle files which are large objects. Popular paging hardware creates an overhead of 4 bytes per 4,096-byte page. Consequently, a 100M-byte file will have an overhead of 100K bytes for the page table. Although main memory is decreasing in cost, it may not be reasonable to assume that a page table of this size is entirely resident in primary memory. Therefore, there is the possibility that an I/O operation will induce two page faults: one for the page containing the page table for the data in question and one for the data itself. To avoid the second fault, one must wire down a large page table in main memory.

Conventional file systems include the information contained in the page table in a file control block. Especially in extent-based file systems, a very compact representation of this information is possible. A run of 1,000 consecutive blocks can be represented as a starting block and a length field. However, a page table for this information would store each of the 1,000 addresses even though each differs by just one from its predecessor. Consequently, a file control block is usually made main memory resident at the time the file is opened. As a result, the second I/O need never be paid.

The alternative is to bind chunks of a file into one's address space. Not only does this provide a multiuser DBMS with a substantial bookkeeping problem concerning whether needed data is currently addressable, but it also may require a number of bind-unbind pairs in a transaction. Since the overhead of a bind is likely to be comparable to that of a file open, this may substantially slow down performance.

It is an open question whether or not novel paging organizations can assist in solving the problems mentioned in this section.

6.2 Buffering

All of the problems discussed in Section 2 concerning buffering (e.g., prefetch, non-LRU management, and selected force out) exist in a paged virtual memory context. How they can be cleanly handled in this context is another unanswered question.

7. Conclusions

The bottom line is that operating system services in many existing systems are either too slow or inappropriate. Current DBMSs usually provide their own and make little or no use of those offered by the operating system. It is important that future operating system designers become more sensitive to DBMS needs.

A DBMS would prefer a small efficient operating system with only desired services. Of those currently available, the so-called real-time operating systems which efficiently provide minimal facilities come closest to this ideal. On the other hand, most general-purpose operating systems offer all things to all people at much higher overhead. It is our hope that future operating systems will be able to provide both sets of services in one environment.

References

1. Bayer, R. Organization and maintenance of large ordered indices. Proc. ACM-SIGFIDET Workshop on Data Description and Access, Houston, Texas, Nov. 1970. This paper defines a particular form of a balanced n-ary tree, called a B-tree. Algorithms to maintain this structure on inserts and deletes are presented. The original paper on this popular file organization tactic.

2. Birss, E. Hewlett-Packard Corp., General Syst. Div. (private communication).

3. Blasgen, M., et al. The convoy phenomenon. Operating Systs. Rev. 13, 2 (April 1979), 20-25. This article points out the problem with descheduling a process which has a short-term lock on an object which other processes require regularly. The impact on performance is noted and possible solutions proposed.

4. Blasgen, M., et al. System R: An architectural update. Rep. RJ 2581, IBM Res. Ctr., San Jose, Calif., July 1979. Blasgen describes the architecture of System R, a novel full function relational database manager implemented at IBM Research. The discussion centers on the changes made since the original System R paper was published in 1976.

5. Epstein, R., and Hawthorn, P. Design decisions for the Intelligent Database Machine. Proc. Nat. Comptr. Conf., Anaheim, Calif., May 1980, pp. 237-241. An overview of the philosophy of the Intelligent Database Machine is presented. This system provides a database manager on a dedicated "back end" computer which can be attached to a variety of host machines.

6. Gray, J. Notes on operating systems. Rep. RJ 3120, IBM Res. Ctr., San Jose, Calif., Oct. 1978. A definitive report on locking and recovery in a database system. It pulls together most of the ideas on these subjects including two-phase protocols, write ahead log, and variable granularity locks. Should be read every six months by anyone interested in these matters.

7. IBM Corp. OS ISAM Logic. GY28-6618, IBM, White Plains, N.Y., June 1966.

8. IBM Corp. IMS-VS General Information Manual. GH20-1260, IBM, White Plains, N.Y., April 1974.

9. Kaplan, J. Buffer management policies in a database system. M.S. Th., Univ. of Calif., Berkeley, Calif., 1980. This thesis simulates various non-LRU buffer management policies on traced data obtained from the INGRES database system. It concludes that the miss rate can be cut 10-15% by a DBMS specific algorithm compared to LRU management.

10. Kashtan, D. UNIX and VMS: Some performance comparisons. SRI Internat., Menlo Park, Calif. (unpublished working paper). Kashtan's paper contains benchmark timings of operating system commands in UNIX and VMS for DEC 11/780 computers. These include timings of file reads, event flags, task switches, and pipes.

11. Keehn, D., and Lacy, J. VSAM data set design parameters. IBM Systs. J. (Sept. 1974).

12. Knuth, D. The Art of Computer Programming, Vol. 3: Sorting and Searching. Addison-Wesley, Reading, Mass., 1978.

13. Lampson, B., and Sturgis, H. Crash recovery in a distributed system. Xerox Res. Ctr., Palo Alto, Calif., 1976 (working paper). The first paper to present the now popular two-phase commit protocol. Also, an interesting model of computer system crashes is discussed and the notion of "safe" storage suggested.

14. Lauer, H., and Needham, R. On the duality of operating system structures. Operating Systs. Rev. 13, 2 (April 1979), 3-19. This article explores in detail the "process-per-user" approach to operating systems versus the "server model." It argues that they are inherently duals of each other and that either should be implementable as efficiently as the other. Very interesting reading.

15. Mattson, R., et al. Evaluation techniques for storage hierarchies. IBM Systs. J. (June 1970). Discusses buffer management in detail. The paper presents and analyzes several policies including FIFO, LRU, OPT, and RANDOM.

16. Redell, D., et al. Pilot: An operating system for a personal computer. Comm. ACM 23, 2 (Feb. 1980), 81-92. Redell et al. focus on Pilot, the operating system for Xerox Alto computers. It is closely coupled with Mesa and makes interesting choices in areas like protection that are appropriate for a personal computer.

17. Ritchie, D., and Thompson, K. The UNIX time-sharing system. Comm. ACM 17, 7 (July 1974), 365-375. The original paper describing UNIX, an operating system for PDP-11 computers. Novel points include accessing files, physical devices, and pipes in a uniform way and running the command-line interpreter as a user program. Strongly recommended reading.

18. Shaw, A. The Logical Design of Operating Systems. Prentice-Hall, Englewood Cliffs, N.J., 1974.

19. Stonebraker, M., et al. The design and implementation of INGRES. ACM Trans. Database Systs. 1, 3 (Sept. 1976), 189-222. The original paper describing the structure of the INGRES database management system, a relational data manager for PDP-11 computers.

20. Stonebraker, M. Retrospection on a database system. ACM Trans. Database Systs. 5, 2 (June 1980), 225-240. A self-critique of the INGRES system by one of its designers. The article discusses design flaws in the system and indicates the historical progression of the project.

21. Tandem Computers. Enscribe Reference Manual. Tandem, Cupertino, Calif., Aug. 1979.

