You are on page 1of 31

Distributed Database Systems in High Speed Wide-Area Networks

Department of Electrical Engineering - Systems University of Southern California Los Angeles, CA 90089-2565 Department of Computer Science University of California Riverside, CA 92521-0304

Sujata Banerjee and Victor O.K. Li Chihping Wang

Abstract In a distributed database system (DDBS), the data is partitioned into smaller databases and distributed over multiple sites in a computer network. Associated with a DDBS are functions like query processing and concurrency control. Traditionally in DDBS research, the computer network has been considered a performance bottleneck and a lot of research e ort has been directed towards the design of database operations that minimize the data transmission cost. With the development of high speed networks, the network transmission costs go down and new algorithms that e ciently utilize the huge bandwidth available are required. In this paper, we rst identify the issues involved in developing this distributed application in a high speed environment. Then we demonstrate the inadequacy of existing database protocols in utilizing the Gigabit-per-second network. And nally, we develop a new concurrency control protocol that performs better than traditional DDBS in a high speed network. Both analytical and simulation results are presented. In this paper, we have concentrated on the concurrency control (CC) aspect of DDBS since this protocol is at the heart of the overall functioning of the distributed system.

This research is supported in part by the National Science Foundation under grant No. NCR-9016348, by the Department of Defense Joint Services Electronics Program under contract No. F49620-91-0028, and by the Paci c Bell External Technology Program.

1 Introduction and Motivation


The evolution of very high speed networks is prompting research in many areas, including that of distributed databases of the future. There are many issues involved in the implementation of distributed databases on networks that will have speeds of the order of Gigabits per second. In a distributed computer system environment, a database may be partitioned into smaller databases and distributed over all the computer sites. Associated with a distributed database system are functions like query processing, concurrency control, data availability and failure recovery. All of these fairly complex processes have to be looked into again, with respect to implementing them on high speed networks. It is our belief that new algorithms for each of the above functions will have to be developed if the advantages of a fast network are to be fully utilized. In this paper, the various issues involved in the implementation of distributed database systems (DDBS) in a high speed environment are considered, and some new techniques are presented. Our research is oriented towards nding the absolute bounds of performance that can be achieved when networks go high speed. In most studies of distributed database systems (design or performance analysis) so far, the communication network has been considered as a bottleneck and a lot of e ort has gone into designing database operations that minimize data transmission costs. With the development of high speed networks, the network transmission costs go down and new algorithms that e ciently utilize the huge bandwidth available are required. As discovered in this paper, the traditional protocols do not achieve this. In fact, there is no justi cation for implementing DDBS on high speed networks if no gain in throughput is observed with increase in the network speed. Looked at from another angle, distributed database systems are worthy candidates for applications of broadband networks. It is our belief that most of the tra c on the future high speed networks will be generated by users wanting access to information stored in di erent databases over the network. Thus, it becomes imperative to design algorithms for distributed database systems that are well adapted to work in the high speed environment, and this is another motivation for our research. In our DDBS design, we intend to give a more active role to the communication network and try to exploit the large bandwidths in getting higher transaction throughput. Also, we recognize that the transmission rates of communication networks will continue to increase in the future and DDBS designs that can keep up are necessary. In the following sections, rst the important issues involved in the design of DDBS in a high speed network are discussed, followed by a brief literature survey. The only work known to us that has attempted to exploit the large ber optic bandwidths in getting high transaction throughputs is the datacycle database machine that was targeted at primarily achieving very high read transaction throughputs. We spend a considerable amount of time analysing and comparing a DDBS with the datacycle concept and other traditional DDBS and demonstrate that the datacycle concept works better for a read-only system in a high speed environment. Then, since real DDBS have updates too, 2

we describe a new concurrency control algorithm that works well in the high speed environment. The main goal of the datacycle and the proposed concurrency control scheme is to use the large bandwidth available to buy extra transaction throughput.

2 Main Issues
As mentioned earlier, for a DDBS in a high speed environment, the main motivation is to be able to trade transaction throughput with the available bandwidth. All of the following discussion is motivated with the aim of reducing the average transaction response time (which increases the transaction throughput) using a broadband network resource. Conventional networks are di erent from high speed networks in that the propagation delay in the former is small relative to the transmission delay of data. For example, the propagation delay across the United States (at the speed of light) is about 20 msec. In the emerging high-speed network, at say, 1 Gigabit/sec, it will only take 1 msec. to transmit a 1 Megabit le, for a total delay of 21 msec. For a traditional network like the Internet, operating at 50 Kbits/sec, the transmission delay is 20 sec., for a total delay of 20020 msec. In addition, the transmission delay dominates the local processing delay at the computer sites. Therefore, most existing distributed database algorithms have focused on minimizing the volume of data to be transmitted. In a high speed environment, however, the emphasis now should be on reducing the total propagation and processing delay. It has been postulated 1] that the processor speed has a great impact on the data communication performance. Much of the time (up to 90%) required by the transfer of data from one site to another is spent by the processor in each site (packetizing, etc). Thus, it is even more important to have fast processors and optimize on the processing time. In recent years however, with the availability of higher bandwidth, research has been undertaken to make the communication protocols faster. An important criterion when dealing with DDBS in a high speed network is that the transaction volume is expected to be quite large, because of the high capacity of the system. This has signi cant e ect on the design of the DDBS protocols. Some of the other distinguishing characteristics of high speed networks include high reliability (owing to the low bit-error-rates (BER) of the optical ber transmission medium) and regular topology (to simplify routing). Although these two speci c issues will not be considered in this paper, they are to be kept in mind for future work. Summarizing, high speed networks are characterised by the following properties. The rest of this section is concerned with the way in which these properties a ect our design of a DDBS. Large propagation delay compared to transmission delay. Large propagation delay compared to processing delay. 3

Large transaction volume. High Reliability. Regular Topology.

2.1 Simple Protocols


To minimize the processing time, it is important to recognize that complicated protocols lead to additional processing, and thus all the protocols developed should be very simple to implement.

2.2 Minimizing Sequential Message Passing


The propagation delay between two sites is a function of the distance between them. Thus, protocols which minimize the number of sequential messages are necessary, since each message will incur the same propagation delay. Protocols based on broadcasting information may prove to be better suited to the high speed environment.

2.3 The Memory Resident Property


It has been well established that the performance of database systems is heavily dependent on the e ciency of the I/O system 2,3]. In other words, in current database systems, the I/O devices (disks) constitute the performance bottleneck. While the processor speeds have gone up rapidly, the I/O systems have not been able to keep up. An obvious solution to this problem is to put the entire database on the main memory of the computer system. This was not feasible for a long time owing to the cost of such large main memories. However, in the last decade, researchers have begun visualizing the possibility of \Memory-Resident Databases" owing to the decreasing memory costs 3{11]. While this eliminates the I/O bottleneck to a large extent (fewer disk accesses), the problem of failure recovery assumes great importance. This is due to the fact that the main memory is susceptible to many failures (power failure, software corruption, etc), and stable memory is extremely expensive even today. With distributed database systems being implemented on a high speed network, the volume of incoming transactions will increase many fold 12]. If we are to reap any of the advantages of a high speed network, the I/O bottleneck will have to be eliminated. Thus, a DDBS on a high speed network is expected to be \Memory-Resident".

2.4 Avoidance of Deadlocks and Rollback


A deadlock is said to have occurred when two or more transactions are waiting for each other in a cycle, to release one or more data items. For instance, a deadlock situation may arise when the concurrency control is implemented using a locking method. In deadlock-prone algorithms, valuable processing time has to be spent in running deadlock detection/resolution processes. Deadlocks do not occur very frequently. However, when they do occur, a large number of transactions may be blocked 13,14]. In DDBS on high speed networks, the transaction arrival rate is quite high, and during the lifetime of a deadlock, many transactions may have to wait although they are not directly involved in the deadlock. Thus, it would be advisable to implement protocols in which deadlocks do not occur at all, or implement robust deadlock avoidance schemes. In conventional DDBS, transactions are rolled back whenever an inconsistent operation is detected. Often, rolled back transactions will cause other transactions to be rolled back, causing a cascading e ect. In a DDBS that sustains a high transaction throughput, a cascaded rollback event will a ect a large number of transactions, many of which might have already left the system. Thus, another requirement of a DDBS on a high speed network is to have a rollback-free execution.

3 Related Work
So far, not much work has been done in the area of distributed database systems on wide area high speed networks. Two notable attempts are the Fragment and Replicate query processing strategy designed for high speed local area networks, and the Datacycle Architecture that was aimed at designing a database machine with very high throughputs for primarily read-oriented transactions. These contributions are described in some detail in the following subsections.

3.1 Fragment and Replicate Query Processing Strategy


The Fragment and Replicate method of query processing for fast local area networks was developed to minimize the CPU processing time 15{21]. This algorithm introduces parallelism in the CPU processing, thereby, minimizing the overall query response time. A query is divided into a number of subqueries (by fragmenting some relations, replicating others) and processed at di erent sites concurrently. Further, an integrated algorithm has been developed (C.T. Yu et al.) to minimize the data transmission cost (using traditional methods like the Semi-join algorithm 22, 23]) as well as the CPU processing time. The Fragment and Replicate algorithm, while it does introduce parallelism in the processing of queries, thereby reducing query response time, requires complex optimality calculations. It is expected that this overhead will dominate the advantages gained by using this method. This prompts research in database design issues. 5

3.2 The Datacycle Architecture


The datacycle architecture was developed to achieve very high read-oriented transaction throughputs 24{27]. The datacycle architecture utilizes the high ber optic bandwidths available to broadcast the entire database contents periodically on a ber optic broadcast bus. A central database site (also called the storage pump) is responsible for the broadcasting mechanism. User access units (UAU) equipped with data lters read the data \on the y" as the data les pass by that site on the broadcast bus. Thus, the attainable read throughput is quite high. An optimistic concurrency control scheme is implemented, whereby the data items are read from the broadcast bus and the updates are submitted to a central update manager (UM) via an upstream network. The central update unit is responsible for resolving update con icts and accepts only a non-con icting set of updates. This architecture employs a multiversion concurrency control strategy that is optimistic in nature, and uses validation (both at the centralized and local levels) before committing any updates. The main bottleneck in this strategy is the centralized update manager, whose saturation point ultimately limits the update throughput. The query throughput is essentially unlimited although long transactions that extend over one or more data cycles will have to undergo local validation. The Datacycle architecture also gets around the I/O bottleneck for record retrieval. In this sense, it can also be classi ed as a memory-resident system. A schematic diagram of the datacycle architecture is provided in Figure 1. The performance of the datacycle architecture depends on the fraction of queries (read-only transactions) generated in the system. This architecture is not expected to do very well in a system where the fraction of updates is high. Also, the response time of queries with large access sets or whose access sets cannot be pre-determined (data-dependent access sets) is large in the datacycle architecture. This is because a single cycle may not be su cient to access all the data items. The central database site is the most vulnerable component of the system, and in the case of failure, will paralyze the operation of the entire system. Recognizing the performance bottleneck of the centralized update manager as well as the reliability bottleneck of the central database pump, a distributed version of the Datacycle architecture has been proposed 25] in which the function of broadcasting the data items to all the sites is distributed over several sites. Also, a distributed process of validating transactions has been proposed.

4 The Datacycle Architecture as a potential DDBS in a High Speed Environment


As mentioned earlier, the only attempt so far at designing a database system to exploit the high optical ber bandwidth has been the Datacycle database machine architecture 24{27]. It must be noted, however, that the datacycle architecture in its original form does not satisfy all the char6

Figure 1: A schematic diagram of the datacycle architecture acteristics of a distributed database system. For instance, the data must be physically distributed over a computer network for the system to pass as a DDBS 22]. The datacycle architecture resembles a multi-processor database machine more than a DDBS. It is given such a prominent coverage here since it is the only database system till now that has attempted to utilize the high bandwidth of optical bers to provide higher transaction throughputs.
UAU

4.1 Disadvantages of using the Datacycle Architecture as a Wide Area High UAU UAU UAU UAU Speed DDBS

The main disadvantages that would appear if the Datacycle architecture were to be used as a distributed database systemNetwork in a high speed environment are enumerated in this section. Upstream It is mainly a read-oriented system. The update throughput is limited by the centralized Update Manager. The multiversion optimistic concurrency control scheme does introduce more concurrency and prevents multiple aborts of long transactions. However, the database size increases proportionately to the number of versions being maintained. This has an adverse e ect of increasing the data-cycle duration, which in turn increases the query response time. Optimistic schemes for concurrency control usually do not perform very well under situations of high probability of con ict. With very high volume of transactions coming into a high speed environment, the probability of con ict will be high too, and thus an optimistic scheme is not the appropriate choice. 7

The central database pump is the most vulnerable component in this architecture. In case of a pump failure, the entire system shuts down. Data dependent transactions as well as transactions with large access sets incur high response times and have a high abort probability. It is our intention to avoid (or at least minimize) the above problems in our design, and compare it with both traditional DDBS and the Datacycle system.

5 Performance Comparison of a Read-Only System with the Datacycle Architecture and Traditional Locking scheme
The concurrency control (CC) scheme is at the heart of any DDBS system. The performance of a DDBS is largely determined by the e ciency of its concurrency control scheme. Much work has been done in the area of CC. 28] provides a comprehensive collection of CC schemes proposed in the available literature. The two-phase locking (2-PL) CC scheme is the currently accepted industry standard. It is our belief that the datacycle concept of periodically broadcasting the entire database works very well for a Read-only system, and will outperform the standard 2-PL, given a communication network with a high enough data rate. This fact will be demonstrated with an example in this section using performance models developed by us. In conventional distributed databases, the data items reside at speci c computer sites, and all requests to read or write a data-item are directed to the respective computer site. It must be noted that in a DDBS where updates are not made, there is no need to exercise concurrency control since reading a data item can be done simultaneously by two or more transactions. However, in a conventional DDBS running two-phase locking, even a query (read-only transactions) would have to request the data items in its access set from the respective computer sites. In a pure read-only system, the lock-request messages may be renamed data-request messages. An example comparing the performance of a read-only system with traditional locking and a read-only system with the datacycle scheme follows. It is assumed that the communication network as well as the computer sites are perfectly reliable.

5.1 Example
The datacycle implementation of a Read-only DDBS on a high speed network is considered rst. Although, ultimately it would make sense to consider a distributed version of the datacycle architecture, we look at the centralized data pump implementation here. In any case, for a Read-only 8

system, the distributed version would have the e ect of producing a lower cycle time, and all our results here can be scaled accordingly. Next, a very small DDBS example for illustrative purposes alone, is described.

5.2 Datacycle Parameters


A database of size G Gigabits (G = 0:05, i.e. 50 Mbits) is transmitted from a central data site on a broadcast bus of length 2L miles (L = 3000). The entire database is split into D data-items (D = 100) of equal size1 . The network data rate is assumed to be C Gigabits per second. With these parameters, it is possible to calculate the data-cycle time (Tdc ), which is the time to transmit the entire database on the broadcast bus. Each data item occupies a time slot (Ts ) on the bus. The minimum time to access a data item under these circumstances is Ts , and the maximum time to access a data item is Tdc . Assuming that a transaction can arrive anywhere in the middle of a data cycle, the distribution of the time to access a single data item can be assumed to be uniform in (Ts; Tdc + Ts ). Let be the random variable denoting the time to access one data item. Then, the probability density function of is given by

G Tdc = C

dc Ts = T D

(1)

f (

( 1 ) = Tdc

otherwise

Ts <

Tdc + Ts

(2)

5.3 2-PL Parameters


The datacycle architecture uses a broadcast bus due to its speci c requirements, while conventional DDBS may not be limited to just the bus topology. For comparison purposes, we choose a bidirectional ber optic ring topology of circumference 2L with N computer sites located at equal distances on the ring. Each of the D data items is located at one of the N computer sites. Care is taken to see that each site has the same number of data items. Queueing delay at the links is ignored here. Denote the propagation delay between any two adjacent sites as Tp and the time to transmit a single data item as Ttr . The DDBS architectures for the datacycle and the locking schemes are depicted in Figure 2. The probability of deadlock is assumed to be negligible.

Tp = 2 N SpeedLof Light , where speed of light = 186000 miles/sec Ttr = DGC


1

(3) (4)

This is not true in reality, but since it does not a ect our results in any signi cant manner, we shall make this assumption.

Figure 2: DDBS architecture for (a) the datacycle scheme (b) the locking scheme The minimum time to access a data item by a site is zero (if the data item resides at that same site) and the maximum time to access a data item is N Tp + Ttr (the sum of the propagation delay for both the access request and the data itself and the transmission delay for the data1 ). Let us denote by the time to access a single data item, as before, and assume that it is uniformly distributed in (0; NTp + Ttr ). Now,

f (

8 1 < ) = : NTp+Ttr 0

0< N Tp + Ttr otherwise

(5)

5.4 Computer Site Parameters


Although, it has not been explicitly mentioned so far, each computer site in the DDBS under consideration is assumed to be a general-purpose database machine. Many database machines have been designed so far and some of them have maximum achievable throughputs up to 200-500 transactions per second 12]. From these gures, the maximum execution time per transaction ranges from a few milliseconds to a few hundred milliseconds. The execution time is expected to decrease when database systems are memory-resident, as the I/O bottleneck is eliminated. Further, the multiprogramming level is assumed to be 1 at each site, i.e., each site processes only (b) one transaction at a time.
1

(a)

The transmission delay for the access request is neglected owing to the small size of the access request

10

5.5 Transaction Parameters


For the sake of simplicity, we assume that all transactions at all sites have the same data access distribution. Each transaction has an access set cardinality k ranging from 1 to K with probability gk . Also, given the access set cardinality, all the D data items are accessed with equal probability. We further consider two \extreme" cases of data access distribution: All transactions can determine the data items in their access sets before execution begins. (Pre-declared access set transactions) All transactions access their data items in a \serial fashion", i.e, the identity of the data item to be accessed next can be determined only after accessing the data item before it (Data-dependent transactions). We can then calculate the average response time of a transaction, de ned here as the time required by a transaction to access all the data items in its access set after it arrives at a site. Thus the response time includes the time that a transaction waits in a queue before being serviced. The analysis for the pre-declared transactions is provided next, while the analysis for the data-dependent transactions is provided in Appendix A.

5.6 Analysis of DDBS with transactions having pre-declared access sets


Let k denote the time to access k data items by a transaction. Also, let Di denote the time to access the data item Di. Then,
k = max( D1 ; D2 ; : : :; Dk )

(6)

The random variables are independent and identically distributed with the uniform distributions f ( ) in Equations 2 and 5 respectively for the datacycle implementation and the 2-PL scheme. Hence,

f k ( ) = k f ( ) (F ( ))k?1

(7)

probability density function of k , the mean k and second moment k ]2 may be easily calculated. As mentioned earlier, the access set cardinality k of a transaction ranges from 1 to K with probability gk . We are interested in the mean (X ) and second moment (X 2) of the random variable X ,

F ( ) is the probability distribution function of the random variable . Thus from the above

11

which denotes the time for any transaction to access all the data items in its access set. Clearly,

X=

K X k=1

k gk

and

X2 =

K X k=1

k ]2 gk

(8)

Let the random variable E denote the transaction execution time. Also, let E and E 2 denote the rst and second moments of E respectively. As de ned in earlier sections, the random variable X denotes the time duration to access all the data items in the access set of the transaction being processed. The average transaction service time is now denoted by S which is the sum of the data access time and the execution time. We assume that the execution does not begin until all the data items have been read. Thus, the following equations may be written assuming that the execution time and the data access time are independent.

S = X +E S = X +E

and

S2 = X 2 + E2 + 2 X E

(9)

The transaction execution time here is assumed to be the same for all transactions. This may not be true in general and a better assumption perhaps might be that the transaction execution time is a function of the cardinality of its access set. Then E = E (k), where k is the cardinality of the transaction access set. In that case, we de ne

Sk = S=

k + E (k)

and

S k ]2 = S2 =
K X k=1

k ]2 + E 2(k) + 2 k E (k)

(10)

Now, S and S 2 have to be recalculated as below.


K X k=1

S k gk

and

S k ]2 gk

(11)

Transactions queue up at each of the computer sites at an average arrival rate of transactions per second, and are serviced with an average service time of S and second moment of S 2 . Now, the average time spent by each transaction in the system R, may be calculated using the standard M/G/1 formula 29] as given below.

S2 R = S + 2 (1 ? S)

(12)

5.7 Simulation Details


Since the assumption of the random variables D1 ; : : :; Dk being independent may not be very accurate, a simulation of the datacycle scheme as well as that of the traditional two-phase locking was done, with the same parameters: G = 0:05 Gbits, L = 3000 miles, D = 100 data items and 12

1000 900 .... : Traditional Locking 1 Gbps 1.5 Gbps 2 Gbps

1200

Transaction Response Time (msec)

700 600 500 400 300 200 100 0 0 5 10 15 20 25 30 35 40 45 50

Transaction Response Time (msec)

800

1000

1 Gbps

1.5 Gbps

2 Gbps

800

600

400

200

0 0

10

12

14

Transaction Arrival Rate at a Site

Transaction Arrival Rate at a Site

Figure 3: Comparison of datacycle and traditional locking for a read-only system with pre-declared access sets: E=0

Figure 4: Datacycle performance with exponentially distributed execution time: E = 50 msec

deadlock resolution time was not included in the overall query response time. Also, the access time of a single data item was assumed to be uniformly distributed. Results of the comparison between the datacycle scheme and the traditional locking are presented in Figure 3 for three di erent values of the network speed C , viz., 1, 1.5 and 2 Gbps with zero execution time. Figure 4 contains the same results for the datacycle scheme with an average exceution time of 50 msec. The execution time was assumed to be exponentially distributed and independent of the cardinality of the access set. The 1 ; k = 1; : : :; K maximum data access set cardinality K has been assigned a value of 10, with gk = K (i.e, all access set cardinalities between 1 and K are equally likely). A comparison of the simulation and analysis results for the datacycle and the traditional locking scheme (with zero execution time) are given in Figures 5 and 6 respectively. Clearly, the simulation and analytical results match fairly well for both cases. The simulations written in the C programming language executed 50,000 transactions typically and required a few minutes of CPU time (SUN Sparc Workstation) per data point. The analysis typically took some fraction of a second to compute all the necessary data points. As the speed of the network is increased, the maximum sustainable transaction arrival rate also increases as before. However, as the average query execution time increases, increasing the network data rate is not as e ective as before in improving the query throughput. This is a fairly obvious result, since the bottleneck device is now the database machine that processes the queries. Figure 4 has been provided here to stress the fact that there is little point in implementing distributed databases on very high speed networks if the query processing capabilities of database machines cannot keep up. From Figure 3, it is evident that the datacycle scheme performs worse than the 2-PL scheme, at 13

N = 10 computer sites. Deadlocks were taken into consideration in the simulation although the

1000 900 * : Simulation Points

700 1.5 Gbps * : Simulation Points


*

600

Transaction Response Time (msec)

Transaction Response Time (msec)

800 700
*

500

600 1 Gbps 500 400


* *

2 Gbps
*

400 2 Gbps 300


*

300 200 100 0 0


* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *

200
*

* * * * * *

100
* * * * * * * * * * * * * * * *

10

15

20

25

30

35

40

45

50

0 0

10

15

20

25

30

35

40

Transaction Arrival Rate at a Site

Transaction Arrival Rate at a Site

Figure 5: Comparison of simulation and analysis of the datacycle scheme for a read-only system with pre-declared access sets: E=0

Figure 6: Comparison of simulation and analysis of traditional locking for a read-only system with pre-declared access sets: E=0

a low data rate (1 Gbps); the two schemes perform almost equally at an intermediate data rate (1.5 Gbps) and at a high data rate (2 Gbps), the datacycle scheme out-performs the traditional 2-PL scheme. The same pattern is noticed for both data access distributions. Further, it is interesting to note that the performance of the 2-PL scheme does not change very much with increase in the data rate. This is to be expected since the 2-PL scheme is limited by the propagation delay between sites which is una ected by the network data rate. On the other hand, since the performance of the datacycle scheme directly depends on the transmission time of the entire database, with increase in the data rate, the performance drastically improves. Thus, there is a cross over point in the speed of the network where the datacycle scheme does better than the traditional 2-PL system. This cross over occurs just over 1.5 Gbps for both data access distributions, as can be seen in the gures provided. It would be very useful to be able to theoretically derive this cross over point, and this is what is done in the following sub-section. The datacycle scheme, however, su ers from the extra cost involved in equipping the computer sites with sophisticated data lters to read the data on the broadcast bus \on the y" and having a reliable centralized data pump.

5.8 Derivation of the Approximate Cross-over Point


Let TA and TB be the maximum time required to access a data item in the datacycle and 2-PL schemes respectively. The approximate cross over point in the data rate above which the datacycle system outperforms the 2-PL scheme can be obtained by equating TA and TB . From the discussions

14

in the previous sections,

C is the network data rate at which the cross over in performance occurs. Now, TA = TB is solved for C , with G = 0:05, D = 100 data items, N = 10 computer sites and Tp is calculated using Equation 3 with L = 3000. C is found to be 1:534 Gbps, quite close to the value obtained from
the gures.

G) TA = Tdc (= C TB = N Tp + Ttr (= N Tp + DG C )

(13) (14)

5.9 The E ect of Updates in the Datacycle Scheme


So far, we have studied a read-only system with no updates in the DDBS. Clearly, the read throughput will decrease as updates take place. In this section, the e ect of updates on the query throughput in the datacycle scheme is examined. Speci cally, we look at an update-intensive system, where each data item is written on in every data cycle. We look at such a situation because it is the worst case scenario for query execution and will give us the lower bound of the query throughput. The upper bound for the throughput is for the case when there are no updates. This was studied in the previous sections and the results are summarized in Figure 3. Here, we consider only those queries whose readsets are predeclarable, and the execution time is assumed to be negligible. In the datacycle architecture, the updates, if any, are incorporated in the data stream at the beginning of each cycle, and never in the middle of a cycle. Thus, data items read by a DDBS site in the same cycle are guaranteed to be mutually consistent. This fact adversely a ects the response time of queries with relatively large access sets, as these queries may require upto two data cycles to access their entire readset 25,26].

5.9.1 Details of the Analysis


The parameters used here are the same as made in earlier sections. The important point to be noted in the analysis is the fact that if the processing of a query begins at a time when some of the data items in its readset have already passed by that site, then the query has to wait until it can access its entire readset in the same data cycle. This implies that the query has to wait at least until the beginning of the next data cycle. This is the extra overhead incurred due the frequent updating of data items. Given that n data items out of D have passed the site processing a query with k data items in its readset, the following may be derived. It is assumed as before that the data items are accessed uniformly.
k = ProbfAll k data items are accessed in the same cyclej n out of D data items Pn

15

have gone pastg


8 > > > < => > > :

D?n k D ; k D?n k

0;

k > D?n

(15)

Qk n;m = Probfm of the k data items are accessed in the same cyclej n out of D data items have gone pastg Dm ?n n k ? m ; m D ? n and k ? m n D k

(16)

Since the processing of a query may begin anywhere within a data cycle, P = Probfn out of D data items have gone pastg = 1 ; n = 1; 2; : : :; D
n

k be the average time for a query to access all the k data items in its readset, given that n Let Xn of the data items in the current cycle have already gone past the site. Equation 18 is the formula k and has three terms. The rst term accounts for the case where all the k data items are for Xn accessed in the current cycle. The second term is for the case where all of the k required data items have gone past. Here, the query has to wait until the end of the current cycle, which is the time required for the remaining (D ? n) data items to go past. As per Equation 1, the transmission time per data item is Ts . Assuming that queries arrive exactly at the beginning of a data item time slot, the remaining time until the end of the current data cycle is (D ? n)Ts . Then, since all the k data items are among the rst n data items in the new cycle, the time to access the data items can be a maximum of nTs . The third term considers a scenario where only m out of k data items can be accessed in the current cycle. Owing to the data consistency criterion, all the data items have to be accessed in the same cycle. Thus, again the query has to wait until the end of the current cycle and access all the k data items in the new cycle. Figure 7 should help clarify the scenario being considered. k = P k ?k k k Xn n 1;D?n + Qn;0 (D ? n)Ts + ?1;n ] + k ?1 X Qk n;m m=1

(17)

(D ? n)Ts + ?m n;D ]

(18)

?r i;j denotes the average time required to access r data items, the access times of each of which is uniformly distributed between iTs; jTs]; i < j . This parameter may be easily calculated as done k ]2 may be computed. Now, X k , the average time required before. Similarly, the second moment Xn 16

Figure 7: Timing diagram for a query that arrives when n data items have gone past by a query having k data items in its readset may be evaluated as below.

Xk =

D X n=1

kP Xn n

and

X k ]2

D X n=1

k ]2 P Xn n

(19)

The average time X and its second moment X 2 required by a query to access its entire readset, averaged over all the K classes of queries is given below. From earlier sections, gk = ProbfThe cardinality of the readset of a query = kg.

X=

K X k=1

X k gk

and

X2 =

K X k=1

X k ]2 gk

(20)

Now the average response time of a query may be calculated using Equation 12. The graphs in Figure 3 correspond to sthe case where there are no updates in the system. The e ect of updates (D-n)T nT nTs (D-n)T s s will be to shift the graphs to the left. In Figure 8, the graphs corresponding to the case where all the data items are updated at the beginning of each cycle are presented. For easy comparison, the throughput curves of the update-free case are also provided in the same gure. It is interesting to Broadcast Cycle note Transaction (by comparing Figures 3 andBegins 8) that at a higher data rate (of 2 Gbps), the datacycle scheme in an Arrival update-intensive environment performs slightly better than the traditional locking protocol in an update-free environment. As the data rate is further increased, the margin of improvement in performance is expected to increase. Since some assumptions were made to simplify the analysis (basically, the same assumptions made earlier), a simulation was done to validate the analysis. The comparison between the analysis and simulation results are presented in Figure 9. For low loads, the simulation and analysis results match very well while at higher loads, the results deviate a little. This is owing to the fact that at high loads, there is a dependency between the response times of successive queries, which is not taken into account in the analysis.

17

800 ......: Frequent Updates 700 ____ : No Updates

1200 * : Simulation Points 1000


*

Transaction Response Time (msec)

Transaction Response Time (msec)

600 1 Gbps 500 400 300 200 100 0 0 2 Gbps

1 Gbps 800

* * * *

2 Gbps

600

400

* * * * * * * *

* * * * * * * * * * * * * * *

200
* * * * * * * * * * *

* * * * * * *

10

15

20

25

30

35

40

45

50

0 0

10

15

20

25

30

35

40

45

Transaction Arrival Rate at a Site

Transaction Arrival Rate at a Site

Figure 8: The e ect of updates on the query throughput in the datacycle scheme: E = 0

6 A New Concurrency Control Technique

Figure 9: Comparison of simulation and analysis of the e ect of frequent updates on the query throughput in the datacycle scheme: E=0

From the previous sections, it is clear that the datacycle concept works well for a read-only system. We have looked at some other concurrency control (CC) strategies with respect to implementing them in DDBS in a high speed environment. Speci cally, we have looked at the datacycle implementation of update transactions, and the standard two-phase locking. The shortcomings of these schemes have been highlighted below. In this section, we develop a new CC protocol termed Sendon-Demand , bearing in mind the various requirements of a DDBS in a high speed environment. Our database correctness notion is that of con ict serializability 22, 28]. To avoid rollbacks to the maximum possible extent, we use the consistency criteria of the highest degree (degree 3) as de ned by Gray et al. in 22, 30]. CC problems in the datacycle architecture 1. The centralized update manager is the ultimate performance and reliability bottleneck. 2. A multiversion optimistic algorithm is used that may not work very e ciently under conditions of high data contention (which is to be expected in a DDBS in a high speed environment). CC problems in locking 1. There is a fair possibility of deadlocks. 2. There is also an overhead associated with the sequence of lock request, lock grant and lock release messages while accessing any data item. 18

Figure 10: Broadcast message format

6.1 The Send-on-Demand Protocol as applied to Updates


Data items are traditionally bound to one computer site. By this, we mean that all requests to access a data item have to be processed by the computer site where the data item resides. This leads to an exchange of a sequence of messages between the transaction initiating site and the site where the data item resides. In our e ort to reduce the number of sequential messages and also promote cooperative execution between all the sites in the DDBS, we propose the following concurrency control scheme. The underlying support mechanism of this protocol is similar to that of the datacycle architecture. The software implementation di culties are comparable to that of locking. We have ommitted these details in order to keep the paper focussed. In this new scheme, data items are no longer bound to any particular computer site. Every site in the DDBS maintains what we call a \claim queue" for each data item that is currently located at that site. The claim queue for a data item contains a list of transactions (with the site ID where the transaction is being executed) that require that particular data item and also speci es the action intended to be taken by the transaction on the data item (read or write). The access set along with the arrival time-stamp of a transaction T { TS(T) (i.e., the time of arrival at the Head-of-line (HOL) position at the transaction initiation site) of every incoming transaction is broadcast to all sites in the DDBS. A typical broadcast message sent by a site processing a transaction with k data items in its access set is given in Figure 10. The transaction access set is entered in the respective claim queues by a site in the order of the transaction arrival timestamps. However, no action is taken until it is sure that every site in the DDBS has received the above information. At this point, the transaction is said to be con rmed in our terminology, and the duration a site has to wait until a transaction is con rmed is termed the con rmation duration. A claim queue at any point in time contains a set of con rmed transactions and a set of uncon rmed transactions. In Figure 11, a claim queue for a data item with m claims of which n are con rmed transactions is depicted. The con rmation duration of a transaction is the time greater than or equal to that required to send the access set information to all the sites. This ensures that at every site, the claim queues have entries in the same order, and thus no deadlocks can occur. 19

Figure 11: Structure of the claim queue for a data object At system start-up time, all the data items reside at certain sites. As the transactions start arriving, the claim queues start lling up. Every site transmits each of its resident data items to the site where the rst con rmed transaction in the respective claim queues was initiated. To improve overall e ciency, each site may transmit the remainder of the associated claim queue along with the data item, although construction of the claim queue may also be done by processing the broadcast information stored earlier. While the claim queue is being transmitted, some other transactions requiring that data item may arrive in the system (broadcast their access set information and subsequently get con rmed). The information about these transactions may be derived from the broadcast information. Every transaction-initiating site waits for the entire access set to arrive at its location, nishes processing the transaction, and then sends out the data items to other sites that are next on the respective claim queues. Clearly, this mechanism eliminates the necessity to unlock data items, leading to better performance. In traditional systems, to maintain the atomicity of transactions, special commit protocols like the two-phase commit 22] may have to be implemented. However with a little thought, it is clear that only an inexpensive local commit protocol is required here, thus enhancing performance at another level. Since each site maintains a claim queue for only those data items that currently reside at that site, the memory requirement is not very large. However, each site has to allocate memory to store the access set information of con rmed transactions. A typical update transaction has a readset as well as a writeset. Thus a typical claim queue for a data item has read entries as well as write entries. Now, consecutive read entries can be processed in parallel, while the write entries have to be served serially. Consider the claim queue of a single data item Dj , where there are two write entries separated by a few read entries. After the rst write has been completed by the site that currently possesses the write copy of Dj , it sends the updated version of Dj to all sites that have a read entry in the claim queue before the next write. The actual write copy is sent to the site that is responsible for the last read-entry before the second write entry. However, after each transaction that needed to read Dj is completed, there should be a mechanism to inform the site that needs to write on Dj . This is achieved if the sites reading Dj send a message to the site that is waiting to write on Dj the moment the update requiring the read is over. The site that has the write copy transmits the write copy of Dj along with the message, 20

and then the update at that site can begin. One of the important assumptions made here is that the con rmation duration is set to a value such that total ordering of the messages is achieved, i.e., every site sees the same ordering of conrmed transactions in a claim queue. In a real system, messages may be delayed and the broadcast mechanism may have to be sophisticated enough to guarantee total ordering even under those circumstances. Further work is planned on studying the performance of the send-on-demand protocol under the relaxed assumption that total ordering is not met by the broadcasting mechanism. To illustrate the working of the send-on-demand protocol, a simple example has been worked out in Figure 12 with a 3-node DDBS and 3 data items. The communication links are labelled with the communication time (transmission + propagation delay, the major component of which in high speed networks is the propagation time).

6.2 The Integrated Send-on-Demand CC Protocol


The integrated CC algorithm is applicable to a general DDBS where both queries (read-only transactions) and updates (read and write transactions) are executed. The integrated algorithm is a hybrid of the datacycle scheme and the send-on-demand scheme. The read operations are handled by the datacycle mechanism of broadcasting data items and the write operations are managed by the send-on-demand protocol. The integration process is not absolutely straightforward as the read and write processes have to be integrated in a way so as to not violate any consistency criteria. For instance, the readset of a transaction may not be written over by another transaction, until the rst transaction has nished execution. For a detailed description of the consistency criteria, the reader is referred to 22,30]. For the read operations, we propose to apply the distributed version of the datacycle scheme. The entire database is divided into fragments and some of the DDBS sites are given the responsibility of broadcasting one fragment to all the other sites periodically. The fragment sizes need not be equal, depending on other factors. Each data item in each database fragment has timestamp information appended to it. The timestamp value is the time at which that particular data item was updated by some transaction and also contains the identity of the transaction that was responsible for the update2 . This timestamp value is initialized to zero at system start-up time. As in the datacycle architecture, the computer sites are assumed to be equipped with data lters and other necessary hardware required to read data items on the y 24{27]. Data items may be read by the computer sites in every broadcast cycle and used in processing queries. As long as all the data items in the query readset are of the same version, the query is guaranteed to give the correct result. In the datacycle architecture context, a query that can obtain its entire readset within one
The transaction ID is used to obtain the set of data items that were updated along with the data item under consideration.
2

21

0 T3 Site 2

time of arrival at HOL Site 3

ite 1 t Message nt

Figure 12: A simple example illustrating send-on-demand execution data cycle is guaranteed to be correct since updates are incorporated in the broadcast stream only at the beginning of the cycle. The information D2may not be the most up-to-date but the result of the Broadcast Message for query will be T consistent. However in the case of the distributed datacycle implementation, it is not 2 sent very clearSite as to consistent since now the updates of data items a ected D1 w data 2 how TS=1 D3 may w beTderived, 1 by the same update operation may be re ected in the di erent broadcast streams at di erent times. 22
D2 T1 conrmed and D2 sent to Site 1 Broadcast Message for T

S=0 D2 w D3 w

D1

D1

Figure 12: Example (continued) The D2 sent to Site 3.data items read from one broadcast stream is of course still consistent within a single cycle. The example below will demonstrate how this problem is tackled in our system. Example
D1 D3

Consider a query that has three data items, say D1 ; D2 and D3 in its readset. Let the pair T2 execution t T (TSDj ; SDj ) represent the timestamp information, where TSDj is the write timestamp of the data 3 continues for 1 unit item Dj and SDj = fDi1 ; : : :; Dik g is the set of k data items (including Dj ) updated by the same update that created the latest version of Dj respectively, as read by a particular site at time t. The second piece of information may be derived from the transaction ID information. Without any loss of generality, Let t ; tDand t3 be the respective points in time that data items D1; D2 and D3 D11 2 T2query nishes execution respectively are available3to the in question. Figure 13a illustrates this scenario, where the and departs. data items in the readset are D available onSite two 3 di erent broadcast channels. Also, the broadcast sent to
T3
1

D3

23

Figure 13: Timing diagram for a query


3 cycles aretassumed to be continuous. The query processing starts at t0 . Let

st Cycle Begins

(TSD1 ; SD1 )t1 (a) = (ts1 ; fD1; D4g) (TSD2 ; SD2 )t2 = (ts2 ; fD2; D3g) (TSD3 ; SD3 )t3 = (ts3 ; fD3g)

If is violated since no con icting updates t2 no consistency criteria t1 it is obvious that t0ts2 ts3 , then t4 are re ected in the versions being read. Figure 13b depicts the case when ts2 > ts3 . In this case, an update that should have re ected on the version of D3 has not yet taken e ect and using these versions of the three data items to compute the query would give inconsistent results. Hence, the site has to wait until a consistent set of versions can be obtained. Suppose (TSD1 ; SD1 )t4 = (ts4 ; fD1; D2g) (TSD2 ; SD2 )t5 = (ts4 ; fD1; D2g) (TSD3 ; SD3 )t6 = (ts2 ; fD2; D3g)
t3

24

t6

(b)

Let ts4 > ts2 . Now, D3 has been updated to the new version. However, in the meantime, the versions of D1 and D2 have changed. Thus the data versions at time instants t1 ; t2 and t6 will constitute a consistent set. From the above example, the consistency criteria for queries may be developed. A query that requires k data items, say D1 ; : : :; Dk without loss of generality, may read those versions of the data items that satisfy atleast one of the following conditions.

Dm; Dn 62 SDm T SDn , 1 m; n k and m 6= n.

For any two data items Dm and Dn , 1 m; n k and m 6= n, If Dm or Dn 2 SDm SDn ], then at least one of SDm or SDn contains both Dm and Dn . Let SDm be that update set. Then, the set of versions accessed is consistent if TSDm TSDn . The integrated send-on-demand algorithm is summarized in Figure 14.

6.3 Simulation Results


In Figures 15 and 16, we present a comparison between locking and the Send-on-Demand protocol for a write-only system. This is the result of a simulation study, in which upto 150,000 transactions were executed in each run. The parameters used are the same as that used in the read-only system of the previous section, viz., G=0.05, L=3000, D=100, N=10. The access set cardinality of all transactions is assumed to be a constant K . In Figure 15, when the data contention is high (K=4), the send-on-demand protocol outperforms 2-PL, while at low data contention (K=1), the locking performance is better. The send-on-demand protocol has an initial overhead - the broadcasting phase, before any transaction can be con rmed, i.e., o cially accepted into the system. However, the data items themselves are locked for a relatively short period as compared to the 2-PL scheme, which su ers from the extra overhead of unlock messages. Thus, the queueing delay for data items is larger in 2-PL and degrades the performance in high contention situations. At low contention, the queueing delay is negligible, and the broadcasting overhead in the send-on-demand scheme causes the performance to be worse than locking. It has been observed in database systems that some data items are accessed more often than others. These data items are referred to as the hot data. The concurrent access of hot data items causes a performance bottleneck in that all the transactions accessing the hot data item have to be serially executed. The larger the service time of the hot data items, the worse the performance. A hot data simulation in which a single data item occurred in the access set of 20% of the transactions was carried out at two di erent data transmission rates, viz., 50 Mbps and 1 Gbps. The performance of a DDBS with locking and the send-on-demand protocol respectively was studied for K = 1, and all other conditions remaining the same as before. The results are provided in Figure 16, where it is 25

All Sites:

Sites Processing Queries:

Update claim queues as and when transactions are con rmed. If responsible for broadcasting a fragment of the database, append timestamp information to the data items as the updated data items arrive. Remove the update transaction informationfrom the claim queues as and when they depart by checking the timestamp information in the broadcast streams.

Sites Processing Updates:

Scan broadcast channels on which data items in the query access set are being broadcast. Read only those copies that are mutually consistent by checking the respective timestamp information. Execute the query.

Broadcast the entire access set (readset+writeset) of the update to all the sites. Wait for the following events: { All the data items in the access set arrive. { The \completion" message arrives from the transactions that hold a read-copy of a data item in the writeset. Execute the update. Send a copy of each updated data item (along with the time at which the update was executed) to the respective sites responsible for broadcasting them. If holding a read-copy of data item Dj , send a \completion" message to the next write in the claim queue for Dj . If holding the write-copy of Dj , send a read-copy of Dj to all sites which have a read entry in the claim queue before the next write; and send the write-copy to the next write.

Figure 14: The Integrated Send-on-Demand CC Algorithm clear that the send-on-demand protocol does better in a high contention scenario. At the lower data rate of 50 Mbps, the performance curves for the two protocols are fairly close, and this is because the data transmission time is a dominant factor. At even lower data rates (or in local area networks where the propagation delay is insigni cant), the performance of both protocols is about the same. As the data rate is increased, (and the e ect of the propagation delay dominates) send-on-demand outperforms locking. At a low transaction arrival rate, the average response time for locking is lower than that in the send-on-demand, and that is because of the initial broadcasting overhead in the send-on-demand. As the arrival rate is increased, the queueing delay for the transactions in the locking case dominates and consequently the response time increases.

26

1200 ...... : Locking ___ : Send-on-Demand 1000

700 50 Mbps K=1 20% update the same data item ..... : Locking ___ : Send-on-Demand

600

Transaction Response Time (msec)

Transaction Response Time (msec)

800

K=4 High Contention

K=1 Low Contention

500

400 1 Gbps

600

300

400

200

200

100

0 0

100

200

300

400

500

600

0 0

50

100

150

200

250

300

350

400

Total Arrival Rate to the System

Total Arrival Rate to the System

Figure 15: Comparison of send-ondemand and traditional locking for a write-only system

Figure 16: Hot data simulation in a write-only system

7 Conclusions and Future Work


In this paper, the issues that are deemed important by us in the design of distributed database systems in high speed networks have been stated. We have also demonstrated the inadequacy of conventional protocols in terms of giving improved performance as the transmission rate is increased. Finally, a new distributed concurrency control protocol has been described that is based on the availability of sophisticated technology and the high throughput datacycle database machine concept. As explained in previous sections, new protocols will have to be developed in order for DDBS to be adapted for broadband networks. It is important, however, to note that this change from low speed to high speed network distributed database systems cannot, and will not be possible overnight. While the software changes are not so di cult to handle, it is important to realize that there will be many hardware changes in database machines before they can be put on-line in a high speed environment. For instance, data lters to read data items on the y as they pass by the computer sites' read heads is not a standard procedure yet. Further, we propose to use the distributed datacycle scheme to handle reads in our architecture. In this case, there are multiple read-heads in a single machine that have to be coordinated quickly, and the hardware required is still not readily available. To realize any increase in the transaction throughput with a high speed network, the CPU processing bottleneck has to be resolved. The send-on-demand CC protocol is by no means an optimal protocol as far as maximizing the throughput of a DDBS in a high speed environment is concerned. It is simply one protocol that performs better than the standardized CC algorithms. The send-on-demand algorithm is 27

also propagation delay bound, i.e., its performance does not improve with the speed of the network after a certain threshold data rate. Further, at low data contention, the standard 2-PL CC protocol performs better than send-on-demand because of the overhead of the broadcast phase in the latter. Thus, a CC algorithm that is a hybrid of the send-on-demand and locking schemes might prove to be better, and is under active consideration. As part of the future work, the operation of the send-on-demand protocol in the face of network and computer site failures is being considered. The e ect of di erent data granularity, di erent data access distributions and multiple copies of data items on the performance of the send-on-demand protocol is also being looked at. Data security and network architecture issues in the context of this new protocol is also being studied.

Acknowledgements
The authors would like to thank the anonymous reviewers for their detailed comments on the original manuscript. They were of great assistance in improving the overall presentation of this paper.

A Analysis of DDBS with data dependent transactions


In this case, the data request for a data item can be sent out to the computer site where it resides, only after the previous data item has been received and read by the transaction initiation site. The identity of the next data item is unknown until the current data item is read and some computations done. Let k and Di denote the same random variables as in the above discussion. Now,
k k=X Di i=1

(21)

Again, we make the assumption that the random variables Di ; : : :; Dk are independent and identically distributed with the uniform distributions in Equations 2 and 5, as the case may be. Using the characteristic function of the p.d.f f ( ) denoted by (s), the characteristic function of k , k (s) may be calculated as below:
k (s) =

(s)]k

(22)

Now, the rst and second moment of k , viz., k and k ]2 may be calculated by standard procedures. As before, Equation 8 may be used to calculate X and X 2. The average response time of a query, R may be calculated as in Equation 12. The results of comparison of the datacycle and the traditional scheme for dependent-data transactions is provided in Fig. 17 with the same parameters as in the previous section. A comparison 28

1200 .... : Traditional Locking

1000 900

* : Simulation Points

1000

1.5 Gbps *
*

Transaction Response Time (msec)

1 Gbps 800

1.5 Gbps

2 Gbps

Transaction Response Time (msec)

800 700 600 500 400


* * *

1 Gbps

600

2 Gbps
*

400

300
* * * * * * * * * * * * * *

* * * * * * * *

200

200 100
* *

0 0

10

12

14

0 0

10

12

14

Transaction Arrival Rate at a Site

Transaction Arrival Rate at a Site

Figure 17: Comparison of datacycle and traditional locking for a read-only system with data dependent transactions: Ei = 0
1200 * : Simulation Points 1000 2 Gbps
*

Figure 18: Comparison of simulation and analysis of the datacycle scheme for a readonly system with data dependent transactions: Ei = 0

Transaction Response Time (msec)

800
*

600
*

400
* *

200
* * * * * *

0 0

10

12

Transaction Arrival Rate at a Site

Figure 19: Comparison of simulation and analysis of traditional locking for a readonly system with data dependent transactions: Ei = 0 of simulation and analysis for both cases is presented in Figures 18 and 19.

29

References
1] W. Luk, X. Wang, and F. Ling, \On the Communication Cost of Distributed Database Processing," in Proceedings of the International Conference on Distributed Computing Systems, pp. 528{535, 1988. 2] C. Date, An Introduction to Database Systems, vol. I. Addison-Wesley, 4 ed., 1986. 3] H. Garcia-Molina, R. J. Lipton, and J. Valdes, \A Massive Memory Machine," IEEE Transactions on Computers, vol. C-33, pp. 391{399, May 1984. 4] A. Ammann, M. Harahan, and R. Krishnamurthy, \Design of a Memory Resident DBMS," in Proceedings of the IEEE Spring Computer Conference COMPCON-85, (San Francisco, California), pp. 54{57, February 1985. 5] D. Dewitt, R. Katz, F. Olken, L. Shapiro, M. Stonebraker, and D. Wood, \Implementation Techniques for Main Memory Database Systems," in Proceedings of the 1984 ACM SIGMOD Conference, (Boston, Massachusetts), pp. 1{8, June 1984. 6] M. H. Eich, \MARS: The Design of a Main Memory Database Machine," in Proceedings of the International Workshop on Database Machines, pp. 468{481, October 1987. 7] M. H. Eich, \Main Memory Database Research Directions," in Proceedings of the Sixth International Workshop on Database Machines (H. Boral and P. Faudemay, eds.), (Deauville, France), pp. 251{268, Springer-Verlag, June 1989. 8] J. Gray and F. Putzolu, \The 5-minute Rule for Trading Memory for Disc Accesses and the 10-byte Rule for Trading Memory for CPU time," in Proceedings of the ACM SIGMOD Conference, (San Francisco, California), pp. 395{398, May 1987. 9] R. Hagmann, \A Crash Recovery Scheme for a Memory Resident Database System," IEEE Transactions on Computers, vol. C-35, pp. 839{843, September 1986. 10] T. J. Lehman and M. J. Carey, \A Recovery Algorithm for a High Performance Memory Resident Database System," in Proceedings of the ACM SIGMOD Conference, (San Francisco, California), pp. 104{117, May 1987. 11] K. Salem and H. Garcia-Molina, \System M: A Transaction Processing Testbed for Memory Resident Data," IEEE Transactions on Knowledge and Data Engineering, vol. 2, pp. 161{172, March 1990. 12] J. Gray, B. Good, D. Gawlick, P. Homan, and H. Sammer, \One Thousand Transactions per Second," in Proceedings of the IEEE Spring Computer Conference COMPCON-85, (San Francisco, California), pp. 96{101, February 1985. 13] S.-C. Shyu, Design and Performance Analysis of Locking Algorithms for Distributed Databases. PhD thesis, University of Southern California, Los Angeles, CA 90089, December 1989. 14] S.-C. Shyu and V. O. Li, \Performance Analysis of Static Locking in Distributed Database Systems," IEEE Transactions on Computers, vol. 39, pp. 741{751, June 1990. 30

15] A. Chen, D. Brill, M. Templeton, and C. Yu, \Distributed Query Processing in a Multiple Database System," IEEE Journal on Selected Areas in Communications, vol. 7, pp. 390{398, April 1989. 16] M. Templeton, D. Brill, S. Dao, E. Lund, P. Ward, A. Chen, and R. MacGregor, \Mermaid{ A Front-End to Distributed Heterogeneous Databases," Proceedings of the IEEE, vol. 75, pp. 695{708, May 1987. 17] C. Yu, K. Guh, D. Brill, and A. Chen, \Partitioning Relation for Parallel Processing in Fast Local Networks," in Proceedings of the IEEE 1986 Parallel Processing, pp. 1021{1028, 1986. 18] C. Yu, K. Guh, D. Brill, and A. Chen, \Partition Strategy for Distributed Query Processing in Fast Local Networks," IEEE Transactions on Software Engineering, vol. 15, pp. 780{793, June 1989. 19] C. Yu, K. Guh, C. Chang, C. Chen, M. Templeton, and D. Brill, \An Algorithm to Process Queries in a Fast Distributed Network," in Proceedings of the IEEE 1984 Real Time Systems Symposium, pp. 115{122, 1984. 20] C. Yu, K. Guh, W. Zhang, M. Templeton, and A. C. D. Brill, \Algorithms to Process Distributed Queries in Fast Local Networks," IEEE Transactions on Computers, vol. C-36, pp. 1153{1163, October 1987. 21] P. S. Yu, D. Cornell, D. M. Dias, and A. Thomasian, \On Coupling Partitioned Database Systems," in Proceedings of the 6th International Conference on Distributed Computing Systems, (Massachusetts), pp. 148{157, May 1986. 22] M. T. Ozsu and P. Valduriez, Principles of Distributed Database Systems. Prentice Hall, 1991. 23] J. D. Ullman, Principles of Database Systems. Computer Science Press, second ed., 1982. 24] T. Bowen, G. Gopal, G. Herman, and J. William Mans eld, \A Scale Database Architecture for Network Services," IEEE Communications Magazine, vol. 29, pp. 52{59, January 1991. 25] G. Herman and G. Gopal, \The Case for Orderly Sharing," in Springer Verlag Lecture Notes in Computer Science on High Performance Transaction Systems (D. Gawlick, M. Haynie, and A. Reuter, eds.), pp. 148{174, Springer-Verlag, 1989. 26] G. Herman, G. Gopal, K. Lee, and A. Weinrib, \The Datacycle Architecture for very High Throughput Database Systems," in Proceedings of the ACM, SIGMOD, pp. 97{103, 1987. 27] A. Weinrib and G. Gopal, \Decentralized Resource Allocation for Distributed Systems," in Proceedings of INFOCOM 87, (San Francisco), pp. 328{336, April 1987. 28] W. Cellary, E. Gelenbe, and T. Morzy, Concurrency Control in Distributed Database Systems. Studies in Computer Science and Arti cial Intelligence, Elsevier Science Publishers, 1988. 29] L. Kleinrock, Queueing Systems, vol. I: Theory. McGraw Hill, 1960. 30] J. Gray, R. Lorie, G. Putzolu, and I. Traiger, \Granularity of Locks and Degrees of Consistency in a Shared Data Base," in Modelling in Data Base Management Systems (G. Nijssen, ed.), pp. 365{394, Amsterdam: North-Holland, 1976. 31