2. The Multiprocessor Computer System

2.1. The Architecture

The multiprocessor systems we are concerned with are general-purpose systems without any special-purpose hardware for database operations such as the sorting of relations. Enslow summarized the salient characteristics of a multiprocessor system as follows [Ensl77]:

- the system has a set of general-purpose processors with identical capabilities,
- all the I/O devices and I/O channels are shared by all the processors,
- all processors have access to a common pool of memory modules,
- the processors, the I/O devices and the shared memory modules are connected by an interconnection network,
- the operation of the entire system is controlled by one operating system.

Such a general multiprocessor organization is shown in Figure 2.1.

[Figure: processors and shared memory modules linked by an interconnection network to mass storage devices.]
Figure 2.1. A multiprocessor computer system.

The number of processors of such a system is relatively small compared to some database machines that may consist of a few hundred or even thousands of processors [Tera83]. Each processor shares the common memory (shared memory) with the other processors. It may also have some buffers dedicated to itself (local memory) for input/output. It is reasonable to assume that, when a processor is assigned some task to execute, it is also allocated a certain amount of memory space. From the viewpoint of database operations, if a certain number of processors is allocated to process a query, the control of these processors and the related memory is transferred to the database management system. It is up to its buffer management subsystem to use the available memory space efficiently. Such a multiprocessor architecture can provide both inter- and intra-request parallelism. That is, the processors may either independently execute different relational database operations, or execute the same database operation at the same time.

The machine uses conventional disk drives for secondary storage, and databases (relations) are stored on these disk storage devices. Both disks and memory are organized in fixed-size pages. Hence, the unit of transfer between the secondary storage and memory is a page. The processors, disks and memory are linked by an interconnection network. This network allows different processors to read from the same page of the shared memory at the same time (broadcasting). For writes, different processors can only write to different pages at the same time. We assume that a locking mechanism is used to enforce this concurrent access policy and that the locking granularity is a page. We also assume that there is no I/O cost associated with locking, that is, the lock table is assumed to be in main memory. Under this mechanism, a processor has to obtain a lock on a memory page to which it intends to write. The lock is released after the data is written to the page. Since concurrent reads are allowed, there is no need to lock a page if the operation is a read. It is assumed that the interconnection network has sufficient bandwidth for the tasks at hand, so contention for the interconnection network is not considered in the following analysis.

Finally, though it is expected that main memory sizes of a gigabyte or more will be feasible and perhaps even fairly common within the next decade, we cannot assume that a whole relation can be read from mass storage into either the processors' local memory or the shared memory before processing. That is, in general, both the total memory of the processors and the size of the shared memory are not large enough to contain a whole relation.

2.2. Concurrent write to the shared memory

One major issue in analyzing the performance of multiprocessor computers with shared memory is the possible contention that occurs when more than one processor intends to write to the same memory page concurrently. In the case of hash-based join algorithms, there are two possible ways in which this might happen. First, when a global hash table is used to exploit the benefit of the shared memory, it is likely that different processors may hash different tuples into the same page at the same time. Second, more than one processor may output tuples of the same partition to the same buffer page at the same time during the partitioning phase.

As mentioned above, a locking mechanism is used to enforce our memory-sharing policy. When a processor cannot obtain the lock for writing, it must wait. This implies that, if p processors write to the same page at the same time (i.e. with contention), the processing cost (time) will be p times the processing cost without contention, since the requests are queued and processed serially. Therefore, letting the expected number of processors that write to the same page at the same time be ξ, we have

Proc_c = Proc_nc × ξ

where Proc_c and Proc_nc are the CPU processing costs with and without contention, respectively.
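As an illustration of the contention factor, the following Monte Carlo sketch (our construction, not from the paper) estimates the average number of processors that end up writing to the same page when each of p processors targets one of m pages uniformly at random:

```python
import random

def estimate_xi(p, m, trials=20_000, seed=0):
    """Estimate xi, the expected number of processors writing to the
    same page, assuming (hypothetically) that each of p processors
    targets one of m memory pages uniformly at random."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        pages = {rng.randrange(m) for _ in range(p)}  # distinct pages hit
        total += p / len(pages)  # average writers queued per hit page
    return total / trials

# Many pages relative to processors: little contention (xi close to 1).
# Very few pages: writes serialize (xi approaches p).
print(estimate_xi(p=8, m=1000))
print(estimate_xi(p=8, m=2))
```

With a page pool much larger than the processor count the multiplier stays near 1; shrinking the pool drives it toward p, matching the serialization argument above.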
To determine the expected number of concurrent writes, ξ, we formulate the problem as follows:

Given ||R|| tuples and M memory pages (1 < M ≤ ||R||), if p tuples (p ≤ ||R|| − ||R||/M) are randomly selected from the ||R|| tuples, find the expected number of pages with at least one tuple to be written to.

This is none other than the problem of characterizing the number of granules (blocks) accessed by a transaction [Yao77, Lang82]. The solution to the above problem is given by Yao's theorem [Yao77], which states that the expected number of blocks hit when k records are selected from n records stored in m blocks is

C(n, m, k) = m × (1 − Π_{i=1..k} (n − n/m − i + 1) / (n − i + 1))

3.1. Categorization of hash-based join algorithms

Given two relations, R and S, the basic approach of hash-based join methods is to build a hash table on the join attributes of one relation, say R, and then to probe this hash table using the hash values on the join attributes of tuples from the other relation, S. The join result is formed by the matching tuples. Since we assume that the memory available is usually much smaller than the size of the relations to be joined, it is impossible to build the hash table for the entire relation R. The hash-based join algorithms usually process the join in batches. In each batch, only a portion of R is read into memory and the corresponding hash table is built. There are a few possible ways to form portions from relation R.
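Yao's estimate from section 2.2 transcribes directly; this sketch (function and argument names are ours) computes the expected number of pages hit when k tuples are drawn at random from n tuples spread evenly over m pages:

```python
def yao(n, m, k):
    """Yao's formula [Yao77]: expected number of blocks (pages) hit
    when k records are selected at random, without replacement, from
    n records stored in m blocks of n/m records each (k <= n - n/m)."""
    outside = n - n / m               # records not in a given block
    miss = 1.0                        # P(a given block is never hit)
    for i in range(1, k + 1):
        miss *= (outside - i + 1) / (n - i + 1)
    return m * (1.0 - miss)

print(yao(100, 10, 1))    # expect ~1.0: one tuple touches one page
```

This is the C(n, m, k) that appears in the per-phase cost formulas later in the paper.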
Examples include the GRACE algorithm [Kits83], the Hybrid algorithm [DeWi84], and the Simple hash and Hash loops algorithms [DeWi85]. In [DeWi85], some simulation results on the performance of the multiprocessor versions of the Hybrid and GRACE algorithms were also presented.

3.2. The elapsed time versus the total processing time

In most of the previous work on the analytical modeling of join algorithms [Bitt83, DeWi84, Vald84], the elapsed time is used as the criterion to evaluate the performance of an algorithm. In such cases, the best algorithm is the one which minimizes the elapsed time. Moreover, most analyses assumed that there is no overlap between disk transfers and computation: the elapsed time is essentially the sum of the computation and disk transfer times. An exception is the work of Richardson, Lu and Mikkilineni [Rich87], which models the overlap in computation, disk transfers and interconnection network transfers.

In our analysis, we would like to emphasize two of our observations. First, in a multiprocessor or parallel processing environment, it is possible to increase parallelism by duplicating part of the computation among different processors. Some algorithms deliberately use this duplication to minimize the elapsed time. As a result, an algorithm that achieves the minimum elapsed time may require a high total processing time. This differs from uniprocessor systems, where a shorter elapsed time means a lower total processing time. An obvious implication of a high total processing time is that more resources are tied down by the particular task, which may decrease the system's overall performance. Hence, both the elapsed time and the total processing time are important in choosing a suitable multiprocessor algorithm. Second, the overlap between different resources is quite important. The overlap should be taken into account because it not only models real systems more closely but also exposes the behavior of an algorithm in more detail. It is highly desirable that an algorithm make good use of both CPU and disk resources. Especially for most parallel processing algorithms, some subtasks can be done in parallel while others have to be done sequentially. Whether a subtask is CPU-bound or I/O-bound becomes an important factor in determining the overall performance of an algorithm.

With the above observations, both the elapsed time and the total processing time of an algorithm are analyzed in our study. The process of computing a join is divided into phases that are executed one after another. Within each phase, there may be several passes of a series of operations. In other words, phases are executed sequentially, while within a phase different tasks can either be parallelized among processors or be overlapped between the CPU and the disks. To evaluate an algorithm, we first compute the required disk I/O time per disk drive and CPU time per processor for each phase i in executing the algorithm, T_io^i and T_cpu^i. For a multiprocessor system with d disk drives and p processors, the total processing time for phase i is then

T_i = p × T_cpu^i + d × T_io^i    (1)

The total processing time for an algorithm that requires n phases to complete the computation is

T = Σ_{i=1..n} T_i    (2)

The elapsed time E_i for phase i will in general be less than T_cpu^i + T_io^i due to overlap. This can be explained as follows. In each phase, the processors can begin their processing as soon as some pages from both relations are in memory. Moreover, while the processors are computing the join operation, other pages may be read in at the same time. Hence, the elapsed time can be computed as the maximum of the above I/O time and CPU time:

E_i = max(T_cpu^i, T_io^i)    (3)

That is, if a phase is CPU-bound, the elapsed time equals the CPU time needed, and if it is I/O-bound, the elapsed time equals the I/O time required. Here we assume that, for a CPU-bound phase, the time to read in the initial pages before processing begins and the time to write out the final pages of the resulting tuples are negligible compared to T_cpu^i, while for an I/O-bound phase, the time to process the last few pages is negligible compared to T_io^i. Since all phases of an algorithm are executed sequentially, the elapsed time of an algorithm with n phases is

E = Σ_{i=1..n} E_i    (4)

3.3. Algorithms and analysis

We analyzed four hash-based join algorithms for the multiprocessor system described in section 2: Hybrid Hash-Join (HHJ), Modified Hybrid Hash-Join (MHHJ), Simple Hash Join (SHJ) and Hash-based Nested Loops Join (HNLJ). HHJ and MHHJ are variations of the multiprocessor Hybrid Hash-Join algorithm with different memory allocation strategies during the partitioning phase. They represent the algorithms that partition both relations before the join process. HNLJ was chosen as the representative of the algorithms that do not partition S. For this group of algorithms, the process of probing the hash tables using tuples from S is the same, but both prior and on-the-fly partitioning require extra work to partition R. It is therefore expected that the Hash-based Nested Loops Join performs the best among them. The Simple Hash Join was chosen as the representative of the algorithms that partition the relations on the fly. In the following discussion, we only present the I/O and CPU times, T_io^i and T_cpu^i, for each phase; the elapsed times, E_i and E, and the total processing times, T_i and T, can easily be computed using equations (1)-(4). The detailed derivation of the results can be found in [Lu90]. The parameters and their values used in our analysis are listed in Table 3.2.

Table 3.2. Parameters used in the analysis:
- join selectivity, defined by size(R JOIN S) / (|R| × |S|)
- |M|: size of shared memory available for the join process (in pages)
- |R|: size of relation R (in pages)
- |S|: size of relation S (in pages)
- ||R||: number of tuples in relation R
- ||S||: number of tuples in relation S
- number of tuples per page of relation R
- number of tuples per page of relation S
- d: number of disk drives
- p: number of processors available for query processing
- q: portion of R that falls into partition R0 for the Hybrid hash-join
- F: hash table "fudge factor" relating hash table size to data size [DeWi84]
- EIO_seq, EIO_rand: sequential and random disk I/O times per page
- t_read, t_hash, t_move, t_build: per-tuple CPU cost parameters (reading, hashing, moving, and building result tuples)

Hybrid Hash-Join

The Hybrid Hash-Join algorithm [DeWi84] is a variation of the GRACE Hash-Join algorithm [Kits83]. Both algorithms comprise two phases: the partitioning phase and the joining phase. The former divides the relations R and S into disjoint buckets such that the buckets of relation R are of approximately equal size. The latter performs the join of the corresponding buckets of
R and S. The GRACE algorithm uses one page of the memory as a buffer for each bucket of R, so that there are at most |M| buckets, with each bucket containing ⌈|R| / |M|⌉ pages. This means that each bucket requires ⌈F × |R| / |M|⌉ pages of shared memory to construct a hash table. Thus, the algorithm requires that ⌈F × |R| / |M|⌉ ≤ |M|, i.e. |M| ≥ √(F × |R|).

This restriction on the minimum amount of memory required is still necessary for the Hybrid Hash-Join algorithm. However, whereas the GRACE algorithm partitions the relations into |M| buckets, the Hybrid algorithm chooses the number of buckets such that the tuples in each bucket will fit in the memory, so as to exploit the additional memory to begin joining the first two buckets. The extra memory, if any, is used to build a hash table for a partition of R that is processed at the same time while R is being partitioned. The corresponding partition of S is used to probe the hash table while S is being partitioned. Hence, this partition is not rewritten back to disk and processed again in the second phase. Let B + 1 be the number of buckets that relation R is partitioned into, where

B = max(0, ⌈(F × |R| − |M|) / (|M| − 1)⌉)

as given in [DeWi84]. The sizes of R0 and R_i (1 ≤ i ≤ B) are |R0| = q × |R| and |R_i| = (1 − q) × |R| / B, respectively.

The execution of the hybrid hash join can be divided into four phases that are executed serially: (1) to partition R, (2) to partition S, (3) to build the hash table for tuples from R, and (4) to probe the hash tables using tuples from S and to form the join results. The required disk I/O and CPU processing times for each phase, T_io^i and T_cpu^i (1 ≤ i ≤ 4), can be computed as follows.†

† In fact, for each phase, there is more than one batch. The formulas sum up the processing times of all batches in the same phase. This is the same for all subsequent analyses.
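The bucket arithmetic above can be made concrete. In this sketch (ours), the fraction q is derived from the memory left over after the B buffer pages; that derivation is an assumption consistent with, but not spelled out in, the text:

```python
import math

def hybrid_partitions(R, M, F):
    """Bucket count for Hybrid Hash-Join per [DeWi84]:
    B = max(0, ceil((F*|R| - |M|) / (|M| - 1))).
    Returns (B, q), where q is the fraction of R kept in memory as R0,
    derived here (our assumption) from the leftover |M| - B pages."""
    B = max(0, math.ceil((F * R - M) / (M - 1)))
    if B == 0:
        return 0, 1.0            # hash table for all of R fits in memory
    R0 = (M - B) / F             # pages of R whose hash table fits
    return B, R0 / R

print(hybrid_partitions(R=1000, M=100, F=1.2))   # many buckets, small q
print(hybrid_partitions(R=50, M=100, F=1.2))     # everything fits
```

As memory grows toward F × |R|, B shrinks to zero and the algorithm degenerates into a single in-memory hash join, which is the behavior the Hybrid design exploits.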
Phase 1: To partition relation R

T_io^1 = |R| × EIO_seq + |R| × (1 − q) × EIO_rand

T_cpu^1 = (||R|| / p) × (t_read + t_hash + t_move × ξ1) + (||R|| × (1 − q) / p) × t_move × ξ1

where ξ1 = q × C(||R||, |M| − B, p) + (1 − q) × C(||R||, B, p)

Phase 3: To build the hash tables for tuples from R (the formulas for phase 2, partitioning S, parallel those of phase 1)

T_cpu^3 = (||R|| × (1 − q) / p) × (t_hash + t_move × ξ3)

where ξ3 = C(||R|| × (1 − q) / 2, |M|, p)

Phase 4: To join S with R

T_io^4 = (1 − q) × (|S| + |Res|) × EIO_seq

T_cpu^4 = (||S|| × (1 − q) / p) × (t_hash + t_build) + (||Res|| × (1 − q) / p) × t_build × ξ4

Modified Hybrid Hash-Join

In the above Hybrid Hash-Join algorithm, each bucket is allocated one buffer page during the partitioning phase. When the number of buckets is small relative to the number of processors, the contention due to conflicting writes to the buffers during the partitioning phase results in long waiting times. The Modified Hybrid Hash-Join algorithm (MHHJ) intends to reduce this contention by allocating more output buffer pages during the partitioning phase. That is, p × B buffer pages are used for writing out the tuples in buckets R_i (1 ≤ i ≤ B): one page per bucket per processor. In this way there is no buffer contention in the partitioning phase. The available memory for R0, however, decreases, and the value of B becomes

B = max(0, ⌈(F × |R| − |M|) / (|M| − p)⌉)

With one dedicated buffer page per bucket for each processor, the expected number of concurrent writes, ξ, will be 1, since there is no contention. However, an extra cost is introduced to merge the unfilled pages together for each bucket before writing back to disk. In the worst case, half the number of pages are moved for each bucket for each processor. We have the following cost formulas.

Phase 1: To partition relation R

T_io^1 = |R| × EIO_seq + |R| × (1 − q) × EIO_rand

T_cpu^1 = (||R|| / p) × (t_read + t_hash + t_move) + (||R|| × (1 − q) / (2p)) × t_move

The cost formulas for the third and fourth phases of MHHJ are exactly the same as those for HHJ and are not repeated here.

Hash-based Nested Loops Join Algorithm

The Hash-based Nested Loops Join algorithm is a modified version of the traditional nested loops algorithm: hash tables are built on the join attributes of the outer relation to find the matching tuples efficiently. When the available memory is smaller than the size of the outer relation, more than one pass is needed to complete the join. During each pass, only H = min(|R|, ⌊|M| / F⌋) pages of the outer relation R are read into the memory and a hash table is constructed. Then the entire inner relation S is scanned and each tuple is used to probe the hash table. The use of a hash table avoids the exhaustive scanning of all the R tuples in memory for every tuple in S, as is done in the plain nested loops algorithm. Though the entire inner relation S is scanned in every pass of the algorithm, the entire relation R is scanned only once. The number of passes required, k, is given by k = ⌈|R| / H⌉.
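The pass structure just described can be sketched in a few lines. This toy version (ours) works on lists of tuples, with a chunk of H tuples standing in for H pages of R:

```python
from collections import defaultdict

def hash_nested_loops_join(R, S, H, key=lambda t: t[0]):
    """Hash-based nested loops join sketch: k = ceil(|R|/H) passes;
    each pass builds a hash table on the next chunk of the outer
    relation R, then scans the whole inner relation S to probe it."""
    result = []
    for start in range(0, len(R), H):            # one pass per chunk of R
        table = defaultdict(list)
        for r in R[start:start + H]:             # build on join attribute
            table[key(r)].append(r)
        for s in S:                              # S is rescanned every pass
            for r in table.get(key(s), []):
                result.append((r, s))
    return result

R = [(1, 'r1'), (2, 'r2'), (3, 'r3')]
S = [(2, 's1'), (3, 's2'), (2, 's3')]
print(hash_nested_loops_join(R, S, H=2))   # three matching pairs
```

The outer loop runs k times and the inner scan of S repeats in full on every pass, which is exactly the cost trade-off the analysis attributes to HNLJ.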
Phase 1: To read in R and to construct the hash tables

T_io^1 = |R| × EIO_seq

T_cpu^1 = (||R|| / p) × (t_read + t_hash × ξ), where ξ = C(||R||, H × F, p)

Phase 2: To scan S and to probe the hash tables

T_io^2 = k × |S| × EIO_seq + |Res| × EIO_seq

T_cpu^2 = (k × ||S|| / p) × (t_read + t_hash) + (||Res|| / p) × t_build
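Given per-phase (T_cpu, T_io) pairs such as those above, equations (1)-(4) combine them into total and elapsed times. A minimal sketch, with illustrative numbers rather than the paper's parameter values:

```python
def totals(phases, p, d):
    """Equations (1)-(4): phases is a list of (T_cpu, T_io) pairs,
    per processor and per disk drive respectively; returns the total
    processing time T and the elapsed time E of the algorithm."""
    T = sum(p * t_cpu + d * t_io for t_cpu, t_io in phases)  # (1), (2)
    E = sum(max(t_cpu, t_io) for t_cpu, t_io in phases)      # (3), (4)
    return T, E

# One CPU-bound phase and one I/O-bound phase, p = 4, d = 6.
print(totals([(10.0, 4.0), (3.0, 9.0)], p=4, d=6))  # (130.0, 19.0)
```

Note how the elapsed time charges each phase only for its bottleneck resource, while the total processing time charges every processor and disk for the whole phase.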
We therefore present only the results of the MHHJ algorithm rather than both.

4.2. Relative performance

Figures 5.2 - 5.5 show the elapsed times and total times of the Modified Hybrid Hash-Join, the Hash-based Nested Loops Join and the Simple Hash-Join algorithms with different amounts of available memory, different relation sizes and different numbers of disks in the system.

Among the three algorithms, the Simple Hash Join algorithm performs worst in most cases. When the number of processors increases and the amount of memory available increases, its performance can be comparable with the other algorithms, but it is always bounded by either the Hybrid Hash-Join or the Hash-based Nested Loops Join. The major reason is that the Simple Hash Join partitions the relations on the fly. It reads and writes the relations repeatedly and thus incurs a large number of disk I/Os. The Hybrid Hash-Join also partitions the relations, but it passes through them less than three times. As for the Hash-based Nested Loops Join, it has to scan relation S several times, but it scans R only once. However, there are some instances where SHJ performs a little better than MHHJ. This can be explained as follows. First, although the number of disk I/Os for SHJ may be higher, all its disk I/Os are sequential reads and writes, whereas for MHHJ some writes are random accesses. The time needed for a random disk I/O is 5/3 times that for a sequential disk I/O in our analysis. Second, there are p × B output buffer pages to be merged before writing out, while there are only p pages in the SHJ case.

The performance of all three algorithms, in both the elapsed time and the total processing time, becomes better when the number of processors increases. This is mainly because of the architectural assumptions of our model. In our system, an increase in the number of processors means an increase in both the amount of memory available for the join operation and the CPU processing power. For the Modified Hybrid Hash-Join, a large amount of memory means a large R0; this reduces the number of pages written back to the disks and reread during the joining process, and thus reduces both the elapsed time and the total processing time. For both the Simple Hash Join and the Hash-based Nested Loops Join, a larger memory size means fewer scans of the S relation. Another interesting fact is that when the number of processors is small, it is very effective to introduce more processors to reduce the elapsed time; but after the number of processors reaches some point, the performance gain from allocating more processors diminishes.

The total processing times show similar behaviors, with the exception of the Modified Hybrid Hash-Join, whose total processing time does not decrease much when the number of processors increases. This is because the total processing time of MHHJ is only affected by the size of R0, and this effect is limited. For SHJ and HNLJ, the scans of S constitute the major portion of the total processing time. An increase in memory reduces the number of scans of S and hence improves the performance of these algorithms. For HNLJ, if the increase in memory is not large enough to reduce the number of scans of S, the performance will not be affected at all. The staircase shape of the HNLJ curves clearly indicates this. For example, both the elapsed time and the total processing time of HNLJ do not change when the number of processors
[Figures 5.2 - 5.5: elapsed times and total processing times versus the number of processors for the Modified Hybrid Hash-Join, Hash-based Nested-Loop Join and Simple Hash-Join; |M| = 32 × p, d = 6, with |S| = |R| = 1000 in one setting and |S| = 4 × |R| = 4000 in another.]
increases from 9 to 10 and from 11 to 14. In these two regions, the processing is I/O-bound; thus increasing the number of processors does not decrease the elapsed time, and at the same time the increase in memory size is not large enough to reduce the number of times relation S must be scanned. This also accounts for the better performance of SHJ over HNLJ when relation S is much larger than relation R.

While the performance of the Simple Hash Join is bounded by the other two algorithms, the Modified Hybrid Hash-Join and the Hash-based Nested Loops Join outperform each other depending on the sizes of the relations and the system configuration. We discuss the elapsed time first. The results presented indicate that HNLJ performs quite well, with the exception of Figure 5.4. The reason is that HNLJ has better overlap of CPU and I/O processing when the number of processors is small. Most of its cost comes from the second phase of the algorithm, where the CPU and I/O costs are both high. On the other hand, for MHHJ with a small number of processors, the partitioning phase is I/O-bound while the joining phase is CPU-bound; the elapsed time is thus higher than for HNLJ. When the number of processors increases, the amount of memory available increases, and HNLJ performs well since the number of scans of the entire S relation is reduced (as discussed above). Figure 5.4 shows that MHHJ performs much better than HNLJ when the size of relation S is four times the size of relation R. This is expected because HNLJ eliminates prior partitioning at the cost of repeated scans of S. This cost shows up clearly in the total processing time results. Although HNLJ outperforms MHHJ with the elapsed time as the metric, it requires much more total processing time when the number of processors is small. If the total processing time is taken as the metric, then MHHJ is the best algorithm in all cases when the number of processors is limited.

In Figure 5.5, the number of disks in the system is increased from 6 to 12. This also serves to represent the use of faster disks or slower processors. Comparing with Figure 5.2, we can see that the elapsed time in Figure 5.5 is smaller than that in Figure 5.2 for all the algorithms. However, the differences among the algorithms are smaller. This indicates that the algorithms, on the whole, are still slightly I/O-bound with the testing parameters. There are some phases that are not I/O-bound, and hence the decrease in elapsed time is not proportional to the increase in the number of disks.

4.3. Comparison to previous work

The performance of hash-based join algorithms has been discussed in several papers; we compare the results of our study with two previous works, [DeWi85] and [Rich87]. Both were chosen because overlap was considered in their studies. In [DeWi85], results of actual implementations of the uniprocessor versions of the hash-based algorithms [DeWi84] were presented. In [Rich87], the analytical model considers overlap among disks, CPU and network transfers.

The results of [DeWi85] show that the Hash Loops and the Hybrid Hash dominated the other algorithms and that the Hybrid Hash performs well in a uniprocessor environment. This is also the case in our study (see Figure 5.3). Both results also show that when the size of the larger relation S is much larger than the size of the
smaller relation R, the modified Hybrid Hash outperforms the Hash-based Nested Loops in most cases. In [DeWi85], there is no implementation of a parallel version of the Hash Loops algorithm. However, our study shows that the Hash-based Nested Loops can outperform the Hybrid Hash. This is because, in a multiprocessor environment, the Hash-based Nested Loops is able to exploit the shared memory to parallelize its operation. Moreover, in our study, we assume the CREW (concurrent read, exclusive write) model of a parallel system; hence, there is no waiting when several processors read the same location in the hash table.

In [Rich87], the overlap among CPU, disk and network communications is considered. The Hybrid Hash-Join was one of the join algorithms proposed. We have shown that when CPU processing is not the bottleneck, increasing the number of processors does not reduce the elapsed times. In fact, in our study, the reduction in the elapsed times of the Hybrid Hash algorithms is due to the increase in memory that comes with an increase in the number of processors. Similarly, when disk I/O is the bottleneck, increasing the number of disks will reduce the elapsed time. The same conclusions were arrived at in [Rich87]. However, in [Rich87] it was shown that there is not a big difference in elapsed time with different memory sizes once the memory is large enough for the algorithm to begin execution, provided the first phase of the Hybrid Hash is not I/O-bound. In our study, the memory size increases together with the number of processors, and a large enough amount of memory is only available when the number of processors is large too. As a result, disk I/O becomes the bottleneck, and this results in the reduction in elapsed time as memory increases.

5. Conclusion

In this paper, we have exploited the shared memory of a generalized multiprocessor system to parallelize the costly join operation in database query processing. Such a system comprises conventional, commercially available components without the assistance of any special-purpose hardware. The system allows concurrent reads from, but exclusive writes to, the shared memory. Any conflict in writing the shared memory is regulated by a locking mechanism. Moreover, each processor is allocated a fixed amount of shared memory for each operation; thus, the amount of memory increases as the number of processors increases.

Four parallel hash-based join algorithms designed to be executed in such an environment (Hybrid Hash, Modified Hybrid Hash, Hash-based Nested Loops and Simple Hash) were studied. The performance of the algorithms was modeled analytically, with two key features, to determine the elapsed times. First, we model the overlap between the CPU and I/O operations of each algorithm when analyzing the elapsed times. Second, we consider the contention that arises when there is a write conflict.

Our study shows that the Hybrid Hash-Join, which outperforms other hash-based algorithms in a uniprocessor environment, does not always perform the best. It is unable to exploit the memory as it is supposed to, especially when the number of partitions is small, due to contention in such an environment. Our modified version, the Modified Hybrid Hash, eliminates the contention by allocating one output buffer per processor to each partition. However, this is done at the expense of a higher elapsed time. The simpler Hash-based Nested Loops performs better in elapsed time when the sizes of the two relations are similar. However, when the size difference between the two relations widens, the Modified Hybrid Hash outperforms the other algorithms.

From the study, we may draw the following conclusions:

- Both the memory and the number of processors are important factors in a multiprocessor environment. More memory reduces disk I/Os in hash-based join algorithms, while more processors facilitate parallelism. As such, a multiprocessor environment which increases the memory whenever the number of processors allocated to an operation increases is desirable.

- Proper management of memory may reduce memory contention. The two versions of the Hybrid Hash Join proposed are two extreme strategies in allocating buffers during the partitioning phase: one buffer page per partition, which may lead to high contention, and p buffer pages per partition, where there is no contention. We may explore a balance between these two extremes.

- In a multiprocessor environment, the goals of improving the elapsed time and reducing the total processing time may conflict. An algorithm which performs well with respect to the elapsed time may do so at the expense of consuming more resources. It seems that using the elapsed time or the total processing time as the only criterion for choosing an optimal join method may not be sufficient for multiprocessor computer systems. A more appropriate metric would need to combine the effects of both the elapsed time and the total processing time in order to simplify the query optimization process. Intuitively, if both the elapsed and total processing times of an algorithm are the smallest among all algorithms under consideration, then that algorithm should be selected. Similarly, when both are the largest, the algorithm should not be considered. However, when two algorithms conflict, such that algorithm A has a lower elapsed time but a higher total processing time than algorithm B, some metric needs to be defined to decide which one is better. One possible metric is the product of E and T:

θ = E × T = T / (1 / E)

The θ value can be roughly interpreted as the ratio of the total processing cost to the system throughput (1/E), and it satisfies the first two points of the above intuition. In fact, if θ is used as the metric, the Modified Hybrid Hash-Join outperforms the Hash-based Nested Loops in most instances.
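The θ metric can be exercised with hypothetical numbers (ours): algorithm A wins on elapsed time, algorithm B on total processing time, and θ = E × T breaks the tie:

```python
def theta(E, T):
    """Combined metric from the conclusion: theta = E x T."""
    return E * T

# Hypothetical measurements for two algorithms.
A = theta(E=10.0, T=90.0)   # faster elapsed time, more total work
B = theta(E=12.0, T=60.0)   # slower elapsed time, less total work
print('A' if A < B else 'B')  # B is preferred under theta
```

Because θ multiplies the two quantities, an algorithm that dominates on both measures always has the smaller θ, which is the consistency property the text asks of such a metric.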
The results of this paper can be extended in several directions. First, we may study the trade-off between the elapsed time and the total processing time in order to arrive at a more appropriate performance metric for multiprocessor computer systems. Second, a uniform distribution of the join attribute values is assumed in our analysis. Some recent work [Laks88, Low90] has indicated that skewed distributions of join attribute values may significantly affect the performance of join algorithms. One possible piece of future work is to include this skew factor in our analysis. Third, being an analytical study, it is very difficult to model the effects of a multiuser environment; further study using simulation is planned to evaluate the performance of these algorithms in a multiuser environment. Furthermore, in addition to hash-based join methods, we would like to explore other join methods for multiprocessor computer systems, such as those based on indexes [Meno86].

References

[Bitt83] Bitton, D., et al., "Parallel Algorithms for the Execution of Relational Database Operations," ACM Trans. Database Syst., vol. 8, no. 3, Sept. 1983, pp. 324-353.

[Brat84] Bratsbergsengen, K., "Hashing Methods and Relational Algebra Operations," Proc. VLDB 84, Singapore, Aug. 1984, pp. 323-333.

[DeWi84] DeWitt, D. J., et al., "Implementation Techniques for Main Memory Database Systems," Proc. SIGMOD 84, Boston, June 1984, pp. 1-8.

[DeWi85] DeWitt, D. J., and Gerber, R., "Multiprocessor Hash-Based Join Algorithms," Proc. VLDB 85, Stockholm, Aug. 1985, pp. 151-164.

[Ensl77] Enslow, P. H., Jr., "Multiprocessor Organization: A Survey," ACM Computing Surveys, vol. 9, no. 1, March 1977, pp. 103-129.

[Good81] Goodman, J. R., "An Investigation of Multiprocessor Structures and Algorithms for Database Management," University of California at Berkeley, Technical Report UCB/ERL M81/33, May 1981.

[Kits83] Kitsuregawa, M., Tanaka, H., and Moto-oka, T., "Application of Hash to Data Base Machine and its Architecture," New Generation Computing, vol. 1, no. 1, 1983, pp. 63-74.

[Laks88] Lakshmi, M. S., and Yu, P. S., "Effect of Skew on Join Performance in Parallel Architectures," Proc. Intl. Symp. on Databases in Parallel and Distributed Systems, 1988, pp. 107-120.

[Lang82] Langer, A. M., and Shum, A. W., "The Distribution of Granule Accesses Made by Database Transactions," Comm. ACM, vol. 25, no. 11, Nov. 1982, pp. 831-832.

[Low90] Low, C. C., D'Souza, A. J., and Montgomery, A. Y., "The Influence of Skew Distributions on Query Optimization," Proc. Australian Database Research Conference, April 1990.

[Lu85] Lu, H., and Carey, M. J., "Some Experimental Results on Distributed Join Algorithms in a Local Network," Proc. VLDB 85, Stockholm, Aug. 1985, pp. 292-304.

[Lu90] Lu, H., Tan, K-L., and Shan, M-C., "A Performance Study of Hash-Based Join Algorithms in Tightly-Coupled Multiprocessor Systems," National University of Singapore, Department of Information Systems and Computer Science, Technical Report TRA6/90, June 1990.

[Meno86] Menon, J., "Sorting and Join Algorithms for Multiprocessor Database Machines," NATO ASI Series, Vol. F24, Database Machines, Springer-Verlag, 1986.

[Qada88] Qadah, G. Z., and Irani, K. B., "The Join Algorithms on a Shared-Memory Multiprocessor Database Machine," IEEE Trans. Software Eng., vol. 14, no. 11, Nov. 1988, pp. 1668-1683.

[Rich87] Richardson, J. P., Lu, H., and Mikkilineni, K., "Design and Evaluation of Parallel Pipelined Join Algorithms," Proc. SIGMOD 87, San Francisco, May 1987, pp. 399-409.

[Schn89] Schneider, D. A., and DeWitt, D. J., "A Performance Evaluation of Four Parallel Join Algorithms in a Shared-Nothing Multiprocessor Environment," Proc. SIGMOD 89, Portland, Oregon, June 1989, pp. 110-121.

[Shap86] Shapiro, L. D., "Join Processing in Database Systems with Large Main Memories," ACM Trans. Database Syst., vol. 11, no. 3, Sept. 1986, pp. 239-264.

[Tera83] Teradata Corporation, DBC/1012 Database Computer Concepts and Facilities, Inglewood, CA, April 1983.

[Vald84] Valduriez, P., and Gardarin, G., "Join and Semijoin Algorithms for a Multiprocessor Database Machine," ACM Trans. Database Syst., vol. 9, no. 1, March 1984, pp. 133-161.

[Yao77] Yao, S. B., "Approximating Block Accesses in Database Organizations," Comm. ACM, vol. 20, no. 4, April 1977, pp. 260-261.