
JOIN OPERATION

Let r and s be two relations that are subjected to a join
operation, represented by:

r ⋈_{r.A = s.B} s,
where A and B are attributes, or sets of attributes, of
relations r and s respectively.
The join operation between relations can be carried out by
any of the following techniques:
• Nested-Loop join.
• Block Nested-Loop join.
• Indexed Nested-Loop join.
• Merge join.
• Hash join.

 Nested-Loop join operation:-

The nested-loop join operation consists of a
pair of nested for loops. Figure 1 shows a simple
algorithm to compute the theta join, r ⋈_θ s, of
two relations r and s. This algorithm is called the
nested-loop join algorithm.

For each tuple tr in r do begin
    For each tuple ts in s do begin
        Test pair (tr, ts) to see if they satisfy the join condition θ
        If they do, add tr . ts to the result.
    End
End

Figure 1.
From the above figure, we see that:
• Relation r is called the outer relation and relation s is
called the inner relation of the join.
• tr and ts are tuples; tr . ts denotes the tuple constructed by
concatenating the attribute values of tr and ts.

Some of the important features of the nested-loop join
operation are as follows:
1. The nested-loop join operation requires no
indices, and it can be used regardless of what the
join condition is.
2. The nested-loop join algorithm is expensive,
since it examines every pair of tuples in the two
relations.
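
The algorithm of Figure 1 can be sketched in Python as follows. This is only a minimal illustration, assuming each relation is an in-memory list of dictionaries and the join condition θ is passed in as a function; the names nested_loop_join, depositor, customer and the sample rows are hypothetical and not part of the original notes.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]


def nested_loop_join(r: List[Row], s: List[Row],
                     theta: Callable[[Row, Row], bool]) -> List[Row]:
    """Theta join of r (outer) and s (inner) by examining every pair."""
    result = []
    for tr in r:                      # outer relation
        for ts in s:                  # inner relation
            if theta(tr, ts):         # test the join condition
                # tr . ts: concatenate the attribute values.
                # Assumption: attribute names do not clash between r and s.
                result.append({**tr, **ts})
    return result


# Hypothetical sample data.
depositor = [{"customer_name": "John", "account_number": "A-101"},
             {"customer_name": "Mary", "account_number": "A-215"}]
customer = [{"cname": "John", "city": "Pune"},
            {"cname": "Asha", "city": "Delhi"}]

joined = nested_loop_join(
    depositor, customer,
    lambda tr, ts: tr["customer_name"] == ts["cname"])
all_pairs = nested_loop_join(depositor, customer, lambda tr, ts: True)
```

Because the condition is an arbitrary function, the same loop handles equality and inequality conditions alike, which reflects feature 1 above; the pair count len(r) * len(s) reflects feature 2.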

 Block Nested-Loop join:-

The block nested-loop join operation is a variant of
the nested-loop join where every block of the inner
relation is paired with every block of the outer
relation. Within each pair of blocks, every tuple in
one block is paired with every tuple in the other
block, to generate all pairs of tuples. If the buffer is
too small to hold either relation entirely in
memory, we can still obtain a major saving in block
accesses by processing the relations on a per-block
basis, rather than on a per-tuple basis. Figure 2 shows the
block nested-loop join.

For each block Br of r do begin
    For each block Bs of s do begin
        For each tuple tr in Br do begin
            For each tuple ts in Bs do begin
                Test pair (tr, ts) to see if they satisfy the join condition
                If they do, add tr . ts to the result.
            End
        End
    End
End

Figure 2.

The primary difference between the block nested-loop join and
the basic nested-loop join is that, in the worst case, each
block in the inner relation s is read only once for each
block in the outer relation, instead of once for every tuple
in the outer relation.
The performance of the nested-loop join and the block
nested-loop join can be further improved by the following
techniques:-
i. If the join attributes in a natural join form a
key on the inner relation, then for each outer
relation tuple, the inner loop can terminate as
soon as the first match is found.
ii. If the memory has M blocks, we read in (M-2)
blocks of the outer relation at a time. While
reading each block of the inner relation, we
join it with all (M-2) blocks of the outer
relation. This reduces the number of scans of the
inner relation from br to ⌈br/(M-2)⌉, where br is
the number of blocks of the outer relation.
iii. We can scan the inner loop alternately forward
and backward, so that blocks left in the buffer by
the previous scan can be reused.
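
Technique (ii) can be sketched in Python as follows. This is a rough illustration under stated assumptions: a relation is modelled as a list of blocks, each block is a list of dictionaries, and the function counts how many times the inner relation is scanned. The names block_nested_loop_join, r_blocks, s_blocks and the sample data are hypothetical.

```python
from math import ceil
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]
Block = List[Row]


def block_nested_loop_join(r_blocks: List[Block], s_blocks: List[Block],
                           theta: Callable[[Row, Row], bool],
                           m: int) -> Tuple[List[Row], int]:
    """Join r (outer) with s (inner), holding m - 2 outer blocks at a time.

    Returns the result rows and the number of scans of the inner relation.
    """
    if m < 3:
        raise ValueError("need at least 3 buffer blocks")
    chunk = m - 2                     # outer blocks kept in memory
    result = []
    inner_scans = 0
    for start in range(0, len(r_blocks), chunk):
        outer_chunk = r_blocks[start:start + chunk]
        inner_scans += 1              # one pass over s per outer chunk
        for bs in s_blocks:           # read each inner block once per pass
            for br in outer_chunk:
                for tr in br:
                    for ts in bs:
                        if theta(tr, ts):
                            result.append({**tr, **ts})
    return result, inner_scans


# Hypothetical data: 5 outer blocks of 2 tuples, 2 inner blocks of 2 tuples.
r_blocks = [[{"a": 2 * i}, {"a": 2 * i + 1}] for i in range(5)]
s_blocks = [[{"b": 0}, {"b": 3}], [{"b": 6}, {"b": 9}]]

rows, scans = block_nested_loop_join(
    r_blocks, s_blocks, lambda tr, ts: tr["a"] == ts["b"], m=4)
expected_scans = ceil(len(r_blocks) / (4 - 2))
```

With m = 3 the chunk size is one block, so the sketch degenerates to the plain block nested-loop join of Figure 2 and scans the inner relation br times.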

 Indexed Nested-Loop join:-

In a nested-loop join (Figure 1), if an index is
available on the inner relation's join attribute,
index lookups can replace file scans. This method
can be used with existing indices, as well as with
temporary indices created for the sole purpose of
evaluating the join. Looking up the tuples in s that
will satisfy the join condition with a given tuple
tr is essentially a selection on s. For example,
consider depositor ⋈ customer. Suppose that we
have a depositor tuple with customer_name
"John". Then the relevant tuples in s are
those that satisfy the selection
"customer_name = John".
The time cost of the indexed nested-loop
join can be computed as follows:

br (tT + tS) + nr * c
where,
br denotes the number of blocks containing
records of r.
tT is the time to transfer one block and tS is
the time for one seek.
nr is the number of records in relation r.
c is the cost of a single selection on s using
the join condition.
This cost formula indicates that, if indices are available
on both relations r and s, it is generally most efficient to
use the relation with fewer tuples as the outer relation.
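
The index lookup and the cost formula can be sketched in Python as follows. This is a minimal illustration, assuming a Python dictionary stands in for a temporary index on the inner relation; the function names, the sample rows and the numbers passed to the cost helper are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List

Row = Dict[str, object]


def build_index(s: List[Row], attr: str) -> Dict[object, List[Row]]:
    """Temporary index on s: join-attribute value -> matching tuples."""
    index = defaultdict(list)
    for ts in s:
        index[ts[attr]].append(ts)
    return index


def indexed_nested_loop_join(r: List[Row],
                             s_index: Dict[object, List[Row]],
                             attr: str) -> List[Row]:
    """For each outer tuple, replace the inner scan by an index lookup."""
    result = []
    for tr in r:
        # The lookup plays the role of the selection on s.
        for ts in s_index.get(tr[attr], []):
            result.append({**tr, **ts})
    return result


def indexed_join_cost(b_r: int, n_r: int, t_t: int, t_s: int, c: int) -> int:
    """Cost formula from the text: br (tT + tS) + nr * c."""
    return b_r * (t_t + t_s) + n_r * c


# Hypothetical sample data.
depositor = [{"customer_name": "John", "account_number": "A-101"},
             {"customer_name": "Mary", "account_number": "A-215"}]
customer = [{"customer_name": "John", "customer_city": "Pune"}]

index = build_index(customer, "customer_name")
joined = indexed_nested_loop_join(depositor, index, "customer_name")

# Made-up figures: same tT, tS and c, two candidate outer relations.
small_outer = indexed_join_cost(b_r=100, n_r=5000, t_t=1, t_s=4, c=5)
large_outer = indexed_join_cost(b_r=400, n_r=10000, t_t=1, t_s=4, c=5)
```

With the made-up figures above, the smaller outer relation gives the lower cost, because the nr * c term dominates the formula.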

 Merge join:-

This join algorithm is also called the sort-merge-join
algorithm. It can be used to compute natural
joins and equi-joins. Let r(R) and s(S) be the relations
whose natural join is to be computed, and let R ∩ S denote
their common attributes. Suppose that both relations are
sorted on the attributes R ∩ S. Then their join can be
computed by a process much like the merge stage of the
merge-sort algorithm.
Figure 3 shows the merge-join algorithm.

pr := address of first tuple of r;
ps := address of first tuple of s;
while (ps ≠ null and pr ≠ null) do
begin
    ts := tuple to which ps points;
    Ss := {ts};
    set ps to point to next tuple of s;
    done := false;
    while (not done and ps ≠ null) do
    begin
        ts' := tuple to which ps points;
        if (ts'[j] = ts[j])
        then begin
            Ss := Ss ∪ {ts'};
            set ps to point to next tuple of s;
        end
        else done := true;
    end
    tr := tuple to which pr points;
    while (pr ≠ null and tr[j] < ts[j]) do
    begin
        set pr to point to next tuple of r;
        tr := tuple to which pr points;
    end
    while (pr ≠ null and tr[j] = ts[j]) do
    begin
        for each ts in Ss do
        begin
            add ts ⋈ tr to result;
        end
        set pr to point to next tuple of r;
        tr := tuple to which pr points;
    end
end
Figure 3.

In the algorithm, "j" refers to the attributes
in R ∩ S, and ts ⋈ tr, where ts and tr are tuples that have
the same values for j, denotes the concatenation of
the attributes of the tuples, followed by projecting out
repeated attributes. The merge-join algorithm associates
one pointer with each relation. These pointers point
initially to the first tuple of the respective relations. As the
algorithm proceeds, the pointers move through the
relations.
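
The pointer movement of Figure 3 can be sketched in Python as follows. This is a minimal illustration, assuming both relations are in-memory lists of dictionaries already sorted on a single join attribute j, with list positions standing in for the pointers pr and ps; the name merge_join and the sample rows are hypothetical.

```python
from typing import Dict, List

Row = Dict[str, object]


def merge_join(r: List[Row], s: List[Row], j: str) -> List[Row]:
    """Join r and s on attribute j; both inputs must be sorted on j."""
    result = []
    pr, ps = 0, 0                     # positions play the role of pointers
    while pr < len(r) and ps < len(s):
        # Collect Ss: all s tuples sharing the current value of j.
        ts = s[ps]
        group = [ts]
        ps += 1
        while ps < len(s) and s[ps][j] == ts[j]:
            group.append(s[ps])
            ps += 1
        # Skip r tuples whose join value is smaller.
        while pr < len(r) and r[pr][j] < ts[j]:
            pr += 1
        # Join every matching r tuple with every tuple in Ss.
        while pr < len(r) and r[pr][j] == ts[j]:
            for t in group:
                # Merging the dicts keeps the shared attribute j once.
                result.append({**t, **r[pr]})
            pr += 1
    return result


# Hypothetical sample data, sorted on "k".
r = [{"k": 1, "x": "a"}, {"k": 2, "x": "b"},
     {"k": 2, "x": "c"}, {"k": 4, "x": "d"}]
s = [{"k": 2, "y": "p"}, {"k": 2, "y": "q"},
     {"k": 3, "y": "r"}, {"k": 4, "y": "s"}]

joined = merge_join(r, s, "k")
pairs = [(row["x"], row["y"]) for row in joined]
```

Each input is traversed once, front to back, which is why the sketch needs the inputs to be sorted beforehand.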

 Hash Join :

The hash-join algorithm is used to implement natural
joins and equi-joins. Here, a hash function h is used to
partition the tuples of both relations. The basic idea is to
partition the tuples of each relation into sets that have
the same hash value on the join attributes.
Assumptions: The following assumptions are made in the
hash join:
(i) h is a hash function mapping j
values to {0, 1, …, nh}, where j
denotes the common attributes
of r and s in the natural join.
(ii) Hr0, Hr1, …, Hrnh denote partitions of
r tuples, each initially
empty. Each tuple tr ∈ r is put in
partition Hri, where i = h(tr[j]).
(iii) Hs0, Hs1, …, Hsnh denote partitions
of s tuples, each initially empty.
Each tuple ts ∈ s is put in partition
Hsi, where i = h(ts[j]).
Algorithm: The hash-join algorithm, detailed below, rests on
the following idea:
• If an r tuple and an s tuple satisfy the join condition,
they have the same value for the join attributes.
• If that value is hashed to some value, say i, then the r
tuple has to be in Hri and the s tuple in Hsi, i.e. r
tuples in Hri need only be compared with s
tuples in Hsi.

/* partition s */
For each tuple ts in s do
Begin
    i := h(ts[j]);
    Hsi := Hsi ∪ {ts};
End
/* partition r */
For each tuple tr in r do
Begin
    i := h(tr[j]);
    Hri := Hri ∪ {tr};
End
/* perform join on each partition */
For i := 0 to nh do
Begin
    Read Hsi and build an in-memory hash index on it.
    For each tuple tr in Hri do
    Begin
        Probe the hash index on Hsi to locate all tuples ts
        such that ts[j] = tr[j]
        For each matching tuple ts in Hsi do
        Begin
            add tr ⋈ ts to the result
        End
    End
End

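An important feature of the hash join is that, when no
recursive partitioning is needed, it requires
3(br+bs)+4nh block transfers, where br and bs are the numbers
of blocks containing records of r and s. The overhead 4nh is
usually quite small compared to (br+bs) and hence can be ignored.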
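
The partition and probe phases can be sketched in Python as follows. This is a minimal illustration, assuming in-memory lists of dictionaries, a single join attribute j, and Python's built-in hash taken modulo nh + 1 as the partitioning function h; the function names, the sample rows and the block counts are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List

Row = Dict[str, object]


def hash_join(r: List[Row], s: List[Row], j: str, n_h: int = 3) -> List[Row]:
    """Natural/equi-join on attribute j using partitions 0..n_h."""
    n_parts = n_h + 1

    def h(value: object) -> int:      # assumed partitioning hash function
        return hash(value) % n_parts

    hs: List[List[Row]] = [[] for _ in range(n_parts)]
    hr: List[List[Row]] = [[] for _ in range(n_parts)]
    for ts in s:                      # partition s
        hs[h(ts[j])].append(ts)
    for tr in r:                      # partition r
        hr[h(tr[j])].append(tr)

    result = []
    for i in range(n_parts):          # join partition i of r with i of s
        index = defaultdict(list)     # in-memory hash index on Hsi
        for ts in hs[i]:
            index[ts[j]].append(ts)
        for tr in hr[i]:              # probe the index with each tr in Hri
            for ts in index.get(tr[j], []):
                result.append({**tr, **ts})
    return result


def hash_join_block_transfers(b_r: int, b_s: int, n_h: int) -> int:
    """Block-transfer estimate from the text: 3(br + bs) + 4nh."""
    return 3 * (b_r + b_s) + 4 * n_h


# Hypothetical sample data and block counts.
r = [{"k": 1, "x": "a"}, {"k": 2, "x": "b"}, {"k": 5, "x": "c"}]
s = [{"k": 2, "y": "p"}, {"k": 5, "y": "q"}, {"k": 7, "y": "r"}]

joined = sorted(hash_join(r, s, "k"), key=lambda row: row["k"])
transfers = hash_join_block_transfers(b_r=400, b_s=100, n_h=4)
```

The order of the raw result depends on how values fall into partitions, so the usage above sorts it before inspection. With the made-up block counts, the 4nh term contributes 16 of the 1516 transfers, in line with the remark that it can usually be ignored.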

Hash partitioning of relations
