
Homework 10

The due date is Ross closing time on Thursday, January 3, 2013.

Print this document and write each answer in the designated area.

Name:    Login:    Student Number:

Question 1 (20 points) You have to compute R ⋈ S ⋈ T, where B(R) = 50 and B(S) = B(T) = 100. The buffer size is 103 blocks. It is known that the size of the join of any two (of the three) relations is at least 300 blocks. In this question, you should consider only the three variants of block nested-loop join that were discussed in the exercise session this week (see QP2.2 Tirgul.ppt, Slides 12–27). Which variant has the minimal I/O cost?

What is the optimal buffer allocation of the variant with the minimal I/O cost?

What is the minimal I/O cost?

In this part, explain your answers to the first two parts (i.e., why the variant and buffer allocation you specified are optimal).
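As a reference point for comparing variants, here is a minimal sketch of the standard textbook I/O cost of a single block nested-loop join. It assumes M buffer blocks, of which M - 2 hold a chunk of the outer relation, one holds the current block of the inner relation and one is the output buffer, and it does not count writing the result. The three variants from the exercise session may allocate blocks differently; the function name and parameters are hypothetical.

```python
import math


def bnl_join_cost(b_outer: int, b_inner: int, m: int) -> int:
    """I/O cost of one block nested-loop join (hypothetical helper).

    Assumes m - 2 buffer blocks for a chunk of the outer relation, one block
    for the inner relation and one output block; output writes are not counted.
    """
    chunk = m - 2  # blocks of the outer relation held in memory at once
    if chunk < 1:
        raise ValueError("need at least 3 buffer blocks")
    passes = math.ceil(b_outer / chunk)  # the inner relation is scanned once per chunk
    return b_outer + passes * b_inner


# Example with made-up sizes: 10 outer blocks, 20 inner blocks, 7 buffer blocks.
print(bnl_join_cost(10, 20, 7))  # 10 + ceil(10 / 5) * 20 = 50
```

When the whole outer relation fits in the M - 2 chunk blocks, the inner relation is read exactly once, so the cost drops to B(outer) + B(inner).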

Question 2 (20 points) This question is also about computing R ⋈ S ⋈ T. However, now the buffer size is only 62 blocks and you can use any join method (and not just block nested-loop join). The rest is the same as in the previous question. Describe how you would compute the expression most efficiently.

What is the optimal buffer allocation in your answer to the first part?

What is the I/O cost of your answer to the first part?

Question 3 (10 points) This question refers to Slides 16–17 of QP1.3 Indexes.ppt. Consider a sparse index of a file with duplicate key values, but the sparse index has no repeated keys, as explained in Slides 16–17. The rule for searching is given in Slide 17. This rule leads to a block of the file and then the search may proceed through a sequence of blocks. Give the most efficient rule for deciding whether to continue to the next block of the file or end the search.

Question 4 (20 points) You have to compute R(A, B) ⋈ S(A, C) using sort-merge join. Each one of R and S has 10,000 blocks. The buffer size is 201 blocks. Attribute A of R has 80 different values (i.e., V(R, A) = 80) that are uniformly distributed. For S we have V(S, A) = 200; it is also known that the total number of records of S with the same value of A is at most twice the average. What is the value of P (which is defined on Slide 29 of QP2.1 Join.ppt)?

You have to use sort-merge join to compute R ⋈ S so that the I/O cost is the best possible in the worst case. What is the number of lists in each relation and what is the size of each list? (Hint: The answer is not necessarily as defined in class.)

Describe how the blocks of the buer are allocated.

What is the I/O cost?
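The group sizes implied by the statistics of this question can be computed with a small sketch. It assumes records are spread evenly over blocks, so that a bound on the number of records sharing one A-value translates into the same bound on blocks. The helper name is hypothetical, and the sketch does not compute P, whose definition is on the slide.

```python
def blocks_per_value(num_blocks: int, distinct_values: int, skew_factor: float = 1.0) -> float:
    """Blocks holding one join-attribute value (hypothetical helper).

    skew_factor = 1.0 gives the average B / V; a guarantee such as
    "at most twice the average" corresponds to skew_factor = 2.0.
    Assumes records per block are constant, so record counts scale to blocks.
    """
    return skew_factor * num_blocks / distinct_values


r_per_value = blocks_per_value(10_000, 80)               # R is uniform on A
s_avg = blocks_per_value(10_000, 200)                    # average group size in S
s_max = blocks_per_value(10_000, 200, skew_factor=2.0)   # worst-case group in S
print(r_per_value, s_avg, s_max)  # 125.0 50.0 100.0
```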

Question 5 (10 points) This question is about Slides 23–26 of QP2.3 Index Nested Loop Join.ppt. First, a summary of those slides. The analysis of Slide 25 does not use the standard formula (for the clustered case) of Slide 18. Instead, it does a more detailed analysis that exploits the statistics about the numbers of different values in column A of R and S. In R there are only 50 different values. Therefore, only a few blocks of S participate in the join (because S is clustered on A and has 5,000 different values in column A). In particular, only 50 blocks of S participate in the join. If we allocate 50 blocks in the buffer to S, no block of S has to be read more than once, even if it is used by many tuples of R. So, the I/O cost is 60: the 50 participating blocks of S plus the 10 blocks of R, all of which have to be read. Slide 25 also gives the I/O cost for the worst case, namely, when we add the +1 in the formula of Slide 17. You have to solve the question on Slide 26. That is, how would you modify the index nested-loop join of Slide 15 so that the I/O cost will be the same as on Slide 25, but the buffer allocation will be only 10 blocks to R, one to S and one to the output?
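The cost argument of Slide 25 summarized above can be checked with a toy model of index nested-loop join. It assumes each A-value of R matches tuples in exactly one block of S, ignores the I/O of the index itself, reads every block of R once, and manages the buffer blocks reserved for S with LRU replacement. The function name, the data layout and the choice of LRU are all hypothetical, not taken from the slides.

```python
from collections import OrderedDict


def inl_join_reads(r_blocks, s_block_of, cache_size):
    """Count block reads of an index nested-loop join (hypothetical model).

    r_blocks: list of blocks of R, each a list of join-attribute values.
    s_block_of: maps a join value to the single S block holding its matches.
    cache_size: buffer blocks reserved for S, managed with LRU replacement.
    Index lookups themselves are not counted.
    """
    cache = OrderedDict()
    reads = 0
    for block in r_blocks:
        reads += 1  # each block of R is read exactly once
        for value in block:
            s_block = s_block_of[value]
            if s_block in cache:
                cache.move_to_end(s_block)  # hit: mark as most recently used
            else:
                reads += 1  # miss: fetch the S block from disk
                cache[s_block] = True
                if len(cache) > cache_size:
                    cache.popitem(last=False)  # evict the least recently used block
    return reads


# 10 blocks of R, each containing all 50 distinct A-values;
# value v is (hypothetically) stored in S block v.
r_blocks = [list(range(50)) for _ in range(10)]
s_block_of = {v: v for v in range(50)}
print(inl_join_reads(r_blocks, s_block_of, cache_size=50))  # 60
```

With 50 buffer blocks for S, every participating S block is fetched once, giving 10 + 50 = 60 reads as on Slide 25; the model makes it easy to see how the count grows when fewer blocks are reserved for S.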

Question 6 (20 points) You have to compute R(A, B) ⋈ S(B, C), where B(R) = 1,000,000 and B(S) = 10,000. The buffer size is 1,002 blocks. Which methods are applicable in this case?

For each applicable method, write the I/O cost and the buer allocation.

For each inapplicable method, explain why it cannot be used.

Which is the best method in this case?
