Professional Documents
Culture Documents
1. Consider the join R ⊲⊳R.a=S.b S, given the following information about the relations to be joined. The cost
metric is the no. of page I/Os unless otherwise noted, and the cost of writing out the result is uniformly
ignored.
(a) What is the cost of joining R and S using a page-oriented simple nested loops join? What is the no. of
buffer pages required for this cost to remain unchanged?
(b) What is the cost of joining R and S using a block nested loops join? What is the no. of buffer pages
required for this cost to remain unchanged?
10,000
Cost = 10,000 + 200,000*⌈ 1002−2 ⌉ = 2,010,000. Min. no. of buffer pages = 1002.
(c) How many tuples does the join of R and S produce at most, and how many pages are required to store
the result of the join back to disk?
Since R.a is the primary key, any tuple in S can match at most one tuple in R. So, the max. no. of tuples
in the result is equal to the number of tuples in S which is 4,000,000.
The size of a tuple in the result will be approx. double the size of R. Thus, only 10 tuples can be stored
in a page. Storing 4,000,000 tuples at 10 per page would require 400,000 pages in the result.
(d) Would any of your answers change if you were told that R.a is a foreign key that refers to S.b? Explain.
It would contradict the statement that each R tuple joins with exactly 20 S tuples.
• Employees(eid: integer, ename: string, sal: integer, title: string, age: integers)
1
Suppose the data entries (leaf nodes) of the indexes are of the form < search key, record id >. We have the
following indexes: a hash index on eid, a B+ tree index on sal, a hash index on age, and a clustered B+ tree
index on < age, sal >. Each Employees record is 100 bytes long, and you can assume that each index data
entry is 20 bytes long. The Employees relation contains 10,000 pages. Each data page contains 20 records per
page.
(a) Consider each of the following selection conditions and assuming that the reduction factor for each
term that matches an index is 0.1, compute the cost of the most selective access path for retrieving all
Employees tuples that satisfy the condition:
i. sal > 100
Filescan is best with cost 10,000.
ii. age = 25
Clustered B+ tree. Cost = 2(lookup) + 10000*0.1(selectivity) + 10,000*0.2*0.1 = 1202.
iii. eid = 1000
Cost = 1.2 (lookup) + 1 = 2 or 3.
ii. age = 25
Clustered index on < age, sal >. Cost = 2 (lookup) + 10,000*0.1*0.4 = 402.
2
ii. age = 25
Clustered B+ index in an index only scan. Cost = 2 (lookup) + 10,000 *0.1 *0.4 = 2 + 400 = 402.
• Executives has attributes ename, title, dname, and address; all are string fields of the same length.
• The ename attribute is a candidate key.
• The relation contains 10,000 pages.
• There are 10 buffer pages.
Assume that only 10% of Executives tuples meet the selection condition.
(a) Suppose that a clustered B+ tree index on title is the only index available. What is the cost of the best
plan?
Cost = 2 (finding the first title) + 10,000*0.1 (getting 10% of data pages) + 2,500 * 0.1 (scanning the
index) = 1252.
(b) Suppose that a unclustered B+ tree index on title is the only index available. What is the cost of the best
plan?
Filescan. Cost = 10,000.
(c) Suppose that a clustered B+ tree index on ename is the only index available. What is the cost of the best
plan?
(d) Suppose that a clustered B+ tree index on address is the only index available. What is the cost of the
best plan?
Filescan. Cost = 10,000.
(e) Suppose that a clustered B+ tree index on < ename,title > is the only index available. What is the cost
of the best plan?
Scanning the entire index = 10,000 * 0.5 = 5,000.