F1 Slot

1. [CO2]
SET A
Consider the schema below.
Sailors (sid, sname, rating, age)
Reserves (sid, bid, day, rname)
Each Reserves tuple is 50 bytes long, a page can hold 200 Reserves tuples, and
800 pages are used to store Reserves tuples.
Each Sailors tuple is 20 bytes long, a page can hold 500 Sailors tuples, and 300
pages are used to store Sailors tuples.
The relations are stored in a distributed DBMS, with all of Sailors stored at
Chennai and all of Reserves stored at Bangalore.
Consider the query:
Select * from Sailors S, Reserves R where S.sid = R.sid.
Find the cost of answering this query using each of the following plans.
a. Compute the query at Chennai using a page-oriented nested loops join. [3]
b. Compute the query at Bangalore using a sort-merge join. [3]
c. Compute the query at Hyderabad by moving both relations to Hyderabad and
using a sort-merge join. [4]
Solution:
a. 300td+(300*800)(td+ts)
b. 800td+(300+800)(td+ts)
c. 300(td+ts)+800(td+ts)+3(300+800)td
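As a rough numeric check, the three expressions above can be evaluated for sample unit costs. The sketch below is only an illustration: the function name set_a_costs and the sample values td = 1 and ts = 10 are assumptions made for the example, and the formulas are copied from the solution exactly as written.

```python
def set_a_costs(td, ts, sailors_pages=300, reserves_pages=800):
    """Evaluate the Set A cost expressions for given unit costs.

    td is the cost of one page I/O and ts the cost of shipping one page.
    The function name and the sample values used below are illustrative
    assumptions; the formulas mirror the solution text.
    """
    # (a) page-oriented nested loops at Chennai
    plan_a = sailors_pages * td + (sailors_pages * reserves_pages) * (td + ts)
    # (b) sort-merge join at Bangalore, as written in the solution
    plan_b = reserves_pages * td + (sailors_pages + reserves_pages) * (td + ts)
    # (c) ship both relations to Hyderabad, then sort-merge join there
    plan_c = (sailors_pages * (td + ts)
              + reserves_pages * (td + ts)
              + 3 * (sailors_pages + reserves_pages) * td)
    return {"a": plan_a, "b": plan_b, "c": plan_c}


# Sample unit costs (assumed): shipping is 10 times as expensive as an I/O.
print(set_a_costs(td=1, ts=10))
# {'a': 2640300, 'b': 12900, 'c': 15400}
```

With these assumed unit costs, plan (a) is far more expensive than the other two because the 300*800 term multiplies the shipping cost.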
SET B
Consider a parallel DBMS in which each relation is stored by horizontally partitioning its
tuples across all disks.
Employees(eid: integer, did: integer, sal: real)
Departments(did: integer, mgrid: integer, budget: integer)
The mgrid field of Departments is the eid of the manager. Each relation contains 20-byte
tuples, and the sal and budget fields both contain uniformly distributed values in the range
0 to 1,000,000. The Employees relation contains 100,000 pages, the Departments relation
contains 5,000 pages, and each processor has 100 buffer pages of 4,000 bytes each. The
cost of one page I/O is td, and the cost of shipping one page is ts; tuples are shipped in units
of one page by waiting for a page to be filled before sending a message from processor i to
processor j. There are no indexes, and all joins that are local to a processor are carried out
using a sort-merge join. Assume that the relations are initially partitioned using a round-
robin algorithm and that there are 10 processors.
They are now stored in a distributed DBMS with all of Employees stored at Naples and all
of Departments stored at Berlin. There are no indexes on these relations. Consider the
query:
SELECT * FROM Employees E, Departments D WHERE E.eid = D.mgrid
The query is posed at Delhi, and you are told that only 1 percent of employees are
managers.
Find the cost of answering this query using each of the following plans:
1. Compute the query at Naples by shipping Departments to Naples; then ship the result to
Delhi. [5 Marks]
2. Compute the query at Berlin by shipping Employees to Berlin; then ship the result to
Delhi. [5 Marks]
Solution:
1. 5000(2td+ts)+3(100,000+5000)td+2000ts
2. 100,000(2td+ts)+3(100,000+5000)td+2000ts
Size of the join result:
1 page = 4000 bytes
Tuples per page = 4000/20 = 200
Employees has 100,000 pages, so 200*100,000 = 20,000,000 Employees tuples
1% of employees are managers, so the join has 200,000 tuples
Join tuple = 20 bytes of Employees + 20 bytes of Departments = 40 bytes
200,000*40 bytes = 8,000,000 bytes of join output
Number of pages of join tuples = 8,000,000/4000 = 2000 pages
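The arithmetic above can be reproduced with a short Python sketch. The function names join_result_pages and plan_cost, and the sample unit costs td = 1 and ts = 10, are assumptions introduced only for illustration. The cost expression follows the pattern of the solution: shipping, then a sort-merge join, then shipping the result.

```python
PAGE_BYTES = 4000      # page size given in the question
TUPLE_BYTES = 20       # tuple size of both relations
EMP_PAGES = 100_000    # Employees pages
DEPT_PAGES = 5_000     # Departments pages


def join_result_pages(manager_percent=1):
    """Pages needed for the join output, following the steps above."""
    tuples_per_page = PAGE_BYTES // TUPLE_BYTES          # 200
    emp_tuples = tuples_per_page * EMP_PAGES             # 20,000,000
    join_tuples = emp_tuples * manager_percent // 100    # 200,000
    join_bytes = join_tuples * 2 * TUPLE_BYTES           # 40 bytes per tuple
    return -(-join_bytes // PAGE_BYTES)                  # ceiling division


def plan_cost(shipped_pages, td, ts):
    """Cost of shipping one relation, joining, and shipping the result."""
    ship = shipped_pages * (2 * td + ts)       # ship the relation
    join = 3 * (EMP_PAGES + DEPT_PAGES) * td   # sort-merge join
    result = join_result_pages() * ts          # ship the result to Delhi
    return ship + join + result


print(join_result_pages())                    # 2000
# Plan 1 with assumed unit costs td = 1, ts = 10.
print(plan_cost(DEPT_PAGES, td=1, ts=10))     # 395000
```

Calling plan_cost with a different number of shipped pages gives the cost of shipping the other relation under the same assumptions.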
2.
[CO2]
SET A
Consider a parallel DBMS in which each relation is stored by horizontally partitioning its
tuples across all disks.
Patient (pid: integer, pname: char, did: integer, patient_bill: real)
Doctor(did: integer, dname: integer, salary: real, dept: char)
Each relation contains 20-byte tuples, and the patient_bill and salary fields both
contain uniformly distributed values in the range 0 to 1,000,000. The Patient relation
contains 100,000 pages, the Doctor relation contains 5,000 pages, and each processor
has 100 buffer pages of 4,000 bytes each. Assume there are 10 processors and a
shared-nothing architecture.
a. In order to find the patient who has paid the highest bill, what data partitioning
technique should be used? Justify. [5]
b. To display the number of patients treated in each department, suggest a joining
technique. [5]
Solution:
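a. Range partitioning on patient_bill. The highest bill then lies in the partition
that holds the top range of values, so only that processor needs to be scanned.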
b.
SET B
Consider a parallel DBMS in which each relation is stored by horizontally partitioning
its tuples across all disks. [5 Marks]
Employee (EmployeeID, EName, Salary, Department, Position, JoiningDate)
Sports (EmployeeID, Sports)
For example, consider finding the employees who have been paid a salary in the
range 500 to 1,00,0.
a. In a range selection on a range-partitioned attribute, it is possible that only one
disk may need to be accessed. Describe the benefits and drawbacks of this
property.
b. What form of parallelism (interquery or intraquery) is likely to be the most
important for each of the given queries? [5 Marks]
Query 1: Display all the orders from the orders table issued by the salesman 'Paul Adam'.
SELECT * FROM orders WHERE salesman_id = (SELECT salesman_id
FROM salesman WHERE name = 'Paul Adam');
Query 2: Prepare a list with the salesman name, customer name, and city for each
salesman and customer who belong to the same city.
SELECT salesman.name AS "Salesman", customer.cust_name, customer.city
FROM salesman, customer WHERE salesman.city = customer.city;
Answer:
a. Benefit: if a range query touches only one disk, the remaining disks are free
to serve other queries, which improves throughput. Drawback: if many tuples
fall in the queried range, all the work lands on that one disk (execution
skew), so the query gets no speedup from parallelism.
b. Intraquery parallelism is the most important form for both queries.
With a few large queries, intraquery parallelism is essential to get fast
response times. Given a large number of processors and disks, only
intra-operation parallelism can take advantage of the parallel hardware,
because queries typically have few operations but each one needs to process
a large number of tuples.
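The single-disk property from part (a) can be illustrated with a small Python sketch that reports which disks a range selection must touch under range partitioning. The function name disks_for_range and the partitioning vector [1000, 5000, 20000] on Salary are hypothetical values chosen for the example; they are not given in the question.

```python
import bisect


def disks_for_range(boundaries, low, high):
    """Return the disks whose ranges overlap the query range [low, high].

    boundaries is a sorted partitioning vector [v0, v1, ...]:
    disk 0 holds values < v0, disk i holds v(i-1) <= value < v(i),
    and the last disk holds values >= the last boundary.
    """
    first = bisect.bisect_right(boundaries, low)   # disk holding 'low'
    last = bisect.bisect_right(boundaries, high)   # disk holding 'high'
    return list(range(first, last + 1))


# Hypothetical partitioning vector on Salary (4 disks).
vector = [1000, 5000, 20000]

print(disks_for_range(vector, 500, 900))    # [0]
print(disks_for_range(vector, 500, 6000))   # [0, 1, 2]
```

In the first call a single disk serves the whole query, which leaves the other disks free but also means that disk does all the work. The second call spreads the scan over three disks.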
3.
[CO1]
SET A
Convert the EER diagram below to the relational model.
SET B
Convert the EER diagram below to the relational model.
Person (ID, telephone, street, postcode, Town)