
1. In the context of query optimization, what is an SQL query block?

An SQL query block is a single SELECT-FROM-WHERE unit with no nested subqueries. The query
optimizer decomposes a larger SQL query into such blocks and optimizes them one block at a time.
A query block has exactly one SELECT clause and one FROM clause, and at most one each of the
WHERE, GROUP BY, and HAVING clauses.

2. Define the term reduction factor.


The reduction factor of a condition in the WHERE clause is the fraction of input tuples that
satisfy it, i.e. the ratio of the expected output size to the input size. E.g. if a condition has
a reduction factor of 0.5, the output goes from a size of 20 to a size of 10.
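
As a rough illustration of how such a factor is used, the sketch below estimates an output size by multiplying the input size by the reduction factor of each condition. It takes the reduction factor to be the fraction of tuples that survive a condition and assumes the conditions are independent; the function name and all numbers are made up for the example.

```python
from math import prod


def estimated_result_size(input_size, reduction_factors):
    """Estimate output size as input size times the product of the
    reduction factors (assumes the conditions are independent)."""
    return input_size * prod(reduction_factors)


# Hypothetical numbers: 20 input tuples, one condition with factor 0.5.
single = estimated_result_size(20, [0.5])

# Hypothetical numbers: 1000 input tuples, two conditions (0.5 and 0.1).
combined = estimated_result_size(1000, [0.5, 0.1])

print(single, combined)
```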

3. Describe a situation in which projection should precede selection in processing a
project-select query and describe a situation where the opposite processing order is better.
(Assume that duplicate elimination for projection is done via sorting.)
Projection should precede selection when the input is large (for example, the result of a join)
and the selection is a range condition rather than an equality condition. In this situation the
projected data comes out sorted, because duplicates are eliminated by sorting, and the range can
be located with fast binary searches. Selection should precede projection when we are doing an
equality selection across unsorted data, since the selection then shrinks the input before the
projection has to sort it.
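
The projection-first case can be sketched as follows. This is only an in-memory illustration under the assumption that the selection is on the projected column; the function names and sample rows are hypothetical.

```python
from bisect import bisect_left, bisect_right
from itertools import groupby


def project_distinct_sorted(rows, column):
    """Project one column and remove duplicates by sorting first."""
    values = sorted(row[column] for row in rows)
    # After sorting, equal values are adjacent, so groupby yields each once.
    return [value for value, _ in groupby(values)]


def range_select(sorted_values, low, high):
    """Return values in [low, high] using binary search on sorted input."""
    start = bisect_left(sorted_values, low)
    end = bisect_right(sorted_values, high)
    return sorted_values[start:end]


# Hypothetical rows for illustration only.
rows = [{"price": 5}, {"price": 3}, {"price": 5}, {"price": 9}]
projected = project_distinct_sorted(rows, "price")
selected = range_select(projected, 4, 9)
```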

4. If there are dense, unclustered (secondary) B+ tree indexes on both R.a and S.b, the join
R ⨝_{a=b} S could be processed by doing a sort-merge type of join (without doing any sorting) by
using these indexes. (a) Would this be a good idea if R and S each have only one tuple per page,
or would it be better to ignore the indexes and sort R and S? Explain. (b) What if R and S each
have many tuples per page? Again, explain.

5. Why does the System R optimizer consider only left-deep join trees? Give an example of a
plan that would not be considered because of this restriction.
It considers only left-deep join trees because they allow fully pipelined plans, in which
intermediate results are passed directly to the next join instead of being written to temporary
files; the restriction also keeps the number of plans to enumerate manageable. A plan that would
not be considered is a bushy one such as (A ⨝ B) ⨝ (C ⨝ D): table A is joined with table B and
the result is stored, table C is joined with table D and that result is stored, and then these
two stored results are joined together. Since this plan cannot be fully pipelined, it is not
considered.
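
To make the distinction concrete, here is a small sketch that represents a join plan as nested tuples and checks whether it is left-deep. The representation (a string for a base table, a pair for a join) is an assumption chosen for the example, not how System R stores plans.

```python
def is_left_deep(plan):
    """A plan is a base-table name (str) or a (left, right) join pair.
    It is left-deep if the right input of every join is a base table."""
    if isinstance(plan, str):
        return True
    left, right = plan
    return isinstance(right, str) and is_left_deep(left)


# ((A join B) join C) join D: every right input is a base table.
left_deep = ((("A", "B"), "C"), "D")

# (A join B) join (C join D): the right input is itself a join.
bushy = (("A", "B"), ("C", "D"))

print(is_left_deep(left_deep), is_left_deep(bushy))
```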

6. Explain the role of interesting orders in the System R optimizer.


An interesting order is a sort order on the output of a plan that matches the ORDER BY
attributes, the GROUP BY attributes, or the join attributes of joins still to be performed.
Since output in an interesting order may prove to be useful later (it can save a sort), the
optimizer does not discard such a plan just because a cheaper unordered plan exists; keeping it
may decrease the overall query cost.
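
A much simplified sketch of this pruning rule: instead of keeping only the single cheapest plan, keep the cheapest plan for each output order. The plan names, costs, and the tuple layout are hypothetical, and a real optimizer would apply this per set of joined relations.

```python
def prune(plans):
    """Keep the cheapest plan per output order.
    Each plan is (name, cost, order); order is None for unsorted output."""
    best = {}
    for name, cost, order in plans:
        if order not in best or cost < best[order][1]:
            best[order] = (name, cost, order)
    return best


# Hypothetical candidate plans for the same join.
candidates = [
    ("hash join", 100, None),
    ("sort-merge join", 130, "sid"),
    ("nested loops", 150, None),
]
kept = prune(candidates)
```

Here the sort-merge plan survives even though the hash join is cheaper, because its output is sorted on sid and a later operator might use that order.
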
Problem 2
Consider the following relational schema:
Suppliers(sid: integer, sname: char(20), city: char(20))
Supply(sid: integer, pid: integer)
Parts(pid: integer, pname: char(20), price: real)
Suppose we are given the following query:
SELECT S.sname, P.pname
FROM Suppliers S, Parts P, Supply Y
WHERE S.sid = Y.sid AND Y.pid = P.pid AND S.city = 'Madison' AND P.price <= 1000

1. What information about these relations does the query optimizer need to select a good query
execution plan for the given query?
To choose a good query execution plan, the optimizer needs statistics about the data, such as
the size of each table: its number of tuples, tuple size, and number of pages. Statistics about
the distribution of attribute values, i.e. what fraction of a table's tuples have a specific
value for an attribute, are also very valuable.
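
As an illustration of how such statistics feed into plan selection, the sketch below estimates how many Suppliers and Parts tuples survive the two selections in the query. It uses the common uniform-distribution estimates (1 / number of distinct values for an equality, and the covered fraction of the value range for a comparison). Every number in `stats` is invented for the example.

```python
def eq_reduction(distinct_values):
    """Estimated fraction of tuples matching column = value."""
    return 1 / distinct_values


def le_reduction(value, low, high):
    """Estimated fraction of tuples matching column <= value,
    assuming values are spread uniformly over [low, high]."""
    fraction = (value - low) / (high - low)
    return min(1.0, max(0.0, fraction))


# Hypothetical catalog statistics, not taken from the problem.
stats = {
    "suppliers_tuples": 1000,
    "city_distinct": 50,
    "parts_tuples": 10000,
    "price_low": 0,
    "price_high": 5000,
}

madison_suppliers = stats["suppliers_tuples"] * eq_reduction(stats["city_distinct"])
cheap_parts = stats["parts_tuples"] * le_reduction(
    1000, stats["price_low"], stats["price_high"]
)
```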

2. What indexes might help process this query? Explain briefly about your choices.
Indexes on Parts.price and Suppliers.city will be very valuable, since they let the system find
the ids of the qualifying tuples immediately, and from there it can do a simple join. Sifting
through the data to find the ids of the tuples satisfying a selection is often what takes the
longest.

3. How does adding DISTINCT to the SELECT clause affect the plans produced?
Since there is no ORDER BY clause in this query, DISTINCT will cause the result to be sorted,
since sorting is the simplest way to remove duplicates. Alternatively, duplicates can be removed
by hashing, in which case a hash table must be built. Either way, DISTINCT adds an extra
duplicate-elimination step to the query plans.
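
The two duplicate-elimination strategies can be sketched in memory as follows. This is only an illustration: a real system works on pages and may spill to disk, and the sample rows are invented.

```python
def distinct_by_sorting(rows):
    """Sort the rows, then drop any row equal to its predecessor."""
    result = []
    for row in sorted(rows):
        if not result or result[-1] != row:
            result.append(row)
    return result


def distinct_by_hashing(rows):
    """Keep a hash set of rows seen so far; emit each row once."""
    seen = set()
    result = []
    for row in rows:
        if row not in seen:
            seen.add(row)
            result.append(row)
    return result


# Hypothetical (sname, pname) result rows containing one duplicate.
rows = [("Zed", "nut"), ("Acme", "bolt"), ("Zed", "nut")]
```

Note that the sort-based version returns its output in sorted order as a side effect, while the hash-based version preserves the input order.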

4. How does adding ORDER BY sname to the query affect the plans produced?
ORDER BY is fairly simple: it adds a sort of the result on the specified attribute, sname. Since
ORDER BY requires nothing beyond that, the only change is this one sort, which can be skipped if
a plan already produces its output in sname order.

5. How does adding GROUP BY sname to the query affect the plans produced?
Since GROUP BY is used with aggregate operators, it affects the plans by adding a sort on the
GROUP BY attribute, followed by a scan of the sorted result that computes the aggregate for each
group. Essentially it adds an extra sort and an extra pass.
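
The sort-then-scan approach can be sketched like this, using COUNT as an assumed aggregate (the query in the problem does not specify one). The function name and sample rows are hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def group_count(rows, key_index):
    """Sort on the grouping column, then count each group in one pass."""
    key = itemgetter(key_index)
    ordered = sorted(rows, key=key)  # the extra sort
    # The extra pass: equal keys are adjacent, so each group is contiguous.
    return [(value, sum(1 for _ in group)) for value, group in groupby(ordered, key=key)]


# Hypothetical (sname, pname) rows.
rows = [("Acme", "bolt"), ("Zed", "nut"), ("Acme", "nut")]
counts = group_count(rows, 0)
```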
