Professional Documents
Culture Documents
An SQL query block is is smaller query that makes up a larger query. A query optimizer uses
blocks to break up larger SQL queries so that it can optimize these small blocks at a time. A
query block has one parameter for the select, from, grouby, having, etc operators.
4. If there are dense, unclustered (secondary) B+ tree indexes on both R.a and S.b, the join R
⨝a=bS could be processed by doing a sort-merge type of join—without doing any sorting—by
using these indexes. (a) Would this be a good idea if R and S each have only one tuple per page,
or would it be better to ignore the indexes and sort R and S? Explain. (b) What if R and S each
have many tuples per page? Again, explain.
5. Why does the System R optimizer consider only left-deep join trees? Give an example of a
plan that would not be considered because of this restriction.
It only considers left-deep join trees because it allows us to generate fully pipelined plans. This
means intermediate plans are not written to temporary files. A plan that would not be considered
is a plan where there is a join between table A and table B and this gets stored in memory then a
join between table C and table D and this gets stored in memory and then these two stored tables
get joined together. Since this situation doesn’t cause our plan to be pipelined it doesn’t get
considered.
1. What information about these relations does the query optimizer need to select a good query
execution plan for the given query?
To do a good query execution plan the query needs statistics about the data, such as what the size
of a table is, how many tuples it has, etc. This can be written by file size, tuple size, number of
pages. Also statistics about what the ratio of an attribute having a specific value amongst the
table is also very valuable.
2. What indexes might help process this query? Explain briefly about your choices.
Indexes on Price and City will be very valuable since these will immediately allow the optimizer
to find the ids of the tuples and from there the optimizer can do simple join. Often sifting through
the data to find the ids of a relation satisfying a certain selection is what takes the longest.
3. How does adding DISTINCT to the SELECT clause affect the plans produced?
Since there is no order by clause in this query the distinct will cause the data to get sorted since
this is how duplicates are removed in the simplest way. Another process could be based on
hashing. In this situation we will need to create a hash table. Nevertheless DISTINCT will add a
layer of complication to the query plans.
4. How does adding ORDER BY sname to the query affect the plans produced?
Order by is fairly simple as it will cause us to sort based on the attribute specified. Since order by
has nothing else it has to do, it's a fairly simple sorting.
5. How does adding GROUP BY sname to the query affect the plans produced?
Since group by is an aggregate operator it will impact the plans by causing the plans to add
sorting based on the group by attribute then it scan the relation formed from this and computing
the aggregate by the attribute. Essentially it creates an extra sort and an extra pass