4.query Processing and Optimization

1
L .J. INSTITUTE OF ENGINEERING & TECHNOLOGY

GTU B.E. SEM – III (Computer Engineering)
Sub: - Database Management System (2130703)
Question Bank with Solution (Year – 2016)
Unit 5: Query Processing and Optimization
1. Write short note on query processing. [LJIET]May 2014 7

OR
Explain various steps involved in Query evaluation..[LJIET] Mar 2010
OR
Explain various steps of query evaluation.[LJIET]May 2011
Query processing refers to the range of activities involved in extracting data from a database.
The steps involved in processing a query appear in Figure. The basic steps are:
1. Parsing and translation
2. Optimization
3. Evaluation
1. Parsing and translation
Before query processing can begin, the system must translate the query into a usable form. A more
useful internal representation is one based on the extended relational algebra. The first action the
system must take in query processing is to translate a given query into its internal form. In generating
the internal form of the query, the parser checks the syntax of the user’s query, verifies that the relation
names appearing in the query are names of the relations in the database, and so on. The system
constructs a parse-tree representation of the query, which it then translates into a relational-algebra
expression.
Given a query, there are generally a variety of methods for computing the answer. For example, we
have seen that, in SQL, a query could be expressed in several different ways. Each SQL query can
itself be translated into a relational algebra expression in one of several ways. Furthermore, the
relational-algebra representation of a query specifies only partially how to evaluate a query; there are
usually several ways to evaluate relational-algebra expressions. As an illustration,
Prepared By: - Prof. Munira Topia, Computer Engineering Department, L. J. Institute of Engineering & Technology, Ahmedabad
2
consider the query:

select salary
from instructor
where salary < 75000;
This query can be translated into either of the following relational-algebra expressions:
• _salary <75000 (_salary (instructor))
• _salary (_salary<75000 (instructor))
Further, we can execute each relational-algebra operation by one of several different algorithms. For
example, to implement the preceding selection, we can search every tuple in instructor to find tuples
with salary less than 75000. If a B+-tree index is available on the attribute salary, we can use the index
instead tolocate the tuples.
3. Evaluation
A relational algebra operation annotated with instructions on how to evaluate it is called an evaluation
primitive. A sequence of primitive operations that can be used to evaluate a query is a query-execution
plan or query-evaluation plan. Figure below illustrates an evaluation plan for our example query, in
which a particular index (denoted in the figure as “index 1”) is specified for the selection operation.
The query-execution engine takes a query-evaluation plan, executes that plan, and returns the answers
to the query. The different evaluation plans for a given query can have different costs. The system to
construct a query evaluation plan that minimizes the cost of query evaluation; this task is called query
optimization.
2.Query Optimization
Query optimization is the process of selecting the most efficient query-evaluation plan from among the
many strategies usually possible for processing a given query, especially if the query is complex. We
do not expect users to write their queries so that they can be processed efficiently. Rather, we expect
the system to construct a query-evaluation plan that minimizes the cost of query evaluation
Consider the following relational-algebra expression, for the query “Find the names of all instructors in
the Music department together with the course title of all the courses that the instructors teach.”
The above expression constructs a large intermediate relation
3
However, we are interested in only a few tuples of this relation and only few attributes. Since we are
concerned with only those tuples in the instructor relation that pertain to the Music department, we do
not need to consider those tuples that do not have dept name = “Music”. By reducing the number of
tuples of the instructor relation that we need to access, we reduce the size of the intermediate result.
Our query is now represented by the relational-algebra expression:
Which is equivalent to our original algebra expression, but which generates smaller intermediate
relations.
Given a relational-algebra expression, it is the job of the query optimizer to come up with a query-
evaluation plan that computes the same result as the given expression, and is the least-costly way of
generating the result.
To find the least-costly query-evaluation plan, the optimizer needs to generate alternative plans that
produce the same result as the given expression, and to choose the least-costly one. Generation of
query-evaluation plans involves three steps:
(1) generating expressions that are logically equivalent to the given expression,
(2) annotating the resultant expressions in alternative ways to generate alternative query-evaluation
plans
(3) Estimating the cost of each evaluation plan, and choosing the one whose estimated cost is the least.
2. Explain query optimization process. [LJIET]Dec 2009 Dec 2014 4/2
OR
Explain Query Optimization Process. [LJIET] May 2012
OR
Explain method of query optimization[LJIET]May 2011
Write the same as given in “ Query Optimization” point no. 2 in above answer
3. Explain linear search and binary search algorithm for selection operation.[LJIET]Dec 2010 7
Consider selection operation on a relation whose tuples are stored together in one file. Two scan
algorithms to implement the selection operation are given below:
1) A1 – Linear Search:
In a linear search, the system scans each file block and tests all records to see whether they satisfy the
selection condition. For a selection on a key attribute, the system can terminate the scan if the required
record is found, without looking at the other records of the relation.
The cost of linear search, in terms of number of I/O operations, is br , where br denotes the number of
blocks in the files. Selection on key attributes has an average cost of br / 2 but still have a worst-case
cost of br.
The linear search algorithm can be applied to any file, but it is slower.
2) A2 – Binary Search:
If the file is ordered on an attribute, and the selection condition is an quality comparison on the
attribute, we can use a binary search to locate records that satisfy the selection. The system performs
the binary search on the block of the file.
The number of blocks that need to be examined to find a block containing the required records is
(log2(br)), where br the number of blocks in a file. If the selection is on nonkey attribute, more than one
block may contain required records, and the cost of reading the extra blocks ha to be added to the cost
estimate.
4. Explain evaluation of expression process in query optimization?[LJIET]Dec 2009 Dec 2013 4
To evaluate an expression is simply to evaluate one operation at a time, in an appropriate order. The result
4
of each evaluation is materialized in a temporary relation for subsequent use. A disadvantage to this
approach
is the need to construct the temporary relations, which must be written to disk. An alternative approach is
to evaluate several operations simultaneously in a pipeline, with the results of one operation passed on to
the next, without the need to store a temporary relation. Both the approaches used for evaluation of
expressions.
1. Materialization
Consider the expression:
The pictorial representation of material is given below
If we apply the materialization approach, we start from the lowest-level operations in the expression (at
the bottom of the tree). In our example, there is only one such operation: the selection operation on
department. The inputs to the lowest-level operations are relations in the database.We execute these
operations by the algorithms that we studied earlier, and we store the results in temporary relations. We
can use these temporary relations to execute the operations at the next level up in the tree, where the
inputs now are either temporary relations or relations stored in the database. In our example, the inputs
to the join are the instructor relation and the temporary relation created by the selection on department.
The join can now be evaluated, creating another temporary relation. By repeating the process,wewill
eventually evaluate the operation at the root of the tree, giving the final result of the expression. In our
example, we get the final result by executing the projection operation at the root of the tree, using as
input the temporary relation created by the join. Evaluation as just described is called materialized
evaluation, since the results of each intermediate operation are created (materialized) and then are used
for evaluation of the next-level operations.
The cost of evaluating an expression is the addition of the costs of all operations, as well as the cost of
writing the intermediate results to disk.
2. Pipelining
We can improve query-evaluation efficiency by reducing the number of temporary files that are
produced. We achieve this reduction by combining several relational operations into a pipeline of
operations, in which the results of one opFor example, consider the expression If
materialization were applied, evaluation would involve creating a temporary relation to hold the result
of the join, and then reading back in the result to perform the projection. These operations can be
combined: When the join operation generates a tuple of its result, it passes that tuple immediately to the
project operation for processing. By combining the join and the projection, we avoid creating the
intermediate result,and instead create the final result directly.
5. Explain the purpose of sorting with example with reference to query optimization [LJIET] May 2012 5
Sorting in DBMS is important for two reasons
1. Output can be displayed in sorted order
5
2. Several relational operations can be implemented efficiently if input relations are first sorted.
- We can sort a relation by building an index on the relation and then using that index to read he
relation in sorted order. This may lead to one disk block access for each tuple. This is expensive as
number of records can be much larger than the number of blocks. For this reason, it is desirable to
order the records physically.
- The sorting can be applied to the relations that fit in the main memory and to the relations that do
not fit in main memory.
- For relations that fit in memory, techniques like quick sort can be used. For relations that do not fit
in memory, external sort-merge is a good choice.
6. Explain the measures of query cost, selection operation and join. [LJIET] Jan 2013 7
Measure of query cost – as given in question no 8

Selection operation – as given in question no 3
Join - There are several algorithms for computing the join of relations.
Queries involving natural join may be processed in several ways depending on the availability of the
indices and the form of physical storage for the relations
- If the join results is almost as large as the Cartesian product of the two relations, a block nested loop
join may be advantageous.
- If indices are available, the indexed nested loop join can be used.
- If the relations are sorted a merge join may be desirable. It may be advantageous to sort a relation
prior to join computation.
- The hash join algorithm partitions the relation into several pieces, such that each piece of one of the
relations fits in memory. The partitioning is carried out with a hash function on the join attributes, so
that corresponding parts of partitions can be joined independently.
7. Explain the measures of finding out the cost of a query in query processing [LJIET]May 2013 5
The cost of query evaluation can be measured in terms of a number of different resources, including
disk accesses, CPU time to execute a query, and, in a distributed or parallel database system, the cost of
communication.
The response time for a query-evaluation plan could be used as good measure of the cost of the plan.
In large database systems, the cost to access data from disk is usually the most important cost, since
disk accesses are slow compared to in-memory operations. Moreover, CPU speeds have been
improving much faster than have disk speeds.
The number of block transfers from disk and the number of disk seeks to estimate the cost of a query-
evaluation plan. We also need to distinguish between reads and writes of blocks, since it takes more
time to write a block to disk than to read a block from disk. For more accurate measure find out
1. The number of seek operations performed

2. The number of blocks read
3. The number of blocks written
And then add up these numbers after multiplying them by average seek time, average transfer time for
reading a block and average transfer time for writing a block respectively.

4.query Processing and Optimization

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

4.query Processing and Optimization

Uploaded by

Copyright:

Available Formats

1

L .J. INSTITUTE OF ENGINEERING & TECHNOLOGY

1. Write short note on query processing. [LJIET]May 2014 7

1. Parsing and translation

consider the query:

The above expression constructs a large intermediate relation

The pictorial representation of material is given below

Measure of query cost – as given in question no 8

1. The number of seek operations performed

You might also like