
Query Evaluation

Prepared by Harish Patnaik

School of Computer Engineering, KIIT Deemed to be University


Content

1. Query processing
2. Query cost measurement
3. Cost of Select, Join, Merge operation
4. Cost of Merge Sort operation
5. Cost of Duplicate elimination, Projection
6. Cost of Set operations
7. Cost of Outer join
8. Query Evaluation
Query Processing
• Steps for query processing -
✓ Parsing and translation
✓ Optimization
✓ Evaluation
• Translation to internal form -
• Checks syntax
• Verifies relation names
• Parse-tree representation
• Relational algebra expression
• Example: select marks from student where marks > 80
can be written as either of two equivalent expressions:
π marks (σ marks > 80 (Student))
σ marks > 80 (π marks (Student))

• To implement the selection - linear search
- index search
• To evaluate - we need to provide annotations along with the
relational algebra expression
π marks
|
σ marks > 80 ; use index1
|
Student (Query evaluation plan)
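
The two ways of implementing the selection can be illustrated with a small in-memory sketch. The student rows, the function names select_linear and select_indexed, and the sorted list standing in for index1 are all assumptions made for illustration; a real system would use a disk-based index structure rather than a Python list.

```python
import bisect

# Hypothetical Student relation: (name, marks) tuples.
students = [("Asha", 91), ("Ravi", 72), ("Meena", 85), ("John", 80)]

def select_linear(rows, threshold):
    """Linear search: examine every tuple and keep marks > threshold."""
    return [marks for _, marks in rows if marks > threshold]

# Stand-in for "index1": the marks values kept in sorted order (an assumption).
index1 = sorted(marks for _, marks in students)

def select_indexed(index, threshold):
    """Index search: jump to the first entry > threshold, then read onward."""
    start = bisect.bisect_right(index, threshold)
    return index[start:]

print(select_linear(students, 80))   # [91, 85]
print(select_indexed(index1, 80))    # [85, 91]
```

Both functions return the same set of marks; they differ only in how many entries they have to examine, which is what the annotation in the evaluation plan selects.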


• Optimization -
– Different evaluation plans can have different costs.
– Query optimizer - chooses the most efficient evaluation plan based on estimated cost
• Query cost measurement
• Disk accesses
• CPU time to execute the query
• Cost of communication (in a distributed database system)
• The response time of a query evaluation plan includes all of these.
In practice the estimate is dominated by disk access time, and CPU time is usually ignored.
Query cost measurement
• Response time for a query which transfers ‘b’ blocks
of data from disk and performs ‘S’ seeks -
Total time = b * tT + S * tS
where b = no. of blocks transferred
tT = time to transfer one block of data
tS = average disk seek time
S = no. of seek operations
Exceptions -
✓ Block writes can cost more than block reads.
✓ All data may be transferred to a (large) buffer in one operation.
No further seek operation is then required.
✓ A block may already be present in memory.
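
The cost formula above can be written as a one-line function. This is a minimal sketch; the timing values (100 microseconds per block transfer, 4000 microseconds per seek) are assumed figures chosen only to make the arithmetic easy to follow.

```python
def disk_cost(b, S, t_T, t_S):
    """Estimated time for b block transfers and S seeks: b*t_T + S*t_S."""
    return b * t_T + S * t_S

# Assumed timings in microseconds (illustrative values, not measurements).
T_TRANSFER_US = 100
T_SEEK_US = 4000

# 100 contiguous blocks read after a single seek.
print(disk_cost(b=100, S=1, t_T=T_TRANSFER_US, t_S=T_SEEK_US))    # 14000

# The same 100 blocks when every block needs its own seek.
print(disk_cost(b=100, S=100, t_T=T_TRANSFER_US, t_S=T_SEEK_US))  # 410000
```

With these assumed timings the seek term dominates as soon as the blocks are not contiguous, which is why the number of seeks is counted separately from the number of transfers.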
Cost of Select operation
Assumption - All tuples are stored together in one file
• Linear Search - Response time for a query which
transfers ‘b’ blocks of data from disk (if blocks are
stored contiguously) -
Total time = b * tT + tS
where b = no. of blocks
tT = time to transfer a block of data
tS = average disk seek time
• For an equality selection on a key attribute, the scan can stop at the
first match, so the average transfer cost is b/2 blocks (worst case b block transfers)
• It is slower than other algorithms
• It can be applied to any file, even without indices
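
The difference between a key and a non-key selection can be seen by counting how many blocks a scan actually reads. The sketch below models a file as a list of blocks, each block being a list of values; the function name linear_search and the sample data are assumptions for illustration.

```python
def linear_search(blocks, predicate, is_key=False):
    """Scan blocks in order and return (matching tuples, blocks read).

    When is_key is True at most one tuple can match, so the scan
    stops as soon as that tuple is found.
    """
    matches, blocks_read = [], 0
    for block in blocks:
        blocks_read += 1
        for tup in block:
            if predicate(tup):
                matches.append(tup)
                if is_key:
                    return matches, blocks_read
    return matches, blocks_read

# Hypothetical file of 4 blocks with 2 tuples each.
blocks = [[1, 2], [3, 4], [5, 6], [7, 8]]

print(linear_search(blocks, lambda t: t == 3, is_key=True))  # ([3], 2)
print(linear_search(blocks, lambda t: t > 4))                # ([5, 6, 7, 8], 4)
```

The key lookup stops after 2 of the 4 blocks, while the non-key selection has to read all b blocks because further matches may appear anywhere in the file.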
Cost of Join Operation
Nested-Loop join
• R - outer relation, S - inner relation
for each tuple tr in R do begin
    for each tuple ts in S do begin
        test pair (tr, ts) to check if they satisfy the join condition θ
        if they do, add tr · ts to the result
    end
end
nr (br) - no. of tuples (blocks) in R; ns (bs) - no. of tuples (blocks) in S
• For each tuple in R, we have to perform a complete scan of S.
• Worst case - if the buffer can hold only one block of each relation, then
a total of nr * bs + br block transfers are required
• One seek is needed for each scan of relation S since it is read sequentially,
and br seeks are needed to read R, so the total no. of seeks is nr + br
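
The nested-loop pseudocode above translates almost line for line into Python. This is an in-memory sketch only: relations are assumed to be lists of tuples, the join condition θ is passed in as a function, and the sample relations are invented for illustration.

```python
def nested_loop_join(R, S, theta):
    """Join outer relation R with inner relation S on condition theta."""
    result = []
    for t_r in R:                  # each tuple of the outer relation
        for t_s in S:              # complete scan of the inner relation
            if theta(t_r, t_s):
                result.append(t_r + t_s)   # concatenate the two tuples
    return result

# Hypothetical relations; the join condition compares the first attribute.
R = [(1, "a"), (2, "b")]
S = [(1, "x"), (3, "y")]

print(nested_loop_join(R, S, lambda r, s: r[0] == s[0]))  # [(1, 'a', 1, 'x')]
```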
Example - Let relations R1 and R2 have the following
properties: R1 has 20000 tuples, R2 has 45000 tuples. 25
tuples of R1 fit in one block and 30 tuples of R2 fit in one
block. Estimate the no. of block transfers and seeks
required for the nested-loop join of R1 and R2.
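
One way to work the example is to apply the worst-case formulas (nr * bs + br block transfers and nr + br seeks) with each relation taken in turn as the outer relation. The helper names blocks_needed and nested_loop_cost are assumptions for this sketch; the example itself does not say which relation is the outer one, so both choices are shown.

```python
def blocks_needed(n_tuples, tuples_per_block):
    """Number of blocks occupied by a relation (ceiling division)."""
    return (n_tuples + tuples_per_block - 1) // tuples_per_block

def nested_loop_cost(n_outer, b_outer, b_inner):
    """Worst-case (block transfers, seeks) for a nested-loop join."""
    transfers = n_outer * b_inner + b_outer
    seeks = n_outer + b_outer
    return transfers, seeks

n1, n2 = 20000, 45000
b1 = blocks_needed(n1, 25)   # 800 blocks for R1
b2 = blocks_needed(n2, 30)   # 1500 blocks for R2

print(nested_loop_cost(n1, b1, b2))   # R1 as outer: (30000800, 20800)
print(nested_loop_cost(n2, b2, b1))   # R2 as outer: (36001500, 46500)
```

Under these formulas, taking the relation with fewer tuples (R1) as the outer relation gives the lower cost on both measures.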
Cost of Merge Sort Operation
• As the input file is very large, the blocks (pages) are first sorted
individually and then merged in sorted order.

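Input file (7 pages):       3,4 | 6,2 | 9,4 | 8,7 | 5,6 | 3,1 | 2
After pass 1 (1-page runs): 3,4 | 2,6 | 4,9 | 7,8 | 5,6 | 1,3 | 2
After pass 2 (2-page runs): 2,3 4,6 | 4,7 8,9 | 1,3 5,6 | 2
After pass 3 (4-page runs): 2,3 4,4 6,7 8,9 | 1,2 3,5 6
After pass 4 (one run):     1,2 2,3 3,4 4,5 6,6 7,8 9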


Merge Sort
• Algorithm for Merge Sort
If the no. of pages in the input file is 2^k then -
pass 1 - 2^k sorted runs of one page each
pass 2 - 2^(k-1) sorted runs of two pages each
pass 3 - 2^(k-2) sorted runs of four pages each
.....
pass k+1 - one sorted run of 2^k pages
• In the example: no. of passes = 4, no. of pages = 7
Each page is read once and written once in every pass, i.e. 2 disk I/Os per page per pass.
Total I/Os = 2 * 7 * 4 = 56
• Overall cost is 2N(⌈log2 N⌉ + 1)
where N = no. of pages in the file
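
The seven-page example can be replayed in memory to check the pass count and the I/O total. This is a sketch under simplifying assumptions: pages are Python lists, heapq.merge stands in for the two-way merge step, and two_way_merge_sort is a name chosen here for illustration.

```python
from heapq import merge

def two_way_merge_sort(pages):
    """Sort pages individually, then merge runs pairwise until one remains.

    Returns (sorted values, number of passes).
    """
    # Pass 1: every page becomes a sorted one-page run.
    runs = [sorted(page) for page in pages]
    passes = 1
    while len(runs) > 1:
        # Merge adjacent runs two at a time; an unpaired last run is carried over.
        runs = [list(merge(*runs[i:i + 2])) for i in range(0, len(runs), 2)]
        passes += 1
    return (runs[0] if runs else []), passes

pages = [[3, 4], [6, 2], [9, 4], [8, 7], [5, 6], [3, 1], [2]]
result, passes = two_way_merge_sort(pages)

print(result)                    # [1, 2, 2, 3, 3, 4, 4, 5, 6, 6, 7, 8, 9]
print(passes)                    # 4
print(2 * len(pages) * passes)   # 56
```

The last line multiplies 2 I/Os per page by 7 pages and 4 passes, matching the total of 56 given above.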
External Merge Sort
• When the input file does not fit into main memory -
• If the no. of pages in the input file is 108 and the buffer holds 5 pages
then -
pass 1 - ⌈108/5⌉ = 22 sorted runs of five pages each (except the last, which has 3 pages)
pass 2 - 4-way merge (one page for output) - ⌈22/4⌉ = 6 sorted runs
of twenty pages each (except the last, which has 8 pages)
pass 3 - ⌈6/4⌉ = 2 sorted runs; one with 80 pages & one with 28 pages
pass 4 - merges the two runs into one sorted run of 108 pages
• No. of passes = 4, no. of pages = 108
Each page is read once and written once in every pass, i.e. 2 disk I/Os per page per pass.
Total I/Os = 2 * 108 * 4 = 864
• Overall cost is 2N(⌈log_(B-1) N1⌉ + 1)
where N = no. of pages in the file, N1 = ⌈N/B⌉ = ⌈108/5⌉ = 22 (no. of initial runs),
B = no. of pages in buffer
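
The pass-by-pass reasoning above can be checked by counting runs instead of evaluating the logarithm directly. The function name external_merge_sort_cost is an assumption for this sketch, and it requires B to be at least 3 so that a merge pass combines at least two runs.

```python
def external_merge_sort_cost(N, B):
    """Return (passes, total I/Os, runs left after each pass).

    N is the number of pages in the file, B the number of buffer pages.
    """
    if B < 3:
        raise ValueError("need at least 3 buffer pages to merge")
    runs = -(-N // B)              # pass 1: ceil(N / B) initial runs
    run_counts = [runs]
    passes = 1
    while runs > 1:
        runs = -(-runs // (B - 1))  # (B-1)-way merge, one page kept for output
        run_counts.append(runs)
        passes += 1
    return passes, 2 * N * passes, run_counts

print(external_merge_sort_cost(108, 5))   # (4, 864, [22, 6, 2, 1])
```

The run counts 22, 6, 2, 1 are the same as in the four passes listed above, and the total of 864 I/Os agrees with 2 * 108 * 4.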
Cost of Other operations
• Duplicate Elimination
✓ It can be implemented by sorting.
✓ With external merge-sort, duplicates are found while
creating and merging runs. So duplicates are removed
before writing to disk, which reduces the no. of block transfers.
✓ Worst-case cost estimate for duplicate elimination is the same
as the worst-case cost estimate for sorting.
• Projection
✓ Scan R and produce a set of tuples that contain the desired
attributes.
✓ Sort this set of tuples.
✓ Scan the sorted result and eliminate duplicates.
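
The three projection steps above can be sketched in memory as follows. Rows are assumed to be dictionaries keyed by attribute name, and the function name project and the sample data are invented for illustration; a real system would sort with external merge-sort rather than an in-memory sort.

```python
def project(R, attrs):
    """Project R onto attrs and remove duplicates by sorting."""
    # Step 1: scan R and keep only the desired attributes.
    projected = [tuple(row[a] for a in attrs) for row in R]
    # Step 2: sort, so that equal tuples become adjacent.
    projected.sort()
    # Step 3: scan the sorted result and skip repeated tuples.
    result = []
    for tup in projected:
        if not result or result[-1] != tup:
            result.append(tup)
    return result

# Hypothetical Student rows.
student = [
    {"name": "Asha", "marks": 85},
    {"name": "Ravi", "marks": 90},
    {"name": "Meena", "marks": 85},
]

print(project(student, ["marks"]))   # [(85,), (90,)]
```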
Cost of Set operations
• Union, Intersection and Set-difference operations - These can
be implemented by first sorting both relations and then
scanning once through each of the sorted relations to produce
the result.
• For each of these operations, only one scan of the two input
relations is required. So the cost is br + bs block transfers if
the relations are sorted in the same order.
• Assuming a worst case of one buffer block for each relation,
an additional br + bs disk seeks are required.
• If the relations are not sorted initially, then the cost of sorting
has to be included.
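
A single merge-style scan over two sorted inputs is enough to produce all three results. The sketch below assumes each relation is a sorted, duplicate-free Python list and computes union, intersection and difference (R - S) in one pass; the function name sorted_set_ops is an assumption.

```python
def sorted_set_ops(r, s):
    """Return (union, intersection, r minus s) for sorted duplicate-free lists."""
    union, inter, diff = [], [], []
    i = j = 0
    while i < len(r) and j < len(s):
        if r[i] < s[j]:            # tuple only in r
            union.append(r[i])
            diff.append(r[i])
            i += 1
        elif r[i] > s[j]:          # tuple only in s
            union.append(s[j])
            j += 1
        else:                      # tuple in both relations
            union.append(r[i])
            inter.append(r[i])
            i += 1
            j += 1
    # Whatever is left in r belongs to the union and the difference.
    union.extend(r[i:])
    diff.extend(r[i:])
    # Whatever is left in s belongs to the union only.
    union.extend(s[j:])
    return union, inter, diff

print(sorted_set_ops([1, 3, 5, 7], [3, 4, 7, 9]))
# ([1, 3, 4, 5, 7, 9], [3, 7], [1, 5])
```

Each input is read exactly once from start to end, which corresponds to the br + bs block transfers stated above.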
Cost of Outer Join
• Compute the corresponding join and then add further tuples
to the join result
• For left outer-join operation of R(r) and S(s)
– Perform θ join of R and S and save the result as Q
– Compute R - πr (Q) to obtain those tuples of R that do
not participate in the theta join
– Pad each of these tuples with null values for attributes
from S
– Add these tuples to Q
• Right outer-join is similar to left outer-join, with the roles of R and S exchanged
• For full outer-join, compute the join operation and then add
the extra tuples of both the left and right outer joins
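
The left outer-join steps can be followed directly in a short in-memory sketch. Relations are assumed to be lists of tuples, None stands in for null, and the function name left_outer_join and its s_width parameter (the number of attributes of S) are assumptions made for illustration.

```python
def left_outer_join(R, S, theta, s_width):
    """Left outer join of R and S on theta, padding with None for S."""
    # Step 1: theta join, kept as (r, s) pairs so the R part stays visible.
    Q = [(r, s) for r in R for s in S if theta(r, s)]
    # Step 2: R minus the projection of Q on R's attributes.
    matched = {r for r, _ in Q}
    unmatched = [r for r in R if r not in matched]
    # Step 3: pad the unmatched tuples with nulls for the attributes of S.
    padding = (None,) * s_width
    # Step 4: add the padded tuples to the join result.
    return [r + s for r, s in Q] + [r + padding for r in unmatched]

# Hypothetical relations joined on their first attribute.
R = [(1, "a"), (2, "b")]
S = [(1, "x")]

print(left_outer_join(R, S, lambda r, s: r[0] == s[0], s_width=2))
# [(1, 'a', 1, 'x'), (2, 'b', None, None)]
```

Calling the same function with R and S swapped gives the unmatched tuples of S, which is the extra part needed for a right or full outer join (with the attribute order reversed in this sketch).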
Query Evaluation
• For evaluating an expression containing multiple
operations -
✓ Pipelined evaluation - Results of one operation are
passed along to the next operation in the pipeline
✓ Materialization - The result of each operation is
materialized in a temporary relation
• In materialization, the cost of writing the intermediate
results to disk is added to the cost of the individual
operations
• In pipelining, only the costs of the individual operations are
added, since intermediate results are not written to disk
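
The two strategies can be contrasted on the earlier query (select marks from student where marks > 80). In this sketch a Python list plays the role of the temporary relation and generators play the role of the pipeline; the rows and function names are assumptions for illustration, and no actual disk I/O is modelled.

```python
# Hypothetical Student rows.
students = [
    {"name": "Asha", "marks": 85},
    {"name": "Ravi", "marks": 72},
    {"name": "Meena", "marks": 92},
]

# Materialization: the selection result is stored in full before projecting.
temp = [row for row in students if row["marks"] > 80]   # temporary relation
materialized = [row["marks"] for row in temp]

# Pipelining: each tuple flows through both operators as soon as it is produced.
def select_op(rows, predicate):
    for row in rows:
        if predicate(row):
            yield row

def project_op(rows, attr):
    for row in rows:
        yield row[attr]

pipelined = list(project_op(select_op(students, lambda r: r["marks"] > 80), "marks"))

print(materialized)   # [85, 92]
print(pipelined)      # [85, 92]
```

Both versions produce the same answer; the pipelined version never builds the intermediate list, which corresponds to avoiding the cost of writing intermediate results to disk.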
