
Query Processing

(With changes by P S Dhabe)

Database System Concepts, 5th Ed.


©Silberschatz, Korth and Sudarshan
See www.db-book.com for conditions on re-use
VIT Syllabus
 Query Processing: Steps, Algorithms for Selection (A1,A2,A3,A4), Join
Operation (any two);

Database System Concepts - 5th Edition, Aug 27, 2005. 13.2 ©Silberschatz, Korth and Sudarshan
Basic Steps in Query Processing

1. Parsing and translation
2. Optimization
3. Evaluation

[Figure: query-processing pipeline. Callouts: "Some DBMSs use an annotated parse tree here"; "Number of tuples, size of tuples".]

Basic Steps in Query Processing (Cont.)
 Parsing and translation
 Translate the query into its internal form, which is then translated into
relational algebra.
 The parser checks syntax and verifies that the relations named in the query exist.
 Evaluation
 The query-execution engine takes a query-evaluation plan, executes
that plan, and returns the answers to the query.

Basic Steps in Query Processing: Optimization
 A relational algebra expression may have many equivalent expressions
 E.g., σ_{balance<2500}(Π_{balance}(account)) is equivalent to
Π_{balance}(σ_{balance<2500}(account))
 Each relational algebra operation can be evaluated using one of several
different algorithms
 Correspondingly, a relational-algebra expression can be evaluated in
many ways.
 Annotated expression specifying detailed evaluation strategy is called an
evaluation-plan.
 E.g., can use an index on balance to find accounts with balance < 2500,
 or can perform complete relation scan and discard accounts with
balance ≥ 2500
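The equivalence of the two orderings (select on balance, then project; or project, then select) can be checked on a toy relation. This is a minimal sketch: the rows, the helper names `select` and `project`, and the dict-per-tuple representation are assumptions made for illustration, not part of the slides.

```python
# Toy "account" relation; the rows are made up for illustration.
account = [
    {"account_number": "A-101", "branch_name": "Downtown", "balance": 500},
    {"account_number": "A-215", "branch_name": "Mianus", "balance": 700},
    {"account_number": "A-305", "branch_name": "Round Hill", "balance": 3500},
]


def select(rows, predicate):
    """Relational selection (sigma): keep the rows that satisfy the predicate."""
    return [row for row in rows if predicate(row)]


def project(rows, attributes):
    """Relational projection (Pi): keep the named attributes, drop duplicates."""
    seen = set()
    result = []
    for row in rows:
        key = tuple(row[a] for a in attributes)
        if key not in seen:
            seen.add(key)
            result.append(dict(zip(attributes, key)))
    return result


def below_2500(row):
    return row["balance"] < 2500


# Two equivalent expressions, evaluated in different orders.
plan_a = select(project(account, ["balance"]), below_2500)
plan_b = project(select(account, below_2500), ["balance"])
print(plan_a == plan_b)  # True
```

Both plans return the same tuples, but they do different amounts of intermediate work, which is why the optimizer compares them by estimated cost.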

Basic Steps: Optimization (Cont.)
 Query Optimization: Amongst all equivalent evaluation plans choose
the one with lowest cost.
 Cost is estimated using statistical information from the
database catalog
 e.g. number of tuples in each relation, size of tuples, etc.

Measures of Query Cost
 Cost is generally measured as total elapsed time for answering
query
 Many factors contribute to time cost
 disk accesses, CPU, or even network communication
(distributed/parallel databases)
 Typically disk access is the predominant cost, and is also
relatively easy to estimate. Measured by taking into account
 Number of seeks
(block access time) * average-seek-cost
 Number of blocks read * average-block-read-cost
 Number of blocks written * average-block-write-cost
 Cost to write a block is greater than cost to read a block
– data is read back after being written to ensure that
the write was successful

Measures of Query Cost (Cont.)
 For simplicity we just use the number of block transfers from disk and the
number of seeks as the cost measures
 tT – time to transfer one block
 tS – time for one seek
 Cost for b block transfers plus S seeks
b * tT + S * tS
 We ignore CPU costs for simplicity
 Real systems do take CPU cost into account
 We do not include the cost of writing output to disk in our cost formulae
 Several algorithms can reduce disk IO by using extra buffer space
 Amount of real memory available to buffer depends on other concurrent
queries and OS processes, known only during execution
 We often use worst case estimates, assuming only the minimum
amount of memory needed for the operation is available
 Required data may be buffer resident already, avoiding disk I/O
 But hard to take into account for cost estimation
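The cost measure b * tT + S * tS can be written as a one-line helper. This is a sketch only: the function name `io_cost_ms` is hypothetical, and the default timings (tT = 0.1 msec, tS = 4 msec) are borrowed from the nested-loop join example later in these slides.

```python
def io_cost_ms(block_transfers, seeks, t_transfer_ms=0.1, t_seek_ms=4.0):
    """Estimated I/O time in milliseconds: b * tT + S * tS.

    CPU cost and the cost of writing the final output are ignored,
    as in the slides.
    """
    return block_transfers * t_transfer_ms + seeks * t_seek_ms


# Reading 100 contiguous blocks after a single seek.
example_ms = io_cost_ms(block_transfers=100, seeks=1)
print(f"{example_ms:.1f} ms")  # 14.0 ms
```

With these timings one seek costs as much as 40 block transfers, so the number of seeks often dominates the estimate.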

Algorithms of Selection Operation
 File scan – search algorithms that locate and retrieve records that
fulfill a selection condition.
 Algorithm A1 (linear search). Scan each file block and test all
records to see whether they satisfy the selection condition.
 Cost estimate = br block transfers + 1 seek
 br denotes number of blocks containing records from relation r
 If selection is on a key attribute, can stop on finding record
 Average case cost = (br / 2) block transfers + 1 seek
 Worst case cost = br block transfers + 1 seek
 Linear search can be applied regardless of
 selection condition or
 ordering of records in the file, or
 availability of indices
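Algorithm A1 can be sketched over an in-memory stand-in for a file. The assumptions here are that a file is a list of blocks, a block is a list of records, and the `emp_blocks` data and the `key_attribute` flag are hypothetical names used only for this illustration.

```python
def linear_search(blocks, predicate, key_attribute=False):
    """A1: scan blocks in file order and test every record.

    Returns (matches, blocks_transferred, seeks). One initial seek is
    counted because the blocks are assumed to be read sequentially.
    """
    matches = []
    transferred = 0
    for block in blocks:
        transferred += 1  # one block transfer per block scanned
        for record in block:
            if predicate(record):
                matches.append(record)
                if key_attribute:
                    # At most one record can match on a key, so stop early.
                    return matches, transferred, 1
    return matches, transferred, 1


# Hypothetical file: 4 blocks of (emp_id, salary) records.
emp_blocks = [
    [(101, 4000), (102, 5200)],
    [(103, 3100), (104, 4700)],
    [(105, 6100), (106, 2900)],
    [(107, 5300), (108, 4400)],
]

# Equality on a key: stops after the second block.
print(linear_search(emp_blocks, lambda r: r[0] == 103, key_attribute=True))

# Condition on a non-key attribute: all br = 4 blocks are scanned.
print(linear_search(emp_blocks, lambda r: r[1] > 5000))
```

The early return is what produces the br / 2 average for a key attribute; without it the scan always transfers br blocks.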

Algorithms of Selection Operation (Cont.)
 A2 (binary search). Applicable if selection is an equality comparison on
the attribute on which file is ordered.
select salary from emp where emp_id=101
 Assume that the blocks of a relation are stored contiguously
 Cost estimate (number of disk blocks to be scanned):
 Worst case, the number of blocks to be examined to find a block
with the required record is ⌈log2(br)⌉
 Each of these blocks needs a seek and a transfer
 Cost of locating the first tuple by a binary search on the blocks:
 ⌈log2(br)⌉ * (tT + tS)
 If there are multiple records satisfying the selection
– Add the transfer cost of the blocks containing records
that satisfy the selection condition
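Binary search over the blocks of a sorted file can be sketched as follows. The assumptions are that every block is non-empty, records are sorted on the search attribute, and the key is unique; the names `binary_search_blocks`, `key_of` and `sorted_blocks` are hypothetical. Note that this loop can touch up to ⌊log2(br)⌋ + 1 blocks, one more than the rounded estimate on the slide when br is an exact power of two.

```python
def binary_search_blocks(blocks, key, key_of=lambda record: record[0]):
    """A2: binary search on the blocks of a file ordered on the search key.

    Returns (record_or_None, blocks_accessed). Each block accessed
    corresponds to one seek plus one transfer in the cost model.
    """
    low, high = 0, len(blocks) - 1
    accessed = 0
    while low <= high:
        mid = (low + high) // 2
        block = blocks[mid]
        accessed += 1
        if key < key_of(block[0]):
            high = mid - 1      # key lies in an earlier block
        elif key > key_of(block[-1]):
            low = mid + 1       # key lies in a later block
        else:
            # The key can only be in this block; scan it in memory.
            for record in block:
                if key_of(record) == key:
                    return record, accessed
            return None, accessed
    return None, accessed


# Hypothetical file: 8 blocks, two (emp_id, salary) records per block.
sorted_blocks = [
    [(emp_id, 3000 + emp_id) for emp_id in range(start, start + 2)]
    for start in range(101, 117, 2)
]

print(binary_search_blocks(sorted_blocks, 101))  # ((101, 3101), 3)
print(binary_search_blocks(sorted_blocks, 116))  # ((116, 3116), 4)
```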

Selections Using Indices
 Index scan – search algorithms that use an index
 selection condition must be on search-key of index.
 A3 (primary index on candidate key, equality). Retrieve a single record
that satisfies the corresponding equality condition
 If a B+-tree index is used, we need hi I/O operations (hi = height of the tree) to
locate the block and 1 more I/O to fetch the record, i.e. (hi + 1) I/O operations
 Each of these (hi + 1) I/O operations needs a seek and a transfer, thus
 Cost = (hi + 1) * (tT + tS)

NOTE:
Primary index: the file is stored in sequential order of the index's search key.
Secondary index: the search key specifies an order different from the sequential order of the file.

Selections Using Indices
 A4 (primary index on nonkey, equality). Retrieve multiple records using a
primary index defined on a non-key attribute, where the selection condition
specifies equality on that attribute.
 Records will be on consecutive blocks
 Let b = number of blocks containing matching records
 Cost of traversing the index to locate the first block = hi * (tT + tS)
 One seek is needed to access that block = tS
 The b blocks are then transferred sequentially = tT * b

 Thus, total cost = hi * (tT + tS) + tS + tT * b
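The A3 and A4 cost formulas can be compared side by side. This is a sketch under assumed inputs: the function names are hypothetical, and the index height (hi = 3), the number of matching blocks (b = 5) and the timings are example values, not figures from the slides.

```python
def a3_cost_ms(h_i, t_transfer_ms, t_seek_ms):
    """A3 (primary index, equality on key): (hi + 1) * (tT + tS)."""
    return (h_i + 1) * (t_transfer_ms + t_seek_ms)


def a4_cost_ms(h_i, b, t_transfer_ms, t_seek_ms):
    """A4 (primary index, equality on nonkey): hi * (tT + tS) + tS + tT * b."""
    return h_i * (t_transfer_ms + t_seek_ms) + t_seek_ms + t_transfer_ms * b


# Assumed example values.
T_TRANSFER_MS, T_SEEK_MS = 0.1, 4.0
print(round(a3_cost_ms(3, T_TRANSFER_MS, T_SEEK_MS), 1))     # 16.4
print(round(a4_cost_ms(3, 5, T_TRANSFER_MS, T_SEEK_MS), 1))  # 16.8
```

With b = 1 the two formulas coincide, since A4 then fetches exactly one data block after the index traversal.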

Join Operation
 Several different algorithms to implement joins
 Nested-loop join
 Block nested-loop join
 Choice based on cost estimate
 Examples use the following information
 Number of records of customer: 10,000 depositor: 5000
 Number of blocks of customer: 400 depositor: 100

Nested-Loop Join

 To compute the theta join r ⋈θ s (θ = join condition)

for each tuple tr in r do begin
    for each tuple ts in s do begin
        test pair (tr, ts) to see if they satisfy the join condition θ
        if they do, add tr • ts to the result.
    end
end
 r is called the outer relation and s the inner relation of the join.
 Requires no indices and can be used with any kind of join condition.
 Expensive since it examines every pair of tuples in the two relations.
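The nested-loop pseudocode translates almost line for line into Python. This is a minimal in-memory sketch: tuples stand in for records, the sample rows are made up, and `theta` is a hypothetical callable representing the join condition.

```python
def nested_loop_join(r, s, theta):
    """Theta join of r (outer) and s (inner) by testing every pair of tuples."""
    result = []
    for t_r in r:              # outer relation
        for t_s in s:          # inner relation, rescanned for each outer tuple
            if theta(t_r, t_s):
                result.append(t_r + t_s)  # concatenate the two tuples
    return result


# Made-up sample rows: depositor(customer_name, account_number),
# customer(customer_name, street, city).
depositor = [("Hayes", "A-102"), ("Johnson", "A-101"), ("Jones", "A-217")]
customer = [
    ("Hayes", "Main", "Harrison"),
    ("Jones", "Main", "Harrison"),
    ("Smith", "North", "Rye"),
]

joined = nested_loop_join(depositor, customer, lambda d, c: d[0] == c[0])
for row in joined:
    print(row)
```

Because `theta` is an arbitrary predicate, the same function handles equality, inequality or any other join condition, which matches the slide's point that no index is required.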

Nested-Loop Join (Cont.)
 Let nr and ns be the number of tuples in relations r and s, respectively.
 Let br and bs be the number of blocks in relations r and s, respectively.
 In the worst case, if there is enough memory only to hold one block of each
relation, the estimated cost is
(for each record in r we need a complete scan of s)
block transfers = nr * bs + br (r is the outer and s the inner relation)
 Each scan of the inner relation needs only one seek, since s can be read sequentially;
one scan per tuple of r gives nr seeks for accessing s
 Reading relation r needs one seek per block = br seeks
 Total seeks = nr + br

 If the smaller relation fits entirely in memory, use that as the inner relation.
 Reduces cost to br + bs block transfers and 2 seeks

Nested-Loop Join (Cont.)
 Assuming worst case memory availability (memory can hold one block of each
relation) cost estimate is
 with depositor as outer relation:
 nr * bs + br (nr = 5000, bs = 400, br = 100)
 5000 * 400 + 100 = 2,000,100 block transfers,
 5000 + 100 = 5100 seeks
 with customer as the outer relation:
 nr * bs + br (nr = 10000, bs = 100, br = 400)
 10000 * 100 + 400 = 1,000,400 block transfers and 10,400 seeks
 Using tT (time to transfer one block) = 0.1 msec and tS (time for
one seek) = 4 msec, customer as the outer relation works better:
about 141,640 msec in total versus about 220,410 msec with depositor as outer

 If smaller relation (depositor) fits entirely in memory, the cost estimate will be
 br + bs =(400+100)=500 block transfers.
 Block nested-loops algorithm (next slide) is preferable.
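The worst-case figures for both choices of outer relation can be recomputed with a few lines of Python. The function names are hypothetical; the record counts, block counts and timings are the ones given in the slides.

```python
def nested_loop_cost(n_outer, b_outer, b_inner):
    """Worst-case nested-loop join cost with one buffer block per relation.

    Returns (block_transfers, seeks):
      transfers = nr * bs + br, seeks = nr + br.
    """
    transfers = n_outer * b_inner + b_outer
    seeks = n_outer + b_outer
    return transfers, seeks


def elapsed_ms(transfers, seeks, t_transfer_ms=0.1, t_seek_ms=4.0):
    """Total estimated time: transfers * tT + seeks * tS."""
    return transfers * t_transfer_ms + seeks * t_seek_ms


# depositor: 5000 records in 100 blocks; customer: 10,000 records in 400 blocks.
dep_outer = nested_loop_cost(n_outer=5000, b_outer=100, b_inner=400)
cust_outer = nested_loop_cost(n_outer=10000, b_outer=400, b_inner=100)

print(dep_outer, round(elapsed_ms(*dep_outer)))    # (2000100, 5100) 220410
print(cust_outer, round(elapsed_ms(*cust_outer)))  # (1000400, 10400) 141640
```

Customer as the outer relation needs twice as many seeks but half as many transfers, and with these timings the saving in transfers outweighs the extra seeks.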

Block Nested-Loop Join
 Even if memory is too small to hold either relation entirely, we can
still obtain major savings by processing the relations on a
per-block basis rather than a per-record basis.
 Variant of nested-loop join in which every block of inner relation is
paired with every block of outer relation.
for each block Br of r do begin
    for each block Bs of s do begin        (per-block operation)
        for each tuple tr in Br do begin
            for each tuple ts in Bs do begin
                check if (tr, ts) satisfy the join condition
                if they do, add tr • ts to the result.
            end
        end
    end
end
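The block-at-a-time variant can be sketched in Python by representing each relation as a list of blocks. This is an in-memory illustration only: the sample blocks are made up, and the counter `inner_block_reads` is a hypothetical addition used to show how often a block of the inner relation would be fetched.

```python
def block_nested_loop_join(r_blocks, s_blocks, theta):
    """Join r (outer) and s (inner), pairing blocks before pairing tuples.

    Returns (result, inner_block_reads).
    """
    result = []
    inner_block_reads = 0
    for block_r in r_blocks:
        for block_s in s_blocks:
            inner_block_reads += 1  # one fetch of an inner block per block pair
            for t_r in block_r:
                for t_s in block_s:
                    if theta(t_r, t_s):
                        result.append(t_r + t_s)
    return result, inner_block_reads


# Made-up data: 2 outer blocks (2 tuples each) and 3 inner blocks (1 tuple each).
r_blocks = [
    [("Hayes", "A-102"), ("Johnson", "A-101")],
    [("Jones", "A-217"), ("Smith", "A-215")],
]
s_blocks = [[("Hayes", "Harrison")], [("Jones", "Harrison")], [("Smith", "Rye")]]

rows, reads = block_nested_loop_join(r_blocks, s_blocks, lambda a, b: a[0] == b[0])
print(reads)  # 6 inner-block reads, versus 4 * 3 = 12 with one scan per outer tuple
```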

Block Nested-Loop Join (Cont.)
 Worst case estimate: br * bs + br block transfers + 2 * br seeks
 Each scan of the inner relation requires 1 seek, and the scan of the outer relation needs one seek
per block, thus 2 * br seeks in total
 Each block in the inner relation s is read once for each block in the outer
relation (instead of once for each tuple in the outer relation)
 Best case: (inner relation fits in memory)
br + bs block transfers + 2 seeks.

 with depositor as outer relation:
 br * bs + br (bs = 400, br = 100)
 100 * 400 + 100 = 40,100 block transfers,
 2 * br = 2 * 100 = 200 seeks
(A considerable performance gain over nested-loop join)
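The worst-case block nested-loop estimate can be evaluated for both choices of outer relation. The function name is hypothetical; the block counts are the ones from the slides (depositor: 100 blocks, customer: 400 blocks).

```python
def block_nested_loop_cost(b_outer, b_inner):
    """Worst-case block nested-loop join cost.

    Returns (block_transfers, seeks):
      transfers = br * bs + br, seeks = 2 * br.
    """
    transfers = b_outer * b_inner + b_outer
    seeks = 2 * b_outer
    return transfers, seeks


print(block_nested_loop_cost(b_outer=100, b_inner=400))  # (40100, 200)
print(block_nested_loop_cost(b_outer=400, b_inner=100))  # (40400, 800)
```

Under this estimate the relation with fewer blocks (depositor) is the better outer relation, because both the extra br transfers and the 2 * br seeks grow with the size of the outer relation.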
