Professional Documents
Culture Documents
Query Processing
Overview
Measures of Query Cost
Selection Operation
Sorting
Join Operation
Other Operations
Evaluation of Expressions
2
Basic Steps in Query Processing
3
Basic Steps in Query Processing (Cont.)
Evaluation
• The query-execution engine takes a query-evaluation
plan, executes that plan, and returns the answers to the
query.
4
Basic Steps in Query Processing:
Optimization
EQUIVALENT OF EXPRESSION?
Given schema
instructor (id, name, dept-name, salary)
Query: Write relational algebra expression to find id and salary of
all instructors whose id is less than 50501.
SQL: SELECT id, salary FROM INSTRUCTOR WHERE id<50501
Answer 1: id50501(id, salary(instructor))
equivalent to
Answer 2: id, salary(id50501(instructor))
6
Basic Steps: Optimization (Cont.)
7
Measures of Query Cost
Many factors contribute to time cost
• disk access, CPU, and network communication
Cost can be measured based on
• response time, i.e. total elapsed time for answering query, or
• total resource consumption
We use total resource consumption as cost metric
• Response time harder to estimate, and minimizing resource
consumption is a good idea in a shared database
We ignore CPU costs for simplicity
• Real systems do take CPU cost into account
• Network costs must be considered for parallel systems
We describe how to estimate the cost of each operation
• We do not include cost to writing output to disk
8
Measures of Query Cost
9
Measures of Query Cost
10
Selection Operation ()
File scan
Algorithm A1 (linear search). Scan each file block and test all
records to see whether they satisfy the selection condition.
Example: The size of the given student relation is 2000 blocks. The seek
time is 5ms and block transfer time is 0.1 ms. Find the estimated query
cost for the following query
SELECT * FROM STUDENT WHERE cgpa > 3.5 or name =
‘Rafique’
Given student(id, name, CGPA, street, city)
Answer: The query requires 1 seek and full file scan to find all students
with cgpa > 3.5. So br = 2000
11
Selection Operation ()
File scan
Algorithm A1 (linear search). Scan each file block and test all records
to see whether they satisfy the selection condition.
Question: The size of the given student relation is 2000 blocks. The seek
time is 5ms and block transfer time is 0.1 ms. Find the estimated query cost
for the following query
For partial scan, cost estimate (average) = 1 seek + (br /2) block transfers
12
Selections Using
Indices
13
Selections Using Indices
14
Index Search
Data Search
15
Selections Using Indices
16
Selections Using
Indices
Example: The relation student(id,
name, CGPA, street, city, NID) is
given. There is a clustering B+
tree index on city attribute of
student relation and there are 200
cities. The height of the index is 2
and there are 400 students living
in Uttara city. The record size is
400 byte and block size is 4KB.
Seek time = 10ms, block transfer
time = 0.2ms
Find the query cost for the
following query
SELECT * FROM STUDENT
WHERE CITY = ‘UTTARA’
17
Selections Using
Indices
Total cost =
Index cost + data cost
= 2 * (10+0.2) + 1 * 10 + 40 *
0.2
= 20.4 + 10+ 8
=
18
Selections Using Indices
A3 (clustering index, equality on nonkey) Retrieve multiple
records.
• Records will be on consecutive blocks
Let b = number of blocks containing matching records
• Cost = hi * (tT + tS) + tS + tT * b
19
Selections Using Indices
22
Selections Using Indices
26
Selections Involving Comparisons
Example: Given clustering index on id of Answer of Example:
student relation.
With Index ( No need to use
Q1. Find all students with id <= index)
1500001042. The id 1500001042 is the first Cost without index
tuple of student relation. Q1 cost
SELECT * FROM student WHERE id <= 1500001042 = index cost + data cost
= 0 + 1 * (10+0.2)
Q2. Find all students with id <= = 10.2ms
1800001042. The id 1800001042 is in 1200th
block of student relation. Q2 cost
SELECT * FROM student WHERE id <= 1800001042
= index cost + data cost
= 0 + 1 seek * 10 + 1200block
Find the query cost of Q1 and Q2. Here no * 0.2
need to use index (why?). Seek cost is =10 + 240
10ms and block transfer cost is 0.2 ms. =250ms
27
Selections Involving Comparisons
Example: Given clustering index on id of Answer of Example:
student relation.
With Index
Query cost
Q3. Find all students with id > 1800001042. = index cost + data cost
The id 1800001042 is in 1200th block of = 4 * (10+0.2) + 10+
student relation. (2000 – 1200) * (0.2)
SELECT * FROM student WHERE id > 1800001042 = 50.8 + 160
28
Selections Involving Comparisons
A6 (Secondary index, comparison).
For A V(r) use index to find first index entry v and scan index sequentially from
there, to find pointers to records.
For instructor relation, 12 blocks for 12 records.
Select * from instructor where salary >= 60000
Asssume index in memory,
Query Cost = 11 seek + 11 blocks
29
Selections Involving Comparisons
A6 (Secondary index, comparison).
For AV (r) just scan leaf pages of index finding pointers to records, till first entry > v
For instructor relation, 12 blocks for 12 records.
Select * from instructor where salary <= 87000
30
Selections Involving Comparisons
A6 (Secondary index, comparison).
For A V(r) use index to find first index entry v and scan index sequentially from
there, to find pointers to records. Index in memory.
For instructor relation, 12 blocks for 12 records. Q1:Select * from instructor where salary
>= 60000, Data Cost = 11 seek + 11 blocks, seek cost 10ms, block cost 0.1ms
For AV (r) just scan leaf pages of index finding pointers to records, till first entry > v
Q2:Select * from instructor where salary <= 87000, Data Cost = 9 seek + 9 blocks
Question: In either case, retrieve records that are pointed to, requires an I/O per record;
Linear file scan may be cheaper! Explain?
31
Implementation of Complex What is conjunctive selection?
Selections (12-11-20) Selection that uses a combinations of
Conjunction: 1 2. . . n(r) predicates
A7 (conjunctive selection using one index).
Given schema
Student(id, name, fname, mname, cgpa,
Select a combination of i and algorithms year-admit, division, district, thana,
A1 through A7 that results in the least cost street, house-number, city, NID, DOB,
for i (r). total-credit)
Test other conditions on tuple after
fetching it into memory buffer. Example: Find students with CGPA =
4 and city = Feni and year-admit =
A8 (conjunctive selection using composite
2019
index).
Use appropriate composite (multiple-key)
index if available. Algebra:
A9 (conjunctive selection by intersection of CGPA=4 city = Feni year-admit=2019(student)
identifiers).
1= (CGPA=4), 2= (City=‘Feni’),
Requires indices with record pointers.
3= (year-admit=2019),
Use corresponding index for each
condition, and take intersection of all the
obtained sets of record pointers.
Then fetch records from file 32
Implementation of Example Given B+ tree indices on CGPA, city
Complex Selections and year-admit. The height of the indices are
2, 2 and 1 respectively.
Conjunction: 1 2. . . n(r) SQL: SELECT * FROM student WHERE
A7 (conjunctive selection using one CGPA=4 AND City=‘Feni’ AND year-
index). admit=2019. tT = 0.2ms, tS = 10ms
Select a combination of i and
algorithms A1 through A7 that results in The number of tuples (pointers) for CGPA=4,
the least cost for i (r). City=‘Feni’ AND year-admit=2019 are 5, 20
Test other conditions on tuple after and 4000 respectively
fetching it into memory buffer. Find the least cost of the SQL using A7
algorithm.
Example: Find students with CGPA The indices are secondary index, We have to
= 4 and city = Feni and and year- use A4 algorithm.
admit = 2019 Query cost = index cost + data cost
= hi * (tT + tS) + n * (tT + tS)
Algebra:
CGPA=4 city = Feni year-admit=2019(r)
1= (CGPA=4), 2= (City=‘Feni’),
3= (year-admit=2019),
33
Implementation of Example Given B+ tree indices on CGPA, city
Complex Selections and year-admit. The height of the indices are
2, 2 and 1 respectively.
C1: Cost using CGPA index SQL: SELECT * FROM student WHERE
CGPA=4 AND City=‘Feni’ AND year-
n = 5,
admit=2019. tT = 0.2ms, tS = 10ms
cost = 2 * (10 + 0.2) + 5 * (10 + 0.2)
C2: Cost using city index The number of tuples (pointers) for CGPA=4,
City=‘Feni’ AND year-admit=2019 are 5, 20
n = 20 and 4000 respectively
cost = 2 * (10 + 0.2) + 20 * (10 + 0.2) Find the least cost of the SQL using A7
algorithm and analyze the result.
Cost using year-admit index
The indices are secondary index, We have to
n = 4000 use A4 algorithm.
Query cost = index cost + data cost
C3: cost = 2 * (10 + 0.2) + 4000 * (10 +
= hi * (tT + tS) + n * (tT + tS)
0.2)
Analysis:
34
Implementation of
Complex Selections
35
Implementation of Given B+ tree composite index on
(CGPA, city, year-admit).
Complex Selections Discussion 2: Explain how will this
query be executed using composite
Conjunction: 1 2. . . n(r) index?
A8 (conjunctive selection using composite
index). What is composite index?
Use appropriate composite (multiple-key) index Index on multiple attributes.
if available. composite index on
(CGPA, city, year-admit)
36
Implementation of Given B+ tree composite index on
(CGPA, city, year-admit).
Complex Selections Discussion 2: Explain how will this
query be executed using composite
Conjunction: 1 2. . . n(r) index?
A8 (conjunctive selection using composite
index). Example Find the cost of the SQL
Use appropriate composite (multiple-key) index using A8 algorithm.
if available. The height of the composite index is
4 and the number of tuples (pointers)
for CGPA=4, City=‘Feni’ AND year-
Example: Find students with CGPA = 4 and admit=2019 is 2.
city = Uttara and and year-admit = 2019
The composite index is a secondary
Algebra: index.
CGPA=4 city = Feni year-admit=2019(r)
1= (CGPA=4), 2= (City=‘Feni’), Query cost = index cost + data cost
3= (year-admit=2019),
= hi * (tT + tS) + n * (tT + tS)
SQL: SELECT * FROM student WHERE
CGPA=4 AND City=‘Feni’ AND year- Analyze A7 and A8
admit=2019
37
Implementation of Given B+ tree indices on CGPA, city
Complex Selections and year-admit.
Conjunction: 1 2. . . n(r) Discussion 3: Explain how will this
query be executed using algorithm
A9 (conjunctive selection by intersection of
A9?
identifiers).
Requires indices with record pointers. Example: Find the cost of the SQL
Use corresponding index for each condition, using A9 algorithm.
and take intersection of all the obtained sets of The height of the indices are 2, 2 and
record pointers. 1 respectively.
Then fetch records from file The number of tuples (pointers) for
CGPA=4, City=‘Feni’ AND year-
Example: Find students with CGPA = 4 and admit=2019 are 5, 20 and 4000
city = Feni and and year-admit = 2019 respectively
= 61.2ms
39
Question Given B+ tree indices on name, and total-credit.
The height of the indices are 4 and 2 respectively.
SQL: SELECT * FROM student WHERE name=‘Abdullah’
AND total-credit = 100
tT = 0.1ms, tS = 10ms
40
Algorithms for Disjunctive Selection
41
Algorithms for Example Find the cost of the SQL using A10
algorithm.
Complex Selections The height of the indices are 2, 2 and 1
respectively.
Disjunction:1 2 . . . n (r). The number of tuples (pointers) for CGPA=4,
A10 (disjunctive selection by City=‘Feni’ AND year-admit=2019 are 5, 20 and
union of identifiers). 4000 respectively
Applicable if all conditions The indices are secondary index.
have available indices.
Otherwise use linear
scan. Cost = Cost of (CGPA index + City index+year-
Use corresponding index for admit index) + ??? data
each condition, and take
union of all the obtained sets = 2 * (tT + tS) + 2 * (tT + tS) +1 * (tT + tS) +n(?) * (tT +
of record pointers. tS )
Then fetch records from file
= 5*10.2 + n(?) *10.2
42
Example Find the cost of the SQL using
A10 algorithm.
The height of the indices are 2, 2 and 1
respectively. 2 1
The number of tuples (pointers) for 6001 6001
CGPA=4, City=‘Feni’ AND year- 9002 6002
admit=2019 are 5, 20 and 4000 12001 12001
respectively 12002 16001
43
Question: Given B+ tree indices on name, and total-credit.
Total number of blocks is 2000. The height of the indices are
4 and 3 respectively.
SQL: SELECT * FROM student WHERE name=‘Abdullah’
OR total-credit = 100
tT = 0.1ms, tS = 10ms
44
Algorithms for Complex Selections
Negation: (r)
• Use linear scan on file
• If very few records satisfy , and an index is applicable to
Find satisfying records using index and fetch from file
45
Sorting
We may build an index on the relation, and then use the index
to read the relation in sorted order. May lead to one disk
block access for each tuple.
Discussion: The case when it May lead to one disk block access
for each tuple.
46
Example: External Sorting Using Sort-Merge
47
External Sort-Merge
Memory
(Run Creation)
Run Creation Let M denote memory size (in pages).
Here M = 3
1. Create sorted runs. Let i be 1 initially.
R1 Repeatedly do the following till the end
of the relation:
(a) Read M blocks of relation into
memory
(b) Sort the in-memory blocks
R2
(c) Write sorted data to run Ri;
increment i.
Here N = 4
2. Merge the runs (next slide)…..
R4
48
External Sort-Merge
a 19
b 14
Input Buffer (Sort-Merge)
Output Merge the runs (2-way merge). Here
a 19
Buffer
Merge pass1 N > M. M=3, N=4
Use 2 blocks of memory to buffer input
runs, and 1 block to buffer output. Read the
R1 first block of each run into its buffer page
repeat
Select the first record (in sort order)
among all buffer pages
R2
Write the record to the output buffer. If
the output buffer is full write it to disk.
Delete the record from its input buffer
R3 page.
If the buffer page becomes empty
then
read the next block (if any) of the run
R4 into the buffer.
until all input buffer pages are empty:
49
External Sort-Merge
d 31
b 14
Memory
(Sort Merge)
Merge the runs (2-way merge). Here
b 14
Merge pass1 N > M. M=3, N=2
Use 2 blocks of memory to buffer input
runs, and 1 block to buffer output. Read the
R1 first block of each run into its buffer page
repeat
Select the first record (in sort order)
among all buffer pages
R2
Write the record to the output buffer. If
the output buffer is full write it to disk.
Delete the record from its input buffer
R3 page.
If the buffer page becomes empty
then
read the next block (if any) of the run
R4 into the buffer.
until all input buffer pages are empty:
50
External Sort-Merge
d 31
b 14
Memory
(Algorithm Analysis)
Merge pass2
If N M, several merge passes are
b 14 required.
Merge pass1
Here, it is 2
In each pass, contiguous groups of M - 1
R1 runs are merged.
A pass reduces the number of runs by a
factor of M -1, and creates runs longer by
the same factor.
R2 E.g. If M=11, and there are 90 runs,
one pass reduces the number of runs
to 9, each 10 times the size of the
initial runs
R3
Repeated passes are performed till all
runs have been merged into one.
R4
51
External Sort-Merge
d 31
b 14
Memory
(Algorithm Analysis)
Merge pass2
b 14
Merge pass1
Question
Find the number of runs and number of
passes for a relation with 2200 blocks.
R1 Memory size is 11.
R4
52
External Sort-Merge
d 31
b 14
Memory
(Algorithm Analysis)
Merge pass2
b 14
Merge pass1
Cost analysis (Total number of passes)
Total number of blocks of the relation is br
R1 Size of the memory is M
Total number of runs = (br/M)
After merge pass 1, number of runs =
(br/M) /(M-1)
R2
After merge pass 2, number of runs
= (br/M) /(M-1) (M-1)
= (br/M) /(M-1) 2
R3
R4
53
External Sort-Merge
d 31
b 14
Memory
(Algorithm Analysis)
Merge pass2
b 14 Cost analysis (Total number of block
Merge pass1
transfer)
Block transfers for initial run creation as well
R1
as in each pass is 2br
for final pass, we don’t count write cost
since the output of an operation may be sent
to the parent operation
R2 Block transfer for run creation
= br for read+ br for write = 2 br
Block transfers for merging
R3 = 2 br * log (M–1)(br/M)
BT for run and merge (B_RM) = 2 br + 2 br *
log (M–1)(br/M)
Final merge, no write.
R4
So Net block transfer = B_RM - br
= 2 br + 2 br * log (M–1)(br/M) - br
= br * (2 * log (M–1)(br/M) +1)
54
External Sort-Merge
d 31
b 14
Memory
(Algorithm Analysis)
Merge pass2
b 14 Cost analysis (Total number of seek)
Merge pass1
55