
Database Systems II

Query Optimization Techniques

Dr. Noha Nagy


What is the Problem?

• Problem: efficiently answering a given Query

Query → Query Processing → Result

Relational Algebra: Project Operator
• Produces a table containing a subset of the columns of the argument table
π attribute list (relation)
• Example:
π Name,Hobby (Person)

SELECT Name, Hobby FROM Person

Person
Id    Name  Address    Hobby
1123  John  123 Main   stamps
1133  John  123 Main   coins
5556  Mary  7 Lake Dr  hiking
9876  Bart  5 Pine St  stamps

Result
Name  Hobby
John  stamps
John  coins
Mary  hiking
Bart  stamps

Project Operator
• Example:
π Name,Address (Person)

Person
Id    Name  Address    Hobby
1123  John  123 Main   stamps
1133  John  123 Main   coins
5556  Mary  7 Lake Dr  hiking
9876  Bart  5 Pine St  stamps

Result
Name  Address
John  123 Main
Mary  7 Lake Dr
Bart  5 Pine St

The result is a table (no duplicates): tuples are unique

Projection
• Consider the projection
SELECT DISTINCT ID, name FROM R
where R has the columns (ID, name, sal, DOB) and the result keeps only (ID, name)

• The implementation requires the following:
– Remove unwanted columns (on the fly)
– Eliminate any duplicate tuples produced
• This step is the difficult one!
• We will describe sorting as a way to perform duplicate elimination
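
The two steps above can be separated in code. The following is a minimal Python sketch, not a DBMS implementation: the function name `project` is hypothetical, and the rows are the Person table from the earlier slides. It performs only the first step (column removal, one tuple at a time) and shows that duplicates survive it.

```python
def project(rows, attrs):
    """Yield each row restricted to attrs, one tuple at a time.

    This is only the 'remove unwanted columns' step: it never looks at
    more than one row, so it cannot detect duplicates.
    """
    for row in rows:
        yield tuple(row[a] for a in attrs)


# Person table from the slides.
person = [
    {"Id": 1123, "Name": "John", "Address": "123 Main", "Hobby": "stamps"},
    {"Id": 1133, "Name": "John", "Address": "123 Main", "Hobby": "coins"},
    {"Id": 5556, "Name": "Mary", "Address": "7 Lake Dr", "Hobby": "hiking"},
    {"Id": 9876, "Name": "Bart", "Address": "5 Pine St", "Hobby": "stamps"},
]

projected = list(project(person, ["Name", "Address"]))
# ("John", "123 Main") now occurs twice; removing the repeat needs a
# second step that compares tuples with each other.
```

Because the generator handles one row at a time, it needs no extra memory; the cost of projection comes from the duplicate-elimination step that follows.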

π Algorithms
• π <attribute list>(R)
– Just remove unwanted attributes!

• Duplicate rows:
– No DISTINCT in query
• No need to remove duplicates
– DISTINCT in query
• Key ∈ attribute list ➔ No duplicates
• Key ∉ attribute list ➔ Duplicate elimination (Sort → Eliminate)
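
The decision rule above can be written as a small predicate. This is an illustrative sketch only; the function name and its parameters are assumptions, and a composite key is treated as "in" the attribute list only when all of its attributes are projected.

```python
def needs_duplicate_elimination(distinct, attribute_list, key):
    """Return True only when the projection must remove duplicates.

    distinct        -- True if the query says SELECT DISTINCT
    attribute_list  -- attributes kept by the projection
    key             -- attributes forming a key of the relation
    """
    if not distinct:
        return False  # no DISTINCT: duplicates may stay in the result
    # If the whole key is projected, every output tuple is already unique.
    return not set(key) <= set(attribute_list)


# Person has key Id (an assumption for this example).
print(needs_duplicate_elimination(False, ["Name", "Address"], ["Id"]))
print(needs_duplicate_elimination(True, ["Id", "Name"], ["Id"]))
print(needs_duplicate_elimination(True, ["Name", "Address"], ["Id"]))
```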

Sort → Eliminate
• Sort rows
• Remove repeated ones

Sort → Eliminate: Example
• Step 1: sort the tuples of T on all remaining attributes, so that identical tuples become adjacent

T (before sorting)
DeptName  Salary  Address    EmpName
OR        3000    Cairo      Randa
CS        4000    Giza       Nemin
CS        4000    Giza       Nemin
CS        4000    6 October  Nemin
IS        2000    Giza       Randa
IS        2000    Cairo      Nemin
CS        4000    Giza       Nemin

T (after sorting)
#  DeptName  Salary  Address    EmpName
1  CS        4000    Giza       Nemin
2  CS        4000    Giza       Nemin
3  CS        4000    Giza       Nemin
4  CS        4000    6 October  Nemin
5  IS        2000    Giza       Randa
6  IS        2000    Cairo      Nemin
7  OR        3000    Cairo      Randa

• Step 2: scan the sorted table with two pointers and copy each distinct tuple to T’
– i marks the last tuple copied to T’; j marks the tuple being compared with it
– i:1, j:2 → tuple 1 is copied to T’; tuple 2 equals tuple 1, so it is skipped
– i:1, j:3 → tuple 3 equals tuple 1, so it is skipped
– i:1, j:4 → tuple 4 differs (Address is 6 October), so it is copied to T’
– i:4, j:5 → tuple 5 differs from tuple 4, so it is copied to T’
– i:5, j:6 → and so on…

T’ (after the steps shown)
DeptName  Salary  Address    EmpName
CS        4000    Giza       Nemin
CS        4000    6 October  Nemin
IS        2000    Giza       Randa
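
The i/j scan from the example can be sketched in Python as follows. This is a minimal in-memory illustration, not an external-sort implementation; the function name `sort_eliminate` is an assumption. Indices are 0-based here, whereas the slides count from 1, and Python's tuple ordering places "6 October" before "Giza", so the sorted order differs from the slides. Only the adjacency of equal tuples matters.

```python
def sort_eliminate(rows):
    """Sort the tuples, then keep one copy of each run of equal tuples."""
    t = sorted(rows)              # equal tuples become adjacent
    if not t:
        return []
    result = [t[0]]               # the first tuple is always kept
    i = 0                         # index of the last tuple copied to the result
    for j in range(1, len(t)):    # j scans the rest of the sorted table
        if t[j] != t[i]:          # new value: copy it and move i forward
            result.append(t[j])
            i = j
    return result


# Table T from the slides (DeptName, Salary, Address, EmpName).
T = [
    ("OR", 3000, "Cairo", "Randa"),
    ("CS", 4000, "Giza", "Nemin"),
    ("CS", 4000, "Giza", "Nemin"),
    ("CS", 4000, "6 October", "Nemin"),
    ("IS", 2000, "Giza", "Randa"),
    ("IS", 2000, "Cairo", "Nemin"),
    ("CS", 4000, "Giza", "Nemin"),
]

T_prime = sort_eliminate(T)
```

After sorting, the elimination itself is a single linear pass, so the sort dominates the cost.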
SET Operations
• Algorithms for SET operations
• Set operations:
• UNION, INTERSECTION, SET DIFFERENCE and CARTESIAN PRODUCT
• The CARTESIAN PRODUCT of relations R and S
includes all possible combinations of records
from R and S. The attributes of the result
include all attributes of R and S.
• Cost analysis of CARTESIAN PRODUCT
• If R has n records and j attributes and S has m records and k
attributes, the result relation will have n*m records and j+k
attributes.
• The CARTESIAN PRODUCT operation is very
expensive and should be avoided if possible.
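
The n*m and j+k figures can be checked with a few lines of Python. The relations below are hypothetical sample data chosen only to make the counts visible (n = 3, j = 2, m = 2, k = 3).

```python
from itertools import product


def cartesian_product(r, s):
    """Pair every tuple of r with every tuple of s and concatenate them."""
    return [x + y for x, y in product(r, s)]


# Hypothetical relations: R has 3 records with 2 attributes,
# S has 2 records with 3 attributes.
R = [(1, "a"), (2, "b"), (3, "c")]
S = [(10, "x", True), (20, "y", False)]

result = cartesian_product(R, S)
print(len(result))      # n*m records
print(len(result[0]))   # j+k attributes
```

The output grows multiplicatively with the input sizes, which is why the operation is expensive even before any I/O cost is considered.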

SET Operations
• Apply to type-compatible (union-compatible) relations
• Same number of attributes
• Same attribute names in both relations
• Same attribute domains

• Tables:
Person (SSN, Name, Address, Hobby)
Professor (Id, Name, Office, Phone)
are not union compatible.
π Name (Person) and π Name (Professor)
are union compatible, and
π Name (Person) - π Name (Professor)
makes sense.
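
The three conditions can be expressed as a simple schema comparison. In this sketch a schema is a list of (attribute name, domain) pairs; that representation, the function name, and the Python types used as domains are all assumptions, since the slides do not give the domains of Person and Professor.

```python
def union_compatible(schema_r, schema_s):
    """Check the three conditions: same arity, same names, same domains."""
    if len(schema_r) != len(schema_s):
        return False
    return all(
        name_r == name_s and dom_r == dom_s
        for (name_r, dom_r), (name_s, dom_s) in zip(schema_r, schema_s)
    )


# Domains below are illustrative guesses, not taken from the slides.
person = [("SSN", str), ("Name", str), ("Address", str), ("Hobby", str)]
professor = [("Id", int), ("Name", str), ("Office", str), ("Phone", str)]

print(union_compatible(person, professor))            # full tables
print(union_compatible([("Name", str)], [("Name", str)]))  # after projecting Name
```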

Algorithms for SET Operations (sorting-based)
• UNION
• Sort the two relations on the same attributes.
• Scan and merge both sorted files concurrently; whenever
the same tuple exists in both relations, only one copy is kept in
the merged result.
• INTERSECTION
• Sort the two relations on the same attributes.
• Scan and merge both sorted files concurrently; keep in the
merged result only those tuples that appear in both
relations.
• SET DIFFERENCE R-S
• Sort the two relations on the same attributes.
• Scan and merge both sorted files concurrently; keep in the
merged result only those tuples that appear in
relation R but not in relation S.

Sorting-based ∪ Algorithm: Example
• Step 1: sort R and S on the same attribute (here DeptID)

R
DeptName    DeptID
Accounting  1
Research    2
HR          4
Admin       9

S
DeptName    DeptID
Accounting  1
Management  5
Logistics   7

• Step 2: scan both sorted relations concurrently, each time copying the tuple with the smaller DeptID to R∪S
– Accounting 1 appears in both R and S, so only one copy is kept
– Research 2 and HR 4 come from R
– Management 5 and Logistics 7 come from S
– Admin 9 comes from R

R∪S
DeptName    DeptID
Accounting  1
Research    2
HR          4
Management  5
Logistics   7
Admin       9
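
The merge in the union example can be sketched in Python as below. This is an in-memory illustration under stated assumptions: the function name `sort_merge_union` is hypothetical, and the `key` function must order whole tuples (two tuples with equal keys are treated as the same tuple), so the example sorts on (DeptID, DeptName).

```python
def sort_merge_union(r, s, key):
    """Sort both inputs on key, then merge them, keeping one copy of each tuple."""
    r, s = sorted(r, key=key), sorted(s, key=key)
    out = []

    def emit(t):
        # Skip a tuple equal to the one just written (also removes
        # duplicates that occur inside a single input).
        if not out or out[-1] != t:
            out.append(t)

    i = j = 0
    while i < len(r) and j < len(s):
        if key(r[i]) < key(s[j]):
            emit(r[i])
            i += 1
        elif key(r[i]) > key(s[j]):
            emit(s[j])
            j += 1
        else:                      # same tuple in both relations: keep one
            emit(r[i])
            i += 1
            j += 1
    for t in r[i:] + s[j:]:        # at most one of these slices is non-empty
        emit(t)
    return out


# R and S from the slides (DeptName, DeptID).
R = [("Accounting", 1), ("Research", 2), ("HR", 4), ("Admin", 9)]
S = [("Management", 5), ("Logistics", 7), ("Accounting", 1)]

union = sort_merge_union(R, S, key=lambda t: (t[1], t[0]))
```

With this data the merged list has the same six rows, in the same DeptID order, as the R∪S table in the example.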
Sorting-based ∩ and - Algorithms
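
The slide gives no worked example for these two algorithms. Following the descriptions under Algorithms for SET Operations, one possible Python sketch is shown here. The function names are hypothetical, and as with union the `key` function is assumed to order whole tuples, so equal keys mean equal tuples.

```python
def sort_merge_intersection(r, s, key):
    """Keep only tuples that appear in both sorted inputs."""
    r, s = sorted(r, key=key), sorted(s, key=key)
    out, i, j = [], 0, 0
    while i < len(r) and j < len(s):
        if key(r[i]) < key(s[j]):
            i += 1                      # r[i] cannot be in s
        elif key(r[i]) > key(s[j]):
            j += 1                      # s[j] cannot be in r
        else:
            if not out or out[-1] != r[i]:
                out.append(r[i])        # in both: keep one copy
            i += 1
            j += 1
    return out


def sort_merge_difference(r, s, key):
    """Keep only tuples of r that do not appear in s (R - S)."""
    r, s = sorted(r, key=key), sorted(s, key=key)
    out, j = [], 0
    for t in r:
        while j < len(s) and key(s[j]) < key(t):
            j += 1                      # skip s tuples smaller than t
        found_in_s = j < len(s) and key(s[j]) == key(t)
        if not found_in_s and (not out or out[-1] != t):
            out.append(t)
    return out


# R and S from the union example (DeptName, DeptID).
R = [("Accounting", 1), ("Research", 2), ("HR", 4), ("Admin", 9)]
S = [("Management", 5), ("Logistics", 7), ("Accounting", 1)]
by_id = lambda t: (t[1], t[0])

common = sort_merge_intersection(R, S, key=by_id)
only_r = sort_merge_difference(R, S, key=by_id)
```

Both functions reuse the same sorted scan as union; only the rule for which tuples are copied to the output changes.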

Summary
• Relational algebra operators [selection, projection, grouping, set operations, rename] can be classified into three groups
• Tuple-at-a-time unary operators
• Selection and projection
• No need to bring the entire relation into memory at one time
• Full-relation unary operators
• Duplicate elimination and grouping
• Require seeing all or most of the tuples in memory at once
• Full-relation binary operators
• Set operators (union, intersection, difference), joins and Cartesian products
• Require seeing the tuples of both relations in memory

Aggregate Operators
• Functions that operate on sets:
– COUNT, SUM, AVG, MAX, MIN
• Produce numbers (not tables)
• Not part of relational algebra

• You do not need the whole row, only the aggregated attribute

SELECT MAX(Salary)
FROM EMPLOYEE;
• Index on the attribute
– Index scan
• No index, but the table is sorted on the attribute we want
– The MIN and MAX are readily available
• No index and the table is not sorted
– Table scan
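
The three cases for MAX(Salary) can be illustrated with a small Python sketch. Everything here is a simplification: the index is modelled as a sorted list of key values, the function name and its parameters are assumptions, and the EMPLOYEE rows are hypothetical sample data.

```python
def max_salary(table, salary_index=None, sorted_on_salary=False):
    """Pick the cheapest way to answer SELECT MAX(Salary) FROM EMPLOYEE."""
    if salary_index:                   # index available: read its right-most key
        return salary_index[-1]
    if sorted_on_salary:               # sorted table: MAX is in the last row
        return table[-1]["Salary"]
    return max(row["Salary"] for row in table)   # otherwise: full table scan


# Hypothetical EMPLOYEE rows.
employee = [
    {"Name": "A", "Salary": 3000},
    {"Name": "B", "Salary": 4000},
    {"Name": "C", "Salary": 2000},
]
salary_index = sorted(row["Salary"] for row in employee)  # stand-in for an index
employee_sorted = sorted(employee, key=lambda row: row["Salary"])

print(max_salary(employee))                                # table scan
print(max_salary(employee, salary_index=salary_index))     # index
print(max_salary(employee_sorted, sorted_on_salary=True))  # sorted table
```

Only the table scan touches every row; the other two cases read a single entry.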

Aggregate Operators with B-Tree
• MAX, MIN, SUM, AVERAGE, COUNT

Example B-tree (one level per line, each node in brackets):
[26]
[6 12]   [42 51 62]
[1 2 4] [7 8] [13 15 18 25]   [27 29] [45 46 48] [53 55 60] [64 70 90]

• MAX
• Traverse to the right-most index key
• MIN
• Traverse to the left-most index key
• SUM
• Visit all keys
• AVERAGE
• Visit all keys
• COUNT
• Usually part of the catalog, so it can be found directly
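
The traversals listed above can be tried on the example tree. The sketch below is an assumption-laden illustration: the `Node` class and function names are hypothetical, the tree is built by hand rather than by insertion, and COUNT is computed by traversal here only because the sketch has no catalog.

```python
class Node:
    """A B-tree node: sorted keys, plus len(keys) + 1 children unless it is a leaf."""

    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children or []


def btree_min(node):
    while node.children:          # always follow the left-most child
        node = node.children[0]
    return node.keys[0]


def btree_max(node):
    while node.children:          # always follow the right-most child
        node = node.children[-1]
    return node.keys[-1]


def all_keys(node):
    """Visit every key in ascending order (needed for SUM and AVERAGE)."""
    if not node.children:
        yield from node.keys
        return
    for i, k in enumerate(node.keys):
        yield from all_keys(node.children[i])
        yield k
    yield from all_keys(node.children[-1])


# The B-tree from the slide.
root = Node([26], [
    Node([6, 12], [Node([1, 2, 4]), Node([7, 8]), Node([13, 15, 18, 25])]),
    Node([42, 51, 62], [Node([27, 29]), Node([45, 46, 48]),
                        Node([53, 55, 60]), Node([64, 70, 90])]),
])

keys = list(all_keys(root))
total = sum(keys)
average = total / len(keys)
```

MIN and MAX follow one root-to-leaf path, while SUM and AVERAGE have to touch every node.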

GROUP BY
SELECT Dno, AVG (Salary)
FROM EMPLOYEE
GROUP BY Dno;

• Sorting
• Sort the relation on the grouping attributes
• Partition the relation into groups by the grouping attributes
• Apply the aggregate operators on the groups

Computation is complex: the whole relation must be sorted before the groups can be aggregated

Clustering index on the grouping attributes → the relation is already grouped

EMPLOYEE (before sorting)
Dno  Salary
1    3000
5    4000
3    4000
3    4000
5    2000
3    2000
1    4000

EMPLOYEE (sorted on Dno)
Dno  Salary
1    3000
1    4000
3    4000
3    4000
3    2000
5    2000
5    4000

Result
Dno  AVG(Salary)
1    3500
3    3333
5    3000
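
The sort, partition and aggregate steps can be sketched in Python with `itertools.groupby`, which only groups adjacent items and therefore depends on the sort step, just as the sort-based algorithm does. The function name `group_by_avg` is an assumption; the rows are the (Dno, Salary) pairs from the example.

```python
from itertools import groupby
from operator import itemgetter


def group_by_avg(rows):
    """SELECT Dno, AVG(Salary) ... GROUP BY Dno, done by sorting first."""
    ordered = sorted(rows, key=itemgetter(0))           # sort on the grouping attribute
    result = []
    for dno, group in groupby(ordered, key=itemgetter(0)):   # partition into groups
        salaries = [salary for _, salary in group]
        result.append((dno, sum(salaries) / len(salaries)))  # aggregate each group
    return result


# (Dno, Salary) rows from the example, in their unsorted order.
employee = [(1, 3000), (5, 4000), (3, 4000), (3, 4000),
            (5, 2000), (3, 2000), (1, 4000)]

averages = group_by_avg(employee)
```

If the relation is already stored in Dno order, for example through a clustering index, the `sorted` call could be dropped and the groups read off in one pass.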
