
Query Optimization

Outline
Introduction
Steps in Cost-based Query Optimization - Query Flow
Projection Example
Query Interaction in DBMS
Cost-based Query Optimization: Algebraic Expressions
Introduction
What is Query Optimization?
Suppose you were given a chance to visit 15 pre-selected cities in Europe, and the only constraint was time.
-> Would you visit the cities in just any order, or would you have a plan?
Plan:
-> Place the 15 cities in groups based on their proximity to each other.
-> Start with one group, then move on to the next group.

The important point is that you would visit the cities in a more organized manner, and the time constraint mentioned earlier would be dealt with efficiently.
Query Optimization works in a similar way:
There can be many different ways to compute the answer to a given query. The result is the same in every case; only the cost differs.

A DBMS strives to process the query in the most efficient way (in terms of time) to produce the answer.

Cost = time needed to get all answers


Starting with System R, most commercial DBMSs use cost-based optimizers.
Cost estimation should be accurate and easy to compute. It should also be logically consistent, so that the least-cost plan is consistently ranked as the cheapest.
Steps in Cost-based Query Optimization

1. Parsing
2. Transformation
3. Implementation
4. Plan selection based on cost estimates
Query Flow
SQL -> Parser -> Optimizer -> Code Generator/Interpreter -> Processor
Query Parser: Verifies the validity of the SQL statement and translates the query into an internal structure using relational calculus.
Query Optimizer: Finds the best expression among the various equivalent algebraic expressions. The criterion used is cheapness (lowest estimated cost).
Code Generator/Interpreter: Turns the optimizer's output into calls to the Query Processor.
Query Processor: Executes the calls obtained from the code generator.
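The four components above can be sketched as a chain of functions. This is only an illustrative toy, not how any particular DBMS is built: the function names, the `Plan` class, and the two candidate plans with their costs are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Plan:
    """A candidate plan with an estimated cost (hypothetical units)."""
    description: str
    estimated_cost: float


def parse(sql):
    """Query Parser: check validity, return an internal form (here: normalized text)."""
    text = " ".join(sql.split())
    if not text.upper().startswith("SELECT"):
        raise ValueError("this sketch only accepts SELECT statements")
    return text


def optimize(internal_form):
    """Query Optimizer: choose the cheapest of several equivalent candidates."""
    candidates = [
        Plan("join, then filter", 100.0),  # made-up cost
        Plan("filter, then join", 50.0),   # made-up cost
    ]
    return min(candidates, key=lambda plan: plan.estimated_cost)


def generate_calls(plan):
    """Code Generator/Interpreter: turn the chosen plan into processor calls."""
    return ["call:" + step for step in plan.description.split(", then ")]


def process(calls):
    """Query Processor: execute the calls in order (here: just record them)."""
    return ["executed " + call for call in calls]


def run_query(sql):
    """SQL -> Parser -> Optimizer -> Code Generator -> Processor."""
    return process(generate_calls(optimize(parse(sql))))


trace = run_query("SELECT pname   FROM Patients")
```

Each stage consumes only the output of the stage before it, which is the property that later allows the first three stages to be run once and their result reused.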
The cost of a physical plan includes processor time and communication time. The most important factor to consider is the number of disk I/Os, because disk access is the most time-consuming action.
Some other costs are associated with:
- Operations (joins, unions, intersections).
- The order of operations.
  Why? Joins, unions, and intersections are associative and commutative, so the same result can be computed in many different orders.
- Management of the storage of arguments and the passing of them.

The factors mentioned above should be minimized when creating the best physical plan.
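As a small illustration of why the order of operations matters, the sketch below intersects three sets in two different orders. The sets are made-up data; the point is only that associativity and commutativity guarantee the same final result while the intermediate result can differ greatly in size.

```python
# Hypothetical inputs: three sets of very different sizes.
A = set(range(0, 1000))       # 1000 elements
B = set(range(0, 1000, 2))    # 500 even numbers
C = set(range(0, 1000, 100))  # 10 multiples of 100


def intersect_in_order(first, second, third):
    """Compute (first & second) & third and report the intermediate size."""
    intermediate = first & second
    return intermediate & third, len(intermediate)


# Order 1: (A & B) & C builds a large intermediate result first.
result_ab, size_ab = intersect_in_order(A, B, C)

# Order 2: (B & C) & A builds a small intermediate result first.
result_bc, size_bc = intersect_in_order(B, C, A)

same_answer = result_ab == result_bc
```

Here the first order materializes 500 intermediate elements and the second only 10, although both end with the same 10-element answer.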
Projection Example:
Projections produce a result tuple for every argument tuple.
What changes?
The output size changes only because the length of each tuple changes.

Let's take a relation R:
Relation (20,000 tuples): R(a, b, c)
Each tuple (190 bytes): header = 24 bytes, a = 8 bytes, b = 8 bytes, c = 150 bytes
Each block (1024 bytes): header = 24 bytes, leaving 1,000 bytes for tuples
We can fit 5 tuples into 1 block:
- 5 tuples * 190 bytes/tuple = 950 bytes, which fits into 1 block
- For 20,000 tuples, we would require 4,000 blocks (20,000 tuples / 5 tuples per block = 4,000 blocks)

With a projection that eliminates column c (150 bytes), we can estimate that each tuple shrinks to 40 bytes (190 - 150 bytes).
The new estimate is 25 tuples in 1 block:
- 25 tuples * 40 bytes/tuple = 1,000 bytes, which fits into 1 block
- With 20,000 tuples, the new estimate is 800 blocks (20,000 tuples / 25 tuples per block = 800 blocks)

The result is a reduction by a factor of 5.

Query Interaction in DBMS
How does a query interact with a DBMS?
- Interactive users
- Embedded queries in programs written in C, C++, etc.
What is the difference between these two?
Interactive Users:
- When an interactive user submits a query, the query goes through the Query Parser, Query Optimizer, Code Generator, and Query Processor each time.
Embedded Query:
- An embedded query does not have to go through the Query Parser, Query Optimizer, Code Generator, and Query Processor each time.
- In an embedded query, the calls generated by the code generator are stored in the database. Each time the query is reached within the program at run time, the Query Processor invokes the stored calls in the database.
- Optimization is independent in embedded queries.
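The difference between the two modes can be sketched as "compile every time" versus "compile once, then reuse the stored calls". Everything here is a stand-in: `compile_query` represents parser, optimizer, and code generator together, and a plain dictionary plays the role of the calls stored in the database.

```python
stats = {"compilations": 0}
stored_calls = {}  # stand-in for calls stored in the database


def compile_query(sql):
    """Stand-in for Query Parser + Query Optimizer + Code Generator."""
    stats["compilations"] += 1
    return ("calls-for", sql)


def execute(calls):
    """Stand-in for the Query Processor."""
    return "ran " + calls[1]


def run_interactive(sql):
    """Interactive query: the full pipeline runs on every submission."""
    return execute(compile_query(sql))


def run_embedded(sql):
    """Embedded query: compile on first use, then invoke the stored calls."""
    if sql not in stored_calls:
        stored_calls[sql] = compile_query(sql)
    return execute(stored_calls[sql])


sql = "SELECT pname FROM Patients"

for _ in range(3):
    run_interactive(sql)
interactive_compilations = stats["compilations"]

for _ in range(3):
    run_embedded(sql)
embedded_compilations = stats["compilations"] - interactive_compilations
```

Three interactive runs trigger three compilations, while three embedded runs trigger only one.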
Cost-based Query Optimization: Algebraic Expressions
Suppose we had the following query:

SELECT p.pname, d.dname
FROM Patients p, Doctors d
WHERE p.doctor = d.dname
AND d.dgender = 'M'
projection
  filter
    join
      Scan(Patients)
      Scan(Doctors)
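To see what this query returns, it can be run against a tiny in-memory SQLite database using Python's standard `sqlite3` module. The table contents are invented for the example, the column types are assumed, and an `ORDER BY` is added only to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical schema and rows; only the column names come from the query.
conn.executescript("""
    CREATE TABLE Doctors (dname TEXT, dgender TEXT);
    CREATE TABLE Patients (pname TEXT, doctor TEXT);
    INSERT INTO Doctors VALUES ('Adams', 'M');
    INSERT INTO Doctors VALUES ('Baker', 'F');
    INSERT INTO Patients VALUES ('Pat', 'Adams');
    INSERT INTO Patients VALUES ('Quinn', 'Baker');
    INSERT INTO Patients VALUES ('Rae', 'Adams');
""")

# Same query as above, plus ORDER BY for a deterministic row order.
rows = conn.execute("""
    SELECT p.pname, d.dname
    FROM Patients p, Doctors d
    WHERE p.doctor = d.dname
      AND d.dgender = 'M'
    ORDER BY p.pname
""").fetchall()

conn.close()
```

With this data, only the patients of the one male doctor appear in `rows`.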


Cost-based Query Optimization: Transformation
Plan 1 (filter after the join):
projection
  filter
    join
      Scan(Patients)
      Scan(Doctors)

Plan 2 (filter before the join):
projection
  join
    Scan(Patients)
    filter
      Scan(Doctors)
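The two expression trees can be written out on toy data to confirm that they are equivalent and to see where they differ. The rows and helper names below are assumptions for illustration; the filter is applied to Doctors because the condition in the query is on d.dgender.

```python
# Hypothetical data for the two relations.
patients = [
    {"pname": "Pat", "doctor": "Adams"},
    {"pname": "Quinn", "doctor": "Baker"},
    {"pname": "Rae", "doctor": "Adams"},
]
doctors = [
    {"dname": "Adams", "dgender": "M"},
    {"dname": "Baker", "dgender": "F"},
]


def join(ps, ds):
    """Join on p.doctor = d.dname."""
    return [{**p, **d} for p in ps for d in ds if p["doctor"] == d["dname"]]


def is_male(row):
    """Filter condition d.dgender = 'M'."""
    return row["dgender"] == "M"


def project(rows):
    """Projection onto (pname, dname)."""
    return [(row["pname"], row["dname"]) for row in rows]


# Join first, then filter, then project.
joined_all = join(patients, doctors)
result_filter_after = project([row for row in joined_all if is_male(row)])

# Filter Doctors first, then join, then project.
male_doctors = [d for d in doctors if is_male(d)]
joined_small = join(patients, male_doctors)
result_filter_before = project(joined_small)
```

Both variants give the same answer, but filtering first means the join produces fewer intermediate rows (2 instead of 3 on this data).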


Cost-based Query Optimization: Implementation
Plan 1:
projection
  filter
    natural join
      Scan(Patients)
      Scan(Doctors)

Plan 2:
projection
  hash join
    Scan(Patients)
    filter
      Scan(Doctors)
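The implementation step assigns a concrete algorithm to each logical operator. As a rough sketch, the code below compares a hash join with a simple nested-loop join. The nested-loop join is an assumed baseline that the slides do not name, and the data and work counters are invented for illustration.

```python
from collections import defaultdict

# Hypothetical data: (pname, doctor) and (dname, dgender).
patients = [("Pat", "Adams"), ("Quinn", "Baker"), ("Rae", "Adams")]
doctors = [("Adams", "M"), ("Baker", "F")]


def nested_loop_join(ps, ds):
    """Compare every patient with every doctor."""
    comparisons = 0
    out = []
    for pname, doctor in ps:
        for dname, dgender in ds:
            comparisons += 1
            if doctor == dname:
                out.append((pname, dname, dgender))
    return out, comparisons


def hash_join(ps, ds):
    """Build a hash table on Doctors.dname, then probe it once per patient."""
    table = defaultdict(list)
    for dname, dgender in ds:
        table[dname].append(dgender)
    probes = 0
    out = []
    for pname, doctor in ps:
        probes += 1
        for dgender in table.get(doctor, []):
            out.append((pname, doctor, dgender))
    return out, probes


nl_rows, nl_comparisons = nested_loop_join(patients, doctors)
hj_rows, hj_probes = hash_join(patients, doctors)
```

Both algorithms return the same rows. On this data the nested-loop version does 6 comparisons, while the hash join does 3 probes after building its table, which is the kind of difference a cost estimate has to capture.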


Cost-based Query Optimization: Plan selection based on costs
Plan 1 (estimated cost = 100 ms):
projection
  filter
    natural join
      Scan(Patients)
      Scan(Doctors)

Plan 2 (estimated cost = 50 ms):
projection
  hash join
    Scan(Patients)
    filter
      Scan(Doctors)
