
3130703

Database Management Systems

Unit-5: Query Processing and Optimization

Topics to be covered
• Overview (Query Processing)
• Measures of query cost
• Evaluation of expressions
• Query optimization
• Transformation of relational expressions
• Sorting and join

Unit – 5: Query Processing & Optimization 2




Query Processing
• Query processing is the set of activities involved in extracting data from the database.
• Fetching data from the database takes several steps.
• The steps involved are:


1. Parsing and translation
2. Optimization
3. Evaluation



Steps in Query Processing
[Figure: query-processing flow. Query → Scanner, Parser and Translator → Relational algebra expression → Optimizer → Execution plan → Evaluation engine → Query output. The figure also shows the Database Catalog (statistics about data) and the Data.]
• The Scanner verifies attribute names and relation names.
• The Parser checks the syntax of the query.
• The Translator translates the query into its internal form (relational algebra).
• The Optimizer chooses the best execution plan.
• The Evaluation engine executes the query-evaluation plan and returns the output.



Steps in Query Processing
1. Parsing and translation
• When a user executes a query, the system first generates the internal form of the query: the parser checks the syntax of the query and verifies that the relation names and attribute names used in it exist in the database.
• It then translates the query into a relational algebra expression.
• After translation, each relational algebra operation can be executed using one of several different algorithms. This is how query processing begins.



Steps in Query Processing
2. Optimization
• The database system generates an efficient query evaluation plan, that is, one that minimizes the cost of evaluation. This task is performed by the database system and is known as query optimization.
• To optimize a query, the query optimizer needs an estimated cost for each operation.
• This is because the overall cost depends on factors such as the memory allocated to each operation, the execution cost of each operation, and so on.



Steps in Query Processing
3. Evaluation
• To fully evaluate a query, the system needs to construct a query evaluation plan: a relational algebra expression annotated with instructions on how to evaluate each operation.
• The annotations in the evaluation plan may specify the algorithm to be used for a particular operation or the index to be used.
• The query evaluation plan is also referred to as the query execution plan.
• The query execution engine is responsible for generating the output of the given query. It takes the query execution plan, executes it, and returns the output to the user.



Query Cost



Measures of Query Cost
• Cost is generally measured as the total time required to execute a statement/query.
• Factors that contribute to the time cost:
1. Disk accesses (time to process a data request and retrieve the required data from the storage device)
2. CPU time to execute the query
3. Network communication cost





Evaluation of expressions

• A query processing system can use one of two methods to evaluate an expression that contains multiple operations. These methods are:

1. Materialization
2. Pipelining



Materialization
• Evaluate one operation at a time, starting at the lowest level.
• The intermediate result of each operation is materialized (stored in a temporary relation) and becomes the input for the next operation.
• The cost of materialization is the sum of the costs of the individual operations plus the cost of writing the intermediate results to disk.
• The problems with materialization are that
  • it creates lots of temporary relations
  • it performs lots of I/O operations



Materialization
• An expression may contain more than one operation, which makes it harder to evaluate than a single operation.
• ∏ Cust_Name (σ Balance<2500 (account) ⋈ customer)
• To evaluate such an expression, we evaluate each operation one by one in the appropriate order, from the bottom of the expression tree to the top:
1. σ Balance<2500 (account)
2. ⋈ of the result of step 1 with customer
3. ∏ Cust_Name on the result of step 2
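The bottom-to-top evaluation above can be sketched in Python. This is only an illustration, not how a real DBMS stores temporary relations: the helper names `select`, `natural_join` and `project` are assumptions made for this sketch, and the sample rows are the Customer/Account rows used on the Query optimization slide.

```python
# Sample data (taken from the Customer/Account tables in these slides).
account = [
    {"Ano": "A01", "Balance": 3000},
    {"Ano": "A02", "Balance": 1000},
    {"Ano": "A03", "Balance": 2000},
    {"Ano": "A04", "Balance": 4000},
]
customer = [
    {"Cid": "C01", "Ano": "A01", "Cust_name": "Raj"},
    {"Cid": "C02", "Ano": "A02", "Cust_name": "Meet"},
    {"Cid": "C03", "Ano": "A03", "Cust_name": "Harsh"},
    {"Cid": "C04", "Ano": "A04", "Cust_name": "Punit"},
]


def select(rows, predicate):
    """Selection: keep only the rows that satisfy the predicate."""
    return [row for row in rows if predicate(row)]


def natural_join(left, right, attr):
    """Join two relations on one common attribute (hypothetical helper)."""
    return [{**l, **r} for l in left for r in right if l[attr] == r[attr]]


def project(rows, attrs):
    """Projection with duplicate elimination, preserving first-seen order."""
    seen, out = set(), []
    for row in rows:
        key = tuple(row[a] for a in attrs)
        if key not in seen:
            seen.add(key)
            out.append(dict(zip(attrs, key)))
    return out


# Each step is fully computed and stored (materialized) before the next starts.
temp1 = select(account, lambda r: r["Balance"] < 2500)  # step 1
temp2 = natural_join(temp1, customer, "Ano")            # step 2
result = project(temp2, ["Cust_name"])                  # step 3
print(result)
```

Here `temp1` and `temp2` play the role of the temporary relations; in a real system they would be written to disk, which is where the extra I/O cost comes from.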
Pipelining
• In pipelining, operations form a queue, and results are passed from one operation to the next as they are computed.
• To reduce the number of intermediate temporary relations, the results of one operation are passed directly to the next operation in the pipeline.
• Combining operations into a pipeline eliminates the cost of reading and writing temporary relations.
• Pipelines can be executed in two ways:
1. Demand driven: the results of lower-level operations are not passed to the higher level automatically. They are passed only when the higher level requests them.
2. Producer driven: lower-level operations eagerly pass their results to higher-level operations, without waiting for the higher level to request them.
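A demand-driven pipeline maps naturally onto Python generators, since a generator produces a row only when the consumer asks for one. The sketch below is an analogy under that assumption, not a DBMS implementation; the operator names and the `pulled` log are hypothetical and exist only to show when rows are read.

```python
account = [
    {"Ano": "A01", "Balance": 3000},
    {"Ano": "A02", "Balance": 1000},
    {"Ano": "A03", "Balance": 2000},
    {"Ano": "A04", "Balance": 4000},
]

pulled = []  # records which rows the scan has actually read so far


def scan(rows):
    """Lowest-level operator: hands out one row per request."""
    for row in rows:
        pulled.append(row["Ano"])
        yield row


def select(source, predicate):
    """Selection operator: pulls from its input until a row qualifies."""
    for row in source:
        if predicate(row):
            yield row


def project(source, attrs):
    """Projection operator (no duplicate elimination in this sketch)."""
    for row in source:
        yield {a: row[a] for a in attrs}


# Building the pipeline reads nothing; no temporary relation is created.
pipeline = project(select(scan(account), lambda r: r["Balance"] < 2500), ["Ano"])
assert pulled == []

first = next(pipeline)          # the top operator demands one row
pulled_at_first = list(pulled)  # only A01 and A02 have been read so far
rest = list(pipeline)           # demand the remaining rows
print(first, pulled_at_first, rest)
```

A producer-driven pipeline would instead have each operator push rows into a buffer for its parent as soon as they are available.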



Comparison
| Materialization | Pipelining |
|---|---|
| It is a traditional approach to evaluating multiple operations. | It is a modern approach to evaluating multiple operations. |
| It uses temporary relations to store the results of the evaluated operations, so it needs more temporary files and I/O. | It does not use temporary relations to store the results of the evaluated operations. |
| It is less efficient, as it takes more time to generate the query results. | It is more efficient, as it generates the results quickly. |
| It places no special requirements on query evaluation. | It requires operations to produce output at a high rate. |
| The overall cost includes the cost of the operations plus the cost of reading and writing results on temporary storage. | It reduces the cost of query evaluation, as it does not include the cost of reading and writing temporary storage. |



Query optimization
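• It is the process of selecting the most efficient query evaluation plan from the available possible plans.
• Efficient plan: ∏ Cust_Name (σ Balance<2500 (Account) ⋈ Customer)
  The selection is applied first, so the join combines 2 records with 4 records.
• Alternative plan: ∏ Cust_Name (σ Balance<2500 (Account ⋈ Customer))
  The join combines 4 records with 4 records before the selection is applied.

Customer (4 records)
| Cid | Ano | Cust_name |
|---|---|---|
| C01 | A01 | Raj |
| C02 | A02 | Meet |
| C03 | A03 | Harsh |
| C04 | A04 | Punit |

Account (4 records)
| Ano | Balance |
|---|---|
| A01 | 3000 |
| A02 | 1000 |
| A03 | 2000 |
| A04 | 4000 |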
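To make the difference between the two plans concrete, the following sketch counts how many row comparisons a simple nested-loop join performs under each plan. The counting helper `join_count` is an assumption introduced for this illustration; real optimizers estimate cost from catalog statistics rather than by running the plans.

```python
account = [
    {"Ano": "A01", "Balance": 3000},
    {"Ano": "A02", "Balance": 1000},
    {"Ano": "A03", "Balance": 2000},
    {"Ano": "A04", "Balance": 4000},
]
customer = [
    {"Cid": "C01", "Ano": "A01", "Cust_name": "Raj"},
    {"Cid": "C02", "Ano": "A02", "Cust_name": "Meet"},
    {"Cid": "C03", "Ano": "A03", "Cust_name": "Harsh"},
    {"Cid": "C04", "Ano": "A04", "Cust_name": "Punit"},
]


def join_count(left, right, attr):
    """Join on attr and also report how many row pairs were compared."""
    comparisons, out = 0, []
    for l in left:
        for r in right:
            comparisons += 1
            if l[attr] == r[attr]:
                out.append({**l, **r})
    return out, comparisons


# Plan 1: selection first, then join (2 records x 4 records).
selected = [a for a in account if a["Balance"] < 2500]
plan1_rows, plan1_cmp = join_count(selected, customer, "Ano")

# Plan 2: join first, then selection (4 records x 4 records).
joined, plan2_cmp = join_count(account, customer, "Ano")
plan2_rows = [r for r in joined if r["Balance"] < 2500]

names1 = sorted(r["Cust_name"] for r in plan1_rows)
names2 = sorted(r["Cust_name"] for r in plan2_rows)
print(names1, plan1_cmp)
print(names2, plan2_cmp)
```

Both plans return the same customer names, but the first compares half as many row pairs on this data.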



Approaches to Query Optimization
1. Exhaustive Search Optimization
• Generates all possible query plans and then selects the best plan.
• It provides the best solution, but generating every plan can be expensive for complex queries.
2. Heuristic Based Optimization
• Uses rule-based approaches for query optimization.
• Perform select and project operations before join operations. This is done by moving the select and project operations down the query tree, which reduces the number of tuples available for the join.
• Avoid cross-product operations, because they result in very large intermediate tables.
• These algorithms do not necessarily produce the best query plan.
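The "move selections down the query tree" heuristic can be sketched as a small tree rewrite. The tuple-based tree representation below is purely hypothetical and was chosen to keep the example short; a selection node records only the single attribute its predicate refers to.

```python
# Hypothetical node shapes used only in this sketch:
#   ("rel", name, [attributes])
#   ("select", attribute_used_by_predicate, child)
#   ("join", left, right)


def attributes(node):
    """Return the set of attributes produced by a subtree."""
    kind = node[0]
    if kind == "rel":
        return set(node[2])
    if kind == "select":
        return attributes(node[2])
    return attributes(node[1]) | attributes(node[2])  # join


def push_selection_down(node):
    """Move each selection below a join when only one side has its attribute."""
    kind = node[0]
    if kind == "rel":
        return node
    if kind == "join":
        return ("join", push_selection_down(node[1]), push_selection_down(node[2]))
    _, attr, child = node  # selection node
    child = push_selection_down(child)
    if child[0] == "join":
        _, left, right = child
        in_left = attr in attributes(left)
        in_right = attr in attributes(right)
        if in_left and not in_right:
            return ("join", push_selection_down(("select", attr, left)), right)
        if in_right and not in_left:
            return ("join", left, push_selection_down(("select", attr, right)))
    return ("select", attr, child)  # cannot be pushed any further


account = ("rel", "Account", ["Ano", "Balance"])
customer = ("rel", "Customer", ["Cid", "Ano", "Cust_name"])

# Selection on Balance applied after the join ...
plan = ("select", "Balance", ("join", account, customer))
# ... is rewritten so that it is applied to Account before the join.
optimized = push_selection_down(plan)
print(optimized)
```

A selection on `Ano` would stay above the join in this sketch, because both inputs contain that attribute.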



Transformation of relational expressions
• Equivalence Rules:
• To transform a relational-algebra expression, the optimizer uses equivalence rules, each of which describes how to transform an expression into a logically equivalent expression.
• An equivalence rule states that expressions of two forms are equivalent, because both expressions produce the same output.
• This means that we can replace an expression of the first form with an expression of the second form, and vice versa.



Transformation of relational expressions
• The optimizer uses various equivalence rules. To describe each rule, we use the following symbols:
• θ, θ1, θ2, … : predicates
• L1, L2, L3, … : lists of attributes
• E, E1, E2, … : relational-algebra expressions
• Let's discuss a number of equivalence rules:



Transformation of relational expressions
1. A combined selection operation can be split into a sequence of individual selections. This transformation is called the cascade of σ.

Customer
| Cid | Ano | Cust_name | Balance |
|---|---|---|---|
| C01 | 1 | Raj | 3000 |
| C02 | 2 | Meet | 1000 |
| C03 | 3 | Harsh | 2000 |
| C04 | 4 | Punit | 4000 |

Output
| Cid | Ano | Cust_name | Balance |
|---|---|---|---|
| C02 | 2 | Meet | 1000 |

σ Ano<3 ∧ Balance<2000 (Customer) = σ Ano<3 (σ Balance<2000 (Customer))

σ θ1∧θ2 (E) = σ θ1 (σ θ2 (E))
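The cascade of σ can be checked on the Customer table above with a few lines of Python. The `select` helper is an assumption made for this sketch; the last line additionally applies the two selections in the opposite order to show that the order does not matter.

```python
customer = [
    {"Cid": "C01", "Ano": 1, "Cust_name": "Raj", "Balance": 3000},
    {"Cid": "C02", "Ano": 2, "Cust_name": "Meet", "Balance": 1000},
    {"Cid": "C03", "Ano": 3, "Cust_name": "Harsh", "Balance": 2000},
    {"Cid": "C04", "Ano": 4, "Cust_name": "Punit", "Balance": 4000},
]


def select(rows, predicate):
    """Selection: keep only the rows that satisfy the predicate."""
    return [row for row in rows if predicate(row)]


def ano_lt_3(row):
    return row["Ano"] < 3


def balance_lt_2000(row):
    return row["Balance"] < 2000


# One combined selection with a conjunctive predicate.
combined = select(customer, lambda r: ano_lt_3(r) and balance_lt_2000(r))
# The same result as a cascade of two individual selections.
cascaded = select(select(customer, balance_lt_2000), ano_lt_3)
# Applying the individual selections in the other order.
swapped = select(select(customer, ano_lt_3), balance_lt_2000)
print(combined)
```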


Transformation of relational expressions
2. Selection operations are commutative.

Customer
| Cid | Ano | Cust_name | Balance |
|---|---|---|---|
| C01 | 1 | Raj | 3000 |
| C02 | 2 | Meet | 1000 |
| C03 | 3 | Harsh | 2000 |
| C04 | 4 | Punit | 4000 |

Output
| Cid | Ano | Cust_name | Balance |
|---|---|---|---|
| C02 | 2 | Meet | 1000 |

σ Ano<3 (σ Balance<2000 (Customer)) = σ Balance<2000 (σ Ano<3 (Customer))

σ θ1 (σ θ2 (E)) = σ θ2 (σ θ1 (E))


Transformation of relational expressions
3. If more than one projection operation is used in an expression, only the outermost projection is required, so all the inner projections can be skipped. (This holds when every attribute of the outer list also appears in the inner lists.)

Customer
| Cid | Ano | Cust_name | Balance |
|---|---|---|---|
| C01 | 1 | Raj | 3000 |
| C02 | 2 | Meet | 1000 |
| C03 | 3 | Harsh | 2000 |
| C04 | 4 | Punit | 4000 |

Output
| Cust_name |
|---|
| Raj |
| Meet |
| Harsh |
| Punit |

∏ Cust_name (∏ Ano, Cust_name (Customer)) = ∏ Cust_name (Customer)

∏ L1 (∏ L2 (… (∏ Ln (E)) …)) = ∏ L1 (E)
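A short check of this rule on the Customer table, as a sketch: the `project` helper below is an assumption of this example and removes duplicate rows, as relational projection does.

```python
customer = [
    {"Cid": "C01", "Ano": 1, "Cust_name": "Raj", "Balance": 3000},
    {"Cid": "C02", "Ano": 2, "Cust_name": "Meet", "Balance": 1000},
    {"Cid": "C03", "Ano": 3, "Cust_name": "Harsh", "Balance": 2000},
    {"Cid": "C04", "Ano": 4, "Cust_name": "Punit", "Balance": 4000},
]


def project(rows, attrs):
    """Projection with duplicate elimination, preserving first-seen order."""
    seen, out = set(), []
    for row in rows:
        key = tuple(row[a] for a in attrs)
        if key not in seen:
            seen.add(key)
            out.append(dict(zip(attrs, key)))
    return out


# Inner projection keeps Ano and Cust_name; outer keeps only Cust_name.
nested = project(project(customer, ["Ano", "Cust_name"]), ["Cust_name"])
# Only the outermost projection is actually needed.
direct = project(customer, ["Cust_name"])
print(nested == direct, direct)
```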


Transformation of relational expressions
4. A selection operation can be combined with a Cartesian product or a theta join.

Customer
| Cid | Ano | Cust_name |
|---|---|---|
| C01 | 1 | Raj |
| C02 | 2 | Meet |
| C03 | 3 | Harsh |
| C04 | 4 | Punit |

Balance
| Ano | Balance |
|---|---|
| 1 | 3000 |
| 2 | 1000 |
| 3 | 2000 |
| 4 | 4000 |

Output
| Cid | Ano | Cust_name | Balance |
|---|---|---|---|
| C01 | 1 | Raj | 3000 |
| C02 | 2 | Meet | 1000 |

σ Ano<3 (Customer ⋈ Balance) = (Customer) ⋈ σ Ano<3 (Balance)

σ θ (E1 × E2) = E1 ⋈θ E2

σ θ1 (E1 ⋈θ2 E2) = E1 ⋈θ1∧θ2 E2


Transformation of relational expressions
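5. Theta join operations are commutative.

Customer
| Cid | Ano | Cust_name |
|---|---|---|
| C01 | 1 | Raj |
| C02 | 2 | Meet |
| C03 | 3 | Harsh |
| C04 | 4 | Punit |

Balance
| Ano | Balance |
|---|---|
| 1 | 3000 |
| 2 | 1000 |
| 3 | 2000 |
| 4 | 4000 |

Output
| Cid | Ano | Cust_name | Balance |
|---|---|---|---|
| C01 | 1 | Raj | 3000 |
| C02 | 2 | Meet | 1000 |

(Balance) ⋈ σ Ano<3 (Customer) = (Customer) ⋈ σ Ano<3 (Balance)

E1 ⋈θ E2 = E2 ⋈θ E1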



Transformation of relational expressions
6. Natural join operations are associative.
(E1 ⋈ E2) ⋈ E3 = E1 ⋈ (E2 ⋈ E3)
7. The selection operation distributes over ∪, ∩ and –.
σ θ (E1 – E2) = σ θ (E1) – σ θ (E2)

Similarly, the selection operation distributes over ∪ and ∩.



Transformation of relational expressions
8. The set operations union and intersection are commutative. Set difference is not commutative.

Customer
| Cust_name |
|---|
| Raj |
| Meet |

Employee
| Emp_name |
|---|
| Meet |
| Suresh |

Union output
| Name |
|---|
| Raj |
| Meet |
| Suresh |

Intersect output
| Name |
|---|
| Meet |

Customer ∪ Employee = Employee ∪ Customer

Customer ∩ Employee = Employee ∩ Customer

E1 ∪ E2 = E2 ∪ E1
E1 ∩ E2 = E2 ∩ E1
Transformation of relational expressions
9. The set operations union and intersection are associative.

Customer
| Cust_name |
|---|
| Raj |
| Meet |

Employee
| Emp_name |
|---|
| Meet |
| Suresh |

Student
| Emp_name |
|---|
| Raj |
| Meet |

Union output
| Name |
|---|
| Raj |
| Meet |
| Suresh |

Intersect output
| Name |
|---|
| Meet |

(Customer ∪ Employee) ∪ Student = Customer ∪ (Employee ∪ Student)

(Customer ∩ Employee) ∩ Student = Customer ∩ (Employee ∩ Student)

(E1 ∪ E2) ∪ E3 = E1 ∪ (E2 ∪ E3)

(E1 ∩ E2) ∩ E3 = E1 ∩ (E2 ∩ E3)
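The commutativity and associativity of union and intersection can be checked with Python's built-in `set` type, using the names from the Customer, Employee and Student tables. This is a sketch that treats each single-column relation as a set of names.

```python
customer = {"Raj", "Meet"}
employee = {"Meet", "Suresh"}
student = {"Raj", "Meet"}

# Union and intersection are commutative.
union_commutes = (customer | employee) == (employee | customer)
intersection_commutes = (customer & employee) == (employee & customer)

# Union and intersection are associative.
union_left = (customer | employee) | student
union_right = customer | (employee | student)
inter_left = (customer & employee) & student
inter_right = customer & (employee & student)

# Set difference is not commutative.
diff_ab = customer - employee
diff_ba = employee - customer

print(sorted(union_left), sorted(inter_left), sorted(diff_ab), sorted(diff_ba))
```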
Join Operation



Join Operations
• There are several different algorithms that can be used to implement joins:
1. Nested-Loop Join
2. Block Nested-Loop Join
3. Index Nested-Loop Join
4. Sort-Merge Join
5. Hash-Join



Nested-Loop Join
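• In the nested-loop join (NLJ) method, one table is treated as the outer table and the other as the inner table.
• When one table is small and the other is large, the small table is considered the outer table and the large table the inner table.
• For each row of the outer table, all the rows of the inner table are compared one by one. If a pair matches, it is included in the output; otherwise it is skipped.
• It is a very costly type of join, because every row of the outer table is compared with every row of the inner table.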
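A minimal sketch of the nested-loop join in Python follows. The two small tables and the attribute names (`k`, `b_val`, `a_val`) are hypothetical sample data, not taken from the slides; the comparison counter is included only to show why the method is costly.

```python
def nested_loop_join(outer, inner, attr):
    """Compare every outer row with every inner row on the join attribute."""
    result = []
    comparisons = 0
    for o in outer:          # one pass over the outer table
        for i in inner:      # full scan of the inner table per outer row
            comparisons += 1
            if o[attr] == i[attr]:
                result.append({**o, **i})
    return result, comparisons


# Hypothetical sample data: the smaller table is used as the outer table.
table_b = [{"k": 4, "b_val": "b4"}, {"k": 7, "b_val": "b7"}, {"k": 1, "b_val": "b1"}]
table_a = [
    {"k": 2, "a_val": "a2"},
    {"k": 4, "a_val": "a4"},
    {"k": 6, "a_val": "a6"},
    {"k": 7, "a_val": "a7"},
    {"k": 9, "a_val": "a9"},
]

joined, comparisons = nested_loop_join(table_b, table_a, "k")
print([row["k"] for row in joined], comparisons)
```

With 3 outer rows and 5 inner rows the loop performs 3 × 5 = 15 comparisons, whatever the number of matches.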



Nested-Loop Join (Cont..)

[Figure: nested-loop join example showing Table A (inner table), Table B (outer table) and the resulting Output.]



Block Nested-Loop Join
• The block nested-loop join (BNLJ) method reads a block of records from each table and compares the records in them, so the processing is done block-wise.
• It traverses the blocks of records in a loop. The inner table is read only once for each outer block, whereas in the nested-loop join the inner table is read once for each record of the outer table.
• Hence the cost of this method is much lower than that of the nested-loop method.
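The block-wise processing can be sketched as follows. Lists of rows stand in for disk blocks, and `block_size`, the helper names and the sample data are all assumptions of this illustration; the counter records how many inner blocks are read.

```python
def blocks(rows, block_size):
    """Split a table into consecutive blocks of at most block_size rows."""
    for start in range(0, len(rows), block_size):
        yield rows[start:start + block_size]


def block_nested_loop_join(outer, inner, attr, block_size):
    """Scan the inner table once per outer block instead of once per outer row."""
    result = []
    inner_block_reads = 0
    for outer_block in blocks(outer, block_size):
        for inner_block in blocks(inner, block_size):
            inner_block_reads += 1
            for o in outer_block:
                for i in inner_block:
                    if o[attr] == i[attr]:
                        result.append({**o, **i})
    return result, inner_block_reads


# Hypothetical sample data: 4 outer rows (2 blocks), 6 inner rows (3 blocks).
outer = [{"k": k, "r_val": f"r{k}"} for k in (1, 4, 5, 7)]
inner = [{"k": k, "s_val": f"s{k}"} for k in (2, 4, 6, 7, 9, 10)]

joined, inner_block_reads = block_nested_loop_join(outer, inner, "k", block_size=2)
print([row["k"] for row in joined], inner_block_reads)
```

Here the inner table's 3 blocks are read once per outer block, 2 × 3 = 6 block reads. A plain nested-loop join over the same data would read them once per outer record, 4 × 3 = 12 block reads.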



Block Nested-Loop Join (Cont..)

[Figure: block nested-loop join of relations R and S.]



Index Nested-Loop Join
• The cost comparison for BNLJ raises the question of what happens if there is an index on the columns used in the join condition. When an index is used, there is no sequential scan of the records.
• When there is an index on the join columns of the inner table, the join does not scan every record of the inner table. It fetches the matching records directly.
• However, there is a cost for looking up the index for each record of the outer table.
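A sketch of the index nested-loop join, where a Python dictionary stands in for the index on the inner table's join column. A real DBMS would typically use a B+-tree or hash index; the dictionary, the helper names and the sample data are assumptions of this example.

```python
from collections import defaultdict


def build_index(rows, attr):
    """Build an in-memory index: join-attribute value -> matching rows."""
    index = defaultdict(list)
    for row in rows:
        index[row[attr]].append(row)
    return index


def index_nested_loop_join(outer, inner_index, attr):
    """For each outer row, fetch matching inner rows through the index."""
    result = []
    lookups = 0
    for o in outer:
        lookups += 1  # one index lookup per outer row, no inner scan
        for i in inner_index.get(o[attr], []):
            result.append({**o, **i})
    return result, lookups


# Hypothetical sample data.
outer = [{"k": k, "r_val": f"r{k}"} for k in (1, 4, 5, 7)]
inner = [{"k": k, "s_val": f"s{k}"} for k in (2, 4, 6, 7, 9, 10)]

inner_index = build_index(inner, "k")
joined, lookups = index_nested_loop_join(outer, inner_index, "k")
print([row["k"] for row in joined], lookups)
```

The number of lookups equals the number of outer rows, which mirrors the per-record index cost mentioned in the last bullet.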



Sort-Merge Join
• Basic idea: first sort both relations on the join attribute (if they are not already sorted this way).
• Every pair of tuples with the same value of the join attribute must be matched.
• If there are no repeated join attribute values, each tuple needs to be read only once. As a result, each block is read only once. Thus, the number of block accesses for the merge step is BR + BS (plus the cost of sorting, if the relations are not already sorted).
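The sort and merge steps can be sketched in Python as follows. This version also handles repeated join values on either side; the function name, attribute names and sample rows are assumptions made for the illustration.

```python
def sort_merge_join(r, s, attr):
    """Sort both inputs on the join attribute, then merge them in one pass."""
    r = sorted(r, key=lambda row: row[attr])
    s = sorted(s, key=lambda row: row[attr])
    result = []
    i = j = 0
    while i < len(r) and j < len(s):
        if r[i][attr] < s[j][attr]:
            i += 1
        elif r[i][attr] > s[j][attr]:
            j += 1
        else:
            key = r[i][attr]
            # Find the end of the group of s rows sharing this key.
            j_end = j
            while j_end < len(s) and s[j_end][attr] == key:
                j_end += 1
            # Pair every r row having this key with every s row in the group.
            while i < len(r) and r[i][attr] == key:
                for k in range(j, j_end):
                    result.append({**r[i], **s[k]})
                i += 1
            j = j_end
    return result


# Hypothetical sample data; key 4 is repeated in s.
r = [{"k": 7, "r_val": "r7"}, {"k": 1, "r_val": "r1"}, {"k": 4, "r_val": "r4"}]
s = [
    {"k": 4, "s_val": "s4a"},
    {"k": 2, "s_val": "s2"},
    {"k": 7, "s_val": "s7"},
    {"k": 4, "s_val": "s4b"},
]

joined = sort_merge_join(r, s, "k")
print([(row["k"], row["r_val"], row["s_val"]) for row in joined])
```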



Hash Join
• The hash join algorithm is used to perform natural join or equi-join operations.
• A hash function is used to partition the tuples of both relations into sets that have the same hash value on the join attribute.
• Applying the same hash function to the join attribute of both relations guarantees that any two records that join must be in the same pair of partitions (files).
• The main goal of using the hash function is to reduce the number of comparisons and so increase the efficiency of the join operation.
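A sketch of the partition and probe idea behind the hash join. In-memory lists stand in for the partition files, and the function names, the number of partitions and the sample data are assumptions of this example; integer keys are used so that Python's built-in `hash` gives the same partitioning on every run.

```python
def partition(rows, attr, n_partitions):
    """Place each row in the partition chosen by hashing its join attribute."""
    parts = [[] for _ in range(n_partitions)]
    for row in rows:
        parts[hash(row[attr]) % n_partitions].append(row)
    return parts


def hash_join(r, s, attr, n_partitions=3):
    """Join only rows that fall into the same pair of partitions."""
    result = []
    r_parts = partition(r, attr, n_partitions)
    s_parts = partition(s, attr, n_partitions)
    for r_part, s_part in zip(r_parts, s_parts):
        # Build an in-memory hash table on the s partition.
        table = {}
        for row in s_part:
            table.setdefault(row[attr], []).append(row)
        # Probe it with each row of the matching r partition.
        for row in r_part:
            for match in table.get(row[attr], []):
                result.append({**row, **match})
    return result


# Hypothetical sample data; key 4 is repeated in s.
r = [{"k": k, "r_val": f"r{k}"} for k in (1, 4, 5, 7)]
s = [
    {"k": 2, "s_val": "s2"},
    {"k": 4, "s_val": "s4a"},
    {"k": 7, "s_val": "s7"},
    {"k": 4, "s_val": "s4b"},
    {"k": 9, "s_val": "s9"},
]

joined = sorted(
    (row["k"], row["r_val"], row["s_val"]) for row in hash_join(r, s, "k")
)
print(joined)
```

Rows with different hash values are never compared with each other, which is where the reduction in comparisons comes from.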



Cost of computing for all joins
• R is called the outer relation and S the inner relation of the join.
  • Number of records of R: NR
  • Number of records of S: NS
  • Number of blocks of R: BR
  • Number of blocks of S: BS
  • c is the cost of a single selection on S using the join condition.

| Join | Worst Case | Best Case |
|---|---|---|
| Nested-Loop Join | BR + NR ∗ BS | BR + BS |
| Block Nested-Loop Join | BR + BR ∗ BS | BR + BS |
| Index Nested-Loop Join | BR + NR ∗ c | |
| Merge Join | BR + BS | |
| Hash-Join | 3 ∗ (BR + BS) | |
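The formulas in the table can be evaluated for concrete sizes with a small helper. The function name and the input values below are hypothetical and chosen only to show how far apart the estimates can be.

```python
def join_costs(n_r, b_r, b_s, c):
    """Evaluate the block-access formulas from the cost table.

    n_r: records in R, b_r: blocks in R, b_s: blocks in S,
    c: cost of a single selection on S using the join condition.
    """
    return {
        "nested_loop_worst": b_r + n_r * b_s,
        "nested_loop_best": b_r + b_s,
        "block_nested_loop_worst": b_r + b_r * b_s,
        "block_nested_loop_best": b_r + b_s,
        "index_nested_loop": b_r + n_r * c,
        "merge": b_r + b_s,
        "hash": 3 * (b_r + b_s),
    }


# Hypothetical sizes: R has 5000 records in 100 blocks, S has 400 blocks.
costs = join_costs(n_r=5000, b_r=100, b_s=400, c=4)
for name, cost in costs.items():
    print(f"{name}: {cost}")
```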



Questions asked in GTU
1. Explain Query Processing steps. OR Discuss various steps of query
processing with proper diagram OR Explain query evaluation
process.
2. Explain Heuristics in Optimization.
3. Explain steps of the query processing.
4. Write down the measures of finding out the cost of a query in
query processing.
