DBMS Unit-5

3130703
Database Management
Systems
Unit-5
Query Processing
and Optimization
Topics to be covered
• Overview (Query Processing)
• Measures of query cost
• Evaluation of expressions
• Query optimization
• Transformation of relational expressions
• Sorting and join
Unit – 5: Query Processing & Optimization 2

Query Processing
 Query processing is the set of activities involved in extracting data from
the database.
 Query processing takes several steps to fetch the data from the database.
 The steps involved are:

1. Parsing and translation
2. Optimization
3. Evaluation

Steps in Query Processing
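[Figure: the query passes through the scanner, parser and translator, which produce a relational algebra expression; the optimizer turns that expression into an execution plan; the evaluation engine runs the plan and returns the query output.]
 The scanner verifies attribute names and relation names.
 The parser checks the syntax of the query.
 The translator translates the query into its internal form (relational algebra).
 The optimizer chooses the best execution plan, using statistics about the data from the database catalog.
 The evaluation engine executes the query-evaluation plan against the data and returns the output.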

1. Parsing and translation
 When a user submits a query, the system first generates its internal
form: the parser checks the syntax of the query and verifies that the
relation names and attribute names it uses exist in the database.
 The query is then translated into relational algebra.
 After translation, each relational algebra operation can be executed
using one of several different algorithms. This is where query
processing begins.

2. Optimization
 A database system generates an efficient query evaluation
plan, one that minimizes the cost of the query. This task is performed by
the database system and is known as query optimization.
 To optimize a query, the query optimizer needs an
estimated cost for each operation.
 This is because the overall cost depends on the
memory allocated to each operation, the execution costs,
and so on.

3. Evaluation
 To fully evaluate a query, the system needs to
construct a query evaluation plan.
 The annotations in the evaluation plan specify the
algorithm to use for each operation and the particular index to
use, if any.
 The query evaluation plan is also referred to as the query
execution plan.
 The query execution engine is responsible for generating the
output of the given query. It takes the query execution plan,
executes it, and returns the output for the user query.

Query Cost

Measures of Query Cost
 Cost is generally measured as the total time required to execute a
statement/query.
 Factors that contribute to the time cost:
1. Disk accesses (time to process a data request and retrieve the required
data from the storage device)
2. CPU time to execute a query
3. Network communication cost
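As a rough illustration, the three factors above can be added into a single time estimate. The function name, its parameters, and the per-access timing are hypothetical values chosen for this sketch, not figures from any real system.

```python
def estimate_query_time(disk_accesses, cpu_seconds, network_seconds,
                        seconds_per_disk_access=0.01):
    """Return an estimated total time (in seconds) for one query.

    seconds_per_disk_access is an assumed average cost of one disk access.
    """
    disk_seconds = disk_accesses * seconds_per_disk_access
    return disk_seconds + cpu_seconds + network_seconds


# Made-up numbers: 500 disk accesses, 0.2 s of CPU, 0.05 s of network time.
total = estimate_query_time(500, 0.2, 0.05)
print(f"estimated time: {total:.2f} s")
```

With these assumed numbers the disk term dominates the other two.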

Evaluation of expressions
 In query processing, there are two methods for
evaluating an expression that contains multiple operations. These
methods are:
1. Materialization
2. Pipelining

Materialization
 Evaluate one operation at a time, starting at the lowest level.
 The intermediate result of each operation is materialized (stored
in a temporary relation) and becomes the input for the next operation.
 The cost of materialization is the sum of the costs of the individual
operations plus the cost of writing the intermediate results to
disk.
 The problems with materialization are that
• it creates many temporary relations
• it performs many I/O operations

Materialization
 An expression may contain more than one operation, which makes it
harder to evaluate.
 ∏ Cust_Name (σ Balance<2500 (account) ⋈ customer)
 To evaluate such an expression, we evaluate each operation
one by one in the appropriate order.
[Figure: expression tree, executed bottom to top. Step 1: σ Balance<2500 on account. Step 2: ⋈ of that result with customer. Step 3: ∏ Cust_Name at the root.]
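A minimal sketch of materialized evaluation for the expression above, assuming small in-memory tables in place of disk-resident relations. The helper names (select, natural_join, project) and the sample rows are illustrative; each step stores its complete result in a temporary list before the next step starts.

```python
# Illustrative sample rows for the account and customer relations.
account = [
    {"Ano": "A01", "Balance": 3000},
    {"Ano": "A02", "Balance": 1000},
    {"Ano": "A03", "Balance": 2000},
    {"Ano": "A04", "Balance": 4000},
]
customer = [
    {"Cid": "C01", "Ano": "A01", "Cust_name": "Raj"},
    {"Cid": "C02", "Ano": "A02", "Cust_name": "Meet"},
    {"Cid": "C03", "Ano": "A03", "Cust_name": "Harsh"},
    {"Cid": "C04", "Ano": "A04", "Cust_name": "Punit"},
]


def select(rows, predicate):
    """Selection: keep the rows that satisfy the predicate."""
    return [r for r in rows if predicate(r)]


def natural_join(left, right, attr):
    """Join on one common attribute (a simplification of natural join)."""
    return [{**l, **r} for l in left for r in right if l[attr] == r[attr]]


def project(rows, attrs):
    """Projection: keep only the listed attributes."""
    return [{a: r[a] for a in attrs} for r in rows]


# Step 1: selection, fully stored in a temporary relation.
temp1 = select(account, lambda r: r["Balance"] < 2500)
# Step 2: join, reading temp1 and storing a second temporary relation.
temp2 = natural_join(temp1, customer, "Ano")
# Step 3: projection at the root of the tree.
result = project(temp2, ["Cust_name"])
print(result)
```

In a real system temp1 and temp2 would be written to disk, which is where the extra I/O cost of materialization comes from.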
Pipelining
 In pipelining, operations form a queue, and results are passed from
one operation to the next as they are computed.
 To reduce the number of intermediate temporary relations, we pass the
results of one operation directly to the next operation in the pipeline.
 Combining operations into a pipeline eliminates the cost of reading
and writing temporary relations.
 Pipelines can be executed in two ways:
1. Demand driven: the results of a lower-level operation are not
passed to the higher level automatically. They are passed up only
when the higher-level operation requests them.
2. Producer driven: the lower-level operations eagerly pass their
results to the higher-level operations, without waiting for the higher level to
request them.
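Demand-driven pipelining maps naturally onto Python generators: a row is produced only when the operation above asks for it. The sketch below is an illustration under that assumption; the operator names and the sample rows are hypothetical, and the scanned list exists only to show how far the scan has progressed.

```python
account = [
    {"Ano": "A01", "Balance": 3000},
    {"Ano": "A02", "Balance": 1000},
    {"Ano": "A03", "Balance": 2000},
    {"Ano": "A04", "Balance": 4000},
]

scanned = []  # records which rows the scan has produced so far


def scan(rows):
    """Lowest-level operation: hand out rows one at a time, on request."""
    for row in rows:
        scanned.append(row["Ano"])
        yield row


def select(source, predicate):
    """Selection that pulls rows from the operation below it."""
    for row in source:
        if predicate(row):
            yield row


def project(source, attrs):
    """Projection that pulls rows from the operation below it."""
    for row in source:
        yield {a: row[a] for a in attrs}


# No temporary relation is built: the three operations are chained.
pipeline = project(select(scan(account), lambda r: r["Balance"] < 2500), ["Ano"])

first = next(pipeline)               # the top operation requests one row
scanned_after_first = list(scanned)  # the scan stopped at the first match
rest = list(pipeline)                # request the remaining rows
print(first, scanned_after_first, rest)
```

After the first request only two rows of account have been read, because the scan advances no further than the rows needed to answer that request.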

Comparison
Materialization | Pipelining
It is a traditional approach to evaluating multiple operations. | It is a modern approach to evaluating multiple operations.
It uses temporary relations to store the results of the evaluated operations, so it needs more temporary files and I/O. | It does not use temporary relations to store the results of the evaluated operations.
It is less efficient, as it takes time to generate the query results. | It is more efficient, as it generates the results quickly.
It does not have any higher requirements for query evaluation. | It requires a high rate for generating outputs.
The overall cost includes the cost of the operations plus the cost of reading and writing results on temporary storage. | It reduces the cost of query evaluation, as it does not include the cost of reading and writing temporary storage.

Query optimization
 It is a process of selecting the most efficient query evaluation
plan from the available possible plans.
 Cust_Name ( Balance<2500 (Account) Customer )
Efficient plan 2 records 4 records
 Cust_Name ( Balance<2500 (Account Customer ))
Customer 4 records Account 4 records

Cid Ano Cust_name Ano Balance
C01 A01 Raj A01 3000
C02 A02 Meet A02 1000
C03 A03 Harsh A03 2000
C04 A04 Punit A04 4000
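The difference between the two plans can be checked on the Customer and Account rows above. This is a sketch only: it counts the row pairs a simple join would compare as a stand-in for cost, and the helper name join_on_ano is hypothetical.

```python
customer = [
    {"Cid": "C01", "Ano": "A01", "Cust_name": "Raj"},
    {"Cid": "C02", "Ano": "A02", "Cust_name": "Meet"},
    {"Cid": "C03", "Ano": "A03", "Cust_name": "Harsh"},
    {"Cid": "C04", "Ano": "A04", "Cust_name": "Punit"},
]
account = [
    {"Ano": "A01", "Balance": 3000},
    {"Ano": "A02", "Balance": 1000},
    {"Ano": "A03", "Balance": 2000},
    {"Ano": "A04", "Balance": 4000},
]


def join_on_ano(left, right):
    """Join on Ano; also return how many row pairs were compared."""
    rows = [{**l, **r} for l in left for r in right if l["Ano"] == r["Ano"]]
    return rows, len(left) * len(right)


# Plan 1: select first, then join the 2 surviving rows with Customer.
selected = [a for a in account if a["Balance"] < 2500]
plan1_rows, plan1_pairs = join_on_ano(selected, customer)
plan1_names = sorted(r["Cust_name"] for r in plan1_rows)

# Plan 2: join all of Account with all of Customer, then select.
joined, plan2_pairs = join_on_ano(account, customer)
plan2_names = sorted(r["Cust_name"] for r in joined if r["Balance"] < 2500)

print(plan1_names, plan1_pairs)
print(plan2_names, plan2_pairs)
```

Both plans return the same customer names, but the first compares 2 × 4 row pairs where the second compares 4 × 4.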

Approaches to Query Optimization
1. Exhaustive Search Optimization
• Generates all possible query plans and then selects the best plan.
• It provides the best solution.
2. Heuristic Based Optimization
• Uses rule-based approaches for query optimization.
• Perform select and project operations before join operations. This is
done by moving the select and project operations down the query tree,
which reduces the number of tuples available for the join.
• Avoid cross-product operations because they result in very large
intermediate tables.
• These algorithms do not necessarily produce the best query plan.
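The "move selections down the query tree" heuristic can be sketched on a toy query tree. The tuple encoding of the tree and the function names below are assumptions made for this example; a real optimizer works on a much richer plan representation.

```python
# Hypothetical tree encoding used only in this sketch:
#   ("rel", name, attrs)
#   ("select", predicate_attrs, predicate_text, child)
#   ("join", left, right)


def attrs_of(node):
    """Return the set of attributes produced by a subtree."""
    kind = node[0]
    if kind == "rel":
        return set(node[2])
    if kind == "select":
        return attrs_of(node[3])
    return attrs_of(node[1]) | attrs_of(node[2])  # join


def push_select_down(node):
    """Move each selection below a join when one side has all its attributes."""
    kind = node[0]
    if kind == "rel":
        return node
    if kind == "join":
        return ("join", push_select_down(node[1]), push_select_down(node[2]))
    _, pred_attrs, pred_text, child = node  # kind == "select"
    child = push_select_down(child)
    if child[0] == "join":
        _, left, right = child
        if set(pred_attrs) <= attrs_of(left):
            pushed = push_select_down(("select", pred_attrs, pred_text, left))
            return ("join", pushed, right)
        if set(pred_attrs) <= attrs_of(right):
            pushed = push_select_down(("select", pred_attrs, pred_text, right))
            return ("join", left, pushed)
    return ("select", pred_attrs, pred_text, child)


account = ("rel", "Account", ["Ano", "Balance"])
customer = ("rel", "Customer", ["Cid", "Ano", "Cust_name"])

before = ("select", ["Balance"], "Balance<2500", ("join", account, customer))
after = push_select_down(before)
print(after)
```

Here the selection on Balance ends up directly above Account, so the join sees fewer tuples.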

Transformation of relational expressions
 Equivalence Rules:
 To transform a query, the optimizer uses equivalence rules, each of which
describes how to transform an expression into
a logically equivalent expression.
 An equivalence rule says that expressions of two forms are
equivalent because both expressions produce the same
output.
 It means that we can replace an expression of the first
form with one of the second form, and vice versa.
 To describe the equivalence rules, we will use the following symbols:
 θ, θ1, θ2 … : predicates.
 L1, L2, L3 … : lists of attributes.
 E, E1, E2 … : relational-algebra expressions.
 Let's discuss a number of equivalence rules:

1. A combined (conjunctive) selection operation can be divided into a sequence of
individual selections. This transformation is called cascade of σ.
Customer
Cid | Ano | Cust_name | Balance
C01 | 1 | Raj | 3000
C02 | 2 | Meet | 1000
C03 | 3 | Harsh | 2000
C04 | 4 | Punit | 4000
Output
Cid | Ano | Cust_name | Balance
C02 | 2 | Meet | 1000
σ Ano<3 ∧ Balance<2000 (Customer) = σ Ano<3 (σ Balance<2000 (Customer))
σθ1∧θ2 (E) = σθ1 (σθ2 (E))

2. Selection operations are commutative.
Customer
Cid | Ano | Cust_name | Balance
C01 | 1 | Raj | 3000
C02 | 2 | Meet | 1000
C03 | 3 | Harsh | 2000
C04 | 4 | Punit | 4000
Output
Cid | Ano | Cust_name | Balance
C02 | 2 | Meet | 1000
σ Ano<3 (σ Balance<2000 (Customer)) = σ Balance<2000 (σ Ano<3 (Customer))
σθ1 (σθ2 (E)) = σθ2 (σθ1 (E))
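Rules 1 and 2 can be checked on the Customer table with a few lines of Python. This is only a check on one sample table, not a proof; the select helper and the predicate function names are illustrative.

```python
customer = [
    {"Cid": "C01", "Ano": 1, "Cust_name": "Raj", "Balance": 3000},
    {"Cid": "C02", "Ano": 2, "Cust_name": "Meet", "Balance": 1000},
    {"Cid": "C03", "Ano": 3, "Cust_name": "Harsh", "Balance": 2000},
    {"Cid": "C04", "Ano": 4, "Cust_name": "Punit", "Balance": 4000},
]


def select(rows, predicate):
    """Selection: keep the rows that satisfy the predicate."""
    return [r for r in rows if predicate(r)]


def ano_lt_3(row):
    return row["Ano"] < 3


def balance_lt_2000(row):
    return row["Balance"] < 2000


# Rule 1: one combined selection equals a cascade of two selections.
combined = select(customer, lambda r: ano_lt_3(r) and balance_lt_2000(r))
cascade = select(select(customer, balance_lt_2000), ano_lt_3)

# Rule 2: the two selections can be applied in either order.
swapped = select(select(customer, ano_lt_3), balance_lt_2000)

print(combined == cascade == swapped)
print([r["Cid"] for r in combined])
```

All three forms return the single row for C02, matching the Output table.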

3. If more than one projection operation is used in an expression, then
only the outermost projection operation is required, so all the
inner projection operations can be skipped.
Customer
Cid | Ano | Cust_name | Balance
C01 | 1 | Raj | 3000
C02 | 2 | Meet | 1000
C03 | 3 | Harsh | 2000
C04 | 4 | Punit | 4000
Output
Cust_name
Raj
Meet
Harsh
Punit
∏ Cust_name (∏ Ano, Cust_name (Customer)) = ∏ Cust_name (Customer)
∏L1 (∏L2 (… (∏Ln (E))…)) = ∏L1 (E)

4. Selection operation can be combined with cartesian product and
theta join.
Customer
Cid | Ano | Cust_name
C01 | 1 | Raj
C02 | 2 | Meet
C03 | 3 | Harsh
C04 | 4 | Punit
Balance
Ano | Balance
1 | 3000
2 | 1000
3 | 2000
4 | 4000
Output
Cid | Ano | Cust_name | Balance
C01 | 1 | Raj | 3000
C02 | 2 | Meet | 1000
σ Ano<3 (Customer ⋈ Balance) = (Customer) ⋈ σ Ano<3 (Balance)
σθ (E1 × E2) = E1 ⋈θ E2
σθ1 (E1 ⋈θ2 E2) = E1 ⋈θ1∧θ2 E2

5. Theta join operations are commutative.
Customer
Cid | Ano | Cust_name
C01 | 1 | Raj
C02 | 2 | Meet
C03 | 3 | Harsh
C04 | 4 | Punit
Balance
Ano | Balance
1 | 3000
2 | 1000
3 | 2000
4 | 4000
Output
Cid | Ano | Cust_name | Balance
C01 | 1 | Raj | 3000
C02 | 2 | Meet | 1000
(Balance) ⋈ σ Ano<3 (Customer) = (Customer) ⋈ σ Ano<3 (Balance)
E1 ⋈θ E2 = E2 ⋈θ E1

6. Natural join operations are associative.
(E1 ⋈ E2) ⋈ E3 = E1 ⋈ (E2 ⋈ E3)
7. Selection operation distributes over U, ∩ and –.
σθ (E1 – E2) = σθ (E1) – σθ (E2)
Similarly, the selection operation distributes over U and ∩.

8. Set operations union and intersection are commutative.
Set difference is not commutative.
Customer (Cust_name): Raj, Meet
Employee (Emp_name): Meet, Suresh
Union output (Name): Raj, Meet, Suresh
Intersect output (Name): Meet
Customer U Employee = Employee U Customer
Customer ∩ Employee = Employee ∩ Customer
E1 U E2 = E2 U E1
E1 ∩ E2 = E2 ∩ E1
9. Set operations union and intersection are associative.
Customer (Cust_name): Raj, Meet
Employee (Emp_name): Meet, Suresh
Student (Emp_name): Raj, Meet
Union output (Name): Raj, Meet, Suresh
Intersect output (Name): Meet
(Customer U Employee) U Student = Customer U (Employee U Student)
(Customer ∩ Employee) ∩ Student = Customer ∩ (Employee ∩ Student)
(E1 U E2) U E3 = E1 U (E2 U E3)
(E1 ∩ E2) ∩ E3 = E1 ∩ (E2 ∩ E3)
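Rules 8 and 9 can be tried out with Python's built-in set type, using the names from the example tables. The variable names are illustrative, and a check on one data set is not a proof of the rules.

```python
customer = {"Raj", "Meet"}
employee = {"Meet", "Suresh"}
student = {"Raj", "Meet"}

# Rule 8: union and intersection are commutative.
union_commutes = (customer | employee) == (employee | customer)
intersect_commutes = (customer & employee) == (employee & customer)

# Set difference is not commutative for these tables.
difference_commutes = (customer - employee) == (employee - customer)

# Rule 9: union and intersection are associative.
union_associates = ((customer | employee) | student) == (customer | (employee | student))
intersect_associates = ((customer & employee) & student) == (customer & (employee & student))

print(union_commutes, intersect_commutes, difference_commutes)
print(union_associates, intersect_associates)
```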
Join Operation

Join Operations
 There are several different algorithms that can be used to
implement joins
1. Nested-Loop Join
2. Block Nested-Loop Join
3. Index Nested-Loop Join
4. Sort-Merge Join
5. Hash-Join

Nested-Loop Join
 In the nested-loop join (NLJ) method, one of the two tables is typically small and
the other large.
 The small table is treated as the outer table and the large table as the
inner table.
 For each row of the outer table, all the rows of the inner table are compared one by one;
a matching pair is included in the output, otherwise it is skipped.
 It is a very costly type of join.
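A minimal in-memory sketch of the nested-loop join described above. The function name, the comparison counter, and the sample rows are illustrative; a real system reads the tables from disk block by block.

```python
def nested_loop_join(outer, inner, key):
    """Compare every outer row with every inner row on the join key."""
    result = []
    comparisons = 0
    for o in outer:
        for i in inner:
            comparisons += 1
            if o[key] == i[key]:
                result.append({**o, **i})
    return result, comparisons


outer = [
    {"Ano": 1, "Cust_name": "Raj"},
    {"Ano": 2, "Cust_name": "Meet"},
]
inner = [
    {"Ano": 1, "Balance": 3000},
    {"Ano": 2, "Balance": 1000},
    {"Ano": 3, "Balance": 2000},
    {"Ano": 4, "Balance": 4000},
]

joined, comparisons = nested_loop_join(outer, inner, "Ano")
print(joined)
print(comparisons)
```

The number of comparisons is the product of the two table sizes, which is why this join is costly.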

Nested-Loop Join (Cont..)
[Figure: nested-loop join of Table A and Table B, one labelled inner table and the other outer table, with the matching values written to an Output table.]

Block Nested-Loop Join
 The block nested-loop join (BNLJ) method reads a block of records from each table and
compares the records in them, so the processing is done block
by block.
 It traverses the blocks of records in a loop. The inner table is
read only once for each block of the outer table,
whereas in nested-loop join the inner table is read once for each record
of the outer table.
 Hence the cost of this method is much lower than that of the nested-loop
method.
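The block-wise processing can be sketched as follows, assuming in-memory lists cut into fixed-size blocks. The function names, the block size, and the scan counter are choices made for this example.

```python
def blocks(rows, block_size):
    """Split a table into consecutive blocks of at most block_size rows."""
    for start in range(0, len(rows), block_size):
        yield rows[start:start + block_size]


def block_nested_loop_join(outer, inner, key, block_size):
    """Join block by block; the inner table is scanned once per outer block."""
    result = []
    inner_scans = 0
    for outer_block in blocks(outer, block_size):
        inner_scans += 1
        for inner_block in blocks(inner, block_size):
            for o in outer_block:
                for i in inner_block:
                    if o[key] == i[key]:
                        result.append({**o, **i})
    return result, inner_scans


customer = [
    {"Ano": 1, "Cust_name": "Raj"},
    {"Ano": 2, "Cust_name": "Meet"},
    {"Ano": 3, "Cust_name": "Harsh"},
    {"Ano": 4, "Cust_name": "Punit"},
]
balance = [
    {"Ano": 1, "Balance": 3000},
    {"Ano": 2, "Balance": 1000},
    {"Ano": 3, "Balance": 2000},
    {"Ano": 4, "Balance": 4000},
]

joined, inner_scans = block_nested_loop_join(customer, balance, "Ano", block_size=2)
print([r["Cust_name"] for r in joined], inner_scans)
```

With four outer rows in blocks of two, the inner table is scanned twice instead of four times.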

Index Nested-Loop Join
 The cost of BNLJ raises the question of what
happens if there is an index on the columns used in the join condition.
When an index is used, there is no sequential scan of the records.
 When there is an index on the columns used in the join
condition, the join does not scan every record of the inner table. It
fetches the matching records directly.
 There is, however, a cost for looking up the index.
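As an illustration, a Python dictionary can stand in for the index on the inner table's join column. The helper names are hypothetical, and a real index (for example a B+-tree) has its own lookup cost, which this sketch ignores.

```python
from collections import defaultdict


def build_index(rows, key):
    """Map each join-key value to the inner rows that carry it."""
    index = defaultdict(list)
    for row in rows:
        index[row[key]].append(row)
    return index


def index_nested_loop_join(outer, inner_index, key):
    """For each outer row, fetch matching inner rows through the index."""
    result = []
    for o in outer:
        for i in inner_index.get(o[key], []):  # no scan of the inner table
            result.append({**o, **i})
    return result


customer = [
    {"Ano": 1, "Cust_name": "Raj"},
    {"Ano": 2, "Cust_name": "Meet"},
    {"Ano": 5, "Cust_name": "Nobody"},  # made-up row with no matching account
]
balance = [
    {"Ano": 1, "Balance": 3000},
    {"Ano": 2, "Balance": 1000},
    {"Ano": 3, "Balance": 2000},
]

balance_index = build_index(balance, "Ano")
joined = index_nested_loop_join(customer, balance_index, "Ano")
print(joined)
```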

Sort-Merge Join
 Basic idea: first sort both relations on the join attribute (if they are not already
sorted this way).
 Every pair of tuples with the same value of the join attribute must be matched.
 If there are no repeated join attribute values, each tuple needs to be read
only once. As a result, each block is read only once. Thus, the
number of block accesses is BR + BS.
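The merge step can be sketched in Python as below. This version also handles repeated join values by pairing the whole group of equal rows on each side; the function name and sample rows are illustrative.

```python
def sort_merge_join(r, s, key):
    """Sort both inputs on the join key, then merge them in one pass."""
    r = sorted(r, key=lambda row: row[key])
    s = sorted(s, key=lambda row: row[key])
    result = []
    i = j = 0
    while i < len(r) and j < len(s):
        if r[i][key] < s[j][key]:
            i += 1
        elif r[i][key] > s[j][key]:
            j += 1
        else:
            value = r[i][key]
            # Find the end of the group of equal-valued rows on each side.
            i_end = i
            while i_end < len(r) and r[i_end][key] == value:
                i_end += 1
            j_end = j
            while j_end < len(s) and s[j_end][key] == value:
                j_end += 1
            for left in r[i:i_end]:
                for right in s[j:j_end]:
                    result.append({**left, **right})
            i, j = i_end, j_end
    return result


customer = [
    {"Ano": "A03", "Cust_name": "Harsh"},
    {"Ano": "A01", "Cust_name": "Raj"},
    {"Ano": "A02", "Cust_name": "Meet"},
]
account = [
    {"Ano": "A02", "Balance": 1000},
    {"Ano": "A03", "Balance": 2000},
    {"Ano": "A01", "Balance": 3000},
]

joined = sort_merge_join(customer, account, "Ano")
print([(row["Cust_name"], row["Balance"]) for row in joined])
```

Once both inputs are sorted, each side is walked only once, which corresponds to the BR + BS block accesses mentioned above.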

Hash Join
 The hash join algorithm is used to perform natural join or
equi-join operations.
 A hash function is used to partition the tuples of both relations into
sets that have the same hash value on the join attribute.
 Using the same hash function on the join attribute of both relations guarantees that any two
joining records fall in the same pair of partitions.
 The main goal of using the hash function is to
reduce the number of comparisons and so make the join more efficient.
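A minimal sketch of the partitioning idea, assuming in-memory lists in place of partition files. The function names and the number of partitions are choices made for this example.

```python
def partition(rows, key, n_partitions):
    """Place each row in the partition chosen by hashing its join attribute."""
    parts = [[] for _ in range(n_partitions)]
    for row in rows:
        parts[hash(row[key]) % n_partitions].append(row)
    return parts


def hash_join(r, s, key, n_partitions=4):
    """Partition both relations, then join only matching partition pairs."""
    result = []
    r_parts = partition(r, key, n_partitions)
    s_parts = partition(s, key, n_partitions)
    for r_part, s_part in zip(r_parts, s_parts):
        # Build an in-memory hash table on one partition, probe with the other.
        table = {}
        for row in s_part:
            table.setdefault(row[key], []).append(row)
        for row in r_part:
            for match in table.get(row[key], []):
                result.append({**row, **match})
    return result


customer = [
    {"Ano": 1, "Cust_name": "Raj"},
    {"Ano": 2, "Cust_name": "Meet"},
    {"Ano": 3, "Cust_name": "Harsh"},
    {"Ano": 4, "Cust_name": "Punit"},
]
balance = [
    {"Ano": 1, "Balance": 3000},
    {"Ano": 2, "Balance": 1000},
    {"Ano": 3, "Balance": 2000},
    {"Ano": 4, "Balance": 4000},
]

joined = hash_join(customer, balance, "Ano")
pairs = sorted((row["Cust_name"], row["Balance"]) for row in joined)
print(pairs)
```

Rows are only ever compared with rows from the partition that has the same hash value, so most non-matching pairs are never examined.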

Cost of computing for all joins
 R is called the outer and S the inner relation of the join.
• Number of records of R: NR
• Number of records of S: NS
• Number of blocks of R: BR
• Number of blocks of S: BS
• c is the cost of a single selection on S using the join condition.
Join | Worst Case | Best Case
Nested-Loop Join | BR + NR ∗ BS | BR + BS
Block Nested-Loop Join | BR + BR ∗ BS | BR + BS
Index Nested-Loop Join | BR + NR ∗ c
Merge Join | BR + BS
Hash-Join | 3 ∗ (BR + BS)
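The cost formulas above can be evaluated for a concrete case. The function name and the input numbers below are made up for illustration; the result is a count of block accesses, not a time.

```python
def join_costs(n_r, b_r, b_s, c):
    """Evaluate the join cost formulas (in block accesses).

    n_r: records in R, b_r: blocks in R, b_s: blocks in S,
    c: cost of a single selection on S using the join condition.
    """
    return {
        "nested_loop_worst": b_r + n_r * b_s,
        "nested_loop_best": b_r + b_s,
        "block_nested_loop_worst": b_r + b_r * b_s,
        "block_nested_loop_best": b_r + b_s,
        "index_nested_loop": b_r + n_r * c,
        "merge": b_r + b_s,
        "hash": 3 * (b_r + b_s),
    }


# Assumed sizes: R has 5000 records in 100 blocks, S has 400 blocks, c = 4.
costs = join_costs(n_r=5000, b_r=100, b_s=400, c=4)
for name, cost in costs.items():
    print(f"{name}: {cost}")
```

For these assumed sizes the worst-case nested-loop join needs about two million block accesses, while merge join and hash join need only a few hundred to a few thousand.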

Questions asked in GTU
1. Explain Query Processing steps. OR Discuss various steps of query
processing with proper diagram OR Explain query evaluation
process.
2. Explain Heuristics in Optimization.
3. Explain steps of the query processing.
4. Write down the measures of finding out the cost of a query in
query processing.
