Query Processing and Optimization
• Define Pipelining
• The result of a retrieval is a new relation, which may have been formed
from one or more relations. A sequence of relational algebra operations
forms a relational algebra expression, whose result is also a relation that
represents the result of a database query (or retrieval request).
•
input relation.
• Projection operator eliminates duplicates!
• Example expressions: π sname,rating (S2), π age (S2), π sname,rating (σ rating>8 (S2))

π sname,rating (S2):
sname    rating
yuppy    9
lubber   8
guppy    5
rusty    10

π sname,rating (σ rating>8 (S2)):
sname    rating
yuppy    9
rusty    10
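A minimal Python sketch of projection and selection over S2, treating a relation as a list of dicts. The sname and rating values come from the example tables; the age values, the reading of the condition as rating > 8, and the helper names `project` and `select` are assumptions made for illustration.

```python
# S2 rows: sname and rating follow the slide example; the ages are assumed.
S2 = [
    {"sname": "yuppy", "rating": 9, "age": 35.0},
    {"sname": "lubber", "rating": 8, "age": 55.5},
    {"sname": "guppy", "rating": 5, "age": 35.0},
    {"sname": "rusty", "rating": 10, "age": 35.0},
]


def project(relation, attrs):
    """Keep only the listed attributes and drop duplicate rows."""
    seen, result = set(), []
    for row in relation:
        key = tuple(row[a] for a in attrs)
        if key not in seen:  # duplicate elimination
            seen.add(key)
            result.append(dict(zip(attrs, key)))
    return result


def select(relation, predicate):
    """Keep only the rows that satisfy the predicate."""
    return [row for row in relation if predicate(row)]


# Three of the four assumed ages are equal, so projection collapses them.
ages = project(S2, ["age"])

# Selection first, then projection, as in the nested expression.
high_rated = project(select(S2, lambda r: r["rating"] > 8), ["sname", "rating"])
```

With the assumed ages, `ages` holds two rows instead of four, which is the duplicate elimination the slide points out.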
S1S 2 S1− S 2
• The result schema has one field per field of S1 and R1,
with field names 'inherited' if possible.
• There may be a naming conflict: both S1 and R1 have a field
with the same name.
• In this case, we can use the renaming operator:
R1 × S1 =
S1 ∩ S2 = S1 − (S1 − S2)
S1 R1 =
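The identity that expresses intersection through set difference can be checked with Python sets. This is a small sketch; the tuples in `S1` and `S2` are hypothetical sample rows, not data from the slides.

```python
# Hypothetical (sid, sname) rows; relations are modelled as sets of tuples.
S1 = {(22, "dustin"), (31, "lubber"), (58, "rusty")}
S2 = {(28, "yuppy"), (31, "lubber"), (44, "guppy"), (58, "rusty")}

# S1 - S2: rows of S1 that do not appear in S2.
difference = S1 - S2

# Removing those rows from S1 leaves exactly the rows shared with S2.
intersection_via_difference = S1 - (S1 - S2)
```

Because intersection can be derived this way, it is not a primitive operation: set difference alone is enough to express it.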
4/2/2024 27
Query Optimization
• There are two main techniques for query optimization,
although the two strategies are usually combined in
practice.
• Two Approaches:
• Heuristic Approach to query optimization
• Cost Estimation for the Relational Algebra Operations
• The first technique uses heuristic rules that order the
operations in a query.
• The other technique compares different strategies based on
their relative costs and selects the one that minimizes
resource usage.
• Since disk access is slow compared with memory access, disk
access tends to be the dominant cost in query processing for
a centralized DBMS, and it is the one that we concentrate on
exclusively when providing cost estimates.
Cont…
• Generally, we try to reduce the total execution time of the
query, which is the sum of the execution times of all
individual operations that make up the query.
• However, resource usage may also be viewed as the
response time of the query, in which case we concentrate
on maximizing the number of parallel operations.
• Since the problem is computationally intractable with a
large number of relations, the strategy adopted is generally
reduced to finding a near-optimal solution.
• Both methods of query optimization depend on database
statistics to evaluate properly the different options that are
available.
• The accuracy and currency of these statistics have a
significant bearing on the efficiency of the execution
strategy chosen.
Cont…
• The statistics cover information about relations, attributes,
and indexes.
• For example, the system catalog may store statistics giving
the cardinality of relations, the number of distinct values for
each attribute, and the number of levels in a multilevel
index.
• Keeping the statistics current can be problematic.
• If the DBMS updates the statistics every time a tuple is
inserted, updated, or deleted, this would have a significant
impact on performance during peak periods.
• An alternative, and generally preferable, approach is to
update the statistics on a periodic basis, for example nightly,
or whenever the system is idle.
• Another approach taken by some systems is to make it the
users’ responsibility to indicate when the statistics are to be
updated.
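The catalog statistics described above can be pictured as a small record per relation. The sketch below is illustrative only: the class `RelationStats`, the function `estimate_equality_matches`, and the figures are hypothetical, and the estimate assumes values are uniformly distributed, which is a common simplification rather than something the slides state.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class RelationStats:
    """Hypothetical catalog entry for one relation."""
    cardinality: int                                                # number of tuples
    distinct_values: Dict[str, int] = field(default_factory=dict)   # per attribute
    index_levels: Dict[str, int] = field(default_factory=dict)      # per multilevel index


def estimate_equality_matches(stats: RelationStats, attribute: str) -> float:
    """Expected tuples matching `attribute = constant`, assuming uniform values."""
    return stats.cardinality / stats.distinct_values[attribute]


# Illustrative figures: 1000 staff tuples spread over 50 branch numbers.
staff_stats = RelationStats(cardinality=1000, distinct_values={"branchNo": 50})
expected_per_branch = estimate_equality_matches(staff_stats, "branchNo")
```

If the stored cardinality or distinct-value count is stale, the estimate drifts with it, which is why the currency of the statistics matters.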
Heuristics Approach
• The heuristic approach uses knowledge of the characteristics of the relational
algebra operations and the relationships between the operators to optimize the
query.
Query Tree
• A query tree is a graphical representation of the operators, relations,
attributes and processing sequence during query processing.
• It is a tree data structure that corresponds to a relational
algebra expression. It represents the input relations of the
query as leaf nodes of the tree, and represents the
relational algebra operations as internal nodes.
• It is composed of three main parts:
• The Leaves: the base relations used for processing the query and extracting
the required information
• The Root: the final result relation, produced by the operations on
the relations used for query processing
• Nodes: intermediate results or relations before reaching the final result.
nodes and ends at the root.
ADB(SSoftware Engineering)
Cont’d…
• An execution of the query tree consists of executing an internal node
operation whenever its operands are available and then replacing that
internal node by the relation that results from executing the operation.
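The execution rule just described can be sketched in Python. This is a minimal illustration under stated assumptions: the node classes `Leaf`, `Select`, `Product`, the function `execute`, and the sample rows are all hypothetical, and attribute names are prefixed (`s_branchNo`, `b_branchNo`) to sidestep naming conflicts in the product.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Union

Row = Dict[str, str]
Relation = List[Row]


@dataclass
class Leaf:
    relation: Relation  # base relation (leaf node)


@dataclass
class Select:
    predicate: Callable[[Row], bool]
    child: "Node"


@dataclass
class Product:
    left: "Node"
    right: "Node"


Node = Union[Leaf, Select, Product]


def execute(node: Node) -> Relation:
    """Evaluate operands first, then replace the node by its result relation."""
    if isinstance(node, Leaf):
        return node.relation
    if isinstance(node, Select):
        return [r for r in execute(node.child) if node.predicate(r)]
    if isinstance(node, Product):
        left, right = execute(node.left), execute(node.right)
        return [{**l, **r} for l in left for r in right]
    raise TypeError(f"unknown node type: {type(node).__name__}")


# Hypothetical sample data.
staff = [
    {"staffNo": "S1", "position": "Manager", "s_branchNo": "B1"},
    {"staffNo": "S2", "position": "Assistant", "s_branchNo": "B1"},
    {"staffNo": "S3", "position": "Manager", "s_branchNo": "B2"},
]
branch = [
    {"b_branchNo": "B1", "city": "London"},
    {"b_branchNo": "B2", "city": "Glasgow"},
]

# Root: selection; internal node: Cartesian product; leaves: base relations.
tree = Select(
    lambda r: r["position"] == "Manager"
    and r["city"] == "London"
    and r["s_branchNo"] == r["b_branchNo"],
    Product(Leaf(staff), Leaf(branch)),
)
result = execute(tree)
```

The recursion makes the ordering explicit: an internal node runs only after both of its operands have been reduced to relations.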
• Query graph:
• Issues
• Cost function
• Number of execution strategies to be considered
• Cost Components for Query Execution
1. Access cost to secondary storage
2. Storage cost
3. Computation/calculation cost
4. Memory usage cost
5. Communication cost
Cont’d…
• The main idea is to minimize the cost of processing a query.
• The cost function comprises:
I/O cost + CPU processing cost + communication cost + Storage cost
• These components might have different weights in different
processing environments
• The DBMS uses information stored in the system catalogue
to estimate cost.
• The main target of query optimization is to minimize the size of
the intermediate relations.
• The size affects the cost of:
• Disk Access
• Data Transportation
• Storage space in the Primary Memory
• Writing on Disk
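The weighted cost function above can be sketched as follows. The class `CostWeights`, the function `total_cost`, and all numeric values are hypothetical; the point is only that the same plan can score differently once the components are weighted for a given environment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CostWeights:
    """Hypothetical per-environment weights for the four cost components."""
    io: float = 1.0
    cpu: float = 1.0
    communication: float = 1.0
    storage: float = 1.0


def total_cost(io: float, cpu: float, communication: float, storage: float,
               weights: CostWeights) -> float:
    """I/O cost + CPU cost + communication cost + storage cost, each weighted."""
    return (weights.io * io
            + weights.cpu * cpu
            + weights.communication * communication
            + weights.storage * storage)


# Equal weights count every component.
equal = total_cost(10, 20, 30, 40, CostWeights())

# A centralized DBMS that concentrates on disk access ignores the rest.
disk_only = CostWeights(io=1.0, cpu=0.0, communication=0.0, storage=0.0)
centralized = total_cost(10, 20, 30, 40, disk_only)
```

A distributed environment might instead raise the communication weight, which is one way to read the remark that the components carry different weights in different processing environments.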
Example: Cost Estimation for Query Optimization
Assume:
– 1000 tuples in Staff
– 50 tuples in Branch
– 50 Managers (one for each branch)
– 5 London branches
– No indexes or sort keys
– All temporary results are written back to disk (memory is small)
– Tuples are accessed one at a time (not in blocks)
Q. Find all managers who work in a London branch
SELECT *
FROM Staff s, Branch b
WHERE s.branchNo = b.branchNo AND s.position = 'Manager' AND
b.city = 'London';
• σ (position='Manager') ∧ (city='London') ∧ (Staff.branchNo=Branch.branchNo) (Staff × Branch)
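Under the assumptions listed above, the disk-access cost of this expression can be counted tuple by tuple. The sketch below does that for the Cartesian-product strategy and, for comparison, for a select-first alternative that does not appear in the slides. The function names are hypothetical, and the alternative assumes the join reads each temporary relation once.

```python
STAFF, BRANCH = 1000, 50            # tuples in each base relation
MANAGERS, LONDON_BRANCHES = 50, 5   # tuples that survive each selection


def cost_product_then_select() -> int:
    """Strategy shown above: form Staff x Branch, then apply the selection."""
    read_inputs = STAFF + BRANCH        # read both base relations
    product = STAFF * BRANCH            # tuples in the temporary product
    write_temp = product                # temporary result written to disk
    read_temp = product                 # read back to apply the selection
    return read_inputs + write_temp + read_temp


def cost_select_then_join() -> int:
    """Assumed alternative: apply both selections first, then join."""
    select_staff = STAFF + MANAGERS             # read Staff, write managers
    select_branch = BRANCH + LONDON_BRANCHES    # read Branch, write London branches
    join = MANAGERS + LONDON_BRANCHES           # read both temporaries once (assumption)
    return select_staff + select_branch + join


product_first = cost_product_then_select()
select_first = cost_select_then_join()
```

The product strategy costs (1000 + 50) + 2 × (1000 × 50) = 101,050 disk accesses, while the assumed select-first plan costs 1,160. The gap comes from the size of the intermediate relation.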
• Calculate the cost estimate for the following queries and show all
the necessary steps clearly.
• Submission Date: 27/07/2016 E.C. until 10:00 LT.