QUERY PROCESSOR
Query execution
Executes the steps of a given query
Query optimization
Manual (by the user) or automatic (by the DBMS)
OPTIMIZATION: SELECTING ALTERNATIVES
A query can usually be evaluated in several equivalent ways, all producing the same result
Analogy: 2+2, 1+3, 2*2 and 2^2 all evaluate to 4
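The same idea applies to relational queries. The following minimal Python sketch uses hypothetical toy relations (`students`, `enrolled`) and a made-up cost measure, the number of tuple comparisons in a nested-loop join, to compare two equivalent plans: join then select, and select then join. Both return the same tuples; only the cost differs.

```python
# Hypothetical toy relations; names and data are illustrative only.
students = [{"sid": i, "dept": "CS" if i % 4 == 0 else "EE"} for i in range(100)]
enrolled = [{"sid": i % 100, "course": f"C{i % 7}"} for i in range(300)]

def nested_loop_join(left, right, key):
    """Join two lists of dicts on `key`; also return the number of tuple comparisons."""
    out, comparisons = [], 0
    for l in left:
        for r in right:
            comparisons += 1
            if l[key] == r[key]:
                out.append({**l, **r})
    return out, comparisons

def plan_join_then_select():
    # Plan 1: join everything, then keep only CS students.
    joined, cost = nested_loop_join(students, enrolled, "sid")
    return [t for t in joined if t["dept"] == "CS"], cost

def plan_select_then_join():
    # Plan 2: filter students first, so the join sees fewer tuples.
    cs_students = [s for s in students if s["dept"] == "CS"]
    return nested_loop_join(cs_students, enrolled, "sid")

r1, c1 = plan_join_then_select()
r2, c2 = plan_select_then_join()
print(len(r1), c1, c2)  # 75 30000 7500
```

Pushing the selection below the join cuts the comparisons by a factor of four here while leaving the result unchanged.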
TYPES OF OPTIMIZERS
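Exhaustive search
Cost-based
Finds the optimal plan, but the search space grows combinatorially with the number of relations
Heuristics
Rule-based (e.g. apply selections and projections early)
Not guaranteed to find the optimal plan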
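A minimal sketch contrasting the two optimizer types on join ordering. Everything here is an assumption for illustration: the cardinalities in `CARD`, the selectivities in `SEL`, the restriction to left-deep plans, and a cost model that simply sums estimated intermediate result sizes. `exhaustive` enumerates all n! orders; `greedy` is one common heuristic, not the only one.

```python
from itertools import permutations

# Hypothetical statistics: relation cardinalities and pairwise join selectivities.
CARD = {"A": 1000, "B": 100, "C": 10, "D": 5000}
SEL = {frozenset("AB"): 0.01, frozenset("BC"): 0.1,
       frozenset("CD"): 0.001, frozenset("AD"): 0.0001}

def join_size(left_rels, left_size, rel):
    """Estimated size after joining `rel` to an intermediate result over `left_rels`."""
    size = left_size * CARD[rel]
    for r in left_rels:
        size *= SEL.get(frozenset((r, rel)), 1.0)  # 1.0 means no predicate (cross product)
    return size

def plan_cost(order):
    """Cost of a left-deep plan = sum of estimated intermediate result sizes."""
    rels, size, cost = [order[0]], CARD[order[0]], 0.0
    for rel in order[1:]:
        size = join_size(rels, size, rel)
        cost += size
        rels.append(rel)
    return cost

def exhaustive(relations):
    """Enumerate all n! left-deep orders and keep the cheapest (optimal under this model)."""
    return min(permutations(relations), key=plan_cost)

def greedy(relations):
    """Heuristic: start from the smallest relation, then repeatedly add the relation
    that keeps the next intermediate result smallest. Fast, but not guaranteed optimal."""
    remaining = sorted(relations, key=CARD.get)
    order = [remaining.pop(0)]
    size = CARD[order[0]]
    while remaining:
        nxt = min(remaining, key=lambda r: join_size(order, size, r))
        size = join_size(order, size, nxt)
        order.append(nxt)
        remaining.remove(nxt)
    return tuple(order)

best, heur = exhaustive(list(CARD)), greedy(list(CARD))
print(best, plan_cost(best))
print(heur, plan_cost(heur))
```

With four relations the exhaustive search checks 24 orders; with ten it would check over 3.6 million, which is why heuristics matter.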
ONE OR MULTIPLE QUERIES
Optimizing a single query at a time
Easier than optimizing multiple queries together
OPTIMIZATION STRATEGY
Static
Optimize prior to the execution
Sizes of intermediate results are difficult to estimate, and estimation errors propagate through the plan
Dynamic
Run time optimization
Exact information on intermediate relation sizes is available
The query must be reoptimized on every execution
Hybrid
Compile using a static algorithm
If the error in estimated sizes exceeds a threshold, reoptimize at run time
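To illustrate why static estimates degrade and how the hybrid strategy reacts, here is a sketch under two stated assumptions: every estimation step is off by the same multiplicative factor, and the "error" compared with the threshold is the ratio between estimated and actual sizes. The function names and the default threshold of 2 are hypothetical.

```python
def propagated_estimate(true_size, per_step_error, steps):
    """Assumes each of `steps` estimates is off by the same multiplicative factor,
    so the error in the final estimate compounds to per_step_error ** steps."""
    return true_size * per_step_error ** steps

def should_reoptimize(estimated, actual, threshold=2.0):
    """Hybrid run-time check (hypothetical rule): reoptimize when the estimate is
    off by more than `threshold` times in either direction."""
    if estimated <= 0 or actual <= 0:
        return True  # no usable estimate: fall back to run-time optimization
    ratio = max(estimated / actual, actual / estimated)
    return ratio > threshold

estimate = propagated_estimate(1000, 1.5, 4)
print(estimate, should_reoptimize(estimate, 1000), should_reoptimize(1200, 1000))
```

For example, a 1.5x overestimate at each of four joins turns a true size of 1,000 tuples into an estimate of 5,062.5, which triggers reoptimization; an estimate of 1,200 does not.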
OPTIMIZATION DECISION SITES
Centralized
One site decides “best” schedule
Simple
Requires knowledge about the entire distributed database
Distributed
Sites cooperate to determine the schedule
Each site needs only local information
Incurs a cost of cooperation among sites
Hybrid
One site determines the global schedule
Each site optimizes the local schedules
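A sketch of the hybrid decision-site idea, with hypothetical fragment names (`EMP1`, `EMP2`, `ASG1`), site names and a toy local rule. The coordinator only decides which site processes each fragment. Each `Site` then chooses its access method from statistics that only it holds.

```python
# Hypothetical global catalog: which site stores which fragment.
FRAGMENT_SITES = {"EMP1": "site1", "EMP2": "site2", "ASG1": "site1"}

class Site:
    def __init__(self, name, local_stats):
        self.name = name
        self.local_stats = local_stats  # statistics known only to this site

    def optimize_local(self, fragment, selectivity):
        """Local decision: pick an access method using local statistics only."""
        has_index = self.local_stats[fragment]["has_index"]
        return "index_scan" if has_index and selectivity < 0.1 else "full_scan"

def global_schedule(fragments):
    """Coordinator decision: map each fragment to the site that stores it."""
    return {frag: FRAGMENT_SITES[frag] for frag in fragments}

def hybrid_optimize(fragments, selectivity, sites):
    """Global schedule from one site, local schedules refined by each site."""
    schedule = global_schedule(fragments)
    return {frag: (site, sites[site].optimize_local(frag, selectivity))
            for frag, site in schedule.items()}

SITES = {
    "site1": Site("site1", {"EMP1": {"has_index": True}, "ASG1": {"has_index": False}}),
    "site2": Site("site2", {"EMP2": {"has_index": False}}),
}
print(hybrid_optimize(["EMP1", "EMP2"], 0.05, SITES))
```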
QUERY PROCESSING METHODOLOGY
STEP 1 – QUERY DECOMPOSITION
Normalize and analyze the query
Incorrect queries (type-incorrect or semantically incorrect) are rejected
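Part of the analysis step can be sketched for one simple case: a conjunction of range predicates such as `x > 5 AND x < 3`. The sketch below is a hypothetical helper, not a full normalizer. Real systems first rewrite the qualification into a normal form. Here, the terms on each attribute are collapsed into one interval, and the query is rejected when an interval is empty.

```python
import math

SUPPORTED = {"<", "<=", ">", ">=", "="}

def analyze_conjunction(predicates):
    """Collapse a conjunction of (attribute, op, constant) terms into one interval per
    attribute; reject the query if any interval is empty (a contradiction)."""
    bounds = {}  # attr -> [low, low_inclusive, high, high_inclusive]
    for attr, op, value in predicates:
        if op not in SUPPORTED:
            raise ValueError(f"unsupported operator {op!r}: query rejected")
        lo, lo_inc, hi, hi_inc = bounds.get(attr, [-math.inf, False, math.inf, False])
        if op in (">", ">=", "="):  # term tightens the lower bound
            inc = op != ">"
            if value > lo or (value == lo and not inc):
                lo, lo_inc = value, inc
        if op in ("<", "<=", "="):  # term tightens the upper bound
            inc = op != "<"
            if value < hi or (value == hi and not inc):
                hi, hi_inc = value, inc
        bounds[attr] = [lo, lo_inc, hi, hi_inc]
    for attr, (lo, lo_inc, hi, hi_inc) in bounds.items():
        if lo > hi or (lo == hi and not (lo_inc and hi_inc)):
            raise ValueError(f"contradictory predicate on {attr}: query rejected")
    return bounds

print(analyze_conjunction([("x", ">", 1), ("x", ">", 5), ("x", "<=", 9)]))
```

Redundant terms (`x > 1` next to `x > 5`) are absorbed, which mirrors the redundancy elimination that normalization aims for.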
STEP 2 – DATA LOCALIZATION
Determine which fragments / partitions are involved in the query
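For a horizontally fragmented relation, localization can prune fragments whose defining predicate cannot overlap the query predicate. This sketch assumes a hypothetical range fragmentation of `PROJ` on an `id` attribute into half-open intervals.

```python
# Hypothetical horizontal fragmentation of PROJ on attribute "id":
# each fragment holds the half-open range [low, high).
FRAGMENTS = {"PROJ1": (0, 1000), "PROJ2": (1000, 2000), "PROJ3": (2000, 3000)}

def localize(query_low, query_high, fragments=FRAGMENTS):
    """Return the fragments whose range overlaps the query range [query_low, query_high);
    all other fragments cannot contribute tuples and are pruned."""
    return sorted(name for name, (lo, hi) in fragments.items()
                  if lo < query_high and query_low < hi)

print(localize(500, 1500))
```

For instance, `localize(500, 1500)` returns `['PROJ1', 'PROJ2']`, so `PROJ3` is never accessed.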
STEP 3 – GLOBAL QUERY OPTIMIZATION
Find the “best” (not necessarily optimal) global schedule
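One global decision is where to execute a join. The sketch below assumes a cost model with only the communication term (bytes shipped) and a hypothetical placement of `EMP` and `ASG`. A fuller model would add I/O and CPU terms.

```python
# Hypothetical placement: relation -> (site, size in bytes).
PLACEMENT = {"EMP": ("site1", 400_000), "ASG": ("site2", 1_000_000)}

def transfer_cost(exec_site, result_site, result_size, placement=PLACEMENT):
    """Bytes shipped if the join runs at `exec_site`: every operand stored elsewhere
    is sent there, and the result is sent on to `result_site` if it differs."""
    cost = sum(size for site, size in placement.values() if site != exec_site)
    if exec_site != result_site:
        cost += result_size
    return cost

def best_join_site(result_site, result_size, placement=PLACEMENT):
    """Try every site holding an operand plus the result site; keep the cheapest."""
    candidates = {site for site, _ in placement.values()} | {result_site}
    return min(sorted(candidates),
               key=lambda s: transfer_cost(s, result_site, result_size, placement))

print(best_join_site("site3", 50_000))
```

With the result needed at `site3` and an estimated result of 50,000 bytes, running the join at `site2` ships 450,000 bytes, versus 1,050,000 at `site1` and 1,400,000 at `site3`.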
QUERY OPTIMIZATION PROCESS
QEP = Query Execution Plan
Search/solution space
Set of possible solutions (query trees)
Cost model
E.g. I/O cost + CPU cost + communication cost
Search strategy
How do we move inside the search space?
Deterministic, randomized
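The three ingredients can be combined in a small sketch. The search space is the set of left-deep join orders for a hypothetical five-relation chain query. The cost model is a weighted sum of I/O, CPU and communication terms; the weights and statistics are assumptions. The search strategy is randomized iterative improvement with swap moves. `exhaustive_optimum` is included only to compare against a deterministic alternative.

```python
import random
from itertools import permutations

# Hypothetical statistics for a chain query R0 - R1 - R2 - R3 - R4.
CARD = [2000, 50, 800, 10, 3000]
SEL = {(0, 1): 0.01, (1, 2): 0.02, (2, 3): 0.05, (3, 4): 0.001}

def total_cost(io, cpu, comm, w_io=1.0, w_cpu=0.1, w_comm=2.0):
    """Cost model sketch: weighted I/O + CPU + communication (weights are assumptions)."""
    return w_io * io + w_cpu * cpu + w_comm * comm

def plan_cost(order):
    """Estimate a left-deep plan given as a list of relation indexes."""
    joined = {order[0]}
    size = CARD[order[0]]
    io = cpu = 0.0
    for rel in order[1:]:
        cpu += size * CARD[rel]  # tuple comparisons of a nested-loop join
        size *= CARD[rel]
        for other in joined:
            size *= SEL.get((min(rel, other), max(rel, other)), 1.0)
        io += size  # assume each intermediate result is materialized
        joined.add(rel)
    return total_cost(io, cpu, comm=0.0)  # single-site plan: no communication

def iterative_improvement(n, restarts=5, seed=0):
    """Randomized search: from random starting orders, keep applying swap moves
    that lower the cost until no swap helps (a local minimum)."""
    rng = random.Random(seed)
    best = None
    for _ in range(restarts):
        order = list(range(n))
        rng.shuffle(order)
        improved = True
        while improved:
            improved = False
            for i in range(n):
                for j in range(i + 1, n):
                    cand = order[:]
                    cand[i], cand[j] = cand[j], cand[i]
                    if plan_cost(cand) < plan_cost(order):
                        order, improved = cand, True
        if best is None or plan_cost(order) < plan_cost(best):
            best = order
    return best

def exhaustive_optimum(n):
    """For comparison only: the true optimum under this cost model (n! plans)."""
    return min(plan_cost(list(p)) for p in permutations(range(n)))

best = iterative_improvement(len(CARD))
print(best, plan_cost(best), exhaustive_optimum(len(CARD)))
```

Randomized search gives no optimality guarantee, but its running time is controlled by the number of restarts rather than by n!.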
NORMAL QUERY PROCESSING ISSUES
The optimizer needs sufficient knowledge about runtime conditions
Works well for systems with few data sources in a controlled environment
What about changing environments?
Large numbers of data sources?
Unpredictable runtime conditions?
EXAMPLE: QEP WITH BLOCKED OPERATOR
[Figure: QEP with a blocked Join operator over the Student and Project inputs]
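Suppose one input of the join (say, the `Project` source) stalls. A blocking hash join that must read one whole input before producing output then stalls with it. A non-blocking operator avoids this. The symmetric hash join below is one such operator commonly used in adaptive query processing; it is swapped in here for illustration rather than taken from the slide. The tuple layout and the `sid` join attribute are hypothetical.

```python
from collections import defaultdict

def symmetric_hash_join(arrivals, left_key, right_key):
    """Non-blocking join: `arrivals` yields ("L", tuple) or ("R", tuple) in arrival order.
    Each tuple is stored in its own side's hash table and probed against the other side,
    so a result is emitted as soon as both matching tuples have arrived."""
    tables = {"L": defaultdict(list), "R": defaultdict(list)}
    for side, tup in arrivals:
        key = tup[left_key] if side == "L" else tup[right_key]
        other = "R" if side == "L" else "L"
        tables[side][key].append(tup)
        for match in tables[other][key]:
            # Always emit (left tuple, right tuple), whichever side just arrived.
            yield (tup, match) if side == "L" else (match, tup)

# Student tuples arrive on side "L", Project tuples on side "R", interleaved.
arrivals = [
    ("R", {"sid": 1, "title": "DB"}),
    ("L", {"sid": 1, "name": "Ana"}),
    ("L", {"sid": 2, "name": "Bo"}),
    ("R", {"sid": 2, "title": "OS"}),
]
results = list(symmetric_hash_join(arrivals, "sid", "sid"))
print(results)
```

The first result is produced after only two arrivals, so a slow source delays only the results that depend on it. The price is that both inputs must be kept in memory.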
ADAPTIVE QUERY PROCESSING
Receives information from the execution environment
Modifies the processing accordingly
Requires communication between the optimizer, the runtime environment and other components
Additional components
Monitoring (statistics, data, network, cost), assessment, reaction
Embedded in control operators of the QEP
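The monitor, assess, react cycle can be embedded in a control operator of the QEP. In this sketch the class name, the threshold `factor` and the `react` callback are hypothetical. The operator passes tuples through unchanged and counts them. It fires a reaction once when the observed cardinality exceeds the optimizer's estimate by more than the factor. A real reaction might reschedule or reoptimize the rest of the plan.

```python
class MonitorOperator:
    """Control operator sketch wrapping a child iterator:
    monitoring = counting produced tuples, assessment = comparing the count with the
    optimizer's estimate, reaction = calling `react` once when the estimate is exceeded
    by more than `factor`."""

    def __init__(self, child, estimated_rows, react, factor=2.0):
        self.child = child
        self.estimated_rows = estimated_rows
        self.react = react
        self.factor = factor
        self.count = 0
        self.triggered = False

    def __iter__(self):
        for row in self.child:
            self.count += 1  # monitoring
            if not self.triggered and self.count > self.factor * self.estimated_rows:
                self.triggered = True  # assessment: estimate clearly wrong
                self.react(self.count)  # reaction, e.g. ask the optimizer to replan
            yield row  # tuples flow through unchanged

events = []
op = MonitorOperator(iter(range(50)), estimated_rows=10, react=events.append)
rows = list(op)
print(len(rows), events)  # 50 [21]
```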
CONCLUSION ON QUERY PROCESSING
There are multiple ways to organize query processing
Centralized, distributed, hybrid
Most often the best option in practice is a good plan, not the “optimal” one