
What is distributed query optimization?

More than one site cooperates to decide how to optimize a query.

9.1 Query Optimization, page 229


The input to query optimization is a query expressed in relational algebra (not calculus). Query optimization is the process of producing a query execution plan (QEP) that optimizes (in practice, minimizes) an objective function such as estimated cost. It involves three components:

The search space: all possible execution plans. They must all produce the same result.
The cost model: the anticipated cost of a particular execution plan.
The search strategy: explores the search space and selects the best plan. The strategy defines the order in which plans are examined.

Distributed Query Processing

In distributed systems, several additional factors further complicate query processing. The main one is the cost of data transmission, which includes both intermediate data and the final result. Hence, the query optimization algorithm must attempt to reduce the amount of data transferred.

A query is decomposed into a set of sub-queries that can be executed at the individual sites, and a strategy for combining the results of the sub-queries into the query result must be generated. In doing so:

The size of each data transmission must be estimated in order to minimize communication cost.
Where data is replicated or fragmented, an attempt must be made to choose the closest replica and/or fragments.
If possible, semijoin operations should be performed to reduce the amount of data transferred.
Wherever possible, the heuristics and equivalence rules discussed before should be applied at each site to reduce the execution cost.

In a distributed system, two further issues must be taken into account: the cost of data transmission over the network, and the potential gain in performance from having several sites process parts of the query in parallel.
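To illustrate why a semijoin can reduce data transfer, here is a minimal Python sketch. The relations, the two-site layout, and the "items shipped" measure are all assumptions made for illustration. A real optimizer would estimate bytes from catalog statistics instead.

```python
# Hypothetical setup: R(a, b) is stored at site 1, S(b, c) at site 2.
# The query is R JOIN S ON b, and the result is needed at site 1.
R = [(1, 10), (2, 20)]
S = [(10, "x"), (10, "y"), (40, "z"), (50, "w"), (60, "v"), (70, "u")]


def items_shipped_naive(s_rows):
    """Naive plan: ship every row of S to site 1 and join there."""
    return len(s_rows)


def semijoin_plan(r_rows, s_rows):
    """Semijoin plan: send R's join values to site 2, ship back only matching S rows."""
    join_values = {b for (_, b) in r_rows}                        # projection of R on b
    reduced_s = [row for row in s_rows if row[0] in join_values]  # S semijoin R, at site 2
    shipped = len(join_values) + len(reduced_s)                   # crude transfer measure
    result = [(a, b, c) for (a, b) in r_rows for (sb, c) in reduced_s if b == sb]
    return result, shipped


result, shipped = semijoin_plan(R, S)
print("join result:", result)
print("naive items shipped:", items_shipped_naive(S))
print("semijoin items shipped:", shipped)
```

The semijoin only pays off when few rows of S match. If most rows match, the extra round trip for the join values makes it more expensive than shipping S directly, which is why the decision is left to the cost model.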

Query optimization is the process of producing an optimal (or close to optimal) query execution plan, which represents an execution strategy. The main task in query optimization is to consider different orderings of the operations. It is then the optimizer's task to come up with the optimal execution plan for the given query. Essentially, the optimizer enumerates all possible execution plans, determines the quality (cost) of each plan, and chooses the best one as the final execution plan.

Database systems judge the quality of an execution plan based on a number of cost factors, e.g., the number of disk I/Os required to evaluate the plan, the plan's CPU cost, the overall response time observable by the database client, and the total execution time. A cost-based optimizer tries to anticipate these costs and find the cheapest plan before actually running it. All of the above factors depend on one critical piece of information: the size of (intermediate) query results.
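The enumerate, cost, and choose loop described above can be sketched in a few lines of Python. The table names, cardinalities, selectivities, and the cost measure (the sum of estimated intermediate result sizes over left-deep join orders) are assumptions chosen for illustration, not the cost model of any particular system.

```python
from itertools import permutations

# Hypothetical statistics: base-table cardinalities and join selectivities.
CARD = {"R": 1000, "S": 100, "T": 10}
SEL = {frozenset(("R", "S")): 0.01, frozenset(("S", "T")): 0.1}  # missing pair = cross product


def join_size(left_tables, left_size, right):
    """Estimate the size of joining an intermediate result with one more table."""
    size = left_size * CARD[right]
    for t in left_tables:
        size *= SEL.get(frozenset((t, right)), 1.0)
    return size


def plan_cost(order):
    """Assumed cost of a left-deep plan: sum of estimated intermediate result sizes."""
    tables, size, cost = [order[0]], CARD[order[0]], 0.0
    for right in order[1:]:
        size = join_size(tables, size, right)
        tables.append(right)
        cost += size
    return cost


def best_plan_exhaustive(names):
    """Enumerate every left-deep join order and keep the cheapest one."""
    return min(permutations(names), key=plan_cost)


for order in permutations(CARD):
    print(order, plan_cost(order))
print("chosen:", best_plan_exhaustive(list(CARD)))
```

Every cost here is derived from estimated intermediate result sizes, which is why inaccurate size estimates can lead the optimizer to a poor plan.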

Database systems therefore put considerable effort into accurate result size estimates.

Search Strategies: Heuristic Selection
Make a sequence of choices for annotations based on heuristics, e.g., a greedy algorithm for join ordering. Only one plan is generated.

Search Strategies: Exhaustive Search
1. Price all possible physical query plans derivable from the logical query plan.
2. Choose the least expensive.
Plans are enumerated by a plan generator, either top-down or bottom-up.

Search Strategies: Hill Climbing
Find an initial plan, usually by heuristic selection. Sequentially make local modifications to the plan that reduce the estimated cost. When no cost-reducing moves are left, halt. This explores the plan space locally around the initial plan.

Search Strategies: Dynamic Programming
A bottom-up enumeration. Price all possible plans for each initial sub-query. For each sub-query, keep the least expensive plan. Move up the tree: generate all possible plans for each sub-query, assuming the best plans have already been chosen for each of its sub-queries. The search is finished once the root of the tree is reached. This prunes the search space dynamically.

Search Strategies: System R (Selinger-Style Optimization)
A revision of dynamic programming. Price all possible plans for each initial sub-query. For each sub-query, keep several plans: the least expensive plan and, for each interesting order, the least expensive plan of those that preserve that interesting order. Move up the tree. The search is finished once the root of the tree is reached. This prunes the search space dynamically.

Search Strategies: Branch-and-Bound
Find an initial plan, usually by heuristic selection. Consider different plans until a plan cheaper than a threshold cost is found, or time runs out. This must be coupled with a plan generator.
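The dynamic programming strategy above can be sketched as a bottom-up search over subsets of tables. This is a simplified illustration under stated assumptions: the statistics are hypothetical, only left-deep join orders are considered, the cost is the sum of estimated intermediate result sizes, and interesting orders (the System R refinement) are not tracked.

```python
from itertools import combinations

# Hypothetical statistics: base-table cardinalities and join selectivities.
CARD = {"R": 1000, "S": 100, "T": 10}
SEL = {frozenset(("R", "S")): 0.01, frozenset(("S", "T")): 0.1}  # missing pair = cross product


def result_size(tables):
    """Estimated size of joining a set of tables (does not depend on join order)."""
    size = 1.0
    for t in tables:
        size *= CARD[t]
    for a, b in combinations(sorted(tables), 2):
        size *= SEL.get(frozenset((a, b)), 1.0)
    return size


def best_plan_dp(names):
    """Bottom-up DP: for each subset of tables, keep only its cheapest left-deep plan."""
    # best[subset] = (cost, join order); a single table has no join cost.
    best = {frozenset((n,)): (0.0, (n,)) for n in names}
    for k in range(2, len(names) + 1):
        for combo in combinations(names, k):
            subset = frozenset(combo)
            candidates = []
            for last in combo:  # try each table as the one joined last
                sub_cost, sub_order = best[subset - {last}]
                candidates.append((sub_cost + result_size(subset), sub_order + (last,)))
            best[subset] = min(candidates)  # prune everything but the cheapest plan
    return best[frozenset(names)]


cost, order = best_plan_dp(["R", "S", "T"])
print("join order:", order, "estimated cost:", cost)
```

Because only one plan per subset survives, the search prices far fewer plans than exhaustive enumeration of all join orders, while still finding the cheapest left-deep plan under this cost model.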
