Outline
Overview of Query Processing
Objectives of Query Processing
Complexity of Relational Algebra Operations
Query Processing
The language of access is SQL.
SQL is non-procedural: no program is written to tell the DBMS how to get the data.
An SQL query specifies attributes, constraints, and joins.
The order and process of solving the query are decided by a query processor.
Query processor efficiency is crucial to the success of RDBMSs.
This is a very complicated issue, with many factors to consider.
Query Processing
[Figure: the query processor translates a high-level query into relational algebra operations]
The query processor takes a high-level query (e.g., in SQL) and translates it into a set of relational algebra expressions. Since this can be done in a number of different ways, the query processor must choose the best one. This is called query optimization.
A query is correct:
The low-level query has the same semantics as the original query (i.e., both queries produce the same result).
A relational calculus query may have many equivalent and correct transformations into relational algebra.
An efficient execution strategy:
Select the relational algebra (RA) expression that minimizes resource consumption.
Example Continued
Find the names of employees who manage projects:
SELECT ENAME
FROM EMP, ASG
WHERE EMP.ENO = ASG.ENO
AND RESP = 'Manager'
Scenario 1: $\Pi_{ENAME}(\sigma_{RESP = \text{'Manager'} \,\land\, EMP.ENO = ASG.ENO}(EMP \times ASG))$
Scenario 2: $\Pi_{ENAME}(EMP \bowtie_{ENO} \sigma_{RESP = \text{'Manager'}}(ASG))$
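The two scenarios above can be compared on a toy instance. The following is a minimal sketch, assuming small in-memory relations whose tuples are invented for illustration (they are not from the slides). It evaluates both expressions and reports the size of the first intermediate result each one builds.

```python
from itertools import product

# Hypothetical sample relations; the tuples are made up for illustration.
EMP = [
    {"ENO": "E1", "ENAME": "J. Doe"},
    {"ENO": "E2", "ENAME": "M. Smith"},
    {"ENO": "E3", "ENAME": "A. Lee"},
]
ASG = [
    {"ENO": "E1", "PNO": "P1", "RESP": "Manager"},
    {"ENO": "E2", "PNO": "P1", "RESP": "Analyst"},
    {"ENO": "E2", "PNO": "P2", "RESP": "Analyst"},
    {"ENO": "E3", "PNO": "P3", "RESP": "Manager"},
]


def scenario_1(emp, asg):
    """Cartesian product first, then selection, then projection on ENAME."""
    pairs = list(product(emp, asg))  # EMP x ASG
    selected = [
        (e, a) for e, a in pairs
        if a["RESP"] == "Manager" and e["ENO"] == a["ENO"]
    ]
    names = {e["ENAME"] for e, _ in selected}
    return names, len(pairs)


def scenario_2(emp, asg):
    """Selection on ASG first, then match on ENO, then projection on ENAME."""
    managers = [a for a in asg if a["RESP"] == "Manager"]
    manager_enos = {a["ENO"] for a in managers}
    # Only ENAME is projected, so testing ENO membership is enough here.
    names = {e["ENAME"] for e in emp if e["ENO"] in manager_enos}
    return names, len(managers)


names_1, intermediate_1 = scenario_1(EMP, ASG)
names_2, intermediate_2 = scenario_2(EMP, ASG)
print(sorted(names_1), "intermediate tuples:", intermediate_1)
print(sorted(names_2), "intermediate tuples:", intermediate_2)
```

Both scenarios return the same names, but Scenario 1 materializes every EMP/ASG pair before filtering, while Scenario 2 filters ASG before combining it with EMP.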
Strategy 2:
Move all information to Site 5. Compute the selections and joins at Site 5, and display the results. Assumption: the access methods to relations EMP and ASG based on attributes RESP and ENO are lost because of the data transfer.
Strategy 1
Strategy 2
Total Cost is
4. 400 * 20 * tupacc =
Total Cost is
Cost Measures
Total cost: measure resource consumption as the cost incurred in processing the query.
It is the sum of all times spent processing the query operations at the various sites and in intersite communication.
Response time: the time elapsed in executing the query. It benefits from parallel execution of operations at different sites.
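The difference between the two measures can be shown with a small calculation. This is a minimal sketch under a simplifying assumption that is not stated in the slides: all sites process their local operations fully in parallel, and the intersite transfers then happen one after another. The function names and the timing values are hypothetical.

```python
def total_cost(site_times, transfer_times):
    """Sum of all processing times at all sites plus all communication times."""
    return sum(site_times.values()) + sum(transfer_times)


def response_time(site_times, transfer_times):
    """Elapsed time, assuming local processing runs in parallel across sites
    and the transfers follow sequentially (a simplifying assumption)."""
    return max(site_times.values()) + sum(transfer_times)


# Hypothetical timings in arbitrary time units.
site_times = {"site_1": 10.0, "site_2": 15.0}
transfer_times = [20.0]

cost = total_cost(site_times, transfer_times)
elapsed = response_time(site_times, transfer_times)
print("total cost:", cost)
print("response time:", elapsed)
```

Under these assumptions the response time is lower than the total cost, because the work at the two sites overlaps in time even though both contribute fully to resource consumption.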
CPU cost: performing operations on data in main memory.
I/O cost: time necessary for disk input/output operations.
I/O cost can be minimized through buffer management.
Complexity
O(n)
O(n log n)
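The operation names that belong to each complexity class did not survive on this slide. As a sketch only, assuming the linear class covers a single-pass operation such as selection and the O(n log n) class covers a sort-based operation such as projection with duplicate elimination, the two can be contrasted as follows. The function names and sample tuples are hypothetical.

```python
def select(relation, predicate):
    """Selection by a single scan: each tuple is examined once, so O(n)."""
    return [t for t in relation if predicate(t)]


def project_distinct(relation, attrs):
    """Projection with duplicate elimination by sorting: O(n log n).

    After sorting, duplicates are adjacent and are dropped in one pass."""
    projected = sorted(tuple(t[a] for a in attrs) for t in relation)
    result = []
    for row in projected:
        if not result or result[-1] != row:
            result.append(row)
    return result


# Hypothetical sample tuples.
ASG = [
    {"ENO": "E1", "RESP": "Manager"},
    {"ENO": "E2", "RESP": "Analyst"},
    {"ENO": "E2", "RESP": "Analyst"},
    {"ENO": "E3", "RESP": "Manager"},
]

managers = select(ASG, lambda t: t["RESP"] == "Manager")
roles = project_distinct(ASG, ["RESP"])
print(len(managers), roles)
```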
Analysis
detect and reject incorrect queries as early as possible
possible for only a subset of relational calculus
Simplification
eliminate redundant predicates
Restructuring
calculus query → algebraic query
more than one translation is possible
use transformation rules
Join methods
nested loop vs ordered joins (merge join or hash join)
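The join methods named above can be sketched on in-memory lists. This is an illustrative sketch rather than how any particular DBMS implements them; the function names, the `attr` parameter, and the sample tuples are hypothetical. All three functions compute the same equijoin.

```python
from collections import defaultdict


def nested_loop_join(r, s, attr):
    """Compare every tuple of r with every tuple of s."""
    return [(a, b) for a in r for b in s if a[attr] == b[attr]]


def hash_join(r, s, attr):
    """Build a hash table on s, then probe it once per tuple of r."""
    buckets = defaultdict(list)
    for b in s:
        buckets[b[attr]].append(b)
    return [(a, b) for a in r for b in buckets.get(a[attr], [])]


def merge_join(r, s, attr):
    """Sort both inputs on the join attribute, then merge them in one pass."""
    r_sorted = sorted(r, key=lambda t: t[attr])
    s_sorted = sorted(s, key=lambda t: t[attr])
    out, i, j = [], 0, 0
    while i < len(r_sorted) and j < len(s_sorted):
        rk, sk = r_sorted[i][attr], s_sorted[j][attr]
        if rk < sk:
            i += 1
        elif rk > sk:
            j += 1
        else:
            # Find the run of s tuples sharing this key value.
            j_end = j
            while j_end < len(s_sorted) and s_sorted[j_end][attr] == rk:
                j_end += 1
            # Pair every r tuple with this key against that run.
            while i < len(r_sorted) and r_sorted[i][attr] == rk:
                out.extend((r_sorted[i], b) for b in s_sorted[j:j_end])
                i += 1
            j = j_end
    return out


# Hypothetical sample tuples.
EMP = [{"ENO": "E2", "ENAME": "M. Smith"}, {"ENO": "E1", "ENAME": "J. Doe"}]
ASG = [
    {"ENO": "E1", "PNO": "P1"},
    {"ENO": "E2", "PNO": "P1"},
    {"ENO": "E2", "PNO": "P2"},
    {"ENO": "E4", "PNO": "P3"},
]


def as_pairs(joined):
    """Normalize a join result to a sorted list of (ENAME, PNO) pairs."""
    return sorted((a["ENAME"], b["PNO"]) for a, b in joined)


print(as_pairs(nested_loop_join(EMP, ASG, "ENO")))
```

The nested loop does work proportional to the product of the input sizes, while the merge and hash variants pay an up-front cost (sorting or building the table) to avoid comparing every pair.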
References
M. Tamer Ozsu, Patrick Valduriez, and S. Sridhar, Principles of Distributed Database Systems, 2nd ed.