You are on page 1of 43

Chapter 2

Query Processing and Optimization


Basic outline
• Introduction
• Query Processing steps
• Query Decomposition
• Query Optimization and query Optimization Process
• Approaches to Query Optimization
• Transformation Rules
• Implementing relational Operators
• Pipelining
Introduction
• Query processing is an Activities involved in retrieving data from the
database
• Query processing is the process of translating a high-level query, such
as SQL, into a low-level query that can be executed by the database
system.
• This involves parsing, validating, and optimizing the query, as well as
generating a query execution plan.
• A query execution plan is a sequence of operations that the database
system performs to retrieve the data requested by the query.
cont.
Query processing :
• In general, a query is a form of questioning, in a line of inquiry
• A query language is a language in which user requests
information from the database
• The aim of query processing is to find information in one or
more databases and deliver it to the user
Query optimization(QO)
• It is the process of choosing a suitable execution strategy for processing a
query
• is the process of finding the best query execution plan for a given query,
based on various factors such as the schema, the data, the indexes, the
statistics, and the system resources.
• QO aims to minimize the cost of executing the query, which can be
measured by the time, the disk I/O, the memory usage, or the network
traffic.
• Query optimization can be done statically, before executing the query, or
dynamically, during the execution.
• QO is an Activity of choosing an efficient execution strategy for processing
query.
Query Processing steps
• What are the steps involved in query
processing?
• There are several steps involved in
query processing in DBMS.
• The first step is known as
decompsition (parsing and
validation)
• the second step is query
optimization
• the third step is code generation and
• finally, the query execution.
Step 1: Parsing and translation
• In this step of query processing
• The dbms translator converts high level query provided by user to
internal form where parser analyse this query and performs syntax,
semantic and shared pool check.
• It also verifies the name of the relation in the database, the tuple,
and finally the required attribute value.
• And finally query is translated into relational algebra form. After this
process all the usage of the views are replaced in the query.
Following checks are performed in parsing phase:
• Syntax checking – In this step parser checks whether query is
syntacticly correct or not.
• Example: SELECT * Form Students;
• In the above query syntax check will fail as there is spelling error in
from keyword.
• Semantic checking – This step checks the meaningfulness of the sql
query.
• It also verifies all table & column names from the dictionary and
checks to see if you are authorised to see the data.
• Example: Semantic check will fail if query refers to table in database
which does not exist.
cont.
• Shared Pool check – The next step in the parse operation is to see if
the statement we are currently parsing has already in fact been
processed by some other session.
• Every query is associated with hash code at the time of execution.
• This step checks whether there is existing hash code in shared pool.
• If there is a hash code already present DB will not perform
optimisation and row source generation for this query.
• For example, our SQL query will be converted into a Relational
Algebra equivalent as follows after this step:
• πstudent_name(σpercentage>90 (Students))
Step 2: Optimisation
• This step of query processing in dbms analyses SQL queries and
determines efficient execution mechanisms.
• Optimiser uses the statistical data stored as part of data dictionary.
• The statistical data are information about the size of the table, the
length of records, the indexes created on the table, etc.
• A query optimiser generates one or more query plans for each query,
each of which may be a mechanism used to run a query.
• The most efficient query plan is selected and used to run the query.
Step 3: Evaluation
• From the previous step we got many execution plans constructed
through statistical data which all gives the same output, but they
differ in terms of time and space consumption to execute the query.
Hence, it becomes mandatory to choose one plan which obviously
consumes less cost and is effecitve.
• In this stage, we choose one execution plan from the several plans
provided by previous step. This Execution plan accesses data from the
database to give the final result.
• For our SQL query , it can also be written as
• σpercentage>90 (πstudent_name (Students))
• The evaluation step picks up best execution plan for execution.
Step 4: Execution Engine
• A query execution engine is responsible for generating the output of
the given query.
• It takes the query execution plan chosen from previous step of query
processing in dbms and executes it.
• Finally the output is displayed to the user.
Query processing.....
• Query Processing can be divided into four main phases:
• Decomposition:
• Optimization
• Code generation, and
• Execution
Query processing.....
• Query Decomposition: is the first phase of query processing
• which aims are to transform a high-level query into a relational algebra query
and to check whether that query is syntactically and semantically correct.
• This technique can be used to improve the performance of a query by
reducing the amount of data that needs to be exchanged between the nodes
in the system.
• The query decomposer goes through five stages of processing for
decomposition into low-level operation and translation into algebraic
expressions.
• The typical stages of query decomposition are analysis, normalization,
semantic analysis, simplification, and query restructuring.
• quey decomposition processing stage
Sql query query analysis

normalization

semantic analysis

simplification

query restructuring
Query processing.....
• query decomposition
• Query Analysis :- During the query analysis phase, the query is
syntactically analyzed using the programming language compiler
(parser).
• A syntactically legal query is then validated, using the system catalog,
to ensure that all data objects (relations and attributes) referred to by
the query are defined in the database.
• The type specification of the query qualifiers and result is also
checked at this stage.
cont.
• Query Normalization :- The primary goal of the normalization is to
avoid redundancy.
• The normalization phase converts the query into a normalized form
that can be more easily manipulated.
• In the normalization phase, a set of equivalency rules are applied so
that the projection and selection operations included on the query
are simplified to avoid redundancy.
• The projection operation corresponds to the SELECT clause of SQL
query and the selection operation correspond to the predicate found
in WHERE clause
cont.
• Simplification:Simplification strategy:
• Detects redundant qualifications,
• Eliminates common sub-expressions,
• Transforms query to semantically equivalent but more easily and efficiently
computed form.
• Typically, access restrictions, view definitions, and integrity
constraints are considered.
• Assuming user has appropriate access privileges, first apply
• well-known idempotency rules of Boolean algebra.
• Ex.: p ᴧ p equivalent to p
p ᴧ true equivalent to true
cont.
• Query Restructuring :- In the final stage of the query decomposition,
• the query can be restructured to give a more efficient implementation.
• Transformation rules are used to convert one relational algebra
expression into an equivalent form that is more efficient.
• The query can now be regarded as a relational algebra program,
consisting of a series of operations on relation.
query optimization
• The primary goal of query optimization is of choosing an efficient
execution strategy for processing a query.
• The query optimizer attempts to minimize the use of certain resources
(mainly the number of I/O and CPU time) by selecting a best execution
plan (access plan).
• A query optimization start during the validation phase by the system to
validate the user has appropriate privileges.
• Now an action plan is generate to perform the query
• A query typically has many possible execution strategies, and the
process of choosing a suitable one for processing a query is known as
Query Optimization.
cont.
• The basic issues in Query Optimization are :
• How to use available indexes.
• How to use memory to accumulate information and perform immediate steps
such as sorting.
• How to determine the order in which joins should be performed.
• The term query optimization does not mean giving always an optimal (best)
strategy as the execution plan, but It is just a responsibly efficient strategy for
execution of the query.
• The decomposed query block of SQL is translating into an equivalent extended
relational algebra expression and then optimized.
cont.
• There are two main techniques for implementing Query Optimization:
• The first technique is based on Heuristic Rules for ordering the operations in a
query execution strategy.
• The second technique involves the systematic estimation of the cost of the
different execution strategies and choosing the execution plan with the lowest
cost.
• Semantic query optimization is used with the combination with the heuristic
query transformation rules.
• It uses constraints specified on the database schema such as unique attributes
and other more complex constraints, in order to modify one query into another
query that is more efficient to execute.
• The heuristic rules are used as an optimization technique to modify
the internal representation of query.
• Usually, heuristic rules are used in the form of query tree of query
graph data structure, to improve its performance.
• One of the main heuristic rule is to apply SELECT operation before
applying the JOIN or other BINARY operations.
• This is because the size of the file resulting from a binary operation
such as JOIN is usually a multi-value function of the sizes of the input
files.
• The SELECT and PROJECT reduced the size of the file and hence,
should be applied before the JOIN or other binary operation.
cont.
• Heuristic query optimizer transforms the initial (canonical) query tree into
final query.
• tree are using equivalence transformation rules. This final query tree is
efficient to execute.
• For example consider the following relations :
• Employee (EName, EID, DOB, EAdd, Sex, ESalary, EDeptNo)
• Department (DeptNo, DeptName, DeptMgrID, Mgr_S_date)
• DeptLoc (DeptNo, Dept_Loc)
• Project (ProjName, ProjNo, ProjLoc, ProjDeptNo)
• WorksOn (E-ID, P-No, Hours)
• Dependent (E-ID, DependName, Sex, DDOB, Relation)
• Now let us consider the query in the above database to find the
name of employees born after 1970 who work on a project named
‘Growth’.
• SELECT EName FROM Employee, WorksOn, Project
WHERE ProjName = ‘Growth’ AND ProjNo = P-No
AND EID = E-ID AND DOB > ‘31-12-1970’;
Transformation rules
• Transformation rules are used by the query optimizer to transform
one relational algebra expression into an equivalent expression that is
more efficient to execute.
• A relation is consider as equivalent of another relation if two
relations have the same set of attributes in a different order but
representing the same information.
• These transformation rules are used to restructure the initial
(canonical) relational algebra query tree attributes during query
decomposition.
cont.
1. Cascade of σ :-
σ c1 AND c2 AND …AND cn (R) = σ c1 (σ c2 (…(σ cn (R))…))
2. Commutativity of σ :-
σ C1 (σ C2 (R)) = σ C2 (σ C1 (R))
3. Cascade of Л :-
Л List1 (Л List2 (…(Л List n (R))…)) = Л List1 (R)
4. Commuting σ with Л :-
Л A1,A2,A3…An (σ C (R) ) = σ C (Л A1,A2,A3…An (R))
cont.
5. Commutativity of ⋈AND x :-
R ⋈ cS = S ⋈ c R
RxS=SxR
6. Commuting σ with ⋈ or x :-
If all attributes in selection condition c involved only attributes of one
of the relation schemas (R).
σ c (R ⋈ S) = (σ c (R) ) ⋈ S
• Alternatively, selection condition c can be written as (c1 AND c2)
where condition c1 involves only attributes of R and condition c2
involves only attributes of S then :
• σ c (R ⋈ S) = (σ c1 (R) ) ⋈ (σ c2 (S) )
cont.
7. Commuting Л with ⋈ or x :-
• The projection list L = {A1,A2,..An,B1,B2,…Bm}.
• A1…An attributes of R and B1…Bm attributes of S.
• Join condition C involves only attributes in L then :
• ЛL ( R ⋈ c S ) = ( ЛA1,…An (R) ) ⋈c ( ЛB1,…Bm(S) )
8. Commutativity of SET Operation :-
• R ⋃S = S ⋃R
• R⋂S=S⋂R
• Minus (R-S) is not commutative.
cont.
9. Associatively of ⋈, x, ⋂, and ⋃ :-
• If ∅ stands for any one of these operation throughout the expression then :
• (R ∅ S) ∅ T = R ∅ (S ∅ T)
10. Commutativity of σ with SET Operation :-
• If ∅ stands for any one of three operations (⋃,⋂,and-) then :
• σ c (R ∅ S) = (σ c (R)) ⋃ (σ c (S))
• Л c (R ∅ S) = (Л c (R)) ⋃ (Лc (S))
11. The Л operation comute with ⋃ :-
• Л L (R ⋃ S) = (Л L(R)) ⋃ (Л L(S))
12. Converting a (σ,x) sequence with ⋃
• (σ c (R x S)) = (R ⋈ c S)
Implementing relational Operators
 Relational algebra operations:
 Select, Project , Join, Union, Intersection, Cartesian Product
• Implementing SELECT operation
• There are many options for executing a SELECT operation.
• Some options depend on the file having specific access paths and
may apply only to certain types of selection conditions .
• Examples: Use the following logical model provided in the following
slide to understand the following operations:
cont.
• logical model of database
cont
(OP1) σSsn=12345689 (EMPLOYEE)
equality comparison on key attribute
(OP2) σDNUMBER > 5 (DEPARMENT)
nonequality comparison on key attribute
(OP3) σDNO=5 (EMPLOYEE)
equality comparison on non key attribute
(OP4) σDNO=5 AND SALARY >30000 AND SEX=F(EMPLOYEE)
conjunctive condition
(OP5) σSsn=123456789 AND PNO=10 (WORKS_ON)
conjunctive condition and composite key
cont.
• Search Methods for implementing Simple Selection:
• S1 Linear search (brute force):
• Retrieve every record in the file, and test whether its attribute values
satisfy the selection condition.
• S2 Binary search:
• If the selection condition involves an equality comparison on a key
attribute on which the file is ordered, binary search (which is more
efficient than linear search) can be used. An example is OP1 if IDNo is the
ordering attribute for EMPLOYEE file
• S3 Using a primary index or hash key to retrieve a single record:
• If the selection condition involves an equality comparison on a key
attribute with a primary index (or a hash key), use the primary index (or
the hash key) to retrieve the record.
• For Example, OP1 use primary index to retrieve the record
cont.
• S4 Using a primary index to retrieve multiple records:
• If the comparison condition is >, ≥, <, or ≤ on a key field with a
primary index, use the index to find the record satisfying the
corresponding equality condition, then retrieve all subsequent
records in the (ordered) file. (see OP2)
• S5 Using a clustering index to retrieve multiple records:
• If the selection condition involves an equality comparison on a
non-key attribute with a clustering index, use the clustering
index to retrieve all the records satisfying the selection
condition. (See OP3)
• S6: using a secondary index on an equality comparison:
• This search method can be used to retrieve a single record if the
indexing field is a key (has unique values) or to retrieve multiple
records if the indexing field is not a key
• S7: Conjunctive selection using an individual index :
• If an attribute involved in any single simple condition in the
conjunctive condition has an access path that permits the use
of one of the methods S2 to S5, use that condition to retrieve
the records and then check whether each retrieved record
satisfies the remaining simple conditions in the conjunctive
condition.
Implementing the JOIN Operation:
•The join operation is one of the most time consuming operation in
query processing .
• Join
• two–way join: a join on two files
• e.g. R A=B S
• multi-way joins: joins involving more than two files
• e.g. R A=B S C=D T
• Examples
• (OP6): EMPLOYEE DNO=DNUMBER DEPARTMENT

• (OP7): DEPARTMENT MGRSSN=SSN EMPLOYEE


PIPELINING
• When a QUERY is composed of several relational algebra operations,
the result of one operator is sometimes pipelined to another
operator without creating a temporary relation to hold the
intermediate result.
• When a input relation to a unary operation (for example SELECT or
PROJECTION) is pipelined into it, it is sometimes said that the
operation is applied on-the-fly.
• Pipelining is sometime used to improve the performance of the
query.
cont.
• If an output of an operation is saved in a temporary relation for processing by
the next operator, it is said that the tuple are materialized.
• Thus, this process of temporarily writing intermediate algebra operation is
called materialization.
• The materialization process is start from the lowest level operation in the
expression, which are at the bottom of the query tree.
• The inputs of the lowest level operations are the relations in the database.
• The lowest level operations on the input relations are executed and stored in
temporary relation.
• Then these temporary relations are used to execute the operation at the next
level up in the tree
cont.
• In materialization the output of one operation is stored in temporary
relation for processing for the next relation.
• By repeating this process, the operation at the root of the tree is
evaluated giving the final result of the expression.
• Alternatively the efficiency of the query evaluation can be improved
by reducing the number of temporary files that are produced.
• Therefore the several relational operations are combined into a
pipeline of operation in which, the result of one operation is pipelined
into another operation without creating the intermediate relation to
hold the intermediate result
cont.
• A buffer is created for each pair of adjacent operations to hold the
tuple being passed from first operation to second operation.
• Pipeline operation is eliminates the cost of reading and writing
temporary relations.
• Advantage :- The use of pipelining saves on the cost of creating
temporary relations and reading the results back in again.
• disadvantage: The inputs to operations are not necessarily available
all at once for processing. This can restrict the choice of algorithm.
the end
thank you
quie 1 5%
• define query processing and query optimization
• list the step of query processing
• what are stages of query decomposition

You might also like