Chapter: Fundamentals of Database Systems - Query Processing and Optimization, and Database Tuning - Algorithms for Query Processing and Optimization

Using Heuristics in Query Optimization

1. Notation for Query Trees and Query Graphs
2. Heuristic Optimization of Query Trees
3. Converting Query Trees into Query Execution Plans

In this section we discuss optimization techniques that apply heuristic rules to modify the internal representation of a query (usually a query tree or a query graph data structure) to improve its expected performance. The scanner and parser of an SQL query first generate a data structure that corresponds to an initial query representation, which is then optimized according to heuristic rules. This leads to an optimized query representation, which corresponds to the query execution strategy. Following that, a query execution plan is generated to execute groups of operations based on the access paths available on the files involved in the query.

One of the main heuristic rules is to apply SELECT and PROJECT operations before applying the JOIN or other binary operations, because the size of the file resulting from a binary operation such as JOIN is usually a multiplicative function of the sizes of the input files. The SELECT and PROJECT operations reduce the size of a file and hence should be applied before a join or other binary operation.

In Section 19.7.1 we reiterate the query tree and query graph notations that we introduced earlier in the context of relational algebra and calculus in Sections 6.3.5 and 6.6.5, respectively. These can be used as the basis for the data structures that are used for internal representation of queries. A query tree is used to represent a relational algebra or extended relational algebra expression, whereas a query graph is used to represent a relational calculus expression. Then in Section 19.7.2 we show how heuristic optimization rules are applied to convert an initial query tree into an equivalent query tree, which represents a different relational algebra expression that is more efficient to execute but gives the same result as the original tree. We also discuss the equivalence of various relational algebra expressions. Finally, Section 19.7.3 discusses the generation of query execution plans.

1. Notation for Query Trees and Query Graphs

A query tree is a tree data structure that corresponds to a relational algebra expression. It represents the input relations of the query as leaf nodes of the tree, and represents the relational algebra operations as internal nodes. An execution of the query tree consists of executing an internal node operation whenever its operands are available and then replacing that internal node by the relation that results from executing the operation. The order of execution of operations starts at the leaf nodes, which represent the input database relations for the query, and ends at the root node, which represents the final operation of the query. The execution terminates when the root node operation is executed and produces the result relation for the query.

Figure 19.4a shows a query tree (the same as shown in Figure 6.9) for query Q2 in Chapters 4 to 6: For every project located in 'Stafford', retrieve the project number, the controlling department number, and the department manager's last name, address, and birthdate.
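The bottom-up execution order described above can be sketched in a few lines of Python. This is only an illustrative sketch, not how any particular DBMS represents query trees: the class names (Leaf, Select, Project, Join), the execute function, and the three toy relations are all assumptions made up for this example. The tree mirrors the shape of the Q2 tree in Figure 19.4a.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Union

Row = Dict[str, object]


@dataclass
class Leaf:                      # leaf node: an input relation
    name: str
    rows: List[Row]


@dataclass
class Select:                    # internal node: sigma
    pred: Callable[[Row], bool]
    child: "Node"


@dataclass
class Project:                   # internal node: pi
    attrs: List[str]
    child: "Node"


@dataclass
class Join:                      # internal node: join, condition tested on the merged row
    pred: Callable[[Row], bool]
    left: "Node"
    right: "Node"


Node = Union[Leaf, Select, Project, Join]


def execute(node: Node, trace: List[str]) -> List[Row]:
    """Evaluate the operands first, then the internal node itself (bottom-up)."""
    if isinstance(node, Leaf):
        return node.rows
    if isinstance(node, Select):
        rows = [r for r in execute(node.child, trace) if node.pred(r)]
        trace.append("SELECT")
        return rows
    if isinstance(node, Project):
        out, seen = [], set()
        for r in execute(node.child, trace):
            key = tuple(r[a] for a in node.attrs)
            if key not in seen:          # relational projection drops duplicates
                seen.add(key)
                out.append(dict(zip(node.attrs, key)))
        trace.append("PROJECT")
        return out
    if isinstance(node, Join):
        left = execute(node.left, trace)
        right = execute(node.right, trace)
        rows = [{**l, **r} for l in left for r in right if node.pred({**l, **r})]
        trace.append("JOIN")
        return rows
    raise TypeError(f"unknown node type: {type(node).__name__}")


# Toy rows, invented for illustration only.
PROJECT = [{"Pnumber": 10, "Plocation": "Stafford", "Dnum": 4},
           {"Pnumber": 1, "Plocation": "Bellaire", "Dnum": 5}]
DEPARTMENT = [{"Dnumber": 4, "Mgr_ssn": "987"}, {"Dnumber": 5, "Mgr_ssn": "333"}]
EMPLOYEE = [{"Ssn": "987", "Lname": "Wallace", "Address": "291 Berry", "Bdate": "1941-06-20"},
            {"Ssn": "333", "Lname": "Wong", "Address": "638 Voss", "Bdate": "1955-12-08"}]

q2 = Project(
    ["Pnumber", "Dnum", "Lname", "Address", "Bdate"],
    Join(lambda r: r["Mgr_ssn"] == r["Ssn"],
         Join(lambda r: r["Dnum"] == r["Dnumber"],
              Select(lambda r: r["Plocation"] == "Stafford", Leaf("PROJECT", PROJECT)),
              Leaf("DEPARTMENT", DEPARTMENT)),
         Leaf("EMPLOYEE", EMPLOYEE)),
)

trace: List[str] = []
result = execute(q2, trace)
print(trace)    # order in which the internal nodes finished
print(result)
```

The trace records each internal node only after its operands have been produced, so the root PROJECT is always the last entry.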
Query Q2 is specified on the COMPANY relational schema in Figure 3.5 and corresponds to the following relational algebra expression:

$\pi_{\text{Pnumber, Dnum, Lname, Address, Bdate}}(((\sigma_{\text{Plocation='Stafford'}}(\text{PROJECT})) \bowtie_{\text{Dnum=Dnumber}} (\text{DEPARTMENT})) \bowtie_{\text{Mgr\_ssn=Ssn}} (\text{EMPLOYEE}))$

This corresponds to the following SQL query:

Q2: SELECT P.Pnumber, P.Dnum, E.Lname, E.Address, E.Bdate
    FROM PROJECT AS P, DEPARTMENT AS D, EMPLOYEE AS E
    WHERE P.Dnum=D.Dnumber AND D.Mgr_ssn=E.Ssn AND P.Plocation='Stafford';

Figure 19.4 Two query trees for the query Q2. (a) Query tree corresponding to the relational algebra expression for Q2. (b) Initial (canonical) query tree for SQL query Q2. (c) Query graph for Q2.

In Figure 19.4a, the leaf nodes P, D, and E represent the three relations PROJECT, DEPARTMENT, and EMPLOYEE, respectively, and the internal tree nodes represent the relational algebra operations of the expression. When this query tree is executed, the node marked (1) in Figure 19.4a must begin execution before node (2) because some resulting tuples of operation (1) must be available before we can begin executing operation (2). Similarly, node (2) must begin executing and producing results before node (3) can start execution, and so on.

As we can see, the query tree represents a specific order of operations for executing a query. A more neutral data structure for representation of a query is the query graph notation. Figure 19.4c (the same as shown in Figure 6.13) shows the query graph for query Q2. Relations in the query are represented by relation nodes, which are displayed as single circles.
Constant values, typically from the query selection conditions, are represented by constant nodes, which are displayed as double circles or ovals. Selection and join conditions are represented by the graph edges, as shown in Figure 19.4c. Finally, the attributes to be retrieved from each relation are displayed in square brackets above each relation.

The query graph representation does not indicate an order on which operations to perform first. There is only a single graph corresponding to each query. Although some optimization techniques were based on query graphs, it is now generally accepted that query trees are preferable because, in practice, the query optimizer needs to show the order of operations for query execution, which is not possible in query graphs.

2. Heuristic Optimization of Query Trees

In general, many different relational algebra expressions (and hence many different query trees) can be equivalent; that is, they can represent the same query.

The query parser will typically generate a standard initial query tree to correspond to an SQL query, without doing any optimization. For example, for a SELECT-PROJECT-JOIN query, such as Q2, the initial tree is shown in Figure 19.4(b). The CARTESIAN PRODUCT of the relations specified in the FROM clause is first applied; then the selection and join conditions of the WHERE clause are applied, followed by the projection on the SELECT clause attributes. Such a canonical query tree represents a relational algebra expression that is very inefficient if executed directly, because of the CARTESIAN PRODUCT (×) operations. For example, if the PROJECT, DEPARTMENT, and EMPLOYEE relations had record sizes of 100, 50, and 150 bytes and contained 100, 20, and 5,000 tuples, respectively, the result of the CARTESIAN PRODUCT would contain 10 million tuples of record size 300 bytes each. However, the initial query tree in Figure 19.4(b) is in a simple standard form that can be easily created from the SQL query. It will never be executed.
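The 10 million figure follows directly from multiplying the tuple counts and adding the record sizes. A short Python check, using only the numbers given in the text; the total byte count is a derived value that ignores any storage overhead, and the variable names are made up for this sketch:

```python
from math import prod

# name: (number of tuples, record size in bytes), as given in the text
relations = {
    "PROJECT": (100, 100),
    "DEPARTMENT": (20, 50),
    "EMPLOYEE": (5000, 150),
}

# A Cartesian product pairs every tuple with every other tuple,
# so tuple counts multiply while record sizes add.
result_tuples = prod(n for n, _ in relations.values())
result_record_size = sum(size for _, size in relations.values())
result_bytes = result_tuples * result_record_size

print(result_tuples, result_record_size, result_bytes)
```

Under these assumptions the unoptimized intermediate result would occupy roughly 3 GB, even though the three input relations together hold well under 1 MB.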
The heuristic query optimizer will transform this initial query tree into an equivalent final query tree that is efficient to execute.

The optimizer must include rules for equivalence among relational algebra expressions that can be applied to transform the initial tree into the final, optimized query tree. First we discuss informally how a query tree is transformed by using heuristics, and then we discuss general transformation rules and show how they can be used in an algebraic heuristic optimizer.

Example of Transforming a Query. Consider the following query Q on the database in Figure 3.5: Find the last names of employees born after 1957 who work on a project named 'Aquarius'. This query can be specified in SQL as follows:

Q: SELECT Lname
   FROM EMPLOYEE, WORKS_ON, PROJECT
   WHERE Pname='Aquarius' AND Pnumber=Pno AND Essn=Ssn AND Bdate > '1957-12-31';

The initial query tree for Q is shown in Figure 19.5(a). Executing this tree directly first creates a very large file containing the CARTESIAN PRODUCT of the entire EMPLOYEE, WORKS_ON, and PROJECT files. That is why the initial query tree is never executed, but is transformed into another equivalent tree that is efficient to execute. This particular query needs only one record from the PROJECT relation (for the 'Aquarius' project) and only the EMPLOYEE records for those whose date of birth is after '1957-12-31'.

Figure 19.5 Steps in converting a query tree during heuristic optimization. (a) Initial (canonical) query tree for SQL query Q. (b) Moving SELECT operations down the query tree. (c) Applying the more restrictive SELECT operation first. (d) Replacing CARTESIAN PRODUCT and SELECT with JOIN operations. (e) Moving PROJECT operations down the query tree.
Figure 19.5(b) shows an improved query tree that first applies the SELECT operations to reduce the number of tuples that appear in the CARTESIAN PRODUCT.

A further improvement is achieved by switching the positions of the EMPLOYEE and PROJECT relations in the tree, as shown in Figure 19.5(c). This uses the information that Pnumber is a key attribute of the PROJECT relation, and hence the SELECT operation on the PROJECT relation will retrieve a single record only. We can further improve the query tree by replacing any CARTESIAN PRODUCT operation that is followed by a join condition with a JOIN operation, as shown in Figure 19.5(d). Another improvement is to keep only the attributes needed by subsequent operations in the intermediate relations, by including PROJECT (π) operations as early as possible in the query tree, as shown in Figure 19.5(e). This reduces the attributes (columns) of the intermediate relations, whereas the SELECT operations reduce the number of tuples (records).

As the preceding example demonstrates, a query tree can be transformed step by step into an equivalent query tree that is more efficient to execute. However, we must make sure that the transformation steps always lead to an equivalent query tree. To do this, the query optimizer must know which transformation rules preserve this equivalence. We discuss some of these transformation rules next.

General Transformation Rules for Relational Algebra Operations. There are many rules for transforming relational algebra operations into equivalent ones. For query optimization purposes, we are interested in the meaning of the operations and the resulting relations. Hence, if two relations have the same set of attributes in a different order but the two relations represent the same information, we consider the relations to be equivalent. In Section 3.1.2 we gave an alternative definition of relation that makes the order of attributes unimportant; we will use this definition here.
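The notion of equivalence used here can be checked on toy data. The sketch below is an illustration under stated assumptions, not an optimizer: the rows are invented, and as_relation is a hypothetical helper that models a relation as a set of tuples, each tuple being a set of (attribute, value) pairs, so that neither row order nor attribute order matters. It evaluates query Q once in canonical form (Cartesian product first) and once with the SELECT operations applied before the joins, then compares the results and the intermediate sizes.

```python
from itertools import product

# Toy rows, invented for illustration only.
EMPLOYEE = [
    {"Ssn": "1", "Lname": "Smith", "Bdate": "1965-01-09"},
    {"Ssn": "2", "Lname": "Wong", "Bdate": "1955-12-08"},
    {"Ssn": "3", "Lname": "English", "Bdate": "1972-07-31"},
]
WORKS_ON = [
    {"Essn": "1", "Pno": 1},
    {"Essn": "2", "Pno": 1},
    {"Essn": "3", "Pno": 2},
    {"Essn": "1", "Pno": 2},
]
PROJECT = [
    {"Pnumber": 1, "Pname": "Aquarius"},
    {"Pnumber": 2, "Pname": "Pisces"},
]


def as_relation(rows):
    """Set of tuples, each a set of (attribute, value) pairs: order-insensitive."""
    return {frozenset(r.items()) for r in rows}


# Canonical evaluation: full Cartesian product, then select, then project.
cartesian = [{**e, **w, **p} for e, w, p in product(EMPLOYEE, WORKS_ON, PROJECT)]
canonical = [
    {"Lname": r["Lname"]}
    for r in cartesian
    if r["Pname"] == "Aquarius"
    and r["Pnumber"] == r["Pno"]
    and r["Essn"] == r["Ssn"]
    and r["Bdate"] > "1957-12-31"      # ISO dates compare correctly as strings
]

# Heuristic evaluation: apply the selections first, then join.
aquarius = [p for p in PROJECT if p["Pname"] == "Aquarius"]
born_after_1957 = [e for e in EMPLOYEE if e["Bdate"] > "1957-12-31"]
step1 = [{**p, **w} for p in aquarius for w in WORKS_ON if p["Pnumber"] == w["Pno"]]
step2 = [{**s, **e} for s in step1 for e in born_after_1957 if s["Essn"] == e["Ssn"]]
optimized = [{"Lname": r["Lname"]} for r in step2]

print(len(cartesian), max(len(step1), len(step2)))
print(as_relation(canonical) == as_relation(optimized))
```

Both strategies return the same relation, but the canonical form builds 24 intermediate tuples from these few rows while the reordered form never holds more than 2.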
We will state some transformation rules that are useful in query optimization, without proving them:

1. Cascade of σ. A conjunctive selection condition can be broken up into a cascade (that is, a sequence) of individual σ operations:
$\sigma_{c_1 \text{ AND } c_2 \text{ AND } \ldots \text{ AND } c_n}(R) \equiv \sigma_{c_1}(\sigma_{c_2}(\ldots(\sigma_{c_n}(R))\ldots))$

2. Commutativity of σ. The σ operation is commutative:
$\sigma_{c_1}(\sigma_{c_2}(R)) \equiv \sigma_{c_2}(\sigma_{c_1}(R))$

3. Cascade of π. In a cascade (sequence) of π operations, all but the last one can be ignored:
$\pi_{\text{List}_1}(\pi_{\text{List}_2}(\ldots(\pi_{\text{List}_n}(R))\ldots)) \equiv \pi_{\text{List}_1}(R)$

4. Commuting σ with π. If the selection condition c involves only those attributes $A_1, \ldots, A_n$ in the projection list, the two operations can be commuted:
$\pi_{A_1, A_2, \ldots, A_n}(\sigma_c(R)) \equiv \sigma_c(\pi_{A_1, A_2, \ldots, A_n}(R))$

5. Commutativity of ⋈ (and ×). The join operation is commutative, as is the × operation:
$R \bowtie_c S \equiv S \bowtie_c R$
$R \times S \equiv S \times R$
Notice that although the order of attributes may not be the same in the relations resulting from the two joins (or two Cartesian products), the meaning is the same because the order of attributes is not important in the alternative definition of relation.

6. Commuting σ with ⋈ (or ×). If all the attributes in the selection condition c involve only the attributes of one of the relations being joined (say, R), the two operations can be commuted as follows:
$\sigma_c(R \bowtie S) \equiv (\sigma_c(R)) \bowtie S$
Alternatively, if the selection condition c can be written as ($c_1$ AND $c_2$), where condition $c_1$ involves only the attributes of R and condition $c_2$ involves only the attributes of S, the operations commute as follows:
$\sigma_c(R \bowtie S) \equiv (\sigma_{c_1}(R)) \bowtie (\sigma_{c_2}(S))$
The same rules apply if the ⋈ is replaced by a × operation.

7. Commuting π with ⋈ (or ×). Suppose that the projection list is $L = \{A_1, \ldots, A_n, B_1, \ldots, B_m\}$, where $A_1, \ldots, A_n$ are attributes of R and $B_1, \ldots, B_m$ are attributes of S. If the join condition c involves only attributes in L, the two operations can be commuted as follows:
$\pi_L(R \bowtie_c S) \equiv (\pi_{A_1, \ldots, A_n}(R)) \bowtie_c (\pi_{B_1, \ldots, B_m}(S))$
If the join condition c contains additional attributes not in L, these must be added to the projection list, and a final π operation is needed. For example, if attributes $A_{n+1}, \ldots, A_{n+k}$ of R and $B_{m+1}, \ldots, B_{m+p}$ of S are involved in the join condition c but are not in the projection list L, the operations commute as follows:
$\pi_L(R \bowtie_c S) \equiv \pi_L((\pi_{A_1, \ldots, A_n, A_{n+1}, \ldots, A_{n+k}}(R)) \bowtie_c (\pi_{B_1, \ldots, B_m, B_{m+1}, \ldots, B_{m+p}}(S)))$
For × there is no condition c, so the first transformation rule always applies by replacing $\bowtie_c$ with ×.

8. Commutativity of set operations. The set operations ∪ and ∩ are commutative but − is not.

9. Associativity of ⋈, ×, ∪, and ∩. These four operations are individually associative; that is, if θ stands for any one of these four operations (throughout the expression), we have:
$(R \;\theta\; S) \;\theta\; T \equiv R \;\theta\; (S \;\theta\; T)$

10. Commuting σ with set operations. The σ operation commutes with ∪, ∩, and −. If θ stands for any one of these three operations (throughout the expression), we have:
$\sigma_c(R \;\theta\; S) \equiv (\sigma_c(R)) \;\theta\; (\sigma_c(S))$

11. The π operation commutes with ∪:
$\pi_L(R \cup S) \equiv (\pi_L(R)) \cup (\pi_L(S))$

12. Converting a (σ, ×) sequence into ⋈. If the condition c of a σ that follows a × corresponds to a join condition, convert the (σ, ×) sequence into a ⋈ as follows:
$(\sigma_c(R \times S)) \equiv (R \bowtie_c S)$

There are other possible transformations. For example, a selection or join condition c can be converted into an equivalent condition by using the following standard rules from Boolean algebra (DeMorgan's laws):
$\text{NOT}\,(c_1 \text{ AND } c_2) \equiv (\text{NOT}\, c_1) \text{ OR } (\text{NOT}\, c_2)$
$\text{NOT}\,(c_1 \text{ OR } c_2) \equiv (\text{NOT}\, c_1) \text{ AND } (\text{NOT}\, c_2)$

Additional transformations discussed in Chapters 4, 5, and 6 are not repeated here. We discuss next how transformations can be used in heuristic optimization.

Outline of a Heuristic Algebraic Optimization Algorithm. We can now outline the steps of an algorithm that utilizes some of the above rules to transform an initial query tree into a final tree that is more efficient to execute (in most cases). The algorithm will lead to transformations similar to those discussed in our example in Figure 19.5. The steps of the algorithm are as follows:

1. Using Rule 1, break up any SELECT operations with conjunctive conditions into a cascade of SELECT operations.
This permits a greater degree of freedom in moving SELECT operations down different branches of the tree.

2. Using Rules 2, 4, 6, and 10 concerning the commutativity of SELECT with other operations, move each SELECT operation as far down the query tree as is permitted by the attributes involved in the select condition. If the condition involves attributes from only one table, which means that it represents a selection condition, the operation is moved all the way to the leaf node that represents this table. If the condition involves attributes from two tables, which means that it represents a join condition, the condition is moved to a location down the tree after the two tables are combined.

3. Using Rules 5 and 9 concerning commutativity and associativity of binary operations, rearrange the leaf nodes of the tree using the following criteria. First, position the leaf node relations with the most restrictive SELECT operations so they are executed first in the query tree representation. The definition of most restrictive SELECT can mean either the ones that produce a relation with the fewest tuples or with the smallest absolute size. Another possibility is to define the most restrictive SELECT as the one with the smallest selectivity; this is more practical because estimates of selectivities are often available in the DBMS catalog. Second, make sure that the ordering of leaf nodes does not cause CARTESIAN PRODUCT operations; for example, if the two relations with the most restrictive SELECT do not have a direct join condition between them, it may be desirable to change the order of leaf nodes to avoid Cartesian products.

4. Using Rule 12, combine a CARTESIAN PRODUCT operation with a subsequent SELECT operation in the tree into a JOIN operation, if the condition represents a join condition.

5. Using Rules 3, 4, 7, and 11 concerning the cascading of PROJECT and the commuting of PROJECT with other operations, break down and move lists of projection attributes down the tree as far as possible by creating new PROJECT operations as needed. Only those attributes needed in the query result and in subsequent operations in the query tree should be kept after each PROJECT operation.

6. Identify subtrees that represent groups of operations that can be executed by a single algorithm.

In our example, Figure 19.5(b) shows the tree in Figure 19.5(a) after applying steps 1 and 2 of the algorithm; Figure 19.5(c) shows the tree after step 3; Figure 19.5(d) after step 4; and Figure 19.5(e) after step 5. In step 6 we may group together the operations in the subtree whose root is the operation $\pi_{\text{Essn}}$ into a single algorithm. We may also group the remaining operations into another subtree, where the tuples resulting from the first algorithm replace the subtree whose root is the operation $\pi_{\text{Essn}}$, because the first grouping means that this subtree is executed first.

Summary of Heuristics for Algebraic Optimization. The main heuristic is to apply first the operations that reduce the size of intermediate results. This includes performing SELECT operations as early as possible to reduce the number of tuples, and PROJECT operations as early as possible to reduce the number of attributes, by moving SELECT and PROJECT operations as far down the tree as possible. Additionally, the SELECT and JOIN operations that are most restrictive (that is, result in relations with the fewest tuples or with the smallest absolute size) should be executed before other similar operations. The latter rule is accomplished through reordering the leaf nodes of the tree among themselves while avoiding Cartesian products, and adjusting the rest of the tree appropriately.

3. Converting Query Trees into Query Execution Plans

An execution plan for a relational algebra expression represented as a query tree includes information about the access methods available for each relation as well as the algorithms to be used in computing the relational operators represented in the tree. As a simple example, consider query Q1 from Chapter 4, whose corresponding relational algebra expression is

$\pi_{\text{Fname, Lname, Address}}(\sigma_{\text{Dname='Research'}}(\text{DEPARTMENT}) \bowtie_{\text{Dnumber=Dno}} \text{EMPLOYEE})$

Figure 19.6 A query tree for query Q1.

The query tree is shown in Figure 19.6. To convert this into an execution plan, the optimizer might choose an index search for the SELECT operation on DEPARTMENT (assuming one exists), a single-loop join algorithm that loops over the records in the result of the SELECT operation on DEPARTMENT for the join operation (assuming an index exists on the Dno attribute of EMPLOYEE), and a scan of the JOIN result for input to the PROJECT operator. Additionally, the approach taken for executing the query may specify a materialized or a pipelined evaluation, although in general a pipelined evaluation is preferred whenever feasible.

With materialized evaluation, the result of an operation is stored as a temporary relation (that is, the result is physically materialized). For instance, the JOIN operation can be computed and the entire result stored as a temporary relation, which is then read as input by the algorithm that computes the PROJECT operation, which would produce the query result table. On the other hand, with pipelined evaluation, as the resulting tuples of an operation are produced, they are forwarded directly to the next operation in the query sequence.
For example, as the selected tuples from DEPARTMENT are produced by the SELECT operation, they are placed in a buffer; the JOIN operation algorithm would then consume the tuples from the buffer, and those tuples that result from the JOIN operation are pipelined to the projection operation algorithm. The advantage of pipelining is the cost savings in not having to write the intermediate results to disk and not having to read them back for the next operation.
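The difference between the two evaluation styles can be sketched with Python generators. This is a loose in-memory analogy under stated assumptions, not a description of a real executor: the function names and toy rows are invented, Python lists stand in for temporary relations that a DBMS would write to disk, and a plain dictionary stands in for the assumed index on the Dno attribute of EMPLOYEE used by the single-loop join.

```python
from typing import Dict, Iterable, Iterator, List

Row = Dict[str, object]

# Toy rows, invented for illustration only.
DEPARTMENT: List[Row] = [
    {"Dname": "Research", "Dnumber": 5},
    {"Dname": "Administration", "Dnumber": 4},
]
EMPLOYEE: List[Row] = [
    {"Fname": "John", "Lname": "Smith", "Address": "731 Fondren", "Dno": 5},
    {"Fname": "Alicia", "Lname": "Zelaya", "Address": "3321 Castle", "Dno": 4},
    {"Fname": "Joyce", "Lname": "English", "Address": "5631 Rice", "Dno": 5},
]

# Stand-in for an index on EMPLOYEE.Dno: maps a Dno value to its rows.
dno_index: Dict[object, List[Row]] = {}
for emp in EMPLOYEE:
    dno_index.setdefault(emp["Dno"], []).append(emp)


def select_research(depts: Iterable[Row]) -> Iterator[Row]:
    for d in depts:
        if d["Dname"] == "Research":
            yield d


def single_loop_join(depts: Iterable[Row]) -> Iterator[Row]:
    # Loop over the (small) outer input and probe the index for matches.
    for d in depts:
        for e in dno_index.get(d["Dnumber"], []):
            yield {**d, **e}


def project_names(rows: Iterable[Row]) -> Iterator[Row]:
    for r in rows:
        yield {k: r[k] for k in ("Fname", "Lname", "Address")}


# Materialized: every intermediate result is fully built before the next step.
selected = list(select_research(DEPARTMENT))
joined = list(single_loop_join(selected))
materialized = list(project_names(joined))

# Pipelined: each tuple flows through all three operators as it is produced.
pipelined = list(project_names(single_loop_join(select_research(DEPARTMENT))))

print(materialized == pipelined)
print([r["Lname"] for r in pipelined])
```

Both variants produce the same result; the pipelined chain simply never holds the intermediate SELECT or JOIN output as a whole, which is the saving the text attributes to pipelining.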